rjrudin commented Dec 30, 2025

Added DocumentWriteSetFilter as a generic interface for modifying a DocumentWriteSet before it's written. IncrementalWriteFilter is then the entry point, with a Builder for customizing its behavior.

Also started moving some tests into "com.marklogic.client.datamovement" so we can have unit tests that verify protected methods.
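
To make the shape of this design concrete, here is a minimal, self-contained sketch of a write-set filter configured through a builder. It does not use the MarkLogic client library: `Doc`, `WriteSetFilter`, `SkippingFilter`, and `skipWhen` are hypothetical stand-ins invented for illustration, and the real `DocumentWriteSetFilter` and `IncrementalWriteFilter.Builder` signatures may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for one document in a pending batch (not a MarkLogic type).
final class Doc {
    final String uri;
    final String content;

    Doc(String uri, String content) {
        this.uri = uri;
        this.content = content;
    }
}

// Hypothetical filter contract: take the pending batch, return the batch to write.
interface WriteSetFilter {
    List<Doc> apply(List<Doc> pending);
}

// Hypothetical filter whose behavior is customized through a builder.
final class SkippingFilter implements WriteSetFilter {
    private final Predicate<Doc> skip;

    private SkippingFilter(Predicate<Doc> skip) {
        this.skip = skip;
    }

    static Builder newBuilder() {
        return new Builder();
    }

    @Override
    public List<Doc> apply(List<Doc> pending) {
        List<Doc> kept = new ArrayList<>();
        for (Doc doc : pending) {
            if (!skip.test(doc)) {
                kept.add(doc); // only documents that are not skipped get written
            }
        }
        return kept;
    }

    static final class Builder {
        private Predicate<Doc> skip = doc -> false; // default: keep every document

        Builder skipWhen(Predicate<Doc> skip) {
            this.skip = skip;
            return this;
        }

        SkippingFilter build() {
            return new SkippingFilter(skip);
        }
    }
}
```

A caller would build the filter once, for example `SkippingFilter.newBuilder().skipWhen(d -> d.uri.endsWith(".tmp")).build()`, and hand it to whatever component writes batches.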

Copilot AI review requested due to automatic review settings December 30, 2025 20:52

github-actions bot commented Dec 30, 2025

Copyright Validation Results
Total: 14 | Passed: 12 | Failed: 1 | Skipped: 1 | at: 2025-12-30 21:20:35 UTC | commit: 8ea32e4

❌ Failed Files

  • test-app/src/main/ml-config/databases/content-database.json

    Error:

    - No copyright header found

    Expected header:

    Copyright (c) 2010-2025 Progress Software Corporation and/or its subsidiaries or affiliates. All Rights Reserved.
    

⏭️ Skipped (Excluded) Files

  • marklogic-client-api/build.gradle

✅ Valid Files

  • marklogic-client-api/src/main/java/com/marklogic/client/datamovement/DocumentWriteSetFilter.java
  • marklogic-client-api/src/main/java/com/marklogic/client/datamovement/WriteBatcher.java
  • marklogic-client-api/src/main/java/com/marklogic/client/datamovement/filter/IncrementalWriteEvalFilter.java
  • marklogic-client-api/src/main/java/com/marklogic/client/datamovement/filter/IncrementalWriteFilter.java
  • marklogic-client-api/src/main/java/com/marklogic/client/datamovement/filter/IncrementalWriteOpticFilter.java
  • marklogic-client-api/src/main/java/com/marklogic/client/datamovement/impl/BatchWriteSet.java
  • marklogic-client-api/src/main/java/com/marklogic/client/datamovement/impl/BatchWriter.java
  • marklogic-client-api/src/main/java/com/marklogic/client/datamovement/impl/WriteBatcherImpl.java
  • marklogic-client-api/src/main/java/com/marklogic/client/impl/okhttp/RetryIOExceptionInterceptor.java
  • marklogic-client-api/src/test/java/com/marklogic/client/datamovement/WriteNakedPropertiesTest.java
  • marklogic-client-api/src/test/java/com/marklogic/client/datamovement/filter/IncrementalWriteFilterTest.java
  • marklogic-client-api/src/test/java/com/marklogic/client/datamovement/filter/IncrementalWriteTest.java

🛠️ Guidance

Follow these steps to fix the failed files:

  1. Insert the expected header at the very top (within first 20 lines) of each failed file.
  2. Ensure the year range matches the configuration (start year through current year).
  3. Do not alter spacing or punctuation in the header line.
  4. Commit and push the changes to update this check.

Copilot AI left a comment

Pull request overview

This PR implements incremental write functionality for MarkLogic document batches, allowing documents to be skipped if their content hasn't changed. The implementation uses hash-based content comparison stored in a configurable MarkLogic field, with support for both Optic and eval-based hash retrieval strategies.

Key changes:

  • Introduced DocumentWriteSetFilter interface for pre-write document set modification
  • Implemented IncrementalWriteFilter with builder pattern for customizable incremental write behavior
  • Added comprehensive test coverage including JSON canonicalization scenarios
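
The hash-based comparison described in this overview can be sketched as follows. This is an illustration only, not the PR's code: it assumes SHA-256 as the hash (the PR does not state which algorithm its new dependency provides), represents the stored hashes as a plain map from URI to hash rather than a MarkLogic field lookup, and skips JSON canonicalization entirely. `IncrementalWriteSketch`, `sha256Hex`, and `selectChanged` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

final class IncrementalWriteSketch {

    // Hex-encoded SHA-256 of the content (assumed algorithm, chosen for illustration).
    static String sha256Hex(String content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] bytes = digest.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(bytes.length * 2);
            for (byte b : bytes) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every compliant Java platform must provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Returns only the documents whose hash differs from the stored hash for the same URI.
    // A URI with no stored hash is treated as new and is always written.
    static Map<String, String> selectChanged(Map<String, String> incoming,
                                             Map<String, String> storedHashes) {
        Map<String, String> toWrite = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : incoming.entrySet()) {
            String newHash = sha256Hex(entry.getValue());
            if (!newHash.equals(storedHashes.get(entry.getKey()))) {
                toWrite.put(entry.getKey(), entry.getValue());
            }
        }
        return toWrite;
    }
}
```

Without canonicalization, two JSON documents that differ only in key order or whitespace would hash differently and both be rewritten, which is presumably why the PR adds a canonicalization dependency.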

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

File | Description
--- | ---
marklogic-client-api/build.gradle | Added dependencies for JSON canonicalization and hash generation
marklogic-client-api/src/main/java/com/marklogic/client/datamovement/DocumentWriteSetFilter.java | New interface for filtering document write sets before writing
marklogic-client-api/src/main/java/com/marklogic/client/datamovement/WriteBatcher.java | Added withDocumentWriteSetFilter method to enable filter integration
marklogic-client-api/src/main/java/com/marklogic/client/datamovement/impl/WriteBatcherImpl.java | Integrated filter support into batch writing workflow
marklogic-client-api/src/main/java/com/marklogic/client/datamovement/impl/BatchWriter.java | Applied filter to document write sets before writing
marklogic-client-api/src/main/java/com/marklogic/client/datamovement/impl/BatchWriteSet.java | Implemented Context interface and added method for updating filtered write sets
marklogic-client-api/src/main/java/com/marklogic/client/datamovement/filter/IncrementalWriteFilter.java | Core abstract implementation for incremental write filtering with hash-based comparison
marklogic-client-api/src/main/java/com/marklogic/client/datamovement/filter/IncrementalWriteOpticFilter.java | Optic-based implementation for retrieving existing hash values
marklogic-client-api/src/main/java/com/marklogic/client/datamovement/filter/IncrementalWriteEvalFilter.java | Eval-based implementation for retrieving existing hash values
marklogic-client-api/src/main/java/com/marklogic/client/impl/okhttp/RetryIOExceptionInterceptor.java | Added handling for MarkLogicIOException in retry logic
marklogic-client-api/src/test/java/com/marklogic/client/datamovement/WriteNakedPropertiesTest.java | Moved test to datamovement package and simplified
marklogic-client-api/src/test/java/com/marklogic/client/datamovement/filter/IncrementalWriteTest.java | Integration tests for incremental write functionality
marklogic-client-api/src/test/java/com/marklogic/client/datamovement/filter/IncrementalWriteFilterTest.java | Unit tests for metadata handling in incremental writes
marklogic-client-api/src/test/java/com/marklogic/client/test/datamovement/IncrementalWriteTest.java | Removed old test file (moved to new location)
test-app/src/main/ml-config/databases/content-database.json | Added field and range index configuration for incremental write hash

Comment on lines +192 to +193
{
"field-name": "incrementalWriteHash",
Copilot AI Dec 30, 2025

Inconsistent indentation: this block uses tabs while the surrounding code uses spaces. Should use spaces to match the existing style.

Comment on lines +217 to +218
{
"scalar-type": "string",
Copilot AI Dec 30, 2025

Inconsistent indentation: this block uses tabs while the surrounding code uses spaces. Should use spaces to match the existing style.

doc2 = IncrementalWriteFilter.addHashToMetadata(doc2, "theField", "abc123");

assertEquals(metadata, doc1.getMetadata(), "doc1 should stillhave the original metadata object");
Copilot AI Dec 30, 2025

Corrected spelling of 'stillhave' to 'still have'.

Suggested change
- assertEquals(metadata, doc1.getMetadata(), "doc1 should stillhave the original metadata object");
+ assertEquals(metadata, doc1.getMetadata(), "doc1 should still have the original metadata object");

for (DocumentWriteOperation doc : context.getDocumentWriteSet()) {
    if (!DocumentWriteOperation.OperationType.DOCUMENT_WRITE.equals(doc.getOperationType())) {
        newWriteSet.add(doc);
Copilot AI Dec 30, 2025

Logic error: non-DOCUMENT_WRITE operations are added to newWriteSet but then processing continues for all documents. This causes non-DOCUMENT_WRITE operations to be processed for hashing when they should be skipped. The continue statement is missing after adding non-DOCUMENT_WRITE operations.

Suggested change
- newWriteSet.add(doc);
+ newWriteSet.add(doc);
+ continue;
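
To illustrate why the `continue` matters, here is a small self-contained sketch of the loop pattern. It assumes the hashing code follows the `if` block without an `else`, which the quoted excerpt does not show, so the reviewer's diagnosis may or may not apply to the actual file. `OpType`, `Op`, and `ContinueSketch` are hypothetical stand-ins, not MarkLogic types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the operation type and write operation.
enum OpType { DOCUMENT_WRITE, METADATA_DEFAULT }

final class Op {
    final OpType type;
    final String uri;

    Op(OpType type, String uri) {
        this.type = type;
        this.uri = uri;
    }
}

final class ContinueSketch {

    // Passes every operation through, but records only DOCUMENT_WRITE operations as hashed.
    static List<Op> filter(List<Op> ops, List<String> hashedUris) {
        List<Op> newWriteSet = new ArrayList<>();
        for (Op op : ops) {
            if (op.type != OpType.DOCUMENT_WRITE) {
                newWriteSet.add(op);
                continue; // without this, the operation would fall through to hashing and be added twice
            }
            hashedUris.add(op.uri); // placeholder for the real hash computation and comparison
            newWriteSet.add(op);
        }
        return newWriteSet;
    }
}
```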

rjrudin force-pushed the feature/26420-incremental-write branch from 975a342 to 8ea32e4 on December 30, 2025 21:20

rjrudin commented Dec 30, 2025

Going to break this up

rjrudin closed this Dec 30, 2025
rjrudin deleted the feature/26420-incremental-write branch December 31, 2025 00:09