Bug Description
The document resync operation fails with a database constraint violation, and document deletion does not appear to work correctly.
Steps to Reproduce
- Upload a document through Admin UI (POST /api/v1/sources/upload)
- Wait for document to reach COMPLETED status
- Click "Resync" button for the same document
- Error occurs: "Error: could not execute statement [ERROR: duplicate ke..."
Expected Behavior
- Resync should successfully restart the ingestion pipeline for existing documents
- Document deletion should properly remove all related data
Actual Behavior
- Resync fails with duplicate key constraint violation
- Deletion operation appears incomplete
Technical Analysis
The issue seems to be related to:
- Resync Logic: When resyncing existing files, the system appears to create duplicate entries instead of updating existing ones
- Deletion Logic: Document deletion may not be properly cascading through all related tables/indexes
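The suspected resync behavior can be illustrated with a minimal in-memory sketch. It assumes the table has a unique constraint on the file checksum, which this report only lists as a possible cause. Every name here (ResyncSketch, resyncByInsert, resyncByUpdate) is hypothetical and not taken from open-context-core; the map key simply stands in for the unique index.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a table with a unique constraint on the checksum.
// None of these names come from open-context-core.
final class ResyncSketch {

    static final class DuplicateKeyException extends RuntimeException {
        DuplicateKeyException(String message) {
            super(message);
        }
    }

    // checksum -> ingestion status; the map key plays the role of the unique index.
    private final Map<String, String> statusByChecksum = new HashMap<>();

    // Plain insert: fails if a row with the same checksum already exists.
    void insert(String checksum) {
        if (statusByChecksum.containsKey(checksum)) {
            throw new DuplicateKeyException("duplicate key: " + checksum);
        }
        statusByChecksum.put(checksum, "PENDING");
    }

    // Marks an existing row as fully ingested; no effect if the row is absent.
    void markCompleted(String checksum) {
        statusByChecksum.replace(checksum, "COMPLETED");
    }

    // Resync written as a second insert: reproduces the suspected failure mode.
    void resyncByInsert(String checksum) {
        insert(checksum);
    }

    // Resync written as an update: resets the existing row's status in place.
    void resyncByUpdate(String checksum) {
        if (statusByChecksum.replace(checksum, "PENDING") == null) {
            throw new IllegalStateException("unknown document: " + checksum);
        }
    }

    String status(String checksum) {
        return statusByChecksum.get(checksum);
    }

    int rowCount() {
        return statusByChecksum.size();
    }
}
```

In this sketch, calling resyncByInsert on a checksum that is already stored throws DuplicateKeyException, while resyncByUpdate leaves a single row and resets its status to PENDING. Whether the real resync path inserts or updates still has to be confirmed in the code.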
Possible Root Causes
- File checksum uniqueness constraint conflict during resync
- Incomplete cleanup in PostgreSQL/Elasticsearch during deletion
- Missing transaction rollback handling in resync workflow
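To show why missing rollback handling matters, here is a small sketch under the assumption that resync first clears a document's derived data and then re-ingests it. That flow is a guess, not confirmed behavior. RollbackSketch and its methods are hypothetical, and the snapshot restore only imitates what a database transaction rollback would do.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a resync that clears old chunks before re-ingesting.
final class RollbackSketch {

    private List<String> chunks = new ArrayList<>();

    void seed(List<String> initial) {
        chunks = new ArrayList<>(initial);
    }

    List<String> chunks() {
        return new ArrayList<>(chunks);
    }

    // No rollback: a failure midway leaves the document half-ingested.
    void resyncWithoutRollback(List<String> fresh, boolean failMidway) {
        chunks.clear();
        ingest(fresh, failMidway);
    }

    // With rollback: the previous state is restored before rethrowing.
    void resyncWithRollback(List<String> fresh, boolean failMidway) {
        List<String> snapshot = new ArrayList<>(chunks);
        try {
            chunks.clear();
            ingest(fresh, failMidway);
        } catch (RuntimeException e) {
            chunks = snapshot; // imitates a transaction rollback
            throw e;
        }
    }

    // Appends the new chunks, optionally failing halfway through.
    private void ingest(List<String> fresh, boolean failMidway) {
        for (int i = 0; i < fresh.size(); i++) {
            if (failMidway && i == fresh.size() / 2) {
                throw new IllegalStateException("simulated ingestion failure");
            }
            chunks.add(fresh.get(i));
        }
    }
}
```

In a Spring Boot service the same effect would normally come from running the whole resync inside one transaction rather than from a manual snapshot; the sketch only makes the difference in end state visible.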
Environment
- Version: 1.0.0
- Component: open-context-core (Spring Boot)
- Database: PostgreSQL 16
- Deployment: Docker Compose
Additional Context
This affects the core document management workflow and prevents users from:
- Re-processing documents after system updates
- Properly removing unwanted documents
- Maintaining clean document state
Suggested Investigation Areas
- Check SourceDocumentRepository resync implementation
- Verify CASCADE delete constraints in database schema
- Review transaction management in document ingestion pipeline
- Validate Elasticsearch index cleanup during deletion
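One way to approach the deletion checks above is an orphan scan after a delete. The following is a rough sketch, assuming each document has child rows in PostgreSQL and matching entries in an Elasticsearch index. The class, the method names, and the "documentId#n" chunk id scheme are all hypothetical, and plain in-memory collections stand in for both stores.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: a parent table, a child table, and a search index.
final class DeletionCheckSketch {

    private final Set<String> documents = new HashSet<>();
    private final Map<String, String> chunkToDocument = new HashMap<>();
    private final Map<String, String> indexEntryToDocument = new HashMap<>();

    void addDocument(String documentId, int chunkCount) {
        documents.add(documentId);
        for (int i = 0; i < chunkCount; i++) {
            String chunkId = documentId + "#" + i; // made-up id scheme
            chunkToDocument.put(chunkId, documentId);
            indexEntryToDocument.put(chunkId, documentId);
        }
    }

    // Deletes only the parent row, as if no cascade or index cleanup ran.
    void deleteParentOnly(String documentId) {
        documents.remove(documentId);
    }

    // Deletes the parent row plus all dependent rows and index entries.
    void deleteEverywhere(String documentId) {
        documents.remove(documentId);
        chunkToDocument.values().removeIf(documentId::equals);
        indexEntryToDocument.values().removeIf(documentId::equals);
    }

    // Lists child rows and index entries whose parent document is gone.
    List<String> findOrphans() {
        List<String> orphans = new ArrayList<>();
        chunkToDocument.forEach((chunkId, docId) -> {
            if (!documents.contains(docId)) {
                orphans.add("chunk:" + chunkId);
            }
        });
        indexEntryToDocument.forEach((entryId, docId) -> {
            if (!documents.contains(docId)) {
                orphans.add("index:" + entryId);
            }
        });
        Collections.sort(orphans);
        return orphans;
    }
}
```

Against the real system, the equivalent check would query the child tables and the index for the deleted document's id and expect zero hits in both.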