[feat] Introduce new AI tasks and enhance document processing capabil… #156
…ities

- Added several new tasks, including ChunkToVectorTask, ContextBuilderTask, DocumentEnricherTask, HierarchicalChunkerTask, and others, to support advanced document processing workflows.
- Enhanced input handling for tasks to streamline integration with the service registry and improve task execution.
- Updated the documentation to describe the new tasks and their functionality.
- Implemented comprehensive tests for the new tasks to validate their behavior and integration within the workflow system.
Pull request overview
This PR introduces comprehensive document processing and RAG (Retrieval-Augmented Generation) capabilities to the workglow framework. The changes add multiple new AI tasks for document chunking, vector storage, retrieval, and context building, along with supporting infrastructure for hierarchical document processing.
Key Changes:
- Added 13 new AI tasks supporting document processing workflows (chunking, embedding, storage, retrieval, reranking)
- Implemented vector repository system with multiple storage backends (InMemory, SQLite, PostgreSQL, EdgeVec)
- Added hierarchical document parsing and chunking capabilities
- Updated task execution signatures to accept input parameters directly instead of using `this.runInputData`
- Enhanced input resolver system to support repository resolution by format type
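
The signature change in the list above can be illustrated with a minimal sketch. The `ExecuteContext`, `LegacyUppercaseTask`, and `UppercaseTask` names below are hypothetical stand-ins invented for this example; the real workglow task base class and `IExecuteContext` are richer and are not reproduced here.

```typescript
// Hypothetical, simplified context type (assumption; not the real IExecuteContext).
interface ExecuteContext {
  taskId?: string;
}

interface UppercaseInput {
  text: string;
}

interface UppercaseOutput {
  text: string;
}

// Old pattern (sketch): execute() reads its input from instance state.
class LegacyUppercaseTask {
  runInputData: UppercaseInput = { text: "" };

  async execute(_context: ExecuteContext): Promise<UppercaseOutput> {
    return { text: this.runInputData.text.toUpperCase() };
  }
}

// New pattern (sketch): the input is passed to execute() as a parameter,
// so the method no longer depends on mutable instance state.
class UppercaseTask {
  async execute(input: UppercaseInput, _context: ExecuteContext): Promise<UppercaseOutput> {
    return { text: input.text.toUpperCase() };
  }
}

// Runs both variants on the same input and returns their outputs.
async function demo(): Promise<[string, string]> {
  const legacy = new LegacyUppercaseTask();
  legacy.runInputData = { text: "chunk" };
  const before = await legacy.execute({});
  const after = await new UppercaseTask().execute({ text: "chunk" }, {});
  return [before.text, after.text];
}
```

Passing the input explicitly makes a task easier to call from tests, which is presumably why TestTasks.ts and InputResolver.test.ts were updated alongside the tasks themselves.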
Reviewed changes
Copilot reviewed 52 out of 52 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| VectorStoreUpsertTask.test.ts | New comprehensive test suite for vector upsert operations |
| VectorStoreSearchTask.test.ts | New test suite for vector similarity search |
| TextChunkerTask.test.ts | New test suite for text chunking strategies |
| RetrievalTask.test.ts | New test suite for end-to-end retrieval |
| HybridSearchTask.test.ts | New test suite for hybrid vector+text search |
| ContextBuilderTask.test.ts | New test suite for context formatting |
| StructuralParser.test.ts | New test suite for hierarchical document parsing |
| HierarchicalChunker.test.ts | New test suite for hierarchical chunking |
| FullChain.test.ts | New integration test for complete workflow chains |
| EndToEnd.test.ts | New end-to-end test for RAG pipeline |
| DocumentRepository.test.ts | New test suite for document storage |
| ChunkToVector.test.ts | New test suite for chunk-to-vector transformation |
| TestTasks.ts | Updated task method signatures to new execute pattern |
| InputResolver.test.ts | Updated to new execute signature with input parameter |
| VectorStoreUpsertTask.ts | New task for upserting vectors to repository |
| VectorStoreSearchTask.ts | New task for vector similarity search |
| TopicSegmenterTask.ts | New task for topic-based text segmentation |
| TextChunkerTask.ts | New task for text chunking with multiple strategies |
| StructuralParserTask.ts | New task for parsing documents into hierarchical trees |
| RetrievalTask.ts | New task for end-to-end retrieval (embed + search) |
| RerankerTask.ts | New task for reranking retrieved chunks |
| QueryExpanderTask.ts | New task for query expansion |
| HybridSearchTask.ts | New task for hybrid vector+text search |
| HierarchyJoinTask.ts | New task for enriching results with hierarchy context |
| HierarchicalChunkerTask.ts | New task for hierarchical document chunking |
| ContextBuilderTask.ts | New task for formatting chunks into LLM context |
| InMemoryVectorRepository.ts | New in-memory vector storage implementation |
| SqliteVectorRepository.ts | New SQLite vector storage implementation |
| PostgresVectorRepository.ts | New PostgreSQL+pgvector storage implementation |
| EdgeVecRepository.ts | New browser/edge vector storage implementation |
| IVectorRepository.ts | New vector repository interface definition |
| VectorRepositoryRegistry.ts | New registry for vector repositories |
| InMemoryDocumentRepository.ts | New in-memory document storage |
| IDocumentRepository.ts | New document repository interface |
| DocumentRepositoryRegistry.ts | New registry for document repositories |
| RepositorySchema.ts | Updated repository schema definitions to use anyOf |
| TabularRepositoryRegistry.ts | Updated to register specific format resolver |
| InputResolver.ts | Enhanced to support full format matching before prefix |
| Workflow.ts | Updated type signatures for better type inference |
| README.md files | Updated documentation for new features |
| common.ts files | Added exports for new vector and document modules |
The PR appears well-structured with comprehensive test coverage for all new features. No critical issues were identified during the review.
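
The InputResolver.ts row in the table says that full format matching now happens before prefix matching. One way such a lookup could work is sketched below. The `FormatResolverRegistry` class, the `Resolver` type, and the `repository:tabular` / `repository:vector` format strings are all assumptions made for illustration, not the PR's actual API.

```typescript
// Hypothetical resolver signature: turns a string id into a resolved object.
type Resolver = (id: string) => unknown;

class FormatResolverRegistry {
  private readonly resolvers = new Map<string, Resolver>();

  register(format: string, resolver: Resolver): void {
    this.resolvers.set(format, resolver);
  }

  // Try the full format string first; only fall back to the prefix
  // (the part before the first ":") when no exact match is registered.
  find(format: string): Resolver | undefined {
    const exact = this.resolvers.get(format);
    if (exact) return exact;
    const colon = format.indexOf(":");
    if (colon === -1) return undefined;
    return this.resolvers.get(format.slice(0, colon));
  }

  resolve(format: string, id: string): unknown {
    const resolver = this.find(format);
    if (!resolver) {
      throw new Error(`No resolver registered for format "${format}"`);
    }
    return resolver(id);
  }
}

const registry = new FormatResolverRegistry();
registry.register("repository", (id) => `generic:${id}`);
registry.register("repository:tabular", (id) => `tabular:${id}`);

// Exact match wins over the shorter prefix.
const viaExact = registry.resolve("repository:tabular", "users");
// No exact match for this format, so the "repository" prefix resolver is used.
const viaPrefix = registry.resolve("repository:vector", "docs");
```

Checking the exact format first lets a specific registry (for example a tabular one) claim its own format without being shadowed by a generic prefix resolver.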
    import {
      ChunkNode,
      ChunkToVectorTask,
      ChunkToVectorTaskOutput,
      HierarchicalChunkerTask,
      HierarchicalChunkerTaskOutput,
      NodeIdGenerator,
      StructuralParser,
    } from "@workglow/ai";
Copilot AI, Jan 4, 2026
Unused imports ChunkToVectorTask, HierarchicalChunkerTask.
@copilot open a new pull request to apply changes based on this feedback
        input: { repository: ITabularRepository<any, any, any, any, any>; query: string },
        _context: IExecuteContext
      ): Promise<{ results: any[] }> {
        const { repository, query } = input;
Copilot AI, Jan 4, 2026
Unused variable query.
@copilot open a new pull request to apply changes based on this feedback
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Initial plan
* Remove unused query variable from InputResolver test

Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

* Initial plan
* Remove unused imports ChunkToVectorTask and HierarchicalChunkerTask

Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>