Collaborator

@sroussey sroussey commented Jan 4, 2026

…ities

  • Added several new tasks including ChunkToVectorTask, ContextBuilderTask, DocumentEnricherTask, HierarchicalChunkerTask, and others to support advanced document processing workflows.
  • Enhanced the input handling for tasks to streamline the integration with the service registry and improve task execution.
  • Updated the documentation to reflect the new tasks and their functionalities, ensuring clarity for developers.
  • Implemented comprehensive tests for the new tasks to validate their behavior and integration within the workflow system.

Contributor

Copilot AI left a comment

Pull request overview

This PR introduces comprehensive document processing and RAG (Retrieval-Augmented Generation) capabilities to the workglow framework. The changes add multiple new AI tasks for document chunking, vector storage, retrieval, and context building, along with supporting infrastructure for hierarchical document processing.

Key Changes:

  • Added 13 new AI tasks supporting document processing workflows (chunking, embedding, storage, retrieval, reranking)
  • Implemented vector repository system with multiple storage backends (InMemory, SQLite, PostgreSQL, EdgeVec)
  • Added hierarchical document parsing and chunking capabilities
  • Updated task execution signatures to accept input parameters directly instead of using this.runInputData
  • Enhanced input resolver system to support repository resolution by format type
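
The signature change in the list above can be pictured with a minimal sketch. Only `execute`, `this.runInputData`, and `IExecuteContext` appear in this PR; the class names, the `SearchInput`/`SearchOutput` shapes, and the `taskId` field are hypothetical stand-ins, not the framework's actual types.

```typescript
// Hypothetical stand-in for the framework's context type; the real
// IExecuteContext in @workglow is not shown in this conversation.
interface IExecuteContext {
  readonly taskId?: string;
}

// Illustrative input/output shapes, assumed for this sketch only.
interface SearchInput {
  query: string;
  topK: number;
}

interface SearchOutput {
  summary: string;
}

// Old style (assumed): the task reads its input from instance state.
class LegacySearchTask {
  runInputData: SearchInput = { query: "", topK: 0 };

  async execute(_context: IExecuteContext): Promise<SearchOutput> {
    const { query, topK } = this.runInputData;
    return { summary: `${query}:${topK}` };
  }
}

// New style: the input arrives as the first argument, so the method
// no longer depends on mutable state set before the call.
class SearchTask {
  async execute(input: SearchInput, _context: IExecuteContext): Promise<SearchOutput> {
    const { query, topK } = input;
    return { summary: `${query}:${topK}` };
  }
}
```

One plausible benefit of the second form is that a task can be exercised in a test by calling `execute(input, context)` directly, without first assigning instance fields.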

Reviewed changes

Copilot reviewed 52 out of 52 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| VectorStoreUpsertTask.test.ts | New comprehensive test suite for vector upsert operations |
| VectorStoreSearchTask.test.ts | New test suite for vector similarity search |
| TextChunkerTask.test.ts | New test suite for text chunking strategies |
| RetrievalTask.test.ts | New test suite for end-to-end retrieval |
| HybridSearchTask.test.ts | New test suite for hybrid vector+text search |
| ContextBuilderTask.test.ts | New test suite for context formatting |
| StructuralParser.test.ts | New test suite for hierarchical document parsing |
| HierarchicalChunker.test.ts | New test suite for hierarchical chunking |
| FullChain.test.ts | New integration test for complete workflow chains |
| EndToEnd.test.ts | New end-to-end test for RAG pipeline |
| DocumentRepository.test.ts | New test suite for document storage |
| ChunkToVector.test.ts | New test suite for chunk-to-vector transformation |
| TestTasks.ts | Updated task method signatures to new execute pattern |
| InputResolver.test.ts | Updated to new execute signature with input parameter |
| VectorStoreUpsertTask.ts | New task for upserting vectors to repository |
| VectorStoreSearchTask.ts | New task for vector similarity search |
| TopicSegmenterTask.ts | New task for topic-based text segmentation |
| TextChunkerTask.ts | New task for text chunking with multiple strategies |
| StructuralParserTask.ts | New task for parsing documents into hierarchical trees |
| RetrievalTask.ts | New task for end-to-end retrieval (embed + search) |
| RerankerTask.ts | New task for reranking retrieved chunks |
| QueryExpanderTask.ts | New task for query expansion |
| HybridSearchTask.ts | New task for hybrid vector+text search |
| HierarchyJoinTask.ts | New task for enriching results with hierarchy context |
| HierarchicalChunkerTask.ts | New task for hierarchical document chunking |
| ContextBuilderTask.ts | New task for formatting chunks into LLM context |
| InMemoryVectorRepository.ts | New in-memory vector storage implementation |
| SqliteVectorRepository.ts | New SQLite vector storage implementation |
| PostgresVectorRepository.ts | New PostgreSQL+pgvector storage implementation |
| EdgeVecRepository.ts | New browser/edge vector storage implementation |
| IVectorRepository.ts | New vector repository interface definition |
| VectorRepositoryRegistry.ts | New registry for vector repositories |
| InMemoryDocumentRepository.ts | New in-memory document storage |
| IDocumentRepository.ts | New document repository interface |
| DocumentRepositoryRegistry.ts | New registry for document repositories |
| RepositorySchema.ts | Updated repository schema definitions to use anyOf |
| TabularRepositoryRegistry.ts | Updated to register specific format resolver |
| InputResolver.ts | Enhanced to support full format matching before prefix |
| Workflow.ts | Updated type signatures for better type inference |
| README.md files | Updated documentation for new features |
| common.ts files | Added exports for new vector and document modules |
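
The InputResolver.ts and TabularRepositoryRegistry.ts entries describe trying the full format string before falling back to its prefix. A minimal sketch of that lookup order follows. The registry shape, the function names, and the `"repository:tabular"` style format strings are all assumptions made for illustration; the PR does not show the real resolver API.

```typescript
// Hypothetical resolver type: maps an identifier to a resolved value.
// A string result keeps the sketch self-contained.
type Resolver = (id: string) => string;

// Hypothetical registry keyed by format string.
const resolvers = new Map<string, Resolver>();

function registerResolver(format: string, resolver: Resolver): void {
  resolvers.set(format, resolver);
}

// Try the full format string first, then fall back to the part before ":".
function resolveByFormat(format: string, id: string): string | undefined {
  const exact = resolvers.get(format);
  if (exact) return exact(id);

  const separator = format.indexOf(":");
  if (separator === -1) return undefined; // no prefix to fall back to

  const fallback = resolvers.get(format.slice(0, separator));
  return fallback ? fallback(id) : undefined;
}

// Assumed format names: a generic resolver plus a more specific one.
registerResolver("repository", (id) => `generic:${id}`);
registerResolver("repository:tabular", (id) => `tabular:${id}`);
```

With these registrations, `resolveByFormat("repository:tabular", "users")` uses the specific resolver, while an unregistered variant such as `"repository:vector"` falls through to the generic `"repository"` resolver.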

The PR appears well-structured with comprehensive test coverage for all new features. No critical issues were identified during the review.


Comment on lines 7 to 15
import {
  ChunkNode,
  ChunkToVectorTask,
  ChunkToVectorTaskOutput,
  HierarchicalChunkerTask,
  HierarchicalChunkerTaskOutput,
  NodeIdGenerator,
  StructuralParser,
} from "@workglow/ai";

Copilot AI Jan 4, 2026

Unused imports ChunkToVectorTask, HierarchicalChunkerTask.

Collaborator Author

@copilot open a new pull request to apply changes based on this feedback

    input: { repository: ITabularRepository<any, any, any, any, any>; query: string },
    _context: IExecuteContext
  ): Promise<{ results: any[] }> {
    const { repository, query } = input;

Copilot AI Jan 4, 2026

Unused variable query.

Collaborator Author

@copilot open a new pull request to apply changes based on this feedback

sroussey and others added 2 commits January 4, 2026 00:49
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor

Copilot AI commented Jan 4, 2026

@sroussey I've opened a new pull request, #160, to work on those changes. Once the pull request is ready, I'll request review from you.

Contributor

Copilot AI commented Jan 4, 2026

@sroussey I've opened a new pull request, #161, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 2 commits January 4, 2026 01:00
* Initial plan

* Remove unused query variable from InputResolver test

Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>
* Initial plan

* Remove unused imports ChunkToVectorTask and HierarchicalChunkerTask

Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>
@sroussey sroussey closed this Jan 4, 2026