[feat] Introduce new AI tasks and enhance document processing capabil… #156

sroussey · 2026-01-04T00:42:14Z

…ities

- Added several new tasks, including ChunkToVectorTask, ContextBuilderTask, DocumentEnricherTask, and HierarchicalChunkerTask, among others, to support advanced document processing workflows.
- Enhanced input handling for tasks to streamline integration with the service registry and to improve task execution.
- Updated the documentation to describe the new tasks and their functionality.
- Added tests for the new tasks to validate their behavior and their integration with the workflow system.
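To illustrate what hierarchical chunking means here, the sketch below walks a document tree depth-first and turns each text-bearing node into chunks that remember their heading path. The DocNode and Chunk types, the chunkHierarchy function, and the fixed-size splitting are all hypothetical; the actual HierarchicalChunkerTask and StructuralParser in @workglow/ai may use different types and strategies.

```typescript
// Hypothetical document tree; the real StructuralParser output type may differ.
interface DocNode {
  id: string;
  title?: string; // heading text for section nodes
  text?: string; // body text for content nodes
  children?: DocNode[];
}

// Hypothetical chunk shape: a slice of text plus the heading path it came from.
interface Chunk {
  nodeId: string;
  path: string[];
  text: string;
}

// Depth-first walk. Each node with text yields one or more fixed-size chunks,
// and every chunk records the headings above it so later steps can restore context.
function chunkHierarchy(root: DocNode, maxChars = 500): Chunk[] {
  const chunks: Chunk[] = [];
  const walk = (node: DocNode, path: string[]): void => {
    const here = node.title ? [...path, node.title] : path;
    const text = node.text;
    if (text) {
      // Plain fixed-size split with no overlap; the real task may offer other strategies.
      for (let i = 0; i < text.length; i += maxChars) {
        chunks.push({ nodeId: node.id, path: here, text: text.slice(i, i + maxChars) });
      }
    }
    for (const child of node.children ?? []) walk(child, here);
  };
  walk(root, []);
  return chunks;
}

const doc: DocNode = {
  id: "root",
  title: "Guide",
  children: [
    { id: "s1", title: "Install", children: [{ id: "p1", text: "Run the installer." }] },
    { id: "s2", title: "Usage", children: [{ id: "p2", text: "Call the API." }] },
  ],
};
const chunks = chunkHierarchy(doc);
// chunks[0] → { nodeId: "p1", path: ["Guide", "Install"], text: "Run the installer." }
```

Keeping the heading path on each chunk is what lets a later step (such as the HierarchyJoinTask described below) re-attach section context to a retrieved chunk.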


Copilot

Pull request overview

This PR introduces comprehensive document processing and RAG (Retrieval-Augmented Generation) capabilities to the workglow framework. The changes add multiple new AI tasks for document chunking, vector storage, retrieval, and context building, along with supporting infrastructure for hierarchical document processing.
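A minimal sketch of the vector storage and retrieval idea, assuming brute-force cosine similarity over an in-memory map. InMemoryVectorStore, VectorEntry, SearchHit, and cosineSimilarity are hypothetical names; this is not the IVectorRepository interface added in this PR.

```typescript
// Hypothetical record and result shapes.
interface VectorEntry {
  id: string;
  vector: number[];
  metadata: Record<string, unknown>;
}

interface SearchHit {
  id: string;
  score: number;
  metadata: Record<string, unknown>;
}

// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

class InMemoryVectorStore {
  private entries = new Map<string, VectorEntry>();

  // Insert, or overwrite an existing entry with the same id (an upsert).
  upsert(entry: VectorEntry): void {
    this.entries.set(entry.id, entry);
  }

  // Linear scan: score every stored vector and keep the topK highest scores.
  search(query: number[], topK = 3): SearchHit[] {
    return Array.from(this.entries.values())
      .map((e) => ({ id: e.id, score: cosineSimilarity(query, e.vector), metadata: e.metadata }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topK);
  }
}

const store = new InMemoryVectorStore();
store.upsert({ id: "a", vector: [1, 0], metadata: { text: "cats" } });
store.upsert({ id: "b", vector: [0, 1], metadata: { text: "dogs" } });
store.upsert({ id: "c", vector: [0.9, 0.1], metadata: { text: "kittens" } });
const hits = store.search([1, 0], 2); // ids "a" (score 1) then "c" (about 0.994)
```

A linear scan is adequate for tests and small corpora; database backends such as PostgreSQL with pgvector can instead use indexed similarity search for larger collections.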

Key Changes:

- Added 13 new AI tasks supporting document processing workflows (chunking, embedding, storage, retrieval, reranking)
- Implemented vector repository system with multiple storage backends (InMemory, SQLite, PostgreSQL, EdgeVec)
- Added hierarchical document parsing and chunking capabilities
- Updated task execution signatures to accept input parameters directly instead of using this.runInputData
- Enhanced input resolver system to support repository resolution by format type
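The execute signature change can be sketched with two toy tasks. Everything here is a hypothetical stand-in: IExecuteContext is reduced to an empty record, and the old-style signature is assumed from the PR's mention of this.runInputData rather than copied from the codebase. The new-style signature mirrors the one visible in the InputResolver.test.ts diff further down this page.

```typescript
// Hypothetical stand-in; the real IExecuteContext carries more than this.
type IExecuteContext = Record<string, unknown>;

type UpperInput = { text: string };
type UpperOutput = { text: string };

// Old pattern (assumed): execute reads its inputs from instance state.
class UpperTaskOld {
  runInputData: UpperInput = { text: "" };
  async execute(_context: IExecuteContext): Promise<UpperOutput> {
    return { text: this.runInputData.text.toUpperCase() };
  }
}

// New pattern: the input arrives as the first argument, the context second.
class UpperTaskNew {
  async execute(input: UpperInput, _context: IExecuteContext): Promise<UpperOutput> {
    const { text } = input;
    return { text: text.toUpperCase() };
  }
}

// The new-style task can be exercised without setting any instance state first.
void new UpperTaskNew().execute({ text: "hi" }, {}).then((out) => console.log(out.text)); // logs "HI"
```

Passing the input explicitly makes execute a function of its arguments, which is why the updated tests (TestTasks.ts, InputResolver.test.ts) can call it directly.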

Reviewed changes

Copilot reviewed 52 out of 52 changed files in this pull request and generated 4 comments.


| File | Description |
| --- | --- |
| VectorStoreUpsertTask.test.ts | New comprehensive test suite for vector upsert operations |
| VectorStoreSearchTask.test.ts | New test suite for vector similarity search |
| TextChunkerTask.test.ts | New test suite for text chunking strategies |
| RetrievalTask.test.ts | New test suite for end-to-end retrieval |
| HybridSearchTask.test.ts | New test suite for hybrid vector+text search |
| ContextBuilderTask.test.ts | New test suite for context formatting |
| StructuralParser.test.ts | New test suite for hierarchical document parsing |
| HierarchicalChunker.test.ts | New test suite for hierarchical chunking |
| FullChain.test.ts | New integration test for complete workflow chains |
| EndToEnd.test.ts | New end-to-end test for RAG pipeline |
| DocumentRepository.test.ts | New test suite for document storage |
| ChunkToVector.test.ts | New test suite for chunk-to-vector transformation |
| TestTasks.ts | Updated task method signatures to new execute pattern |
| InputResolver.test.ts | Updated to new execute signature with input parameter |
| VectorStoreUpsertTask.ts | New task for upserting vectors to repository |
| VectorStoreSearchTask.ts | New task for vector similarity search |
| TopicSegmenterTask.ts | New task for topic-based text segmentation |
| TextChunkerTask.ts | New task for text chunking with multiple strategies |
| StructuralParserTask.ts | New task for parsing documents into hierarchical trees |
| RetrievalTask.ts | New task for end-to-end retrieval (embed + search) |
| RerankerTask.ts | New task for reranking retrieved chunks |
| QueryExpanderTask.ts | New task for query expansion |
| HybridSearchTask.ts | New task for hybrid vector+text search |
| HierarchyJoinTask.ts | New task for enriching results with hierarchy context |
| HierarchicalChunkerTask.ts | New task for hierarchical document chunking |
| ContextBuilderTask.ts | New task for formatting chunks into LLM context |
| InMemoryVectorRepository.ts | New in-memory vector storage implementation |
| SqliteVectorRepository.ts | New SQLite vector storage implementation |
| PostgresVectorRepository.ts | New PostgreSQL+pgvector storage implementation |
| EdgeVecRepository.ts | New browser/edge vector storage implementation |
| IVectorRepository.ts | New vector repository interface definition |
| VectorRepositoryRegistry.ts | New registry for vector repositories |
| InMemoryDocumentRepository.ts | New in-memory document storage |
| IDocumentRepository.ts | New document repository interface |
| DocumentRepositoryRegistry.ts | New registry for document repositories |
| RepositorySchema.ts | Updated repository schema definitions to use anyOf |
| TabularRepositoryRegistry.ts | Updated to register specific format resolver |
| InputResolver.ts | Enhanced to support full format matching before prefix |
| Workflow.ts | Updated type signatures for better type inference |
| README.md files | Updated documentation for new features |
| common.ts files | Added exports for new vector and document modules |
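The InputResolver change (full format match before prefix) can be illustrated with a small registry. The ":"-separated format strings, ResolverRegistry, and its register/resolve methods are hypothetical; only the exact-then-prefix lookup order comes from the PR description, along with the idea that TabularRepositoryRegistry registers a more specific resolver.

```typescript
// Hypothetical resolver signature: turn an id into a resolved value.
type Resolver = (id: string) => unknown;

class ResolverRegistry {
  private resolvers = new Map<string, Resolver>();

  register(format: string, resolver: Resolver): void {
    this.resolvers.set(format, resolver);
  }

  // Try the full format string first (e.g. "repository:tabular"), then fall back
  // to the prefix before the first ":" (e.g. "repository").
  resolve(format: string, id: string): unknown {
    const exact = this.resolvers.get(format);
    if (exact) return exact(id);
    const prefix = format.split(":", 1)[0];
    const general = this.resolvers.get(prefix);
    if (general) return general(id);
    throw new Error(`No resolver registered for format "${format}"`);
  }
}

const registry = new ResolverRegistry();
registry.register("repository", (id) => `generic repository ${id}`);
registry.register("repository:tabular", (id) => `tabular repository ${id}`);

registry.resolve("repository:tabular", "users"); // "tabular repository users"
registry.resolve("repository:vector", "docs"); // "generic repository docs"
```

Checking the exact format first lets a specific registration override a generic one without the generic resolver needing to know about it.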

The PR appears well-structured with comprehensive test coverage for all new features. No critical issues were identified during the review.


packages/ai/src/task/ChunkToVectorTask.ts

packages/ai/src/task/QueryExpanderTask.ts

Copilot · 2026-01-04T00:44:28Z

packages/test/src/test/hierarchical/ChunkToVector.test.ts

+import {
+  ChunkNode,
+  ChunkToVectorTask,
+  ChunkToVectorTaskOutput,
+  HierarchicalChunkerTask,
+  HierarchicalChunkerTaskOutput,
+  NodeIdGenerator,
+  StructuralParser,
+} from "@workglow/ai";


Unused imports ChunkToVectorTask, HierarchicalChunkerTask.

@copilot open a new pull request to apply changes based on this feedback

Copilot · 2026-01-04T00:44:28Z

packages/test/src/test/task-graph/InputResolver.test.ts

+        input: { repository: ITabularRepository<any, any, any, any, any>; query: string },
+        _context: IExecuteContext
+      ): Promise<{ results: any[] }> {
+        const { repository, query } = input;


Unused variable query.

@copilot open a new pull request to apply changes based on this feedback


Copilot · 2026-01-04T00:49:36Z

@sroussey I've opened a new pull request, #160, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot · 2026-01-04T00:49:44Z

@sroussey I've opened a new pull request, #161, to work on those changes. Once the pull request is ready, I'll request review from you.

* Initial plan
* Remove unused query variable from InputResolver test

Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

* Initial plan
* Remove unused imports ChunkToVectorTask and HierarchicalChunkerTask

Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

sroussey requested a review from Copilot January 4, 2026 00:42

github-actions bot assigned sroussey Jan 4, 2026

Copilot started reviewing on behalf of sroussey January 4, 2026 00:42

Copilot AI reviewed Jan 4, 2026


sroussey and others added 2 commits January 4, 2026 00:49

Update packages/ai/src/task/ChunkToVectorTask.ts

a5b8dba

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update packages/ai/src/task/QueryExpanderTask.ts

cbd20eb

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI mentioned this pull request Jan 4, 2026

Remove unused task class imports from ChunkToVector.test.ts #160

Merged

Copilot AI mentioned this pull request Jan 4, 2026

Remove unused query variable from InputResolver test #161

Merged

Copilot AI and others added 2 commits January 4, 2026 01:00

sroussey closed this Jan 4, 2026

2 participants
