sroussey commented Jan 4, 2026

* [feat] New VectorQuantizeTask, updated VectorSimilarityTask

* [WIP] rework document

* [refactor] Update task input handling and smartClone method for improved input data handling for tests

- Replaced structuredClone and JSON methods with a new smartClone function that deep-clones plain objects and arrays while preserving class instances by reference.

- Quick function versions of tasks now pass input to run() rather than to the constructor, which skips the constructor's defaults and cloning
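
The cloning rule described above (deep-clone plain objects and arrays, keep class instances by reference) can be sketched roughly as follows. This is an illustration only, not the code from Task.ts: `smartCloneSketch`, `isPlainObject`, and `FakeRepository` are hypothetical names, and the real method may treat typed arrays, Maps, or Dates differently.

```typescript
// Hypothetical sketch; not the actual smartClone from packages/task-graph.
function isPlainObject(value: unknown): value is Record<string, unknown> {
  if (value === null || typeof value !== "object") return false;
  const proto = Object.getPrototypeOf(value);
  // Object literals and Object.create(null) count as "plain".
  return proto === Object.prototype || proto === null;
}

function smartCloneSketch<T>(value: T): T {
  if (Array.isArray(value)) {
    // Arrays are copied element by element.
    return value.map((item) => smartCloneSketch(item)) as unknown as T;
  }
  if (isPlainObject(value)) {
    // Plain objects are copied key by key.
    const copy: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      copy[key] = smartCloneSketch(inner);
    }
    return copy as unknown as T;
  }
  // Primitives and class instances (repositories, models, typed arrays)
  // are returned as-is, so instances stay shared by reference.
  return value;
}

// Stand-in for a class instance that must not be copied.
class FakeRepository {
  constructor(public readonly name: string) {}
}

const input = {
  repo: new FakeRepository("docs"),
  options: { topK: 5 },
  tags: ["a", "b"],
};
const cloned = smartCloneSketch(input);
```

Here `cloned.options` and `cloned.tags` are fresh copies, while `cloned.repo` is the same instance as `input.repo`. That sharing is what structuredClone and JSON round-tripping cannot provide for class instances.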

* [refactor] Removed unnecessary checks for undefined values when copying additional input properties.

* [refactor] Enhance tasks with service registry integration

- Updated IExecuteContext and IRunConfig to include registry support.
- Refactored TaskRunner and TaskGraphRunner to utilize the service registry for improved task execution and model retrieval.
- Ensured backward compatibility while enhancing the overall architecture for better service management.
- Introduced a service registry to manage model repositories and execution contexts in AiTask.

* [feat] Introduce input resolver system for enhanced schema handling

- Added a new InputResolver to manage schema-annotated inputs, allowing for automatic resolution of string IDs to their corresponding instances.
- Implemented repository and model resolution capabilities, improving task input handling and validation.
- Created new schemas for tabular, vector, and document repositories to facilitate input resolution.
- Enhanced AiTask and TaskRunner to utilize the input resolver for better integration with service registries.
- Added comprehensive tests to ensure the functionality of the input resolver system and its integration with tasks.
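
A minimal sketch of the idea behind resolving string IDs through schema annotations is shown below. Everything here is assumed for illustration: the `format` annotation, the `"repository:vector"` value, and the `resolveInputs` helper are hypothetical and are not taken from InputResolver.ts.

```typescript
// Hypothetical sketch of schema-driven input resolution.
type Resolver = (id: string) => unknown;

interface PropertySchema {
  type: string;
  format?: string; // assumed annotation, e.g. "repository:vector"
}

interface ObjectSchema {
  properties: Record<string, PropertySchema>;
}

function resolveInputs(
  input: Record<string, unknown>,
  schema: ObjectSchema,
  resolvers: Map<string, Resolver>
): Record<string, unknown> {
  const resolved: Record<string, unknown> = { ...input };
  for (const [key, prop] of Object.entries(schema.properties)) {
    const value = input[key];
    // Only string values on annotated properties are candidates;
    // instances passed directly are left untouched.
    if (typeof value !== "string" || prop.format === undefined) continue;
    const resolver = resolvers.get(prop.format);
    if (resolver === undefined) continue;
    resolved[key] = resolver(value);
  }
  return resolved;
}

// Usage: a fake registry that maps IDs to repository instances.
const fakeVectorRepo = { kind: "in-memory-vector-repo" };
const vectorRepos = new Map<string, object>([["docs", fakeVectorRepo]]);

const resolvers = new Map<string, Resolver>();
resolvers.set("repository:vector", (id) => {
  const repo = vectorRepos.get(id);
  if (repo === undefined) throw new Error(`Unknown vector repository: ${id}`);
  return repo;
});

const taskSchema: ObjectSchema = {
  properties: {
    repository: { type: "string", format: "repository:vector" },
    query: { type: "string" },
  },
};

const resolvedInput = resolveInputs(
  { repository: "docs", query: "hello" },
  taskSchema,
  resolvers
);
```

In this sketch the annotated `repository` field becomes the repository instance, while the unannotated `query` string passes through unchanged.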

* [feat] Introduce new AI tasks and enhance document processing capabilities

- Added several new tasks including ChunkToVectorTask, ContextBuilderTask, DocumentEnricherTask, HierarchicalChunkerTask, and others to support advanced document processing workflows.
- Enhanced the input handling for tasks to streamline the integration with the service registry and improve task execution.
- Updated the documentation to reflect the new tasks and their functionalities, ensuring clarity for developers.
- Implemented comprehensive tests for the new tasks to validate their behavior and integration within the workflow system.

* Update packages/ai/src/source/ProvenanceUtils.ts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update packages/ai/src/source/StructuralParser.ts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update packages/ai/src/source/ProvenanceUtils.ts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update packages/ai/src/task/ChunkToVectorTask.ts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update packages/ai/src/task/QueryExpanderTask.ts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update packages/task-graph/src/task/Task.ts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update packages/util/src/vector/Tensor.ts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Initial plan

* Update packages/ai/src/task/VectorQuantizeTask.ts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update packages/util/src/vector/VectorUtils.ts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Optimize quantizeToUint8 and quantizeToUint16 with single-pass min/max

Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

* Remove unused query variable from InputResolver test (#161)

* Initial plan

* Remove unused query variable from InputResolver test

Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

* Fix edge case: return non-zero range for empty arrays in findMinMax

Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>
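
The single-pass min/max and the empty-array guard mentioned in the commits above might look roughly like this. The function names carry a `Sketch` suffix because this is not the code from VectorQuantizeTask.ts; the `{ min: 0, max: 1 }` fallback and the linear mapping onto 0..255 are assumptions.

```typescript
// Hypothetical sketch of single-pass min/max plus uint8 quantization.
function findMinMaxSketch(values: ArrayLike<number>): { min: number; max: number } {
  if (values.length === 0) {
    // Assumed guard: a non-zero range keeps later divisions well-defined.
    return { min: 0, max: 1 };
  }
  let min = values[0];
  let max = values[0];
  // One loop tracks both extremes instead of scanning the data twice.
  for (let i = 1; i < values.length; i++) {
    const v = values[i];
    if (v < min) min = v;
    else if (v > max) max = v;
  }
  return { min, max };
}

function quantizeToUint8Sketch(values: ArrayLike<number>): Uint8Array {
  const out = new Uint8Array(values.length);
  if (values.length === 0) return out; // empty-array guard
  const { min, max } = findMinMaxSketch(values);
  const range = max - min || 1; // constant vectors map to all zeros
  for (let i = 0; i < values.length; i++) {
    // Linearly map [min, max] onto the integer range [0, 255].
    out[i] = Math.round(((values[i] - min) / range) * 255);
  }
  return out;
}

const quantized = quantizeToUint8Sketch(new Float32Array([-1, 0, 1]));
// quantized is Uint8Array [0, 128, 255]
```

A loop is also safer than `Math.min(...values)` for large vectors, since spreading a very large array into function arguments can exceed the engine's argument limit.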

* Fix markdown auto-detection to use header pattern matching (#157)

* Initial plan

* Improve markdown auto-detection with robust pattern matching

Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>
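
As a rough illustration of header pattern matching for markdown auto-detection, the sketch below tests for ATX-style headers. The regular expression and the `looksLikeMarkdown` name are assumptions; the actual detection in the parser may check more patterns than this.

```typescript
// Hypothetical sketch: treat text as markdown if any line is an ATX header.
// One to six '#' characters at the start of a line, then spaces or tabs,
// then at least one non-whitespace character.
const MARKDOWN_HEADER = /^#{1,6}[ \t]+\S/m;

function looksLikeMarkdown(text: string): boolean {
  return MARKDOWN_HEADER.test(text);
}

const samples = {
  titled: looksLikeMarkdown("# Title\n\nBody text"),
  laterHeader: looksLikeMarkdown("Intro paragraph\n## Section"),
  issueRef: looksLikeMarkdown("Issue #42 is fixed"),
  hashtag: looksLikeMarkdown("#hashtag at line start"),
};
```

Anchoring on the start of a line and requiring whitespace after the hashes avoids false positives from issue references and hashtags, which a plain search for `#` would match.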

* Remove unused task class imports from ChunkToVector.test.ts (#160)

* Initial plan

* Remove unused imports ChunkToVectorTask and HierarchicalChunkerTask

Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

* Fix inconsistent vector/tensor terminology in Tensor.ts (#167)

* Initial plan

* Update Tensor.ts to use consistent "tensor" terminology throughout

Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

* Optimize VectorQuantizeTask min/max calculation for large vectors (#168)

* Initial plan

* Optimize quantizeToUint8 and quantizeToUint16 to use single loop for min/max

Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

* Add empty array guard to quantization methods

Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

* Replace unsafe type assertions with type-safe field extraction in Document.addVariant (#158)

* Initial plan

* Use extractConfigFields for type-safe provenance handling

Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

* Add comprehensive tests for type-safe provenance handling

Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

* Add test coverage for VectorSimilarityUtils functions (#165)

* Initial plan

* Add comprehensive tests for VectorSimilarityUtils

Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

* Extract magic number to named constant in ProvenanceUtils (#159)

* Initial plan

* Extract magic number 512 to DEFAULT_MAX_TOKENS constant

Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

* Add support for Float16Array in normalize function of VectorUtils.ts

* Update packages/ai/src/source/StructuralParser.ts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Fix naming inconsistency between Vector type and TensorSchema (#169)

* Initial plan

* Fix naming inconsistency: rename Vector to Tensor in Tensor.ts

Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

* Enhance jaccardSimilarity function to handle negative values by normalizing inputs to a non-negative range. This includes calculating the global minimum across both vectors and adjusting values accordingly.
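
The normalization step described above can be sketched as follows. This assumes the weighted (min/max) form of Jaccard similarity for real-valued vectors, which the commit message does not confirm; `jaccardSimilaritySketch` is a hypothetical name.

```typescript
// Hypothetical sketch: weighted Jaccard with a shift for negative values.
function jaccardSimilaritySketch(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  // Find the global minimum across both vectors in one pass.
  let globalMin = Infinity;
  for (let i = 0; i < a.length; i++) {
    globalMin = Math.min(globalMin, a[i], b[i]);
  }
  // Shift both vectors by the same amount so every value is >= 0.
  const shift = globalMin < 0 ? -globalMin : 0;

  let minSum = 0;
  let maxSum = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] + shift;
    const y = b[i] + shift;
    minSum += Math.min(x, y);
    maxSum += Math.max(x, y);
  }
  // Two all-zero (or empty) vectors have no overlap to measure; return 0.
  return maxSum === 0 ? 0 : minSum / maxSum;
}

const identicalNegative = jaccardSimilaritySketch([-1, 0, 1], [-1, 0, 1]);
const oppositeSigns = jaccardSimilaritySketch([-1, 1], [1, -1]);
```

Without the shift, negative components can drive the sums negative and push the ratio outside [0, 1]. Using one shared offset for both vectors keeps their relative positions intact.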

* [test] Add tests for VectorUtils, covering magnitude, inner product, normalization, and handling of various TypedArray types. Update normalize function to support an additional parameter for Float32Array conversion.
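
For orientation, the three operations covered by these tests can be sketched as below. The names carry a `Sketch` suffix because they are not the VectorUtils.ts functions. This version of normalize always returns a Float32Array and throws on a zero vector; both are assumptions, and the additional conversion parameter mentioned above is not modeled.

```typescript
// Hypothetical sketches of magnitude, inner product, and normalization.
function magnitudeSketch(v: ArrayLike<number>): number {
  let sumOfSquares = 0;
  for (let i = 0; i < v.length; i++) {
    sumOfSquares += v[i] * v[i];
  }
  return Math.sqrt(sumOfSquares);
}

function innerProductSketch(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

function normalizeSketch(v: ArrayLike<number>): Float32Array {
  const mag = magnitudeSketch(v);
  if (mag === 0) {
    // Assumed behavior: a zero vector has no direction to normalize.
    throw new Error("Cannot normalize a zero vector");
  }
  const out = new Float32Array(v.length);
  for (let i = 0; i < v.length; i++) {
    out[i] = v[i] / mag;
  }
  return out;
}

// ArrayLike<number> accepts plain arrays and any TypedArray alike.
const unit = normalizeSketch(new Int8Array([3, 4]));
const dot = innerProductSketch(new Float64Array([1, 2, 3]), [4, 5, 6]);
```

Accepting `ArrayLike<number>` is one way to cover several TypedArray types with a single code path. Returning a float array from normalize matters for integer inputs, where unit-length components would otherwise truncate to 0 or 1.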

* Add circular reference detection to smartClone method (#162)

* Initial plan

* Add circular reference detection to smartClone method

Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

* Fix circular reference detection to handle shared references correctly

Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

* Refactor TaskEvents to import TaskStatus from TaskTypes and add unit tests for smartClone method

- Updated TaskEvents to import TaskStatus from the correct module.
- Added comprehensive unit tests for the smartClone method, including cases for circular reference detection and handling various data structures.
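
One way to detect true cycles without rejecting shared references is to track only the objects on the current recursion path, as sketched below. `cloneWithCycleCheck` is a hypothetical name, and throwing on a cycle is an assumption about the intended behavior rather than something the commits state.

```typescript
// Hypothetical sketch: path-based cycle detection during a deep clone.
function cloneWithCycleCheck<T>(value: T, path: Set<object> = new Set()): T {
  if (value === null || typeof value !== "object") return value;
  const obj = value as unknown as object;
  const proto = Object.getPrototypeOf(obj);
  const isPlain = proto === Object.prototype || proto === null;
  // Class instances are kept by reference and never traversed.
  if (!Array.isArray(obj) && !isPlain) return value;

  // An object already on the current path means we looped back to an ancestor.
  if (path.has(obj)) throw new Error("Circular reference detected");
  path.add(obj);
  try {
    if (Array.isArray(obj)) {
      return obj.map((item) => cloneWithCycleCheck(item, path)) as unknown as T;
    }
    const copy: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(obj)) {
      copy[key] = cloneWithCycleCheck(inner, path);
    }
    return copy as unknown as T;
  } finally {
    // Leaving this branch: remove the object so a sibling that references
    // the same object is not mistaken for a cycle.
    path.delete(obj);
  }
}

const shared = { n: 1 };
const withSharedRef = cloneWithCycleCheck({ a: shared, b: shared });
```

A set that only ever grows would flag `{ a: shared, b: shared }` as circular on the second visit. Removing each object when its branch finishes restricts the check to genuine ancestor loops, at the cost of cloning a shared object once per reference.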

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>
Co-authored-by: Steven Roussey <sroussey@gmail.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

This PR introduces comprehensive vector storage and RAG (Retrieval-Augmented Generation) infrastructure with repository registries, input resolution, and hierarchical document processing capabilities. The implementation adds service registry integration, vector similarity utilities, and multiple vector repository backends (in-memory, SQLite, PostgreSQL, EdgeVec).

Key changes:

  • New vector storage infrastructure with support for quantized vectors and multiple backends
  • Input resolver system for automatic resolution of string IDs to repository instances
  • Service registry integration throughout task execution pipeline
  • Hierarchical document chunking and processing capabilities

Reviewed changes

Copilot reviewed 126 out of 127 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| packages/util/src/vector/VectorUtils.ts | Vector operations (magnitude, inner product, normalization) |
| packages/util/src/vector/VectorSimilarityUtils.ts | Similarity calculations (cosine, Jaccard, Hamming) |
| packages/util/src/vector/TypedArray.ts | TypedArray type definitions and schema |
| packages/util/src/vector/Tensor.ts | Tensor schema for vector representations |
| packages/util/src/di/InputResolverRegistry.ts | Input resolver registry for schema-based resolution |
| packages/util/src/di/ServiceRegistry.ts | Made container public for child registry creation |
| packages/storage/src/vector/* | Vector repository implementations (InMemory, SQLite, Postgres, EdgeVec) |
| packages/storage/src/schema/RepositorySchema.ts | Repository schema type helpers |
| packages/task-graph/src/task/Task.ts | Smart clone with circular reference detection, narrowInput signature change |
| packages/task-graph/src/task/TaskRunner.ts | Input resolution integration |
| packages/task-graph/src/task/InputResolver.ts | Schema-based input resolution logic |
| packages/test/src/test/util/* | Comprehensive tests for vector utilities |
| packages/test/src/test/task/* | Tests for vector tasks and repositories |
| packages/tasks/src/task/* | Updated helper functions to use run(input) pattern |
