
Conversation

@nishika26
Collaborator

@nishika26 nishika26 commented Jan 20, 2026

Summary

Target issue is #489

Checklist

Before submitting a pull request, please ensure that you mark these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested.
  • If you've fixed a bug or added code, ensure it is tested and has test cases.

Notes

  • New Features

    • Collections now include provider, name, and description; names are enforced unique per project. Uniqueness is checked at the router level, and if the name is not unique we return an error telling the user that the given name already exists.
    • Provider registry and an OpenAI provider added; provider-based create/delete flows available.
    • The create and delete collection services are completely provider agnostic; the creation and deletion logic is dispatched according to the provider set by the user in the input.
    • Dalgo still creates assistants through these endpoints, and adding a required provider input would be a breaking change for them. We therefore check internally: if the model and instructions parameters are given, the provider value is set internally before being written to the collection table, so they won't have to make any changes for now.
  • Tests

    • Tests and helpers updated to exercise provider flows and new collection shape.
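The backward-compatibility behavior described in the notes above can be sketched as follows. This is a hypothetical illustration, not the repository's actual code: the helper name resolve_provider and the request fields shown here are assumptions.

```python
# Hypothetical sketch of the router-level backward-compatibility check: when a
# legacy caller (e.g. Dalgo) supplies `model` and `instructions` but no
# `provider`, the provider is filled in internally before the collection row
# is written. Names here are illustrative, not the repository's actual code.
from dataclasses import dataclass
from typing import Optional


@dataclass
class CreationRequest:
    model: Optional[str] = None
    instructions: Optional[str] = None
    provider: Optional[str] = None


def resolve_provider(request: CreationRequest) -> CreationRequest:
    # Legacy assistant-style requests imply the OpenAI provider.
    if request.provider is None and request.model and request.instructions:
        request.provider = "openai"
    return request


legacy = resolve_provider(CreationRequest(model="gpt-4o", instructions="be terse"))
explicit = resolve_provider(CreationRequest(provider="openai"))
```

Under this scheme, existing callers that never send a provider keep working, while new callers can set it explicitly.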

Summary by CodeRabbit

  • New Features

    • Collections can have optional names and descriptions.
    • Enforced unique collection names per project (conflict returns 409).
    • Provider-agnostic creation/deletion via pluggable LLM providers; OpenAI provider included.
  • Refactor

    • Collection model and flows moved from organization-scoped to project-scoped.
    • Centralized provider registry and provider interface; creation/deletion use provider contracts.
  • Documentation

    • Collection creation docs updated for provider-based flow and collection-job info endpoint.
  • Tests

    • Test utilities and suites updated to exercise provider-based flows and new helpers.


@coderabbitai

coderabbitai bot commented Jan 20, 2026

📝 Walkthrough

Walkthrough

Adds a provider abstraction and registry, migrates Collection schema to include provider/name/description and remove organization_id, enforces per-project unique collection names, updates create/delete flows to use providers, and updates test utilities and tests to the provider-driven flow.

Changes

Cohort / File(s) Summary
DB Migration
backend/app/alembic/versions/041_extend_collection_table_for_provider_.py
Adds providertype enum and provider column; adds name and description; creates unique partial constraint on (project_id, name) for active rows; drops organization_id FK/column; includes upgrade/downgrade steps.
Models & Exports
backend/app/models/collection.py, backend/app/models/__init__.py, backend/app/models/organization.py
Adds ProviderType; adds provider, name, description to Collection; removes organization_id and Organization.collections relationship; introduces CollectionOptions and updates CreationRequest; re-exports new types.
Provider Layer
backend/app/services/collections/providers/...
__init__.py, base.py, openai.py, registry.py
New provider API: BaseProvider interface; OpenAIProvider implements create/delete; registry (LLMProvider) and get_llm_provider factory that builds providers from credentials.
Collection Services & Helpers
backend/app/services/collections/create_collection.py, .../delete_collection.py, .../helpers.py
Replace OpenAI-specific orchestration with provider-driven create/delete flows; add get_service_name and ensure_unique_name; remove legacy backout; execute_job delegates remote ops to providers and persists provider metadata.
API & CRUD
backend/app/api/routes/collections.py, backend/app/crud/collection/collection.py
create_collection calls ensure_unique_name when name provided; CollectionCrud adds exists_by_name.
Tests — Utilities
backend/app/tests/utils/collection.py, backend/app/tests/utils/llm_provider.py
Rename helpers (get_collection → get_assistant_collection); add get_vector_store_collection; add llm_provider test utilities (mocks, id generators, mock provider); tests set provider=ProviderType.openai.
Tests — Routes & Services
multiple backend/app/tests/...
Tests updated to use get_llm_provider and mocked providers; assertions updated to use get_service_name("openai"); creation/delete flows mock provider.create/provider.delete; import paths adjusted.
Provider Tests
backend/app/tests/services/collections/providers/test_openai_provider.py
New unit tests for OpenAIProvider covering vector-store-only creation, assistant creation, deletion, and exception propagation.

Sequence Diagram(s)

sequenceDiagram
    participant Client as API Client
    participant Route as Collections Route
    participant Registry as Provider Registry
    participant Provider as Provider Impl
    participant Storage as Cloud Storage
    participant DB as Database

    Client->>Route: POST /collections (name, documents, provider)
    Route->>Route: ensure_unique_name(project_id, name)
    Route->>DB: enqueue create job / persist job record
    Route->>Registry: get_llm_provider(provider, project_id, org_id)
    Registry->>DB: fetch provider credentials
    Registry->>Provider: instantiate provider
    Route->>Provider: provider.create(request, storage, document_crud)
    Provider->>Provider: batch_documents()
    Provider->>Storage: create/upload vector store
    alt with assistant
        Provider->>Storage: create assistant
    end
    Provider-->>Route: return llm_service_id & llm_service_name
    Route->>DB: persist Collection (provider, name, description, ids)
    Route-->>Client: send success callback / job completion
sequenceDiagram
    participant Client as API Client
    participant Route as Collections Route
    participant Registry as Provider Registry
    participant Provider as Provider Impl
    participant Storage as Cloud Storage
    participant DB as Database

    Client->>Route: DELETE /collections/{id}
    Route->>DB: fetch Collection (provider, llm_service_name, ids)
    Route->>DB: enqueue delete job
    Route->>Registry: get_llm_provider(collection.provider, project_id, org_id)
    Registry->>DB: fetch provider credentials
    Registry->>Provider: instantiate provider
    Route->>Provider: provider.delete(collection)
    alt llm_service_name indicates assistant
        Provider->>Storage: delete assistant
    else
        Provider->>Storage: delete vector store
    end
    Provider-->>Route: deletion success
    Route->>DB: remove Collection record
    Route-->>Client: send success callback
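The provider-resolution step shown in both diagrams (Route → Registry → Provider) can be sketched as a small registry plus factory. The names LLM provider, get_llm_provider, and BaseProvider mirror the walkthrough above, but the bodies below are illustrative assumptions, not the actual implementation; in particular, credentials are a plain dict here, whereas the real code fetches them from the database per project.

```python
# Minimal sketch of a provider registry and factory. A string stands in for the
# real provider client; the actual code would construct e.g. an OpenAI client
# from the stored credentials.
from typing import Any


class BaseProvider:
    def __init__(self, client: Any) -> None:
        self.client = client


class OpenAIProvider(BaseProvider):
    pass


_REGISTRY: dict[str, type[BaseProvider]] = {"openai": OpenAIProvider}


def get_llm_provider(provider: str, credentials: dict[str, str]) -> BaseProvider:
    provider_class = _REGISTRY.get(provider.lower())
    if provider_class is None:
        raise ValueError(f"Provider '{provider}' is not supported.")
    if not credentials:
        raise ValueError(f"Credentials for provider '{provider}' not configured.")
    # Client construction is provider-specific; a string stands in here.
    client = f"client-for-{provider.lower()}"
    return provider_class(client=client)


p = get_llm_provider("openai", {"api_key": "sk-test"})
```

Adding a new provider then amounts to registering another BaseProvider subclass, which is the extensibility point the review comments discuss.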

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Suggested reviewers

  • avirajsingh7
  • Prajna1999

Poem

🐇 I hopped through migrations, fields in tow,
Providers now ready, names set to grow.
Jobs queued and mocked, deletions dispatched,
Unique names guarded, no duplicates matched.
A little rabbit cheers — modular code, nicely patched.

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 54.22% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main objective: making the collection module provider-agnostic, which is the primary architectural change across the changeset.



@nishika26 nishika26 marked this pull request as ready for review January 20, 2026 06:17
@nishika26 nishika26 changed the title from "Enhancement/collection provider agnostic" to "Colelction: making the module provider agnostic" Jan 20, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
backend/app/tests/api/routes/test_assistants.py (1)

1-10: Remove the duplicate patch import.
Line 9 redefines patch already imported on Line 1.

🧹 Proposed fix
-from unittest.mock import patch
 from typing import Any
 from uuid import uuid4
@@
-from unittest.mock import patch
 from app.tests.utils.llm_provider import mock_openai_assistant
backend/app/tests/crud/collections/collection/test_crud_collection_read_all.py (1)

18-24: Double creation of collection will cause errors.

get_assistant_collection already persists the collection to the database (per the relevant snippet, it calls CollectionCrud(db, project.id).create(collection)). Calling crud.create(collection, documents) on line 24 attempts to create the same collection again, which will likely cause a primary key constraint violation or duplicate entries.

🐛 Proposed fix: use the returned collection without re-creating
 def create_collections(db: Session, n: int) -> Collection:
     crud = None
     project = get_project(db)
     openai_mock = OpenAIMock()
     with openai_mock.router:
         client = OpenAI(api_key="sk-test-key")
         for _ in range(n):
             collection = get_assistant_collection(db, project=project)
             store = DocumentStore(db, project_id=collection.project_id)
             documents = store.fill(1)
             if crud is None:
                 crud = CollectionCrud(db, collection.project_id)
-            crud.create(collection, documents)
+            # Collection is already created by get_assistant_collection
+            # If documents need to be associated, use a different method
+            # or modify get_assistant_collection to accept documents

         return crud.project_id
backend/app/tests/services/collections/test_delete_collection.py (1)

1-12: Critical: Missing Session import causing pipeline failure.

The pipeline failure indicates Session is not defined. The test functions use db: Session type hints but Session is not imported.

Proposed fix
 from typing import Any
 from unittest.mock import patch, MagicMock
 from uuid import uuid4, UUID
 
+from sqlmodel import Session
 
 from app.models.collection import DeletionRequest
-
 from app.tests.utils.utils import get_project
🤖 Fix all issues with AI agents
In `@backend/app/alembic/versions/041_extend_collection_table_for_provider_.py`:
- Around line 31-40: The migration adds a NOT NULL column "provider" via
op.add_column using provider_type, which will fail on existing rows. Add the
column as nullable (nullable=True) or with a server_default, then run the
backfill (op.execute("UPDATE collection SET provider = 'OPENAI' WHERE provider
IS NULL")), and finally call op.alter_column("collection", "provider",
existing_type=provider_type, nullable=False) to enforce NOT NULL. This
ordering (or a server_default on op.add_column) avoids migration failures.
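A minimal sketch of that safe ordering (add as nullable, backfill, then enforce NOT NULL). FakeOp is a stand-in recorder for alembic.op so the sequence can be checked without a database; a real migration would pass a sa.Column with the providertype enum rather than keyword arguments.

```python
# FakeOp records calls in order, illustrating the nullable-add -> backfill ->
# NOT NULL sequence the review comment asks for. It is not alembic's API.
class FakeOp:
    def __init__(self):
        self.calls = []

    def add_column(self, table, column, **kw):
        self.calls.append(("add_column", table, column, kw))

    def execute(self, sql):
        self.calls.append(("execute", sql))

    def alter_column(self, table, column, **kw):
        self.calls.append(("alter_column", table, column, kw))


op = FakeOp()


def upgrade():
    # 1) Add the column as nullable so existing rows don't violate NOT NULL.
    op.add_column("collection", "provider", nullable=True)
    # 2) Backfill existing rows with a default provider value.
    op.execute("UPDATE collection SET provider = 'OPENAI' WHERE provider IS NULL")
    # 3) Only then enforce NOT NULL.
    op.alter_column("collection", "provider", nullable=False)


upgrade()
```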

In `@backend/app/models/collection.py`:
- Around line 31-37: The provider field on the Collection model is incorrectly
defined as a one-element tuple and uses a set for sa_column_kwargs; update the
ProviderType Field declaration (the provider attribute) to return just
Field(...) (remove the trailing comma and surrounding parentheses so it's not a
tuple) and change sa_column_kwargs from {"LLM provider used for this
collection"} to {"comment": "LLM provider used for this collection"} so
SQLAlchemy gets a proper dict comment.

In `@backend/app/services/collections/create_collection.py`:
- Around line 158-161: Replace the explicit equality check against True with a
direct truthiness test: locate the conditional using with_assistant (the if
statement that currently reads with_assistant == True) in create_collection
(affecting creation_request.provider assignment) and change it to use if
with_assistant: (removing the unnecessary parentheses/equality comparison) so
the block sets creation_request.provider = "openai" when with_assistant is
truthy.

In `@backend/app/services/collections/providers/base.py`:
- Around line 22-28: The __init__ method in the provider base class is missing a
return type annotation; update the constructor signature (def __init__(self,
client: Any)) to include an explicit return type of None (def __init__(self,
client: Any) -> None:) so it follows the project's type-hinting guidelines while
leaving the client parameter and assignment (self.client = client) unchanged.

In `@backend/app/services/collections/providers/openai.py`:
- Around line 44-47: The log message prefixes in OpenAIProvider's create method
are incorrect—replace instances of "[OpenAIProvider.execute]" with
"[OpenAIProvider.create]" so the log prefix matches the method name; update all
logger calls inside the create method (the logger.info/error lines that include
vector_store_id, batches, and any other similar messages) to use the correct
"[OpenAIProvider.create]" tag.
- Around line 120-124: The cleanup method is calling _backout with the wrong
arguments; instead of passing result.llm_service_id and result.llm_service_name
it must instantiate the same CRUD class used in delete (based on
result.llm_service_name) and pass that CRUD instance plus the llm_service_id to
_backout; update cleanup to mirror delete’s pattern: determine which CRUD to
construct (e.g., OpenAIAssistantCRUD, PineconeVectorStoreCRUD, etc.) using
result.llm_service_name, create the CRUD instance, then call _backout(crud,
result.llm_service_id).

In
`@backend/app/tests/crud/collections/collection/test_crud_collection_read_one.py`:
- Around line 15-20: The test creates an unused OpenAI instance assigned to the
variable client (OpenAI(api_key="sk-test-key")) inside the openai_mock.router
context; remove that unused assignment or, if the OpenAI constructor must run
for its side effects, change the assignment to _ (i.e., assign to a throwaway
variable) so client is not defined but the constructor still runs; update the
code around the OpenAIMock context where client is currently created.

In `@backend/app/tests/utils/collection.py`:
- Around line 69-75: The Collection creation hardcodes llm_service_name via
get_service_name("openai") which mismatches the dynamic provider argument;
update the llm_service_name assignment in the Collection constructor to derive
the service name from the provider parameter (e.g., call get_service_name with
provider or a normalized form of provider) so llm_service_name aligns with the
provider value used for provider and llm_service_id.

In `@backend/app/tests/utils/llm_provider.py`:
- Around line 22-26: The function mock_openai_assistant currently uses a mutable
default for vector_store_ids which can lead to surprising statefulness; change
the signature to use vector_store_ids: Optional[list[str]] = None and inside
mock_openai_assistant initialize vector_store_ids = ["vs_1", "vs_2"] if
vector_store_ids is None (preserving the Optional[list[str]] type hint), so
callers get a fresh list each invocation and you avoid mutating a module-level
default.
🧹 Nitpick comments (13)
backend/app/services/collections/providers/base.py (1)

37-51: Align create/delete/cleanup docstrings with the actual signature/return.
The current docs mention args (batch_size, with_assistant, assistant_options) and returns (llm_service_id, llm_service_name) that don’t exist in the signature and can mislead implementers.

📝 Suggested docstring cleanup
-        Args:
-            collection_request: Collection parameters (name, description, document list, etc.)
-            storage: Cloud storage instance for file access
-            document_crud: DocumentCrud instance for fetching documents
-            batch_size: Number of documents to process per batch
-            with_assistant: Whether to create an assistant/agent
-            assistant_options: Options for assistant creation (provider-specific)
-
-        Returns:
-            llm_service_id: ID of the resource to delete
-            llm_service_name: Name of the service (determines resource type)
+        Args:
+            collection_request: Collection parameters (name, description, document list, etc.)
+            storage: Cloud storage instance for file access
+            document_crud: DocumentCrud instance for fetching documents
+
+        Returns:
+            Collection created by the provider
backend/app/tests/crud/collections/collection/test_crud_collection_create.py (1)

18-24: Prefer ProviderType.OPENAI over a raw string.

Using the enum keeps values consistent with the model definition and avoids casing drift.

♻️ Suggested tweak
-from app.models import DocumentCollection, Collection
+from app.models import DocumentCollection, Collection, ProviderType
@@
-            provider="OPENAI",
+            provider=ProviderType.OPENAI,
backend/app/services/collections/providers/registry.py (1)

44-71: Unreachable else branch: LLMProvider.get() already raises for unsupported providers.

The else branch (lines 65-69) can never execute because LLMProvider.get(provider) on line 47 already raises a ValueError for any unsupported provider. This is dead code.

Consider removing the else branch or restructuring the logic to avoid the redundant check:

♻️ Proposed refactor
 def get_llm_provider(
     session: Session, provider: str, project_id: int, organization_id: int
 ) -> BaseProvider:
     provider_class = LLMProvider.get(provider)
 
     credentials = get_provider_credential(
         session=session,
         provider=provider,
         project_id=project_id,
         org_id=organization_id,
     )
 
     if not credentials:
         raise ValueError(
             f"Credentials for provider '{provider}' not configured for this project."
         )
 
     if provider == LLMProvider.OPENAI:
         if "api_key" not in credentials:
             raise ValueError("OpenAI credentials not configured for this project.")
         client = OpenAI(api_key=credentials["api_key"])
-    else:
-        logger.error(
-            f"[get_llm_provider] Unsupported provider type requested: {provider}"
-        )
-        raise ValueError(f"Provider '{provider}' is not supported.")
+    else:
+        # Future providers will be handled here
+        raise NotImplementedError(f"Client creation for '{provider}' not implemented.")
 
     return provider_class(client=client)

Alternatively, when adding new providers, this pattern ensures extensibility while keeping the else branch meaningful.

backend/app/tests/crud/collections/collection/test_crud_collection_delete.py (1)

12-14: Add proper type hints for function parameters.

As per coding guidelines, all function parameters should have type hints. The client parameter lacks a type hint, and project_id should be Optional[int] since it can be None.

♻️ Proposed fix
+from typing import Optional
+
-def get_assistant_collection_for_delete(
-    db: Session, client=None, project_id: int = None
-) -> Collection:
+def get_assistant_collection_for_delete(
+    db: Session, client: Optional[OpenAI] = None, project_id: Optional[int] = None
+) -> Collection:
backend/app/services/collections/helpers.py (2)

20-27: Consider raising an error or logging for unknown providers.

get_service_name silently returns an empty string for unknown providers. This could lead to subtle bugs where an invalid provider name goes unnoticed. Consider either raising an error or logging a warning.

♻️ Suggested improvement
 def get_service_name(provider: str) -> str:
     """Get the collection service name for a provider."""
     names = {
         "openai": "openai vector store",
         #   "bedrock": "bedrock knowledge base",
         #  "gemini": "gemini file search store",
     }
-    return names.get(provider.lower(), "")
+    name = names.get(provider.lower())
+    if name is None:
+        logger.warning(f"[get_service_name] Unknown provider: {provider}")
+        return ""
+    return name

116-132: Consider using Session type instead of SessionDep for helper functions.

SessionDep is typically a FastAPI dependency type alias (e.g., Annotated[Session, Depends(get_db)]) meant for route handlers. For a standalone helper function called from various places, using the plain Session type is more appropriate and avoids coupling to FastAPI's dependency injection.

♻️ Proposed fix
+from sqlmodel import Session
+
 def ensure_unique_name(
-    session: SessionDep,
+    session: Session,
     project_id: int,
     requested_name: str,
 ) -> str:
backend/app/alembic/versions/041_extend_collection_table_for_provider_.py (1)

67-67: Use explicit constraint name for unique constraint.

Using None as the constraint name causes Alembic to auto-generate a name, which may differ across environments and complicate future migrations. The downgrade at line 98 already expects collection_name_key - ensure consistency.

♻️ Proposed fix
-    op.create_unique_constraint(None, "collection", ["name"])
+    op.create_unique_constraint("collection_name_key", "collection", ["name"])
backend/app/tests/services/collections/test_create_collection.py (2)

117-119: Add return type annotation to test function.

Per coding guidelines, all functions should have type hints. The db parameter also needs a type hint.

Proposed fix
 def test_execute_job_success_flow_updates_job_and_creates_collection(
-    mock_get_llm_provider, db: Session
-):
+    mock_get_llm_provider: MagicMock, db: Session
+) -> None:

193-195: Add type hints to test function parameters.

Per coding guidelines, add type hints to all function parameters and return values.

Proposed fix
 def test_execute_job_assistant_create_failure_marks_failed_and_deletes_collection(
-    mock_get_llm_provider, db
-):
+    mock_get_llm_provider: MagicMock, db: Session
+) -> None:
backend/app/tests/utils/llm_provider.py (1)

176-179: Add return type annotation.

Per coding guidelines, add return type hints to all functions.

Proposed fix
 def get_mock_provider(
     llm_service_id: str = "mock_service_id",
     llm_service_name: str = "mock_service_name",
-):
+) -> MagicMock:
backend/app/services/collections/create_collection.py (1)

142-142: Unused task_instance parameter.

The task_instance parameter is flagged as unused by static analysis. If this is required for the Celery task interface signature, consider adding a comment to clarify this, or prefixing with underscore (_task_instance) to indicate intentional non-use.

backend/app/tests/services/collections/test_delete_collection.py (1)

74-77: Add type hints to test function parameters.

Per coding guidelines, add type hints to all function parameters.

Proposed fix
 @patch("app.services.collections.delete_collection.get_llm_provider")
 def test_execute_job_delete_success_updates_job_and_calls_delete(
-    mock_get_llm_provider, db
-):
+    mock_get_llm_provider: MagicMock, db: Session
+) -> None:
backend/app/models/collection.py (1)

171-177: Align ProviderOptions with ProviderType enum for consistency.

The provider field in ProviderOptions uses Literal["openai"] while the Collection model uses the ProviderType enum. Since ProviderType is already defined for this purpose and the Collection model's description anticipates multiple providers ("openai", "bedrock", "gemini", etc.), use ProviderType instead of Literal to maintain consistency and prepare for future provider support.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@backend/app/tests/services/collections/test_delete_collection.py`:
- Around line 73-76: Update the test function signatures to include typing:
annotate the patched mock parameter as MagicMock, the db parameter as Session,
and add an explicit return type -> None for the listed tests
(test_execute_job_delete_success_updates_job_and_calls_delete,
test_execute_job_delete_failure_marks_job_failed,
test_execute_job_delete_success_with_callback_sends_success_payload,
test_execute_job_delete_remote_failure_with_callback_sends_failure_payload);
import typing symbols if missing (from unittest.mock import MagicMock and from
sqlalchemy.orm import Session) and apply the annotations to the mock and db
parameters and the function return type for each test.

@codecov

codecov bot commented Jan 20, 2026


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
backend/app/services/collections/helpers.py (1)

83-102: Add missing type hints and fix function name typo

The function pick_service_for_documennt has a typo in the name and is missing type hint annotations for session, a_crud, and v_crud parameters, as well as a return type annotation. Per coding guidelines, all function parameters and return values require type hints.

Required changes
-def pick_service_for_documennt(session, doc_id: UUID, a_crud, v_crud):
+def pick_service_for_document(session: SessionDep, doc_id: UUID, a_crud: OpenAIAssistantCrud, v_crud: OpenAIVectorStoreCrud) -> Union[OpenAIAssistantCrud, OpenAIVectorStoreCrud]:

Update imports in backend/app/api/routes/documents.py:

  • Change import from pick_service_for_documennt to pick_service_for_document (line 31)
  • Change both function calls from pick_service_for_documennt to pick_service_for_document (lines 190, 223)
backend/app/models/collection.py (2)

87-105: Add explicit return type to model_post_init.

Return type is missing, which violates the repo’s type-hint requirement.

🔧 Suggested fix
-    def model_post_init(self, __context: Any):
+    def model_post_init(self, __context: Any) -> None:
         self.documents = list(set(self.documents))

As per coding guidelines, please add explicit return types.


179-188: Add return type (and a concrete class type) for extract_super_type.

This method yields tuples but has no return type hint.

🔧 Suggested fix
-from typing import Any, Literal
+from typing import Any, Literal, Iterator
@@
-    def extract_super_type(self, cls: "CreationRequest"):
+    def extract_super_type(self, cls: type[SQLModel]) -> Iterator[tuple[str, Any]]:
         for field_name in cls.model_fields.keys():
             field_value = getattr(self, field_name)
             yield (field_name, field_value)

As per coding guidelines, please add explicit return types.

🤖 Fix all issues with AI agents
In `@backend/app/alembic/versions/041_extend_collection_table_for_provider_.py`:
- Line 68: Replace the global unique constraint on the collection name with a
composite unique constraint for per-project uniqueness: update the migration
call to op.create_unique_constraint (the one currently creating a unique
constraint on "collection" with ["name"]) so it instead creates the constraint
on ["project_id","name"] to match the application logic enforced by
ensure_unique_name and CollectionCrud.exists_by_name.

In `@backend/app/services/collections/create_collection.py`:
- Around line 135-143: Annotate the unused job runner arg by importing
typing.Any and changing the execute_job signature to declare task_instance: Any,
then explicitly mark it as intentionally unused inside the function (e.g. assign
it to _ or add a comment like "_ = task_instance  # intentionally unused") so
Ruff stops flagging it; keep other parameter annotations and the return type
as-is and reference the execute_job function and the task_instance parameter
when making the change.

In `@backend/app/services/collections/providers/openai.py`:
- Around line 16-21: The OpenAIProvider.__init__ duplicates assigning
self.client after BaseProvider.__init__ already sets it and the constructor
lacks the return type hint; remove the redundant "self.client = client"
assignment in the OpenAIProvider.__init__ and update the method signature to
include the None return type (i.e., def __init__(self, client: OpenAI) -> None:)
while leaving the call to super().__init__(client) intact.

In `@backend/app/tests/services/collections/providers/test_openai.py`:
- Around line 1-13: The test module name collides with another test (causing
pytest collection errors); rename the test file from test_openai.py to a unique
name like test_openai_provider.py, update any references/imports if present, and
rerun test discovery to confirm no duplicate modules; focus on the module that
tests OpenAIProvider and helpers (OpenAIProvider, generate_openai_id,
get_mock_openai_client_with_vector_store) so their imports still resolve after
renaming.
♻️ Duplicate comments (3)
backend/app/tests/services/collections/test_delete_collection.py (2)

73-76: Missing type hint for db parameter.

The db parameter is missing the Session type annotation. This was flagged in a previous review and should be addressed for consistency with coding guidelines.

 @patch("app.services.collections.delete_collection.get_llm_provider")
 def test_execute_job_delete_success_updates_job_and_calls_delete(
-    mock_get_llm_provider: MagicMock, db
+    mock_get_llm_provider: MagicMock, db: Session
 ) -> None:

139-142: Missing Session type hint on db parameter in remaining test functions.

Apply the same fix to the other test functions:

  • test_execute_job_delete_failure_marks_job_failed (line 141)
  • test_execute_job_delete_success_with_callback_sends_success_payload (line 216)
  • test_execute_job_delete_remote_failure_with_callback_sends_failure_payload (line 298)

Also applies to: 213-217, 293-299

backend/app/models/collection.py (1)

31-37: Fix provider Field tuple wrapping.

Line 31 wraps Field(...) in a one-element tuple, which breaks SQLModel field definition and schema mapping.

🐛 Proposed fix
-    provider: ProviderType = (
-        Field(
-            nullable=False,
-            description="LLM provider used for this collection (e.g., 'openai', 'bedrock', 'gemini', etc)",
-            sa_column_kwargs={"comment": "LLM provider used for this collection"},
-        ),
-    )
+    provider: ProviderType = Field(
+        nullable=False,
+        description="LLM provider used for this collection (e.g., 'openai', 'bedrock', 'gemini', etc)",
+        sa_column_kwargs={"comment": "LLM provider used for this collection"},
+    )
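The root cause of the bug above is Python's trailing-comma rule: a parenthesized expression followed by a comma is a one-element tuple, so the annotated default never reaches SQLModel's Field machinery. A minimal illustration (the strings stand in for the real Field(...) call):

```python
# Easy to introduce when an auto-formatter wraps a long Field(...) call:
# the trailing comma inside the parentheses turns the value into a tuple.
broken = (
    "Field(nullable=False)",  # <- trailing comma makes this a 1-tuple
)

fixed = (
    "Field(nullable=False)"   # no trailing comma: just the value itself
)
```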


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@backend/app/tests/services/collections/providers/test_openai_provider.py`:
- Around line 98-113: The test test_delete_openai_assistant creates a Collection
without required fields; update the Collection instantiation used in
test_delete_openai_assistant (or replace with the test factory from
backend/app/tests/utils/collection.py) to include the non-nullable fields
provider and project_id (e.g., provider=ProviderType.OPENAI or "OPENAI" and a
valid project_id int), so the Collection(...) passed to OpenAIProvider.delete is
fully valid; keep the rest of the test (MagicMock client, OpenAIProvider,
patching OpenAIAssistantCrud, and the final assert) unchanged.
♻️ Duplicate comments (1)
backend/app/tests/utils/llm_provider.py (1)

22-26: Mutable default argument detected.

Using a mutable list as a default argument can cause unexpected behavior. Replace with None and initialize within the function.

Proposed fix
 def mock_openai_assistant(
     assistant_id: str = "assistant_mock",
-    vector_store_ids: Optional[list[str]] = ["vs_1", "vs_2"],
+    vector_store_ids: Optional[list[str]] = None,
     max_num_results: int = 30,
 ) -> OpenAIAssistant:
+    if vector_store_ids is None:
+        vector_store_ids = ["vs_1", "vs_2"]
     return OpenAIAssistant(
🧹 Nitpick comments (2)
backend/app/tests/utils/llm_provider.py (2)

176-179: Add return type hint to get_mock_provider.

Per coding guidelines, all function return values should have type hints.

Proposed fix
 def get_mock_provider(
     llm_service_id: str = "mock_service_id",
     llm_service_name: str = "mock_service_name",
-):
+) -> MagicMock:

111-134: get_mock_openai_client_with_vector_store already declares the -> MagicMock return type hint. No change needed; LGTM.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/tests/api/routes/collections/test_collection_info.py (1)

194-230: Undefined name get_collection will cause test failure.

Line 204 calls get_collection(db, project), but this function is not imported. The imports at lines 9-12 only include get_assistant_collection and get_vector_store_collection. This will raise a NameError at runtime.

Additionally, line 197 is missing the type hint for user_api_key_header (should be dict[str, str] per coding guidelines).

🐛 Proposed fix
 def test_collection_info_include_docs_and_url(
     client: TestClient,
     db: Session,
-    user_api_key_header,
+    user_api_key_header: dict[str, str],
 ) -> None:
     """
     Test that when include_docs=true and include_url=true,
     the endpoint returns documents with their URLs.
     """
     project = get_project(db, "Dalgo")
-    collection = get_collection(db, project)
+    collection = get_assistant_collection(db, project)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `backend/app/models/collection.py`:
- Around line 15-19: ProviderType enum values are uppercase and clash with
ProviderOptions.provider/CreationRequest which expect lowercase literals; change
the enum members in ProviderType (e.g., ProviderType.OPENAI) to use lowercase
strings (e.g., "openai") so Pydantic validation matches the Literal["openai"]
used by ProviderOptions/CreationRequest, and ensure any other members follow the
same lowercase convention.
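The mismatch can be shown with nothing but the stdlib (hypothetical minimal enums; Pydantic's Literal check ultimately reduces to a value-equality test against the allowed literals):

```python
from enum import Enum
from typing import Literal, get_args

# What a field typed Literal["openai"] will accept.
ALLOWED = get_args(Literal["openai"])

class ProviderType(str, Enum):
    OPENAI = "OPENAI"  # uppercase value, as in the flagged code

class ProviderTypeFixed(str, Enum):
    OPENAI = "openai"  # lowercase value, matching the Literal

# str-based enum members compare equal to their string values, so
# membership against the Literal's args mirrors the validation outcome.
assert ProviderType.OPENAI not in ALLOWED       # would be rejected
assert ProviderTypeFixed.OPENAI in ALLOWED      # would be accepted
```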
♻️ Duplicate comments (1)
backend/app/models/collection.py (1)

40-46: Remove tuple wrapping around the provider Field.

This still creates a one‑element tuple instead of a Field, which breaks SQLModel field definition. This was flagged previously and appears unchanged.

🐛 Proposed fix
-    provider: ProviderType = (
-        Field(
-            nullable=False,
-            description="LLM provider used for this collection (e.g., 'openai', 'bedrock', 'gemini', etc)",
-            sa_column_kwargs={"comment": "LLM provider used for this collection"},
-        ),
-    )
+    provider: ProviderType = Field(
+        nullable=False,
+        description="LLM provider used for this collection (e.g., 'openai', 'bedrock', 'gemini', etc)",
+        sa_column_kwargs={"comment": "LLM provider used for this collection"},
+    )
🧹 Nitpick comments (1)
backend/app/models/collection.py (1)

112-113: Add missing type hints to comply with project typing rules.

Both model_post_init and extract_super_type are missing return/parameter typing.

♻️ Proposed fix
-from typing import Any, Literal
+from typing import Any, Literal, Iterator

-    def model_post_init(self, __context: Any):
+    def model_post_init(self, __context: Any) -> None:
         self.documents = list(set(self.documents))

-    def extract_super_type(self, cls: "CreationRequest"):
+    def extract_super_type(self, cls: type[SQLModel]) -> Iterator[tuple[str, Any]]:
         for field_name in cls.model_fields.keys():
             field_value = getattr(self, field_name)
             yield (field_name, field_value)

Please verify this still satisfies the project’s typing checks (e.g., mypy/pyright). As per coding guidelines.

Also applies to: 193-196

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/tests/api/routes/collections/test_collection_info.py (1)

194-204: Missing type hint for user_api_key_header parameter.

Line 197 is missing the type annotation dict[str, str] that is consistently applied to the same parameter in other test functions in this file (lines 43, 83, 113, 148).

Proposed fix
 def test_collection_info_include_docs_and_url(
     client: TestClient,
     db: Session,
-    user_api_key_header,
+    user_api_key_header: dict[str, str],
 ) -> None:

As per coding guidelines, always add type hints to all function parameters.

@nishika26 nishika26 self-assigned this Jan 20, 2026
@nishika26 nishika26 added enhancement New feature or request ready-for-review labels Jan 20, 2026
@nishika26 nishika26 linked an issue Jan 20, 2026 that may be closed by this pull request
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
backend/app/services/collections/delete_collection.py (1)

141-149: Add a type hint for task_instance (and mark it unused).

This param is untyped and unused; add a type hint and explicitly ignore it to satisfy typing rules and linting. As per coding guidelines, please add explicit type hints and mark unused params.

🔧 Suggested fix
+from typing import Any
@@
 def execute_job(
     request: dict,
     project_id: int,
     organization_id: int,
     task_id: str,
     job_id: str,
     collection_id: str,
-    task_instance,
+    task_instance: Any,
 ) -> None:
     """Celery worker entrypoint for deleting a collection (both remote and local)."""
+    _ = task_instance  # reserved for task runner hooks; unused for now
backend/app/models/collection.py (1)

112-113: Add explicit -> None return type to model_post_init.

The model_post_init method is missing its return type annotation. Per coding guidelines and Pydantic conventions, all function return types must be explicitly annotated.

Suggested fix
-    def model_post_init(self, __context: Any):
+    def model_post_init(self, __context: Any) -> None:
         self.documents = list(set(self.documents))
backend/app/services/collections/create_collection.py (1)

165-190: Move provider.create() inside the active session context.

document_crud is created inside the with Session(engine) block but passed to provider.create(), which is called outside the block. The provider.create() method calls batch_documents(), which in turn calls document_crud.read_each(). Since DocumentCrud stores the session reference and executes queries via self.session.exec(), any database access after the session closes will fail.

🔧 Suggested fix
        with Session(engine) as session:
            collection_job_crud = CollectionJobCrud(session, project_id)
            collection_job = collection_job_crud.read_one(job_uuid)
            collection_job = collection_job_crud.update(
                job_uuid,
                CollectionJobUpdate(
                    task_id=task_id,
                    status=CollectionJobStatus.PROCESSING,
                ),
            )

            storage = get_cloud_storage(session=session, project_id=project_id)
            document_crud = DocumentCrud(session, project_id)

            provider = get_llm_provider(
                session=session,
                provider=creation_request.provider,
                project_id=project_id,
                organization_id=organization_id,
            )
-
-        result = provider.create(
-            collection_request=creation_request,
-            storage=storage,
-            document_crud=document_crud,
-        )
+            result = provider.create(
+                collection_request=creation_request,
+                storage=storage,
+                document_crud=document_crud,
+            )
backend/app/tests/api/routes/collections/test_collection_info.py (1)

194-198: Add missing type hint for user_api_key_header parameter.

This parameter is missing the type annotation dict[str, str], which is used consistently in other test functions in this file (lines 43, 83, 113, 148, 179). As per coding guidelines, all function parameters should have type hints.

Proposed fix
 def test_collection_info_include_docs_and_url(
     client: TestClient,
     db: Session,
-    user_api_key_header,
+    user_api_key_header: dict[str, str],
 ) -> None:
backend/app/tests/crud/collections/collection/test_crud_collection_delete.py (1)

12-31: Add type hints and fix potential None value for project_id.

The function parameters client and project_id are missing type hints, violating the coding guidelines. Additionally, project_id can be None by default but is passed directly to Collection(project_id=project_id, ...) where the model requires a non-null value. The function retrieves a project via get_project(db) but never uses its id as a fallback.

Proposed fix
+from typing import Optional
+from openai import OpenAI as OpenAIClient
+
 def get_assistant_collection_for_delete(
-    db: Session, client=None, project_id: int = None
+    db: Session,
+    client: Optional[OpenAIClient] = None,
+    project_id: Optional[int] = None,
 ) -> Collection:
     project = get_project(db)
     if client is None:
         client = OpenAI(api_key="test_api_key")
+    if project_id is None:
+        project_id = project.id

     vector_store = client.vector_stores.create()
     assistant = client.beta.assistants.create(
         model="gpt-4o",
         tools=[{"type": "file_search"}],
         tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
     )

     return Collection(
         project_id=project_id,
         llm_service_id=assistant.id,
         llm_service_name="gpt-4o",
         provider=ProviderType.openai,
     )
backend/app/tests/utils/collection.py (1)

43-51: Remove organization_id field — it no longer exists on the Collection model.

The Collection model was scoped to projects only and no longer has an organization_id field. Setting this field during instantiation will cause a model validation error.

Proposed fix
     collection = Collection(
         id=collection_id or uuid4(),
         project_id=project.id,
-        organization_id=project.organization_id,
         llm_service_name=model,
         llm_service_id=assistant_id,
         provider=ProviderType.openai,
     )
🤖 Fix all issues with AI agents
In `backend/app/api/docs/collections/create.md`:
- Around line 13-18: Update the wording and capitalization for all occurrences
of "Openai" and "Openai vector Store" in the document: replace with "OpenAI" and
"OpenAI vector store" respectively, and tighten the sentence about cleanup to
read more concisely (e.g., "If any LLM service interaction fails, all service
resources are cleaned up; for example, if an OpenAI vector store cannot be
created, any files uploaded to OpenAI are removed. Failures can result from
service outages, invalid parameters, or unsupported document types such as
unparseable PDFs."). Apply the same edits to the similar paragraph referenced
(the other occurrence around lines 20-22).

In `backend/app/services/collections/create_collection.py`:
- Around line 149-155: Initialize creation_request to None before the try block
so the except path can reference it without causing UnboundLocalError;
specifically, add creation_request = None above the try, keep the
CreationRequest(**request) call inside the try to assign creation_request, and
ensure any failure callback logic (which references creation_request) will
handle a None value safely if construction failed.
♻️ Duplicate comments (3)
backend/app/services/collections/create_collection.py (1)

135-143: Type-annotate and mark task_instance as intentionally unused.

This param is untyped and unused; add a type hint and explicitly ignore it to satisfy typing rules and linting. As per coding guidelines, please add explicit type hints and mark unused params.

🔧 Suggested fix
+from typing import Any
@@
 def execute_job(
     request: dict,
     with_assistant: bool,
     project_id: int,
     organization_id: int,
     task_id: str,
     job_id: str,
-    task_instance,
+    task_instance: Any,
 ) -> None:
@@
-    start_time = time.time()
+    _ = task_instance  # reserved for task runner hooks; unused for now
+    start_time = time.time()
backend/app/models/collection.py (1)

40-46: Fix provider Field tuple wrapping (SQLModel won’t map the column).

The trailing comma makes this a tuple rather than a Field, so the column mapping will be broken.

🔧 Suggested fix
-    provider: ProviderType = (
-        Field(
-            nullable=False,
-            description="LLM provider used for this collection (e.g., 'openai', 'bedrock', 'gemini', etc)",
-            sa_column_kwargs={"comment": "LLM provider used for this collection"},
-        ),
-    )
+    provider: ProviderType = Field(
+        nullable=False,
+        description="LLM provider used for this collection (e.g., 'openai', 'bedrock', 'gemini', etc)",
+        sa_column_kwargs={"comment": "LLM provider used for this collection"},
+    )
backend/app/tests/services/collections/test_delete_collection.py (1)

73-76: Type-annotate db in patched tests.

Several test functions still accept db without a type hint. As per coding guidelines, add db: Session to each.

🔧 Suggested fix
 @patch("app.services.collections.delete_collection.get_llm_provider")
 def test_execute_job_delete_success_updates_job_and_calls_delete(
-    mock_get_llm_provider: MagicMock, db
+    mock_get_llm_provider: MagicMock, db: Session
 ) -> None:
@@
 @patch("app.services.collections.delete_collection.get_llm_provider")
 def test_execute_job_delete_failure_marks_job_failed(
-    mock_get_llm_provider: MagicMock, db
+    mock_get_llm_provider: MagicMock, db: Session
 ) -> None:
@@
 def test_execute_job_delete_success_with_callback_sends_success_payload(
     mock_send_callback: MagicMock,
     mock_get_llm_provider: MagicMock,
-    db,
+    db: Session,
 ) -> None:
@@
 def test_execute_job_delete_remote_failure_with_callback_sends_failure_payload(
     mock_send_callback: MagicMock,
     mock_get_llm_provider: MagicMock,
-    db,
+    db: Session,
 ) -> None:

Also applies to: 138-141, 210-214, 291-295

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/services/collections/create_collection.py (1)

166-191: Keep the DB session alive for provider.create inputs.

document_crud (and potentially storage) are built inside the with Session(...) block but used after the block exits. If provider.create touches the DB via document_crud, this will hit a closed session at runtime. Move the provider.create(...) call into the same session scope (or recreate DocumentCrud inside a new session that stays open for the call).

🔧 Suggested fix
-        with Session(engine) as session:
+        with Session(engine) as session:
             collection_job_crud = CollectionJobCrud(session, project_id)
             collection_job = collection_job_crud.read_one(job_uuid)
             collection_job = collection_job_crud.update(
                 job_uuid,
                 CollectionJobUpdate(
                     task_id=task_id,
                     status=CollectionJobStatus.PROCESSING,
                 ),
             )
 
             storage = get_cloud_storage(session=session, project_id=project_id)
             document_crud = DocumentCrud(session, project_id)
 
             provider = get_llm_provider(
                 session=session,
                 provider=creation_request.provider,
                 project_id=project_id,
                 organization_id=organization_id,
             )
-
-        result = provider.create(
-            collection_request=creation_request,
-            storage=storage,
-            document_crud=document_crud,
-        )
+            result = provider.create(
+                collection_request=creation_request,
+                storage=storage,
+                document_crud=document_crud,
+            )
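The failure mode here is the generic SQLAlchemy one, sketched below with a bare in-memory connection rather than the app's Session (hypothetical table name): a handle captured inside a `with` block is closed when the block exits, so any later use fails.

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")

with engine.connect() as conn:
    conn.execute(text("CREATE TABLE doc (id INTEGER)"))
    held = conn  # analogous to DocumentCrud keeping a session reference

# The context manager closed the connection; using it afterwards fails,
# just as document_crud.read_each() would after `with Session(engine)` exits.
try:
    held.execute(text("SELECT id FROM doc"))
    usable = True
except Exception:  # sqlalchemy.exc.ResourceClosedError in practice
    usable = False

assert usable is False
```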
♻️ Duplicate comments (1)
backend/app/services/collections/create_collection.py (1)

135-143: Type-annotate and mark task_instance as intentionally unused.

Ruff flags task_instance as unused, and it lacks a type hint. If it’s required by the task runner, explicitly annotate it and mark it unused to satisfy linting.

🔧 Suggested fix
+from typing import Any
@@
 def execute_job(
     request: dict,
     with_assistant: bool,
     project_id: int,
     organization_id: int,
     task_id: str,
     job_id: str,
-    task_instance,
+    task_instance: Any,
 ) -> None:
@@
+    _ = task_instance  # reserved for task runner hooks; unused for now

As per coding guidelines, please add explicit type hints for all parameters and returns.

@vprashrex
Collaborator

Collection CRUD -> it soft-deletes the collection entity, so the collection data stays in the DB, and the DB unique constraint (project_id, name) still treats deleted rows as existing.
However, exists_by_name ignores soft-deleted collections, which can lead to a mismatch where the router reports a name as available but the database later rejects it.

Suggestion:
add a partial unique index -> only active rows are indexed, so soft-deleted rows are not taken into account,
or else remove this line from the exists_by_name function -> where(Collection.deleted_at.is_(None)) -> though that would be unfriendly for users, since they could never reuse a deleted collection's name.

I am not sure about the whole flow; if you think my analysis is wrong, ignore it.

def exists_by_name(self, collection_name: str) -> bool:
        statement = (
            select(Collection.id)
            .where(Collection.project_id == self.project_id)
            .where(Collection.name == collection_name)
            .where(Collection.deleted_at.is_(None))  # -> here it will exclude the soft deleted collection name 
        )
        result = self.session.exec(statement).first()
        return result is not None
        

Only soft deleting is happening here in collection

def delete_by_id(self, collection_id: UUID) -> Collection:
       coll = self.read_one(collection_id)
       coll.deleted_at = now()

       return self._update(coll)

unique constraint is there at the db level

__table_args__ = (
       UniqueConstraint(
           "project_id",
           "name",
           name="uq_collection_project_id_name",  
       ),
   )
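The first option maps to SQLAlchemy's partial (filtered) unique index. A minimal sketch, assuming PostgreSQL (the postgresql_where clause is ignored on other backends):

```python
from sqlalchemy import Index, text

# Unique only among live rows: once a collection is soft-deleted
# (deleted_at set), its name becomes reusable within the project.
uq_active_name = Index(
    "uq_collection_project_id_name",
    "project_id",
    "name",
    unique=True,
    postgresql_where=text("deleted_at IS NULL"),
)

__table_args__ = (uq_active_name,)
```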

@nishika26 nishika26 changed the title Colelction: making the module provider agnostic Collection: making the module provider agnostic Jan 22, 2026
Collaborator

@Prajna1999 Prajna1999 left a comment

The comments are primarily about creating separate base providers and related methods, which already exist inside the llm/call service. Ideally we would reuse those, but we have code changes coming up for STT and TTS use cases as well, so reusing them would create conflicts. I'm in two minds about how to handle this.

sa_column_kwargs={"comment": "Reference to the organization"},
name: str = Field(
nullable=True,
description="Name of the collection",
Collaborator

nitpick: we can avoid description for self-evident keys

Collaborator Author

descriptions have been added to every parameter in all data models; if we start picking out self-evident keys now and removing their descriptions, we'll end up inconsistent

)
deleted_at: datetime | None = Field(
default=None,
description="Timestamp when the collection was deleted",
Collaborator

likewise nitpick

Collaborator Author

same as above

description: str | None = Field(
default=None, description="Description of the collection"
)
documents: list[UUID] = Field(
Collaborator

Use a list of strings to avoid throwing a validation error if the document ID is an int or a non-UUID string. The case might arise if we relax type enforcement at some later point in time.

Collaborator Author

but why would we be less strict with this ID format later? Why would a scenario occur where we'd have to accept int or str values here?

Collaborator

Even if the document ID is not a UUID, we should allow the caller to do the downstream processing. Mostly it would be a UUID, but even if some bug occurs and it's not a UUID, the requirement could be relaxed.

Collaborator Author

I still don't understand why it should not be a UUID; we are going to use UUIDs as the primary identifier for documents, and if a non-UUID is given, it should instantly throw an error.

def get_service_name(provider: str) -> str:
"""Get the collection service name for a provider."""
names = {
"openai": "openai vector store",
Collaborator

use underscores among strings

Collaborator Author

this string goes to the DB and also appears as the value of "llm service name" in the response body, so if we add underscores, the returned string won't read as clearly to users

Collaborator

okay for now we can keep it as is.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
backend/app/crud/collection/collection.py (1)

1-5: Remove duplicate import logging statement.

logging is imported twice (lines 1 and 5).

 import logging
 import functools as ft
 from uuid import UUID
 from typing import Optional
-import logging
backend/app/models/collection.py (1)

212-220: Add missing fields to CollectionPublic response model.

The CollectionPublic response model is missing three fields that exist in the Collection model and are returned in API list and creation responses:

  • provider: Identifies the LLM service type
  • name: User-provided collection name
  • description: User-provided collection metadata

Since clients need access to these fields through the API, they should be added to CollectionPublic:

Fields to add to CollectionPublic
class CollectionPublic(SQLModel):
    id: UUID
    provider: ProviderType
    llm_service_id: str
    llm_service_name: str
    name: str | None
    description: str | None
    project_id: int
    inserted_at: datetime
    updated_at: datetime
    deleted_at: datetime | None = None
🤖 Fix all issues with AI agents
In `backend/app/crud/collection/collection.py`:
- Around line 96-103: The exists_by_name method currently omits the project
scope and should mirror other methods by adding a filter Collection.project_id
== self.project_id to the statement so the name check is per-project; update the
select in exists_by_name (method name: exists_by_name, model: Collection, field:
project_id, instance: self.project_id) to include that where-clause and keep the
same session execution (self.session.exec(...).first()); also add a debug/error
log entry prefixed with the function name (e.g., "exists_by_name: ...")
following the project's logging convention to record the checked collection_name
and project_id.

In `backend/app/models/collection.py`:
- Around line 61-70: The type hints for the Collection model fields are
inconsistent with nullable=True: update the type annotations for the name and
description fields (symbols: name, description in the Collection model) to allow
None (e.g., change from str to str | None or Optional[str]) so the Python types
match the DB nullable setting; ensure imports are added/adjusted if using
Optional from typing.
🧹 Nitpick comments (1)
backend/app/models/collection.py (1)

7-7: Remove unused UniqueConstraint import.

UniqueConstraint is imported but not used. The __table_args__ now uses Index with unique=True for the partial unique constraint.

-from sqlalchemy import UniqueConstraint, Index, text
+from sqlalchemy import Index, text

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
backend/app/models/collection.py (2)

114-115: Add return type for model_post_init.

Missing return type violates the project’s typing rule.

✏️ Proposed fix
-    def model_post_init(self, __context: Any):
+    def model_post_init(self, __context: Any) -> None:
         self.documents = list(set(self.documents))
As per coding guidelines.

195-198: Type-hint cls and return type in extract_super_type.

Make cls a class type and add a return annotation to satisfy typing rules.

✏️ Proposed fix
-    def extract_super_type(self, cls: "CreationRequest"):
+    def extract_super_type(self, cls: type["CreationRequest"]) -> Iterator[tuple[str, Any]]:
         for field_name in cls.model_fields.keys():
             field_value = getattr(self, field_name)
             yield (field_name, field_value)
As per coding guidelines.

@nishika26
Collaborator Author

Suggestion: add a partial unique index -> only active rows are indexed, so soft-deleted rows are not taken into account, or else remove this line from the exists_by_name function -> where(Collection.deleted_at.is_(None)) -> though that would be unfriendly for users, since they could never reuse a deleted collection's name.
Thank you for this suggestion @vprashrex, I have incorporated it.

@nishika26 nishika26 requested review from vprashrex and removed request for AkhileshNegi January 28, 2026 05:12

Labels

enhancement New feature or request ready-for-review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Collections: making this module llm provider agnostic

4 participants