
Conversation


@are-ces are-ces commented Jan 23, 2026

Description

  • updated llama-stack (lls) to 0.4.2
  • removed unused methods from deprecated endpoints
  • updated the /models endpoint parsing of llama-stack models
  • added Hugging Face folder creation in the container
  • refactored code to satisfy the stricter linting/mypy checks
  • updated run.yaml

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Claude

Related Tickets & Documents

  • Related Issue: LCORE-1216

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

Check e2e tests

Summary by CodeRabbit

  • New Features

    • Added validation for feedback requests (UUID format, sentiment values, and required feedback fields).
    • Added media type validation for query requests.
  • Chores

    • Updated Llama Stack dependency to 0.4.2.
    • Removed telemetry configuration.
    • Simplified query and streaming endpoints.
    • Updated container base images.
    • Removed legacy token counting utilities.


@are-ces are-ces requested review from asimurka and tisnik January 23, 2026 13:44

coderabbitai bot commented Jan 23, 2026

Walkthrough

This PR upgrades Llama Stack to 0.4.2, introducing widespread API changes where model attributes shift from identifier/provider_id to id/custom_metadata, and refactors query and streaming endpoints to use new backend integration patterns while removing legacy components.
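
For orientation, a minimal sketch of that attribute shift, based on the access patterns shown in the review comments below (not the PR's exact helper):

from typing import Any


def model_label(model: Any) -> tuple[str, str]:
    """Return (model_name, provider_id) for a llama-stack 0.4.x model entry.

    Sketch only: in 0.3.x the equivalents were `model.identifier` and
    `model.provider_id`; in 0.4.x the provider lives in `custom_metadata`.
    """
    provider_id = (
        str(model.custom_metadata.get("provider_id", ""))
        if getattr(model, "custom_metadata", None)
        else ""
    )
    return model.id, provider_id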

Changes

  • Dependency & Version Updates (pyproject.toml, src/constants.py): Upgraded llama-stack and llama-stack-client from 0.3.5 to 0.4.2; updated the maximal supported version constant.
  • Query Endpoint Refactoring (src/app/endpoints/query.py): Major restructuring: removed legacy wrappers (query_endpoint_handler, get_topic_summary, retrieve_response, get_rag_toolgroups, parse_referenced_documents); updated model selection to use model.id and custom_metadata; simplified integration with backend services.
  • Streaming Query Refactoring (src/app/endpoints/streaming_query.py): Significant reduction: removed stream_build_event, _handle_inference_event, create_agent_response_generator, streaming_query_endpoint_handler; replaced with a streamlined base handler; removed Agent API-specific streaming logic.
  • A2A Server Architecture (src/app/endpoints/a2a.py): Introduced new A2AAgentExecutor and TaskResultAggregator classes; replaced external OpenAI/Responses imports with internal a2a.server architecture; added lazy-initialized task and context stores; updated streaming and task state handling.
  • Model Attribute Schema Changes (src/app/endpoints/models.py, src/app/endpoints/query_v2.py, src/app/endpoints/streaming_query_v2.py, src/metrics/utils.py, src/utils/shields.py): Consistently updated model access patterns: identifier → id; added custom_metadata extraction for model_type and provider_id; updated filtering and provider resolution logic across multiple endpoints.
  • Health Endpoint & Type Updates (src/app/endpoints/health.py): Introduced a local HealthStatus enum to replace the external dependency; defined members (OK, ERROR, NOT_IMPLEMENTED, HEALTHY, UNKNOWN).
  • Configuration & Default Value Updates (src/app/endpoints/rags.py, src/app/endpoints/conversations_v3.py): RAGs endpoint: added default fallbacks for object, status, usage_bytes; Conversations: changed parameter passing from NOT_GIVEN to None.
  • Request Validation (src/models/requests.py): Added a media_type validator to QueryRequest; added multiple validators to FeedbackRequest (UUID, sentiment, categories deduplication, feedback presence check); removed the get_documents method. A hedged sketch of these validators follows this list.
  • Utility Refactoring (src/utils/token_counter.py, src/utils/types.py, src/utils/query.py, src/utils/common.py): Removed token extraction/metrics functions; removed TurnSummary.append_tool_calls_from_llama; updated OpenAI response imports; updated the AsyncLlamaStackAsLibraryClient import path.
  • Import Path Updates (src/client.py, src/utils/common.py, src/utils/query.py, src/app/endpoints/query_v2.py, src/app/endpoints/rlsapi_v1.py): Updated imports: AsyncLlamaStackAsLibraryClient (llama_stack → llama_stack.core.library_client); OpenAI response models (llama_stack.apis.agents.openai_responses → llama_stack_api.openai_responses).
  • CI/CD Workflow (.github/workflows/e2e_tests.yaml): Added a conditional guard so the "Docker Login for quay access" step only runs when matrix.mode equals 'server'.
  • Container Configuration (Containerfile, test.containerfile): Added creation of the Hugging Face cache directory (/opt/app-root/src/.cache/huggingface) with proper ownership; bumped the test base image from rhoai-3.2 to rhoai-3.3.
  • Configuration Files (run.yaml, tests/e2e/configs/run-ci.yaml): Removed the telemetry API entry and the top-level telemetry configuration block; cleared the registered_resources.models list; removed the gpt-4o-mini model registration.
  • Endpoint Test Updates (tests/unit/app/endpoints/test_*.py, tests/integration/endpoints/test_*.py): Widespread test refactoring: updated mock model attributes (identifier → id, custom_metadata usage); simplified query and streaming test scaffolding; added new parameterized tests for model hint evaluation; removed legacy test paths.
  • Feature Tests & Expectations (tests/e2e/features/faiss.feature, tests/e2e/features/info.feature): Updated the expected RAG vector store ID and llama-stack version in response assertions.
  • Utility Tests (tests/unit/metrics/test_utis.py, tests/unit/utils/test_shields.py, tests/unit/models/requests/test_query_request.py): Removed tests for removed utility functions; updated mock model construction to use custom_metadata; simplified test suites.
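
A hedged sketch of the new request validation described in the Request Validation entry above; the field names and accepted sentiment values are illustrative, not the project's exact schema:

from typing import Optional
from uuid import UUID

from pydantic import BaseModel, field_validator, model_validator

MEDIA_TYPE_JSON = "application/json"
MEDIA_TYPE_TEXT = "text/plain"


class QueryRequestSketch(BaseModel):
    """Illustrative subset of QueryRequest showing the media_type check."""

    query: str
    media_type: Optional[str] = None

    @field_validator("media_type")
    @classmethod
    def check_media_type(cls, value: Optional[str]) -> Optional[str]:
        if value is not None and value not in (MEDIA_TYPE_JSON, MEDIA_TYPE_TEXT):
            raise ValueError("media_type must be either application/json or text/plain")
        return value


class FeedbackRequestSketch(BaseModel):
    """Illustrative subset of FeedbackRequest: UUID, sentiment, and presence checks."""

    conversation_id: str
    sentiment: Optional[int] = None
    user_feedback: Optional[str] = None

    @field_validator("conversation_id")
    @classmethod
    def check_conversation_id(cls, value: str) -> str:
        UUID(value)  # raises ValueError for malformed UUIDs
        return value

    @field_validator("sentiment")
    @classmethod
    def check_sentiment(cls, value: Optional[int]) -> Optional[int]:
        if value is not None and value not in (-1, 1):
            raise ValueError("sentiment must be -1 or 1")  # illustrative values
        return value

    @model_validator(mode="after")
    def check_feedback_present(self) -> "FeedbackRequestSketch":
        if self.sentiment is None and not self.user_feedback:
            raise ValueError("at least one of sentiment or user_feedback must be provided")
        return self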

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes


Suggested labels

ok-to-test

Suggested reviewers

  • tisnik
  • asimurka
🚥 Pre-merge checks: ✅ 3 passed
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The PR title directly and accurately reflects the primary objective: upgrading llama-stack from 0.3.5 to 0.4.2, which is the main theme across all changes.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is above the required threshold of 80.00%.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
tests/unit/app/endpoints/test_streaming_query.py (1)

26-27: Update comment to reflect current llama-stack version.

The comment references "llama-stack-client 0.3.x" but this PR upgrades to 0.4.2. Update the comment to accurately reflect the version context.

Suggested fix
-# Note: content_delta module doesn't exist in llama-stack-client 0.3.x
+# Note: content_delta module doesn't exist in llama-stack-client 0.4.x
 # These are mock classes for backward compatibility with Agent API tests
src/utils/types.py (1)

154-211: Remove the unused _extract_rag_chunks_from_response method.

This private method at lines 154–211 is no longer called anywhere in the codebase. Since append_tool_calls_from_llama was removed, this method became dead code and should be deleted.

src/app/endpoints/streaming_query.py (1)

693-725: SSE error responses must keep text/event-stream Content-Type.

The APIStatusError path currently sets the HTTP media type to query_request.media_type, but the payload is still SSE-formatted. This can break clients expecting SSE frames.

🐛 Suggested fix
     return StreamingResponse(
         stream_http_error(error_response),
         status_code=error_response.status_code,
-        media_type=query_request.media_type or MEDIA_TYPE_JSON,
+        media_type="text/event-stream",
     )
src/app/endpoints/query.py (1)

474-505: Guard against missing provider_id in custom_metadata.

The custom_metadata field in llama-stack's model response is optional and free-form. If provider_id is absent, the function returns an empty string that propagates to conversation history, database storage, and validation logic. This can cause silent data corruption and incorrect model matching. Consider falling back to parsing provider_id from the model.id prefix (format: provider/model) when it's missing.

Suggested fix
             provider_id = (
                 str(model.custom_metadata.get("provider_id", ""))
                 if model.custom_metadata
                 else ""
             )
+            if not provider_id and "/" in model.id:
+                provider_id = model.id.split("/", 1)[0]
tests/unit/app/endpoints/test_query.py (1)

513-517: Incorrect type annotations for parameterized test function.

The `list` type hints are inaccurate for these parameters. Based on the parametrize values, they should be:

  • user_conversation: UserConversation | None
  • request_values: tuple[str | None, str | None]
  • expected_values: tuple[str | None, str | None]
🔧 Proposed fix
 def test_evaluate_model_hints(
-    user_conversation: list,
-    request_values: list,
-    expected_values: list,
+    user_conversation: UserConversation | None,
+    request_values: tuple[str | None, str | None],
+    expected_values: tuple[str | None, str | None],
 ) -> None:
🧹 Nitpick comments (8)
test.containerfile (1)

11-13: Ownership differs from production Containerfile.

The Hugging Face cache directory setup is appropriate for embedding model caching. However, note that test.containerfile uses 1001:0 (OpenShift-style) while the production Containerfile uses 1001:1001 for the same directory. This inconsistency could cause permission issues if tests don't reflect production behavior accurately.

Consider aligning ownership with the production Containerfile for consistency:

Suggested alignment
-    chown -R 1001:0 /opt/app-root/src/.llama && \
+    chown -R 1001:1001 /opt/app-root/src/.llama && \
     mkdir -p /opt/app-root/src/.cache/huggingface && \
-    chown -R 1001:0 /opt/app-root/src/.cache
+    chown -R 1001:1001 /opt/app-root/src/.cache
tests/unit/app/endpoints/test_streaming_query.py (1)

46-218: Consider consolidating duplicate mock class definitions.

All these mock classes (ToolCallDelta, TurnResponseEvent, AgentTurnResponseStreamChunk, etc.) share identical __init__ implementations. You could reduce duplication with a factory function or base class:

Optional consolidation approach
def _create_mock_class(name: str, doc: str) -> type:
    """Factory to create mock classes for Agent API types."""
    def __init__(self, **kwargs: Any):
        for key, value in kwargs.items():
            setattr(self, key, value)
    __init__.__doc__ = f"Initialize {name} with provided kwargs."
    return type(name, (), {"__init__": __init__, "__doc__": doc})

ToolCallDelta = _create_mock_class("ToolCallDelta", "Mock ToolCallDelta for Agent API tests.")
TurnResponseEvent = _create_mock_class("TurnResponseEvent", "Mock TurnResponseEvent for Agent API tests.")
# ... etc

However, the explicit class definitions do provide clarity on which types are being mocked, which can be valuable in test code.

src/models/requests.py (1)

4-6: Use typing_extensions.Self to avoid Python <3.11 import errors.

typing.Self will break on older runtimes and conflicts with the repo’s guidance for model validators. Please switch to typing_extensions.Self unless the project is strictly 3.11+.

🔧 Proposed change
-from typing import Optional, Self
+from typing import Optional
+from typing_extensions import Self

As per coding guidelines, prefer typing_extensions.Self for model validators.

tests/unit/models/requests/test_query_request.py (1)

78-105: Add a negative test for unsupported media_type.

This validator is new behavior; a failure-path test will lock it in and improve coverage.

🧪 Suggested test
+    def test_constructor_invalid_media_type(self) -> None:
+        """Test that unsupported media_type values are rejected."""
+        with pytest.raises(ValueError, match="media_type must be either"):
+            _ = QueryRequest(query="Tell me about Kubernetes", media_type="application/xml")

As per coding guidelines, new validation paths should be covered by unit tests.

src/app/endpoints/a2a.py (2)

67-94: Guard lazy store initialization to avoid concurrent double-creation.

Concurrent requests can race and create multiple stores. Consider a lightweight async lock around initialization.

🔧 Example guard
-_TASK_STORE: Optional[TaskStore] = None
-_CONTEXT_STORE: Optional[A2AContextStore] = None
+_TASK_STORE: Optional[TaskStore] = None
+_CONTEXT_STORE: Optional[A2AContextStore] = None
+_TASK_STORE_LOCK: Optional[asyncio.Lock] = None
+_CONTEXT_STORE_LOCK: Optional[asyncio.Lock] = None

 async def _get_task_store() -> TaskStore:
     """Get the A2A task store, creating it if necessary."""
     global _TASK_STORE  # pylint: disable=global-statement
-    if _TASK_STORE is None:
-        _TASK_STORE = await A2AStorageFactory.create_task_store(configuration.a2a_state)
+    global _TASK_STORE_LOCK  # pylint: disable=global-statement
+    if _TASK_STORE_LOCK is None:
+        _TASK_STORE_LOCK = asyncio.Lock()
+    async with _TASK_STORE_LOCK:
+        if _TASK_STORE is None:
+            _TASK_STORE = await A2AStorageFactory.create_task_store(configuration.a2a_state)
     return _TASK_STORE

 async def _get_context_store() -> A2AContextStore:
     """Get the A2A context store, creating it if necessary."""
     global _CONTEXT_STORE  # pylint: disable=global-statement
-    if _CONTEXT_STORE is None:
-        _CONTEXT_STORE = await A2AStorageFactory.create_context_store(
-            configuration.a2a_state
-        )
+    global _CONTEXT_STORE_LOCK  # pylint: disable=global-statement
+    if _CONTEXT_STORE_LOCK is None:
+        _CONTEXT_STORE_LOCK = asyncio.Lock()
+    async with _CONTEXT_STORE_LOCK:
+        if _CONTEXT_STORE is None:
+            _CONTEXT_STORE = await A2AStorageFactory.create_context_store(
+                configuration.a2a_state
+            )
     return _CONTEXT_STORE

803-808: Consider making the 300s stream idle timeout configurable.

Long-running tool calls with sparse output could exceed this and terminate streams prematurely.
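
One possible shape for that, as a sketch; the config knob and helper name here are hypothetical, not existing code:

import asyncio
from typing import Any, AsyncIterator

# Hypothetical setting; the current handler hard-codes 300 seconds.
DEFAULT_STREAM_IDLE_TIMEOUT_S = 300.0


async def iter_with_idle_timeout(
    stream: AsyncIterator[Any],
    idle_timeout_s: float = DEFAULT_STREAM_IDLE_TIMEOUT_S,
) -> AsyncIterator[Any]:
    """Yield items from the stream, timing out only when no item arrives within
    idle_timeout_s, rather than capping the total stream duration."""
    iterator = stream.__aiter__()
    while True:
        try:
            item = await asyncio.wait_for(iterator.__anext__(), timeout=idle_timeout_s)
        except StopAsyncIteration:
            return
        yield item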

src/metrics/utils.py (1)

39-45: Consider simplifying the redundant custom_metadata check.

Since models are already filtered at line 31 to only include those where model.custom_metadata is truthy, the check at line 42 is redundant. However, this defensive pattern is harmless and protects against future refactoring.

♻️ Optional simplification
     for model in models:
-        provider = (
-            str(model.custom_metadata.get("provider_id", ""))
-            if model.custom_metadata
-            else ""
-        )
+        provider = str(model.custom_metadata.get("provider_id", ""))
         model_name = model.id
tests/unit/app/endpoints/test_query.py (1)

450-512: Consider adding a test case for conflicting request and conversation values.

The parameterized test covers key scenarios but misses the case where the request specifies a different model/provider than the user conversation. This edge case would verify the expected precedence behavior.

📝 Suggested additional test case
# Request values differ from user conversation (request should take precedence)
(
    UserConversation(
        id="conv1",
        user_id="user1",
        last_used_provider="old_provider",
        last_used_model="old_model",
        message_count=1,
    ),
    ("new_provider", "new_model"),
    # Expect request values to override conversation
    ("new_provider", "new_model"),
),

Add to the ids list: "Request overrides user conversation"

Comment on lines +334 to +337
after=None,
include=None,
limit=None,
order=None,


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's see the file and the specific lines in question
cat -n src/app/endpoints/conversations_v3.py | sed -n '320,350p'



🏁 Script executed:

# Check if NOT_GIVEN is used elsewhere in this file
rg -n "NOT_GIVEN" src/app/endpoints/conversations_v3.py



🏁 Script executed:

# Search for NOT_GIVEN usage patterns across the codebase
rg -n "NOT_GIVEN" --type py | head -20



🏁 Script executed:

# Look for llama-stack-client imports and usage patterns
rg -n "from llama_stack_client" --type py | head -10



🏁 Script executed:

# Check if there's a requirements or dependencies file
fd -e txt -e toml -e cfg -e lock | grep -E "(requirements|pyproject|setup|lock)"



🏁 Script executed:

# Check pyproject.toml for llama-stack-client version
cat pyproject.toml | grep -A 5 -B 5 "llama"



🏁 Script executed:

# Broader search for NOT_GIVEN across all files
rg -n "NOT_GIVEN" --type py



🏁 Script executed:

# Check if there are any test files for conversations endpoint to see expected patterns
find . -name "*conversations*" -type f | head -20



🌐 Web query:

llama-stack-client 0.4.2 conversations items list API parameters None handling

💡 Result:

For llama-stack-client==0.4.2 (uploaded Jan 16, 2026) [1], the Conversations → Items → List call is a paginated “list” response (it includes first_id, last_id, has_more), and those pagination IDs can legitimately be None when absent/empty. [3]

None handling” (what to do in practice)

  • For request parameters (query/body): if you want a parameter to be omitted, don’t pass it at all (rather than passing None). Passing None typically means “send an explicit null / explicit value” and can change server-side validation/behavior.
  • For response fields: None can mean either the JSON field was missing or it was explicitly null. The client documents how to distinguish those cases using the Pydantic model’s .model_fields_set. [2]

If you paste the exact Python call you’re making (including the kwargs you’re passing as None) and the error/traceback, I can map it to the specific parameter(s) that should be omitted vs explicitly set.


🏁 Script executed:

# Check test files to see how conversations.items.list is properly called
cat -n tests/unit/app/endpoints/test_conversations_v2.py | head -100



🏁 Script executed:

# Check test_conversations_v3 if it exists to see pattern
ls -la tests/unit/app/endpoints/test_conversations* 2>/dev/null



🏁 Script executed:

# Search for how other endpoints call list methods with optional parameters
rg -n "\.list\(" src/app/endpoints/ --type py -A 3 | head -40



🏁 Script executed:

# Check if there's any usage of omitting parameters instead of passing None
rg -n "conversations\.items\.list" --type py -A 5



🌐 Web query:

llama-stack-client 0.4.2 NOT_GIVEN sentinel optional parameters conversations

💡 Result:

In llama-stack-client==0.4.2 (released Jan 16, 2026) (pypi.org), many SDK method signatures use a Stainless-style sentinel:

  • param: T | NotGiven = NOT_GIVEN
  • Meaning:
    • leave it as NOT_GIVEN (the default) ⇒ the field is omitted from the request body/query entirely
    • pass None (when the type allows Optional[...]) ⇒ the field is sent as JSON null (i.e., explicitly set/cleared)

This pattern is typical of Stainless-generated Python SDKs. (pypi.org)

Practical use with conversations

You normally don’t need to import/use NOT_GIVEN—just don’t pass the optional kwarg(s):

conv = client.conversations.create()

Creating a conversation like this is the standard usage pattern. (git.kvant.cloud)

You’d use NOT_GIVEN mainly when you’re programmatically assembling kwargs and want to ensure a key is not sent:

from llama_stack_client import NOT_GIVEN

kwargs = {"metadata": NOT_GIVEN}   # omit metadata entirely
conv = client.conversations.create(**kwargs)

Whereas:

conv = client.conversations.create(metadata=None)

would explicitly send metadata: null (only useful if the API treats “null” differently from “missing”).



Remove explicit None parameters from conversations.items.list() call or use NOT_GIVEN sentinel.

In llama-stack-client 0.4.2, passing None sends an explicit JSON null value, which differs semantically from omitting the parameter. For after, include, limit, and order in the conversations API, omit these parameters entirely:

conversation_items_response = await client.conversations.items.list(
    conversation_id=llama_stack_conv_id,
)

Alternatively, if programmatically assembling parameters, import and use the NOT_GIVEN sentinel:

from llama_stack_client import NOT_GIVEN

conversation_items_response = await client.conversations.items.list(
    conversation_id=llama_stack_conv_id,
    after=NOT_GIVEN,
    include=NOT_GIVEN,
    limit=NOT_GIVEN,
    order=NOT_GIVEN,
)

This ensures the backend treats these as omitted parameters rather than explicitly set to null, preserving the intended default behavior.

🤖 Prompt for AI Agents
In `@src/app/endpoints/conversations_v3.py` around lines 334 - 337, The
conversations.items.list call is passing explicit None for parameters (after,
include, limit, order) which sends JSON nulls; update the call in the code that
invokes client.conversations.items.list (search for conversation_items_response
or llama_stack_conv_id usage) to either remove those keyword args entirely so
they are omitted, or import and pass the NOT_GIVEN sentinel from
llama_stack_client and set after=NOT_GIVEN, include=NOT_GIVEN, limit=NOT_GIVEN,
order=NOT_GIVEN to preserve intended defaults.


 from llama_stack_client import AsyncLlamaStackClient
-from llama_stack import AsyncLlamaStackAsLibraryClient
+from llama_stack.core.library_client import AsyncLlamaStackAsLibraryClient

@coderabbitai coderabbitai bot Jan 23, 2026


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

llama-stack 0.4.2 AsyncLlamaStackClient import path documentation

💡 Result:

For the Python async client, the import path is:

from llama_stack_client import AsyncLlamaStackClient

This is documented in the llama-stack-client SDK README and PyPI docs (the async section shows importing AsyncLlamaStackClient directly from the llama_stack_client top-level package). [1], [2]

Note: the separate llama-stack package exposes AsyncLlamaStack (not AsyncLlamaStackClient). [3]

Sources: [1] [2] [3]


🏁 Script executed:

# Check the actual file to see the import and its usage
cat -n src/utils/common.py



Line 9 import path does not follow coding guidelines and is not documented in llama-stack 0.4.2.

Line 8 correctly imports from llama_stack_client import AsyncLlamaStackClient, but line 9's import from llama_stack.core.library_client import AsyncLlamaStackAsLibraryClient violates the coding guidelines which mandate using from llama_stack_client import AsyncLlamaStackClient. The web search confirms the correct pattern is from the llama_stack_client package, not llama_stack.core.library_client. Verify that AsyncLlamaStackAsLibraryClient is a valid export from llama-stack 0.4.2 or update line 9 to use the compliant import path.

🤖 Prompt for AI Agents
In `@src/utils/common.py` at line 9, The import on line 9 uses a non-compliant
path; replace or verify the symbol AsyncLlamaStackAsLibraryClient so it comes
from the documented package (llama_stack_client) rather than llama_stack.core;
locate the import statement that references AsyncLlamaStackAsLibraryClient in
src/utils/common.py and either change it to import
AsyncLlamaStackAsLibraryClient from llama_stack_client (or, if that symbol is
not exported in llama-stack 0.4.2, remove/replace it with the supported
AsyncLlamaStackClient from llama_stack_client) and update any code using
AsyncLlamaStackAsLibraryClient accordingly so all references use the supported
client class.

are-ces (PR author) replied:

AsyncLlamaStackAsLibraryClient is not in the llama_stack_client package

coderabbitai bot replied:

@are-ces, you're absolutely correct, and I apologize for the confusion in my previous comment. The AsyncLlamaStackAsLibraryClient is indeed imported from llama_stack.core.library_client, not from the llama_stack_client package.

The code correctly uses two different clients for two different purposes:

  1. Line 8: from llama_stack_client import AsyncLlamaStackClient — for connecting to a remote Llama Stack server (service mode)
  2. Line 9: from llama_stack.core.library_client import AsyncLlamaStackAsLibraryClient — for using Llama Stack as an in-process library (library mode)

Both imports are necessary and correct. The code chooses between them at lines 46-53 based on the configuration.llama_stack.use_as_library_client setting. The coding guideline about using llama_stack_client imports applies to the service client (AsyncLlamaStackClient), which is already correctly imported on line 8.

The import on line 9 is valid and should remain as-is.


🧠 Learnings used
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to **/*.py : Use Llama Stack imports: `from llama_stack_client import AsyncLlamaStackClient`
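
For readers following along, a minimal sketch of the dual-client selection described above; the cfg field names stand in for the project's configuration and the library-client usage is simplified, so treat them as assumptions:

from typing import Any, Union

from llama_stack_client import AsyncLlamaStackClient
from llama_stack.core.library_client import AsyncLlamaStackAsLibraryClient


async def make_client(cfg: Any) -> Union[AsyncLlamaStackClient, AsyncLlamaStackAsLibraryClient]:
    """Pick library mode (in-process stack) or service mode (remote server)."""
    if cfg.use_as_library_client:
        client = AsyncLlamaStackAsLibraryClient(cfg.library_client_config_path)
        await client.initialize()  # boots the in-process stack from the run config
        return client
    return AsyncLlamaStackClient(base_url=cfg.url)  # talk to a remote Llama Stack server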


@tisnik tisnik left a comment


LGTM: deps are ok, e2e seems to be correct too, API remains the same which is good. The refactoring itself - I'd like to have @asimurka's ACK there too, if possible.

"llama-stack==0.3.5",
"llama-stack-client==0.3.5",
"llama-stack==0.4.2",
"llama-stack-client==0.4.2",
A reviewer asked:

are we sure there are no other dependencies?

are-ces (PR author) replied:

can we resolve this comment?

after=None,
include=None,
limit=None,
order=None,
A reviewer commented:

@are-ces seeing this reminds me that we should verify whether the order argument's default value is still correctly ordering the conversation items. Create multiple requests on query endpoint with same conversation_id and then check v1/conversations/{conversation_id} whether the items are ordered such that the oldest turn is at the beginning.
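
A minimal way to run that check, sketched in Python; the base URL, auth, and exact query payload shape are assumptions to adapt to the deployment under test:

import requests

BASE_URL = "http://localhost:8080"  # assumed service address

conversation_id = None
for question in ("first question", "second question", "third question"):
    payload = {"query": question}
    if conversation_id:
        payload["conversation_id"] = conversation_id
    response = requests.post(f"{BASE_URL}/v1/query", json=payload, timeout=120)
    response.raise_for_status()
    conversation_id = response.json()["conversation_id"]

# The oldest turn ("first question") should appear first in the returned items.
items = requests.get(f"{BASE_URL}/v1/conversations/{conversation_id}", timeout=30).json()
print(items)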


@asimurka asimurka left a comment


LGTM, conversation items ordering is correct.
