LCORE-1216: Bump up to llama-stack 0.4.2 #1048
base: main
Conversation
- updated lls to 0.4.2
- removed unused methods from deprecated endpoints
- updated /models endpoint parsing of lls models
- added huggingface folder creation in container
- did some refactoring to satisfy the stricter linting/mypy checks
- updated run.yaml
Walkthrough
This PR upgrades Llama Stack to 0.4.2, introducing widespread API changes where model attributes shift from …

Changes
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
tests/unit/app/endpoints/test_streaming_query.py (1)
26-27: Update comment to reflect current llama-stack version.

The comment references "llama-stack-client 0.3.x" but this PR upgrades to 0.4.2. Update the comment to accurately reflect the version context.
Suggested fix
-# Note: content_delta module doesn't exist in llama-stack-client 0.3.x
+# Note: content_delta module doesn't exist in llama-stack-client 0.4.x
 # These are mock classes for backward compatibility with Agent API tests

src/utils/types.py (1)
154-211: Remove the unused _extract_rag_chunks_from_response method.

This private method at lines 154–211 is no longer called anywhere in the codebase. Since append_tool_calls_from_llama was removed, this method became dead code and should be deleted.

src/app/endpoints/streaming_query.py (1)
693-725: SSE error responses must keep text/event-stream Content-Type.

The APIStatusError path currently sets the HTTP media type to query_request.media_type, but the payload is still SSE-formatted. This can break clients expecting SSE frames.

🐛 Suggested fix
 return StreamingResponse(
     stream_http_error(error_response),
     status_code=error_response.status_code,
-    media_type=query_request.media_type or MEDIA_TYPE_JSON,
+    media_type="text/event-stream",
 )

src/app/endpoints/query.py (1)
474-505: Guard against missing provider_id in custom_metadata.

The custom_metadata field in llama-stack's model response is optional and free-form. If provider_id is absent, the function returns an empty string that propagates to conversation history, database storage, and validation logic. This can cause silent data corruption and incorrect model matching. Consider falling back to parsing provider_id from the model.id prefix (format: provider/model) when it's missing.

Suggested fix
 provider_id = (
     str(model.custom_metadata.get("provider_id", ""))
     if model.custom_metadata
     else ""
 )
+if not provider_id and "/" in model.id:
+    provider_id = model.id.split("/", 1)[0]

tests/unit/app/endpoints/test_query.py (1)
513-517: Incorrect type annotations for parameterized test function.

The type hints (list) are inaccurate for these parameters. Based on the parametrize values, these should be:

- user_conversation: UserConversation | None
- request_values: tuple[str | None, str | None]
- expected_values: tuple[str | None, str | None]

🔧 Proposed fix
 def test_evaluate_model_hints(
-    user_conversation: list,
-    request_values: list,
-    expected_values: list,
+    user_conversation: UserConversation | None,
+    request_values: tuple[str | None, str | None],
+    expected_values: tuple[str | None, str | None],
 ) -> None:
🤖 Fix all issues with AI agents
In `@src/app/endpoints/conversations_v3.py`:
- Around line 334-337: The conversations.items.list call is passing explicit
None for parameters (after, include, limit, order) which sends JSON nulls;
update the call in the code that invokes client.conversations.items.list (search
for conversation_items_response or llama_stack_conv_id usage) to either remove
those keyword args entirely so they are omitted, or import and pass the
NOT_GIVEN sentinel from llama_stack_client and set after=NOT_GIVEN,
include=NOT_GIVEN, limit=NOT_GIVEN, order=NOT_GIVEN to preserve intended
defaults.
In `@src/utils/common.py`:
- Line 9: The import on line 9 uses a non-compliant path; replace or verify the
symbol AsyncLlamaStackAsLibraryClient so it comes from the documented package
(llama_stack_client) rather than llama_stack.core; locate the import statement
that references AsyncLlamaStackAsLibraryClient in src/utils/common.py and either
change it to import AsyncLlamaStackAsLibraryClient from llama_stack_client (or,
if that symbol is not exported in llama-stack 0.4.2, remove/replace it with the
supported AsyncLlamaStackClient from llama_stack_client) and update any code
using AsyncLlamaStackAsLibraryClient accordingly so all references use the
supported client class.
🧹 Nitpick comments (8)
test.containerfile (1)
11-13: Ownership differs from production Containerfile.

The Hugging Face cache directory setup is appropriate for embedding model caching. However, note that test.containerfile uses 1001:0 (OpenShift-style) while the production Containerfile uses 1001:1001 for the same directory. This inconsistency could cause permission issues if tests don't reflect production behavior accurately.

Consider aligning ownership with the production Containerfile for consistency:

Suggested alignment
- chown -R 1001:0 /opt/app-root/src/.llama && \
+ chown -R 1001:1001 /opt/app-root/src/.llama && \
  mkdir -p /opt/app-root/src/.cache/huggingface && \
- chown -R 1001:0 /opt/app-root/src/.cache
+ chown -R 1001:1001 /opt/app-root/src/.cache

tests/unit/app/endpoints/test_streaming_query.py (1)
46-218: Consider consolidating duplicate mock class definitions.

All these mock classes (ToolCallDelta, TurnResponseEvent, AgentTurnResponseStreamChunk, etc.) share identical __init__ implementations. You could reduce duplication with a factory function or base class:

Optional consolidation approach
def _create_mock_class(name: str, doc: str) -> type:
    """Factory to create mock classes for Agent API types."""

    def __init__(self, **kwargs: Any):
        for key, value in kwargs.items():
            setattr(self, key, value)

    __init__.__doc__ = f"Initialize {name} with provided kwargs."
    return type(name, (), {"__init__": __init__, "__doc__": doc})


ToolCallDelta = _create_mock_class("ToolCallDelta", "Mock ToolCallDelta for Agent API tests.")
TurnResponseEvent = _create_mock_class("TurnResponseEvent", "Mock TurnResponseEvent for Agent API tests.")
# ... etc

However, the explicit class definitions do provide clarity on which types are being mocked, which can be valuable in test code.
src/models/requests.py (1)
4-6: Use typing_extensions.Self to avoid Python <3.11 import errors.

typing.Self will break on older runtimes and conflicts with the repo's guidance for model validators. Please switch to typing_extensions.Self unless the project is strictly 3.11+.

🔧 Proposed change
-from typing import Optional, Self
+from typing import Optional
+from typing_extensions import Self

As per coding guidelines, prefer typing_extensions.Self for model validators.

tests/unit/models/requests/test_query_request.py (1)
78-105: Add a negative test for unsupported media_type.

This validator is new behavior; a failure-path test will lock it in and improve coverage.
🧪 Suggested test
+    def test_constructor_invalid_media_type(self) -> None:
+        """Test that unsupported media_type values are rejected."""
+        with pytest.raises(ValueError, match="media_type must be either"):
+            _ = QueryRequest(query="Tell me about Kubernetes", media_type="application/xml")

As per coding guidelines, new validation paths should be covered by unit tests.
src/app/endpoints/a2a.py (2)
67-94: Guard lazy store initialization to avoid concurrent double-creation.

Concurrent requests can race and create multiple stores. Consider a lightweight async lock around initialization.
🔧 Example guard
-_TASK_STORE: Optional[TaskStore] = None
-_CONTEXT_STORE: Optional[A2AContextStore] = None
+_TASK_STORE: Optional[TaskStore] = None
+_CONTEXT_STORE: Optional[A2AContextStore] = None
+_TASK_STORE_LOCK: Optional[asyncio.Lock] = None
+_CONTEXT_STORE_LOCK: Optional[asyncio.Lock] = None

 async def _get_task_store() -> TaskStore:
     """Get the A2A task store, creating it if necessary."""
     global _TASK_STORE  # pylint: disable=global-statement
-    if _TASK_STORE is None:
-        _TASK_STORE = await A2AStorageFactory.create_task_store(configuration.a2a_state)
+    global _TASK_STORE_LOCK  # pylint: disable=global-statement
+    if _TASK_STORE_LOCK is None:
+        _TASK_STORE_LOCK = asyncio.Lock()
+    async with _TASK_STORE_LOCK:
+        if _TASK_STORE is None:
+            _TASK_STORE = await A2AStorageFactory.create_task_store(configuration.a2a_state)
     return _TASK_STORE

 async def _get_context_store() -> A2AContextStore:
     """Get the A2A context store, creating it if necessary."""
     global _CONTEXT_STORE  # pylint: disable=global-statement
-    if _CONTEXT_STORE is None:
-        _CONTEXT_STORE = await A2AStorageFactory.create_context_store(
-            configuration.a2a_state
-        )
+    global _CONTEXT_STORE_LOCK  # pylint: disable=global-statement
+    if _CONTEXT_STORE_LOCK is None:
+        _CONTEXT_STORE_LOCK = asyncio.Lock()
+    async with _CONTEXT_STORE_LOCK:
+        if _CONTEXT_STORE is None:
+            _CONTEXT_STORE = await A2AStorageFactory.create_context_store(
+                configuration.a2a_state
+            )
     return _CONTEXT_STORE
803-808: Consider making the 300s stream idle timeout configurable.

Long-running tool calls with sparse output could exceed this and terminate streams prematurely.
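For illustration, a minimal sketch of one way a per-event idle timeout could be made configurable; the constant name STREAM_IDLE_TIMEOUT_SECONDS and the wrapper function below are hypothetical, not code from this PR:

```python
import asyncio
from typing import AsyncIterator

# Hypothetical setting; in practice this would come from the service
# configuration rather than a module-level constant.
STREAM_IDLE_TIMEOUT_SECONDS = 300.0


async def forward_with_idle_timeout(
    events: AsyncIterator[str],
    idle_timeout: float = STREAM_IDLE_TIMEOUT_SECONDS,
) -> AsyncIterator[str]:
    """Re-yield events, timing out only when no event arrives within idle_timeout."""
    iterator = events.__aiter__()
    while True:
        try:
            # The timeout applies per event, so long streams with steady output
            # are unaffected; only genuine idleness raises asyncio.TimeoutError.
            event = await asyncio.wait_for(iterator.__anext__(), timeout=idle_timeout)
        except StopAsyncIteration:
            break
        yield event
```

Wiring idle_timeout to a configuration field would let deployments with slow tool calls raise the limit without a code change.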
src/metrics/utils.py (1)
39-45: Consider simplifying the redundant custom_metadata check.

Since models are already filtered at line 31 to only include those where model.custom_metadata is truthy, the check at line 42 is redundant. However, this defensive pattern is harmless and protects against future refactoring.

♻️ Optional simplification
 for model in models:
-    provider = (
-        str(model.custom_metadata.get("provider_id", ""))
-        if model.custom_metadata
-        else ""
-    )
+    provider = str(model.custom_metadata.get("provider_id", ""))
     model_name = model.id

tests/unit/app/endpoints/test_query.py (1)
450-512: Consider adding a test case for conflicting request and conversation values.

The parameterized test covers key scenarios but misses the case where the request specifies a different model/provider than the user conversation. This edge case would verify the expected precedence behavior.
📝 Suggested additional test case
# Request values differ from user conversation (request should take precedence)
(
    UserConversation(
        id="conv1",
        user_id="user1",
        last_used_provider="old_provider",
        last_used_model="old_model",
        message_count=1,
    ),
    ("new_provider", "new_model"),
    # Expect request values to override conversation
    ("new_provider", "new_model"),
),

Add to the ids list: "Request overrides user conversation"
    after=None,
    include=None,
    limit=None,
    order=None,
🧩 Analysis chain
🏁 Script executed:
# First, let's see the file and the specific lines in question
cat -n src/app/endpoints/conversations_v3.py | sed -n '320,350p'

Repository: lightspeed-core/lightspeed-stack
Length of output: 1319
🏁 Script executed:
# Check if NOT_GIVEN is used elsewhere in this file
rg -n "NOT_GIVEN" src/app/endpoints/conversations_v3.pyRepository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
# Search for NOT_GIVEN usage patterns across the codebase
rg -n "NOT_GIVEN" --type py | head -20Repository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
# Look for llama-stack-client imports and usage patterns
rg -n "from llama_stack_client" --type py | head -10Repository: lightspeed-core/lightspeed-stack
Length of output: 1132
🏁 Script executed:
# Check if there's a requirements or dependencies file
fd -e txt -e toml -e cfg -e lock | grep -E "(requirements|pyproject|setup|lock)"

Repository: lightspeed-core/lightspeed-stack
Length of output: 270
🏁 Script executed:
# Check pyproject.toml for llama-stack-client version
cat pyproject.toml | grep -A 5 -B 5 "llama"

Repository: lightspeed-core/lightspeed-stack
Length of output: 1265
🏁 Script executed:
# Broader search for NOT_GIVEN across all files
rg -n "NOT_GIVEN" --type pyRepository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
# Check if there are any test files for conversations endpoint to see expected patterns
find . -name "*conversations*" -type f | head -20

Repository: lightspeed-core/lightspeed-stack
Length of output: 543
🌐 Web query:
llama-stack-client 0.4.2 conversations items list API parameters None handling
💡 Result:
For llama-stack-client==0.4.2 (uploaded Jan 16, 2026) [1], the Conversations → Items → List call is a paginated “list” response (it includes first_id, last_id, has_more), and those pagination IDs can legitimately be None when absent/empty. [3]
“None handling” (what to do in practice)
- For request parameters (query/body): if you want a parameter to be omitted, don't pass it at all (rather than passing None). Passing None typically means "send an explicit null / explicit value" and can change server-side validation/behavior.
- For response fields: None can mean either the JSON field was missing or it was explicitly null. The client documents how to distinguish those cases using the Pydantic model's .model_fields_set; a short illustrative sketch follows below. [2]
If you paste the exact Python call you’re making (including the kwargs you’re passing as None) and the error/traceback, I can map it to the specific parameter(s) that should be omitted vs explicitly set.
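To make the missing-versus-explicit-null distinction concrete, here is a small illustrative sketch; the Page model is invented for the example and is not a real llama-stack-client response type:

```python
from typing import Optional

from pydantic import BaseModel


class Page(BaseModel):
    # Invented model for illustration only; real response types have more fields.
    first_id: Optional[str] = None


explicit_null = Page.model_validate({"first_id": None})
field_missing = Page.model_validate({})

# Both attributes read back as None...
assert explicit_null.first_id is None and field_missing.first_id is None

# ...but model_fields_set records whether the field was actually present.
print("first_id" in explicit_null.model_fields_set)  # True  -> explicit null
print("first_id" in field_missing.model_fields_set)  # False -> field was missing
```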
🏁 Script executed:
# Check test files to see how conversations.items.list is properly called
cat -n tests/unit/app/endpoints/test_conversations_v2.py | head -100

Repository: lightspeed-core/lightspeed-stack
Length of output: 4100
🏁 Script executed:
# Check test_conversations_v3 if it exists to see pattern
ls -la tests/unit/app/endpoints/test_conversations* 2>/dev/null

Repository: lightspeed-core/lightspeed-stack
Length of output: 272
🏁 Script executed:
# Search for how other endpoints call list methods with optional parameters
rg -n "\.list\(" src/app/endpoints/ --type py -A 3 | head -40Repository: lightspeed-core/lightspeed-stack
Length of output: 2711
🏁 Script executed:
# Check if there's any usage of omitting parameters instead of passing None
rg -n "conversations\.items\.list" --type py -A 5Repository: lightspeed-core/lightspeed-stack
Length of output: 551
🌐 Web query:
llama-stack-client 0.4.2 NOT_GIVEN sentinel optional parameters conversations
💡 Result:
In llama-stack-client==0.4.2 (released Jan 16, 2026) (pypi.org), many SDK method signatures use a Stainless-style sentinel:
param: T | NotGiven = NOT_GIVEN

Meaning:

- leave it as NOT_GIVEN (the default) ⇒ the field is omitted from the request body/query entirely
- pass None (when the type allows Optional[...]) ⇒ the field is sent as JSON null (i.e., explicitly set/cleared)
This pattern is typical of Stainless-generated Python SDKs. (pypi.org)
Practical use with conversations
You normally don’t need to import/use NOT_GIVEN—just don’t pass the optional kwarg(s):
conv = client.conversations.create()

Creating a conversation like this is the standard usage pattern. (git.kvant.cloud)
You’d use NOT_GIVEN mainly when you’re programmatically assembling kwargs and want to ensure a key is not sent:
from llama_stack_client import NOT_GIVEN
kwargs = {"metadata": NOT_GIVEN} # omit metadata entirely
conv = client.conversations.create(**kwargs)

Whereas:
conv = client.conversations.create(metadata=None)

would explicitly send metadata: null (only useful if the API treats "null" differently from "missing").
Citations:
- 1: https://pypi.org/project/llama-stack-client
- 2: https://pypi.org/project/llama-stack-client
- 3: https://git.kvant.cloud/phoenix-oss/llama-stack-mirror/commits/commit/dac1d7be1c5a34239dca8ff9e91a48c76568950b/docs/static/llama-stack-spec.yaml?utm_source=openai
Remove explicit None parameters from conversations.items.list() call or use NOT_GIVEN sentinel.
In llama-stack-client 0.4.2, passing None sends an explicit JSON null value, which differs semantically from omitting the parameter. For after, include, limit, and order in the conversations API, omit these parameters entirely:
conversation_items_response = await client.conversations.items.list(
    conversation_id=llama_stack_conv_id,
)

Alternatively, if programmatically assembling parameters, import and use the NOT_GIVEN sentinel:
from llama_stack_client import NOT_GIVEN

conversation_items_response = await client.conversations.items.list(
    conversation_id=llama_stack_conv_id,
    after=NOT_GIVEN,
    include=NOT_GIVEN,
    limit=NOT_GIVEN,
    order=NOT_GIVEN,
)

This ensures the backend treats these as omitted parameters rather than explicitly set to null, preserving the intended default behavior.
🤖 Prompt for AI Agents
In `@src/app/endpoints/conversations_v3.py` around lines 334 - 337, The
conversations.items.list call is passing explicit None for parameters (after,
include, limit, order) which sends JSON nulls; update the call in the code that
invokes client.conversations.items.list (search for conversation_items_response
or llama_stack_conv_id usage) to either remove those keyword args entirely so
they are omitted, or import and pass the NOT_GIVEN sentinel from
llama_stack_client and set after=NOT_GIVEN, include=NOT_GIVEN, limit=NOT_GIVEN,
order=NOT_GIVEN to preserve intended defaults.
  from llama_stack_client import AsyncLlamaStackClient
- from llama_stack import AsyncLlamaStackAsLibraryClient
+ from llama_stack.core.library_client import AsyncLlamaStackAsLibraryClient
🧩 Analysis chain
🌐 Web query:
llama-stack 0.4.2 AsyncLlamaStackClient import path documentation
💡 Result:
For the Python async client, the import path is:
from llama_stack_client import AsyncLlamaStackClient

This is documented in the llama-stack-client SDK README and PyPI docs (the async section shows importing AsyncLlamaStackClient directly from the llama_stack_client top-level package). [1], [2]
Note: the separate llama-stack package exposes AsyncLlamaStack (not AsyncLlamaStackClient). [3]
Sources: [1] [2] [3]
🏁 Script executed:
# Check the actual file to see the import and its usage
cat -n src/utils/common.py

Repository: lightspeed-core/lightspeed-stack
Length of output: 6744
Line 9 import path does not follow coding guidelines and is not documented in llama-stack 0.4.2.
Line 8 correctly imports from llama_stack_client import AsyncLlamaStackClient, but line 9's import from llama_stack.core.library_client import AsyncLlamaStackAsLibraryClient violates the coding guidelines which mandate using from llama_stack_client import AsyncLlamaStackClient. The web search confirms the correct pattern is from the llama_stack_client package, not llama_stack.core.library_client. Verify that AsyncLlamaStackAsLibraryClient is a valid export from llama-stack 0.4.2 or update line 9 to use the compliant import path.
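As one quick way to perform that verification against the installed 0.4.2 packages, a throwaway local check along these lines could be used (a sketch, not part of the PR):

```python
# Probe whether the library-mode client is importable from the path used on
# line 9; print the outcome rather than assuming either answer.
try:
    from llama_stack.core.library_client import AsyncLlamaStackAsLibraryClient
except ImportError as exc:
    print(f"Not available at llama_stack.core.library_client: {exc}")
else:
    print(f"Found library-mode client: {AsyncLlamaStackAsLibraryClient.__qualname__}")
```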
🤖 Prompt for AI Agents
In `@src/utils/common.py` at line 9, The import on line 9 uses a non-compliant
path; replace or verify the symbol AsyncLlamaStackAsLibraryClient so it comes
from the documented package (llama_stack_client) rather than llama_stack.core;
locate the import statement that references AsyncLlamaStackAsLibraryClient in
src/utils/common.py and either change it to import
AsyncLlamaStackAsLibraryClient from llama_stack_client (or, if that symbol is
not exported in llama-stack 0.4.2, remove/replace it with the supported
AsyncLlamaStackClient from llama_stack_client) and update any code using
AsyncLlamaStackAsLibraryClient accordingly so all references use the supported
client class.
AsyncLlamaStackAsLibraryClient is not in the llama_stack_client package
@are-ces, you're absolutely correct, and I apologize for the confusion in my previous comment. The AsyncLlamaStackAsLibraryClient is indeed imported from llama_stack.core.library_client, not from the llama_stack_client package.
The code correctly uses two different clients for two different purposes:
- Line 8: from llama_stack_client import AsyncLlamaStackClient — for connecting to a remote Llama Stack server (service mode)
- Line 9: from llama_stack.core.library_client import AsyncLlamaStackAsLibraryClient — for using Llama Stack as an in-process library (library mode)
Both imports are necessary and correct. The code chooses between them at lines 46-53 based on the configuration.llama_stack.use_as_library_client setting. The coding guideline about using llama_stack_client imports applies to the service client (AsyncLlamaStackClient), which is already correctly imported on line 8.
The import on line 9 is valid and should remain as-is.
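For readers who are not looking at the file, a rough sketch of the selection pattern described above; the configuration attribute names and constructor arguments are assumptions, not the exact code in src/utils/common.py:

```python
from llama_stack_client import AsyncLlamaStackClient
from llama_stack.core.library_client import AsyncLlamaStackAsLibraryClient


async def build_client(llama_stack_config):
    """Pick library mode or service mode based on configuration (illustrative only)."""
    if llama_stack_config.use_as_library_client:
        # Library mode: run Llama Stack in-process; the run.yaml path attribute
        # and the initialize() call are assumptions about the library-client API.
        client = AsyncLlamaStackAsLibraryClient(
            llama_stack_config.library_client_config_path
        )
        await client.initialize()
        return client
    # Service mode: talk to a remote Llama Stack server over HTTP.
    return AsyncLlamaStackClient(base_url=llama_stack_config.url)
```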
🧠 Learnings used
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to **/*.py : Use Llama Stack imports: `from llama_stack_client import AsyncLlamaStackClient`
tisnik
left a comment
LGTM: deps are ok, e2e seems to be correct too, API remains the same which is good. The refactoring itself - I'd like to have @asimurka 's ACK there too, if possible.
| "llama-stack==0.3.5", | ||
| "llama-stack-client==0.3.5", | ||
| "llama-stack==0.4.2", | ||
| "llama-stack-client==0.4.2", |
are we sure there are no other dependencies?
can we resolve this comment?
    after=None,
    include=None,
    limit=None,
    order=None,
@are-ces seeing this reminds me that we should verify whether the order argument's default value is still correctly ordering the conversation items. Create multiple requests on the query endpoint with the same conversation_id, then check v1/conversations/{conversation_id} to confirm the items are ordered with the oldest turn at the beginning.
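One possible shape for that manual check is sketched below; the base URL, payload field names, and response handling are assumptions and would need adjusting to the actual deployment and API schema:

```python
import requests

BASE_URL = "http://localhost:8080"  # assumed local deployment with auth disabled

# First turn creates the conversation; the second turn reuses its conversation_id.
first = requests.post(
    f"{BASE_URL}/v1/query", json={"query": "first question"}, timeout=60
).json()
conversation_id = first["conversation_id"]  # assumed response field name
requests.post(
    f"{BASE_URL}/v1/query",
    json={"query": "second question", "conversation_id": conversation_id},
    timeout=60,
)

# Fetch the conversation and inspect the ordering: the oldest turn
# ("first question") should appear first in the returned items.
history = requests.get(f"{BASE_URL}/v1/conversations/{conversation_id}", timeout=60).json()
print(history)
```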
asimurka
left a comment
LGTM, conversation items ordering is correct.
Description
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Check e2e tests
Summary by CodeRabbit
New Features
Chores