LCORE-1216: Bump up to llama-stack 0.4.2 #1048
base: main
Conversation
- updated lls to 0.4.2
- removed unused methods from deprecated endpoints
- updated /models endpoint parsing of lls models
- added huggingface folder creation in container
- did some refactoring to satisfy the stricter linting/mypy checks
- updated run.yaml
Walkthrough
This PR upgrades Llama Stack to 0.4.2, introducing widespread API changes where model attributes shift from …

Changes
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
tests/unit/app/endpoints/test_streaming_query.py (1)
26-27: Update comment to reflect current llama-stack version.

The comment references "llama-stack-client 0.3.x" but this PR upgrades to 0.4.2. Update the comment to accurately reflect the version context.
Suggested fix
-# Note: content_delta module doesn't exist in llama-stack-client 0.3.x
+# Note: content_delta module doesn't exist in llama-stack-client 0.4.x
 # These are mock classes for backward compatibility with Agent API tests

src/utils/types.py (1)
154-211: Remove the unused _extract_rag_chunks_from_response method.

This private method at lines 154–211 is no longer called anywhere in the codebase. Since append_tool_calls_from_llama was removed, this method became dead code and should be deleted.

src/app/endpoints/streaming_query.py (1)
693-725: SSE error responses must keep text/event-stream Content-Type.

The APIStatusError path currently sets the HTTP media type to query_request.media_type, but the payload is still SSE-formatted. This can break clients expecting SSE frames.

🐛 Suggested fix
 return StreamingResponse(
     stream_http_error(error_response),
     status_code=error_response.status_code,
-    media_type=query_request.media_type or MEDIA_TYPE_JSON,
+    media_type="text/event-stream",
 )

src/app/endpoints/query.py (1)
474-505: Guard against missing provider_id in custom_metadata.

The custom_metadata field in llama-stack's model response is optional and free-form. If provider_id is absent, the function returns an empty string that propagates to conversation history, database storage, and validation logic. This can cause silent data corruption and incorrect model matching. Consider falling back to parsing provider_id from the model.id prefix (format: provider/model) when it's missing.

Suggested fix
 provider_id = (
     str(model.custom_metadata.get("provider_id", ""))
     if model.custom_metadata
     else ""
 )
+if not provider_id and "/" in model.id:
+    provider_id = model.id.split("/", 1)[0]

tests/unit/app/endpoints/test_query.py (1)
513-517: Incorrect type annotations for parameterized test function.

The type hints (list) are inaccurate for these parameters. Based on the parametrize values, these should be:

- user_conversation: UserConversation | None
- request_values: tuple[str | None, str | None]
- expected_values: tuple[str | None, str | None]

🔧 Proposed fix
 def test_evaluate_model_hints(
-    user_conversation: list,
-    request_values: list,
-    expected_values: list,
+    user_conversation: UserConversation | None,
+    request_values: tuple[str | None, str | None],
+    expected_values: tuple[str | None, str | None],
 ) -> None:
🤖 Fix all issues with AI agents
In `@src/app/endpoints/conversations_v3.py`:
- Around line 334-337: The conversations.items.list call is passing explicit
None for parameters (after, include, limit, order) which sends JSON nulls;
update the call in the code that invokes client.conversations.items.list (search
for conversation_items_response or llama_stack_conv_id usage) to either remove
those keyword args entirely so they are omitted, or import and pass the
NOT_GIVEN sentinel from llama_stack_client and set after=NOT_GIVEN,
include=NOT_GIVEN, limit=NOT_GIVEN, order=NOT_GIVEN to preserve intended
defaults.
In `@src/utils/common.py`:
- Line 9: The import on line 9 uses a non-compliant path; replace or verify the
symbol AsyncLlamaStackAsLibraryClient so it comes from the documented package
(llama_stack_client) rather than llama_stack.core; locate the import statement
that references AsyncLlamaStackAsLibraryClient in src/utils/common.py and either
change it to import AsyncLlamaStackAsLibraryClient from llama_stack_client (or,
if that symbol is not exported in llama-stack 0.4.2, remove/replace it with the
supported AsyncLlamaStackClient from llama_stack_client) and update any code
using AsyncLlamaStackAsLibraryClient accordingly so all references use the
supported client class.
🧹 Nitpick comments (8)
test.containerfile (1)
11-13: Ownership differs from production Containerfile.

The Hugging Face cache directory setup is appropriate for embedding model caching. However, note that test.containerfile uses 1001:0 (OpenShift-style) while the production Containerfile uses 1001:1001 for the same directory. This inconsistency could cause permission issues if tests don't reflect production behavior accurately.

Consider aligning ownership with the production Containerfile for consistency:

Suggested alignment
- chown -R 1001:0 /opt/app-root/src/.llama && \
+ chown -R 1001:1001 /opt/app-root/src/.llama && \
  mkdir -p /opt/app-root/src/.cache/huggingface && \
- chown -R 1001:0 /opt/app-root/src/.cache
+ chown -R 1001:1001 /opt/app-root/src/.cache

tests/unit/app/endpoints/test_streaming_query.py (1)
46-218: Consider consolidating duplicate mock class definitions.

All these mock classes (ToolCallDelta, TurnResponseEvent, AgentTurnResponseStreamChunk, etc.) share identical __init__ implementations. You could reduce duplication with a factory function or base class:

Optional consolidation approach
def _create_mock_class(name: str, doc: str) -> type:
    """Factory to create mock classes for Agent API types."""

    def __init__(self, **kwargs: Any):
        for key, value in kwargs.items():
            setattr(self, key, value)

    __init__.__doc__ = f"Initialize {name} with provided kwargs."
    return type(name, (), {"__init__": __init__, "__doc__": doc})


ToolCallDelta = _create_mock_class("ToolCallDelta", "Mock ToolCallDelta for Agent API tests.")
TurnResponseEvent = _create_mock_class("TurnResponseEvent", "Mock TurnResponseEvent for Agent API tests.")
# ... etc

However, the explicit class definitions do provide clarity on which types are being mocked, which can be valuable in test code.
src/models/requests.py (1)
4-6: Use typing_extensions.Self to avoid Python <3.11 import errors.

typing.Self will break on older runtimes and conflicts with the repo's guidance for model validators. Please switch to typing_extensions.Self unless the project is strictly 3.11+.

🔧 Proposed change
-from typing import Optional, Self
+from typing import Optional
+from typing_extensions import Self

As per coding guidelines, prefer typing_extensions.Self for model validators.

tests/unit/models/requests/test_query_request.py (1)
78-105: Add a negative test for unsupported media_type.

This validator is new behavior; a failure-path test will lock it in and improve coverage.
🧪 Suggested test
+    def test_constructor_invalid_media_type(self) -> None:
+        """Test that unsupported media_type values are rejected."""
+        with pytest.raises(ValueError, match="media_type must be either"):
+            _ = QueryRequest(query="Tell me about Kubernetes", media_type="application/xml")

As per coding guidelines, new validation paths should be covered by unit tests.
src/app/endpoints/a2a.py (2)
67-94: Guard lazy store initialization to avoid concurrent double-creation.

Concurrent requests can race and create multiple stores. Consider a lightweight async lock around initialization.
🔧 Example guard
-_TASK_STORE: Optional[TaskStore] = None
-_CONTEXT_STORE: Optional[A2AContextStore] = None
+_TASK_STORE: Optional[TaskStore] = None
+_CONTEXT_STORE: Optional[A2AContextStore] = None
+_TASK_STORE_LOCK: Optional[asyncio.Lock] = None
+_CONTEXT_STORE_LOCK: Optional[asyncio.Lock] = None

 async def _get_task_store() -> TaskStore:
     """Get the A2A task store, creating it if necessary."""
     global _TASK_STORE  # pylint: disable=global-statement
-    if _TASK_STORE is None:
-        _TASK_STORE = await A2AStorageFactory.create_task_store(configuration.a2a_state)
+    global _TASK_STORE_LOCK  # pylint: disable=global-statement
+    if _TASK_STORE_LOCK is None:
+        _TASK_STORE_LOCK = asyncio.Lock()
+    async with _TASK_STORE_LOCK:
+        if _TASK_STORE is None:
+            _TASK_STORE = await A2AStorageFactory.create_task_store(configuration.a2a_state)
     return _TASK_STORE

 async def _get_context_store() -> A2AContextStore:
     """Get the A2A context store, creating it if necessary."""
     global _CONTEXT_STORE  # pylint: disable=global-statement
-    if _CONTEXT_STORE is None:
-        _CONTEXT_STORE = await A2AStorageFactory.create_context_store(
-            configuration.a2a_state
-        )
+    global _CONTEXT_STORE_LOCK  # pylint: disable=global-statement
+    if _CONTEXT_STORE_LOCK is None:
+        _CONTEXT_STORE_LOCK = asyncio.Lock()
+    async with _CONTEXT_STORE_LOCK:
+        if _CONTEXT_STORE is None:
+            _CONTEXT_STORE = await A2AStorageFactory.create_context_store(
+                configuration.a2a_state
+            )
     return _CONTEXT_STORE
803-808: Consider making the 300s stream idle timeout configurable.

Long-running tool calls with sparse output could exceed this and terminate streams prematurely.
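For illustration, a minimal sketch of one way a per-event idle timeout could be made configurable; the constant name STREAM_IDLE_TIMEOUT_SECONDS and the wrapper function below are hypothetical, not code from this PR:

```python
import asyncio
from typing import AsyncIterator

# Hypothetical setting; in practice this would come from the service
# configuration rather than a module-level constant.
STREAM_IDLE_TIMEOUT_SECONDS = 300.0


async def forward_with_idle_timeout(
    events: AsyncIterator[str],
    idle_timeout: float = STREAM_IDLE_TIMEOUT_SECONDS,
) -> AsyncIterator[str]:
    """Re-yield events, timing out only when no event arrives within idle_timeout."""
    iterator = events.__aiter__()
    while True:
        try:
            # The timeout applies per event, so long streams with steady output
            # are unaffected; only genuine idleness raises asyncio.TimeoutError.
            event = await asyncio.wait_for(iterator.__anext__(), timeout=idle_timeout)
        except StopAsyncIteration:
            break
        yield event
```

Wiring idle_timeout to a configuration field would let deployments with slow tool calls raise the limit without a code change.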
src/metrics/utils.py (1)
39-45: Consider simplifying the redundant custom_metadata check.

Since models are already filtered at line 31 to only include those where model.custom_metadata is truthy, the check at line 42 is redundant. However, this defensive pattern is harmless and protects against future refactoring.

♻️ Optional simplification
 for model in models:
-    provider = (
-        str(model.custom_metadata.get("provider_id", ""))
-        if model.custom_metadata
-        else ""
-    )
+    provider = str(model.custom_metadata.get("provider_id", ""))
     model_name = model.id

tests/unit/app/endpoints/test_query.py (1)
450-512: Consider adding a test case for conflicting request and conversation values.

The parameterized test covers key scenarios but misses the case where the request specifies a different model/provider than the user conversation. This edge case would verify the expected precedence behavior.
📝 Suggested additional test case
# Request values differ from user conversation (request should take precedence)
(
    UserConversation(
        id="conv1",
        user_id="user1",
        last_used_provider="old_provider",
        last_used_model="old_model",
        message_count=1,
    ),
    ("new_provider", "new_model"),
    # Expect request values to override conversation
    ("new_provider", "new_model"),
),

Add to the ids list: "Request overrides user conversation"
    after=None,
    include=None,
    limit=None,
    order=None,
🧩 Analysis chain
🏁 Script executed:
# First, let's see the file and the specific lines in question
cat -n src/app/endpoints/conversations_v3.py | sed -n '320,350p'

Repository: lightspeed-core/lightspeed-stack
Length of output: 1319
🏁 Script executed:
# Check if NOT_GIVEN is used elsewhere in this file
rg -n "NOT_GIVEN" src/app/endpoints/conversations_v3.pyRepository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
# Search for NOT_GIVEN usage patterns across the codebase
rg -n "NOT_GIVEN" --type py | head -20Repository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
# Look for llama-stack-client imports and usage patterns
rg -n "from llama_stack_client" --type py | head -10Repository: lightspeed-core/lightspeed-stack
Length of output: 1132
🏁 Script executed:
# Check if there's a requirements or dependencies file
fd -e txt -e toml -e cfg -e lock | grep -E "(requirements|pyproject|setup|lock)"

Repository: lightspeed-core/lightspeed-stack
Length of output: 270
🏁 Script executed:
# Check pyproject.toml for llama-stack-client version
cat pyproject.toml | grep -A 5 -B 5 "llama"

Repository: lightspeed-core/lightspeed-stack
Length of output: 1265
🏁 Script executed:
# Broader search for NOT_GIVEN across all files
rg -n "NOT_GIVEN" --type pyRepository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
# Check if there are any test files for conversations endpoint to see expected patterns
find . -name "*conversations*" -type f | head -20

Repository: lightspeed-core/lightspeed-stack
Length of output: 543
🌐 Web query:
llama-stack-client 0.4.2 conversations items list API parameters None handling
💡 Result:
For llama-stack-client==0.4.2 (uploaded Jan 16, 2026) [1], the Conversations → Items → List call is a paginated “list” response (it includes first_id, last_id, has_more), and those pagination IDs can legitimately be None when absent/empty. [3]
“None handling” (what to do in practice)
- For request parameters (query/body): if you want a parameter to be omitted, don't pass it at all (rather than passing None). Passing None typically means "send an explicit null / explicit value" and can change server-side validation/behavior.
- For response fields: None can mean either the JSON field was missing or it was explicitly null. The client documents how to distinguish those cases using the Pydantic model's .model_fields_set; a short illustrative sketch follows below. [2]
If you paste the exact Python call you’re making (including the kwargs you’re passing as None) and the error/traceback, I can map it to the specific parameter(s) that should be omitted vs explicitly set.
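To make the missing-versus-explicit-null distinction concrete, here is a small illustrative sketch; the Page model is invented for the example and is not a real llama-stack-client response type:

```python
from typing import Optional

from pydantic import BaseModel


class Page(BaseModel):
    # Invented model for illustration only; real response types have more fields.
    first_id: Optional[str] = None


explicit_null = Page.model_validate({"first_id": None})
field_missing = Page.model_validate({})

# Both attributes read back as None...
assert explicit_null.first_id is None and field_missing.first_id is None

# ...but model_fields_set records whether the field was actually present.
print("first_id" in explicit_null.model_fields_set)  # True  -> explicit null
print("first_id" in field_missing.model_fields_set)  # False -> field was missing
```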
🏁 Script executed:
# Check test files to see how conversations.items.list is properly called
cat -n tests/unit/app/endpoints/test_conversations_v2.py | head -100

Repository: lightspeed-core/lightspeed-stack
Length of output: 4100
🏁 Script executed:
# Check test_conversations_v3 if it exists to see pattern
ls -la tests/unit/app/endpoints/test_conversations* 2>/dev/null

Repository: lightspeed-core/lightspeed-stack
Length of output: 272
🏁 Script executed:
# Search for how other endpoints call list methods with optional parameters
rg -n "\.list\(" src/app/endpoints/ --type py -A 3 | head -40Repository: lightspeed-core/lightspeed-stack
Length of output: 2711
🏁 Script executed:
# Check if there's any usage of omitting parameters instead of passing None
rg -n "conversations\.items\.list" --type py -A 5Repository: lightspeed-core/lightspeed-stack
Length of output: 551
🌐 Web query:
llama-stack-client 0.4.2 NOT_GIVEN sentinel optional parameters conversations
💡 Result:
In llama-stack-client==0.4.2 (released Jan 16, 2026) (pypi.org), many SDK method signatures use a Stainless-style sentinel:
param: T | NotGiven = NOT_GIVEN

Meaning:

- leave it as NOT_GIVEN (the default) ⇒ the field is omitted from the request body/query entirely
- pass None (when the type allows Optional[...]) ⇒ the field is sent as JSON null (i.e., explicitly set/cleared)
This pattern is typical of Stainless-generated Python SDKs. (pypi.org)
Practical use with conversations
You normally don’t need to import/use NOT_GIVEN—just don’t pass the optional kwarg(s):
conv = client.conversations.create()

Creating a conversation like this is the standard usage pattern. (git.kvant.cloud)
You’d use NOT_GIVEN mainly when you’re programmatically assembling kwargs and want to ensure a key is not sent:
from llama_stack_client import NOT_GIVEN
kwargs = {"metadata": NOT_GIVEN} # omit metadata entirely
conv = client.conversations.create(**kwargs)

Whereas:
conv = client.conversations.create(metadata=None)

would explicitly send metadata: null (only useful if the API treats "null" differently from "missing").
Citations:
- 1: https://pypi.org/project/llama-stack-client
- 2: https://pypi.org/project/llama-stack-client
- 3: https://git.kvant.cloud/phoenix-oss/llama-stack-mirror/commits/commit/dac1d7be1c5a34239dca8ff9e91a48c76568950b/docs/static/llama-stack-spec.yaml?utm_source=openai
Remove explicit None parameters from conversations.items.list() call or use NOT_GIVEN sentinel.
In llama-stack-client 0.4.2, passing None sends an explicit JSON null value, which differs semantically from omitting the parameter. For after, include, limit, and order in the conversations API, omit these parameters entirely:
conversation_items_response = await client.conversations.items.list(
    conversation_id=llama_stack_conv_id,
)

Alternatively, if programmatically assembling parameters, import and use the NOT_GIVEN sentinel:
from llama_stack_client import NOT_GIVEN

conversation_items_response = await client.conversations.items.list(
    conversation_id=llama_stack_conv_id,
    after=NOT_GIVEN,
    include=NOT_GIVEN,
    limit=NOT_GIVEN,
    order=NOT_GIVEN,
)

This ensures the backend treats these as omitted parameters rather than explicitly set to null, preserving the intended default behavior.
🤖 Prompt for AI Agents
In `@src/app/endpoints/conversations_v3.py` around lines 334 - 337, The
conversations.items.list call is passing explicit None for parameters (after,
include, limit, order) which sends JSON nulls; update the call in the code that
invokes client.conversations.items.list (search for conversation_items_response
or llama_stack_conv_id usage) to either remove those keyword args entirely so
they are omitted, or import and pass the NOT_GIVEN sentinel from
llama_stack_client and set after=NOT_GIVEN, include=NOT_GIVEN, limit=NOT_GIVEN,
order=NOT_GIVEN to preserve intended defaults.
  from llama_stack_client import AsyncLlamaStackClient
- from llama_stack import AsyncLlamaStackAsLibraryClient
+ from llama_stack.core.library_client import AsyncLlamaStackAsLibraryClient
🧩 Analysis chain
🌐 Web query:
llama-stack 0.4.2 AsyncLlamaStackClient import path documentation
💡 Result:
For the Python async client, the import path is:
from llama_stack_client import AsyncLlamaStackClient

This is documented in the llama-stack-client SDK README and PyPI docs (the async section shows importing AsyncLlamaStackClient directly from the llama_stack_client top-level package). [1], [2]
Note: the separate llama-stack package exposes AsyncLlamaStack (not AsyncLlamaStackClient). [3]
Sources: [1] [2] [3]
🏁 Script executed:
# Check the actual file to see the import and its usage
cat -n src/utils/common.py

Repository: lightspeed-core/lightspeed-stack
Length of output: 6744
Line 9 import path does not follow coding guidelines and is not documented in llama-stack 0.4.2.
Line 8 correctly imports from llama_stack_client import AsyncLlamaStackClient, but line 9's import from llama_stack.core.library_client import AsyncLlamaStackAsLibraryClient violates the coding guidelines which mandate using from llama_stack_client import AsyncLlamaStackClient. The web search confirms the correct pattern is from the llama_stack_client package, not llama_stack.core.library_client. Verify that AsyncLlamaStackAsLibraryClient is a valid export from llama-stack 0.4.2 or update line 9 to use the compliant import path.
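As one quick way to perform that verification against the installed 0.4.2 packages, a throwaway local check along these lines could be used (a sketch, not part of the PR):

```python
# Probe whether the library-mode client is importable from the path used on
# line 9; print the outcome rather than assuming either answer.
try:
    from llama_stack.core.library_client import AsyncLlamaStackAsLibraryClient
except ImportError as exc:
    print(f"Not available at llama_stack.core.library_client: {exc}")
else:
    print(f"Found library-mode client: {AsyncLlamaStackAsLibraryClient.__qualname__}")
```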
🤖 Prompt for AI Agents
In `@src/utils/common.py` at line 9, The import on line 9 uses a non-compliant
path; replace or verify the symbol AsyncLlamaStackAsLibraryClient so it comes
from the documented package (llama_stack_client) rather than llama_stack.core;
locate the import statement that references AsyncLlamaStackAsLibraryClient in
src/utils/common.py and either change it to import
AsyncLlamaStackAsLibraryClient from llama_stack_client (or, if that symbol is
not exported in llama-stack 0.4.2, remove/replace it with the supported
AsyncLlamaStackClient from llama_stack_client) and update any code using
AsyncLlamaStackAsLibraryClient accordingly so all references use the supported
client class.
AsyncLlamaStackAsLibraryClient is not in the llama_stack_client package
@are-ces, you're absolutely correct, and I apologize for the confusion in my previous comment. The AsyncLlamaStackAsLibraryClient is indeed imported from llama_stack.core.library_client, not from the llama_stack_client package.
The code correctly uses two different clients for two different purposes:
- Line 8: from llama_stack_client import AsyncLlamaStackClient — for connecting to a remote Llama Stack server (service mode)
- Line 9: from llama_stack.core.library_client import AsyncLlamaStackAsLibraryClient — for using Llama Stack as an in-process library (library mode)
Both imports are necessary and correct. The code chooses between them at lines 46-53 based on the configuration.llama_stack.use_as_library_client setting. The coding guideline about using llama_stack_client imports applies to the service client (AsyncLlamaStackClient), which is already correctly imported on line 8.
The import on line 9 is valid and should remain as-is.
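For readers who are not looking at the file, a rough sketch of the selection pattern described above; the configuration attribute names and constructor arguments are assumptions, not the exact code in src/utils/common.py:

```python
from llama_stack_client import AsyncLlamaStackClient
from llama_stack.core.library_client import AsyncLlamaStackAsLibraryClient


async def build_client(llama_stack_config):
    """Pick library mode or service mode based on configuration (illustrative only)."""
    if llama_stack_config.use_as_library_client:
        # Library mode: run Llama Stack in-process; the run.yaml path attribute
        # and the initialize() call are assumptions about the library-client API.
        client = AsyncLlamaStackAsLibraryClient(
            llama_stack_config.library_client_config_path
        )
        await client.initialize()
        return client
    # Service mode: talk to a remote Llama Stack server over HTTP.
    return AsyncLlamaStackClient(base_url=llama_stack_config.url)
```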
🧠 Learnings used
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to **/*.py : Use Llama Stack imports: `from llama_stack_client import AsyncLlamaStackClient`
tisnik
left a comment
LGTM: deps are ok, e2e seems to be correct too, API remains the same which is good. The refactoring itself - I'd like to have @asimurka 's ACK there too, if possible.
| "llama-stack==0.3.5", | ||
| "llama-stack-client==0.3.5", | ||
| "llama-stack==0.4.2", | ||
| "llama-stack-client==0.4.2", |
are we sure there are no other dependencies?
can we resolve this comment?
    after=None,
    include=None,
    limit=None,
    order=None,
@are-ces seeing this reminds me that we should verify whether the order argument's default value is still correctly ordering the conversation items. Create multiple requests on the query endpoint with the same conversation_id, then check v1/conversations/{conversation_id} to confirm the items are ordered with the oldest turn at the beginning.
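One possible shape for that manual check is sketched below; the base URL, payload field names, and response handling are assumptions and would need adjusting to the actual deployment and API schema:

```python
import requests

BASE_URL = "http://localhost:8080"  # assumed local deployment with auth disabled

# First turn creates the conversation; the second turn reuses its conversation_id.
first = requests.post(
    f"{BASE_URL}/v1/query", json={"query": "first question"}, timeout=60
).json()
conversation_id = first["conversation_id"]  # assumed response field name
requests.post(
    f"{BASE_URL}/v1/query",
    json={"query": "second question", "conversation_id": conversation_id},
    timeout=60,
)

# Fetch the conversation and inspect the ordering: the oldest turn
# ("first question") should appear first in the returned items.
history = requests.get(f"{BASE_URL}/v1/conversations/{conversation_id}", timeout=60).json()
print(history)
```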
asimurka
left a comment
LGTM, conversation items ordering is correct.
Description
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Check e2e tests
Summary by CodeRabbit
New Features
Chores