-
Notifications
You must be signed in to change notification settings - Fork 8
Langfuse: question_id not in traces #567
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
📝 WalkthroughWalkthroughAdded an optional Changes
Sequence Diagram(s)(omitted — changes are small data-shape addition plus test coverage; no multi-component control-flow requiring visualization) Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Yes this was there and also one more thing make sure in /backend/app/crud/evaluations/langfuse.py create_langfuse_dataset_run function It should not be It should be |
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/crud/evaluations/score.py (1)
20-27: Fixquestion_idtype contract:intannotation conflicts with""(str) runtime values.
TraceData.question_idis typed asintbut initialized and retrieved as empty string""throughout the codebase:
- Initialized as empty string in langfuse.py line 426
- Retrieved with empty string default in langfuse.py line 449
- Checked explicitly for empty string in core.py line 378
Additionally, the grouping logic in core.py line 391 (
sorted(groups.keys())) will fail at runtime with a TypeError when trace data contains mixed int and str values forquestion_id, as the dict key type is annotated asintbut receives str values from fetched traces.Align the type to match runtime behavior: change
question_id: inttoquestion_id: int | Noneorquestion_id: str | int, or enforce conversion to int in the fetcher and initialize toNoneinstead of"".Example fix (int | None only)
class TraceData(TypedDict): """Data for a single trace including Q&A and scores.""" trace_id: str question: str llm_answer: str - question_id: int + question_id: int | None ground_truth_answer: str scores: list[TraceScore]
🤖 Fix all issues with AI agents
In `@backend/app/tests/crud/evaluations/test_langfuse.py`:
- Around line 712-742: Update the tests to match the declared
TraceData.question_id contract (int | None) and the normalization performed in
fetch_trace_scores_from_langfuse: when a trace has no question_id, assert
trace["question_id"] is None (not ""), and for non-int question_id values assert
they are converted to int (or None on failure) consistent with TraceData; adjust
expectations in test_fetch_trace_scores_without_question_id and
test_fetch_trace_scores_mixed_question_id_types to reference this behavior and
the fetch_trace_scores_from_langfuse normalization logic.
| def test_fetch_trace_scores_without_question_id(self) -> None: | ||
| """Test fetching traces without question_id (backwards compatibility).""" | ||
| mock_langfuse = MagicMock() | ||
|
|
||
| # Mock dataset run | ||
| mock_run_item = MagicMock() | ||
| mock_run_item.trace_id = "trace_1" | ||
| mock_dataset_run = MagicMock() | ||
| mock_dataset_run.dataset_run_items = [mock_run_item] | ||
| mock_langfuse.api.datasets.get_run.return_value = mock_dataset_run | ||
|
|
||
| # Mock trace without question_id in metadata | ||
| mock_trace = MagicMock() | ||
| mock_trace.input = {"question": "What is 2+2?"} | ||
| mock_trace.output = {"answer": "4"} | ||
| mock_trace.metadata = {"ground_truth": "4"} # No question_id | ||
| mock_trace.scores = [] | ||
|
|
||
| mock_langfuse.api.trace.get.return_value = mock_trace | ||
|
|
||
| result = fetch_trace_scores_from_langfuse( | ||
| langfuse=mock_langfuse, | ||
| dataset_name="test_dataset", | ||
| run_name="test_run", | ||
| ) | ||
|
|
||
| # Verify trace has empty string for question_id | ||
| assert len(result["traces"]) == 1 | ||
| trace = result["traces"][0] | ||
| assert trace["question_id"] == "" | ||
| assert trace["trace_id"] == "trace_1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests allow non-int question_id, conflicting with the declared contract.
test_fetch_trace_scores_without_question_id expects an empty string, and test_fetch_trace_scores_mixed_question_id_types expects a string. This conflicts with TraceData.question_id: int and the PR objective. Align tests with the final contract (e.g., int | None with normalization) or widen the type if strings are supported.
Also applies to: 979-1018
🤖 Prompt for AI Agents
In `@backend/app/tests/crud/evaluations/test_langfuse.py` around lines 712 - 742,
Update the tests to match the declared TraceData.question_id contract (int |
None) and the normalization performed in fetch_trace_scores_from_langfuse: when
a trace has no question_id, assert trace["question_id"] is None (not ""), and
for non-int question_id values assert they are converted to int (or None on
failure) consistent with TraceData; adjust expectations in
test_fetch_trace_scores_without_question_id and
test_fetch_trace_scores_mixed_question_id_types to reference this behavior and
the fetch_trace_scores_from_langfuse normalization logic.
Earlier
question_idwas not in the response for GET evaluations/{evaluation_id} endpoint.In the
app/crud/evaluations/score.pyThis pydantic model did not have
question_idearlier, hence in thefetch_trace_scores_from_langfusefunction where the model was used omitting the concerned key.Summary by CodeRabbit
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.