
Conversation

@Prajna1999
Collaborator

@Prajna1999 Prajna1999 commented Jan 29, 2026

Earlier, question_id was not included in the response of the GET evaluations/{evaluation_id} endpoint.

In app/crud/evaluations/score.py:

class TraceData(TypedDict):
    """Data for a single trace including Q&A and scores."""

    trace_id: str
    question: str
    llm_answer: str
    question_id: int | None
    ground_truth_answer: str
    scores: list[TraceScore]

This TypedDict did not have the question_id field earlier, so the fetch_trace_scores_from_langfuse function, which builds its result from this type, omitted the key.

Summary by CodeRabbit

  • New Features

    • Trace data now includes optional question identifiers to enable finer-grained tracking, correlation, and filtering of evaluation results across runs and datasets.
    • Enhanced score retrieval: supports numeric and categorical scores, aggregates summary scores, and handles mixed/empty datasets and per-trace failures robustly.
  • Tests

    • Added comprehensive tests covering score fetching, aggregation, filtering, and error scenarios.


@Prajna1999 Prajna1999 requested a review from vprashrex January 29, 2026 13:15
@coderabbitai

coderabbitai bot commented Jan 29, 2026

📝 Walkthrough

Walkthrough

Added an optional question_id: int | None field to the TraceData TypedDict and introduced/exposed a Langfuse trace-score fetcher function with comprehensive tests covering score types, missing data, and error cases.

Changes

Type Definition (backend/app/crud/evaluations/score.py):
  Added `question_id: int | None` field to the TraceData TypedDict.
Langfuse integration & tests (backend/app/crud/evaluations/langfuse.py, backend/app/tests/crud/evaluations/test_langfuse.py):
  Exposed/added fetch_trace_scores_from_langfuse public API and added extensive tests for fetching trace scores (presence/absence of question_id, numeric/categorical scores, filtering incomplete scores, error handling, summary aggregation, per-trace failures).

Sequence Diagram(s)

(omitted — changes are small data-shape addition plus test coverage; no multi-component control-flow requiring visualization)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

bug

Suggested reviewers

  • AkhileshNegi
  • vprashrex

Poem

🐰 A tiny id hops into the trace,
tucked in data, finding its place.
Tests dusted off, the scores align,
Langfuse whispers, "All looks fine."
Hop, hop—merge time, carrot-bright delight! 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title directly relates to the main objective of the PR: ensuring question_id is included in trace data that was previously missing from responses.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.


@vprashrex
Collaborator

Yes, this was there. One more thing:

In /backend/app/crud/evaluations/langfuse.py, in the create_langfuse_dataset_run function, the check should not be

if question_id:

It should be

if question_id is not None:

@codecov

codecov bot commented Jan 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/crud/evaluations/score.py (1)

20-27: Fix question_id type contract: int annotation conflicts with "" (str) runtime values.

TraceData.question_id is typed as int but initialized and retrieved as empty string "" throughout the codebase:

  • Initialized as empty string in langfuse.py line 426
  • Retrieved with empty string default in langfuse.py line 449
  • Checked explicitly for empty string in core.py line 378

Additionally, the grouping logic in core.py line 391 (sorted(groups.keys())) will fail at runtime with a TypeError when trace data contains mixed int and str values for question_id, as the dict key type is annotated as int but receives str values from fetched traces.

Align the type to match runtime behavior: change question_id: int to question_id: int | None or question_id: str | int, or enforce conversion to int in the fetcher and initialize to None instead of "".

Example fix (int | None only)
 class TraceData(TypedDict):
     """Data for a single trace including Q&A and scores."""
 
     trace_id: str
     question: str
     llm_answer: str
-    question_id: int
+    question_id: int | None
     ground_truth_answer: str
     scores: list[TraceScore]
🤖 Fix all issues with AI agents
In `@backend/app/tests/crud/evaluations/test_langfuse.py`:
- Around line 712-742: Update the tests to match the declared
TraceData.question_id contract (int | None) and the normalization performed in
fetch_trace_scores_from_langfuse: when a trace has no question_id, assert
trace["question_id"] is None (not ""), and for non-int question_id values assert
they are converted to int (or None on failure) consistent with TraceData; adjust
expectations in test_fetch_trace_scores_without_question_id and
test_fetch_trace_scores_mixed_question_id_types to reference this behavior and
the fetch_trace_scores_from_langfuse normalization logic.

Comment on lines +712 to +742
def test_fetch_trace_scores_without_question_id(self) -> None:
    """Test fetching traces without question_id (backwards compatibility)."""
    mock_langfuse = MagicMock()

    # Mock dataset run
    mock_run_item = MagicMock()
    mock_run_item.trace_id = "trace_1"
    mock_dataset_run = MagicMock()
    mock_dataset_run.dataset_run_items = [mock_run_item]
    mock_langfuse.api.datasets.get_run.return_value = mock_dataset_run

    # Mock trace without question_id in metadata
    mock_trace = MagicMock()
    mock_trace.input = {"question": "What is 2+2?"}
    mock_trace.output = {"answer": "4"}
    mock_trace.metadata = {"ground_truth": "4"}  # No question_id
    mock_trace.scores = []

    mock_langfuse.api.trace.get.return_value = mock_trace

    result = fetch_trace_scores_from_langfuse(
        langfuse=mock_langfuse,
        dataset_name="test_dataset",
        run_name="test_run",
    )

    # Verify trace has empty string for question_id
    assert len(result["traces"]) == 1
    trace = result["traces"][0]
    assert trace["question_id"] == ""
    assert trace["trace_id"] == "trace_1"

⚠️ Potential issue | 🟠 Major

Tests allow non-int question_id, conflicting with the declared contract.

test_fetch_trace_scores_without_question_id expects an empty string, and test_fetch_trace_scores_mixed_question_id_types expects a string. This conflicts with TraceData.question_id: int and the PR objective. Align tests with the final contract (e.g., int | None with normalization) or widen the type if strings are supported.

Also applies to: 979-1018


@Prajna1999 Prajna1999 merged commit 5e91aea into main Jan 29, 2026
3 checks passed
@Prajna1999 Prajna1999 deleted the fix/lf-question-id-not-found branch January 29, 2026 13:37