Evaluation: add export_format query param for grouped trace export #562
Conversation
📝 Walkthrough: Adds an `export_format` query parameter to the evaluation detail endpoint, enabling a grouped trace export.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant API_Route as API Route
    participant Validator
    participant CRUD
    participant DB as EvalRun/Score
    participant Response
    Client->>API_Route: GET /api/v1/evaluations/{id}?export_format=grouped&get_trace_info=true
    API_Route->>Validator: Validate export_format & get_trace_info
    alt invalid (grouped without trace info)
        Validator-->>API_Route: validation error
        API_Route-->>Client: 400 Bad Request
    else valid
        API_Route->>DB: Load evaluation run & score (including traces)
        DB-->>API_Route: score with traces
        API_Route->>CRUD: group_traces_by_question_id(traces)
        alt grouping succeeds
            CRUD-->>API_Route: grouped traces list
            API_Route->>Response: build 200 payload (grouped format)
            Response-->>Client: 200 OK with grouped traces
        else grouping fails
            CRUD-->>API_Route: ValueError / error message
            API_Route-->>Client: 400 Bad Request with failure message
        end
    end
```
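A minimal FastAPI sketch of the validation step in the diagram's "invalid" branch. The parameter names and error detail are taken from the diffs later in this review, but the handler shape here is an assumption, not the PR's exact route code:

```python
from fastapi import APIRouter, HTTPException, Query

router = APIRouter()

@router.get("/api/v1/evaluations/{evaluation_id}")
def get_evaluation(
    evaluation_id: int,
    get_trace_info: bool = Query(False),
    export_format: str | None = Query(None),
):
    # Grouped export only makes sense when trace info is loaded,
    # so reject the combination up front with a 400.
    if export_format == "grouped" and not get_trace_info:
        raise HTTPException(
            status_code=400,
            detail="export_format=grouped requires get_trace_info=true",
        )
    # ... load the run, group traces, build the response (see diffs below)
```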
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@backend/app/crud/evaluations/core.py`:
- Around lines 341-344: The check currently only inspects traces[0] for a
question_id, which allows later traces with a missing or empty question_id to
create a None key and break sorting. Update the validation to iterate over all
traces (e.g., use any(t.get("question_id") in (None, "") for t in traces) or a
for-loop) and raise the same ValueError if any trace has a missing/empty
question_id. Also fix the typo in the preceding comment ("weather" →
"whether"); the raised message stays "Grouped export format is not available
for this evaluation." Reference the symbols traces, question_id, and the
grouping/sorting logic (groups and sorted(groups.keys())) when making the
change.
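A minimal sketch of the any()-based check this prompt describes, assuming the function shape shown in the diffs below:

```python
from typing import Any

def group_traces_by_question_id(traces: list[dict[str, Any]]) -> list[dict[str, Any]]:
    # Validate every trace, not just traces[0], so no None/"" key can
    # reach the later sorted(groups.keys()) call.
    if any(t.get("question_id") in (None, "") for t in traces):
        raise ValueError(
            "Grouped export format is not available for this evaluation."
        )
    ...  # grouping logic unchanged
```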
In `@backend/app/tests/api/routes/test_evaluation.py`:
- Around lines 1037-1044: The inline comment next to the client.get call is
stale; in the params dict the key "get_trace_info" is already set to True.
Remove the incorrect comment ("# Missing get_trace_info=true") adjacent to the
response = client.get(...) call (the params containing "export_format" and
"get_trace_info") to avoid confusion.
🧹 Nitpick comments (7)
backend/app/crud/evaluations/core.py (3)
319-338: Incomplete docstring with placeholder text. The docstring contains placeholder descriptions (`:param traces: Description`) that should be properly filled in. Per coding guidelines, Python functions should have complete documentation.

📝 Suggested docstring improvement
```diff
 def group_traces_by_question_id(
     traces: list[dict[str, Any]],
 ) -> list[dict[str, Any]]:
     """
-    Docstring for group_traces_by_question_id
-
-    :param traces: Description
-    :type traces: list[dict[str, Any]]
-    :return: Description
-    :rtype: list[dict[str, Any]]
-
+    Group traces by question_id for grouped export format.
+
+    Args:
+        traces: List of trace dictionaries containing question_id, question,
+            ground_truth_answer, llm_answer, trace_id, and scores.
+
+    Returns:
+        List of grouped traces sorted by question_id:
+        [
+            {
+                "question_id": 1,
+                "question": "What is Python?",
+                "ground_truth_answer": "...",
+                "llm_answers": ["Answer 1", "Answer 2"],
+                "trace_ids": ["trace-1", "trace-2"],
+                "scores": [[...], [...]]
+            }
+        ]
+
+    Raises:
+        ValueError: If traces lack a valid question_id.
     """
```
346-352: Type hint mismatch: `question_id` may not always be `int`. The type hint `dict[int, list[dict[str, Any]]]` assumes `question_id` is always an integer, but the code uses `.get("question_id")`, which could return any type. If `question_id` values are strings or mixed types, this could lead to unexpected sorting behavior.

♻️ Suggested type hint adjustment
```diff
-    groups: dict[int, list[dict[str, Any]]] = {}
+    groups: dict[Any, list[dict[str, Any]]] = {}
```
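A quick standalone illustration of the sorting hazard the nitpick describes (plain Python, not repo code):

```python
# sorted() cannot order mixed key types; this is the failure mode when
# question_id values are not uniformly int.
groups = {1: ["trace-a"], "2": ["trace-b"]}
try:
    sorted(groups.keys())
except TypeError as exc:
    print(exc)  # "'<' not supported between instances of 'str' and 'int'"
```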
367-368: Add trailing newline at end of file. Per static analysis hint (Ruff W292), add a newline at the end of the file.
```diff
     logger.info(f"[group_traces_by_question_id] Created {len(result)} groups")
     return result
+
```

backend/app/api/routes/evaluations/evaluation.py (2)
141-145: Minor style inconsistency: extra spaces around `=` in `status_code`. Line 143 uses `status_code = 400` with spaces, while line 138 uses `status_code=400` without spaces. Consider keeping a consistent style.

♻️ Consistent formatting

```diff
     if export_format == "grouped" and not get_trace_info:
         raise HTTPException(
-            status_code = 400,
+            status_code=400,
             detail="export_format=grouped requires get_trace_info=true"
         )
```
164-171: Consider moving the import to the top of the file and avoiding in-place mutation.
The import at line 166 should ideally be at the top of the file with other imports for better readability and consistency.
The code mutates `eval_run.score["traces"]` in-place (line 169). If `eval_run` is a SQLModel instance that might be used elsewhere or persisted, this could have unintended side effects. Consider creating a copy for the response instead.

♻️ Proposed refactor
Move the import to the top of the file:
```python
from app.crud.evaluations.core import group_traces_by_question_id
```

Then modify the grouping logic to avoid in-place mutation:
```diff
 # Formatter = grouped
 if export_format == "grouped" and eval_run.score and "traces" in eval_run.score:
-    from app.crud.evaluations.core import group_traces_by_question_id
     try:
         grouped_traces = group_traces_by_question_id(eval_run.score["traces"])
-        eval_run.score["traces"] = grouped_traces
+        # Create a copy to avoid mutating the model instance
+        eval_run.score = {**eval_run.score, "grouped_traces": grouped_traces}
     except ValueError as e:
         return APIResponse.failure_response(error=str(e), data=eval_run)
```
Note: This also aligns with the documentation, which shows `grouped_traces` as a separate field.
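A standalone illustration of the copy-instead-of-mutate idiom (plain dicts, not the actual model):

```python
# SQLModel JSON columns behave like plain dicts, so writing into
# eval_run.score["traces"] would change the loaded model object itself.
score = {"traces": [{"question_id": 1}], "summary_scores": []}
grouped = [{"question_id": 1, "llm_answers": ["A language"]}]

# Build a new top-level dict for the response; the original is untouched.
response_score = {**score, "grouped_traces": grouped}
assert "grouped_traces" not in score
```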
backend/app/tests/api/routes/test_evaluation.py (2)

959-1061: Consider adding a test for the ValueError case (traces without question_id). The success test is comprehensive, but there's no test coverage for the case where traces lack `question_id` and the grouping function raises a `ValueError`. This is an important error path handled in the route (lines 170-171 of evaluation.py).

📝 Suggested additional test
```python
def test_get_evaluation_run_grouped_format_missing_question_id_fails(
    self,
    client: TestClient,
    user_api_key_header: dict[str, str],
    db: Session,
    user_api_key: TestAuthContext,
    create_test_dataset: EvaluationDataset,
) -> None:
    """Test grouped export fails when traces lack question_id."""
    eval_run = EvaluationRun(
        run_name="test_run_no_qid",
        dataset_name=create_test_dataset.name,
        dataset_id=create_test_dataset.id,
        config={"model": "gpt-4o"},
        status="completed",
        total_items=1,
        score={
            "traces": [
                {
                    "trace_id": "trace-1",
                    "question": "What is Python?",
                    "llm_answer": "A language",
                    "scores": [],
                    # No question_id
                }
            ],
            "summary_scores": [],
        },
        organization_id=user_api_key.organization_id,
        project_id=user_api_key.project_id,
    )
    db.add(eval_run)
    db.commit()
    db.refresh(eval_run)

    response = client.get(
        f"/api/v1/evaluations/{eval_run.id}",
        params={"export_format": "grouped", "get_trace_info": True},
        headers=user_api_key_header,
    )

    assert response.status_code == 200
    response_data = response.json()
    assert response_data["success"] is False
    assert "not available" in response_data["error"].lower()
```
1061-1062: Missing blank line before next class definition. PEP 8 recommends two blank lines before class definitions at the module level.

```diff
 )
+
 class TestGetDataset:
```
backend/app/crud/evaluations/core.py (Outdated)
```python
# weather question_id exists in the traces
if traces and (traces[0].get("question_id") is None or traces[0].get("question_id") == ""):
    raise ValueError(
        "Grouped export format is not available for this evaluation.")
```
Validation only checks the first trace; a missing `question_id` in other traces will cause issues.

The current validation only checks `traces[0]` for `question_id`. If subsequent traces have a missing or empty `question_id`, they will be grouped under `None` as a key, and `sorted(groups.keys())` on line 355 will raise a `TypeError` when comparing `None` with integers.
Also, typo: "weather" should be "whether".
🐛 Proposed fix for robust validation

```diff
-    # weather question_id exists in the traces
-    if traces and (traces[0].get("question_id") is None or traces[0].get("question_id") == ""):
+    # Validate that all traces have a valid question_id
+    if not traces:
+        return []
+
+    for trace in traces:
+        question_id = trace.get("question_id")
+        if question_id is None or question_id == "":
+            raise ValueError(
+                "Grouped export format is not available for this evaluation.")
-    raise ValueError(
-        "Grouped export format is not available for this evaluation.")
```

📝 Committable suggestion
```python
# Validate that all traces have a valid question_id
if not traces:
    return []

for trace in traces:
    question_id = trace.get("question_id")
    if question_id is None or question_id == "":
        raise ValueError(
            "Grouped export format is not available for this evaluation.")
```
```python
response = client.get(
    f"/api/v1/evaluations/{eval_run.id}",
    params={
        "export_format": "grouped",
        "get_trace_info": True,
    },  # Missing get_trace_info=true
    headers=user_api_key_header,
)
```
Stale comment: `get_trace_info` is actually `True` here.

The comment on line 1042 says `# Missing get_trace_info=true`, but `get_trace_info: True` is present in the params. This appears to be copied from the previous test.
🧹 Remove stale comment

```diff
 response = client.get(
     f"/api/v1/evaluations/{eval_run.id}",
     params={
         "export_format": "grouped",
         "get_trace_info": True,
-    },  # Missing get_trace_info=true
+    },
     headers=user_api_key_header,
 )
```

📝 Committable suggestion
```python
response = client.get(
    f"/api/v1/evaluations/{eval_run.id}",
    params={
        "export_format": "grouped",
        "get_trace_info": True,
    },
    headers=user_api_key_header,
)
```
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@backend/app/crud/evaluations/core.py`:
- Around lines 358-366: The docstring for group_traces_by_question_id has
trailing whitespace and Black formatting violations; run the project's
formatter/pre-commit (e.g., black/isort) and remove any trailing spaces in that
docstring block, reflow or re-indent lines to match Black's style (keep the
triple-quoted docstring, parameter lines and Returns section neatly wrapped),
then re-run tests/CI to ensure the formatting errors are resolved.
🧹 Nitpick comments (1)
backend/app/tests/api/routes/test_evaluation.py (1)
939-1045: Prefer a factory/fixture for `EvaluationRun` setup in these tests. To align with the test fixture pattern and reduce duplication, consider extracting the `EvaluationRun` creation into a factory/fixture helper (e.g., `create_test_evaluation_run`); a sketch follows below.

As per coding guidelines: "backend/app/tests/**/*.py: Use factory pattern for test fixtures in backend/app/tests/".
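One possible shape for such a helper. The name `create_test_evaluation_run` comes from the nitpick above, and the defaults are lifted from the suggested test; the repo's actual fixture conventions may differ:

```python
from sqlmodel import Session

def create_test_evaluation_run(
    db: Session,
    dataset: "EvaluationDataset",
    auth: "TestAuthContext",
    **overrides,
) -> "EvaluationRun":
    # Defaults mirror the inline EvaluationRun construction in the tests;
    # callers override only what a given test cares about (e.g., score).
    fields = dict(
        run_name="test_run",
        dataset_name=dataset.name,
        dataset_id=dataset.id,
        config={"model": "gpt-4o"},
        status="completed",
        total_items=1,
        score={"traces": [], "summary_scores": []},
        organization_id=auth.organization_id,
        project_id=auth.project_id,
    )
    fields.update(overrides)
    eval_run = EvaluationRun(**fields)
    db.add(eval_run)
    db.commit()
    db.refresh(eval_run)
    return eval_run
```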
Codecov Report: ❌ Patch coverage is
…pdate docstring for clarity — force-pushed from a164ee5 to e6de3d0
Summary
Target issue is #545
Motivation: The frontend needs a new CSV export format that groups repeated questions horizontally, but the current API only supports row-based export where each iteration appears as a separate row.

Solution: This PR extends the existing evaluation export API to support a grouped format; a sketch of the grouped structure follows.
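Illustrative grouped payload, with field names taken from the docstring suggested in the review above (values are placeholders):

```python
# One entry per question; repeated iterations are collected horizontally.
grouped_traces = [
    {
        "question_id": 1,
        "question": "What is Python?",
        "ground_truth_answer": "...",
        "llm_answers": ["Answer 1", "Answer 2"],
        "trace_ids": ["trace-1", "trace-2"],
        "scores": [[], []],
    }
]
```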
Checklist
Before submitting a pull request, please ensure that you mark these tasks.
Run `fastapi run --reload app/main.py` or `docker compose up` in the repository root and test.

Notes
`GET /api/v1/evaluations/{id}?get_trace_info=true&export_format=grouped`

Validation: `export_format=grouped` requires `get_trace_info=true`; an error is returned if traces don't have a `question_id`.
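An illustrative client call. The base URL and auth header name are assumptions, since the tests only pass an opaque `user_api_key_header`:

```python
import requests

BASE_URL = "http://localhost:8000"               # placeholder host
AUTH_HEADERS = {"X-API-Key": "<your-api-key>"}   # header name is an assumption

# Request the grouped export; omitting get_trace_info=true yields a 400.
resp = requests.get(
    f"{BASE_URL}/api/v1/evaluations/123",
    params={"get_trace_info": True, "export_format": "grouped"},
    headers=AUTH_HEADERS,
)
print(resp.status_code, resp.json())
```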
PR #553 needs to be merged before this one.
Summary by CodeRabbit

- New Features
- Bug Fixes / Validation
- Tests
- Documentation