AAgnihotry commented Jan 21, 2026

Summary

This PR updates evaluation telemetry properties, removes verbose logging, removes all GUID-named spans from traces, and adds configuration for the Application Insights connection string environment variable.

Changes

Telemetry Property Updates

  • Renamed AgentId → Entrypoint in individual events
  • Added AgentId as alias for ProjectId in enrichment
  • Renamed EvalItemId → EvalId
  • Renamed EvalItemName → EvalName
  • Removed EvalRunId property
  • Added AgentType property (LowCode for agent.json, Coded for .py files)
  • Added Runtime property (set to "URT")
  • Removed .URT suffix from all event names (e.g., EvalSetRun.Start.URT → EvalSetRun.Start)
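Taken together, the property changes above are a rename-and-enrich pass over each event. A minimal sketch, assuming an event is a name plus a dict of properties; `migrate_event` is hypothetical, and the PR may apply the per-event renames and the enrichment step in different places rather than in one function.

```python
from pathlib import Path
from typing import Any, Dict, Tuple

# Mapping taken from the property list above (sketch only).
PROPERTY_RENAMES = {
    "AgentId": "Entrypoint",
    "EvalItemId": "EvalId",
    "EvalItemName": "EvalName",
}
DROPPED_PROPERTIES = {"EvalRunId"}
LEGACY_SUFFIX = ".URT"


def migrate_event(name: str, properties: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: apply the new event name and property schema to one event."""
    # "EvalSetRun.Start.URT" -> "EvalSetRun.Start"
    if name.endswith(LEGACY_SUFFIX):
        name = name[: -len(LEGACY_SUFFIX)]

    migrated: Dict[str, Any] = {}
    for key, value in properties.items():
        if key in DROPPED_PROPERTIES:
            continue
        migrated[PROPERTY_RENAMES.get(key, key)] = value

    # Enrichment: AgentId now carries the ProjectId value.
    if "ProjectId" in migrated:
        migrated["AgentId"] = migrated["ProjectId"]

    # AgentType is derived from the entrypoint file.
    entrypoint = str(migrated.get("Entrypoint", ""))
    if Path(entrypoint).name == "agent.json":
        migrated["AgentType"] = "LowCode"
    elif entrypoint.endswith(".py"):
        migrated["AgentType"] = "Coded"

    migrated["Runtime"] = "URT"
    return name, migrated
```

For example, `migrate_event("EvalSetRun.Start.URT", {"AgentId": "main.py", "EvalRunId": "r1"})` yields the name `EvalSetRun.Start` with `Entrypoint`, `AgentType` (`Coded`) and `Runtime` set and `EvalRunId` gone.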

Logging Improvements

  • Suppressed verbose logging from Application Insights SDK
  • Suppressed OpenTelemetry library logs
  • Removed informational debug logs from telemetry subscriber
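Suppressing third-party log noise is typically done by raising the level of each library's top-level logger. A minimal sketch using only the standard `logging` module; the logger names `azure` and `opentelemetry` are assumptions about which namespaces the Application Insights exporter and OpenTelemetry log under, not something this PR states, and `suppress_telemetry_logs` is a hypothetical name.

```python
import logging

# Assumed namespaces: "azure" for the Application Insights / Azure Monitor
# exporter packages, "opentelemetry" for the OpenTelemetry SDK and API.
NOISY_LOGGERS = ("azure", "opentelemetry")


def suppress_telemetry_logs(level: int = logging.WARNING) -> None:
    """Raise third-party telemetry loggers to `level` so INFO/DEBUG records are dropped."""
    for name in NOISY_LOGGERS:
        logging.getLogger(name).setLevel(level)
```

Because a child logger with no level of its own inherits the effective level of its nearest configured ancestor, setting the two top-level namespaces also quiets packages such as `azure.monitor.opentelemetry.exporter`, unless a package sets an explicit level on its own logger.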

Span Cleanup - Removed All GUID-Named Spans

Removed evaluation set spans:

  • Deleted _send_parent_trace() method that created evaluation set GUID spans (~44 lines)
  • Deleted _send_eval_run_trace() method that created evaluation run GUID spans (~48 lines)
  • Removed call to _send_parent_trace in handle_create_eval_set_run

Removed evaluator spans:

  • Deleted "Evaluators" parent span creation
  • Deleted individual evaluator span creation loops (~190 lines)
  • Evaluator timing information is still captured; it is just no longer sent as separate spans

Impact: ~280 lines of span-creation code removed, for a simpler codebase and less cluttered trace views

Configuration Enhancement

  • Added configure_connection_string_env_var() function to allow customizing the Application Insights connection string environment variable
  • Default environment variable changed from TELEMETRY_CONNECTION_STRING to APPLICATIONINSIGHTS_CONNECTION_STRING
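The PR doesn't show the function's signature, so the following is a minimal sketch assuming it takes a single environment-variable name and that the connection string is read lazily from whichever variable is currently configured. `get_connection_string` is a hypothetical helper added for illustration.

```python
import os
from typing import Optional

# New default per this PR (previously TELEMETRY_CONNECTION_STRING).
_connection_string_env_var = "APPLICATIONINSIGHTS_CONNECTION_STRING"


def configure_connection_string_env_var(env_var: str) -> None:
    """Choose which environment variable holds the Application Insights connection string."""
    global _connection_string_env_var
    _connection_string_env_var = env_var


def get_connection_string() -> Optional[str]:
    """Hypothetical helper: read the connection string from the configured variable, if set."""
    return os.environ.get(_connection_string_env_var)
```

In this sketch, reading the variable lazily means `configure_connection_string_env_var` only has to be called before the telemetry exporter is first created.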

Other Changes

  • Updated calculator sample to use uipath==2.5.27
  • Version bump from 2.5.27 → 2.5.28

Testing

All checks passed:

  • ✅ Linting
  • ✅ Formatting
  • ✅ Type checking
  • ✅ Unit tests (1782 passed)

🤖 Generated with Claude Code

Development Package

  • Use uipath pack --nolock to get the latest dev build from this PR (requires version range).
  • Add this package as a dependency in your pyproject.toml:

```toml
[project]
dependencies = [
  # Exact version:
  "uipath==2.5.28.dev1011704118",

  # Any version from PR
  "uipath>=2.5.28.dev1011700000,<2.5.28.dev1011710000"
]

[[tool.uv.index]]
name = "testpypi"
url = "https://test.pypi.org/simple/"
publish-url = "https://test.pypi.org/legacy/"
explicit = true

[tool.uv.sources]
uipath = { index = "testpypi" }

[tool.uv]
override-dependencies = [
    "uipath>=2.5.28.dev1011700000,<2.5.28.dev1011710000",
]
```

github-actions bot added the test:uipath-langchain (Triggers tests in the uipath-langchain-python repository) and test:uipath-llamaindex (Triggers tests in the uipath-llamaindex-python repository) labels on Jan 21, 2026
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
AAgnihotry force-pushed the feat/evalTel branch 3 times, most recently from 807de2e to bf9c1f9, on January 21, 2026 20:14
AAgnihotry added the build:dev label (Create a dev build from the pr) on Jan 21, 2026
AAgnihotry and others added 13 commits January 21, 2026 15:48
- Import opentelemetry.trace module
- Wrap chat_completions HTTP calls with manual "LLM call" spans
- Set OpenInference semantic convention attributes (llm.model_name, llm.request.type, max_tokens, temperature)
- Add uipath.custom_instrumentation=true to identify manually instrumented spans
- HTTPXClientInstrumentor creates child "Model run" spans automatically

This creates the hierarchical "LLM call" → "Model run" span structure for evaluator LLM calls.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Added uipath.custom_instrumentation=true attribute to all evaluation
runtime spans so they can be identified in the LLMOps trace view:
- EvalSet span ("Evaluation Set Run")
- Evaluation span ("Evaluation")
- Evaluator span ("Evaluator: {name}")

This complements the LLM call spans which already have this attribute
set in the chat_completions methods.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Added "Model run" child span inside "LLM call" span for evaluator
LLM completions to match the proper OpenInference span hierarchy used
by agent runs.

Hierarchy is now:
- LLM call (with llm.* attributes)
  - Model run (with openinference.span.kind = "LLM")

This creates the proper hierarchical structure for the LLMOps trace view,
matching how agent LLM calls appear in the execution trace.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Removed the "Model run" child span and instead set all OpenInference
attributes directly on the "LLM call" span:
- openinference.span.kind = "LLM"
- llm.model_name
- llm.request.type
- llm.request.max_tokens
- llm.request.temperature
- uipath.custom_instrumentation = true

This creates a single LLM call span (not parent-child hierarchy) that
displays properly in the LLMOps trace view with all tool calls, input,
output, and metadata attached directly to it.
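A minimal sketch of building the attribute set listed above as a plain dict, so it can be checked without a tracer. The function name and the `"chat"` default for `llm.request.type` are assumptions; the attribute keys are the ones named in this commit.

```python
from typing import Any, Dict, Optional


def llm_call_span_attributes(
    model_name: str,
    request_type: str = "chat",
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
) -> Dict[str, Any]:
    """Attributes set directly on the single "LLM call" span (no "Model run" child)."""
    attributes: Dict[str, Any] = {
        "openinference.span.kind": "LLM",
        "llm.model_name": model_name,
        "llm.request.type": request_type,
        # Marks spans created manually rather than by an auto-instrumentor.
        "uipath.custom_instrumentation": True,
    }
    # None is not a valid OpenTelemetry attribute value, so unset options are omitted.
    if max_tokens is not None:
        attributes["llm.request.max_tokens"] = max_tokens
    if temperature is not None:
        attributes["llm.request.temperature"] = temperature
    return attributes
```

With the OpenTelemetry Python API, the dict could be attached when the span is opened, e.g. `tracer.start_as_current_span("LLM call", attributes=llm_call_span_attributes("gpt-4o", max_tokens=1024))`, where the model name is only illustrative.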

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Added comprehensive OpenInference semantic attributes to LLM call spans:

**Input attributes:**
- input.value: JSON serialized input messages
- llm.input_messages.{i}.message.role: Role for each message
- llm.input_messages.{i}.message.content: Content for each message

**Output attributes:**
- output.value: JSON serialized output with content and tool calls
- llm.output_messages.0.message.role: Output message role
- llm.output_messages.0.message.content: Output message content
- llm.output_messages.0.message.tool_calls.{i}.tool_call.*: Tool call details

**Token usage attributes:**
- llm.token_count.prompt: Prompt tokens used
- llm.token_count.completion: Completion tokens used
- llm.token_count.total: Total tokens used

This ensures the LLMOps trace view displays complete information
including tool calls, input/output data, and token usage metrics.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Removed the intermediate message structure attributes that were creating
nested "0 message:" sections in the Input and Output views:

**Removed attributes:**
- llm.input_messages.{i}.message.role
- llm.input_messages.{i}.message.content
- llm.output_messages.0.message.role
- llm.output_messages.0.message.content
- llm.output_messages.0.message.tool_calls.{i}.tool_call.*

**Kept attributes:**
- input.value: Contains complete input messages as JSON
- output.value: Contains complete output (role, content, tool_calls) as JSON
- llm.token_count.*: Token usage metrics

This simplifies the trace view by removing the nested message abstraction
and showing data directly in the input.value and output.value fields.
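The attributes kept by this commit can be derived from the request messages and the completion response in one pass. This sketch assumes an OpenAI-style response shape (`choices[0].message`, `usage.prompt_tokens`, and so on), which this PR does not confirm; `llm_io_attributes` is a hypothetical name.

```python
import json
from typing import Any, Dict, List


def llm_io_attributes(messages: List[Dict[str, Any]], response: Dict[str, Any]) -> Dict[str, Any]:
    """Flat input/output/token attributes, with no per-message llm.*_messages.{i} keys."""
    # Assumed OpenAI-style chat completion response.
    message = response["choices"][0]["message"]
    output = {
        "role": message.get("role", "assistant"),
        "content": message.get("content"),
        "tool_calls": message.get("tool_calls") or [],
    }
    attributes: Dict[str, Any] = {
        "input.value": json.dumps(messages),
        "output.value": json.dumps(output),
    }
    usage = response.get("usage") or {}
    for attr, key in (
        ("llm.token_count.prompt", "prompt_tokens"),
        ("llm.token_count.completion", "completion_tokens"),
        ("llm.token_count.total", "total_tokens"),
    ):
        if key in usage:
            attributes[attr] = usage[key]
    return attributes
```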

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Removed the intermediate "root" span from agent execution traces by
filtering it out in _get_and_clear_execution_data method.

This makes "Agent run" appear as a direct child of "Evaluation" span
instead of being nested under "root", creating a cleaner span hierarchy:

Before:
- Evaluation
  - root
    - Agent run

After:
- Evaluation
  - Agent run

The root span was created by the agent runtime but added unnecessary
nesting in the evaluation trace view.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Extended span filtering to also remove HTTP instrumentation spans
(POST and GET) that add noise to the trace view.

Filtered spans:
- "root": Makes Agent run a direct child of Evaluation
- "POST": HTTP request spans from httpx instrumentation
- "GET": HTTP request spans from httpx instrumentation

This creates a cleaner trace hierarchy by removing low-level HTTP
spans that don't provide meaningful information in the evaluation
context. The LLM call spans already contain all relevant information
about the requests.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
**Input/Output Display:**
- For single-user-message evaluator calls, display content string directly
- For multi-turn conversations, display full messages array as JSON
- For outputs without tool calls, display content string directly
- For outputs with tool calls, display full message object with tool calls

This removes the nested array index display (e.g., "{} 0:") that appears
when JSON arrays are shown in the trace UI.

**Telemetry:**
- Keep HTTPXClientInstrumentor enabled to send POST/GET spans to AppInsights
- POST/GET/root spans are filtered in _get_and_clear_execution_data
- All spans still sent to AppInsights for telemetry, just hidden from eval UI

Before (Input):
  input.value
    {} 0:
      role: user
      content: ...

After (Input):
  input.value
    As an expert evaluator, analyze...
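The display rules above reduce to two small formatting functions. A sketch, assuming OpenAI-style message dicts with `role`, `content`, and optional `tool_calls`; the function names are hypothetical.

```python
import json
from typing import Any, Dict, List


def format_input_value(messages: List[Dict[str, Any]]) -> str:
    """One user message with string content -> that string; otherwise the JSON array."""
    if len(messages) == 1 and messages[0].get("role") == "user":
        content = messages[0].get("content")
        if isinstance(content, str):
            return content
    return json.dumps(messages)


def format_output_value(message: Dict[str, Any]) -> str:
    """No tool calls -> the content string; with tool calls -> the whole message as JSON."""
    if not message.get("tool_calls"):
        return message.get("content") or ""
    return json.dumps(message)
```

Returning a bare string in the common single-prompt case is what avoids the `{} 0:` array-index rendering shown in the Before example.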

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…ring

**Removed HTTPXClientInstrumentor:**
- LLM call spans are already manually created in _llm_gateway_service.py
- HTTPXClientInstrumentor was creating redundant POST/GET spans
- Removed dependency on opentelemetry-instrumentation-httpx

**Fixed Span Filtering:**
- Moved filtering from _get_and_clear_execution_data to ExecutionSpanExporter.export()
- This ensures POST/GET/root spans are filtered at export time
- Filtered spans never stored, so they won't appear in evaluation traces
- Other exporters (AppInsights, Studio Web, JSONL) still receive all spans

The filtering now works correctly because it happens when spans are exported
to ExecutionSpanExporter, not after retrieval. This prevents POST/GET/root
spans from appearing in the evaluation UI while still allowing them to be
sent to other exporters for telemetry.
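The point of this commit is where the filter sits: at export time, in front of one exporter only. A dependency-free sketch of that arrangement; it deliberately does not subclass OpenTelemetry's `SpanExporter`, and `SpanRecord`, `InMemoryStore`, and `FilteringExporter` are hypothetical stand-ins that only assume spans expose a `name`.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable, List, Sequence

HIDDEN_SPAN_NAMES = frozenset({"root", "POST", "GET"})


@dataclass(frozen=True)
class SpanRecord:
    """Minimal stand-in for an exported span; only the name matters here."""
    name: str


@dataclass
class InMemoryStore:
    """Stand-in for any exporter: the evaluation trace store or a telemetry sink."""
    spans: List[Any] = field(default_factory=list)

    def export(self, spans: Sequence[Any]) -> None:
        self.spans.extend(spans)


class FilteringExporter:
    """Drops hidden spans before they reach the wrapped exporter."""

    def __init__(self, delegate: Any, hidden: Iterable[str] = HIDDEN_SPAN_NAMES) -> None:
        self._delegate = delegate
        self._hidden = frozenset(hidden)

    def export(self, spans: Sequence[Any]) -> None:
        # Filtering at export time means hidden spans are never stored, so
        # readers of the store never have to strip them after retrieval.
        self._delegate.export([s for s in spans if s.name not in self._hidden])
```

Only the evaluation store is wrapped; exporters registered without the wrapper still receive every span in the batch.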

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
**Reparenting Logic:**
- LiveTrackingSpanProcessor now tracks filtered spans (root/POST/GET)
- When a span's parent was filtered, it's reparented to its grandparent
- This makes "Agent run" a direct child of "Evaluation" in Studio Web traces

**Implementation:**
- Added `parent_id_override` parameter to `LlmOpsHttpExporter.upsert_span()`
- LiveTrackingSpanProcessor filters root/POST/GET and tracks their parents
- When upserting spans, checks if parent was filtered and overrides ParentId
- ExecutionSpanExporter simplified (reparenting handled by LiveTracking)

**Result:**
Before: Evaluation → root → Agent run (root filtered, broken parent)
After: Evaluation → Agent run (properly reparented)

Filtered spans (root/POST/GET) are still sent to AppInsights for telemetry
but excluded from Studio Web and evaluation traces with correct parent links.
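The reparenting rule above can be sketched as a walk from each kept span up through any filtered ancestors. This version generalizes "grandparent" to "nearest non-filtered ancestor", which only matters if filtered spans are nested, and it assumes spans are `(span_id, parent_id, name)` tuples forming a tree. In the PR, the resulting parent ID is what would be passed as `parent_id_override` to `LlmOpsHttpExporter.upsert_span()`, whose full signature is not shown here; `reparent_visible_spans` is hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Span = Tuple[str, Optional[str], str]  # (span_id, parent_id, name)


def reparent_visible_spans(
    spans: Iterable[Span], hidden_names: Iterable[str] = ("root",)
) -> List[Span]:
    """Drop hidden spans and point each survivor at its nearest visible ancestor."""
    spans = list(spans)
    hidden = set(hidden_names)
    # span_id -> parent_id for every span that will be filtered out.
    filtered_parents: Dict[str, Optional[str]] = {
        span_id: parent_id for span_id, parent_id, name in spans if name in hidden
    }
    visible: List[Span] = []
    for span_id, parent_id, name in spans:
        if name in hidden:
            continue
        # Walk up through filtered ancestors (e.g. root -> POST chains); assumes no cycles.
        while parent_id in filtered_parents:
            parent_id = filtered_parents[parent_id]
        visible.append((span_id, parent_id, name))
    return visible
```

For the hierarchy in this commit, `Evaluation → root → Agent run` becomes `Evaluation → Agent run`, and descendants of Agent run keep their existing parents.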

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Filter root spans in _export_agent_execution_spans and track their parents
- Update LiveTrackingSpanProcessor's filtered_parents mapping for reparenting
- Pass LiveTrackingSpanProcessor to StudioWebProgressReporter for access
- Simplified LiveTrackingSpanProcessor to only filter "root" (not POST/GET)
- Updated ExecutionSpanExporter docstring to reflect reparenting approach

This creates cleaner evaluation traces by making Agent run a direct child
of Evaluation span instead of root span.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Two critical fixes:

1. Agent run spans being dropped:
   - Manually reparent spans in _export_agent_execution_spans before export
   - Use upsert_span with parent_id_override for each span whose parent was filtered
   - This ensures Agent run (child of root) is reparented to the Evaluation span

2. Input/output appearing in metadata instead of inputs:
   - Preserve input/output values from input.value/output.value attributes
   - Prevent _map_llm_call_attributes from overwriting with empty llm.input_messages
   - Use llm.input_messages/llm.output_messages only as fallback
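Fix 2 is a precedence rule: use `input.value`/`output.value` when present and non-empty, and consult the `llm.input_messages`/`llm.output_messages` attributes only otherwise. A sketch with a hypothetical `resolve_span_io` helper; how the fallback flattens the `llm.*_messages.*` keys (here, a sorted JSON object) is an assumption, not the behavior of `_map_llm_call_attributes`.

```python
import json
from typing import Any, Mapping, Optional, Tuple


def resolve_span_io(attributes: Mapping[str, Any]) -> Tuple[Optional[str], Optional[str]]:
    """Prefer input.value/output.value; fall back to llm.*_messages only when they are absent."""

    def fallback(prefix: str) -> Optional[str]:
        # Hypothetical fallback: gather the flattened llm.<prefix>.* keys, if any.
        found = {k: v for k, v in attributes.items() if k.startswith(prefix + ".")}
        return json.dumps(found, sort_keys=True) if found else None

    # An existing, non-empty input.value/output.value is never overwritten.
    input_value = attributes.get("input.value") or fallback("llm.input_messages")
    output_value = attributes.get("output.value") or fallback("llm.output_messages")
    return input_value, output_value
```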

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
AAgnihotry closed this on Jan 23, 2026