ldrozdz93 commented Jan 13, 2026

Problem

Message chunking logic was duplicated across worker-cli and orb-discovery, and both copies relied on a heuristic algorithm. That heuristic proved insufficient for a customer's dataset.

Solution

Centralized chunking logic in the SDK with a configurable chunk size parameter (default: 3.0 MB). The implementation uses greedy bin-packing that accumulates entities until adding the next entity would exceed the size limit, then starts a new chunk.
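
The greedy strategy described above can be sketched in a few lines. This is a minimal illustration, not the code in chunking.py: `greedy_chunks`, `size_of`, and `max_chunk_bytes` are hypothetical names, sizes are plain byte counts rather than megabytes, and byte strings stand in for Entity messages. Placing an item that is larger than the limit alone in its own chunk is an assumption about how such a case might be handled.

```python
from collections.abc import Callable, Iterable
from typing import TypeVar

T = TypeVar("T")


def greedy_chunks(
    items: Iterable[T],
    size_of: Callable[[T], int],
    max_chunk_bytes: int,
) -> list[list[T]]:
    """Group items in order so each chunk's summed size stays within the limit."""
    chunks: list[list[T]] = []
    current: list[T] = []
    current_size = 0

    for item in items:
        item_size = size_of(item)
        # Close the current chunk when the next item would push it past the limit.
        if current and current_size + item_size > max_chunk_bytes:
            chunks.append(current)
            current, current_size = [], 0
        # Assumption: an item larger than the limit still gets a chunk of its own.
        current.append(item)
        current_size += item_size

    # Flush the last chunk; an empty input yields a single empty chunk.
    if current or not chunks:
        chunks.append(current)
    return chunks


# Byte strings stand in for serialized entities; len() stands in for the size estimate.
payloads = [b"a" * 4, b"b" * 4, b"c" * 4, b"d" * 20]
result = greedy_chunks(payloads, len, max_chunk_bytes=10)
print([len(chunk) for chunk in result])
```

Because items are never reordered, flattening the chunks reproduces the input sequence, which is the order-preservation property the test suite is said to cover.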

Changes

  • Added netboxlabs/diode/sdk/chunking.py with create_message_chunks() and estimate_message_size()
  • Added comprehensive test suite (tests/test_chunking.py) with 10 test cases covering edge cases

github-actions bot commented Jan 13, 2026

Coverage Report

Package: /opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/netboxlabs/diode/sdk

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| chunking.py | 33 | 1 | 97% | 112 |
| client.py | 509 | 40 | 92% | 165–166, 185–187, 190–193, 482, 548, 553, 557, 643–648, 693–695, 700, 705, 710, 715, 725, 729, 733, 755, 776, 778, 852, 883, 891, 933, 969, 984–985, 992–993 |
| exceptions.py | 44 | 3 | 93% | 69, 82–83 |
| TOTAL | 601 | 44 | 93% | |

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 185 | 0 💤 | 0 ❌ | 0 🔥 | 1.589s ⏱️ |

Contributor

Copilot AI left a comment

Pull request overview

This pull request centralizes message chunking logic into the SDK to eliminate duplication across worker-cli and orb-discovery. The implementation provides a configurable chunk size parameter with a default of 3.0 MB and uses greedy bin-packing to split entities into appropriately-sized chunks for gRPC ingestion.

Changes:

  • Added chunking module with create_message_chunks() and estimate_message_size() functions
  • Exported chunking functions from the SDK's public API
  • Added comprehensive test suite covering edge cases including empty lists, single/multiple chunks, custom chunk sizes, order preservation, and large entities

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| netboxlabs/diode/sdk/chunking.py | New module implementing greedy bin-packing chunking algorithm with size estimation |
| netboxlabs/diode/sdk/__init__.py | Exports chunking functions to SDK's public API |
| tests/test_chunking.py | Comprehensive test suite with 10 test cases covering various chunking scenarios |

Member

@mfiedorowicz mfiedorowicz left a comment

Thanks @ldrozdz93. In general, it's advisable to create a GitHub issue first so we can track, triage, and prioritise issues properly. Nevertheless, I see this as valuable; the main ask here is to document chunking in the README. Additionally, we would like functional parity in the Diode Go SDK too, so could you please open a feature request issue there?

Comment on lines +20 to +42
    This function chunks entities to ensure each chunk stays under the specified
    size limit. It uses a greedy bin-packing algorithm that accumulates entities
    until adding the next entity would exceed the limit, then starts a new chunk.
    The default chunk size of 3.0 MB provides a safe margin below the gRPC 4 MB
    message size limit, accounting for protobuf serialization overhead.

    Args:
        entities: Iterable of Entity protobuf messages to chunk
        max_chunk_size_mb: Maximum chunk size in MB (default 3.0)

    Returns:
        List of entity chunks, each under max_chunk_size_mb. Returns at least
        one chunk even if the input is empty.

    Examples:
        >>> entities = [entity1, entity2, entity3, ...]
        >>> chunks = create_message_chunks(entities)
        >>> for chunk in chunks:
        ...     client.ingest(chunk)

        >>> # Use a custom chunk size
        >>> chunks = create_message_chunks(entities, max_chunk_size_mb=3.5)
Member

I'd like to see some basic documentation of chunking and how to use it in the README as well

Author

Please check

@mfiedorowicz
Member

Please address the failing lint checks:

Error: netboxlabs/diode/sdk/chunking.py:3:1: D213 Multi-line docstring summary should start at the second line
Error: netboxlabs/diode/sdk/chunking.py:10:1: UP035 Import from `collections.abc` instead: `Iterable`
Error: netboxlabs/diode/sdk/chunking.py:18:5: D213 Multi-line docstring summary should start at the second line
Error: netboxlabs/diode/sdk/chunking.py:35:5: D413 Missing blank line after last section ("Examples")
Error: netboxlabs/diode/sdk/chunking.py:88:5: D213 Multi-line docstring summary should start at the second line
Error: netboxlabs/diode/sdk/chunking.py:99:5: D413 Missing blank line after last section ("Examples")
Error: tests/test_chunking.py:182:5: D213 Multi-line docstring summary should start at the second line
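
For reference, the three rule codes above (D213, D413, UP035) can all be satisfied by a layout like the following. This is a generic sketch, not the contents of chunking.py; `total_length` is a hypothetical function chosen only to carry the docstring.

```python
"""
Illustrative module docstring whose summary starts on the second line (D213).

The body follows after a blank line.
"""

from collections.abc import Iterable  # UP035: import from collections.abc, not typing


def total_length(values: Iterable[str]) -> int:
    """
    Return the combined length of all strings in values.

    Args:
        values: Strings to measure.

    Returns:
        Sum of the lengths of the strings.

    Examples:
        >>> total_length(["ab", "cde"])
        5

    """
    # The blank line before the closing quotes above is what D413 asks for.
    return sum(len(value) for value in values)
```

The summary line moves below the opening quotes for D213, and the last section ("Examples") ends with a blank line for D413.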

@ldrozdz93
Author

ldrozdz93 commented Jan 14, 2026

> Thanks @ldrozdz93. In general, it's advisable to create a GitHub issue first so we can track, triage, and prioritise issues properly. Nevertheless, I see this as valuable; the main ask here is to document chunking in the README. Additionally, we would like functional parity in the Diode Go SDK too, so could you please open a feature request issue there?

Thanks for the feedback @mfiedorowicz. I'm aware this is something of a shortcut. A customer is actively blocked by this, and I wanted to move quickly.

I'll create a feature request for the Go SDK and add docs.

README.md Outdated

# Decide whether chunking is needed
if size_mb > 3.0:
    chunks = create_message_chunks(entities)
Contributor

Then how are these chunks ingested?

Author

That was a typo. Please check now.
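
The question above comes down to one ingest call per chunk, as in the `client.ingest(chunk)` loop from the docstring example in this PR. A minimal sketch of that loop follows. `ingest_chunks`, `Ingester`, and `RecordingClient` are hypothetical names used only so the example runs without the SDK or a server; skipping empty chunks is an assumption, not documented behaviour, and real client construction is omitted.

```python
from collections.abc import Iterable, Sequence
from typing import Any, Protocol


class Ingester(Protocol):
    """Anything exposing ingest(); stands in for the SDK client here."""

    def ingest(self, entities: Sequence[Any]) -> Any: ...


def ingest_chunks(client: Ingester, chunks: Iterable[Sequence[Any]]) -> int:
    """Send each non-empty chunk as its own request and return how many were sent."""
    sent = 0
    for chunk in chunks:
        if not chunk:
            # Assumption: skip the empty chunk that an empty input would produce.
            continue
        client.ingest(chunk)
        sent += 1
    return sent


class RecordingClient:
    """Hypothetical test double that records what it was asked to ingest."""

    def __init__(self) -> None:
        self.calls: list[list[Any]] = []

    def ingest(self, entities: Sequence[Any]) -> None:
        self.calls.append(list(entities))


client = RecordingClient()
sent = ingest_chunks(client, [["device-1", "device-2"], ["device-3"], []])
print(sent, client.calls)
```

Each chunk becomes a separate request, so a failure in one chunk does not have to block the others; how errors are reported back is left to the real client.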

@ldrozdz93
Author

@mfiedorowicz ready for review

@ldrozdz93
Author

@mfiedorowicz I've created a feature request for diode-sdk-go. Let me know if that's not what you meant.
