feat: add message chunking to sdk #80
base: develop
Conversation
Coverage Report
Pull request overview
This pull request centralizes message chunking logic into the SDK to eliminate duplication across worker-cli and orb-discovery. The implementation provides a configurable chunk size parameter with a default of 3.0 MB and uses greedy bin-packing to split entities into appropriately-sized chunks for gRPC ingestion.
Changes:
- Added chunking module with `create_message_chunks()` and `estimate_message_size()` functions
- Exported chunking functions from the SDK's public API
- Added comprehensive test suite covering edge cases including empty lists, single/multiple chunks, custom chunk sizes, order preservation, and large entities
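For illustration, a minimal usage sketch of the new function (hedged: the `DiodeClient`/`ingest()` pattern and the `Device`/`Entity` constructors are assumed from the SDK's existing usage, not defined by this PR):

```python
from netboxlabs.diode.sdk import DiodeClient, create_message_chunks
from netboxlabs.diode.sdk.ingester import Device, Entity

# Build a large batch of entities (placeholder data).
entities = [Entity(device=Device(name=f"device-{i}")) for i in range(10_000)]

with DiodeClient(
    target="grpc://localhost:8080/diode",  # placeholder target
    app_name="example-app",
    app_version="0.0.1",
) as client:
    # Split the batch into chunks under the default 3.0 MB limit and
    # ingest each chunk as a separate gRPC request.
    for chunk in create_message_chunks(entities):
        client.ingest(entities=chunk)
```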
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| netboxlabs/diode/sdk/chunking.py | New module implementing greedy bin-packing chunking algorithm with size estimation |
| netboxlabs/diode/sdk/__init__.py | Exports chunking functions to the SDK's public API |
| tests/test_chunking.py | Comprehensive test suite with 10 test cases covering various chunking scenarios |
mfiedorowicz left a comment:
Thanks @ldrozdz93, generally it's advisable to create a GitHub issue first so we can track, triage and prioritise issues properly. Nevertheless, I see this as valuable; the main ask here is to document chunking in the README. Additionally, we would like functional parity in the Diode Go SDK too, so could you please add a feature request issue there?
From the `create_message_chunks()` docstring in `netboxlabs/diode/sdk/chunking.py`:

    This function chunks entities to ensure each chunk stays under the specified
    size limit. It uses a greedy bin-packing algorithm that accumulates entities
    until adding the next entity would exceed the limit, then starts a new chunk.

    The default chunk size of 3.0 MB provides a safe margin below the gRPC 4 MB
    message size limit, accounting for protobuf serialization overhead.

    Args:
        entities: Iterable of Entity protobuf messages to chunk
        max_chunk_size_mb: Maximum chunk size in MB (default 3.0)

    Returns:
        List of entity chunks, each under max_chunk_size_mb. Returns at least
        one chunk even if the input is empty.

    Examples:
        >>> entities = [entity1, entity2, entity3, ...]
        >>> chunks = create_message_chunks(entities)
        >>> for chunk in chunks:
        ...     client.ingest(chunk)
        >>> # Use a custom chunk size
        >>> chunks = create_message_chunks(entities, max_chunk_size_mb=3.5)
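For reference, the greedy accumulation the docstring describes could look roughly like this (an illustrative sketch, not the PR's actual code; it assumes per-entity sizes come from the protobuf `ByteSize()` method):

```python
def create_message_chunks_sketch(entities, max_chunk_size_mb=3.0):
    """Greedy bin-packing: accumulate entities until the next one would push
    the current chunk over the limit, then start a new chunk."""
    max_bytes = int(max_chunk_size_mb * 1024 * 1024)
    chunks, current, current_size = [], [], 0
    for entity in entities:
        entity_size = entity.ByteSize()  # serialized protobuf size in bytes
        if current and current_size + entity_size > max_bytes:
            chunks.append(current)
            current, current_size = [], 0
        current.append(entity)
        current_size += entity_size
    chunks.append(current)  # always return at least one chunk, even when empty
    return chunks
```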
I'd like to see some basic documentation of chunking and how to use it in the README as well
Please check
Mind the failing linting issues:
Thanks for the feedback @mfiedorowicz. I'm aware this is kind of a shortcut. A customer is actively blocked by this and I wanted to make it quick. I'll create a FR for the Go SDK and add docs.
README.md (Outdated)
    # Decide whether chunking is needed
    if size_mb > 3.0:
        chunks = create_message_chunks(entities)
Then, how to ingest these chunks?
A typo. Check now.
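For readers following the thread, the corrected README flow presumably ends by ingesting each chunk. A hedged sketch (the exact signature and units of `estimate_message_size()` are assumptions inferred from the snippet above):

```python
from netboxlabs.diode.sdk import (
    DiodeClient,
    create_message_chunks,
    estimate_message_size,
)

def ingest_with_chunking(client: DiodeClient, entities: list) -> None:
    """Ingest a batch, splitting it into chunks only when it is too large."""
    size_mb = estimate_message_size(entities)  # assumed to report size in MB
    # Decide whether chunking is needed
    if size_mb > 3.0:
        for chunk in create_message_chunks(entities):
            client.ingest(entities=chunk)
    else:
        client.ingest(entities=entities)
```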
@mfiedorowicz ready for review
@mfiedorowicz I've created a feature request for diode-sdk-go. Let me know if that's not what you meant.
Problem
Message chunking logic was duplicated across worker-cli and orb-discovery, using a heuristic algorithm. It turned out not to be sufficient for a customer's dataset.
Solution
Centralized chunking logic in the SDK with a configurable chunk size parameter (default: 3.0 MB). The implementation uses greedy bin-packing that accumulates entities until adding the next entity would exceed the size limit, then starts a new chunk.
Changes
- Added `netboxlabs/diode/sdk/chunking.py` with `create_message_chunks()` and `estimate_message_size()`
- Added a test suite (`tests/test_chunking.py`) with 10 test cases covering edge cases
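To illustrate the kind of behaviour the test suite exercises, two sketch tests (illustrative only, not the actual cases in `tests/test_chunking.py`; the `Device`/`Entity` constructors are assumed from the SDK's ingester module):

```python
from netboxlabs.diode.sdk import create_message_chunks
from netboxlabs.diode.sdk.ingester import Device, Entity


def test_empty_input_returns_a_single_empty_chunk():
    # The function always returns at least one chunk, even for empty input.
    assert create_message_chunks([]) == [[]]


def test_chunks_preserve_order_and_cover_all_entities():
    entities = [Entity(device=Device(name=f"device-{i}")) for i in range(100)]
    # A tiny limit forces the input to be split into multiple chunks.
    chunks = create_message_chunks(entities, max_chunk_size_mb=0.001)

    assert len(chunks) > 1
    # Every entity lands in exactly one chunk, in the original order.
    assert [e for chunk in chunks for e in chunk] == entities
```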