
Conversation

@ChALkeR commented Feb 2, 2026

See nodejs/node#61609

This improves Buffer#toString('hex') performance by 2-4x.

In local tests on a MacBook, this improves .toString('hex') throughput from ~2.24 GiB/s to ~9.53 GiB/s on ~80 KiB chunks.
For comparison, Uint8Array#toHex() shows 3.28 GiB/s.

Benchmark CI: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1790/console

10:57:55                                                 confidence improvement accuracy (*)   (**)  (***)
10:57:55 buffers/buffer-hex-decode.js n=1000000 len=1024        ***    170.96 %       ±1.02% ±1.38% ±1.82%
10:57:55 buffers/buffer-hex-decode.js n=1000000 len=64          ***     53.49 %       ±1.65% ±2.20% ±2.88%
10:57:55 buffers/buffer-hex-encode.js n=1000000 len=1024                 0.13 %       ±0.19% ±0.25% ±0.32%
10:57:55 buffers/buffer-hex-encode.js n=1000000 len=64                  -0.14 %       ±0.85% ±1.13% ±1.47%
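
For context, the change is to compute each hex digit arithmetically instead of going through a lookup table. A minimal sketch of the idea (simplified; not the exact nbytes code):

#include <cstddef>
#include <cstdint>

// Map a 4-bit value to its lowercase hex digit without a lookup table:
// 39 == 'a' - '0' - 10, so values 10..15 land on 'a'..'f'.
inline char nibble(uint8_t x) { return x + '0' + ((x > 9) * 39); }

// Encode len bytes into 2 * len hex characters (dst is not NUL-terminated).
void hex_encode(const uint8_t* src, size_t len, char* dst) {
  for (size_t i = 0; i < len; ++i) {
    dst[2 * i] = nibble(src[i] >> 4);
    dst[2 * i + 1] = nibble(src[i] & 0xF);
  }
}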

lemire commented Feb 2, 2026

Excellent. For some analysis, please see this article on my blog:

Daniel Lemire, "Converting data to hexadecimal outputs quickly," in Daniel Lemire's blog, February 2, 2026, https://lemire.me/blog/2026/02/02/converting-data-to-hexadecimal-outputs-quickly/.

ChALkeR commented Feb 2, 2026

This is not the fastest approach yet; more optimizations should be possible. At this point, I estimate that the most significant part of the remaining slowdown is in the wrapping code.

@lemire pls s/Skovorora/Skovoroda/g 🙃

lemire commented Feb 2, 2026

@ChALkeR I'll update your name. Sorry.

What you have here is great and I don't recommend doing more.

The better place to get fancier would be simdutf.

simdutf/simdutf#565

We do invite contributions.

src/nbytes.cpp Outdated
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1};

inline char nibble(uint8_t x) { return x + '0' + ((x > 9) * 39); }
Member


Suggested change
inline char nibble(uint8_t x) { return x + '0' + ((x > 9) * 39); }
inline constexpr char nibble(uint8_t x) {
return x + '0' + ((x > 9) * 39);
}
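
For the record, the magic constant here is 39 == 'a' - '0' - 10: the extra offset that moves values 10..15 from the digit range into 'a'..'f'. A compile-time sanity check (a standalone sketch, not part of the suggestion):

static_assert('a' - '0' - 10 == 39, "lowercase hex offset");
static_assert(10 + '0' + 39 == 'a', "nibble(10) == 'a'");
static_assert(15 + '0' + 39 == 'f', "nibble(15) == 'f'");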

Member


@ChALkeR I think that one might get results just as good with something simpler:

char nibble(uint8_t x) {
    auto v = x + '0';
    if (x > 9) {
        v += 'a' - '0' - 10;
    }
    return v;
}
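
A quick standalone check (hypothetical, not part of the PR) that this matches the branchless version for all 16 nibble values:

#include <cassert>
#include <cstdint>

inline char nibble_branchless(uint8_t x) { return x + '0' + ((x > 9) * 39); }

inline char nibble_branchy(uint8_t x) {
  int v = x + '0';
  if (x > 9) {
    v += 'a' - '0' - 10;
  }
  return static_cast<char>(v);
}

int main() {
  for (uint8_t x = 0; x < 16; ++x) {
    assert(nibble_branchless(x) == nibble_branchy(x));
  }
  return 0;
}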

Member Author

@ChALkeR Feb 2, 2026


@lemire Wow. Again, contrary to the usual intuition about branches, this actually seems slightly (~5%?) faster in a quick check.
I'll benchmark this on Linux in CI.

Member


contrary to the usual intuition about branches,

You could have used the ternary operator. Ah Ah.

Member Author


@lemire this seems worse on Linux CI:
nodejs/node#61609 (comment)

                                                confidence improvement accuracy (*)   (**)  (***)
buffers/buffer-hex-decode.js n=1000000 len=1024        ***     -2.03 %       ±0.46% ±0.62% ±0.82%
buffers/buffer-hex-decode.js n=1000000 len=64          ***      5.90 %       ±1.62% ±2.16% ±2.82%
buffers/buffer-hex-encode.js n=1000000 len=1024                -0.15 %       ±0.17% ±0.22% ±0.29%
buffers/buffer-hex-encode.js n=1000000 len=64                   0.45 %       ±1.02% ±1.36% ±1.77%

This is compared to main, not to the branchless implementation.

Member


@ChALkeR

I think that's GCC doing its thing. Let me see.

src/nbytes.cpp Outdated
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1};

inline char nibble(uint8_t x) { return x + '0' + ((x > 9) * 39); }
Member


Can you add a comment to this function and explain what it does?

Member


I think that if you express it as I did above, it does not require an explanation anymore.

Co-authored-by: Yagiz Nizipli <yagiz@nizipli.com>
@aduh95 changed the title from "src: use arithmetic HexEncode instead of lookups" to "fix: use arithmetic HexEncode instead of lookups" on Feb 2, 2026
lemire commented Feb 2, 2026

@ChALkeR

Try

inline char nibble(uint8_t x) {
  uint8_t add = (x >= 10) ? ('a' - 10) : '0';
  return x + add;
}

GCC seems to like it better than what I originally proposed.

ChALkeR commented Feb 2, 2026

@lemire Seems slightly worse on Linux (gcc?) and slightly better on macOS (clang?):
nodejs/node#61609 (comment)

Compared to main:

01:03:52                                                 confidence improvement accuracy (*)   (**)  (***)
01:03:52 buffers/buffer-hex-decode.js n=1000000 len=1024        ***    167.49 %       ±1.45% ±1.94% ±2.56%
01:03:52 buffers/buffer-hex-decode.js n=1000000 len=64          ***     49.47 %       ±1.94% ±2.59% ±3.38%
01:03:52 buffers/buffer-hex-encode.js n=1000000 len=1024                 0.15 %       ±0.17% ±0.22% ±0.29%
01:03:52 buffers/buffer-hex-encode.js n=1000000 len=64                   0.06 %       ±0.99% ±1.32% ±1.72%

lemire commented Feb 2, 2026

@ChALkeR

Seems slightly worse on Linux...

What I see is that my last proposal is better than your original version under GCC, but slightly worse with LLVM. However, both are autovectorized in a worthwhile manner, which makes your PR safe and performant.

Also, the code is easier to understand.
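
For anyone who wants to reproduce that locally, a hypothetical standalone check (not the CI setup): drop the nibble variant into a plain encode loop and ask the compiler for vectorization remarks, e.g. with g++ -O3 -fopt-info-vec or clang++ -O3 -Rpass=loop-vectorize.

#include <cstddef>
#include <cstdint>

inline char nibble(uint8_t x) {
  uint8_t add = (x >= 10) ? ('a' - 10) : '0';
  return x + add;
}

// Same loop shape as the sketch in the PR description; per the comment
// above, both GCC and Clang autovectorize this kind of loop.
void hex_encode(const uint8_t* src, size_t len, char* dst) {
  for (size_t i = 0; i < len; ++i) {
    dst[2 * i] = nibble(src[i] >> 4);
    dst[2 * i + 1] = nibble(src[i] & 0xF);
  }
}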
