Skip to content

Conversation

@cruessler
Copy link
Contributor

This PR adds the sha256 feature to gix-commitgraph. I opened it as a draft, so we have a space where we can discuss how to best approach testing this new feature.

This PR was triggered by a comment in #2359.

@Byron
Copy link
Member

Byron commented Jan 13, 2026

Thanks a lot, I am looking forward to sorting this out!

There is also #2378, hopefully done in the next couple of days (it might conflict).

@Byron Byron mentioned this pull request Jan 14, 2026
10 tasks
@Byron
Copy link
Member

Byron commented Jan 14, 2026

Thanks a lot for starting this! This crate is special as it's the first one that parses a file-format that changes depending on the hash kind.

It seems like one commit currently in the gix-object PR would be useful here as well, in case you want to bring it up in its own PR.

@cruessler
Copy link
Contributor Author

Assuming you’re referring to the one adding Kind::all (6d950f0), I’ll extract that addition into its own PR!

Apart from that, my plan is to go through the tests and, for every tests that currently uses sha1, add a corresponding one that uses sha256. Is that the way to go? Are there other things that would need to be done in this PR?

@Byron
Copy link
Member

Byron commented Jan 14, 2026

Assuming you’re referring to the one adding Kind::all (6d950f0), I’ll extract that addition into its own PR!

Yes, that's the one, thanks 🙏.

Apart from that, my plan is to go through the tests and, for every tests that currently uses sha1, add a corresponding one that uses sha256. Is that the way to go? Are there other things that would need to be done in this PR?

Sorry for the confusion, that's the way. The only thing tests have to do is to actively set all hashes using feature toggles, and they should just set sha1/sha256 while they are at it. Once a crate has been converted, the tests should automatically test both hashes, and make sure they are compiled in.

@cruessler cruessler mentioned this pull request Jan 14, 2026
@cruessler cruessler force-pushed the add-sha-256-to-gix-commitgraph branch from bde9c33 to 7bc1409 Compare January 18, 2026 13:53
@cruessler
Copy link
Contributor Author

I added fixtures for SHA256 hashes to gix-testtools and changed gix-commitgraph tests to run for all available hash kinds (those depend on the feature flags passed to gix-commitgraph). This seems to have worked on my machine, but I’m not sure I got the logic in gix-testtools right (any pointers would be much appreciated!). I also added gix-commitgraph to unit-test in justfile, and now I want to see what feedback CI has to give. :-)

@cruessler
Copy link
Contributor Author

I think I’ve now gotten to a point where I need some feedback in order to be able to continue. (CI is still running, so I also might need to address any issue that comes up there.)

Specifically, I’ve got the following questions:

  • The PR runs tests for both hashes through cargo nextest run -p gix-commitgraph --features sha256 --no-fail-fast. Is that correct?
  • Last time CI ran it did not complain about any of the generated archives in gix-commitgraph, but when I delete both generated-archives as well as generated-do-not-edit in order to regenerate the archives, git status shows that most files have changed their size by 512 bytes and that one file has been both deleted and modified. (I’m running cargo test --features sha256 on Ubuntu 25.10.) What’s the best way to debug this?
  • Is there anything else I would need to change or include before I can mark this PR as ready?
# output of `gix status` inside `gix-commitgraph/tests/fixtures`
modified:   generated-archives/generation_number_overflow.tar
modified:   generated-archives/generation_number_overflow_sha256.tar
modified:   generated-archives/octopus_merges.tar
modified:   generated-archives/single_commit.tar
modified:   generated-archives/single_commit_huge_dates.tar
modified:   generated-archives/single_parent.tar
deleted:    generated-archives/single_parent_huge_dates.tar
modified:   generated-archives/two_parents.tar

Copy link
Member

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is great!

The PR runs tests for both hashes through cargo nextest run -p gix-commitgraph --features sha256 --no-fail-fast. Is that correct?

I made a note in Cargo.toml which should lead to the extra run in the justfile to be removed. Please take a look to see if it makes sense.

Last time CI ran it did not complain about any of the generated archives in gix-commitgraph, but when I delete both generated-archives as well as generated-do-not-edit in order to regenerate the archives, git status shows that most files have changed their size by 512 bytes and that one file has been both deleted and modified. (I’m running cargo test --features sha256 on Ubuntu 25.10.) What’s the best way to debug this?

generated-do-no-edit would be the folder to delete for it to restore from archive or regenerate the local cache in generated-do-not-edit. That's all there is to it. With the sha256 versions showing up, I'd expect it to want to create the sha256 versions which seems to have happened as well. It seems to work from my PoV, but I will take a closer look when reviewing properly.
PS: removing generated-archives will regenerate them, and they always appear changed. That's normal, and one would avoid deleting the tracked files.

Is there anything else I would need to change or include before I can mark this PR as ready?

It looks like it's ready for a non-armchair and final review after this one 🎉.

if hash_kind != gix_hash::Kind::Sha1 {
path = path.join(hash_kind.to_string());
}
path = path.join(format!("{}-{}", script_identity, family_name()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it would be more cluttered, I seem to prefer the idea of not nesting hash_kind, but add it to the format string here.

In future, there will be even more of these as another parameter, the one for the refs backend, is added.

Alternatively… what do you think about looking at both parameters and using them for nesting?

path.join(hash_kind).join(refs_kind)

The refs_kind is nothing that exists now, but it would be added at some point.

Yes, that seems like the way to go.

So let's make a mild breaking change and always add the hash_kind to the path via join.

.env_remove("GIT_WORK_TREE")
.env_remove("GIT_COMMON_DIR")
.env_remove("GIT_ASKPASS")
.env_remove("GIT_DEFAULT_HASH")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the remove here isn't needed if we are always overriding. If there is something special, then let's keep this and document it though.

"overflow happened, can't represent huge dates"
);
assert_eq!(
for hash_kind in gix_hash::Kind::all() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could all() rather return an iterator that hands out owned versions of hash_kind? I think that should work with impl Iterator<Item = Kind>.


[features]
## Support for SHA256 hashes and digests.
sha256 = ["gix-hash/sha256"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'd want to use dev-dependencies to always add sha256 to the features when testing. That way, there is only one way to run tests: all of them.

@Byron
Copy link
Member

Byron commented Jan 22, 2026

As I am looking at Git for how they tackle this I wonder this: Should we, rather than changing every test and looping over it, prefer to build with all hashes, but run tests for each hash selectively, maybe by setting an environment variable that controls which hash is under test?

Performance-wise it should be very similar, even though development is a bit less convenient as one will have to run twice with different flags or environment variables set. There, having the tests run everything automatically, always, and what we have now, seems better.

Also, how do we handle expectations on hashes, particularly those with insta? It basically requires to chose one hash and find/replace the actual output to match that. I.e. insta::assert_debug_snapshot(hash.norm(actual), @r"expected with sha1"). And that remapping table, where would that come from? Is it passed in per function, or is it global? It can be global, and maybe that's how Git does it? I never checked, but it would be interesting to know.

So I think it's fine to keep this PR as is as it's happy with this simple and convenient solution, and we can probably layer moer on top of that when a need arises.

@cruessler
Copy link
Contributor Author

I’m glad you mention this other option because I’m still very hesitant to force every test to be rewritten to make it loop over hash kinds as well. What I could try instead is adding an environment variable, e. g. GIX_TEST_HASH, that defaults to sha1 if not set, but can be changed to sha256. Any place that wants code to work regardless of hash kind wouldn’t have to bother, but could rely on CI as well as fixture script creation to take care of running with all hash kinds. I’ll try that approach in this PR!

@cruessler
Copy link
Contributor Author

I’ve added a dev-depencency on gix-hash that activates the feature sha256, but I suspect that’s the reason why a couple of tests related to the size of objects now fail. At first sight, it looks like the feature is now enabled for every cargo nextest run. Apart from that, I think using an environment variable to set the hash per cargo nextest run works quite well.

@Byron
Copy link
Member

Byron commented Jan 26, 2026

I’m glad you mention this other option because I’m still very hesitant to force every test to be rewritten to make it loop over hash kinds as well. What I could try instead is adding an environment variable, e. g. GIX_TEST_HASH, that defaults to sha1 if not set, but can be changed to sha256. Any place that wants code to work regardless of hash kind wouldn’t have to bother, but could rely on CI as well as fixture script creation to take care of running with all hash kinds. I’ll try that approach in this PR!

That's fair, and probably that's best whenever there is no actual assertions against a hash.
When there is such assertions, one will have to touch these tests to have some sort of mapping, i.e. like with translations the test mentions only one kind of hash, probably SHA1, but it auto-translates to SHA256 when that's run.
Doing this ergonomically probably requires globals so one doesn't have to pass the hash kind in everywhere.

The manual looping is always something one can implement if that's a good choice in specific cases.

With such a system, we'd need just one more job to run the SHA256 version of the test suite, which just sets an environment variable.

I’ve added a dev-depencency on gix-hash that activates the feature sha256, but I suspect that’s the reason why a couple of tests related to the size of objects now fail. At first sight, it looks like the feature is now enabled for every cargo nextest run. Apart from that, I think using an environment variable to set the hash per cargo nextest run works quite well.

Oh, right, the nextest builds everything at once so has only one dependency tree for resolving feature flags.

If [dev-dependencies] would instead pull in gix-hash to enable one or more hashes (probably always all of them), then there would be no need for duplicating the hash related cargo features in the crate that uses gix-hash, and instead one would have to tell plumbing users to depend on gix-hash just to select hash features. This is what gix-features is there for, and maybe the hash selection could be added there as well.

In any case, it seems we'd have to find a way to get CI green. Something that I'd not want to lose is to see what the best possible size is, i.e. if just one hash is supported and that is the smallest hash. But then one would have to test the individual crate just with one hash kind which may also be difficult in future. So… maybe just change the size expectations to the new value, while also using [dev-dependencies] to enforce building with dual-hash support.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants