Some performance improvements #33
Open
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
created by claude ;-)
I also tested the changes against a large hsbc dataset, same json output for both old and new algorithm.
Summary
This PR delivers performance improvements to the
nac_yamlmodule and refactors unit tests for bettermaintainability.
Performance Optimizations
Optimized YAML file loading and deduplication, reducing execution time by 32-42%.
Real execution time on production dataset:
Profiling results:
deduplicate_list_items: 76.5% faster (30.32s → 7.12s)merge_list_item: 76.9% faster (30.18s → 6.98s)Test Refactoring
Refactored unit tests to use
@pytest.mark.parametrizefor improved readability and maintainability.Changes
Performance (
nac_yaml/yaml.py)yaml.YAML()instance perload_yaml_files()call instead ofcreating a new instance for each file
breakstatements inmerge_list_item()to exit comparison loopsimmediately when a mismatch is found
isinstance(v, (dict, list))instead of multiple ORconditions
Tests (
tests/unit/test_yaml.py)test_merge_dict: Consolidated 9 repetitive test cases into 1 parametrized test with descriptive IDstest_merge_list_item: Consolidated 6 repetitive test cases into 1 parametrized testtest_deduplicate_list_items: Consolidated 3 repetitive test cases into 1 parametrized testBenefits:
test_merge_dict[merge_dicts])Testing
This PR description clearly separates the two main changes (performance and tests) while showing the concrete improvements
in both areas.