Skip to content

Conversation

@PimSanders
Copy link
Contributor

This PR close #16 by expanding the data validation capabilities in SQLite3.

The SQLite3 WAL file can store multiple versions of the same frame, when reading only valid frames should be returned. The docs define a valid frame as follows:

A frame is considered valid if and only if the following conditions are true:

  1. The salt-1 and salt-2 values in the frame-header match salt values in the wal-header
  2. The checksum values in the final 8 bytes of the frame-header exactly match the checksum computed consecutively on the first 24 bytes of the WAL header and the first 8 bytes and the content of all frames up to and including the current frame.

The first check was already implemented, I have interpreted the second check as:

The checksum values in the final 8 bytes of the frame-header (checksum-1 and checksum-2) exactly match the computed checksum over:

  1. the first 24 bytes of the WAL header
  2. the first 8 bytes of each frame header (up to and including this frame)
  3. the page data of each frame (up to and including this frame)

When initializing a database the option validate_checksum can be passed to use the new validation. I have chosen to only calculate the salts by default (just like before) as this will probably be good enough, and a lot faster. See the example below for the time impact:

In [1]: %timeit -n10 list(list(sqlite3.SQLite3(Path("./big.sqlite"), Path("./big.sqlite-wal"), validate_checksum=False).tables())[0].rows())
33 ms ± 1.06 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [2]: %timeit -n10 list(list(sqlite3.SQLite3(Path("./big.sqlite"), Path("./big.sqlite-wal"), validate_checksum=True).tables())[0].rows())
1.05 s ± 4.88 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

@codecov
Copy link

codecov bot commented Feb 2, 2026

Codecov Report

❌ Patch coverage is 0% with 38 lines in your changes missing coverage. Please review.
✅ Project coverage is 0.00%. Comparing base (64ae6d8) to head (0e3eeb6).

Files with missing lines Patch % Lines
dissect/database/sqlite3/wal.py 0.00% 36 Missing ⚠️
dissect/database/sqlite3/sqlite3.py 0.00% 2 Missing ⚠️
Additional details and impacted files
@@          Coverage Diff          @@
##            main     #23   +/-   ##
=====================================
  Coverage   0.00%   0.00%           
=====================================
  Files        146     146           
  Lines       3881    3914   +33     
=====================================
- Misses      3881    3914   +33     
Flag Coverage Δ
unittests 0.00% <0.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@codspeed-hq
Copy link

codspeed-hq bot commented Feb 2, 2026

Merging this PR will not alter performance

✅ 6 untouched benchmarks


Comparing PimSanders:improvement/expand-wal-validation (0e3eeb6) with main (64ae6d8)

Open in CodSpeed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Expand SQLite3 data validation when reading from WAL

1 participant