Hi Simon,
I’m happy to share that the TpT implementation is now finalized code-wise on my side. This PR introduces a working TpTDecisionTreeClassifier with a depth-first builder, compatible with scikit-lexicographical-trees / scikit-longitudinal.
References to cite
[1] Valla, M. Time-penalised trees (TpT): introducing a new tree-based data mining algorithm for time-varying covariates. Ann Math Artif Intell 92, 1609–1661 (2024). https://doi.org/10.1007/s10472-024-09950-w
[2] Mathias Valla, Xavier Milhaud. Time-penalized trees: consistency results and simulations. 2025. ⟨hal-05022929⟩ https://cnrs.hal.science/hal-05022929
Scope of this PR (minimal, functional)
- TpTDecisionTreeClassifier (classification only in this PR’s target scope).
- DepthFirstTreeBuilder (no Best-First in this scope).
- (… plot_tree adaptation included here; regular sklearn.tree.plot_tree is usable for quick inspection.)
Where the code lives
- scikit_longitudinal/estimators/trees/TpT/ (primary class: TpTDecisionTreeClassifier, splitter, builder, structs)
- scikit_longitudinal/estimators/trees/TpT/_preprocessing.py
  This should be moved under or near LongitudinalDataset per your design. I’d be grateful if you could drop it into the right place in a temporary branch; I’ll review once moved. It currently handles long → wide only (not the reverse), without TIDAL/polars. It’s a first step toward issue #64 (“Feature Request - From Wide to Long and vice-versa Longitudinal data formatting, inspired from TIDAL”) but does not fully solve it.
Quick example (concise)
That’s intentionally minimal (no non-essential utilities, no external metrics). It just loads data, fits TpT, prints a few structural fields, and plots.
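For readers unfamiliar with the long → wide reshaping mentioned under "Where the code lives", here is a minimal pandas sketch of the general idea. It is not the code in _preprocessing.py: the function name `long_to_wide`, the column names (`subject_id`, `wave`, `smoke`, `bmi`) and the `<feature>_w<wave>` naming are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical long-format data: one row per (subject, wave) observation.
long_df = pd.DataFrame(
    {
        "subject_id": [1, 1, 2, 2],
        "wave": [1, 2, 1, 2],
        "smoke": [0, 1, 0, 0],
        "bmi": [22.5, 23.1, 27.0, 26.4],
    }
)


def long_to_wide(df, id_col, time_col):
    """Pivot long data to one row per subject (illustrative helper, not sklong API)."""
    # With no `values` argument, pivot() spreads every remaining column
    # and returns MultiIndex columns of the form (feature, wave).
    wide = df.pivot(index=id_col, columns=time_col)
    # Flatten the MultiIndex into names such as "smoke_w1", "bmi_w2".
    wide.columns = [f"{feature}_w{wave}" for feature, wave in wide.columns]
    return wide.reset_index()


wide_df = long_to_wide(long_df, id_col="subject_id", time_col="wave")
print(wide_df)
```

Note that `DataFrame.pivot` raises an error if a subject has two rows for the same wave, so duplicates would need to be resolved before reshaping.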
Dependencies
I removed ray from the dependencies due to local issues. Please feel free to restore it where appropriate (e.g., as an optional dependency / extra for parallelism or tests).
Ask / next steps
I’ve pushed this as far as I can right now. It would be ideal if you could manage the integration so TpT aligns perfectly with sklong’s patterns (API surface, dataset plumbing via LongitudinalDataset, docs structure, examples, CI, etc.). I’ll follow up with any fixes you need during review.
I’m also working on two additional papers involving TpT and will of course cite scikit-longitudinal as the reference implementation. For any future article specifically about the implementation, I’d be happy to include you as co-author for your guidance and help.
Thanks a lot, and please let me know how you’d like to proceed with the preprocessing relocation!
— Mathias