AutoPopulate 2.0: Per-table job management with enhanced status tracking #1303
Merged
AutoPopulate 1.0 spec (specs/autopopulate-1.0.md):
- Documents legacy system for reference
- Key source generation and jobs_to_do computation
- Schema-level ~jobs table with hash-based keys
- Job reservation flow (reserve/complete/error/ignore)
- Make method invocation (regular and generator patterns)
- Transaction management and error handling
- Limitations addressed by 2.0 (linked to GitHub issues)

AutoPopulate 2.0 spec (docs/src/compute/autopopulate2.0-spec.md):
- Per-table ~table__jobs with FK-derived primary keys
- FK-only PK constraint for new tables (legacy supported)
- Extended status: pending, reserved, success, error, ignore
- Priority (uint8) and scheduled_time for job ordering
- Duration tracking (float64) and version field
- refresh() with stale_timeout and orphan_timeout
- Deprecated: order, limit, keys parameters
- reserve_jobs=False falls back to 1.0 behavior
- Config sets defaults, parameters override

Design decisions documented:
- No target property (populate always populates self)
- max_calls total across all processes
- Ignore jobs permanent until manual delete
- Success jobs re-pended if key in key_source but not in table

Related: #1258, #1203, #749, #873, #665

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
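The spec commit above lists a priority (uint8) and a scheduled_time for job ordering but does not spell out the ordering rule. The following is a minimal sketch in plain Python, not DataJoint code. It assumes that lower priority values run first, that ties go to the earlier scheduled_time, and that jobs scheduled in the future are skipped. The names `PendingJob` and `next_jobs` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PendingJob:
    key: dict                 # primary key of the row to compute
    priority: int             # uint8 in the spec, so 0..255
    scheduled_time: datetime  # earliest time the job may run


def next_jobs(jobs, now):
    """Return runnable jobs in the order a worker might take them.

    Assumed rule (not confirmed by the source): lower priority value
    first, then earlier scheduled_time; future jobs are not runnable.
    """
    ready = [job for job in jobs if job.scheduled_time <= now]
    return sorted(ready, key=lambda job: (job.priority, job.scheduled_time))


now = datetime(2025, 1, 1, 12, 0, 0)
queue = [
    PendingJob({"subject_id": 1}, priority=5, scheduled_time=now - timedelta(minutes=10)),
    PendingJob({"subject_id": 2}, priority=1, scheduled_time=now - timedelta(minutes=1)),
    PendingJob({"subject_id": 3}, priority=1, scheduled_time=now + timedelta(hours=1)),
    PendingJob({"subject_id": 4}, priority=5, scheduled_time=now - timedelta(minutes=30)),
]
order = [job.key["subject_id"] for job in next_jobs(queue, now)]
```

Under these assumptions, subject 3 is held back until its scheduled time and the remaining jobs are taken by priority, then by age.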
Implementation plan (specs/autopopulate-2.0-implementation.md):
- JobsTable class with ~~ prefix naming convention
- FK-only PK constraint for new tables, legacy support
- Two execution modes: direct (default) and distributed
- AutoPopulate mixin updates (jobs property, populate paths)
- Schema.jobs returning list of JobsTable objects
- Configuration options and testing strategy

Spec updates (docs/src/compute/autopopulate2.0-spec.md):
- Changed table naming from ~table__jobs to ~~table
- Removed default value from priority (set by refresh())
- Priority default controlled by config['jobs.default_priority']

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Job class (src/datajoint/jobs.py):
- Per-table job queue with ~~ prefix naming convention
- Removed legacy JobTable class
- FK-derived primary key extraction from target table
- Status filter properties: pending, reserved, errors, ignored, completed
- Core methods: refresh(), reserve(), complete(), error(), ignore(), progress()
- Uses update1() for status transitions
- Timestamps with millisecond precision (timestamp(3))

schema.jobs property (src/datajoint/schemas.py):
- Now returns list of Job objects instead of single JobTable
- Only returns Job for tables where both target and ~~job table exist
- Job tables created lazily on first access to table.jobs

Jobs configuration (src/datajoint/settings.py):
- JobsSettings class with auto_refresh, keep_completed, stale_timeout, default_priority, and version_method options

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
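The status transitions named in this commit can be pictured with a small in-memory model. This is only a sketch of the lifecycle (pending, reserved, then success or error, with ignore as a manual override). It is not the Job class from src/datajoint/jobs.py. The class name `JobQueueSketch`, its method signatures, and the rule that only reserved jobs can be completed or failed are assumptions made for illustration.

```python
from enum import Enum


class Status(str, Enum):
    PENDING = "pending"
    RESERVED = "reserved"
    SUCCESS = "success"
    ERROR = "error"
    IGNORE = "ignore"


class JobQueueSketch:
    """In-memory stand-in for a per-table job queue (hypothetical)."""

    def __init__(self):
        self._jobs = {}  # key tuple -> Status

    def refresh(self, key_source, populated):
        """Queue a pending job for each key that is neither populated nor queued."""
        added = 0
        for key in key_source:
            if key not in populated and key not in self._jobs:
                self._jobs[key] = Status.PENDING
                added += 1
        return added

    def reserve(self, key):
        """Claim a pending job; return False if it is not available."""
        if self._jobs.get(key) is not Status.PENDING:
            return False
        self._jobs[key] = Status.RESERVED
        return True

    def complete(self, key):
        self._require_reserved(key)
        self._jobs[key] = Status.SUCCESS

    def error(self, key):
        self._require_reserved(key)
        self._jobs[key] = Status.ERROR

    def ignore(self, key):
        self._jobs[key] = Status.IGNORE

    def progress(self):
        """Count jobs per status, including statuses with no jobs."""
        counts = {status.value: 0 for status in Status}
        for status in self._jobs.values():
            counts[status.value] += 1
        return counts

    def _require_reserved(self, key):
        if self._jobs.get(key) is not Status.RESERVED:
            raise ValueError(f"job {key!r} is not reserved")


jobs = JobQueueSketch()
added = jobs.refresh(key_source=[(1,), (2,), (3,)], populated={(3,)})
jobs.reserve((1,))
jobs.complete((1,))
jobs.reserve((2,))
jobs.error((2,))
summary = jobs.progress()
```

A second call to `jobs.reserve((1,))` returns False in this model, which is the property that lets several workers share one queue without computing the same key twice.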
- Add jobs property to AutoPopulate for per-table Job access
- Add _declare_check hook pattern for table validation
- Implement FK-only PK constraint for Computed/Imported tables
- Split populate() into _populate_direct() and _populate_distributed()
- Update _populate1() to use new Job API with duration tracking
- Remove deprecated parameters (keys, order, limit)
- Add new parameters (priority, refresh) for distributed mode
- Remove leftover swap file

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Job management changes:
- Use dependency graph to identify FK-derived PK attributes (not lineage)
- Fix semantic_check conflict in Job.refresh() operations
- Fix SQL escaping for LIKE '~~%%' pattern
- Add allow_new_pk_fields_in_computed_tables config option

Schema changes:
- Update schema.jobs to return list of Job objects for existing tables

Test updates:
- Update conftest.py fixtures to use new schema.jobs list API
- Enable allow_new_pk_fields_in_computed_tables for legacy test tables
- Rewrite test_autopopulate.py for new populate() signature
- Rewrite test_jobs.py for per-table Job API

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Replace int/smallint/tinyint with int32/int16/uint8
- Replace float/double with float32/float64
- Replace timestamp with datetime
- Convert Auto from autoincrement to explicit Lookup values
- Update fixture teardown to check schema.exists
- Update test_alter_part regex to handle type comments

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add fractional seconds precision support to datetime core type
- Replace timestamp(3) with datetime(3) in Job table definition
- Eliminates native type warnings for job table timestamps

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add config.jobs.add_job_metadata setting for hidden metadata columns
- Add _job_start_time, _job_duration, _job_version to Computed/Imported tables
- Replace NATURAL JOIN with explicit USING clause to exclude hidden attributes
- Hidden attributes (prefixed with _) excluded from all binary operators
- Add subquery requirement when joining multi-table expressions
- Update jobs.py version field from varchar(255) to varchar(64)
- Add tests for hidden job metadata feature

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
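The reason for replacing NATURAL JOIN with USING can be reproduced outside DataJoint. The sketch below uses Python's built-in sqlite3 module rather than MySQL, and the table names `session` and `analysis` are hypothetical. Both tables carry a hidden `_job_start_time` column with different values, so NATURAL JOIN silently treats it as a join attribute and drops the row, while USING on the real key does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE session  (session_id INTEGER PRIMARY KEY, _job_start_time TEXT);
    CREATE TABLE analysis (session_id INTEGER PRIMARY KEY, _job_start_time TEXT);
    INSERT INTO session  VALUES (1, '2025-01-01 10:00:00');
    INSERT INTO analysis VALUES (1, '2025-01-01 11:30:00');
    """
)

# NATURAL JOIN matches on every shared column name, including the hidden one,
# so the differing timestamps prevent the rows from matching.
natural_rows = conn.execute(
    "SELECT session_id FROM session NATURAL JOIN analysis"
).fetchall()

# USING names the join attributes explicitly, so the hidden column is ignored.
using_rows = conn.execute(
    "SELECT session_id FROM session JOIN analysis USING (session_id)"
).fetchall()

conn.close()
```

Here `natural_rows` is empty and `using_rows` contains the single matching key, which is the collision that the old hash-based column naming was working around.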
- Remove add_hidden_timestamp config setting (replaced by job metadata)
- Remove hash-based timestamp column generation from declare.py
- Remove unused sha1 import
- Remove unused test fixtures: monkeysession, monkeymodule, enable_adapted_types
- Clean up empty Utility Fixtures section in conftest.py

Job metadata feature (config.jobs.add_job_metadata) remains for computed table provenance tracking.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add add_job_metadata_columns() migration utility to migrate.py
  - Adds hidden columns to existing Computed/Imported tables
  - Supports single tables or entire schemas
  - Dry-run mode for previewing changes
- Optimize AutoPopulate.progress() with single aggregation query
  - Uses LEFT JOIN with COUNT(DISTINCT) for efficiency
  - Handles 1:many relationships correctly
  - Falls back to two-query method when no common attributes
- Remove target property from AutoPopulate (always uses self)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
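The single-query idea behind the progress() optimization can be illustrated with a LEFT JOIN and COUNT(DISTINCT). The sketch below runs on Python's built-in sqlite3 module. The tables `key_source` and `target` and their columns are hypothetical, and the SQL is an illustration of the technique, not the statement DataJoint generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE key_source (subject_id INTEGER PRIMARY KEY);
    CREATE TABLE target (subject_id INTEGER, part TEXT, PRIMARY KEY (subject_id, part));
    INSERT INTO key_source VALUES (1), (2), (3), (4);
    -- subject 1 has two rows in the target: a 1:many relationship
    INSERT INTO target VALUES (1, 'a'), (1, 'b'), (2, 'a');
    """
)

# One pass over the join: DISTINCT keeps the duplicated subject 1 from being
# counted twice, and unmatched keys contribute NULL, which COUNT ignores.
total, done = conn.execute(
    """
    SELECT COUNT(DISTINCT k.subject_id), COUNT(DISTINCT t.subject_id)
    FROM key_source AS k
    LEFT JOIN target AS t ON k.subject_id = t.subject_id
    """
).fetchone()
remaining = total - done
conn.close()
```

With this data the query reports four keys in total and two already populated, leaving two to compute, without issuing separate queries for each count.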
Allow passing transaction, safemode, and force_masters kwargs to Part.delete() so users can nest Part deletions within larger transactions.

Fixes #1276

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This PR also fixes #1276: Part.delete now passes through kwargs (transaction, safemode, force_masters) to Table.delete, allowing Part deletions to be nested within larger transactions.
Labels
- breaking: Not backward compatible changes
- documentation: Issues related to documentation
- enhancement: Indicates new improvements
- feature: Indicates new features
Summary
This PR implements AutoPopulate 2.0, a complete redesign of the job handling system for distributed computing workflows. It addresses the scalability limitations identified in #1258 and resolves the confusing limit vs max_calls behavior reported in #1203.
Key Changes
Per-table Job Management
- Each dj.Computed/dj.Imported table gets its own hidden jobs table (~~table_name)
- Job statuses: pending, reserved, success, error, ignore
- Jobs are accessed via MyTable.jobs instead of schema-level schema.jobs
Job Class API
- jobs.refresh() - Sync job queue with key_source
- jobs.reserve() / jobs.complete() / jobs.error() - Status transitions
- jobs.pending / jobs.reserved / jobs.errors / jobs.completed - Query properties
- jobs.progress() - Status breakdown for dashboards
FK-only Primary Key Validation
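The commit notes for this PR describe the rule as an FK-only primary key constraint for new Computed/Imported tables, with an allow_new_pk_fields_in_computed_tables option for legacy tables. A minimal sketch of such a check in plain Python follows. The function names and arguments are hypothetical, and the real implementation derives the FK attributes from the dependency graph rather than taking them as an argument.

```python
def new_primary_key_fields(primary_key, fk_derived):
    """Return primary key attributes that no foreign key supplies."""
    return [name for name in primary_key if name not in fk_derived]


def check_fk_only_primary_key(table_name, primary_key, fk_derived,
                              allow_new_pk_fields=False):
    """Reject a table whose primary key adds attributes beyond its foreign keys.

    allow_new_pk_fields stands in for the config option mentioned in the
    commit notes; how DataJoint wires that option in is not shown here.
    """
    extra = new_primary_key_fields(primary_key, fk_derived)
    if extra and not allow_new_pk_fields:
        raise ValueError(
            f"{table_name}: primary key attributes {extra} are not derived "
            "from foreign keys"
        )
    return extra


# A table keyed entirely by its upstream dependencies passes the check.
ok = check_fk_only_primary_key(
    "Analysis", ["subject_id", "session_id"], {"subject_id", "session_id"}
)

# A legacy-style table that adds its own key field passes only when allowed.
legacy = check_fk_only_primary_key(
    "LegacyFit", ["subject_id", "fit_id"], {"subject_id"}, allow_new_pk_fields=True
)
```

Keeping the primary key FK-derived means every row of the computed table corresponds to exactly one key from its upstream tables, which is what allows a per-table jobs table to share that key.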
Hidden Job Metadata
- _job_start_time, _job_duration, _job_version columns for computed tables
- Controlled by the config.jobs.add_job_metadata setting
NATURAL JOIN → USING Clause
- Hidden attributes (prefixed with _) are excluded from join matching
Semantic Matching for Joins
- ~lineage table
- schema.lineage property for viewing all lineages
Optimized progress() Method
Migration Utility
- add_job_metadata_columns() function to retrofit existing tables
Breaking Changes
- Removed config.add_hidden_timestamp setting (see rationale below)
- Removed target property from AutoPopulate (always uses self)
- Deprecated limit and order parameters in populate() (use max_calls and priority)
Rationale: Deprecating add_hidden_timestamp
The old add_hidden_timestamp feature has been removed for several reasons:
- Hash-based naming obsolete: Used _<sha1_hash>_timestamp to avoid NATURAL JOIN collisions. With the switch to USING clauses, hidden attributes are automatically excluded from joins, making hash-based naming unnecessary.
- Not modern best practice: General insert/update timestamps on all tables should be handled by server-side database auditing features (MySQL Enterprise Audit, MariaDB Audit Plugin, binary logs) rather than application-level hidden columns. Server-side auditing:
- Job metadata is the right use case: Hidden attributes are appropriate for computation provenance (_job_start_time, _job_duration, _job_version), which is tightly coupled to computed data and useful for reproducibility.
Specification Documents
This PR includes comprehensive design specifications:
Configuration
New settings under config.jobs:
Related Issues & PRs
Closes #1258 - FEAT: Autopopulate 2.0
Addresses #1203 - Confusing limit vs max_calls behavior (deprecated limit)
Base branch PRs (must be merged first):
Related PR:
Test Plan
🤖 Generated with Claude Code