@dimitri-yatsenko commented Jan 3, 2026

Summary

This PR implements intelligent primary key determination for join operations based on functional dependencies between operands, building on the semantic matching foundation.

The Core Concept: Functional Dependencies

When joining two expressions A and B, the result's primary key depends on whether one operand determines the other:

A → B (A determines B): Every attribute in B's primary key exists in A.

Because A's primary key determines all of A's attributes, and those attributes include B's entire primary key, each entity in A identifies at most one entity in B.
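The definition above can be sketched in a few lines of Python. This is an illustration only: the `Heading` class here is a toy stand-in written for this example, not DataJoint's actual `Heading`, and the `secondary` field is an assumed way to hold non-key attributes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Heading:
    """Toy stand-in for a table heading (hypothetical, not DataJoint's class)."""

    primary_key: tuple[str, ...]
    secondary: tuple[str, ...] = ()

    @property
    def names(self) -> tuple[str, ...]:
        # All attribute names, primary key first.
        return self.primary_key + self.secondary

    def determines(self, other: "Heading") -> bool:
        """A → B: every attribute of other's primary key appears in self."""
        return all(name in self.names for name in other.primary_key)


session = Heading(primary_key=("session_id",), secondary=("date",))
trial = Heading(primary_key=("session_id", "trial_num"))

print(trial.determines(session))  # True: session_id is in Trial
print(session.determines(trial))  # False: trial_num is not in Session
```

Note that the check looks at all of A's attributes, not only A's primary key, which matches the wording "exists in A".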

Primary Key Rules

| Condition | Result PK | Attribute Order |
|---|---|---|
| A → B | PK(A) | A's attributes first |
| B → A (not A → B) | PK(B) | B's attributes first |
| Both directions | PK(A) | Left operand preference |
| Neither | PK(A) ∪ PK(B) | Union of both |
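As a rough sketch of how the four rules could be evaluated, the function below returns only the resulting primary key as an ordered list. It models headings as plain dicts, which is an assumption made for this example; the `subject` heading is hypothetical and only there to exercise the "neither" case.

```python
def determines(a, b):
    """A → B: every attribute in B's primary key is an attribute of A."""
    return set(b["pk"]) <= set(a["names"])


def join_primary_key(a, b):
    """Return the primary key of A * B according to the rules in the table."""
    if determines(a, b):
        # Covers both "A → B" and "both directions" (left operand wins).
        return list(a["pk"])
    if determines(b, a):
        return list(b["pk"])
    # Neither: union of both keys, A's attributes first, without duplicates.
    return list(a["pk"]) + [n for n in b["pk"] if n not in a["pk"]]


session = {"pk": ["session_id"], "names": ["session_id", "date"]}
trial = {"pk": ["session_id", "trial_num"], "names": ["session_id", "trial_num"]}
subject = {"pk": ["subject_id"], "names": ["subject_id"]}  # hypothetical table

print(join_primary_key(session, trial))    # ['session_id', 'trial_num']
print(join_primary_key(trial, session))    # ['session_id', 'trial_num']
print(join_primary_key(session, subject))  # ['session_id', 'subject_id']
```

The sketch does not reorder the non-key attributes of the result; it only shows which key is chosen.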

Example: Session/Trial Pattern

Session: {session_id*}
Trial: {session_id*, trial_num*}
  • Session → Trial? No (trial_num not in Session)
  • Trial → Session? Yes (session_id in Trial)
  • Result: Session * Trial has PK = {session_id, trial_num} with Trial's attributes first

Integration with Semantic Matching

PK determination is applied after semantic compatibility is verified:

  1. assert_join_compatibility() ensures all namesakes are homologous
  2. The "determines" relationship is computed using attribute names
  3. For a left join, the A → B requirement is validated
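The ordering of these three steps might look like the sketch below. Everything in it is an assumption made for illustration: `plan_join`, `JoinError`, and the `lineage` mapping are hypothetical names, and "homologous" is modeled simply as "same origin label". It does not reproduce `assert_join_compatibility()`.

```python
class JoinError(Exception):
    """Hypothetical error type standing in for DataJointError."""


def plan_join(a, b, left=False):
    """Run the three checks in order and return (a_to_b, b_to_a).

    `a` and `b` are dicts with:
      "pk":      list of primary key attribute names
      "lineage": dict mapping every attribute name to an origin label
    """
    # Step 1: namesakes (same-named attributes) must share an origin.
    for name in sorted(a["lineage"].keys() & b["lineage"].keys()):
        if a["lineage"][name] != b["lineage"][name]:
            raise JoinError(f"attribute {name!r} has different origins")

    # Step 2: the "determines" relation only compares attribute names.
    a_to_b = set(b["pk"]) <= set(a["lineage"])
    b_to_a = set(a["pk"]) <= set(b["lineage"])

    # Step 3: a left join additionally requires A → B.
    if left and not a_to_b:
        raise JoinError("left join requires A → B")

    return a_to_b, b_to_a


item = {
    "pk": ["item_id"],
    "lineage": {"item_id": "Item.item_id", "topic_id": "Topic.topic_id"},
}
topic = {"pk": ["topic_id"], "lineage": {"topic_id": "Topic.topic_id"}}

print(plan_join(item, topic, left=True))  # (True, False)

try:
    plan_join(topic, item, left=True)
except JoinError as err:
    print(err)  # left join requires A → B
```

Running the name-based check only after the origin check means step 2 can safely treat equal names as the same attribute.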

See the full specification: docs/src/design/pk-rules-spec.md

Left Join Constraint

Left joins now require A → B to ensure the result's PK can't have NULL values:

# Valid: Item → Topic (topic_id is in Item)
Item.join(Topic, left=True)

# Invalid: Topic ↛ Item (item_id not in Topic)
Topic.join(Item, left=True)  # Raises DataJointError
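To see why the constraint protects the primary key, the following sketch simulates a left join on plain lists of dicts. It is not DataJoint code: `left_join` is a naive helper written for this example, and the rows are made-up sample data.

```python
def left_join(a_rows, b_rows, on):
    """Naive left join of two lists of dicts on the shared attribute `on`."""
    b_names = {name for row in b_rows for name in row}
    result = []
    for a in a_rows:
        matches = [b for b in b_rows if b[on] == a[on]]
        if matches:
            result.extend({**b, **a} for b in matches)
        else:
            # No partner in B: B's attributes become None (SQL NULL).
            result.append({**dict.fromkeys(b_names), **a})
    return result


topics = [{"topic_id": 1}, {"topic_id": 2}]   # topic 2 has no items
items = [{"item_id": 10, "topic_id": 1}]

# Item → Topic: the result PK is PK(Item) = {item_id}, always populated.
item_left_topic = left_join(items, topics, on="topic_id")

# Topic ↛ Item: the PK rules pick PK(Item), but topic 2 has no item_id.
topic_left_item = left_join(topics, items, on="topic_id")

print(item_left_topic)
print(topic_left_item)
```

In the second result, the row for topic 2 carries `item_id = None`, which is exactly the NULL primary key value the constraint rules out.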

The extend() Operation

When A → B, a left join is conceptually not a join at all—it's closer to projection:

  • It adds new attributes to A (like A.proj(..., new_attr=...))
  • It preserves all rows of A
  • It preserves A's primary key
  • It lacks the Cartesian product aspect that defines joins

DataJoint provides an explicit extend() method for this pattern:

# These are equivalent when A → B:
A.join(B, left=True)
A.extend(B)           # clearer intent: extend A with B's attributes

Example:

# Trial has session_id in its PK, so Trial → Session
Trial.extend(Session)  # Adds 'date' from Session to each Trial

# Session does NOT determine Trial (trial_num not in Session)
Session.extend(Trial)  # Raises DataJointError

The extend() method:

  • Requires A → B (raises DataJointError otherwise)
  • Does not expose allow_nullable_pk (that's an internal mechanism)
  • Expresses the semantic intent: "add B's attributes to A's entities"
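The properties listed for extend() can be illustrated with a small stand-alone model. This is a sketch under assumptions: the `extend` function below works on lists of dicts, raises `ValueError` where the real method raises DataJointError, and uses made-up sample rows (including a trial whose session is missing, purely to show that A's rows are still preserved).

```python
def extend(a_rows, a_names, b_rows, b_pk):
    """Add B's attributes to each row of A; requires A → B."""
    if not set(b_pk) <= set(a_names):
        raise ValueError("extend requires A → B")
    # A contains B's whole primary key, so each A row matches at most one B row.
    b_index = {tuple(b[k] for k in b_pk): b for b in b_rows}
    b_extra = [n for n in (b_rows[0] if b_rows else {}) if n not in a_names]
    out = []
    for a in a_rows:
        b = b_index.get(tuple(a[k] for k in b_pk), {})
        # Keep A's row as is and append B's new attributes (None if no match).
        out.append({**a, **{n: b.get(n) for n in b_extra}})
    return out


sessions = [{"session_id": 1, "date": "2026-01-03"}]
trials = [
    {"session_id": 1, "trial_num": 1},
    {"session_id": 1, "trial_num": 2},
    {"session_id": 2, "trial_num": 1},  # no matching session in the sample
]

extended = extend(trials, ["session_id", "trial_num"], sessions, ["session_id"])
print(len(extended) == len(trials))  # True: every row of A is preserved
print(extended[0])

try:
    extend(sessions, ["session_id", "date"], trials, ["session_id", "trial_num"])
except ValueError as err:
    print(err)  # extend requires A → B
```

Because the lookup is keyed on B's primary key, no row of A is ever duplicated, which is the sense in which the operation lacks the Cartesian product aspect of a join.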

Changes

  • Heading.determines() - Check if A → B
  • Heading.join() - Apply PK rules based on functional dependencies
  • QueryExpression.join() - Add left join constraint with allow_nullable_pk bypass
  • QueryExpression.extend() - Semantic alias for join(left=True)
  • Aggregation.create() - Validate group → groupby requirement
  • U.aggr() - Rewritten to work without join (U.join removed)

Test plan

  • Unit tests for Heading.determines() (7 tests)
  • Unit tests for join PK determination (7 tests)
  • Unit tests for attribute ordering (2 tests)
  • Integration tests for left join constraint
  • Integration tests for extend() (valid and invalid cases)
  • All existing tests pass (511 passed, 2 skipped)

🤖 Generated with Claude Code

Add functional dependency-based PK determination for joins:
- A → B: PK = PK(A), A's attributes first
- B → A (not A → B): PK = PK(B), B's attributes first
- Neither: PK = union of both PKs

Key changes:
- Add Heading.determines() method to check A → B relationship
- Update Heading.join() to apply PK rules based on functional dependencies
- Add left join constraint requiring A → B (with allow_nullable_pk bypass)
- Update Aggregation.create() to validate group → groupby requirement
- Remove U.join() and rewrite U.aggr() to work without join
- Add pk-rules-spec.md with semantic matching integration

Tests: 509 passed (Python 3.12), 506 passed (Python 3.10)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@github-actions bot added the enhancement and documentation labels Jan 3, 2026
A.extend(B) is equivalent to A.join(B, left=True) but expresses clearer
intent: extending an entity set with additional attributes rather than
combining two entity sets.

- Add extend() method to QueryExpression
- Add 'extend' to supported_class_attrs for class-level access
- Update pk-rules-spec.md to document extend as actual API
- Add integration tests for valid and invalid extend cases

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@dimitri-yatsenko added the breaking and feature labels Jan 3, 2026
@dimitri-yatsenko added this to the DataJoint 2.0 milestone Jan 3, 2026
@dimitri-yatsenko changed the title from "Implement primary key rules for join operations" to "Implement primary key rules for joins" Jan 4, 2026
Base automatically changed from claude/semantic-match to pre/v2.0 January 7, 2026 14:56
@dimitri-yatsenko merged commit 42441f5 into pre/v2.0 Jan 7, 2026
4 of 5 checks passed
@dimitri-yatsenko deleted the claude/pk-rules branch January 7, 2026 14:59