Skip to content

Reduce overhead of Partitioned hash join #19789

@Dandandan

Description

@Dandandan

What might be a good thing to try:

Currently, a high part of the cost of the Partitioned join is repartitioning the entire build side of the join - which currently copies all of the columns twice(!), this makes this hash join slower if this side is large/wide. We could avoid one copy in RepartitionExec (using arrow-rs coalesce API when fully implemented), but not two.

CollectLeft avoids this cost at the right side, but the building phase is single threaded, which greatly limits the parallelism in the query.

We should be able to do the hash % mod of the build side during the join, avoiding the need for a RepartitionExec passing the indices of the matching partitions to the left sides - which should greatly reduce the overhead of the repartitioning.

When this is implemented, we might want to look at hash_join_single_partition_threshold and hash_join_single_partition_threshold_rows again which could be reduced to make most joins run fully in parallel./

Originally posted by @Dandandan in #19761 (comment)

Metadata

Metadata

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions