-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Description
What might be a good thing to try:
Currently, a high part of the cost of the Partitioned join is repartitioning the entire build side of the join - which currently copies all of the columns twice(!), this makes this hash join slower if this side is large/wide. We could avoid one copy in RepartitionExec (using arrow-rs coalesce API when fully implemented), but not two.
CollectLeft avoids this cost at the right side, but the building phase is single threaded, which greatly limits the parallelism in the query.
We should be able to do the hash % mod of the build side during the join, avoiding the need for a RepartitionExec passing the indices of the matching partitions to the left sides - which should greatly reduce the overhead of the repartitioning.
When this is implemented, we might want to look at hash_join_single_partition_threshold and hash_join_single_partition_threshold_rows again which could be reduced to make most joins run fully in parallel./
Originally posted by @Dandandan in #19761 (comment)