@gabotechs
Contributor

Which issue does this PR close?

  • Closes #.

Rationale for this change

Running ./bench.sh run tpcds right after generating the data with ./bench.sh data tpcds fails with the following error:

Please prepare TPC-DS data first by following instructions:
  ./bench.sh data tpcds

This PR fixes that.

What changes are included in this PR?

Fixes the TPCDS_DIR variable in run_tpcds so that it points at the same directory that data_tpcds writes to.

Are these changes tested?

No, this only touches benchmark scripts.

Are there any user-facing changes?

No.

Contributor

@comphead comphead left a comment

Thanks @gabotechs, I think it shouldn't be there. By default the script looks for the data in the datafusion-benchmarks repo, here: https://github.com/apache/datafusion-benchmarks/tree/main/tpcds/data/sf1, and there is no tpcds-sf1 there.

You can specify your own DATA_DIR, for example:

export DATA_DIR=../../datafusion-benchmarks/tpcds/data/sf1/

and then run the TPC-DS benchmarks.

@gabotechs
Contributor Author

gabotechs commented Jan 12, 2026

🤔 Are you sure? I get the impression that this is why the benchmark run commands are failing:

#19761 (comment)

Also, note that the counterpart data_tpcds() function already uses this same line:

https://github.com/apache/datafusion/blob/main/benchmarks/bench.sh#L633

# Downloads TPC-DS data
data_tpcds() {
    TPCDS_DIR="${DATA_DIR}/tpcds_sf1"
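To make the suspected mismatch concrete, here is a minimal sketch of it. It does not reproduce the real bench.sh: sketch_data, sketch_run, the temporary directory, and the some_other_dir path are all hypothetical names picked for illustration. The only line taken from the script is the TPCDS_DIR assignment quoted above. The point is that if the run step looks in a different directory than the one the data step created, its existence check fails even though the data was just generated.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the data and run steps; NOT the real bench.sh code.

# Use a throwaway directory so the sketch has no side effects.
DATA_DIR="$(mktemp -d)"

sketch_data() {
    # Same assignment as the quoted data_tpcds() line.
    TPCDS_DIR="${DATA_DIR}/tpcds_sf1"
    mkdir -p "${TPCDS_DIR}"
}

sketch_run() {
    # $1 is the directory the run step checks (an assumption for illustration).
    local dir="$1"
    if [ ! -d "${dir}" ]; then
        echo "missing"
        return 1
    fi
    echo "found"
}

sketch_data

# Run step pointed at a different path than the data step used: check fails.
mismatch_result="$(sketch_run "${DATA_DIR}/some_other_dir" || true)"

# Run step pointed at the same path as the data step: check passes.
match_result="$(sketch_run "${DATA_DIR}/tpcds_sf1")"

echo "${mismatch_result} ${match_result}"

rm -rf "${DATA_DIR}"
```

Under these assumptions, aligning the directory used by the run step with the one used by the data step is enough for the check to pass without setting DATA_DIR by hand.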
