TPC-DS workload

The workload is based on the TPC-DS documentation, with the queries and table schemas adapted for YDB.

This benchmark generates a workload typical for decision support systems.

Common command options

All commands support the common option --path, which specifies the path to the directory containing benchmark tables in the database:

ydb workload tpcds --path tpcds/s1 ...

Available options

Name | Description | Default value
--path or -p | Path to the directory with tables. | /

Initializing the load test

Before running the benchmark, create the tables:

ydb workload tpcds --path tpcds/s1 init

See the command description:

ydb workload tpcds init --help

Available parameters

Name | Description | Default value
--store <value> | Table storage type. Possible values: row, column, external-s3. | row
--external-s3-prefix <value> | Relevant only for external tables. Root path to the dataset in S3 storage. |
--external-s3-endpoint <value> or -e <value> | Relevant only for external tables. Link to the S3 bucket with data. |
--string | Use the String type for text fields. | Utf8
--datetime | Use the Date, Datetime, and Timestamp types for time-related fields. | Date32, Datetime64, Timestamp64
--float-mode <value> | Data type to use for fractional fields. Possible values: float, decimal, decimal_ydb. float uses the Float type, decimal uses Decimal with the dimensions specified by the test standard, and decimal_ydb uses Decimal(22,9), the only Decimal type currently supported by YDB. | float
--clear | If tables already exist at the specified path, they will be deleted. |
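As an illustration of the parameters above, the following sketch builds an init command for a chosen storage type and rejects values that are not listed in the table. The function name init_cmd is hypothetical, the path tpcds/s1 is taken from the earlier examples, and the command is only printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper: print an init command for the given storage type.
# Only the values listed for --store are accepted.
init_cmd() {
    store=$1
    case "$store" in
        row|column|external-s3) ;;
        *) echo "unknown store type: $store" >&2; return 1 ;;
    esac
    # Note: external-s3 also needs the --external-s3-* options,
    # which this sketch does not add.
    echo "ydb workload tpcds --path tpcds/s1 init --store $store --clear"
}

init_cmd column
```

The printed line can be reviewed and then run as is, assuming the connection settings come from the active YDB CLI profile.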

Loading data into the tables

The data is generated and loaded into the tables directly by the YDB CLI:

ydb workload tpcds --path tpcds/s1 import generator --scale 1

See the command description:

ydb workload tpcds import --help

Available options

Name | Description | Default value
--scale <value> | Data scale. Typically, powers of ten are used. |
--tables <value> | Comma-separated list of tables to generate. Available tables: customer, nation, order_line, part_psupp, region, supplier. | All tables
--process-count <value> or -C <value> | Number of processes for parallel data generation. | 1
--process-index <value> or -i <value> | Number of this process when data generation is split across multiple processes. | 0
--state <path> | Path to the state file for resuming generation. If generation is interrupted, it resumes from the same point when restarted. |
--clear-state | Relevant if the --state parameter is specified. Clears the state file and restarts generation from the beginning. |
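To illustrate how --process-count and --process-index fit together, the sketch below prints one generator command per worker. It assumes that indexes run from 0 to process-count minus 1 (the default index is 0) and that each worker keeps its own state file; the values PROCESS_COUNT, SCALE, the path tpcds/s10, and the state file names are hypothetical.

```shell
#!/bin/sh
# Sketch: print one generator command per worker process.
# All values below are illustrative assumptions, not recommended settings.
PROCESS_COUNT=4
SCALE=10
TPCDS_PATH="tpcds/s10"

gen_commands() {
    i=0
    while [ "$i" -lt "$PROCESS_COUNT" ]; do
        # Each worker gets its own index and (by assumption) its own
        # state file, so an interrupted worker can be resumed separately.
        echo "ydb workload tpcds --path $TPCDS_PATH import generator" \
             "--scale $SCALE --process-count $PROCESS_COUNT" \
             "--process-index $i --state gen_state_$i.json"
        i=$((i + 1))
    done
}

gen_commands
```

Each printed command can then be started in its own terminal or on its own host.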

Common parameters of the import command

Name | Description | Default value
--upload-threads <value> or -t <value> | The number of execution threads for data preparation. | The number of available cores on the client.
--bulk-size <value> | The size of the chunk for sending data, in rows. | 10000
--max-in-flight <value> | The maximum number of data chunks that can be processed simultaneously. | 128
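Read together, the chunk size and the in-flight limit suggest a rough upper bound on the number of rows being processed at once. The calculation below is only an estimate derived from the descriptions above, using the default values; the variable names are illustrative.

```shell
#!/bin/sh
# Rough upper bound: rows per chunk multiplied by chunks in flight.
BULK_SIZE=10000      # default --bulk-size, in rows
MAX_IN_FLIGHT=128    # default --max-in-flight, in chunks
ROWS_IN_FLIGHT=$((BULK_SIZE * MAX_IN_FLIGHT))
echo "rows in flight (upper bound): $ROWS_IN_FLIGHT"
```

With the defaults this gives 1280000 rows, which may help when estimating client memory needs before raising either value.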

Running the load test

Run the load:

ydb workload tpcds --path tpcds/s1 run

During the benchmark, load statistics are displayed for each query.

See the command description:

ydb workload tpcds run --help

Common parameters for all load types

Name | Description | Default value
--output <value> | The name of the file where the query execution results will be saved. | results.out
--iterations <value> | The number of times each load query will be executed. | 1
--json <name> | The name of the file where query execution statistics will be saved in JSON format. | Not saved by default
--ministat <name> | The name of the file where query execution statistics will be saved in ministat format. | Not saved by default
--plan <name> | The name of the file to save the query plan. Files like <name>.<query number>.explain and <name>.<query number>.<iteration number> will be saved in formats: ast, json, svg. | Not saved by default
--query-settings <setting> | Query execution settings. Each setting is added as a separate line at the beginning of each query. Use multiple times for multiple settings. | Not specified by default
--include | Query numbers or segments to be executed as part of the load. | All queries
--exclude | Query numbers or segments to be excluded from the load. | None
--executer | Query execution engine. Available values: scan, generic. | generic
--verbose or -v | Print additional information to the screen during query execution. |
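As a sketch of how several of these parameters might be combined, the script below assembles a run command that executes one query three times and saves statistics and plans. The file names stats.json and q_plan are hypothetical, and a single query number is passed to --include because the exact syntax for lists and segments is not given here. The command is printed rather than executed.

```shell
#!/bin/sh
# Sketch: assemble a run invocation from the parameters described above.
TPCDS_PATH="tpcds/s1"   # path used in the earlier examples

# File names and the query number are illustrative assumptions.
set -- ydb workload tpcds --path "$TPCDS_PATH" run \
    --iterations 3 \
    --json stats.json \
    --plan q_plan \
    --include 7

# Keep the assembled command in a variable and print it for review.
RUN_CMD="$*"
echo "$RUN_CMD"
```

Running ydb workload tpcds run --help is the reliable way to confirm the accepted value formats before executing the printed command.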

TPC-DS-specific options

Name | Description | Default value
--ext-query-dir <name> | Directory with external queries for load execution. Queries should be in files named q[1-99].sql. |
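Before pointing --ext-query-dir at a directory, it can be useful to check the file names. The sketch below reads q[1-99].sql as q1.sql through q99.sql, which is an interpretation of the naming rule above; the function name check_query_dir and the sample file names are hypothetical.

```shell
#!/bin/sh
# Sketch: report files in a query directory whose names do not follow
# the q<N>.sql pattern with N between 1 and 99.
check_query_dir() {
    dir=$1
    for f in "$dir"/*; do
        [ -e "$f" ] || continue              # empty directory: glob did not match
        name=$(basename "$f")
        case "$name" in
            q[1-9].sql|q[1-9][0-9].sql) ;;   # q1.sql .. q99.sql
            *) echo "unexpected file name: $name" ;;
        esac
    done
}

# Example with a temporary directory and illustrative file names.
tmp=$(mktemp -d)
touch "$tmp/q1.sql" "$tmp/q42.sql" "$tmp/query3.sql"
check_query_dir "$tmp"
rm -rf "$tmp"
```

In this example only query3.sql is reported, since the other two names fit the pattern.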

Test data cleanup

Run cleanup:

ydb workload tpcds --path tpcds/s1 clean

The command has no parameters.
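The four stages described on this page can be chained into one script. The following is a minimal sketch, assuming the connection settings come from the active YDB CLI profile; the step helper and the DRY_RUN variable are hypothetical, and by default the commands are only printed.

```shell
#!/bin/sh
# Sketch of a full benchmark cycle: init, import, run, clean.
# Hypothetical helper: print each command, run it only when DRY_RUN=0.
step() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

TPCDS_PATH="tpcds/s1"

# Stop at the first failing stage so later stages never run on bad data.
step ydb workload tpcds --path "$TPCDS_PATH" init &&
step ydb workload tpcds --path "$TPCDS_PATH" import generator --scale 1 &&
step ydb workload tpcds --path "$TPCDS_PATH" run &&
step ydb workload tpcds --path "$TPCDS_PATH" clean
```

Setting DRY_RUN=0 in the environment makes the script execute each stage instead of printing it.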