TPC-H workload

The workload is based on the TPC-H documentation, with the queries and table schemas adapted for YDB.

The benchmark generates a workload typical of decision support systems.

Common command options

All commands support the common --path option, which specifies the path to the directory containing tables in the database:

ydb workload tpch --path tpch/s1 ...

Available options

| Name | Description | Default value |
| --- | --- | --- |
| --path or -p | Path to the directory with tables. | / |

Initializing a load test

Before running the benchmark, create the tables:

ydb workload tpch --path tpch/s1 init

See the command description:

ydb workload tpch init --help

Available parameters

| Name | Description | Default value |
| --- | --- | --- |
| --store <value> | Table storage type. Possible values: row, column, external-s3. | row |
| --external-s3-prefix <value> | Relevant only for external tables. Root path to the dataset in S3 storage. | |
| --external-s3-endpoint <value> or -e <value> | Relevant only for external tables. Link to the S3 bucket with data. | |
| --string | Use the String type for text fields. | Utf8 |
| --datetime | Use the Date, Datetime, and Timestamp types for time-related fields. | Date32, Datetime64, Timestamp64 |
| --float-mode <value> | Data type to use for fractional fields. Possible values: float, decimal, and decimal_ydb. float uses the Float type, decimal uses Decimal with the dimensions specified by the test standard, and decimal_ydb uses Decimal(22,9), the only Decimal type currently supported by YDB. | float |
| --clear | If tables already exist at the specified path, they are deleted. | |

Loading data into the tables

The data is generated and loaded into the tables directly by ydb:

ydb workload tpch --path tpch/s1 import generator --scale 1

See the command description:

ydb workload tpch import --help

Available options

| Name | Description | Default value |
| --- | --- | --- |
| --scale <value> | Data scale. Powers of ten are usually used. | |
| --tables <value> | Comma-separated list of tables to generate. Available tables: customer, nation, order_line, part_psupp, region, supplier. | All tables |
| --proccess-count <value> or -C <value> | Data generation can be split across several processes; this parameter specifies the number of processes. | 1 |
| --proccess-index <value> or -i <value> | Data generation can be split across several processes; this parameter specifies the number of the current process. | 0 |
| --state <path> | Path to the generation state file. If generation is interrupted, the next run resumes loading from the point where it stopped. | |
| --clear-state | Relevant only if --state is specified. Clears the state file and restarts loading from the beginning. | |
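
The process-splitting options can be sketched as a loop that starts one generator per process index. This example rests on several assumptions that the table does not confirm: that indexes run from 0 to the process count minus 1 (the default of 0 suggests this), that each process should use its own state file, and that the tables at tpch/s10 were already created with init. PROCESSES, DRY_RUN, and the state file names are hypothetical choices for this sketch. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: split generation of scale-10 data across several client processes.
PROCESSES=4   # hypothetical value; pick what the client machine can sustain

for ((i = 0; i < PROCESSES; i++)); do
  # Every process gets the same total count, its own index,
  # and (as an assumption) its own state file so it can resume independently.
  cmd=(ydb workload tpch --path tpch/s10 import generator --scale 10
       --proccess-count "$PROCESSES" --proccess-index "$i"
       --state "tpch_s10.$i.state")

  # DRY_RUN is a hypothetical switch: print by default, execute with DRY_RUN=0.
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}" &   # run the generators in parallel
  fi
done

wait   # returns once all background generators have finished
```

If a generator is interrupted, rerunning the same command with the same --state file should let it continue rather than start over, as described for --state above.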

Common parameters of the import command

| Name | Description | Default value |
| --- | --- | --- |
| --upload-threads <value> or -t <value> | The number of execution threads for data preparation. | The number of available cores on the client. |
| --bulk-size <value> | The size of the chunk for sending data, in rows. | 10000 |
| --max-in-flight <value> | The maximum number of data chunks that can be processed simultaneously. | 128 |

Running the load test

Run the load:

ydb workload tpch --path tpch/s1 run

During the test, load statistics are displayed for each query.

See the command description:

ydb workload tpch run --help

Common parameters for all load types

| Name | Description | Default value |
| --- | --- | --- |
| --output <value> | The name of the file where the query execution results will be saved. | results.out |
| --iterations <value> | The number of times each load query will be executed. | 1 |
| --json <name> | The name of the file where query execution statistics will be saved in JSON format. | Not saved by default |
| --ministat <name> | The name of the file where query execution statistics will be saved in ministat format. | Not saved by default |
| --plan <name> | The name of the file to save the query plan. Files like <name>.<query number>.explain and <name>.<query number>.<iteration number> will be saved in formats: ast, json, svg. | Not saved by default |
| --query-settings <setting> | Query execution settings. Each setting is added as a separate line at the beginning of each query. Use multiple times for multiple settings. | Not specified by default |
| --include | Query numbers or segments to be executed as part of the load. | All queries executed |
| --exclude | Query numbers or segments to be excluded from the load. | None excluded by default |
| --executer | Query execution engine. Available values: scan, generic. | generic |
| --verbose or -v | Print additional information to the screen during query execution. | |
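
To show how these parameters fit together, here is a minimal sketch of a run that executes each query three times and keeps results, statistics, and plans in files. The file names, run_cmd, and DRY_RUN are hypothetical choices for this example; only the option names come from the table above. By default the script only prints the command.

```shell
#!/usr/bin/env bash
# Sketch: run the load with several iterations and save the artifacts.
# --iterations 3   execute every query three times
# --output         file for query results
# --json           file for execution statistics in JSON format
# --plan           base name for the saved query plans
# --verbose        print additional information during execution
run_cmd=(ydb workload tpch --path tpch/s1 run
         --iterations 3
         --output tpch_s1.out
         --json tpch_s1.json
         --plan tpch_s1.plan
         --verbose)

# DRY_RUN is a hypothetical switch: print by default, execute with DRY_RUN=0.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "${run_cmd[*]}"
else
  "${run_cmd[@]}"
fi
```

Naming the output files after the table path, as assumed here, keeps results from different scale factors from overwriting each other.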

TPC-H-specific options

| Name | Description | Default value |
| --- | --- | --- |
| --ext-query-dir <name> | Directory with external queries for load execution. Queries should be in files named q[1-23].sql. | |

Test data cleaning

Run cleaning:

ydb workload tpch --path tpch/s1 clean

The command has no parameters.
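
The four steps on this page (init, import, run, clean) can be chained into one cycle per scale factor. The sketch below assumes each scale factor lives in its own directory named tpch/s<scale>, following the tpch/s1 example used throughout. tpch_cycle and DRY_RUN are hypothetical names introduced here. By default the function only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: full benchmark cycle (init, import, run, clean) for one scale factor.
tpch_cycle() {
  local scale="$1"
  local path="tpch/s${scale}"   # assumed naming: one directory per scale factor
  local step
  for step in "init" "import generator --scale ${scale}" "run" "clean"; do
    # DRY_RUN is a hypothetical switch: print by default, execute with DRY_RUN=0.
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "ydb workload tpch --path ${path} ${step}"
    else
      # Word splitting of $step is intended; stop at the first failing step.
      ydb workload tpch --path "${path}" ${step} || return 1
    fi
  done
}

tpch_cycle 1
```

Stopping at the first failure means clean is skipped when an earlier step fails, so the tables stay available for inspection.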