Running parametrized YQL queries and scripts
Overview
YDB CLI can execute parameterized YQL queries. To use parameters you need to declare them using the YQL DECLARE
command in your YQL query text.
To run parameterized YQL queries you can use the following YDB CLI commands:
These commands support the same query parametrization options. Parameter values can be set on the command line, uploaded from JSON files, and read from stdin
in binary or JSON format. On stdin
you can stream multiple parameter values triggering multiple YQL query executions with batching options.
Warning
Among the above commands, only the table query execute
applies retry policies. Such policies ensure reliable query execution and continuity when certain data ranges are unavailable for a short time because of partition changes or other regular processes in a distributed database.
Executing a single YQL query
To provide parameters for a YQL query execution, you can use command line, JSON files, and stdin
, using the following YDB CLI options:
Name | Description |
---|---|
-p, --param |
An expression in the format $name=value , where $name is the name of the YQL query parameter and value is its value (a correct JSON value). The option can be specified repeatedly.All the specified parameters must be declared in the YQL query by the DECLARE operator; otherwise, you will get an error "Query does not contain parameter". If you specify the same parameter several times, you will get an error "Parameter value found in more than one source". Depending on your operating system, you might need to escape the $ character or enclose your expression in single quotes (' ). |
--param-file |
Name of a file in JSON format in UTF-8 encoding that contains parameter values matched against the YQL query parameters by key names. The option can be specified repeatedly. If values of the same parameter are found in multiple files or set by the --param command line option, you'll get an error "Parameter value found in more than one source".Names of keys in the JSON file are expected without the leading $ sign. Keys that are present in the file but aren't declared in the YQL query will be ignored without an error message. |
--input-format |
Format of parameter values, applied to all sources of parameters (command line, file, or stdin ).Available options: |
--stdin-format |
Format of parameter values for stdin .The YDB CLI automatically detects that a file or an output of another shell command has been redirected to the standard input device stdin . In this case, the CLI interprets the incoming data based on the following available options:
stdin isn't specified, the --input-format is used. |
--stdin-par |
Name of a parameter whose value is provided on stdin , without a $ sign. This name is required when you use the raw format in --stdin-format .When used with JSON formats, stdin is interpreted not as a JSON document but as a JSON value passed to the parameter with the specified name. |
The query will be executed on the server once, provided that values are specified for all the parameters in the DECLARE
clause. If a value is absent for at least one parameter, the command fails with the "Missing value for parameter" message.
Examples
In our examples, we use the table query execute
command, but you can also run them using the yql
and scripting yql
commands.
Note
The examples use the quickstart
profile. To learn more, see Creating a profile to connect to a test database.
Passing the value of a single parameter
From the command line:
ydb -p quickstart table query execute -q 'declare $a as Int64;select $a' --param '$a=10'
Using a file:
echo '{"a":10}' > p1.json
ydb -p quickstart table query execute -q 'declare $a as Int64;select $a' --param-file p1.json
Through stdin
:
echo '{"a":10}' | ydb -p quickstart table query execute -q 'declare $a as Int64;select $a'
echo '10' | ydb -p quickstart table query execute -q 'declare $a as Int64;select $a' --stdin-par a
Passing the values of parameters of different types from multiple sources
echo '{ "a":10, "b":"Some text", "x":"Ignore me" }' > p1.json
echo '{ "c":"2012-04-23T18:25:43.511Z" }' | ydb -p quickstart table query execute \
-q 'declare $a as Int64;
declare $b as Utf8;
declare $c as DateTime;
declare $d as Int64;
select $a, $b, $c, $d' \
--param-file p1.json \
--param '$d=30'
Command output:
┌─────────┬─────────────┬────────────────────────┬─────────┐
| column0 | column1 | column2 | column3 |
├─────────┼─────────────┼────────────────────────┼─────────┤
| 10 | "Some text" | "2012-04-23T18:25:43Z" | 30 |
└─────────┴─────────────┴────────────────────────┴─────────┘
Passing Base64-encoded binary strings
ydb -p quickstart table query execute \
-q 'DECLARE $a AS String;
SELECT $a' \
--input-format json-base64 \
--param '$a="SGVsbG8sIHdvcmxkCg=="'
Command output:
┌──────────────────┐
| column0 |
├──────────────────┤
| "Hello, world\n" |
└──────────────────┘
Passing binary content directly
curl -Ls http://ydb.tech/docs | ydb -p quickstart table query execute \
-q 'DECLARE $a AS String;
SELECT LEN($a)' \
--stdin-format raw \
--stdin-par a
Command output (exact number of bytes may vary):
┌─────────┐
| column0 |
├─────────┤
| 66426 |
└─────────┘
Iterative streaming processing
YDB CLI supports execution of a YQL query multiple times with different sets of parameter values provided on stdin
. In this case, the database connection is established once and the query execution plan is cached. This substantially increases the performance of such an approach compared to separate CLI calls.
To use this feature, you need to stream different sets of the same parameters to stdin
one after another, specifying a rule for the YDB CLI on how to separate the sets from each other.
The YQL query runs as many times as many parameter value sets received on stdin
. Each set received on stdin
is joined with the parameter values defined on other sources (--param
, --param-file
). The command will complete once the stdin
stream is closed. Each query is executed within a dedicated transaction.
A rule for separating parameter sets from one another (framing) complements the stdin
format specified by the --stdin-format
option:
Name | Description |
---|---|
--stdin-format |
Defines the stdin framing. Available options:
|
Warning
When using a newline character as a separator between the parameter sets, make sure that it isn't used inside the parameter sets. Putting some text value in quotes does not enable newlines within the text. Multiline JSON documents are not allowed.
Example
Streaming processing of multiple parameter sets
Suppose you need to run your query thrice, with the following sets of values for the a
and b
parameters:
a
= 10,b
= 20a
= 15,b
= 25a
= 35,b
= 48
Let's create a file that includes lines with JSON representations of these sets:
echo -e '{"a":10,"b":20}\n{"a":15,"b":25}\n{"a":35,"b":48}' > par1.txt
cat par1.txt
Command output:
{"a":10,"b":20}
{"a":15,"b":25}
{"a":35,"b":48}
Let's execute the query by passing the content of this file to stdin
, formatting the output as JSON:
cat par1.txt | \
ydb -p quickstart table query execute \
-q 'DECLARE $a AS Int64;
DECLARE $b AS Int64;
SELECT $a + $b' \
--stdin-format newline-delimited \
--format json-unicode
Command output:
{"column0":30}
{"column0":40}
{"column0":83}
This output can be passed as input to the next YQL query command.
Streaming processing with joining parameter values from different sources
For example, you need to run your query thrice, with the following sets of values for the a
and b
parameters:
a
= 10,b
= 100a
= 15,b
= 100a
= 35,b
= 100
echo -e '10\n15\n35' | \
ydb -p quickstart table query execute \
-q 'DECLARE $a AS Int64;
DECLARE $b AS Int64;
SELECT $a + $b AS sum1' \
--param '$b=100' \
--stdin-format newline-delimited \
--stdin-par a \
--format json-unicode
Command output:
{"sum1":110}
{"sum1":115}
{"sum1":135}
Batched streaming processing
The YDB CLI supports automatic conversion of multiple consecutive parameter sets to a List<>
, enabling you to process them in a single request and transaction. As a result, you can have a substantial performance gain compared to one-by-one query processing.
Two batch modes are supported:
- Full
- Adaptive
Full batch mode
The full
mode is a simplified batch mode where the query runs only once, and all the parameter sets received through stdin
are wrapped into a List<>
. If the request is too large, you will get an error.
Use this batch mode when you want to ensure transaction atomicity by applying all the parameters within a single transaction.
Adaptive batch mode
In the adaptive
mode, the input stream is split into multiple transactions, with the batch size automatically determined for each of them.
In this mode, you can process a broad range of dynamic workloads with unpredictable or infinite amounts of data, as well as with unpredictable or significantly varying rate of new sets appearance at the input. For example, such a profile is typical when sending the output of another command to stdin
using the |
operator.
The adaptive mode solves two basic issues of dynamic stream processing:
- Limiting the maximum batch size.
- Limiting the maximum data processing delay.
Syntax
To use the batching capbilities, define the List<...>
or List<Struct<...>>
parameter in the YQL query's DECLARE clause, and use the following options:
Name | Description |
---|---|
--batch |
The batch mode applied to parameter sets on stdin .Available options:
|
In the adaptive batch mode, you can use the following additional parameters:
Name | Description |
---|---|
--batch-limit |
The maximum number of sets of parameters per batch in the adaptive batch mode. The next batch will be sent to the YQL query if the number of parameter sets in it reaches the specified limit. When it's 0 , there's no limit.Default value: 1000 .Parameter values are sent to each YQL execution without streaming, so the total size per GRPC request that includes the parameter values has the upper limit of about 5 MB. |
--batch-max-delay |
The maximum delay to submit a received parameter set for processing in the adaptive batch mode. It's set as a number with a time unit - s , ms , m .Default value: 1s (1 second).The YDB CLI starts a timer when it receives a first set of parameters for the batch on stdin , and sends the whole accumulated batch for execution once the timer expires. With this parameter, you can batch efficiently when new parameter sets arrival rate on stdin is unpredictable. |
Examples: Full batch processing
echo -e '{"a":10,"b":20}\n{"a":15,"b":25}\n{"a":35,"b":48}' | \
ydb -p quickstart table query execute \
-q 'DECLARE $x AS List<Struct<a:Int64,b:Int64>>;
SELECT ListLength($x), $x' \
--stdin-format newline-delimited \
--stdin-par x \
--batch full
Command output:
┌─────────┬───────────────────────────────────────────────────┐
| column0 | column1 |
├─────────┼───────────────────────────────────────────────────┤
| 3 | [{"a":10,"b":20},{"a":15,"b":25},{"a":35,"b":48}] |
└─────────┴───────────────────────────────────────────────────┘
Examples: Adaptive batch processing
Limiting the maximum data processing delay
This example demonstrates the adaptive batching triggered by a processing delay. In the first line of the command below, we generate 1,000 rows at a delay of 0.2 seconds on stdout
and pipe them to stdin
to the YQL query execution command. The YQL query execution command shows the parameter batches in each subsequent YQL query call.
for i in $(seq 1 1000);do echo "Line$i";sleep 0.2;done | \
ydb -p quickstart table query execute \
-q 'DECLARE $x AS List<Utf8>;
SELECT ListLength($x), $x' \
--stdin-format newline-delimited \
--stdin-format raw \
--stdin-par x \
--batch adaptive
Command output (actual values may differ):
┌─────────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
| column0 | column1 |
├─────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
| 14 | ["Line1","Line2","Line3","Line4","Line5","Line6","Line7","Line8","Line9","Line10","Line11","Line12","Line13","Line14"] |
└─────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
┌─────────┬─────────────────────────────────────────────────────────┐
| column0 | column1 |
├─────────┼─────────────────────────────────────────────────────────┤
| 6 | ["Line15","Line16","Line17","Line18","Line19","Line20"] |
└─────────┴─────────────────────────────────────────────────────────┘
┌─────────┬─────────────────────────────────────────────────────────┐
| column0 | column1 |
├─────────┼─────────────────────────────────────────────────────────┤
| 6 | ["Line21","Line22","Line23","Line24","Line25","Line26"] |
└─────────┴─────────────────────────────────────────────────────────┘
^C
The first batch includes all the rows accumulated at the input while the database connection has had been establishing, that's why it's larger than the next ones.
You can terminate the command by Ctrl+C or wait 200 seconds until the input generation is finished.
Limit on the number of records
This example demonstrates the adaptive batching triggered by a number of parameter sets. In the first line of the command below, we generate 200 rows. The command will show parameter batches in each subsequent YQL query call, applying the given limit --batch-limit
of 20 (the default limit is 1,000).
In this example, we also demonstrate the option to join parameters from different sources and generate JSON at the output.
for i in $(seq 1 200);do echo "Line$i";done | \
ydb -p quickstart table query execute \
-q 'DECLARE $x AS List<Utf8>;
DECLARE $p2 AS Int64;
SELECT ListLength($x) AS count, $p2 AS p2, $x AS items' \
--stdin-format newline-delimited \
--stdin-format raw \
--stdin-par x \
--batch adaptive \
--batch-limit 20 \
--param '$p2=10' \
--format json-unicode
Command output:
{"count":20,"p2":10,"items":["Line1","Line2","Line3","Line4","Line5","Line6","Line7","Line8","Line9","Line10","Line11","Line12","Line13","Line14","Line15","Line16","Line17","Line18","Line19","Line20"]}
{"count":20,"p2":10,"items":["Line21","Line22","Line23","Line24","Line25","Line26","Line27","Line28","Line29","Line30","Line31","Line32","Line33","Line34","Line35","Line36","Line37","Line38","Line39","Line40"]}
...
{"count":20,"p2":10,"items":["Line161","Line162","Line163","Line164","Line165","Line166","Line167","Line168","Line169","Line170","Line171","Line172","Line173","Line174","Line175","Line176","Line177","Line178","Line179","Line180"]}
{"count":20,"p2":10,"items":["Line181","Line182","Line183","Line184","Line185","Line186","Line187","Line188","Line189","Line190","Line191","Line192","Line193","Line194","Line195","Line196","Line197","Line198","Line199","Line200"]}
Deleting multiple records from a YDB table based on primary keys
This example shows how you can delete an unlimited number of records from YDB tables without risking exceeding the limit on the number of records per transaction.
Let's create a test table:
ydb -p quickstart yql -s 'create table test_delete_1( id UInt64 not null, primary key (id))'
Add 100,000 records to it:
for i in $(seq 1 100000);do echo "$i";done | \
ydb -p quickstart import file csv -p test_delete_1
Delete all records with ID > 10:
ydb -p quickstart table query execute -t scan \
-q 'SELECT t.id FROM test_delete_1 AS t WHERE t.id > 10' \
--format json-unicode | \
ydb -p quickstart table query execute \
-q 'DECLARE $lines AS List<Struct<id:UInt64>>;
DELETE FROM test_delete_1 WHERE id IN (SELECT tl.id FROM AS_TABLE($lines) AS tl)' \
--stdin-format newline-delimited \
--stdin-par lines \
--batch adaptive \
--batch-limit 10000
Processing of messages read from a topic
Examples of processing messages read from a topic are given in Running of an SQL query with the transmission of messages from the topic as parameters.