YDB Quick Start
In this guide, you will install a single-node local YDB cluster and execute simple queries against your database.
Normally, YDB stores data on multiple SSD/NVMe or HDD raw disk devices without any filesystem. However, for simplicity, this guide emulates disks in RAM or using a file in a regular filesystem. Thus, this setup is unsuitable for any production usage or even benchmarks. See the documentation for DevOps Engineers to learn how to run YDB in a production environment.
Install and start YDB
Note
The recommended environment to run YDB is x86_64 Linux. If you don't have access to one, feel free to switch to the instructions on the "Docker" tab.
-
Create a directory for YDB and use it as the current working directory:
mkdir ~/ydbd && cd ~/ydbd
-
Download and run the installation script:
curl https://install.ydb.tech | bash
This will download and unpack the archive containing the
ydbd
executable, libraries, configuration files, and scripts needed to start and stop the local cluster.The script is executed entirely with the current user privileges (notice the lack of
sudo
). Therefore, it can't do much on the system. You can check which exactly commands it runs by opening the same URL in your browser. -
Start the cluster in one of the following storage modes:
-
In-memory data:
./start.sh ram
In this case, all data is stored only in RAM, it will be lost when the cluster is stopped.
-
Data on disk:
./start.sh disk
When you run this command an 80GB
ydb.data
file will be created in the working directory if it weren't there before. Make sure there's enough disk space available to create it. This file will be used to emulate a raw disk device, which would have been used in production environments. -
Data on a real disk drive:
./start.sh drive "/dev/$DRIVE_NAME"
Replace
/dev/$DRIVE_NAME
with an actual device name that is not used for anything else, for example/dev/sdb
. The first time you run this command, the specified disk drive will be fully wiped and then used for YDB data storage. It is recommended to use a NVMe or SSD drive with at least 800Gb data volume. Such setup can be used for single-node performance testing or other environments that do not have any fault-tolerance requirements.
Result:
Starting storage process... Initializing storage ... Registering database ... Starting database process... Database started. Connection options for YDB CLI: -e grpc://localhost:2136 -d /Root/test
-
-
Create a directory for YDB and use it as the current working directory:
mkdir ~/ydbd && cd ~/ydbd mkdir ydb_data mkdir ydb_certs
-
Run the Docker container:
docker run -d --rm --name ydb-local -h localhost \ --platform linux/amd64 \ -p 2135:2135 -p 2136:2136 -p 8765:8765 \ -v $(pwd)/ydb_certs:/ydb_certs -v $(pwd)/ydb_data:/ydb_data \ -e GRPC_TLS_PORT=2135 -e GRPC_PORT=2136 -e MON_PORT=8765 \ -e YDB_USE_IN_MEMORY_PDISKS=true \ cr.yandex/yc/yandex-docker-local-ydb:latest
If the container starts successfully, you'll see the container's ID. The container might take a few minutes to initialize. The database will not be available until container initialization is complete.
The
YDB_USE_IN_MEMORY_PDISKS
setting makes all data volatile, stored only in RAM. Currently, data persistence by turning it off is supported only on x86_64 processors.
-
Install the Kubernetes CLI kubectl and Helm 3 package manager.
-
Install and run Minikube.
-
Clone the repository with YDB Kubernetes Operator:
git clone https://github.com/ydb-platform/ydb-kubernetes-operator && cd ydb-kubernetes-operator
-
Install the YDB controller in the cluster:
helm upgrade --install ydb-operator deploy/ydb-operator --set metrics.enabled=false
-
Apply the manifest for creating a YDB cluster:
kubectl apply -f samples/minikube/storage.yaml
-
Wait for
kubectl get storages.ydb.tech
to becomeReady
. -
Apply the manifest for creating a database:
kubectl apply -f samples/minikube/database.yaml
-
Wait for
kubectl get databases.ydb.tech
to becomeReady
. -
After processing the manifest, a StatefulSet object that describes a set of dynamic nodes is created. The created database will be accessible from inside the Kubernetes cluster by the
database-minikube-sample
DNS name on port 2135. -
To continue, get access to port 8765 from outside Kubernetes using
kubectl port-forward database-minikube-sample-0 8765
.
-
Install the Kubernetes CLI kubectl and Helm 3 package manager.
-
Install Kind.
-
Clone the repository with YDB Kubernetes Operator:
git clone https://github.com/ydb-platform/ydb-kubernetes-operator && cd ydb-kubernetes-operator
-
Create a Kind cluster and wait until it is ready:
kind create cluster --config=samples/kind/kind-config.yaml --wait 5m
-
Install the YDB controller in the cluster:
helm upgrade --install ydb-operator deploy/ydb-operator --set metrics.enabled=false
-
Apply the manifest for creating a storage:
kubectl apply -f samples/kind/storage.yaml
-
Wait for
kubectl get storages.ydb.tech
to becomeReady
. -
Apply the manifest for creating a database:
kubectl apply -f samples/kind/database.yaml
-
Wait for
kubectl get databases.ydb.tech
to becomeReady
. -
After processing the manifest, a StatefulSet object that describes a set of dynamic nodes is created. The created database will be accessible from inside the Kubernetes cluster by the
database-kind-sample
DNS name on port 2135. -
To continue, get access to port 8765 from outside Kubernetes using
kubectl port-forward database-kind-sample-0 8765
.
Run your first "Hello, world!" query
The simplest way to launch your first YDB query is via the built-in web interface. It is launched by default on port 8765 of the YDB server. If you have launched it locally, open localhost:8765 in your web browser. If not, replace localhost
with your server's hostname in this URL or use ssh -L 8765:localhost:8765 my-server-hostname-or-ip.example.com
to set up port forwarding and still open localhost:8765. You'll see a page like this:
YDB is designed to be a multi-tenant system, with potentially thousands of users working with the same cluster simultaneously. Hence, most logical entities inside a YDB cluster reside in a flexible hierarchical structure more akin to Unix's virtual filesystem rather than a fixed-depth schema you might be familiar with from other database management systems. As you can see, the first level of hierarchy consists of databases running inside a single YDB process that might belong to different tenants. /Root
is for system purposes, while /Root/test
or /local
(depending on the chosen installation method) is a playground created during installation in the previous step. Click on either /Root/test
or /local
, enter your first query, and hit the "Run" button:
SELECT "Hello, world!"u;
The query returns the greeting, as it is supposed to:
Note
Did you notice the odd u
suffix? YDB and its query language, YQL, are strongly typed. Regular strings in YDB can contain any binary data, while this suffix indicates that this string literal is of the Utf8
data type, which can only contain valid UTF-8 sequences. Learn more about YDB's type system.
The second simplest way to run a SQL query with YDB is the command line interface (CLI), while most real-world applications will likely communicate with YDB via one of the available software development kits (SDK). Feel free to follow the rest of the guide using either the CLI or one of the SDKs instead of the web UI if you feel comfortable doing so.
Create your first table
The main purpose of database management systems is to store data for later retrieval. As an SQL-based system, YDB's primary abstraction for data storage is a table. To create our first one, run the following query:
CREATE TABLE example
(
key UInt64,
value String,
PRIMARY KEY (key)
);
As you can see, it is a simple key-value table. Let's walk through the query step-by-step:
- Each SQL statement kind like
CREATE TABLE
has more detailed explanation in YQL reference. example
is the table name identifier, whilekey
andvalue
are column name identifiers. It is recommended to use simple names for identifiers like these, but if you need one that contains non-trivial symbols, wrap the name in backticks.UInt64
andString
are data type names.String
represents a binary string, andUInt64
is a 64-bit unsigned integer. Thus, our example table stores string values identified by unsigned integer keys. More details about data types.PRIMARY KEY
is one of the fundamental concepts of SQL that has a significant impact on both application logic and performance. Following the SQL standard, the primary key also implies an unique constraint, meaning the table cannot have multiple rows with equal primary keys. In this example table, it's quite straightforward which column should be chosen as the primary key, which we specify as(key)
in round brackets after the respective keyword. In real-world scenarios, tables often have dozens of columns, and primary keys can be compound (consisting of multiple columns in a specified order), making choosing the right primary key more of an art. If you are interested in this topic, there's a guide on choosing the primary key for maximizing performance. YDB tables are required to have a primary key.
Add sample data
Now let's fill our table with some data. The simplest way is to just use literals:
INSERT INTO example (key, value)
VALUES (123, "hello"),
(321, "world");
Step-by-step walkthrough:
INSERT INTO
is the classic SQL statement for adding new rows to a table. However, it is not the most performant, as according to the SQL standard, it has to check whether the table already has rows with the given primary key values, and raise an error if they exist. Thus, if you run this query multiple times, all attempts except the first will return an error. If your application logic doesn't require this behavior, it is better to useUPSERT INTO
instead ofINSERT INTO
. Upsert (which stands for "update or insert") will blindly write the provided values, overwriting existing rows if there were any. The rest of the syntax will be the same.(key, value)
specifies the names of the columns we're inserting and their order. The values provided next need to match this specification, both in the number of columns and their data types.- After the
VALUES
keyword, there's a list of tuples, each representing a table row. In this example, we have two rows identified by 123 and 321 in thekey
column, and "hello" and "world" values in thevalue
column, respectively.
To double-check that the rows were indeed added to the table, there's a common query that should return 2
in this case:
SELECT COUNT(*) FROM example;
A few notable details in this one:
- The
FROM
clause specifies a table to retrieve data from. COUNT
is an aggregate function that counts the number of values. By default, when there are no other special clauses around, the presence of any aggregate function collapses the result to one row containing aggregates over the whole input data (theexample
table in this case).- Asterisk
*
is a placeholder that normally means "all columns"; thus,COUNT
will return the overall row count.
Another common way to fill a table with data is by combining INSERT INTO
(or UPSERT INTO
) and SELECT
. In this case, values to be stored are calculated inside the database instead of being provided by the client as literals. We'll use a slightly more realistic query to demonstrate this:
$subquery = SELECT ListFromRange(1000, 10000) AS keys;
UPSERT INTO example
SELECT
key,
CAST(RandomUuid(key) AS String) AS value
FROM $subquery
FLATTEN LIST BY keys AS key
There's quite a lot going on in this query; let's dig into it:
-
$subquery
is a named expression. This syntax is YQL's extension to the SQL standard that allows making complex queries more readable. It behaves the same as if you wrote that firstSELECT
inline where$subquery
is later used on the last row, but it allows comprehending what's going on piece by piece, like variables in regular programming languages. -
ListFromRange
is a function that produces a list of consecutive integers, starting from the value provided in the first argument and ending with the value provided in the second argument. There's also a third optional argument that can allow skipping integers with a specified step, but we omit it in our example, which defaults to returning all integers in the given range.List
is one of the most common container data types. -
AS
is a keyword used to give a name to the value we're returning fromSELECT
; in this example,keys
. -
FROM ... FLATTEN LIST BY ... AS ...
has a few notable things happening:- Another
SELECT
used in theFROM
clause is called a subquery. That's why we chose this name for our$subquery
named expression, but we could have chosen something more meaningful to explain what it is. Subqueries normally aren't materialized; they just pass the output of oneSELECT
to the input of another on the fly. They can be used as a means to produce arbitrarily complex execution graphs, especially if used in conjunction with other YQL features. FLATTEN LIST BY
clause modifies input passed viaFROM
in the following way: for each row in the input data, it takes a column of list data type and produces multiple rows according to the number of elements in that list. Normally, that list column is replaced by the column with the current single element, but theAS
keyword in this context allows access to both the whole list (under the original name) and the current element (under the name specified afterAS
), or just to make it more clear what is what, like in this example.
- Another
-
RandomUuid
is a function that returns a pseudorandom UUID version 4. Unlike most other functions, it doesn't actually use what is passed as an argument (thekey
column); instead, it indicates that we need to call the function on each row. See the reference for more examples of how this works. -
CAST(... AS ...)
is a common function for converting values to a specified data type. In this context, the type specification is expected afterAS
(in this case,String
), not an arbitrary name. -
UPSERT INTO
will blindly write the values to the specified tables, as we discussed previously. Note that it didn't require(key, value)
column names specification when used in conjunction withSELECT
, as now columns can just be matched by names returned fromSELECT
.
Quick question!
What will the SELECT COUNT(*) FROM example;
query return now?
Stop the cluster
Stop the local YDB cluster after you have finished experimenting:
To stop the local cluster, run the following command:
~/ydbd/stop.sh
Optionally, you can then clean up your filesystem by removing your working directory with the rm -rf ~/ydbd
command. All data inside the local YDB cluster will be lost.
To stop the Docker container with the local cluster, run the following command:
docker kill ydb-local
Optionally, you can then clean up your filesystem by removing your working directory with the rm -rf ~/ydbd
command. All data inside the local YDB cluster will be lost.
To delete the YDB database, it is enough to delete the Database resource associated with it:
kubectl delete database.ydb.tech database-minikube-sample
To delete the YDB cluster, execute the following commands (all data will be lost):
kubectl delete storage.ydb.tech storage-minikube-sample
To remove the YDB controller from the Kubernetes cluster, delete the release created by Helm:
helm delete ydb-operator
To delete the YDB database, it is enough to delete the Database resource associated with it:
kubectl delete database.ydb.tech database-kind-sample
To delete the YDB cluster, execute the following commands (all data will be lost):
kubectl delete storage.ydb.tech storage-kind-sample
To remove the YDB controller from the Kubernetes cluster, delete the release created by Helm:
helm delete ydb-operator
To delete kind
cluster, run the following command:
kind delete cluster
Done! What's next?
After getting a hold of some basics demonstrated in this guide, you should be ready to jump into more advanced topics. Choose what looks the most relevant depending on your use case and role:
- Walk through a more detailed YQL tutorial that focuses on writing queries.
- Try to build your first app storing data in YDB using one of the SDKs.
- Learn how to set up a production-ready deployment of YDB.
- Read about YDB concepts.