Videos
- Introducing YDB, a Distributed SQL DBMS for mission-critical workloads
- Working with Raw Disk Drives in Kubernetes — YDB's Experience | 在Kubernetes中使用原始磁盘驱动器——YDB的经验
- YDB: extending a Distributed SQL DBMS with PostgreSQL compatibility
- YDB: dealing with Big Data and moving towards AI
- An approach to unite tables and persistent queues in one system
- YDB vs. TPC-C: the Good, the Bad, and the Ugly behind High-Performance Benchmarking
- Enhancing a Distributed SQL Database Engine: A Case Study on Performance Optimization
- Breaking out of the cage: move complex development to GitHub
- Scale it easy: YDB's high performance in a nutshell
- YDB — an open-source distributed SQL database
- YDB — a Distributed SQL Database
- Parallel asynchronous replication between YDB database instances
- Scalability and Fault Tolerance in YDB
Video recordings from conferences and webinars. The materials are divided into categories and tagged.
Introducing YDB, a Distributed SQL DBMS for mission-critical workloads
YDB is a versatile open-source Distributed SQL Database that combines high availability and scalability with strong consistency and ACID transactions. It accommodates transactional (OLTP), analytical (OLAP), and streaming workloads simultaneously. It is publicly available under Apache 2.0, one of the most permissive open-source licenses. In this talk at IndiaFOSS 2024, Ivan Blinkov (VP, Product and Open Source) introduces the system and explains how it can be used to build reliable data-driven applications that implement business-critical processes.
Working with Raw Disk Drives in Kubernetes — YDB's Experience | 在Kubernetes中使用原始磁盘驱动器——YDB的经验
YDB is an open-source distributed database management system that, for performance reasons, uses raw disk drives (block devices) to store all data without any filesystem. It was relatively straightforward to manage such a setup in the bare-metal world of the past, but the dynamic nature of cloud-native environments introduced new challenges to keep this performance benefit. In this talk at KubeCon + CloudNativeCon + Open Source Summit Hong Kong, Ivan Blinkov (VP, Product and Open Source) explores how to leverage Kubernetes and the Operator design pattern to modernize how stateful distributed database clusters are managed without changing the primary approach to how the data is physically stored.
YDB: extending a Distributed SQL DBMS with PostgreSQL compatibility
PostgreSQL is an implementation of the SQL standard with one of the most vibrant ecosystems around it. To leverage all the tools and libraries that already know how to work with PostgreSQL, emerging database management systems that bring something new to the market need to learn how to mimic PostgreSQL. In this talk at COSCUP 2024, Ivan Blinkov (VP, Product and Open Source) explores possible approaches to this and the related trade-offs, and explains why YDB chose a unique approach to bring serializable consistency and seamless scalability to the PostgreSQL ecosystem.
Note
The video will be available later.
The presentation is suitable for people interested in the trade-offs involved in implementing a PostgreSQL-compatible DBMS.
YDB: dealing with Big Data and moving towards AI
YDB is a versatile, open-source Distributed SQL database management system that combines high availability and scalability with strong consistency and ACID transactions. It provides services for machine learning products and goes beyond traditional vector search capabilities.
Note
The video will be available later.
This database is used in production within Yandex. Among its clients are Yandex Market, Yandex Alice, and Yandex Taxi, which are some of the largest and most demanding AI-based applications.
The database offers true elastic scalability, capable of scaling up or down by several orders of magnitude.
Simultaneously, the database is fault-tolerant. It is designed to operate across three availability zones, ensuring continuous operation even if one of the zones becomes unavailable. The database automatically recovers from disk failures, server failures, or data center failures, with minimal latency disruptions to applications.
Currently, work is underway to implement exact and approximate nearest neighbor search for machine learning purposes.
Takeaways:
- Architecture of a distributed, fault-tolerant database.
- Approaches to implementing vector search on large datasets.
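The difference between exact and approximate nearest neighbor search can be sketched in a few lines of Python. This is only an illustration, not YDB's implementation: the approximate variant uses a simple centroid-based partitioning (an IVF-style index), which is an assumption chosen for brevity, and all function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = Sequence[float]

def exact_nearest(query: Vector, vectors: List[Vector]) -> int:
    """Return the index of the vector closest to query (full scan)."""
    return min(range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))

def build_buckets(vectors: List[Vector], centroids: List[Vector]) -> Dict[int, List[int]]:
    """Assign every vector index to the bucket of its closest centroid."""
    buckets: Dict[int, List[int]] = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        buckets[exact_nearest(v, centroids)].append(i)
    return buckets

def approximate_nearest(query: Vector, vectors: List[Vector],
                        centroids: List[Vector],
                        buckets: Dict[int, List[int]]) -> int:
    """Scan only the bucket whose centroid is closest to query.

    Faster than a full scan, but it may miss the true nearest neighbor
    when that neighbor lies in an adjacent bucket.
    """
    candidates = buckets[exact_nearest(query, centroids)]
    if not candidates:  # empty bucket: fall back to the full scan
        return exact_nearest(query, vectors)
    return min(candidates, key=lambda i: math.dist(query, vectors[i]))

# Hypothetical data: 1000 random 2-D points, 8 centroids sampled from them.
rng = random.Random(0)
vectors = [(rng.random(), rng.random()) for _ in range(1000)]
centroids = rng.sample(vectors, 8)
buckets = build_buckets(vectors, centroids)

query = (0.5, 0.5)
exact = exact_nearest(query, vectors)
approx = approximate_nearest(query, vectors, centroids, buckets)
print("exact distance: ", math.dist(query, vectors[exact]))
print("approx distance:", math.dist(query, vectors[approx]))
```

The approximate result is never closer than the exact one. The trade-off is that it examines only a fraction of the dataset, which is what makes this family of approaches attractive for large datasets.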
An approach to unite tables and persistent queues in one system
People need databases to store their data and persistent queues to transfer it from one system to another. We’ve united tables and persistent queues within one data platform. You can now take data from a queue, process it, and store the result in a database within a single transaction, so application developers don’t need to worry about data inconsistency caused by connection failures or other errors.
Elena Kalinina (Technical Project Manager) introduces YDB, an open-source platform that lets you work with tables and queues within a single transaction. She walks you through the architecture decisions, possible scenarios, and performance aspects of this approach.
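The idea of consuming a message and storing its result atomically can be illustrated with a small sketch. It does not use the YDB API: SQLite from the Python standard library stands in for the data platform, the queue is modeled as an ordinary table, and the `queue` and `results` table names and the `process_one` helper are assumptions made for this example.

```python
import sqlite3
from typing import Callable

def process_one(conn: sqlite3.Connection, transform: Callable[[str], str]) -> bool:
    """Consume one queued message and store its result in one transaction.

    Returns False when the queue is empty. If transform raises, the
    removal from the queue is rolled back and the message stays queued.
    """
    with conn:  # commits on success, rolls back if an exception escapes
        row = conn.execute(
            "SELECT id, payload FROM queue ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return False
        msg_id, payload = row
        conn.execute("DELETE FROM queue WHERE id = ?", (msg_id,))
        result = transform(payload)  # hypothetical processing step
        conn.execute(
            "INSERT INTO results (msg_id, value) VALUES (?, ?)", (msg_id, result)
        )
    return True

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE queue (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
)
conn.execute("CREATE TABLE results (msg_id INTEGER PRIMARY KEY, value TEXT NOT NULL)")
conn.executemany("INSERT INTO queue (payload) VALUES (?)", [("a",), ("b",)])
conn.commit()

# Drain the queue; each message and its result are committed together.
while process_one(conn, str.upper):
    pass

print(conn.execute("SELECT msg_id, value FROM results ORDER BY msg_id").fetchall())
```

Because the dequeue and the insert share one transaction, a failure in the processing step leaves the message in the queue and no partial result in the table.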
YDB vs. TPC-C: the Good, the Bad, and the Ugly behind High-Performance Benchmarking
Modern distributed databases scale horizontally with great efficiency, making them almost limitless in capacity. This implies that benchmarks should be able to run on multiple machines and be very efficient to minimize the number of machines required. This talk will focus on benchmarking high-performance databases, particularly emphasizing YDB and our implementation of the TPC-C benchmark, the de facto gold standard in the database field.
First, we will speak about benchmarking strategies from a user's perspective. We will dive into key details related to benchmark implementations, which could be useful when you create a custom benchmark to mirror your production scenarios. Throughout our performance journey, we have identified numerous anti-patterns: there are things you should unequivocally avoid in your benchmark implementations. We'll highlight these "bad" and "ugly" practices with illustrative examples.
Next, we'll briefly discuss the popular key-value benchmark YCSB, since good key-value performance is a prerequisite for robust performance in distributed transactions. We'll then explore the TPC-C benchmark in greater detail, sharing valuable insights derived from our own implementation.
We'll conclude our talk by presenting performance results from the TPC-C benchmark, comparing YDB and CockroachDB with PostgreSQL to illustrate situations where PostgreSQL might not be enough and when you might want to consider a distributed DBMS instead.
Evgenii Ivanov (Senior developer) discussed best practices for high-performance benchmarking and some pitfalls found during the TPC-C implementation, then presented TPC-C results for PostgreSQL, CockroachDB, and YDB.
The presentation will be of interest to developers of high-load systems and developers of platforms for various purposes.
Enhancing a Distributed SQL Database Engine: A Case Study on Performance Optimization
Learn how we optimized a distributed SQL database engine, focusing on benchmark-driven improvements and pivotal testing strategies. Alexey Ozeritskiy (Lead Software Engineer) talks about performance optimization of a distributed SQL engine. He gives background on the YDB engine itself and where it is used. The final part of his talk covers containerization and performance.
The presentation is suitable for DBAs.
Breaking out of the cage: move complex development to GitHub
Alexander Smirnov (Technology Expert at Nebius) shows how the YDB team moved its primary development branch from an in-house repository to GitHub and set up independent commodity on-demand cloud infrastructure, CI processes with GitHub Actions, and test management with open-source and cloud tools. Special attention is paid to the complexities of decoupling from the corporate monorepository and build system.
The presentation is suitable for DevOps engineers (CI/CD).
Scale it easy: YDB's high performance in a nutshell
Implementing a distributed database with strong consistency isn’t difficult; ensuring speed and scalability is the challenge. YDB excels in these aspects. In this talk, we’ll discuss YDB’s architecture and high performance, present benchmark results, and compare YDB to top competitors.
Evgenii Ivanov (Senior developer) discussed the architecture of YDB, demonstrated its high performance through benchmark results, and compared YDB with its competitors.
The presentation will be of interest to developers of high-load systems and developers of platforms for various purposes.
YDB — an open-source distributed SQL database
YDB is used as a mission-critical database for many Internet-scale services. YDB has been designed as a platform for various data storage and processing systems and is aimed at solving a wide range of problems. Oleg Bondar (CPO YDB) spoke about the structure of YDB, its main features, and benefits.
The presentation is suitable for everyone who is not yet familiar with YDB.
YDB — a Distributed SQL Database
This is a recording of a guest lecture at the Faculty of Mathematics of the University of Belgrade. In this video, we describe the reasons why distributed SQL databases were created and give a brief history of Distributed SQL DBMS development, including which products appeared first.
Parallel asynchronous replication between YDB database instances
In this talk, we present an approach to asynchronous replication in YDB with the following characteristics: the changefeed from the source database is sharded among multiple persistent queues, and the sharded changefeed is applied to the target database in a manner that guarantees the consistency of the target database.
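As a rough illustration of the sharding step only, the sketch below splits a change stream by key across several in-memory queues and applies each queue in order. It is not the mechanism from the talk: the function names are hypothetical, the queues are plain Python deques, and the sketch preserves only per-key ordering. Keeping the target consistent across keys and transactions needs additional coordination that is not shown here.

```python
import zlib
from collections import deque
from typing import Deque, Dict, Iterable, List, Optional, Tuple

Change = Tuple[str, Optional[int]]  # (key, new value); None means delete

def shard_for(key: str, shard_count: int) -> int:
    """Pick a shard deterministically so one key always maps to one queue."""
    return zlib.crc32(key.encode("utf-8")) % shard_count

def shard_changefeed(changes: Iterable[Change], shard_count: int) -> List[Deque[Change]]:
    """Distribute changes among queues, keeping source order within each queue."""
    queues: List[Deque[Change]] = [deque() for _ in range(shard_count)]
    for key, value in changes:
        queues[shard_for(key, shard_count)].append((key, value))
    return queues

def apply_queue(queue: Deque[Change], target: Dict[str, int]) -> None:
    """Apply one queue to the target in FIFO order."""
    while queue:
        key, value = queue.popleft()
        if value is None:
            target.pop(key, None)
        else:
            target[key] = value

# Hypothetical change stream from a source database.
changes: List[Change] = [("a", 1), ("b", 1), ("a", 2), ("b", None), ("c", 5)]
queues = shard_changefeed(changes, shard_count=3)

# The queues can be applied in any order because each key lives in one queue.
target: Dict[str, int] = {}
for queue in reversed(queues):
    apply_queue(queue, target)
print(target)
```

Since every change to a given key lands in the same queue, the final value of each key matches the source regardless of how the queues are interleaved.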
Scalability and Fault Tolerance in YDB
In this talk, we will cover two layers of YDB: Tablet and BlobStorage, which together provide fault tolerance, scalability, and user isolation.