Metrics reference

Resource usage metrics

Metric name
Type, units of measurement
Description
Labels
resources.storage.used_bytes
IGAUGE, bytes
The size of user and service data stored in distributed network storage. resources.storage.used_bytes = resources.storage.table.used_bytes + resources.storage.topic.used_bytes.
resources.storage.table.used_bytes
IGAUGE, bytes
The size of user and service data stored by tables in distributed network storage. Service data includes the data of primary, secondary, and vector indexes.
resources.storage.topic.used_bytes
IGAUGE, bytes
The size of storage used by topics. This metric sums the topic.storage_bytes values of all topics.
resources.storage.limit_bytes
IGAUGE, bytes
A limit on the size of user and service data that a database can store in distributed network storage.
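
The relationship between these storage metrics can be checked with a small calculation. The following is a minimal sketch, assuming the metric values have already been fetched from a monitoring system into a plain dictionary; the function name storage_usage and the sample numbers are hypothetical and are not part of this reference.

```python
def storage_usage(samples: dict) -> dict:
    """Derive total usage and the used fraction from storage metric samples."""
    table = samples["resources.storage.table.used_bytes"]
    topic = samples["resources.storage.topic.used_bytes"]
    limit = samples["resources.storage.limit_bytes"]
    # Per the definition above: used_bytes = table.used_bytes + topic.used_bytes.
    used = table + topic
    return {
        "used_bytes": used,
        # Guard against a missing or zero limit instead of dividing by zero.
        "used_fraction": used / limit if limit > 0 else None,
    }


# Hypothetical sample values, for illustration only.
example = {
    "resources.storage.table.used_bytes": 6 * 1024**3,
    "resources.storage.topic.used_bytes": 2 * 1024**3,
    "resources.storage.limit_bytes": 16 * 1024**3,
}
print(storage_usage(example))
```

A used fraction approaching 1 means the database is close to resources.storage.limit_bytes.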

gRPC API metrics

Metric name
Type, units of measurement
Description
Labels
api.grpc.request.bytes
RATE, bytes
The size of queries received by the database in a certain period of time.
Labels:
- api_service: The name of the gRPC API service, such as table or data_streams.
- method: The name of a gRPC API service method, such as ExecuteDataQuery (for the table service) or PutRecord and GetRecords (for the data_streams service).
api.grpc.request.dropped_count
RATE, pieces
The number of requests dropped at the transport (gRPC) layer due to an error.
Labels:
- api_service: The name of the gRPC API service, such as table.
- method: The name of a gRPC API service method, such as ExecuteDataQuery.
api.grpc.request.inflight_count
IGAUGE, pieces
The number of requests that a database is simultaneously handling in a certain period of time.
Labels:
- api_service: The name of the gRPC API service, such as table.
- method: The name of a gRPC API service method, such as ExecuteDataQuery.
api.grpc.request.inflight_bytes
IGAUGE, bytes
The size of requests that a database is simultaneously handling in a certain period of time.
Labels:
- api_service: The name of the gRPC API service, such as table.
- method: The name of a gRPC API service method, such as ExecuteDataQuery.
api.grpc.response.bytes
RATE, bytes
The size of responses sent by the database in a certain period of time.
Labels:
- api_service: The name of the gRPC API service, such as table.
- method: The name of a gRPC API service method, such as ExecuteDataQuery.
api.grpc.response.count
RATE, pieces
The number of responses sent by the database in a certain period of time.
Labels:
- api_service: The name of the gRPC API service, such as table.
- method: The name of a gRPC API service method, such as ExecuteDataQuery.
- status: The request execution status. For a more detailed description of statuses, see Error Handling.
api.grpc.response.dropped_count
RATE, pieces
The number of responses dropped at the transport (gRPC) layer due to an error.
Labels:
- api_service: The name of the gRPC API service, such as table.
- method: The name of a gRPC API service method, such as ExecuteDataQuery.
api.grpc.response.issues
RATE, pieces
The number of errors of a certain type arising in the execution of a request over a certain period of time.
Labels:
- issue_type: The error type. The only possible value is optimistic_locks_invalidation. For more on lock invalidation, see Transactions and requests to YDB.

gRPC API metrics for topics

Metric name
Type, units of measurement
Description
Labels
grpc.topic.stream_read.commits
RATE, pieces
Number of commits made by the Ydb::TopicService::StreamRead method.
Labels:
- topic – topic name.
- consumer – consumer name
grpc.topic.stream_read.bytes
RATE, bytes
Number of bytes read by the Ydb::TopicService::StreamRead method.
Labels:
- topic – topic name.
- consumer – consumer name
grpc.topic.stream_read.messages
RATE, pieces
Number of messages read by the Ydb::TopicService::StreamRead method.
Labels:
- topic – topic name.
- consumer – consumer name
grpc.topic.stream_read.partition_session.errors
RATE, pieces
Number of errors when working with the partition.
Labels:
- topic – topic name.
- consumer – consumer name
grpc.topic.stream_read.partition_session.started
RATE, pieces
The number of sessions launched per unit of time.
Labels:
- topic – topic name.
- consumer – consumer name
grpc.topic.stream_read.partition_session.stopped
RATE, pieces
Number of sessions stopped per unit of time.
Labels:
- topic – topic name.
- consumer – consumer name
grpc.topic.stream_read.partition_session.starting_count
RATE, pieces
The number of sessions being started (that is, the client has received a command to start a session but has not started it yet).
Labels:
- topic – topic name.
- consumer – consumer name
grpc.topic.stream_read.partition_session.stopping_count
RATE, pieces
Number of sessions being stopped.
Labels:
- topic – topic name.
- consumer – consumer name
grpc.topic.stream_read.partition_session.count
RATE, pieces
Number of partition sessions.
Labels:
- topic – topic name.
- consumer – consumer name
grpc.topic.stream_write.bytes
RATE, bytes
Number of bytes written by the Ydb::TopicService::StreamWrite method.
Labels:
- topic – topic name
grpc.topic.stream_write.uncommitted_bytes
RATE, bytes
The number of bytes written by the Ydb::TopicService::StreamWrite method within transactions that have not yet been committed.
Label:
- topic – topic name
grpc.topic.stream_write.errors
RATE, pieces
Number of errors when calling the Ydb::TopicService::StreamWrite method.
Labels:
- topic – topic name
grpc.topic.stream_write.messages
RATE, pieces
Number of messages written by the Ydb::TopicService::StreamWrite method.
Label:
- topic – topic name
grpc.topic.stream_write.uncommitted_messages
RATE, pieces
Number of messages written by the Ydb::TopicService::StreamWrite method within transactions that have not yet been committed.
Label:
- topic – topic name
grpc.topic.stream_write.partition_throttled_milliseconds
HIST_RATE, pieces
Histogram counter. Intervals are specified in milliseconds. Shows the number of messages waiting at the quota.
Label:
- topic – topic name
grpc.topic.stream_write.sessions_active_count
GAUGE, pieces
Number of open write sessions.
Labels:
- topic – topic name
grpc.topic.stream_write.sessions_created
RATE, pieces
Number of write sessions created.
Labels:
- topic – topic name

HTTP API Metrics

Metric name
Type, units of measurement
Description
Labels
api.http.data_streams.request.count
RATE, pieces
Number of HTTP requests.
Labels:
- method – name of the HTTP API service method, for example PutRecord, GetRecords.
- topic – topic name
api.http.data_streams.request.bytes
RATE, bytes
Total size of HTTP requests.
Labels:
- method – name of the HTTP API service method, in this case only PutRecord.
- topic – topic name
api.http.data_streams.response.count
RATE, pieces
Number of responses via HTTP protocol.
Labels:
- method – name of the HTTP API service method, for example PutRecord, GetRecords.
- topic – topic name.
- code – HTTP response code
api.http.data_streams.response.bytes
RATE, bytes
Total size of HTTP responses.
Labels:
- method – name of the HTTP API service method, in this case only GetRecords.
- topic – topic name
api.http.data_streams.response.duration_milliseconds
HIST_RATE, pieces
Histogram counter. Intervals are specified in milliseconds. Shows the number of responses whose execution time falls within a certain interval.
Labels:
- method – name of the HTTP API service method.
- topic – topic name
api.http.data_streams.get_records.messages
RATE, pieces
Number of messages read by the GetRecords method.
Labels:
- topic – topic name
api.http.data_streams.put_record.messages
RATE, pieces
Number of messages written by the PutRecord method (always 1).
Label:
- topic – topic name
api.http.data_streams.put_records.failed_messages
RATE, pieces
The number of messages sent by the PutRecords method that were not written.
Labels:
- topic – topic name
api.http.data_streams.put_records.successful_messages
RATE, pieces
The number of messages sent by the PutRecords method that were successfully written.
Labels:
- topic – topic name
api.http.data_streams.put_records.total_messages
RATE, pieces
Number of messages sent using the PutRecords method.
Label:
- topic – topic name

Kafka API metrics

Metric name
Type, units of measurement
Description
Labels
api.kafka.request.count
RATE, pieces
The number of requests via the Kafka protocol per unit of time.
Label:
- method – name of the Kafka API service method, for example PRODUCE, SASL_HANDSHAKE
api.kafka.request.bytes
RATE, bytes
Total size of Kafka requests per unit of time.
Label:
- method – name of the Kafka API service method, for example PRODUCE, SASL_HANDSHAKE
api.kafka.response.count
RATE, pieces
The number of responses via the Kafka protocol per unit of time.
Labels:
- method – name of the Kafka API service method, for example PRODUCE, SASL_HANDSHAKE.
- error_code – Kafka response code
api.kafka.response.bytes
RATE, bytes
The total size of responses via the Kafka protocol per unit of time.
Label:
- method – name of the Kafka API service method, for example PRODUCE, SASL_HANDSHAKE
api.kafka.response.duration_milliseconds
HIST_RATE, pieces
Histogram counter. Defines a set of intervals in milliseconds and for each of them shows the number of requests with execution time falling within this interval.
Label:
- method – name of the Kafka API service method
api.kafka.produce.failed_messages
RATE, pieces
The number of messages per unit of time sent by the PRODUCE method that were not written.
Label:
- topic – topic name
api.kafka.produce.successful_messages
RATE, pieces
The number of messages per unit of time sent by the PRODUCE method that were successfully written.
Labels:
- topic – topic name
api.kafka.produce.total_messages
RATE, pieces
Number of messages per unit of time sent by the PRODUCE method.
Label:
- topic – topic name
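
The produce counters above can be combined into a failure ratio per topic. The following is a minimal sketch, assuming the values of api.kafka.produce.failed_messages and api.kafka.produce.total_messages for the same topic and time window are already available as numbers; the function name produce_failure_ratio and the sample values are hypothetical.

```python
def produce_failure_ratio(failed_rate: float, total_rate: float) -> float:
    """Return the share of PRODUCE messages that were not written."""
    if total_rate <= 0:
        # No traffic in the window, so there is nothing to report as failed.
        return 0.0
    return failed_rate / total_rate


# Hypothetical rates: 5 failed messages out of 200 sent in the same window.
print(produce_failure_ratio(failed_rate=5.0, total_rate=200.0))
```

The same calculation applies to the api.http.data_streams.put_records.* counters, which follow the same failed, successful, and total naming.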

Session metrics

Metric name
Type, units of measurement
Description
Labels
table.session.active_count
IGAUGE, pieces
The number of sessions started by clients and running at a given time.
table.session.closed_by_idle_count
RATE, pieces
The number of sessions closed by the DB server in a certain period of time due to exceeding the lifetime allowed for an idle session.

Transaction processing metrics

You can analyze a transaction's execution time using a histogram counter. The intervals are set in milliseconds. The chart shows the number of transactions whose duration falls within a certain time interval.

Metric name
Type, units of measurement
Description
Labels
table.transaction.total_duration_milliseconds
HIST_RATE, pieces
The number of transactions with a certain duration on the server and client. The duration of a transaction is counted from the point of its explicit or implicit start to committing changes or its rollback. Includes the transaction processing time on the server and the time on the client between sending different requests within the same transaction.
Labels:
- tx_kind: The transaction type, possible values are read_only, read_write, write_only, and pure.
table.transaction.server_duration_milliseconds
HIST_RATE, pieces
The number of transactions with a certain duration on the server. The duration is the time of executing requests within a transaction on the server. Does not include the waiting time on the client between sending separate requests within a single transaction.
Labels:
- tx_kind: The transaction type, possible values are read_only, read_write, write_only, and pure.
table.transaction.client_duration_milliseconds
HIST_RATE, pieces
The number of transactions with a certain duration on the client. The duration is the waiting time on the client between sending individual requests within a single transaction. Does not include the time of executing requests on the server.
Labels:
- tx_kind: The transaction type, possible values are read_only, read_write, write_only, and pure.
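
A HIST_RATE metric reports a count per duration interval, so a common follow-up is to estimate a percentile from the interval counts. The sketch below assumes the intervals have been exported as (upper bound in milliseconds, count) pairs with non-cumulative counts; this representation, the function name histogram_quantile, and the sample numbers are hypothetical and are not defined by this reference. It returns the upper bound of the interval in which the requested quantile falls, which is a coarse upper estimate.

```python
def histogram_quantile(buckets, q):
    """Estimate a quantile from non-cumulative histogram buckets.

    buckets: list of (upper_bound_ms, count), sorted by upper bound ascending.
    q: quantile in the range (0, 1], for example 0.95.
    """
    total = sum(count for _, count in buckets)
    if total == 0:
        return None  # no transactions were observed
    target = q * total
    cumulative = 0
    for upper_bound, count in buckets:
        cumulative += count
        if cumulative >= target:
            # The quantile falls in this interval; report its upper bound.
            return upper_bound
    return buckets[-1][0]


# Hypothetical sample for table.transaction.total_duration_milliseconds.
sample = [(1, 10), (5, 60), (10, 25), (float("inf"), 5)]
print(histogram_quantile(sample, 0.5))
print(histogram_quantile(sample, 0.95))
```

Comparing the same percentile for the total, server, and client duration metrics shows whether time is spent executing requests on the server or waiting on the client between requests.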

Query processing metrics

Metric name
Type, units of measurement
Description
Labels
table.query.request.bytes
RATE, bytes
The size of YQL query text and parameter values to queries received by the database in a certain period of time.
table.query.request.parameters_bytes
RATE, bytes
The parameter size to the queries received by the database in a certain period of time.
table.query.response.bytes
RATE, bytes
The size of responses sent by the database in a certain period of time.
table.query.compilation.latency_milliseconds
HIST_RATE, pieces
Histogram counter. The intervals are set in milliseconds. Shows the number of successfully executed compilation queries whose duration falls within a certain time interval.
table.query.compilation.active_count
IGAUGE, pieces
The number of active compilations at a given time.
table.query.compilation.count
RATE, pieces
The number of compilations that completed successfully in a certain time period.
table.query.compilation.errors
RATE, pieces
The number of compilations that failed in a certain period of time.
table.query.compilation.cache_hits
RATE, pieces
The number of queries in a certain period of time, which didn't require any compilation, because there was an existing plan in the cache of prepared queries.
table.query.compilation.cache_misses
RATE, pieces
The number of queries in a certain period of time that required query compilation.
table.query.execution.latency_milliseconds
HIST_RATE, pieces
Histogram counter. The intervals are set in milliseconds. Shows the number of queries whose execution time falls within a certain interval.
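
The cache_hits and cache_misses counters can be combined into a hit ratio for the cache of prepared queries. The following is a minimal sketch, assuming both values cover the same time window and are already available as numbers; the function name compilation_cache_hit_ratio and the sample values are hypothetical.

```python
from typing import Optional


def compilation_cache_hit_ratio(hits: float, misses: float) -> Optional[float]:
    """Return the share of queries served from the prepared-query cache."""
    total = hits + misses
    if total <= 0:
        return None  # no queries in the window, so the ratio is undefined
    return hits / total


# Hypothetical values for table.query.compilation.cache_hits and cache_misses.
print(compilation_cache_hit_ratio(hits=90, misses=10))
```

A low ratio means that most queries require compilation, which is also visible as a higher table.query.compilation.count.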

Row-oriented table partition metrics

Metric name
Type, units of measurement
Description
Labels
table.datashard.row_count
GAUGE, pieces
The number of rows in all row-oriented tables in the database.
table.datashard.size_bytes
GAUGE, bytes
The size of data in all row-oriented tables in the database.
table.datashard.used_core_percents
HIST_GAUGE, %
Histogram counter. The intervals are set as a percentage. Shows the number of row-oriented table partitions using computing resources in the ratio that falls within a certain interval.
table.datashard.read.rows
RATE, pieces
The number of rows that are read by all partitions of all row-oriented tables in the database in a certain period of time.
table.datashard.read.bytes
RATE, bytes
The size of data that is read by all partitions of all row-oriented tables in the database in a certain period of time.
table.datashard.write.rows
RATE, pieces
The number of rows that are written by all partitions of all row-oriented tables in the database in a certain period of time.
table.datashard.write.bytes
RATE, bytes
The size of data that is written by all partitions of all row-oriented tables in the database in a certain period of time.
table.datashard.scan.rows
RATE, pieces
The number of rows that are read through StreamExecuteScanQuery or StreamReadTable gRPC API calls by all partitions of all row-oriented tables in the database in a certain period of time.
table.datashard.scan.bytes
RATE, bytes
The size of data that is read through StreamExecuteScanQuery or StreamReadTable gRPC API calls by all partitions of all row-oriented tables in the database in a certain period of time.
table.datashard.bulk_upsert.rows
RATE, pieces
The number of rows that are added through a BulkUpsert gRPC API call to all partitions of all row-oriented tables in the database in a certain period of time.
table.datashard.bulk_upsert.bytes
RATE, bytes
The size of data that is added through a BulkUpsert gRPC API call to all partitions of all row-oriented tables in the database in a certain period of time.
table.datashard.erase.rows
RATE, pieces
The number of rows deleted from row-oriented tables in the database in a certain period of time.
table.datashard.erase.bytes
RATE, bytes
The size of data deleted from row-oriented tables in the database in a certain period of time.
table.datashard.cache_hit.bytes
RATE, bytes
The total amount of data successfully retrieved from memory (cache), indicating efficient cache utilization in serving frequently accessed data without accessing distributed storage.
table.datashard.cache_miss.bytes
RATE, bytes
The total amount of data that was requested but not found in memory (cache) and was read from distributed storage, highlighting potential areas for cache optimization.

Column-oriented table partition metrics

Metric name
Type, units of measurement
Description
Labels
table.columnshard.write.rows
RATE, pieces
The number of rows that are written by all partitions of all column-oriented tables in the database in a certain period of time.
table.columnshard.write.bytes
RATE, bytes
The size of data that is written by all partitions of all column-oriented tables in the database in a certain period of time.
table.columnshard.scan.rows
RATE, pieces
The number of rows that are read through StreamExecuteScanQuery or StreamReadTable gRPC API calls by all partitions of all column-oriented tables in the database in a certain period of time.
table.columnshard.scan.bytes
RATE, bytes
The size of data that is read through StreamExecuteScanQuery or StreamReadTable gRPC API calls by all partitions of all column-oriented tables in the database in a certain period of time.
table.columnshard.bulk_upsert.rows
RATE, pieces
The number of rows that are added through a BulkUpsert gRPC API call to all partitions of all column-oriented tables in the database in a certain period of time.
table.columnshard.bulk_upsert.bytes
RATE, bytes
The size of data that is added through a BulkUpsert gRPC API call to all partitions of all column-oriented tables in the database in a certain period of time.

Resource usage metrics (for Dedicated mode only)

Metric name
Type, units of measurement
Description
Labels
resources.cpu.used_core_percents
RATE, %
CPU usage. A value of 100 means that one core is fully utilized. The value may exceed 100 for multi-core configurations.
Labels:
- pool: The computing pool, possible values are user, system, batch, io, and ic.
resources.cpu.limit_core_percents
IGAUGE, %
The percentage of CPU available to a database. For example, for a database that has three nodes with four cores in pool=user per node, the value of this metric will be 1200.
Labels:
- pool: The computing pool, possible values are user, system, batch, io, and ic.
resources.memory.used_bytes
IGAUGE, bytes
The amount of RAM used by the database nodes.
resources.memory.limit_bytes
IGAUGE, bytes
RAM available to the database nodes.
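
The arithmetic behind resources.cpu.limit_core_percents, and its use together with resources.cpu.used_core_percents, can be sketched as follows. This is an illustration only: the function names and the usage value of 300 are hypothetical, and both metrics are assumed to be taken for the same pool.

```python
def cpu_limit_core_percents(nodes: int, cores_per_node: int) -> int:
    """Each core contributes 100 percent to the limit."""
    return nodes * cores_per_node * 100


def cpu_utilization(used_core_percents: float, limit_core_percents: float) -> float:
    """Return the fraction of the available CPU that is in use."""
    if limit_core_percents <= 0:
        raise ValueError("limit_core_percents must be positive")
    return used_core_percents / limit_core_percents


# The example from the description: three nodes with four cores each in pool=user.
limit = cpu_limit_core_percents(nodes=3, cores_per_node=4)
print(limit)
# Hypothetical usage of 300 core percents, that is, three fully loaded cores.
print(cpu_utilization(300, limit))
```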

Query processing metrics (for Dedicated mode only)

Metric name
Type, units of measurement
Description
Labels
table.query.compilation.cache_evictions
RATE, pieces
The number of queries evicted from the cache of prepared queries in a certain period of time.
table.query.compilation.cache_size_bytes
IGAUGE, bytes
The size of the cache of prepared queries.
table.query.compilation.cached_query_count
IGAUGE, pieces
The number of queries in the cache of prepared queries.

Topic metrics

Metric name
Type, units of measurement
Description
Labels
topic.producers_count
GAUGE, pieces
The number of unique topic producers.
Labels:
- topic – the name of the topic.
topic.storage_bytes
GAUGE, bytes
The size of the topic in bytes.
Labels:
- topic – the name of the topic.
topic.read.bytes
RATE, bytes
The number of bytes read by the consumer from the topic.
Labels:
- topic – the name of the topic.
- consumer – the name of the consumer.
topic.read.messages
RATE, pieces
The number of messages read by the consumer from the topic.
Labels:
- topic – the name of the topic.
- consumer – the name of the consumer.
topic.read.lag_messages
RATE, pieces
The number of messages in the topic that the consumer has not read yet.
Labels:
- topic – the name of the topic.
- consumer – the name of the consumer.
topic.read.lag_milliseconds
HIST_RATE, pieces
A histogram counter. The intervals are specified in milliseconds. It shows the number of messages where the difference between the reading time and the message creation time falls within the specified interval.
Labels:
- topic – the name of the topic.
- consumer – the name of the consumer.
topic.write.bytes
RATE, bytes
The size of the written data.
Labels:
- topic – the name of the topic.
topic.write.uncommited_bytes
RATE, bytes
The size of data written as part of ongoing transactions.
Labels:
- topic – the name of the topic.
topic.write.uncompressed_bytes
RATE, bytes
The size of uncompressed written data.
Labels:
- topic – the name of the topic.
topic.write.messages
RATE, pieces
The number of written messages.
Labels:
- topic – the name of the topic.
topic.write.uncommitted_messages
RATE, pieces
The number of messages written as part of ongoing transactions.
Labels:
- topic – the name of the topic.
topic.write.message_size_bytes
HIST_RATE, pieces
A histogram counter. The intervals are specified in bytes. It shows the number of messages whose size falls within the boundaries of the interval.
Labels:
- topic – the name of the topic.
topic.write.lag_milliseconds
HIST_RATE, pieces
A histogram counter. The intervals are specified in milliseconds. It shows the number of messages where the difference between the write time and the message creation time falls within the specified interval.
Labels:
- topic – the name of the topic.

Aggregated metrics of topic partitions

The following table shows aggregated partition metrics for the topic. The maximum and minimum values are calculated across all partitions of a given topic.

Metric name
Type, units of measurement
Description
Labels
topic.partition.init_duration_milliseconds_max
GAUGE, milliseconds
Maximum partition initialization delay.
Labels:
- topic – topic name
topic.partition.producers_count_max
GAUGE, pieces
The maximum number of producers in the partition.
Labels:
- topic – topic name
topic.partition.storage_bytes_max
GAUGE, bytes
Maximum partition size in bytes.
Label:
- topic – topic name
topic.partition.uptime_milliseconds_min
GAUGE, milliseconds
Minimum partition uptime after a restart.
During a rolling restart, topic.partition.uptime_milliseconds_min is normally close to 0. After the rolling restart completes, the value should keep increasing.
Label:
- topic – topic name
topic.partition.total_count
GAUGE, pieces
Total number of partitions in the topic.
Label:
- topic – topic name
topic.partition.alive_count
GAUGE, pieces
The number of partitions sending their metrics.
Label:
- topic – topic name
topic.partition.committed_end_to_end_lag_milliseconds_max
GAUGE, milliseconds
The maximum (across all partitions) difference between the current time and the creation time of the last committed message.
Labels:
- topic – topic name.
- consumer – name of the consumer
topic.partition.committed_lag_messages_max
GAUGE, pieces
The maximum (across all partitions) difference between the last partition offset and the committed partition offset.
Labels:
- topic – topic name.
- consumer – name of the consumer
topic.partition.committed_read_lag_milliseconds_max
GAUGE, milliseconds
The maximum (across all partitions) difference between the current time and the write time of the last committed message.
Labels:
- topic – topic name.
- consumer – name of the consumer
topic.partition.end_to_end_lag_milliseconds_max
GAUGE, milliseconds
The difference between the current time and the minimum creation time among all messages read in the last minute in all partitions.
Labels:
- topic – topic name.
- consumer – name of the consumer
topic.partition.lag_messages_max
GAUGE, pieces
The maximum (across all partitions) difference between the last offset in the partition and the last read offset.
Labels:
- topic – topic name.
- consumer – name of the consumer
topic.partition.read.lag_milliseconds_max
GAUGE, milliseconds
The difference between the current time and the minimum write time among all messages read in the last minute in all partitions.
Labels:
- topic – topic name.
- consumer – name of the consumer
topic.partition.read.idle_milliseconds_max
GAUGE, milliseconds
Maximum idle time (how long the partition was not read) for all partitions.
Labels:
- topic – topic name.
- consumer – name of the consumer
topic.partition.read.lag_milliseconds_max
GAUGE, milliseconds
The maximum difference between the write time and the creation time among all messages read in the last minute.
Labels:
- topic – topic name.
- consumer – name of the consumer
topic.partition.write.lag_milliseconds_max
GAUGE, milliseconds
The maximum difference between the write time and the creation time among all messages written in the last minute.
Label:
- topic – topic name
topic.partition.write.speed_limit_bytes_per_second
GAUGE, bytes per second
Write quota in bytes per second per partition.
Label:
- topic – topic name
topic.partition.write.throttled_nanoseconds_max
GAUGE, nanoseconds
Maximum write throttling time (waiting on quota) across all partitions. At the limit, topic.partition.write.throttled_nanoseconds_max = 10^9 means that the entire second was spent waiting on the quota.
Label:
- topic – topic name
topic.partition.write.bytes_per_day_max
GAUGE, bytes
The maximum number of bytes written over the last 24 hours for all partitions.
Label:
- topic – topic name
topic.partition.write.bytes_per_hour_max
GAUGE, bytes
The maximum number of bytes written in the last hour, across all partitions.
Label:
- topic – topic name
topic.partition.write.bytes_per_minute_max
GAUGE, bytes
The maximum number of bytes written in the last minute, across all partitions.
Label:
- topic – topic name
topic.partition.write.idle_milliseconds_max
GAUGE, milliseconds
Maximum time the partition has been idle for writes.
Label:
- topic – topic name
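
The note on topic.partition.write.throttled_nanoseconds_max implies a simple conversion from nanoseconds to the share of a second spent waiting on the write quota. The following is a minimal sketch, assuming the metric value refers to a one-second window as that note suggests; the function name throttled_fraction and the sample value are hypothetical.

```python
NANOSECONDS_PER_SECOND = 10**9


def throttled_fraction(throttled_nanoseconds_max: float) -> float:
    """Return the share of a second the most throttled partition spent waiting."""
    fraction = throttled_nanoseconds_max / NANOSECONDS_PER_SECOND
    # Clamp to [0, 1] in case of sampling jitter in the reported value.
    return max(0.0, min(1.0, fraction))


# Hypothetical sample: 250 ms of waiting on the quota within one second.
print(throttled_fraction(250_000_000))
```

A value close to 1 means that writes to at least one partition are limited by topic.partition.write.speed_limit_bytes_per_second almost all the time.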

Resource pool metrics

Metric name
Type, units of measurement
Description
Labels
kqp.workload_manager.CpuQuotaManager.AverageLoadPercentage
RATE, pieces
Average database load. DATABASE_LOAD_CPU_THRESHOLD operates based on this metric.
kqp.workload_manager.InFlightLimit
GAUGE, pieces
Limit on the number of simultaneously running requests.
kqp.workload_manager.GlobalInFly
GAUGE, pieces
The current number of simultaneously running requests. Displayed only for pools with CONCURRENT_QUERY_LIMIT or DATABASE_LOAD_CPU_THRESHOLD enabled.
kqp.workload_manager.QueueSizeLimit
GAUGE, pieces
Limit on the size of the queue of pending requests.
kqp.workload_manager.GlobalDelayedRequests
GAUGE, pieces
The number of requests waiting in the execution queue. Displayed only for pools with CONCURRENT_QUERY_LIMIT or DATABASE_LOAD_CPU_THRESHOLD enabled.