Metrics reference

Resource usage metrics

Metric name Type, units of measurement	Description Labels
`resources.storage.used_bytes` `IGAUGE`, bytes	The size of user and service data stored in distributed network storage. `resources.storage.used_bytes` = `resources.storage.table.used_bytes` + `resources.storage.topic.used_bytes`.
`resources.storage.table.used_bytes` `IGAUGE`, bytes	The size of user and service data stored by tables in distributed network storage. Service data includes the data of the primary index, secondary indexes, and vector indexes.
`resources.storage.topic.used_bytes` `IGAUGE`, bytes	The size of storage used by topics. This metric sums the `topic.storage_bytes` values of all topics.
`resources.storage.limit_bytes` `IGAUGE`, bytes	A limit on the size of user and service data that a database can store in distributed network storage.
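The storage gauges above can be combined into a simple utilization check. This is a minimal sketch that assumes the latest value of each gauge has already been fetched from your monitoring system. The function name and the sample values are hypothetical.

```python
def storage_utilization(table_used_bytes: int, topic_used_bytes: int, limit_bytes: int) -> float:
    """Share of the storage limit in use (1.0 means the limit is reached)."""
    # resources.storage.used_bytes = table part + topic part, as defined above.
    used_bytes = table_used_bytes + topic_used_bytes
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    return used_bytes / limit_bytes

# Hypothetical sample values, not real measurements.
GiB = 1024 ** 3
ratio = storage_utilization(table_used_bytes=60 * GiB, topic_used_bytes=20 * GiB, limit_bytes=100 * GiB)
print(f"storage used: {ratio:.0%}")  # storage used: 80%
```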

gRPC API metrics

Metric name Type, units of measurement	Description Labels
`api.grpc.request.bytes` `RATE`, bytes	The size of queries received by the database in a certain period of time. Labels: - api_service: The name of the gRPC API service, such as `table` or `data_streams`. - method: The name of a gRPC API service method, such as `ExecuteDataQuery` (for the `table` service) or `PutRecord`, `GetRecords` (for the `data_streams` service).
`api.grpc.request.dropped_count` `RATE`, pieces	The number of requests dropped at the transport (gRPC) layer due to an error. Labels: - api_service: The name of the gRPC API service, such as `table`. - method: The name of a gRPC API service method, such as `ExecuteDataQuery`.
`api.grpc.request.inflight_count` `IGAUGE`, pieces	The number of requests that a database is simultaneously handling in a certain period of time. Labels: - api_service: The name of the gRPC API service, such as `table`. - method: The name of a gRPC API service method, such as `ExecuteDataQuery`.
`api.grpc.request.inflight_bytes` `IGAUGE`, bytes	The size of requests that a database is simultaneously handling in a certain period of time. Labels: - api_service: The name of the gRPC API service, such as `table`. - method: The name of a gRPC API service method, such as `ExecuteDataQuery`.
`api.grpc.response.bytes` `RATE`, bytes	The size of responses sent by the database in a certain period of time. Labels: - api_service: The name of the gRPC API service, such as `table`. - method: The name of a gRPC API service method, such as `ExecuteDataQuery`.
`api.grpc.response.count` `RATE`, pieces	The number of responses sent by the database in a certain period of time. Labels: - api_service: The name of the gRPC API service, such as `table`. - method: The name of a gRPC API service method, such as `ExecuteDataQuery`. - status: The request execution status. For a more detailed description of statuses, see Error Handling.
`api.grpc.response.dropped_count` `RATE`, pieces	The number of responses dropped at the transport (gRPC) layer due to an error. Labels: - api_service: The name of the gRPC API service, such as `table`. - method: The name of a gRPC API service method, such as `ExecuteDataQuery`.
`api.grpc.response.issues` `RATE`, pieces	The number of errors of a certain type arising in the execution of a request over a certain period of time. Labels: - issue_type: The error type. The only possible value is `optimistic_locks_invalidation`. For more on lock invalidation, see Transactions and requests to YDB.
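Because `api.grpc.response.count` is split by the `status` label, the share of failed responses can be derived from it. A minimal sketch, assuming you already have per-second rates keyed by `status` for one `api_service`/`method` pair; the label value `"SUCCESS"` for successful responses is an assumption, as are the function name and sample rates.

```python
from typing import Mapping

def grpc_error_ratio(responses_by_status: Mapping[str, float], ok_status: str = "SUCCESS") -> float:
    """Share of api.grpc.response.count that ended with a non-success status.

    responses_by_status maps the `status` label value to its rate.
    The success label value ("SUCCESS" by default) is an assumption.
    """
    total = sum(responses_by_status.values())
    if total == 0:
        return 0.0  # no responses in this interval
    failed = total - responses_by_status.get(ok_status, 0.0)
    return failed / total

# Hypothetical rates for one api_service/method pair.
rates = {"SUCCESS": 950.0, "OVERLOADED": 30.0, "ABORTED": 20.0}
print(f"{grpc_error_ratio(rates):.1%}")  # 5.0%
```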

gRPC API metrics for topics

Metric name Type, units of measurement	Description Labels
`grpc.topic.stream_read.commits` `RATE`, pieces	Number of commits made by the `Ydb::TopicService::StreamRead` method. Labels: - topic – topic name. - consumer – consumer name
`grpc.topic.stream_read.bytes` `RATE`, bytes	Number of bytes read by the `Ydb::TopicService::StreamRead` method. Labels: - topic – topic name. - consumer – consumer name
`grpc.topic.stream_read.messages` `RATE`, pieces	Number of messages read by the method `Ydb::TopicService::StreamRead`. Labels: - topic – topic name. - consumer – consumer name
`grpc.topic.stream_read.partition_session.errors` `RATE`, pieces	Number of errors when working with the partition. Labels: - topic – topic name. - consumer – consumer name
`grpc.topic.stream_read.partition_session.started` `RATE`, pieces	The number of sessions launched per unit of time. Labels: - topic – topic name. - consumer – consumer name
`grpc.topic.stream_read.partition_session.stopped` `RATE`, pieces	Number of sessions stopped per unit of time. Labels: - topic – topic name. - consumer – consumer name
`grpc.topic.stream_read.partition_session.starting_count` `RATE`, pieces	The number of sessions that are starting, that is, the client has received a command to start a session but has not started it yet. Labels: - topic – topic name. - consumer – consumer name
`grpc.topic.stream_read.partition_session.stopping_count` `RATE`, pieces	The number of sessions that are stopping. Labels: - topic – topic name. - consumer – consumer name
`grpc.topic.stream_read.partition_session.count` `RATE`, pieces	The number of partition sessions. Labels: - topic – topic name. - consumer – consumer name
`grpc.topic.stream_write.bytes` `RATE`, bytes	Number of bytes written by the `Ydb::TopicService::StreamWrite` method. Labels: - topic – topic name
`grpc.topic.stream_write.uncommitted_bytes` `RATE`, bytes	The number of bytes written by the `Ydb::TopicService::StreamWrite` method within transactions that have not yet been committed. Label: - topic – topic name
`grpc.topic.stream_write.errors` `RATE`, pieces	Number of errors when calling the `Ydb::TopicService::StreamWrite` method. Labels: - topic – topic name
`grpc.topic.stream_write.messages` `RATE`, pieces	Number of messages written by the `Ydb::TopicService::StreamWrite` method. Label: - topic – topic name
`grpc.topic.stream_write.uncommitted_messages` `RATE`, pieces	Number of messages written by the `Ydb::TopicService::StreamWrite` method within transactions that have not yet been committed. Label: - topic – topic name
`grpc.topic.stream_write.partition_throttled_milliseconds` `HIST_RATE`, pieces	Histogram counter. Intervals are specified in milliseconds. Shows the number of messages whose wait time for the write quota falls within a certain interval. Label: - topic – topic name
`grpc.topic.stream_write.sessions_active_count` `GAUGE`, pieces	Number of open write sessions. Labels: - topic – topic name
`grpc.topic.stream_write.sessions_created` `RATE`, pieces	Number of write sessions created. Labels: - topic – topic name
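Dividing a byte counter by the matching message counter gives the average message size over the same interval. A minimal sketch, assuming both rates were taken over the same window with the same `topic` (and `consumer`) labels; the function name and sample numbers are hypothetical.

```python
from typing import Optional

def average_message_size(bytes_rate: float, messages_rate: float) -> Optional[float]:
    """Average size in bytes of the messages counted over the same interval.

    Works for a read pair (grpc.topic.stream_read.bytes / .messages) or a
    write pair (grpc.topic.stream_write.bytes / .messages) with matching labels.
    """
    if messages_rate == 0:
        return None  # nothing was read or written in this interval
    return bytes_rate / messages_rate

# Hypothetical per-second rates for one topic/consumer pair.
print(average_message_size(bytes_rate=2_048_000.0, messages_rate=1_000.0))  # 2048.0
```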

HTTP API Metrics

Metric name Type, units of measurement	Description Labels
`api.http.data_streams.request.count` `RATE`, pieces	Number of HTTP requests. Labels: - method – name of the HTTP API service method, for example `PutRecord`, `GetRecords`. - topic – topic name
`api.http.data_streams.request.bytes` `RATE`, bytes	Total size of HTTP requests. Labels: - method – name of the HTTP API service method, in this case only `PutRecord`. - topic – topic name
`api.http.data_streams.response.count` `RATE`, pieces	Number of responses via the HTTP protocol. Labels: - method – name of the HTTP API service method, for example `PutRecord`, `GetRecords`. - topic – topic name. - code – HTTP response code
`api.http.data_streams.response.bytes` `RATE`, bytes	Total size of HTTP responses. Labels: - method – name of the HTTP API service method, in this case only `GetRecords`. - topic – topic name
`api.http.data_streams.response.duration_milliseconds` `HIST_RATE`, pieces	Histogram counter. Intervals are specified in milliseconds. Shows the number of responses whose execution time falls within a certain interval. Labels: - method – name of the HTTP API service method. - topic – topic name
`api.http.data_streams.get_records.messages` `RATE`, pieces	Number of messages read by the `GetRecords` method. Labels: - topic – topic name
`api.http.data_streams.put_record.messages` `RATE`, pieces	Number of messages written by the `PutRecord` method (always 1 per call). Label: - topic – topic name
`api.http.data_streams.put_records.failed_messages` `RATE`, pieces	The number of messages sent by the `PutRecords` method that were not written. Labels: - topic – topic name
`api.http.data_streams.put_records.successful_messages` `RATE`, pieces	The number of messages sent by the `PutRecords` method that were successfully written. Labels: - topic – topic name
`api.http.data_streams.put_records.total_messages` `RATE`, pieces	Number of messages sent using the `PutRecords` method. Label: - topic – topic name
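The three `PutRecords` counters can be used to compute a failure ratio. A minimal sketch, assuming that over the same interval every sent message is counted as either failed or successful, so the two should add up to the total; that assumption, the tolerance, the function name, and the sample rates are all hypothetical.

```python
import math

def put_records_failure_ratio(failed: float, successful: float, total: float,
                              rel_tol: float = 0.01) -> float:
    """Share of PutRecords messages that were not written."""
    if total == 0:
        return 0.0
    # Assumption: failed + successful should match total for the same interval;
    # a large mismatch usually means the samples were not taken together.
    if not math.isclose(failed + successful, total, rel_tol=rel_tol):
        raise ValueError("failed + successful does not match total")
    return failed / total

# Hypothetical per-second rates for one topic.
print(put_records_failure_ratio(failed=12.0, successful=1188.0, total=1200.0))  # 0.01
```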

Kafka API metrics

Metric name Type, units of measurement	Description Labels
`api.kafka.request.count` `RATE`, pieces	The number of requests via the Kafka protocol per unit of time. Label: - method – name of the Kafka API service method, for example `PRODUCE`, `SASL_HANDSHAKE`
`api.kafka.request.bytes` `RATE`, bytes	Total size of Kafka requests per unit of time. Label: - method – name of the Kafka API service method, for example `PRODUCE`, `SASL_HANDSHAKE`
`api.kafka.response.count` `RATE`, pieces	The number of responses via the Kafka protocol per unit of time. Labels: - method – name of the Kafka API service method, for example `PRODUCE`, `SASL_HANDSHAKE`. - error_code – Kafka response code
`api.kafka.response.bytes` `RATE`, bytes	The total size of responses via the Kafka protocol per unit of time. Label: - method – name of the Kafka API service method, for example `PRODUCE`, `SASL_HANDSHAKE`
`api.kafka.response.duration_milliseconds` `HIST_RATE`, pieces	Histogram counter. Defines a set of intervals in milliseconds and for each of them shows the number of requests with execution time falling within this interval. Label: - method – name of the Kafka API service method
`api.kafka.produce.failed_messages` `RATE`, pieces	The number of messages per unit of time sent by the `PRODUCE` method that were not recorded. Label: - topic – topic name
`api.kafka.produce.successful_messages` `RATE`, pieces	The number of messages per unit of time sent by the `PRODUCE` method that were successfully recorded. Labels: - topic – topic name
`api.kafka.produce.total_messages` `RATE`, pieces	Number of messages per unit of time sent by the `PRODUCE` method. Label: - topic – topic name
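`HIST_RATE` counters such as `api.kafka.response.duration_milliseconds` report a count per interval, which is enough to bound a latency percentile. A minimal sketch, assuming the per-interval counts (not cumulative) and their upper bounds are already available; the bucket bounds, sample counts, and function name are hypothetical. It returns the upper bound of the interval containing the quantile, so the true value is at most that number.

```python
from typing import Sequence, Tuple

def histogram_quantile(buckets: Sequence[Tuple[float, float]], q: float) -> float:
    """Upper-bound estimate of the q-quantile from a HIST_RATE histogram.

    buckets: (upper_bound_ms, count) pairs sorted by bound; counts are
    per interval, not cumulative.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    total = sum(count for _, count in buckets)
    if total == 0:
        raise ValueError("histogram is empty")
    threshold = q * total
    cumulative = 0.0
    for upper_bound, count in buckets:
        cumulative += count
        if cumulative >= threshold:
            return upper_bound
    return buckets[-1][0]  # only reachable through floating-point rounding

# Hypothetical interval bounds and per-second counts for method=PRODUCE.
produce_latency = [(5, 700.0), (10, 200.0), (50, 80.0), (100, 15.0), (500, 5.0)]
print(histogram_quantile(produce_latency, 0.99))  # 100
```

Returning the interval's upper bound rather than interpolating inside it keeps the estimate conservative; with coarse intervals, that is usually the safer reading for alerting.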

Session metrics

Metric name Type, units of measurement	Description Labels
`table.session.active_count` `IGAUGE`, pieces	The number of sessions started by clients and running at a given time.
`table.session.closed_by_idle_count` `RATE`, pieces	The number of sessions closed by the DB server in a certain period of time due to exceeding the lifetime allowed for an idle session.

Transaction processing metrics

You can analyze a transaction's execution time using a histogram counter. The intervals are set in milliseconds. The chart shows the number of transactions whose duration falls within a certain time interval.

Metric name Type, units of measurement	Description Labels
`table.transaction.total_duration_milliseconds` `HIST_RATE`, pieces	The number of transactions with a certain duration on the server and client. The duration of a transaction is counted from the point of its explicit or implicit start to committing changes or its rollback. Includes the transaction processing time on the server and the time on the client between sending different requests within the same transaction. Labels: - tx_kind: The transaction type, possible values are `read_only`, `read_write`, `write_only`, and `pure`.
`table.transaction.server_duration_milliseconds` `HIST_RATE`, pieces	The number of transactions with a certain duration on the server. The duration is the time of executing requests within a transaction on the server. Does not include the waiting time on the client between sending separate requests within a single transaction. Labels: - tx_kind: The transaction type, possible values are `read_only`, `read_write`, `write_only`, and `pure`.
`table.transaction.client_duration_milliseconds` `HIST_RATE`, pieces	The number of transactions with a certain duration on the client. The duration is the waiting time on the client between sending individual requests within a single transaction. Does not include the time of executing requests on the server. Labels: - tx_kind: The transaction type, possible values are `read_only`, `read_write`, `write_only`, and `pure`.
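The three transaction histograms measure different parts of the same timeline. The sketch below splits one transaction into those parts under a simplified model (an assumption): requests run one after another without overlap, the transaction starts with the first request and ends when the last request (the commit) returns. The function name and the timeline are hypothetical.

```python
from typing import List, Tuple

def split_transaction_duration(requests: List[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Split one transaction into (total, server, client) time in milliseconds.

    requests: (start_ms, end_ms) of each request in the transaction, in order.
    """
    if not requests:
        raise ValueError("a transaction needs at least one request")
    # Server time: time spent executing each request.
    server = sum(end - start for start, end in requests)
    # Client time: gaps between one request finishing and the next one starting.
    client = sum(next_start - prev_end
                 for (_, prev_end), (next_start, _) in zip(requests, requests[1:]))
    total = requests[-1][1] - requests[0][0]
    return total, server, client

# Hypothetical timeline: a read, 40 ms of client-side work, then a write with commit.
total, server, client = split_transaction_duration([(0.0, 15.0), (55.0, 70.0)])
print(total, server, client)  # 70.0 30.0 40.0
```

In this model, total equals server plus client; real transactions may deviate, for example when requests overlap.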

Query processing metrics

Metric name Type, units of measurement	Description Labels
`table.query.request.bytes` `RATE`, bytes	The size of YQL query texts and parameter values of queries received by the database in a certain period of time.
`table.query.request.parameters_bytes` `RATE`, bytes	The size of parameters of queries received by the database in a certain period of time.
`table.query.response.bytes` `RATE`, bytes	The size of responses sent by the database in a certain period of time.
`table.query.compilation.latency_milliseconds` `HIST_RATE`, pieces	Histogram counter. The intervals are set in milliseconds. Shows the number of successful query compilations whose duration falls within a certain time interval.
`table.query.compilation.active_count` `IGAUGE`, pieces	The number of active compilations at a given time.
`table.query.compilation.count` `RATE`, pieces	The number of compilations that completed successfully in a certain time period.
`table.query.compilation.errors` `RATE`, pieces	The number of compilations that failed in a certain period of time.
`table.query.compilation.cache_hits` `RATE`, pieces	The number of queries in a certain period of time that did not require compilation because a plan already existed in the cache of prepared queries.
`table.query.compilation.cache_misses` `RATE`, pieces	The number of queries in a certain period of time that required query compilation.
`table.query.execution.latency_milliseconds` `HIST_RATE`, pieces	Histogram counter. The intervals are set in milliseconds. Shows the number of queries whose execution time falls within a certain interval.
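The cache hit and miss counters give the hit ratio of the prepared-query cache. A minimal sketch, assuming every query that checked the cache over the interval is counted in exactly one of `cache_hits` or `cache_misses`; the function name and sample rates are hypothetical.

```python
def compile_cache_hit_ratio(cache_hits: float, cache_misses: float) -> float:
    """Share of queries served from the prepared-query cache without compilation."""
    lookups = cache_hits + cache_misses
    return cache_hits / lookups if lookups else 0.0

# Hypothetical per-second rates.
print(f"{compile_cache_hit_ratio(cache_hits=450.0, cache_misses=50.0):.0%}")  # 90%
```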

Row-oriented table partition metrics

Metric name Type, units of measurement	Description Labels
`table.datashard.row_count` `GAUGE`, pieces	The number of rows in all row-oriented tables in the database.
`table.datashard.size_bytes` `GAUGE`, bytes	The size of data in all row-oriented tables in the database.
`table.datashard.used_core_percents` `HIST_GAUGE`, %	Histogram counter. The intervals are set as a percentage. Shows the number of row-oriented table partitions whose CPU usage falls within a certain interval.
`table.datashard.read.rows` `RATE`, pieces	The number of rows that are read by all partitions of all row-oriented tables in the database in a certain period of time.
`table.datashard.read.bytes` `RATE`, bytes	The size of data that is read by all partitions of all row-oriented tables in the database in a certain period of time.
`table.datashard.write.rows` `RATE`, pieces	The number of rows that are written by all partitions of all row-oriented tables in the database in a certain period of time.
`table.datashard.write.bytes` `RATE`, bytes	The size of data that is written by all partitions of all row-oriented tables in the database in a certain period of time.
`table.datashard.scan.rows` `RATE`, pieces	The number of rows that are read through `StreamExecuteScanQuery` or `StreamReadTable` gRPC API calls by all partitions of all row-oriented tables in the database in a certain period of time.
`table.datashard.scan.bytes` `RATE`, bytes	The size of data that is read through `StreamExecuteScanQuery` or `StreamReadTable` gRPC API calls by all partitions of all row-oriented tables in the database in a certain period of time.
`table.datashard.bulk_upsert.rows` `RATE`, pieces	The number of rows that are added through a `BulkUpsert` gRPC API call to all partitions of all row-oriented tables in the database in a certain period of time.
`table.datashard.bulk_upsert.bytes` `RATE`, bytes	The size of data that is added through a `BulkUpsert` gRPC API call to all partitions of all row-oriented tables in the database in a certain period of time.
`table.datashard.erase.rows` `RATE`, pieces	The number of rows deleted from row-oriented tables in the database in a certain period of time.
`table.datashard.erase.bytes` `RATE`, bytes	The size of data deleted from row-oriented tables in the database in a certain period of time.
`table.datashard.cache_hit.bytes` `RATE`, bytes	The total amount of data successfully retrieved from memory (cache), indicating efficient cache utilization in serving frequently accessed data without accessing distributed storage.
`table.datashard.cache_miss.bytes` `RATE`, bytes	The total amount of data that was requested but not found in memory (cache) and was read from distributed storage, highlighting potential areas for cache optimization.
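Most operations above come as a `.rows` / `.bytes` pair named `table.datashard.<operation>.*`, so average row size per operation can be computed in one pass. A minimal sketch, assuming the rates were collected over the same interval and stored in a dictionary keyed by full metric name; the dictionary layout, function name, and sample values are hypothetical.

```python
from typing import Dict, Optional

# Operations that have both .rows and .bytes counters in the table above.
OPERATIONS = ("read", "write", "scan", "bulk_upsert", "erase")

def avg_row_size_by_operation(samples: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Average bytes per row for each table.datashard.<op> pair.

    Missing or zero row rates give None for that operation.
    """
    result: Dict[str, Optional[float]] = {}
    for op in OPERATIONS:
        rows = samples.get(f"table.datashard.{op}.rows", 0.0)
        size = samples.get(f"table.datashard.{op}.bytes", 0.0)
        result[op] = size / rows if rows else None
    return result

# Hypothetical per-second rates.
samples = {
    "table.datashard.read.rows": 2_000.0,
    "table.datashard.read.bytes": 256_000.0,
    "table.datashard.bulk_upsert.rows": 10_000.0,
    "table.datashard.bulk_upsert.bytes": 5_120_000.0,
}
print(avg_row_size_by_operation(samples))
```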

Column-oriented table partition metrics

Metric name Type, units of measurement	Description Labels
`table.columnshard.write.rows` `RATE`, pieces	The number of rows that are written by all partitions of all column-oriented tables in the database in a certain period of time.
`table.columnshard.write.bytes` `RATE`, bytes	The size of data that is written by all partitions of all column-oriented tables in the database in a certain period of time.
`table.columnshard.scan.rows` `RATE`, pieces	The number of rows that are read through `StreamExecuteScanQuery` or `StreamReadTable` gRPC API calls by all partitions of all column-oriented tables in the database in a certain period of time.
`table.columnshard.scan.bytes` `RATE`, bytes	The size of data that is read through `StreamExecuteScanQuery` or `StreamReadTable` gRPC API calls by all partitions of all column-oriented tables in the database in a certain period of time.
`table.columnshard.bulk_upsert.rows` `RATE`, pieces	The number of rows that are added through a `BulkUpsert` gRPC API call to all partitions of all column-oriented tables in the database in a certain period of time.
`table.columnshard.bulk_upsert.bytes` `RATE`, bytes	The size of data that is added through a `BulkUpsert` gRPC API call to all partitions of all column-oriented tables in the database in a certain period of time.

Resource usage metrics (for Dedicated mode only)

Metric name Type, units of measurement	Description Labels
`resources.cpu.used_core_percents` `RATE`, %	CPU usage. A value of `100` means that one core is fully used. The value may be greater than `100` for multi-core configurations. Labels: - pool: The computing pool, possible values are `user`, `system`, `batch`, `io`, and `ic`.
`resources.cpu.limit_core_percents` `IGAUGE`, %	The percentage of CPU available to a database. For example, for a database that has three nodes with four cores in `pool=user` per node, the value of this metric will be `1200`. Labels: - pool: The computing pool, possible values are `user`, `system`, `batch`, `io`, and `ic`.
`resources.memory.used_bytes` `IGAUGE`, bytes	The amount of RAM used by the database nodes.
`resources.memory.limit_bytes` `IGAUGE`, bytes	RAM available to the database nodes.
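The CPU limit example above (three nodes with four `user` cores each give `1200`) follows from counting 100% per core, and dividing usage by the limit gives pool utilization. A minimal sketch, assuming all nodes have the same number of cores in the pool; the function names and the usage value are hypothetical.

```python
def cpu_limit_core_percents(nodes: int, cores_per_node: int) -> int:
    """Expected resources.cpu.limit_core_percents for one pool: 100% per core."""
    return nodes * cores_per_node * 100

def cpu_utilization(used_core_percents: float, limit_core_percents: float) -> float:
    """Fraction of the pool's CPU capacity in use (1.0 means fully loaded)."""
    return used_core_percents / limit_core_percents

limit = cpu_limit_core_percents(nodes=3, cores_per_node=4)
print(limit, cpu_utilization(used_core_percents=900.0, limit_core_percents=limit))  # 1200 0.75
```

The same division works for `resources.memory.used_bytes` and `resources.memory.limit_bytes`.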

Query processing metrics (for Dedicated mode only)

Metric name Type, units of measurement	Description Labels
`table.query.compilation.cache_evictions` `RATE`, pieces	The number of queries evicted from the cache of prepared queries in a certain period of time.
`table.query.compilation.cache_size_bytes` `IGAUGE`, bytes	The size of the cache of prepared queries.
`table.query.compilation.cached_query_count` `IGAUGE`, pieces	The number of queries in the cache of prepared queries.
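Dividing the cache size by the number of cached queries gives the average memory footprint of one prepared query. A minimal sketch with a hypothetical function name and sample values; it is a rough average, since individual plans can differ greatly in size.

```python
from typing import Optional

def avg_cached_query_bytes(cache_size_bytes: int, cached_query_count: int) -> Optional[float]:
    """Average size of one entry in the cache of prepared queries."""
    if cached_query_count == 0:
        return None  # empty cache, no meaningful average
    return cache_size_bytes / cached_query_count

# Hypothetical gauge values: 64 MiB holding 2048 prepared queries.
print(avg_cached_query_bytes(cache_size_bytes=64 * 1024 * 1024, cached_query_count=2048))  # 32768.0
```

A steadily non-zero `table.query.compilation.cache_evictions` rate together with a growing `cache_misses` rate may suggest that the cache is too small for the working set of queries.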

Topic metrics

Metric name Type, units of measurement	Description Labels
`topic.producers_count` `GAUGE`, pieces	The number of unique topic producers. Labels: - topic – the name of the topic.
`topic.storage_bytes` `GAUGE`, bytes	The size of the topic in bytes. Labels: - topic - the name of the topic.
`topic.read.bytes` `RATE`, bytes	The number of bytes read by the consumer from the topic. Labels: - topic – the name of the topic. - consumer – the name of the consumer.
`topic.read.messages` `RATE`, pieces	The number of messages read by the consumer from the topic. Labels: - topic – the name of the topic. - consumer – the name of the consumer.
`topic.read.lag_messages` `RATE`, pieces	The number of messages in the topic not yet read by the consumer. Labels: - topic – the name of the topic. - consumer – the name of the consumer.
`topic.read.lag_milliseconds` `HIST_RATE`, pieces	A histogram counter. The intervals are specified in milliseconds. It shows the number of messages where the difference between the reading time and the message creation time falls within the specified interval. Labels: - topic – the name of the topic. - consumer – the name of the consumer.
`topic.write.bytes` `RATE`, bytes	The size of the written data. Labels: - topic – the name of the topic.
`topic.write.uncommited_bytes` `RATE`, bytes	The size of data written as part of ongoing transactions. Labels: - topic – the name of the topic.
`topic.write.uncompressed_bytes` `RATE`, bytes	The size of uncompressed written data. Labels: - topic – the name of the topic.
`topic.write.messages` `RATE`, pieces	The number of written messages. Labels: - topic – the name of the topic.
`topic.write.uncommitted_messages` `RATE`, pieces	The number of messages written as part of ongoing transactions. Labels: - topic – the name of the topic.
`topic.write.message_size_bytes` `HIST_RATE`, pieces	A histogram counter. The intervals are specified in bytes. It shows the number of messages whose size falls within the boundaries of the interval. Labels: - topic – the name of the topic.
`topic.write.lag_milliseconds` `HIST_RATE`, pieces	A histogram counter. The intervals are specified in milliseconds. It shows the number of messages where the difference between the write time and the message creation time falls within the specified interval. Labels: - topic – the name of the topic.
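Consumer lag and the read and write rates together give a rough idea of how long a consumer needs to catch up. A minimal sketch, assuming both rates stay constant while the backlog drains (real workloads rarely do); the function name and sample values are hypothetical.

```python
import math

def lag_drain_seconds(lag_messages: float, read_rate: float, write_rate: float) -> float:
    """Rough time for a consumer to catch up, assuming constant rates.

    lag_messages: unread messages for the consumer (topic.read.lag_messages)
    read_rate:    messages/s read by the consumer (topic.read.messages)
    write_rate:   messages/s written to the topic (topic.write.messages)
    Returns math.inf if the consumer is not reading faster than producers write.
    """
    if lag_messages <= 0:
        return 0.0
    net_rate = read_rate - write_rate
    if net_rate <= 0:
        return math.inf
    return lag_messages / net_rate

print(lag_drain_seconds(lag_messages=120_000, read_rate=5_000.0, write_rate=3_000.0))  # 60.0
```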

Aggregated metrics of topic partitions

The following table shows aggregated partition metrics for the topic. The maximum and minimum values are calculated for all partitions of a given topic.

Metric name Type, units of measurement	Description Labels
`topic.partition.init_duration_milliseconds_max` `GAUGE`, milliseconds	Maximum partition initialization delay. Labels: - topic – topic name
`topic.partition.producers_count_max` `GAUGE`, pieces	The maximum number of producers in a partition. Labels: - topic – topic name
`topic.partition.storage_bytes_max` `GAUGE`, bytes	Maximum partition size in bytes. Label: - topic – topic name
`topic.partition.uptime_milliseconds_min` `GAUGE`, milliseconds	Minimum partition uptime since the last restart. During a rolling restart, `topic.partition.uptime_milliseconds_min` is normally close to 0; after the rolling restart ends, its value should keep growing. Label: - topic – topic name
`topic.partition.total_count` `GAUGE`, pieces	Total number of partitions in the topic. Label: - topic – topic name
`topic.partition.alive_count` `GAUGE`, pieces	The number of partitions sending their metrics. Label: - topic – topic name
`topic.partition.committed_end_to_end_lag_milliseconds_max` `GAUGE`, milliseconds	The maximum (across all partitions) difference between the current time and the creation time of the last committed message. Labels: - topic – topic name. - consumer – name of the consumer
`topic.partition.committed_lag_messages_max` `GAUGE`, pieces	The maximum (across all partitions) difference between the last partition offset and the committed partition offset. Labels: - topic – topic name. - consumer – name of the consumer
`topic.partition.committed_read_lag_milliseconds_max` `GAUGE`, milliseconds	The maximum (across all partitions) difference between the current time and the write time of the last committed message. Labels: - topic – topic name. - consumer – name of the consumer
`topic.partition.end_to_end_lag_milliseconds_max` `GAUGE`, milliseconds	The difference between the current time and the minimum creation time among all messages read in the last minute in all partitions. Labels: - topic – topic name. - consumer – name of the consumer
`topic.partition.lag_messages_max` `GAUGE`, pieces	The maximum difference (across all partitions) between the last offset in the partition and the last read offset. Labels: - topic – topic name. - consumer – name of the consumer
`topic.partition.read.lag_milliseconds_max` `GAUGE`, milliseconds	The difference between the current time and the minimum write time among all messages read in the last minute in all partitions. Labels: - topic – topic name. - consumer – name of the consumer
`topic.partition.read.idle_milliseconds_max` `GAUGE`, milliseconds	Maximum idle time (how long a partition was not read) across all partitions. Labels: - topic – topic name. - consumer – name of the consumer
`topic.partition.read.lag_milliseconds_max` `GAUGE`, milliseconds	The maximum difference between the write time and the creation time among all messages read in the last minute. Labels: - topic – topic name. - consumer – name of the consumer
`topic.partition.write.lag_milliseconds_max` `GAUGE`, milliseconds	The maximum difference between the write time and the creation time among all messages written in the last minute. Label: - topic – topic name
`topic.partition.write.speed_limit_bytes_per_second` `GAUGE`, bytes per second	Write quota in bytes per second per partition. Label: - topic – topic name
`topic.partition.write.throttled_nanoseconds_max` `GAUGE`, nanoseconds	Maximum write throttling time (time spent waiting on the quota) across all partitions. At the upper bound, a value of `topic.partition.write.throttled_nanoseconds_max` = 10^9 means that writes waited on the quota for the entire second. Label: - topic – topic name
`topic.partition.write.bytes_per_day_max` `GAUGE`, bytes	The maximum number of bytes written over the last 24 hours, across all partitions. Label: - topic – topic name
`topic.partition.write.bytes_per_hour_max` `GAUGE`, bytes	The maximum number of bytes written in the last hour, across all partitions. Label: - topic – topic name
`topic.partition.write.bytes_per_minute_max` `GAUGE`, bytes	The maximum number of bytes written in the last minute, across all partitions. Label: - topic – topic name
`topic.partition.write.idle_milliseconds_max` `GAUGE`, milliseconds	Maximum write idle time (how long a partition was not written to) across all partitions. Label: - topic – topic name
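Two of the partition write metrics are easier to read as ratios: throttling time as a share of each second (10^9 ns being the whole second, as stated above), and the busiest partition's write rate relative to its quota. A minimal sketch with hypothetical function names and sample values; because `bytes_per_minute_max` is averaged over a minute, short bursts can still be throttled even when the ratio is below 1.0.

```python
def throttled_fraction(throttled_nanoseconds_max: float) -> float:
    """Share of each second spent waiting on the write quota (1.0 = the whole second)."""
    return throttled_nanoseconds_max / 1e9

def quota_utilization(bytes_per_minute_max: float, speed_limit_bytes_per_second: float) -> float:
    """Busiest partition's average write rate over the last minute, relative to its quota."""
    return (bytes_per_minute_max / 60.0) / speed_limit_bytes_per_second

print(throttled_fraction(250_000_000))  # 0.25
print(quota_utilization(bytes_per_minute_max=45 * 1024 * 1024,
                        speed_limit_bytes_per_second=1024 * 1024))  # 0.75
```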

Resource pool metrics

Metric name Type, units of measurement	Description Labels
`kqp.workload_manager.CpuQuotaManager.AverageLoadPercentage` `RATE`, pieces	Average database load. The `DATABASE_LOAD_CPU_THRESHOLD` setting works based on this metric.
`kqp.workload_manager.InFlightLimit` `GAUGE`, pieces	Limit on the number of simultaneously running requests.
`kqp.workload_manager.GlobalInFly` `GAUGE`, pieces	The current number of simultaneously running requests. Displayed only for pools with `CONCURRENT_QUERY_LIMIT` or `DATABASE_LOAD_CPU_THRESHOLD` enabled.
`kqp.workload_manager.QueueSizeLimit` `GAUGE`, pieces	Maximum size of the queue of pending requests.
`kqp.workload_manager.GlobalDelayedRequests` `GAUGE`, pieces	The number of requests waiting in the execution queue. Displayed only for pools with `CONCURRENT_QUERY_LIMIT` or `DATABASE_LOAD_CPU_THRESHOLD` enabled.
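Comparing the resource pool gauges with their limits gives a quick view of pool saturation. A minimal sketch with a hypothetical data class and status strings; the 80% threshold is an arbitrary example, and treating non-positive limits as "no limit configured" is an assumption, not documented behavior.

```python
from dataclasses import dataclass

@dataclass
class PoolSnapshot:
    """One reading of the resource pool gauges (field names are hypothetical)."""
    in_flight_limit: float   # kqp.workload_manager.InFlightLimit
    global_in_fly: float     # kqp.workload_manager.GlobalInFly
    queue_size_limit: float  # kqp.workload_manager.QueueSizeLimit
    delayed_requests: float  # kqp.workload_manager.GlobalDelayedRequests

def pool_status(s: PoolSnapshot, near_ratio: float = 0.8) -> str:
    """Rough classification of a pool from one snapshot of its gauges."""
    # Non-positive limits are treated here as "no limit configured" (an assumption).
    if s.queue_size_limit > 0 and s.delayed_requests >= s.queue_size_limit:
        return "queue full"
    if s.delayed_requests > 0:
        return "queueing"
    if s.in_flight_limit > 0 and s.global_in_fly >= near_ratio * s.in_flight_limit:
        return "near in-flight limit"
    return "ok"

print(pool_status(PoolSnapshot(in_flight_limit=10, global_in_fly=10,
                               queue_size_limit=100, delayed_requests=7)))  # queueing
```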
