Metrics reference
- Resource usage metrics
- GRPC API metrics
- GRPC API metrics for topics
- HTTP API Metrics
- Kafka API metrics
- Session metrics
- Transaction processing metrics
- Query processing metrics
- Row-oriented table partition metrics
- Column-oriented table partition metrics
- Resource usage metrics (for Dedicated mode only)
- Query processing metrics (for Dedicated mode only)
- Topic metrics
- Aggregated metrics of topic partitions
- Resource pool metrics
Resource usage metrics
| Metric name<br/>Type, units of measurement | Description<br/>Labels |
|---|---|
| resources.storage.used_bytes<br/>IGAUGE, bytes | The size of user and service data stored in distributed network storage. resources.storage.used_bytes = resources.storage.table.used_bytes + resources.storage.topic.used_bytes. |
| resources.storage.table.used_bytes<br/>IGAUGE, bytes | The size of user and service data stored by tables in distributed network storage. Service data includes the data of primary, secondary, and vector indexes. |
| resources.storage.topic.used_bytes<br/>IGAUGE, bytes | The size of storage used by topics. This metric is the sum of the topic.storage_bytes values of all topics. |
| resources.storage.limit_bytes<br/>IGAUGE, bytes | The limit on the size of user and service data that a database can store in distributed network storage. |
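As a rough illustration of how the storage metrics relate to each other, the sketch below sums the table and topic usage and compares the result with the limit. The function name and the sample values are hypothetical; only the relationship used_bytes = table.used_bytes + topic.used_bytes comes from the table above.

```python
def storage_usage(table_used_bytes: int, topic_used_bytes: int, limit_bytes: int) -> tuple[int, float]:
    """Return (total used bytes, fraction of the storage limit in use).

    Mirrors resources.storage.used_bytes =
        resources.storage.table.used_bytes + resources.storage.topic.used_bytes.
    """
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    used = table_used_bytes + topic_used_bytes
    return used, used / limit_bytes


# Hypothetical sample values: 6 GiB in tables, 2 GiB in topics, 16 GiB limit.
GIB = 2**30
used, fraction = storage_usage(6 * GIB, 2 * GIB, 16 * GIB)
print(used, f"{fraction:.0%}")  # 8589934592 50%
```

A fraction approaching 1.0 means the database is close to resources.storage.limit_bytes.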
GRPC API metrics
| Metric name<br/>Type, units of measurement | Description<br/>Labels |
|---|---|
| api.grpc.request.bytes<br/>RATE, bytes | The size of requests received by the database in a certain period of time.<br/>Labels:<br/>- api_service: The name of the gRPC API service, such as table or data_streams.<br/>- method: The name of a gRPC API service method, such as ExecuteDataQuery (for the table service) or PutRecord, GetRecords (for the data_streams service). |
| api.grpc.request.dropped_count<br/>RATE, pieces | The number of requests dropped at the transport (gRPC) layer due to an error.<br/>Labels:<br/>- api_service: The name of the gRPC API service, such as table.<br/>- method: The name of a gRPC API service method, such as ExecuteDataQuery. |
| api.grpc.request.inflight_count<br/>IGAUGE, pieces | The number of requests that a database is simultaneously handling in a certain period of time.<br/>Labels:<br/>- api_service: The name of the gRPC API service, such as table.<br/>- method: The name of a gRPC API service method, such as ExecuteDataQuery. |
| api.grpc.request.inflight_bytes<br/>IGAUGE, bytes | The size of requests that a database is simultaneously handling in a certain period of time.<br/>Labels:<br/>- api_service: The name of the gRPC API service, such as table.<br/>- method: The name of a gRPC API service method, such as ExecuteDataQuery. |
| api.grpc.response.bytes<br/>RATE, bytes | The size of responses sent by the database in a certain period of time.<br/>Labels:<br/>- api_service: The name of the gRPC API service, such as table.<br/>- method: The name of a gRPC API service method, such as ExecuteDataQuery. |
| api.grpc.response.count<br/>RATE, pieces | The number of responses sent by the database in a certain period of time.<br/>Labels:<br/>- api_service: The name of the gRPC API service, such as table.<br/>- method: The name of a gRPC API service method, such as ExecuteDataQuery.<br/>- status: The request execution status. For a detailed description of statuses, see Error Handling. |
| api.grpc.response.dropped_count<br/>RATE, pieces | The number of responses dropped at the transport (gRPC) layer due to an error.<br/>Labels:<br/>- api_service: The name of the gRPC API service, such as table.<br/>- method: The name of a gRPC API service method, such as ExecuteDataQuery. |
| api.grpc.response.issues<br/>RATE, pieces | The number of errors of a certain type that occurred while executing requests in a certain period of time.<br/>Labels:<br/>- issue_type: The error type. The only possible value is optimistic_locks_invalidation. For more on lock invalidation, see Transactions and requests to YDB. |
GRPC API metrics for topics
| Metric name<br/>Type, units of measurement | Description<br/>Labels |
|---|---|
| grpc.topic.stream_read.commits<br/>RATE, pieces | The number of commits made through the Ydb::TopicService::StreamRead method.<br/>Labels:<br/>- topic: The topic name.<br/>- consumer: The consumer name. |
| grpc.topic.stream_read.bytes<br/>RATE, bytes | The number of bytes read by the Ydb::TopicService::StreamRead method.<br/>Labels:<br/>- topic: The topic name.<br/>- consumer: The consumer name. |
| grpc.topic.stream_read.messages<br/>RATE, pieces | The number of messages read by the Ydb::TopicService::StreamRead method.<br/>Labels:<br/>- topic: The topic name.<br/>- consumer: The consumer name. |
| grpc.topic.stream_read.partition_session.errors<br/>RATE, pieces | The number of errors that occurred when working with the partition.<br/>Labels:<br/>- topic: The topic name.<br/>- consumer: The consumer name. |
| grpc.topic.stream_read.partition_session.started<br/>RATE, pieces | The number of sessions started per unit of time.<br/>Labels:<br/>- topic: The topic name.<br/>- consumer: The consumer name. |
| grpc.topic.stream_read.partition_session.stopped<br/>RATE, pieces | The number of sessions stopped per unit of time.<br/>Labels:<br/>- topic: The topic name.<br/>- consumer: The consumer name. |
| grpc.topic.stream_read.partition_session.starting_count<br/>RATE, pieces | The number of sessions being started, that is, the client has received a command to start a session but has not yet started it.<br/>Labels:<br/>- topic: The topic name.<br/>- consumer: The consumer name. |
| grpc.topic.stream_read.partition_session.stopping_count<br/>RATE, pieces | The number of sessions being stopped.<br/>Labels:<br/>- topic: The topic name.<br/>- consumer: The consumer name. |
| grpc.topic.stream_read.partition_session.count<br/>RATE, pieces | The number of partition sessions.<br/>Labels:<br/>- topic: The topic name.<br/>- consumer: The consumer name. |
| grpc.topic.stream_write.bytes<br/>RATE, bytes | The number of bytes written by the Ydb::TopicService::StreamWrite method.<br/>Labels:<br/>- topic: The topic name. |
| grpc.topic.stream_write.uncommitted_bytes<br/>RATE, bytes | The number of bytes written by the Ydb::TopicService::StreamWrite method within transactions that have not yet been committed.<br/>Labels:<br/>- topic: The topic name. |
| grpc.topic.stream_write.errors<br/>RATE, pieces | The number of errors that occurred when calling the Ydb::TopicService::StreamWrite method.<br/>Labels:<br/>- topic: The topic name. |
| grpc.topic.stream_write.messages<br/>RATE, pieces | The number of messages written by the Ydb::TopicService::StreamWrite method.<br/>Labels:<br/>- topic: The topic name. |
| grpc.topic.stream_write.uncommitted_messages<br/>RATE, pieces | The number of messages written by the Ydb::TopicService::StreamWrite method within transactions that have not yet been committed.<br/>Labels:<br/>- topic: The topic name. |
| grpc.topic.stream_write.partition_throttled_milliseconds<br/>HIST_RATE, pieces | Histogram counter. Intervals are specified in milliseconds. Shows the number of messages whose wait time on the quota falls within a certain interval.<br/>Labels:<br/>- topic: The topic name. |
| grpc.topic.stream_write.sessions_active_count<br/>GAUGE, pieces | The number of open write sessions.<br/>Labels:<br/>- topic: The topic name. |
| grpc.topic.stream_write.sessions_created<br/>RATE, pieces | The number of write sessions created.<br/>Labels:<br/>- topic: The topic name. |
HTTP API Metrics
| Metric name<br/>Type, units of measurement | Description<br/>Labels |
|---|---|
| api.http.data_streams.request.count<br/>RATE, pieces | The number of HTTP requests.<br/>Labels:<br/>- method: The name of the HTTP API service method, for example PutRecord or GetRecords.<br/>- topic: The topic name. |
| api.http.data_streams.request.bytes<br/>RATE, bytes | The total size of HTTP requests.<br/>Labels:<br/>- method: The name of the HTTP API service method, in this case only PutRecord.<br/>- topic: The topic name. |
| api.http.data_streams.response.count<br/>RATE, pieces | The number of responses sent over HTTP.<br/>Labels:<br/>- method: The name of the HTTP API service method, for example PutRecord or GetRecords.<br/>- topic: The topic name.<br/>- code: The HTTP response code. |
| api.http.data_streams.response.bytes<br/>RATE, bytes | The total size of HTTP responses.<br/>Labels:<br/>- method: The name of the HTTP API service method, in this case only GetRecords.<br/>- topic: The topic name. |
| api.http.data_streams.response.duration_milliseconds<br/>HIST_RATE, pieces | Histogram counter. Intervals are specified in milliseconds. Shows the number of responses whose execution time falls within a certain interval.<br/>Labels:<br/>- method: The name of the HTTP API service method.<br/>- topic: The topic name. |
| api.http.data_streams.get_records.messages<br/>RATE, pieces | The number of messages read by the GetRecords method.<br/>Labels:<br/>- topic: The topic name. |
| api.http.data_streams.put_record.messages<br/>RATE, pieces | The number of messages written by the PutRecord method (always equals 1).<br/>Labels:<br/>- topic: The topic name. |
| api.http.data_streams.put_records.failed_messages<br/>RATE, pieces | The number of messages sent by the PutRecords method that were not written.<br/>Labels:<br/>- topic: The topic name. |
| api.http.data_streams.put_records.successful_messages<br/>RATE, pieces | The number of messages sent by the PutRecords method that were successfully written.<br/>Labels:<br/>- topic: The topic name. |
| api.http.data_streams.put_records.total_messages<br/>RATE, pieces | The number of messages sent by the PutRecords method.<br/>Labels:<br/>- topic: The topic name. |
Kafka API metrics
| Metric name<br/>Type, units of measurement | Description<br/>Labels |
|---|---|
| api.kafka.request.count<br/>RATE, pieces | The number of requests received over the Kafka protocol per unit of time.<br/>Labels:<br/>- method: The name of the Kafka API service method, for example PRODUCE or SASL_HANDSHAKE. |
| api.kafka.request.bytes<br/>RATE, bytes | The total size of Kafka requests per unit of time.<br/>Labels:<br/>- method: The name of the Kafka API service method, for example PRODUCE or SASL_HANDSHAKE. |
| api.kafka.response.count<br/>RATE, pieces | The number of responses sent over the Kafka protocol per unit of time.<br/>Labels:<br/>- method: The name of the Kafka API service method, for example PRODUCE or SASL_HANDSHAKE.<br/>- error_code: The Kafka response code. |
| api.kafka.response.bytes<br/>RATE, bytes | The total size of responses sent over the Kafka protocol per unit of time.<br/>Labels:<br/>- method: The name of the Kafka API service method, for example PRODUCE or SASL_HANDSHAKE. |
| api.kafka.response.duration_milliseconds<br/>HIST_RATE, pieces | Histogram counter. Intervals are specified in milliseconds. Shows the number of requests whose execution time falls within a certain interval.<br/>Labels:<br/>- method: The name of the Kafka API service method. |
| api.kafka.produce.failed_messages<br/>RATE, pieces | The number of messages per unit of time sent by the PRODUCE method that were not written.<br/>Labels:<br/>- topic: The topic name. |
| api.kafka.produce.successful_messages<br/>RATE, pieces | The number of messages per unit of time sent by the PRODUCE method that were successfully written.<br/>Labels:<br/>- topic: The topic name. |
| api.kafka.produce.total_messages<br/>RATE, pieces | The number of messages per unit of time sent by the PRODUCE method.<br/>Labels:<br/>- topic: The topic name. |
Session metrics
| Metric name<br/>Type, units of measurement | Description<br/>Labels |
|---|---|
| table.session.active_count<br/>IGAUGE, pieces | The number of sessions started by clients and running at a given time. |
| table.session.closed_by_idle_count<br/>RATE, pieces | The number of sessions closed by the DB server in a certain period of time due to exceeding the lifetime allowed for an idle session. |
Transaction processing metrics
You can analyze a transaction's execution time using a histogram counter. The intervals are set in milliseconds. The chart shows the number of transactions whose duration falls within a certain time interval.
| Metric name<br/>Type, units of measurement | Description<br/>Labels |
|---|---|
| table.transaction.total_duration_milliseconds<br/>HIST_RATE, pieces | The number of transactions with a certain duration on the server and client. The duration of a transaction is counted from the point of its explicit or implicit start to committing changes or its rollback. Includes the transaction processing time on the server and the time on the client between sending different requests within the same transaction.<br/>Labels:<br/>- tx_kind: The transaction type, possible values are read_only, read_write, write_only, and pure. |
| table.transaction.server_duration_milliseconds<br/>HIST_RATE, pieces | The number of transactions with a certain duration on the server. The duration is the time of executing requests within a transaction on the server. Does not include the waiting time on the client between sending separate requests within a single transaction.<br/>Labels:<br/>- tx_kind: The transaction type, possible values are read_only, read_write, write_only, and pure. |
| table.transaction.client_duration_milliseconds<br/>HIST_RATE, pieces | The number of transactions with a certain duration on the client. The duration is the waiting time on the client between sending individual requests within a single transaction. Does not include the time of executing requests on the server.<br/>Labels:<br/>- tx_kind: The transaction type, possible values are read_only, read_write, write_only, and pure. |
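To show how a histogram counter such as table.transaction.total_duration_milliseconds can be read, here is a minimal sketch that estimates a percentile from per-interval counts. The bucket representation (a list of upper bound and count pairs) and the sample numbers are assumptions made for illustration; this reference does not describe the export format of histogram metrics.

```python
def approx_percentile(buckets: list[tuple[float, int]], q: float) -> float:
    """Return the upper bound (ms) of the interval that contains the q-th quantile.

    buckets: hypothetical (upper_bound_ms, count) pairs sorted by upper bound,
             where count is the number of transactions in that interval only
             (not cumulative).
    q:       quantile in the range (0, 1], for example 0.95.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = sum(count for _, count in buckets)
    if total == 0:
        raise ValueError("histogram is empty")
    target = q * total
    seen = 0
    for upper_bound, count in buckets:
        seen += count
        if seen >= target:
            return upper_bound
    return buckets[-1][0]


# Hypothetical sample: 50 transactions took up to 1 ms, 30 up to 5 ms, and so on.
sample = [(1, 50), (5, 30), (10, 15), (50, 5)]
print(approx_percentile(sample, 0.5))   # 1
print(approx_percentile(sample, 0.95))  # 10
```

The result is only as precise as the interval boundaries: the true percentile lies somewhere inside the returned interval.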
Query processing metrics
| Metric name<br/>Type, units of measurement | Description<br/>Labels |
|---|---|
| table.query.request.bytes<br/>RATE, bytes | The size of the YQL query text and parameter values of queries received by the database in a certain period of time. |
| table.query.request.parameters_bytes<br/>RATE, bytes | The size of the parameters of queries received by the database in a certain period of time. |
| table.query.response.bytes<br/>RATE, bytes | The size of responses sent by the database in a certain period of time. |
| table.query.compilation.latency_milliseconds<br/>HIST_RATE, pieces | Histogram counter. The intervals are set in milliseconds. Shows the number of successfully completed query compilations whose duration falls within a certain time interval. |
| table.query.compilation.active_count<br/>IGAUGE, pieces | The number of active compilations at a given time. |
| table.query.compilation.count<br/>RATE, pieces | The number of compilations that completed successfully in a certain period of time. |
| table.query.compilation.errors<br/>RATE, pieces | The number of compilations that failed in a certain period of time. |
| table.query.compilation.cache_hits<br/>RATE, pieces | The number of queries in a certain period of time that did not require compilation because a plan already existed in the cache of prepared queries. |
| table.query.compilation.cache_misses<br/>RATE, pieces | The number of queries in a certain period of time that required query compilation. |
| table.query.execution.latency_milliseconds<br/>HIST_RATE, pieces | Histogram counter. The intervals are set in milliseconds. Shows the number of queries whose execution time falls within a certain interval. |
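One common way to combine table.query.compilation.cache_hits and table.query.compilation.cache_misses is a hit ratio for the cache of prepared queries. The sketch below is illustrative only: the function name and sample values are hypothetical, and it assumes both values were taken over the same period of time.

```python
from typing import Optional


def compilation_cache_hit_ratio(cache_hits: float, cache_misses: float) -> Optional[float]:
    """Share of queries served from the cache of prepared queries.

    Returns None when no queries were observed in the period, so that an idle
    database is not reported as having a 0% hit ratio.
    """
    total = cache_hits + cache_misses
    if total == 0:
        return None
    return cache_hits / total


# Hypothetical sample: 90 queries reused a cached plan, 10 had to be compiled.
print(compilation_cache_hit_ratio(90, 10))  # 0.9
```

A persistently low ratio suggests that many queries are compiled anew, which may be worth comparing against table.query.compilation.latency_milliseconds.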
Row-oriented table partition metrics
| Metric name<br/>Type, units of measurement | Description<br/>Labels |
|---|---|
| table.datashard.row_count<br/>GAUGE, pieces | The number of rows in all row-oriented tables in the database. |
| table.datashard.size_bytes<br/>GAUGE, bytes | The size of data in all row-oriented tables in the database. |
| table.datashard.used_core_percents<br/>HIST_GAUGE, % | Histogram counter. The intervals are set as a percentage. Shows the number of row-oriented table partitions using computing resources in the ratio that falls within a certain interval. |
| table.datashard.read.rows<br/>RATE, pieces | The number of rows that are read by all partitions of all row-oriented tables in the database in a certain period of time. |
| table.datashard.read.bytes<br/>RATE, bytes | The size of data that is read by all partitions of all row-oriented tables in the database in a certain period of time. |
| table.datashard.write.rows<br/>RATE, pieces | The number of rows that are written by all partitions of all row-oriented tables in the database in a certain period of time. |
| table.datashard.write.bytes<br/>RATE, bytes | The size of data that is written by all partitions of all row-oriented tables in the database in a certain period of time. |
| table.datashard.scan.rows<br/>RATE, pieces | The number of rows that are read through StreamExecuteScanQuery or StreamReadTable gRPC API calls by all partitions of all row-oriented tables in the database in a certain period of time. |
| table.datashard.scan.bytes<br/>RATE, bytes | The size of data that is read through StreamExecuteScanQuery or StreamReadTable gRPC API calls by all partitions of all row-oriented tables in the database in a certain period of time. |
| table.datashard.bulk_upsert.rows<br/>RATE, pieces | The number of rows that are added through a BulkUpsert gRPC API call to all partitions of all row-oriented tables in the database in a certain period of time. |
| table.datashard.bulk_upsert.bytes<br/>RATE, bytes | The size of data that is added through a BulkUpsert gRPC API call to all partitions of all row-oriented tables in the database in a certain period of time. |
| table.datashard.erase.rows<br/>RATE, pieces | The number of rows deleted from row-oriented tables in the database in a certain period of time. |
| table.datashard.erase.bytes<br/>RATE, bytes | The size of data deleted from row-oriented tables in the database in a certain period of time. |
| table.datashard.cache_hit.bytes<br/>RATE, bytes | The total amount of data successfully retrieved from memory (cache), indicating efficient cache utilization in serving frequently accessed data without accessing distributed storage. |
| table.datashard.cache_miss.bytes<br/>RATE, bytes | The total amount of data that was requested but not found in memory (cache) and was read from distributed storage, highlighting potential areas for cache optimization. |
Column-oriented table partition metrics
| Metric name<br/>Type, units of measurement | Description<br/>Labels |
|---|---|
| table.columnshard.write.rows<br/>RATE, pieces | The number of rows that are written by all partitions of all column-oriented tables in the database in a certain period of time. |
| table.columnshard.write.bytes<br/>RATE, bytes | The size of data that is written by all partitions of all column-oriented tables in the database in a certain period of time. |
| table.columnshard.scan.rows<br/>RATE, pieces | The number of rows that are read through StreamExecuteScanQuery or StreamReadTable gRPC API calls by all partitions of all column-oriented tables in the database in a certain period of time. |
| table.columnshard.scan.bytes<br/>RATE, bytes | The size of data that is read through StreamExecuteScanQuery or StreamReadTable gRPC API calls by all partitions of all column-oriented tables in the database in a certain period of time. |
| table.columnshard.bulk_upsert.rows<br/>RATE, pieces | The number of rows that are added through a BulkUpsert gRPC API call to all partitions of all column-oriented tables in the database in a certain period of time. |
| table.columnshard.bulk_upsert.bytes<br/>RATE, bytes | The size of data that is added through a BulkUpsert gRPC API call to all partitions of all column-oriented tables in the database in a certain period of time. |
Resource usage metrics (for Dedicated mode only)
| Metric name<br/>Type, units of measurement | Description<br/>Labels |
|---|---|
| resources.cpu.used_core_percents<br/>RATE, % | CPU usage. A value of 100 means that one core is fully used. The value may be greater than 100 for multi-core configurations.<br/>Labels:<br/>- pool: The computing pool, possible values are user, system, batch, io, and ic. |
| resources.cpu.limit_core_percents<br/>IGAUGE, % | The percentage of CPU available to a database. For example, for a database that has three nodes with four cores in pool=user per node, the value of this metric is 1200.<br/>Labels:<br/>- pool: The computing pool, possible values are user, system, batch, io, and ic. |
| resources.memory.used_bytes<br/>IGAUGE, bytes | The amount of RAM used by the database nodes. |
| resources.memory.limit_bytes<br/>IGAUGE, bytes | The amount of RAM available to the database nodes. |
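The CPU example above (three nodes with four cores each give a limit of 1200) can be reproduced with a few lines of arithmetic. This is a sketch with hypothetical function names; it assumes that every node contributes the same number of cores to the pool and that used and limit values refer to the same pool label.

```python
def cpu_limit_core_percents(nodes: int, cores_per_node: int) -> int:
    """Expected resources.cpu.limit_core_percents: each core contributes 100."""
    return nodes * cores_per_node * 100


def cpu_utilization(used_core_percents: float, limit_core_percents: float) -> float:
    """Fraction of the available CPU in use, for one pool."""
    if limit_core_percents <= 0:
        raise ValueError("limit_core_percents must be positive")
    return used_core_percents / limit_core_percents


limit = cpu_limit_core_percents(nodes=3, cores_per_node=4)
print(limit)                        # 1200
print(cpu_utilization(300, limit))  # 0.25
```

Here a used value of 300 means that the equivalent of three full cores is busy, which is a quarter of the 12 cores available.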
Query processing metrics (for Dedicated mode only)
| Metric name<br/>Type, units of measurement | Description<br/>Labels |
|---|---|
| table.query.compilation.cache_evictions<br/>RATE, pieces | The number of queries evicted from the cache of prepared queries in a certain period of time. |
| table.query.compilation.cache_size_bytes<br/>IGAUGE, bytes | The size of the cache of prepared queries. |
| table.query.compilation.cached_query_count<br/>IGAUGE, pieces | The number of queries in the cache of prepared queries. |
Topic metrics
| Metric name<br/>Type, units of measurement | Description<br/>Labels |
|---|---|
| topic.producers_count<br/>GAUGE, pieces | The number of unique topic producers.<br/>Labels:<br/>- topic: The name of the topic. |
| topic.storage_bytes<br/>GAUGE, bytes | The size of the topic in bytes.<br/>Labels:<br/>- topic: The name of the topic. |
| topic.read.bytes<br/>RATE, bytes | The number of bytes read by the consumer from the topic.<br/>Labels:<br/>- topic: The name of the topic.<br/>- consumer: The name of the consumer. |
| topic.read.messages<br/>RATE, pieces | The number of messages read by the consumer from the topic.<br/>Labels:<br/>- topic: The name of the topic.<br/>- consumer: The name of the consumer. |
| topic.read.lag_messages<br/>RATE, pieces | The number of messages in the topic that the consumer has not yet read.<br/>Labels:<br/>- topic: The name of the topic.<br/>- consumer: The name of the consumer. |
| topic.read.lag_milliseconds<br/>HIST_RATE, pieces | A histogram counter. The intervals are specified in milliseconds. It shows the number of messages where the difference between the reading time and the message creation time falls within the specified interval.<br/>Labels:<br/>- topic: The name of the topic.<br/>- consumer: The name of the consumer. |
| topic.write.bytes<br/>RATE, bytes | The size of the written data.<br/>Labels:<br/>- topic: The name of the topic. |
| topic.write.uncommited_bytes<br/>RATE, bytes | The size of data written as part of ongoing transactions.<br/>Labels:<br/>- topic: The name of the topic. |
| topic.write.uncompressed_bytes<br/>RATE, bytes | The size of uncompressed written data.<br/>Labels:<br/>- topic: The name of the topic. |
| topic.write.messages<br/>RATE, pieces | The number of written messages.<br/>Labels:<br/>- topic: The name of the topic. |
| topic.write.uncommitted_messages<br/>RATE, pieces | The number of messages written as part of ongoing transactions.<br/>Labels:<br/>- topic: The name of the topic. |
| topic.write.message_size_bytes<br/>HIST_RATE, pieces | A histogram counter. The intervals are specified in bytes. It shows the number of messages whose size falls within the boundaries of the interval.<br/>Labels:<br/>- topic: The name of the topic. |
| topic.write.lag_milliseconds<br/>HIST_RATE, pieces | A histogram counter. The intervals are specified in milliseconds. It shows the number of messages where the difference between the write time and the message creation time falls within the specified interval.<br/>Labels:<br/>- topic: The name of the topic. |
Aggregated metrics of topic partitions
The following table shows aggregated partition metrics for the topic. The maximum and minimum values are calculated for all partitions of a given topic.
| Metric name<br/>Type, units of measurement | Description<br/>Labels |
|---|---|
| topic.partition.init_duration_milliseconds_max<br/>GAUGE, milliseconds | The maximum partition initialization delay.<br/>Labels:<br/>- topic: The topic name. |
| topic.partition.producers_count_max<br/>GAUGE, pieces | The maximum number of producers in a partition.<br/>Labels:<br/>- topic: The topic name. |
| topic.partition.storage_bytes_max<br/>GAUGE, bytes | The maximum partition size in bytes.<br/>Labels:<br/>- topic: The topic name. |
| topic.partition.uptime_milliseconds_min<br/>GAUGE, milliseconds | The minimum partition uptime after a restart. During a rolling restart, topic.partition.uptime_milliseconds_min is normally close to 0. After the rolling restart completes, the value should grow continuously.<br/>Labels:<br/>- topic: The topic name. |
| topic.partition.total_count<br/>GAUGE, pieces | The total number of partitions in the topic.<br/>Labels:<br/>- topic: The topic name. |
| topic.partition.alive_count<br/>GAUGE, pieces | The number of partitions sending their metrics.<br/>Labels:<br/>- topic: The topic name. |
| topic.partition.committed_end_to_end_lag_milliseconds_max<br/>GAUGE, milliseconds | The maximum (across all partitions) difference between the current time and the creation time of the last committed message.<br/>Labels:<br/>- topic: The topic name.<br/>- consumer: The name of the consumer. |
| topic.partition.committed_lag_messages_max<br/>GAUGE, pieces | The maximum (across all partitions) difference between the last partition offset and the committed partition offset.<br/>Labels:<br/>- topic: The topic name.<br/>- consumer: The name of the consumer. |
| topic.partition.committed_read_lag_milliseconds_max<br/>GAUGE, milliseconds | The maximum (across all partitions) difference between the current time and the write time of the last committed message.<br/>Labels:<br/>- topic: The topic name.<br/>- consumer: The name of the consumer. |
| topic.partition.end_to_end_lag_milliseconds_max<br/>GAUGE, milliseconds | The difference between the current time and the minimum creation time among all messages read in the last minute in all partitions.<br/>Labels:<br/>- topic: The topic name.<br/>- consumer: The name of the consumer. |
| topic.partition.lag_messages_max<br/>GAUGE, pieces | The maximum (across all partitions) difference between the last offset in the partition and the last read offset.<br/>Labels:<br/>- topic: The topic name.<br/>- consumer: The name of the consumer. |
| topic.partition.read.lag_milliseconds_max<br/>GAUGE, milliseconds | The difference between the current time and the minimum write time among all messages read in the last minute in all partitions.<br/>Labels:<br/>- topic: The topic name.<br/>- consumer: The name of the consumer. |
| topic.partition.read.idle_milliseconds_max<br/>GAUGE, milliseconds | The maximum idle time (how long a partition has not been read) across all partitions.<br/>Labels:<br/>- topic: The topic name.<br/>- consumer: The name of the consumer. |
| topic.partition.read.lag_milliseconds_max<br/>GAUGE, milliseconds | The maximum difference between the write time and the creation time among all messages read in the last minute.<br/>Labels:<br/>- topic: The topic name.<br/>- consumer: The name of the consumer. |
| topic.partition.write.lag_milliseconds_max<br/>GAUGE, milliseconds | The maximum difference between the write time and the creation time among all messages written in the last minute.<br/>Labels:<br/>- topic: The topic name. |
| topic.partition.write.speed_limit_bytes_per_second<br/>GAUGE, bytes per second | The write quota in bytes per second per partition.<br/>Labels:<br/>- topic: The topic name. |
| topic.partition.write.throttled_nanoseconds_max<br/>GAUGE, nanoseconds | The maximum write throttling time (waiting on the quota) across all partitions. At the limit, topic.partition.write.throttled_nanoseconds_max = 10^9 means that the entire second was spent waiting on the quota.<br/>Labels:<br/>- topic: The topic name. |
| topic.partition.write.bytes_per_day_max<br/>GAUGE, bytes | The maximum number of bytes written in the last 24 hours, across all partitions.<br/>Labels:<br/>- topic: The topic name. |
| topic.partition.write.bytes_per_hour_max<br/>GAUGE, bytes | The maximum number of bytes written in the last hour, across all partitions.<br/>Labels:<br/>- topic: The topic name. |
| topic.partition.write.bytes_per_minute_max<br/>GAUGE, bytes | The maximum number of bytes written in the last minute, across all partitions.<br/>Labels:<br/>- topic: The topic name. |
| topic.partition.write.idle_milliseconds_max<br/>GAUGE, milliseconds | The maximum time a partition has been idle for writes.<br/>Labels:<br/>- topic: The topic name. |
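Two of the aggregated partition metrics lend themselves to simple derived ratios. The sketch below is a hedged illustration with hypothetical function names and sample values. It assumes that topic.partition.write.throttled_nanoseconds_max is measured over a one-second window, as the 10^9 example in the table suggests.

```python
NANOSECONDS_PER_SECOND = 10**9


def throttled_fraction(throttled_nanoseconds_max: float) -> float:
    """Fraction of a second that the most throttled partition spent waiting on the write quota.

    Assumes a one-second measurement window; the result is capped at 1.0.
    """
    return min(throttled_nanoseconds_max / NANOSECONDS_PER_SECOND, 1.0)


def reporting_partitions_fraction(alive_count: int, total_count: int) -> float:
    """Share of partitions sending metrics: topic.partition.alive_count / topic.partition.total_count."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return alive_count / total_count


# Hypothetical sample values.
print(throttled_fraction(250_000_000))       # 0.25
print(reporting_partitions_fraction(9, 12))  # 0.75
```

A throttled fraction near 1.0 indicates that writes to at least one partition are almost constantly limited by topic.partition.write.speed_limit_bytes_per_second.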
Resource pool metrics
| Metric name<br/>Type, units of measurement | Description<br/>Labels |
|---|---|
| kqp.workload_manager.CpuQuotaManager.AverageLoadPercentage<br/>RATE, pieces | The average database load. DATABASE_LOAD_CPU_THRESHOLD is evaluated against this metric. |
| kqp.workload_manager.InFlightLimit<br/>GAUGE, pieces | The limit on the number of simultaneously running requests. |
| kqp.workload_manager.GlobalInFly<br/>GAUGE, pieces | The current number of simultaneously running requests. Displayed only for pools with CONCURRENT_QUERY_LIMIT or DATABASE_LOAD_CPU_THRESHOLD enabled. |
| kqp.workload_manager.QueueSizeLimit<br/>GAUGE, pieces | The limit on the size of the queue of pending requests. |
| kqp.workload_manager.GlobalDelayedRequests<br/>GAUGE, pieces | The number of requests waiting in the execution queue. Displayed only for pools with CONCURRENT_QUERY_LIMIT or DATABASE_LOAD_CPU_THRESHOLD enabled. |
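The resource pool metrics come in current-value and limit pairs, so one possible way to read them is as saturation ratios. The following sketch uses hypothetical function names and sample values. Treating a non-positive limit as "no limit" is an assumption of this example, not something stated in this reference.

```python
from typing import Optional


def ratio(current: float, limit: float) -> Optional[float]:
    """Return current / limit, or None when the limit is not positive (assumed to mean no limit)."""
    if limit <= 0:
        return None
    return current / limit


def pool_saturation(
    global_in_fly: float,
    in_flight_limit: float,
    global_delayed_requests: float,
    queue_size_limit: float,
) -> dict:
    """Compare GlobalInFly with InFlightLimit and GlobalDelayedRequests with QueueSizeLimit."""
    return {
        "in_flight": ratio(global_in_fly, in_flight_limit),
        "queue": ratio(global_delayed_requests, queue_size_limit),
    }


# Hypothetical sample: 8 of 10 execution slots busy, 5 of 20 queue slots used.
print(pool_saturation(8, 10, 5, 20))  # {'in_flight': 0.8, 'queue': 0.25}
```

When the in-flight ratio stays at 1.0 and the queue ratio keeps growing, new requests are likely waiting longer before they start executing.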