Vector Indexes
Alert
Vector index functionality is available in test mode in main. It will be fully available in version 25.1.
The following features are not supported:
- Index update: the main table can be modified, but the existing index will not be updated. A new index must be built to reflect the changes. If necessary, the existing index can be atomically replaced with the newly built one.
- Building an index for vectors with bit quantization.
These limitations may be removed in future versions.
Vector indexes are specialized data structures that enable efficient vector search in multidimensional spaces. Unlike secondary indexes, which optimize searching by equality or range, vector indexes support searching by similarity, using similarity or distance functions.
Data in a YDB table is stored and sorted by the primary key, ensuring efficient searching by exact match and range scanning. Vector indexes provide similar efficiency for nearest neighbor searches in vector spaces.
Characteristics of Vector Indexes
Vector indexes in YDB address the nearest neighbor search problem using similarity or distance functions. The supported similarity functions are "inner_product" and "cosine"; the supported distance functions are "cosine", "euclidean", and "manhattan".
The current implementation offers one type of index: vector_kmeans_tree.
vector_kmeans_tree Vector Index Type
The vector_kmeans_tree index implements hierarchical data clustering. The structure of the index includes:
- Hierarchical clustering:
  - the index builds multiple levels of k-means clusters
  - the number of clusters at a level is the configured cluster count raised to the power of that level; for example, with clusters=128, the first level has 128 clusters and the second has up to 128^2 = 16384
  - the first level clusters the entire dataset
  - subsequent levels recursively cluster the contents of each parent cluster
- Search process:
  - search proceeds recursively from the first level to the subsequent ones
  - during queries, the index analyzes only the most promising clusters
  - this pruning of the search space avoids enumerating all vectors
- Parameters:
  - levels: number of levels in the tree, defining search depth (recommended 1-3)
  - clusters: number of clusters in k-means, defining search width (recommended 64-512)
Internally, a vector index consists of hidden index tables named indexImpl*Table. In selection queries using the vector index, the index tables will appear in query statistics.
Types of Vector Indexes
A vector index can be covering, meaning it includes additional columns to enable reading from the index without accessing the main table.
Alternatively, it can be prefixed, allowing for additional columns to be used for quick filtering during reading.
Below are examples of creating vector indexes of different types.
Basic Vector Index
Global vector index on the embedding column:
ALTER TABLE my_table
ADD INDEX my_index
GLOBAL USING vector_kmeans_tree
ON (embedding)
WITH (distance=cosine, vector_type="uint8", vector_dimension=512, levels=2, clusters=128);
Vector Index with Covering Columns
A covering vector index that includes the additional column data to avoid reading from the main table during a search:
ALTER TABLE my_table
ADD INDEX my_index
GLOBAL USING vector_kmeans_tree
ON (embedding) COVER (data)
WITH (distance=cosine, vector_type="uint8", vector_dimension=512, levels=2, clusters=128);
Prefixed Vector Index
A prefixed vector index, allowing filtering by the prefix column user during vector search:
ALTER TABLE my_table
ADD INDEX my_index
GLOBAL USING vector_kmeans_tree
ON (user, embedding)
WITH (distance=cosine, vector_type="uint8", vector_dimension=512, levels=2, clusters=128);
Prefixed Vector Index with Covering Columns
A prefixed vector index with covering columns:
ALTER TABLE my_table
ADD INDEX my_index
GLOBAL USING vector_kmeans_tree
ON (user, embedding) COVER (data)
WITH (distance=cosine, vector_type="uint8", vector_dimension=512, levels=2, clusters=128);
Creating Vector Indexes
Vector indexes can be created in two ways:
- together with a new table, using the YQL statement CREATE TABLE;
- on an existing table, using the YQL statement ALTER TABLE.
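The ALTER TABLE form is shown in the examples above. For the CREATE TABLE form, a minimal sketch might look like the following. The column set and types (id, user, data, and embedding stored as a String holding a serialized vector) are assumptions made for illustration, and the index clause simply mirrors the parameters used in the ALTER TABLE examples.

```yql
-- Sketch only: the schema below is hypothetical.
CREATE TABLE my_table (
    id Uint64,
    user Uint64,
    data String,
    embedding String,  -- assumed: vector serialized to a binary string
    PRIMARY KEY (id),
    INDEX my_index
        GLOBAL USING vector_kmeans_tree
        ON (embedding)
        WITH (distance=cosine, vector_type="uint8", vector_dimension=512, levels=2, clusters=128)
);
```

Because index updates are not supported, an index declared this way would not reflect rows written to the table afterwards, so adding the index to an already populated table with ALTER TABLE is likely the more practical route for now.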
Using Vector Indexes
Queries to vector indexes are executed using the VIEW syntax in YQL. For prefixed indexes, specify the prefix columns in the WHERE clause:
DECLARE $query_vector AS List<Uint8>;
SELECT user, data
FROM my_table VIEW my_index
ORDER BY Knn::CosineSimilarity(embedding, $query_vector) DESC
LIMIT 10;
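For a prefixed index built on (user, embedding), the same query with a filter on the prefix column might look like the sketch below. The $target_user parameter and its Uint64 type are assumptions, since the type of the user column is not specified here.

```yql
DECLARE $query_vector AS List<Uint8>;
DECLARE $target_user AS Uint64;  -- hypothetical parameter; match the type of the user column

SELECT user, data
FROM my_table VIEW my_index
WHERE user = $target_user        -- filter on the prefix column of the index
ORDER BY Knn::CosineSimilarity(embedding, $query_vector) DESC
LIMIT 10;
```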
For more details on executing SELECT queries using vector indexes, see the section VIEW VECTOR INDEX.
Note
If the VIEW expression is not used, the query will perform a full table scan with pairwise comparison of vectors.
Use query statistics to check that the query is executed efficiently. In particular, ensure there is no full scan of the main table.