Vector index
A vector index in a row-oriented table is created using the same syntax as a secondary index, by specifying vector_kmeans_tree as the index type. The following subset of the syntax is available for vector indexes:
CREATE TABLE `<table_name>` (
    ...
    INDEX `<index_name>`
        GLOBAL
        [SYNC]
        USING vector_kmeans_tree
        ON ( <index_columns> )
        [COVER ( <cover_columns> )]
        [WITH ( <parameter_name> = <parameter_value>[, ...])]
    [, ...]
)
Where:
- <index_name> - unique index name for data access
- SYNC - indicates synchronous data writing to the index. This is the only currently available option, and it is used by default.
- <index_columns> - comma-separated list of table columns used for index searches (the last column is used as embedding, others as filtering columns)
- <cover_columns> - list of additional table columns stored in the index to enable retrieval without accessing the main table
- <parameter_name> and <parameter_value> - list of key-value parameters:
  - common parameters for all vector indexes:
    - vector_dimension - embedding vector dimensionality (should be between 1 and 16384)
    - vector_type - vector value type (float, uint8, or int8)
    - distance - distance function (cosine, manhattan, or euclidean), mutually exclusive with similarity
    - similarity - similarity function (inner_product or cosine), mutually exclusive with distance
  - specific parameters for vector_kmeans_tree (see the reference):
    - clusters - number of centroids for k-means algorithm (should be between 2 and 2048)
    - levels - number of levels in the tree (should be between 1 and 16)
    - the total number of nodes in the tree, calculated as clusters raised to the power of levels, should be no more than 1073741824
    - the product of vector_dimension and clusters should be no more than 4194304
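The numeric limits listed above interact, so it can help to check a candidate parameter set before writing the CREATE TABLE statement. The following is a minimal Python sketch of such a check. The function name validate_kmeans_tree_params and the constant names are hypothetical and are not part of any database API; the sketch only restates the limits listed above.

```python
# Hypothetical helper: restates the documented limits for vector_kmeans_tree.
MAX_TREE_NODES = 1_073_741_824      # upper bound on clusters ** levels
MAX_DIM_TIMES_CLUSTERS = 4_194_304  # upper bound on vector_dimension * clusters


def validate_kmeans_tree_params(vector_dimension: int, clusters: int, levels: int) -> list:
    """Return descriptions of violated limits; an empty list means all checks pass."""
    problems = []
    if not 1 <= vector_dimension <= 16384:
        problems.append("vector_dimension should be between 1 and 16384")
    if not 2 <= clusters <= 2048:
        problems.append("clusters should be between 2 and 2048")
    if not 1 <= levels <= 16:
        problems.append("levels should be between 1 and 16")
    # Total number of tree nodes: clusters raised to the power of levels.
    if clusters ** levels > MAX_TREE_NODES:
        problems.append("clusters ** levels should be no more than 1073741824")
    # Product of the embedding dimensionality and the cluster count.
    if vector_dimension * clusters > MAX_DIM_TIMES_CLUSTERS:
        problems.append("vector_dimension * clusters should be no more than 4194304")
    return problems


if __name__ == "__main__":
    print(validate_kmeans_tree_params(vector_dimension=512, clusters=128, levels=2))
```

For instance, vector_dimension=512, clusters=128, levels=2 gives 128 ** 2 = 16384 tree nodes and 512 * 128 = 65536, both within the limits, so the helper reports no violations. With clusters=2048 and levels=3, the node count is 2048 ** 3 = 8589934592, which exceeds 1073741824.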
Warning
Indexed vector search completeness or performance may decrease after updating a large amount of data in a table with a vector index. For more details, see Updating Vector Indexes.
Warning
Vector indexes are supported only for row-oriented tables. Support for column-oriented tables is currently under development.
Example
CREATE TABLE user_articles (
    article_id Uint64,
    user String,
    title String,
    text String,
    embedding String,
    INDEX emb_cosine_idx GLOBAL SYNC USING vector_kmeans_tree
        ON (user, embedding) COVER (title, text)
        WITH (
            distance="cosine",
            vector_type="float",
            vector_dimension=512,
            clusters=128,
            levels=2
        ),
    PRIMARY KEY (article_id)
)