Vector index

Warning

Vector indexes are supported only for row-oriented tables. Support for column-oriented tables is under development.

Alert

The following features are not supported:

  • Index update: the main table can be modified, but the existing index will not be updated. To reflect the changes, build a new index. If necessary, the existing index can be atomically replaced with the newly built one.
  • Building an index for vectors with bit quantization.

These limitations may be removed in future versions.

Warning

Creating an empty table with a vector index serves no purpose, because tables with vector indexes currently cannot be modified.

Use the ALTER TABLE ... ADD INDEX command to add a vector index to an existing table.
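
For illustration, a statement of that form might look like the sketch below. It assumes that ADD INDEX accepts the same GLOBAL, USING, ON, COVER, and WITH clauses as the CREATE TABLE syntax described on this page. The table, index, and column names are hypothetical and mirror the example at the end of this section.

```sql
-- Sketch only: assumes ADD INDEX takes the same clauses as the CREATE TABLE form.
-- user_articles, emb_cosine_idx, and the column names are illustrative.
ALTER TABLE user_articles
    ADD INDEX emb_cosine_idx
    GLOBAL SYNC USING vector_kmeans_tree
    ON (user, embedding) COVER (title, text)
    WITH (
        distance="cosine",
        vector_type="float",
        vector_dimension=512,
        clusters=128,
        levels=2
    );
```

Because the index is built from the rows already in the table, running this after the data has been loaded avoids the empty-table situation described in the warning above.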

A vector index in a row-oriented table is created using the same syntax as a secondary index, with vector_kmeans_tree specified as the index type. The subset of syntax available for vector indexes:

CREATE TABLE `<table_name>` (
    ...
    INDEX `<index_name>`
        GLOBAL
        [SYNC]
        USING vector_kmeans_tree
        ON ( <index_columns> )
        [COVER ( <cover_columns> )]
        [WITH ( <parameter_name> = <parameter_value>[, ...])]
    [,   ...]
)

Where:

  • <index_name> - unique index name for data access
  • SYNC - indicates synchronous data writing to the index. This is the only currently available option, and it is used by default.
  • <index_columns> - comma-separated list of table columns used for index searches (the last column holds the embedding; any preceding columns are used for filtering)
  • <cover_columns> - list of additional table columns stored in the index to enable retrieval without accessing the main table
  • <parameter_name> and <parameter_value> - list of key-value parameters:
    • common parameters for all vector indexes:
      • vector_dimension - embedding vector dimensionality (should be between 1 and 16384)
      • vector_type - vector value type (float, uint8, or int8)
      • distance - distance function (cosine, manhattan, or euclidean), mutually exclusive with similarity
    • specific parameters for vector_kmeans_tree (see the reference):
      • clusters - number of centroids for the k-means algorithm (should be between 2 and 2048)
      • levels - number of levels in the tree (should be between 1 and 16)
      • overlap_clusters - number of nearest clusters each vector is added to (default 1)
    • limits on parameter combinations:
      • the total number of nodes in the tree, calculated as clusters raised to the power of levels, should be no more than 1073741824 (2^30)
      • the product of vector_dimension and clusters should be no more than 4194304 (2^22)
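
As a worked check of the two combined limits, the query below evaluates them for clusters=128, levels=2, and vector_dimension=512, the values used in the example on this page. It is only an arithmetic sketch: the column aliases are made up for illustration, and the power is written out as a product because levels is 2.

```sql
-- Arithmetic sketch for clusters=128, levels=2, vector_dimension=512.
-- The aliases are illustrative and not part of any index API.
SELECT
    128 * 128               AS total_nodes,            -- clusters ^ levels = 16384
    128 * 128 <= 1073741824 AS total_nodes_ok,         -- 16384 <= 2^30
    512 * 128               AS dimension_by_clusters,  -- vector_dimension * clusters = 65536
    512 * 128 <= 4194304    AS dimension_by_clusters_ok; -- 65536 <= 2^22
```

Both conditions hold with a wide margin for these values. Settings near the upper ends of the individual ranges can break the limits: clusters=2048 with levels=3 gives 2048^3 = 8589934592, which exceeds 1073741824.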

Warning

Vector indexes with vector_type=bit are not currently supported.

Example

CREATE TABLE user_articles (
    article_id Uint64,
    user String,
    title String,
    text String,
    embedding String,
    INDEX emb_cosine_idx GLOBAL SYNC USING vector_kmeans_tree
    ON (user, embedding) COVER (title, text)
    WITH (
        distance="cosine",
        vector_type="float",
        vector_dimension=512,
        clusters=128,
        levels=2
    ),
    PRIMARY KEY (article_id)
)