Simple Vector DB is a lightweight, easy-to-use vector database for storing, retrieving, and managing high-dimensional vectors. It supports insertion, update, deletion, and comparison of vectors (cosine similarity, Euclidean distance, dot product) through a RESTful API. It can also find the nearest vector based on KD-tree median points. It is intended for machine learning, data science, and scientific computing applications.
Simple Vector DB depends on libmicrohttpd and cJSON. The following instructions install these dependencies on different operating systems.
For macOS (using Homebrew):
brew install libmicrohttpd
brew install cjson
For Debian-based distributions (e.g., Ubuntu):
sudo apt-get update
sudo apt-get install libmicrohttpd-dev
sudo apt-get install libcjson-dev
For Red Hat-based distributions (e.g., CentOS, Fedora):
sudo dnf install libmicrohttpd-devel
sudo dnf install cjson-devel
For Windows, you can use a package manager like vcpkg to install the dependencies.
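Install vcpkg (from Git Bash or a similar shell; in PowerShell or cmd, run .\bootstrap-vcpkg.bat instead of the .sh script):
git clone https://github.com/microsoft/vcpkg.git
cd vcpkg
./bootstrap-vcpkg.sh
Install libmicrohttpd:
./vcpkg install libmicrohttpd
Install cJSON:
./vcpkg install cjson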
After installing the libraries, you need to find the paths to the headers microhttpd.h (libmicrohttpd) and cjson/cJSON.h (cJSON).
macOS and Linux:
Headers are typically located in /usr/include, /usr/local/include, or the installation prefix of the package manager (e.g., /opt/homebrew/include for Homebrew on macOS).
Libraries are usually in /usr/lib, /usr/local/lib, or the package manager prefix (e.g., /opt/homebrew/lib).
Use the find command to locate the header files (add your package manager prefix, such as /opt/homebrew, to the search path if needed):
find /usr -name "microhttpd.h"
find /usr -name "cJSON.h"
Windows:
If you installed the dependencies with vcpkg, they are placed under the vcpkg/installed directory. You can find headers and libraries in vcpkg/installed/x64-windows/include and vcpkg/installed/x64-windows/lib.
Update your Makefile to include the correct paths for the headers and libraries. These are the two lines that need to be updated:
CFLAGS = -Wall -I/opt/homebrew/include -I./include
LDFLAGS = -L/opt/homebrew/lib -lmicrohttpd -lcjson
Replace /opt/homebrew/include and /opt/homebrew/lib with the appropriate paths for your system.
You can start the server on the default port (8888) or specify a custom port using the -p flag. You can also set the database filename, kd-tree dimension, and vector size using the corresponding flags, or use a configuration file with the -c flag.
# Start the server with default settings
./executable/vector_db_server
# Start the server on a custom port (e.g., 8080)
./executable/vector_db_server -p 8080
# Start the server with a custom database filename
./executable/vector_db_server -f custom_database.db
# Start the server with a custom kd-tree dimension
./executable/vector_db_server -d 5
# Start the server with a custom vector size
./executable/vector_db_server -s 256
# Start the server with a configuration file
./executable/vector_db_server -c config.json
# Combine multiple custom settings
./executable/vector_db_server -p 8080 -f custom_database.db -d 5 -s 256 -c config.json
Save this as config.json:
{
"DB_FILENAME": "vector_database.db",
"DEFAULT_PORT": 8888,
"DEFAULT_KD_TREE_DIMENSION": 3,
"DB_VECTOR_SIZE": 128
}
DB_FILENAME: The name of the database file (e.g., vector_database.db).
DEFAULT_PORT: The port number on which the server will run (e.g., 8888).
DEFAULT_KD_TREE_DIMENSION: The default dimension for the kd-tree (e.g., 3).
DB_VECTOR_SIZE: The size of the database vectors (e.g., 128).
You can fill the database with randomly generated vectors of different dimensions using the test script:
# Make the script executable, then run it
chmod +x ./test/add_vectors.sh
./test/add_vectors.sh
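The contents of test/add_vectors.sh are not shown here, so the following is only a sketch of how similar test data could be generated. It builds JSON payloads with a random UUID and a random vector and prints the matching curl commands. The function names and the value range 0 to 10 are assumptions made for illustration.

```python
import json
import random
import uuid


def random_vector_payload(size, rng=None):
    """Build a request body with a fresh UUID and `size` random floats.

    The value range (0.0 to 10.0) is an arbitrary choice for this sketch.
    """
    rng = rng or random.Random()
    return {
        "uuid": str(uuid.uuid4()),
        "vector": [round(rng.uniform(0.0, 10.0), 5) for _ in range(size)],
    }


def to_curl(payload, base_url="http://localhost:8888"):
    """Render the payload as a curl command for the POST /vector endpoint."""
    body = json.dumps(payload)
    return (
        "curl -X POST -H \"Content-Type: application/json\" "
        f"-d '{body}' {base_url}/vector"
    )


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the vectors are reproducible
    for _ in range(3):
        print(to_curl(random_vector_payload(5, rng)))
```

Each printed line can be pasted into a shell while the server is running on the default port.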
Endpoint: /vector
Method: POST
curl -X POST -H "Content-Type: application/json" -d '{"uuid": "123e4567-e89b-12d3-a456-426614174000", "vector": [1.23, 4.56, 7.89, 0.12, 3.45]}' http://localhost:8888/vector
The UUID acts as the bridge (a shared key for a chunk) between your application database and Simple Vector DB.
Response:
{
"index": 2,
"vector": [1.0, 2.0, 3.0, 4.08993, 5.937, 6.389, 1.39],
"uuid": "F07243B9-58D1-4A33-9670-C14FFA9050EF"
}
Endpoint: /vector
Method: GET
Query parameters:
index (the index of the vector to retrieve).
uuid (the uuid of the vector to retrieve).
curl "http://localhost:8888/vector?index=0"
curl "http://localhost:8888/vector?uuid=123e4567-e89b-12d3-a456-426614174000"
Response:
{
"index": 2,
"vector": [1.0, 2.0, 3.0, 4.08993, 5.937, 6.389, 1.39],
"uuid": "F07243B9-58D1-4A33-9670-C14FFA9050EF"
}
Endpoint: /vector
Method: PUT
Query parameters: index (the index of the vector to update).
curl -X PUT -H "Content-Type: application/json" -d '[1.5, 2.5, 3.5, 4.5]' "http://localhost:8888/vector?index=0"
Endpoint: /vector
Method: DELETE
Query parameters: index (the index of the vector to delete).
curl -X DELETE "http://localhost:8888/vector?index=0"
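The curl calls above can also be issued from a script. The following is a minimal sketch using only the Python standard library; build_request and send are hypothetical helper names, and the base URL assumes the server is running locally on the default port.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8888"  # assumes the default port


def build_request(method, path, params=None, body=None):
    """Create a Request for the given HTTP method, query parameters, and JSON body."""
    url = BASE_URL + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def send(request):
    """Send the request and return the response body as text (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Requests mirroring the GET, PUT, and DELETE curl examples.
get_req = build_request("GET", "/vector", params={"index": 0})
put_req = build_request("PUT", "/vector", params={"index": 0}, body=[1.5, 2.5, 3.5, 4.5])
delete_req = build_request("DELETE", "/vector", params={"index": 0})
```

Calling send(get_req) performs the request. Building the request separately from sending it makes it possible to inspect the URL and body without a server.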
Endpoint: /compare/cosine_similarity
Method: GET
Query parameters: index1 and index2 (the indices of the vectors to compare).
curl "http://localhost:8888/compare/cosine_similarity?index1=0&index2=1"
Endpoint: /compare/euclidean_distance
Method: GET
Query parameters: index1 and index2 (the indices of the vectors to compare).
curl "http://localhost:8888/compare/euclidean_distance?index1=0&index2=1"
Endpoint: /compare/dot_product
Method: GET
Query parameters: index1 and index2 (the indices of the vectors to compare).
curl "http://localhost:8888/compare/dot_product?index1=0&index2=1"
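For reference, the three comparison metrics can be written out in a few lines. This is a sketch of the standard mathematical definitions, not the server's own code. Raising an error for mismatched lengths and zero-length vectors is an assumption of this example.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector norms."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product(a, b) / (norm_a * norm_b)


print(dot_product([1, 2, 3], [4, 5, 6]))   # 32
print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```

Cosine similarity ignores vector length and compares direction only, while Euclidean distance and dot product are both sensitive to magnitude.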
Endpoint: /nearest
Method: POST
Content-Type: application/json
Query parameters: number=(int), the number of nearest vectors to return (default is 1).
The /nearest endpoint uses a KD-tree for indexing, which allows for more efficient nearest-neighbor searches than scanning every stored vector. All vectors in the database must have the same dimension. When a vector is inserted, a point is added to the KD-tree, and when a vector is updated, the KD-tree is modified to reflect the change.
curl -X POST -H "Content-Type: application/json" -d '[7,3.00003,6.32,4.5,8,5,1.842,4.929066,7.94764,6.16051,6.946,4.71,4.3,1.704,2.321,5.9,6.74227,7.365,5.31,4.1705]' "http://localhost:8888/nearest"
Response:
{
"index": 2,
"vector": [1.0, 2.0, 3.0, 4.08993, 5.937, 6.389, 1.39],
"uuid": "F07243B9-58D1-4A33-9670-C14FFA9050EF"
}
This response indicates that the nearest vector is at index 2, and it includes that vector's values and its UUID.
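To illustrate the general idea behind a KD-tree lookup, here is a minimal textbook sketch: the tree splits the points at the median along one axis per level, and the search skips a subtree when it cannot contain a closer point. It does not reproduce the server's implementation, its kd-tree dimension setting, or its handling of median points. KDNode, build_kdtree, and nearest are names chosen for this example.

```python
import math


class KDNode:
    """One KD-tree node: a stored point, its database index, and the split axis."""

    def __init__(self, index, point, axis, left, right):
        self.index = index
        self.point = point
        self.axis = axis
        self.left = left
        self.right = right


def build_kdtree(items, depth=0):
    """Build a tree from (index, point) pairs, splitting at the median of each axis in turn."""
    if not items:
        return None
    axis = depth % len(items[0][1])
    items = sorted(items, key=lambda item: item[1][axis])
    mid = len(items) // 2
    index, point = items[mid]
    return KDNode(
        index,
        point,
        axis,
        build_kdtree(items[:mid], depth + 1),
        build_kdtree(items[mid + 1:], depth + 1),
    )


def nearest(node, target, best=None):
    """Return (distance, index, point) for the stored point closest to target."""
    if node is None:
        return best
    distance = math.dist(node.point, target)
    if best is None or distance < best[0]:
        best = (distance, node.index, node.point)
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Search the far side only if the splitting plane is closer than the best match.
    if abs(diff) < best[0]:
        best = nearest(far, target, best)
    return best


points = [[2, 3], [5, 4], [9, 6], [4, 7], [8, 1], [7, 2]]
tree = build_kdtree(list(enumerate(points)))
distance, index, point = nearest(tree, [9, 2])
print(index, point)  # 4 [8, 1]
```

The pruning step is what saves work compared with a linear scan, although the benefit generally shrinks as the number of dimensions grows.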
To build and run Simple Vector DB, execute the following commands:
# Build the project
make
# Start the server
./executable/vector_db_server
To specify a custom port:
./executable/vector_db_server -p 8080
We welcome contributions to Simple Vector DB! Please fork the repository, create a new branch for your feature or bugfix, and submit a pull request.
git checkout -b my-new-feature
git commit -am 'Add some new feature'
git push origin my-new-feature
This project is licensed under the MIT License - see the LICENSE file for details.