A multi-modal vector database that supports upserts and vector queries on structured and unstructured data through unified, MySQL-compatible SQL, while meeting the requirements of high concurrency and ultra-low latency.
DingoDB is an open-source distributed multi-modal vector database designed and developed by DataCanvas. It integrates real-time strong consistency, relational semantics, and vector semantics into a unified platform, which positions it as a distinctive multi-modal database solution. With horizontal scalability and elastic scaling, it meets enterprise-grade high availability requirements. DingoDB also offers interfaces in multiple languages and compatibility with the MySQL protocol, giving users flexibility in how they connect. Across functionality, performance, and ease of use, DingoDB aims to be a robust solution for modern data-driven applications.
1. Comprehensive access interface
DingoDB provides comprehensive access interfaces, supporting various flexible access modes such as SQL, SDK, and API to meet the needs of different developers. Additionally, it introduces Table and Vector as first-class citizen data models, providing users with efficient and powerful data processing capabilities.
2. Built-in data high availability
DingoDB provides fully functional, highly available built-in configurations without requiring any external components, which significantly reduces users’ deployment and operations costs and makes the system more efficient to operate and maintain.
3. Fully automatic elastic data sharding
DingoDB supports dynamic configuration of data shard size and splits and merges shards automatically, which allocates resources efficiently and accommodates business growth.
4. Scalar-vector hybrid retrieval
DingoDB supports both traditional database index types and various vector index types, providing seamless hybrid retrieval over scalars and vectors. It also supports distributed transaction processing that spans both scalar and vector data.
5. Built-in real-time index optimization
DingoDB builds scalar and vector indexes in real time and optimizes them automatically in the background, transparently to users and without delaying data retrieval.
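To make the scalar-vector hybrid retrieval in item 4 more concrete, here is a minimal Python sketch of the general idea: filter rows with a scalar predicate, then rank the survivors by vector similarity. It does not use DingoDB's SQL or SDK; the function names (`cosine`, `hybrid_search`), the row layout, and the sample data are all hypothetical, and a real engine would use indexes rather than a full scan.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def hybrid_search(rows, predicate, query, k):
    """Apply a scalar filter first, then return the k most similar rows."""
    candidates = [row for row in rows if predicate(row)]
    candidates.sort(key=lambda row: cosine(row["embedding"], query), reverse=True)
    return candidates[:k]

# Hypothetical sample data: a scalar column plus a vector column per row.
rows = [
    {"id": 1, "category": "book", "embedding": [1.0, 0.0]},
    {"id": 2, "category": "book", "embedding": [0.0, 1.0]},
    {"id": 3, "category": "film", "embedding": [1.0, 0.1]},
    {"id": 4, "category": "book", "embedding": [0.9, 0.2]},
]

hits = hybrid_search(rows, lambda r: r["category"] == "book", [1.0, 0.0], k=2)
print([h["id"] for h in hits])
```

Filtering before ranking keeps the similarity computation limited to rows that can actually be returned; whether a given system filters before, during, or after the vector search is an implementation choice this sketch does not speak to.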
All documentation: Docs
How to install and deploy: Docker or Ansible
How to use DingoDB: Usage
We recommend VS Code to develop the DingoDB codebase.
We recommend YourKit Java Profiler for any performance-critical application you build.
Check it out at https://www.yourkit.com/
DingoFS is a cloud-native distributed high-speed file storage system designed and developed by DataCanvas. It combines elasticity, multi-cloud compatibility, multi-protocol convergence, and high performance. Using a high-performance distributed caching architecture with multiple tiers and cache types, DingoFS accelerates data I/O for AI workflows and addresses burst I/O challenges in AI scenarios. It also provides local cache storage to meet the full-lifecycle storage requirements of large-scale AI models.
1. POSIX Compliance
DingoFS delivers a native file system-like operational experience, enabling seamless system integration.
2. AI-Native Architecture
Deeply optimized for large language model (LLM) workflows, efficiently managing massive training datasets and checkpoint workloads.
3. S3 Protocol Compatibility
DingoFS supports standard S3 interface protocols for streamlined access to filesystem namespace resources.
4. Fully Distributed Architecture
DingoFS’s Metadata Service (MDS), data storage layer, caching system, and client components all support linear scalability.
5. Exceptional Performance
Combines SSD-level low-latency responsiveness with object storage-grade elastic throughput capacity.
6. Intelligent Caching Acceleration System
DingoFS implements a three-tier caching topology (memory/local SSD/distributed cluster) to deliver high-throughput, low-latency intelligent I/O acceleration for AI workloads.
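The three-tier topology in item 6 can be pictured as a read path that checks the fastest tier first and falls back to slower ones. The Python sketch below illustrates that pattern only; it is not DingoFS code. The class name `TieredCache`, the tier labels, and the dictionary-backed stores are assumptions made for illustration, and eviction and capacity limits are deliberately left out.

```python
class TieredCache:
    """Read-through cache that checks tiers from fastest to slowest."""

    def __init__(self, backing_store):
        # Hypothetical tiers, modeled as plain dictionaries.
        self.tiers = [("memory", {}), ("local_ssd", {}), ("distributed", {})]
        self.backing_store = backing_store  # e.g. object storage

    def read(self, key):
        """Return (value, source), where source names the tier that served it."""
        for index, (name, store) in enumerate(self.tiers):
            if key in store:
                value = store[key]
                self._fill(key, value, upto=index)  # promote to faster tiers
                return value, name
        value = self.backing_store[key]
        self._fill(key, value, upto=len(self.tiers))  # populate every tier
        return value, "backing_store"

    def _fill(self, key, value, upto):
        """Copy the value into the first `upto` tiers."""
        for _, store in self.tiers[:upto]:
            store[key] = value


cache = TieredCache({"block-0": b"checkpoint bytes"})
first = cache.read("block-0")   # served from the backing store, fills all tiers
second = cache.read("block-0")  # served from memory
print(first[1], second[1])
```

Promoting a block to the faster tiers on every hit keeps hot data close to the client, at the cost of extra writes; a production cache would pair this with an eviction policy.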
If you installed the software using a Docker container, the container already includes pre-integrated Dingo-eureka and Dingo-sdk, so no additional installation is required.
wget https://github.com/jemalloc/jemalloc/releases/download/5.3.0/jemalloc-5.3.0.tar.bz2
tar -xjvf jemalloc-5.3.0.tar.bz2
cd jemalloc-5.3.0 && ./configure && make && make install
git submodule sync
git submodule update --init --recursive
bash build_thirdparties.sh
mkdir build
cd build
cmake ..
make -j 32
We recommend Rocky Linux or Ubuntu for developing the DingoFS codebase.
We recommend using GCC 13 as the primary compiler.
The main projects about Dingo are as follows:
Dingo is sponsored by DataCanvas, a new platform for real-time data science and data processing.
DingoDB is an open-source project licensed under the Apache License Version 2.0, and DingoFS is an open-source project licensed under License Version 3.0. We welcome any feedback from the community.
For any support or suggestion, please contact us.
If you have any technical questions or business needs, please contact us.
Attach the WeChat QR Code