An Internet-Scale Database.
Copyright 2015, Baidu, Inc.
Tera is a high performance distributed NoSQL database, which is inspired by google’s BigTable and designed for real-time applications. Tera can easily scale to petabytes of data across thousands of commodity servers. Besides, Tera is widely used in many Baidu products with varied demands,which range from throughput-oriented applications to latency-sensitive service, including web indexing, WebPage DB, LinkBase DB, etc. (中文)
Tera is the collection of many sparse, distributed, multidimensional tables. The table is indexed by a row key, column key, and a timestamp; each value in the table is an uninterpreted array of bytes.
To learn more about the schema, you can refer to BigTable.
Tera has three major components: sdk, master and tablet servers.
Tera is built on several pieces of open source infrastructure.
Filesystem (required)
Tera uses the distributed file system to store transaction log and data files. So Tera uses an abstract file system interface, called Env, to adapt to different implementations of file systems (e.g., BFS, HDFS, HDFS2, POXIS filesystem).
Distributed lock service (required)
Tera relies on a highly-available and persistent distributed lock service, which is used for a variety of tasks: to ensure that there is at most one active master at any time; to store meta table’s location, to discover new tablet server and finalize tablet server deaths. Tera has an adapter class to adapt to different implementations of lock service (e.g., ZooKeeper, Nexus)
High performance RPC framework (required)
Tera is designed to handle a variety of demanding workloads, which range from throughput-oriented applications to latency-sensitive service. So Tera needs a high performance network programming framework. Now Tera heavily relies on Sofa-pbrpc to meet the performance demand.
Cluster management system (not necessary)
A Tera cluster in Baidu typically operates in a shared pool of machines that runs a wide variety of other distributed applications. So Tera can be deployed in a cluster management system Galaxy, which uses for scheduling jobs, managing resources on shared machines, dealing with machine failures, and monitoring machine status. Besides, Tera can also be deployed on RAW machine or in Docker container.
How to build
Use sh ./build.sh to build Tera.
How to deploy
How to access
Contributions are welcomed and greatly appreciated.
Read Roadmap to get a general knowledge about our development plan.
See Contributions for more details.
To join us, please send resume to tera-user at baidu.com.