The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra's support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages.
PROVEN
Cassandra is in use at Constant Contact, CERN, Comcast, eBay, GitHub, GoDaddy, Hulu, Instagram, Intuit, Netflix, Reddit, The Weather Channel, and over 1500 more companies that have large, active data sets.
FAULT TOLERANT
Data is automatically replicated to multiple nodes for fault-tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime.
PERFORMANT
Cassandra consistently outperforms popular NoSQL alternatives in benchmarks and real applications, primarily because of fundamental architectural choices.
DECENTRALIZED
There are no single points of failure. There are no network bottlenecks. Every node in the cluster is identical.
SCALABLE
Some of the largest production deployments include Apple's, with over 75,000 nodes storing over 10 PB of data, Netflix (2,500 nodes, 420 TB, over 1 trillion requests per day), Chinese search engine Easou (270 nodes, 300 TB, over 800 million requests per day), and eBay (over 100 nodes, 250 TB).
DURABLE
Cassandra is suitable for applications that can't afford to lose data, even when an entire data center goes down.
YOU'RE IN CONTROL
Choose between synchronous or asynchronous replication for each update. Highly available asynchronous operations are optimized with features like Hinted Handoff and Read Repair.
ELASTIC
Read and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications.
PROFESSIONALLY SUPPORTED
Cassandra support contracts and services are available from third parties.
Cassandra Anti-Entropy Repairs:
Anti-entropy repair in Cassandra has two distinct phases. To run successful, performant repairs, it is important to understand both of them.
Merkle Tree calculations: This computes the differences between the nodes and their replicas.
Data streaming: Based on the outcome of the Merkle Tree calculations, data is scheduled to be streamed from one node to another. This is an attempt to synchronize the data between replicas.
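The idea behind the first phase can be illustrated with a small Bash sketch. It is not how Cassandra implements repair: the replica files (plain "token value" lines), the integer token range, and the function names range_hash and diff_ranges are all assumptions made for illustration. Like a Merkle tree comparison, it compares a hash over a whole range first and descends only into the halves whose hashes differ; unlike Cassandra, it recomputes hashes on demand instead of building and exchanging a tree.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: each "replica" is a text file of "token value" lines.
# This is NOT Cassandra's storage format or its repair code.

# Hash all rows of one replica whose token falls in [lo, hi).
range_hash() {
  local file=$1 lo=$2 hi=$3
  awk -v lo="$lo" -v hi="$hi" '($1 + 0) >= (lo + 0) && ($1 + 0) < (hi + 0)' "$file" \
    | sort | cksum
}

# Print every token in [lo, hi) on which the two replicas disagree.
# Ranges with equal hashes are skipped without looking at their rows again.
diff_ranges() {
  local a=$1 b=$2 lo=$3 hi=$4
  if [ "$(range_hash "$a" "$lo" "$hi")" = "$(range_hash "$b" "$lo" "$hi")" ]; then
    return 0   # range is in sync, nothing to stream
  fi
  if [ $((hi - lo)) -le 1 ]; then
    echo "$lo" # single out-of-sync token; in a real repair this would be streamed
    return 0
  fi
  local mid=$(( (lo + hi) / 2 ))
  diff_ranges "$a" "$b" "$lo" "$mid"
  diff_ranges "$a" "$b" "$mid" "$hi"
}
```

For example, diff_ranges replica1.txt replica2.txt 0 8 prints the tokens between 0 and 7 that differ between the two files. Only those tokens would need to be streamed in the second phase.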
Stopping a Repair:
You can stop a repair by issuing a STOP VALIDATION command from nodetool:
$ nodetool stop validation
How do I know when repair is completed?
You can check for the first phase of repair (Merkle Tree calculations) by checking nodetool compactionstats.
You can check for repair streams using nodetool netstats. Repair streams will also be visible in your logs. You can grep for them in your system logs like this:
$ grep Entropy system.log
INFO [AntiEntropyStage:1] 2016-07-25 07:32:47,077 RepairSession.java (line 164) [repair #70c35af0-526e-11e6-8646-8102d8573519] Received merkle tree for test_users from /192.168.14.3
INFO [AntiEntropyStage:1] 2016-07-25 07:32:47,081 RepairSession.java (line 164) [repair #70c35af0-526e-11e6-8646-8102d8573519] Received merkle tree for test_users from /192.168.16.5
INFO [AntiEntropyStage:1] 2016-07-25 07:32:47,091 RepairSession.java (line 221) [repair #70c35af0-526e-11e6-8646-8102d8573519] test_users is fully synced
INFO [AntiEntropySessions:4] 2016-07-25 07:32:47,091 RepairSession.java (line 282) [repair #70c35af0-526e-11e6-8646-8102d8573519] session completed successfully
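To check for one specific repair session rather than scanning all AntiEntropy lines, a small helper along these lines may be useful. It is a sketch that assumes the log message format shown in the sample above ("[repair #<id>] session completed successfully"); the function name repair_completed is made up for illustration, and other Cassandra versions may word the message differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit status 0) if the given repair session
# logged a successful completion in the given log file.
# Assumes the message format from the sample log lines above.
repair_completed() {
  local session_id=$1 log_file=$2
  # -F matches the text literally, so the brackets are not treated as a regex.
  grep -qF "[repair #${session_id}] session completed successfully" "$log_file"
}
```

Usage, with the session ID from the sample output:
$ repair_completed 70c35af0-526e-11e6-8646-8102d8573519 system.log && echo "repair finished"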
Active repair streams can also be monitored with this (Bash) command:
$ while true; do date; diff <(nodetool -h 192.168.0.1 netstats) <(sleep 5 && nodetool -h 192.168.0.1 netstats); done
ref: how do i know if nodetool repair is finished
How to check for stuck or orphaned repair streams?
On each node, you can monitor this with nodetool tpstats, and check for anything "blocked" on the "AntiEntropy" lines.
$ nodetool tpstats
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
...
AntiEntropyStage                  0         0         854866         0                 0
...
AntiEntropySessions               0         0           2576         0                 0
...
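The check on the "AntiEntropy" lines can be automated with a short filter. This is a sketch that assumes the column order shown in the sample output above (name, active, pending, completed, blocked, all time blocked); the function name check_antientropy_blocked is made up for illustration, and the tpstats layout may differ between Cassandra versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `nodetool tpstats` output on stdin and prints every
# AntiEntropy pool that has blocked tasks. Exit status is 1 if any were found.
# Assumes the six-column layout shown in the sample output above.
check_antientropy_blocked() {
  awk '
    /^AntiEntropy/ {
      # $5 = Blocked, $6 = All time blocked
      if ($5 + 0 > 0 || $6 + 0 > 0) {
        print $1 ": blocked=" $5 ", all time blocked=" $6
        found = 1
      }
    }
    END { exit (found ? 1 : 0) }
  '
}
```

Usage:
$ nodetool tpstats | check_antientropy_blocked || echo "blocked AntiEntropy tasks found"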
The Cassandra Java driver from DataStax closely mirrors the JDBC MySQL driver. Statement and PreparedStatement are present in both drivers, and the Cassandra Session plays the role of the JDBC Connection.
The Singleton Connection is from this question and answer: http://stackoverflow.com/a/24691456/671896
Feature-wise, Cassandra 2 and 3 are nearly identical from an application's point of view. The main change in Cassandra 3 is a complete rewrite of the data storage engine.