Cassandra

Topics related to Cassandra:

Getting started with Cassandra

The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra's support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages.

PROVEN

Cassandra is in use at Constant Contact, CERN, Comcast, eBay, GitHub, GoDaddy, Hulu, Instagram, Intuit, Netflix, Reddit, The Weather Channel, and over 1500 more companies that have large, active data sets.

FAULT TOLERANT

Data is automatically replicated to multiple nodes for fault-tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime.

PERFORMANT

Cassandra consistently outperforms popular NoSQL alternatives in benchmarks and real applications, primarily because of fundamental architectural choices.

DECENTRALIZED

There are no single points of failure. There are no network bottlenecks. Every node in the cluster is identical.

SCALABLE

Some of the largest production deployments include Apple's, with over 75,000 nodes storing over 10 PB of data, Netflix (2,500 nodes, 420 TB, over 1 trillion requests per day), Chinese search engine Easou (270 nodes, 300 TB, over 800 million requests per day), and eBay (over 100 nodes, 250 TB).

DURABLE

Cassandra is suitable for applications that can't afford to lose data, even when an entire data center goes down.

YOU'RE IN CONTROL

Choose between synchronous or asynchronous replication for each update. Highly available asynchronous operations are optimized with features like Hinted Handoff and Read Repair.

ELASTIC

Read and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications.

PROFESSIONALLY SUPPORTED

Cassandra support contracts and services are available from third parties.

Cassandra - PHP

Repairs in Cassandra

Cassandra Anti-Entropy Repairs:

Anti-entropy repair in Cassandra has two distinct phases. To run successful, performant repairs, it is important to understand both of them.

  • Merkle Tree calculations: Each replica builds a Merkle tree (a tree of hashes) over its data for the range being repaired. Comparing the trees reveals where a node and its replicas differ.

  • Data streaming: Based on the outcome of the Merkle Tree comparison, the mismatched data is scheduled to be streamed from one node to another so that the replicas are brought back in sync.
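The two phases can be pictured with a toy comparison. The sketch below is not how Cassandra implements Merkle trees internally. It only illustrates the idea of hashing each range, comparing one root hash first, and then narrowing down to the ranges that would need streaming. The function names, the sample rows, and the use of cksum as the hash are all assumptions made for illustration.

```shell
# Toy illustration only; names, data, and hash choice are hypothetical.

# Hash a string with cksum and print only the checksum field.
hash_of() {
  printf '%s' "$1" | cksum | cut -d' ' -f1
}

# Print one leaf hash per token range, space-separated.
# Each argument stands for the rows a replica holds in one range.
leaf_hashes() {
  for rows in "$@"; do
    printf '%s ' "$(hash_of "$rows")"
  done
}

# Print the zero-based index of every range whose leaf hash differs.
# $1 and $2 are the space-separated leaf hash lists of two replicas.
differing_ranges() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, " "); split(b, y, " ")
    for (i = 1; i <= n; i++) if (x[i] != y[i]) print i - 1
  }'
}

replica_a=$(leaf_hashes "alice,bob" "carol" "dave,erin")
replica_b=$(leaf_hashes "alice,bob" "carol-stale" "dave,erin")

# Phase 1: compare the root hashes (a hash over all leaf hashes).
if [ "$(hash_of "$replica_a")" != "$(hash_of "$replica_b")" ]; then
  # Phase 2 would stream only the ranges listed here.
  echo "ranges out of sync: $(differing_ranges "$replica_a" "$replica_b")"
fi
```

Because only the mismatched ranges are reported, identical ranges never need to be streamed, which is the point of comparing hashes before moving data.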

Stopping a Repair:

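You can stop the validation (Merkle Tree) phase of a repair by issuing a STOP VALIDATION command from nodetool. This interrupts validation on the node where it is run, so it may need to be repeated on each node taking part in the repair: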

$ nodetool stop validation

How do I know when repair is completed?

To see whether the first phase of repair (Merkle Tree calculations) is still running, check nodetool compactionstats.
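As a rough sketch, the check can be wrapped in a small helper. It assumes that validation tasks are listed with the compaction type "Validation" in the compactionstats output, which may vary between Cassandra versions. The function names and the 30-second polling interval are hypothetical.

```shell
# Succeeds (exit 0) if the text on stdin lists a validation task.
# Assumption: validation shows up with the compaction type "Validation".
validation_in_progress() {
  grep -qi 'validation'
}

# Poll the local node until no validation task is listed.
# Requires nodetool on the PATH; the 30-second interval is arbitrary.
wait_for_validation() {
  while nodetool compactionstats | validation_in_progress; do
    sleep 30
  done
}
```

Calling wait_for_validation on a node blocks until the Merkle Tree phase no longer appears there. It says nothing about the streaming phase.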

You can check for repair streams using nodetool netstats. Repair streams will also be visible in your logs. You can grep for them in your system logs like this:

$ grep Entropy system.log

INFO [AntiEntropyStage:1] 2016-07-25 07:32:47,077 RepairSession.java (line 164) [repair #70c35af0-526e-11e6-8646-8102d8573519] Received merkle tree for test_users from /192.168.14.3
INFO [AntiEntropyStage:1] 2016-07-25 07:32:47,081 RepairSession.java (line 164) [repair #70c35af0-526e-11e6-8646-8102d8573519] Received merkle tree for test_users from /192.168.16.5
INFO [AntiEntropyStage:1] 2016-07-25 07:32:47,091 RepairSession.java (line 221) [repair #70c35af0-526e-11e6-8646-8102d8573519] test_users is fully synced
INFO [AntiEntropySessions:4] 2016-07-25 07:32:47,091 RepairSession.java (line 282) [repair #70c35af0-526e-11e6-8646-8102d8573519] session completed successfully
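To pull just the finished sessions out of a long log, a filter along these lines may help. It is a sketch that assumes the "session completed successfully" message format shown in the sample above, which may differ in other Cassandra versions. The function name is hypothetical.

```shell
# Print the ID of every repair session that the log on stdin
# reports as completed successfully.
completed_repairs() {
  sed -n 's/.*\[repair #\([0-9a-f-]*\)\] session completed successfully.*/\1/p'
}

# Example against a real log file:
#   completed_repairs < system.log | sort -u
```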

Active repair streams can also be monitored with this Bash loop, which prints the date and then the difference between two netstats snapshots taken five seconds apart:

$ while true; do date; diff <(nodetool -h 192.168.0.1 netstats) <(sleep 5 && nodetool -h 192.168.0.1 netstats); done

Ref: How do I know if nodetool repair is finished

How do I check for stuck or orphaned repair streams?

On each node, run nodetool tpstats and look for non-zero values in the "Blocked" columns of the "AntiEntropy" lines.

$ nodetool tpstats
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
...
AntiEntropyStage                  0         0         854866         0                 0
...
AntiEntropySessions               0         0           2576         0                 0
...
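The same check can be scripted. The sketch below assumes the six-column tpstats layout shown above (pool name, Active, Pending, Completed, Blocked, All time blocked). Other Cassandra versions may print different columns or pool names. The function name is hypothetical.

```shell
# Reads `nodetool tpstats` output on stdin and prints every AntiEntropy pool
# with a non-zero "Blocked" or "All time blocked" count.
# Exit status is 0 if at least one such pool was found, 1 otherwise.
blocked_antientropy() {
  awk '/^AntiEntropy/ && ($5 > 0 || $6 > 0) {
         print $1 ": blocked=" $5 ", all time blocked=" $6
         found = 1
       }
       END { exit (found ? 0 : 1) }'
}

# Example against a live node:
#   nodetool tpstats | blocked_antientropy && echo "check repair on this node"
```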

Security

Running Repair on Cassandra

Connecting to Cassandra

Cassandra keys

Cassandra as a Service