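Cassandra uses two kinds of keys: the partition key, which determines which node stores the data, and the clustering key, which orders the rows within a partition.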
A primary key is a combination of those two types. The vocabulary (simple, compound, composite) depends on the combination.
The table creation statement should contain a PRIMARY KEY expression. The way you declare it is very important. In a nutshell:
PRIMARY KEY(partition key)
PRIMARY KEY(partition key, clustering key)
An additional pair of parentheses groups multiple fields into a composite partition key. Combined with clustering keys, this gives a compound primary key whose partition key is composite.
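The rule above can be sketched as a small helper that renders the PRIMARY KEY expression from two lists of column names. This is only an illustration: the function name primary_key_clause is hypothetical and not part of any Cassandra driver.

```python
def primary_key_clause(partition_key, clustering_key=()):
    """Render a CQL PRIMARY KEY expression from two lists of column names.

    Hypothetical helper for illustration; it only builds a string.
    """
    if not partition_key:
        raise ValueError("a primary key needs at least one partition key column")
    if len(partition_key) == 1:
        # A single partition column needs no extra parentheses.
        partition = partition_key[0]
    else:
        # Several partition columns are grouped by an inner pair of parentheses.
        partition = "(" + ", ".join(partition_key) + ")"
    # Clustering columns simply follow the partition key, in declaration order.
    return "PRIMARY KEY (" + ", ".join([partition, *clustering_key]) + ")"


print(primary_key_clause(["key"]))
print(primary_key_clause(["key_part_1"], ["key_part_2"]))
print(primary_key_clause(["part_key_1", "part_key_2"], ["clust_key"]))
```

The three calls correspond to a simple primary key, a compound primary key, and a compound primary key with a composite partition key.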
Simple primary key:
PRIMARY KEY (key)
key is called the partition key.
(For a simple primary key, it is also possible to put the PRIMARY KEY expression after the field, for example key int PRIMARY KEY.)
Compound primary key:
PRIMARY KEY (key_part_1, key_part_2)
Contrary to SQL, this does not exactly create a composite primary key. Instead, it declares key_part_1 as the partition key and key_part_2 as the clustering key. Any further field listed in the PRIMARY KEY expression is also considered part of the clustering key.
Composite+Compound primary keys:
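PRIMARY KEY ((part_key_1, ..., part_key_n), clust_key_1, ..., clust_key_n)
The first parenthesized group defines the composite partition key; the remaining columns are the clustering keys. Only the partition key takes its own parentheses: in CQL, the clustering columns are listed after it without a group of their own.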
In summary, the possible shapes are:
(part_key)
(part_key, clust_key)
(part_key, clust_key_1, clust_key_2)
((part_key_1, part_key_2), clust_key)
((part_key_1, part_key_2), clust_key_1, clust_key_2)
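To make the reading of these declarations concrete, here is a minimal sketch that splits the content of a PRIMARY KEY expression into its partition and clustering columns. The function split_primary_key is a hypothetical name, and the sketch assumes the forms in which only the partition key has its own parentheses and the clustering columns are listed plainly after it.

```python
def split_primary_key(spec):
    """Split the inside of PRIMARY KEY (...) into (partition, clustering) lists.

    Hypothetical helper: assumes clustering columns are not wrapped in
    parentheses of their own.
    """
    spec = spec.strip()
    if not (spec.startswith("(") and spec.endswith(")")):
        raise ValueError("expected a parenthesized key list")
    # Drop the outer parentheses of the PRIMARY KEY expression.
    inner = spec[1:-1].strip()
    if inner.startswith("("):
        # Composite partition key: everything up to the first ")".
        close = inner.index(")")
        partition = [c.strip() for c in inner[1:close].split(",")]
        rest = inner[close + 1:]
    else:
        # Simple partition key: the first column only.
        first, _, rest = inner.partition(",")
        partition = [first.strip()]
    # Whatever follows the partition key is the clustering key, in order.
    clustering = [c.strip() for c in rest.split(",") if c.strip()]
    return partition, clustering


print(split_primary_key("(part_key, clust_key_1, clust_key_2)"))
print(split_primary_key("((part_key_1, part_key_2), clust_key)"))
```

The first call yields one partition column and two clustering columns; the second yields two partition columns and one clustering column.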
The partition key is the minimum you must specify in a WHERE clause to perform a query.
If you declare a composite clustering key, the order matters.
Say you have the following primary key:
PRIMARY KEY((part_key_1, part_key_2), clust_key_1, clust_key_2, clust_key_3)
Then, the only valid queries use the following fields in the WHERE clause:
part_key_1, part_key_2
part_key_1, part_key_2, clust_key_1
part_key_1, part_key_2, clust_key_1, clust_key_2
part_key_1, part_key_2, clust_key_1, clust_key_2, clust_key_3
Examples of invalid queries are:
part_key_1, part_key_2, clust_key_2
anything that does not contain both part_key_1 and part_key_2
If you want to use clust_key_2, you have to also specify clust_key_1, and so on.
So the order in which you declare your clustering keys determines the kinds of queries you can run. In contrast, the order of the partition key fields is not important, since you always have to specify all of them in a query.
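The rule can be summed up in a few lines of code. The following is a simplified model, not Cassandra's actual validation logic: the function is_valid_where is hypothetical, it only considers equality restrictions, and it ignores range queries, secondary indexes and ALLOW FILTERING.

```python
def is_valid_where(partition_key, clustering_key, where_fields):
    """Return True if the fields of a WHERE clause follow the rules above.

    Simplified, hypothetical model: equality restrictions only.
    """
    fields = set(where_fields)
    # Every partition key column must be present, in any order.
    if not set(partition_key) <= fields:
        return False
    used = fields - set(partition_key)
    # Columns outside the primary key are not covered by this model.
    if not used <= set(clustering_key):
        return False
    # The clustering columns in use must form a prefix of the declared order.
    return used == set(clustering_key[:len(used)])


partition = ["part_key_1", "part_key_2"]
clustering = ["clust_key_1", "clust_key_2", "clust_key_3"]

print(is_valid_where(partition, clustering, ["part_key_1", "part_key_2", "clust_key_1"]))
print(is_valid_where(partition, clustering, ["part_key_1", "part_key_2", "clust_key_2"]))
print(is_valid_where(partition, clustering, ["part_key_1", "clust_key_1"]))
```

The first call is accepted; the second is rejected because clust_key_1 is skipped, and the third because part_key_2 is missing.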