Consider a scenario where a table has partitions with thousands of deleted rows. When reading from the table, Cassandra has to scan over thousands of deleted rows (tombstones) before it gets to the live rows.
A common workaround is to manually run a compaction on a node to forcibly get rid of tombstones.
What are the downsides of forcing major compaction on a table (with nodetool compact) and what is the best practice recommendation?
How to solve:
When forcing a major compaction on a table configured with the SizeTieredCompactionStrategy (STCS), all the SSTables on the node get compacted together into a single large SSTable. Due to its size, the resulting SSTable will likely never get compacted again, since no similar-sized SSTables are available as compaction candidates. This creates additional issues for the node: tombstones that later end up in that SSTable do not get evicted and keep accumulating, affecting the cluster’s performance.
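To make the "no similar-sized candidates" point concrete, here is a rough sketch. It assumes the STCS defaults (bucket_low 0.5, bucket_high 1.5, min_threshold 4) and simplifies the real bucketing, which compares each SSTable against a bucket average, into a comparison against the large SSTable's own size. count_peers is a hypothetical helper, not a Cassandra tool, and the sizes are made up.

```shell
#!/bin/sh
# Simplified illustration, NOT Cassandra's actual bucketing code.
# Assumes STCS defaults: bucket_low=0.5, bucket_high=1.5, min_threshold=4.

# count_peers BIG_MB SIZE_MB...
# Counts SSTables whose size falls within 0.5x to 1.5x of BIG_MB.
# Integer form of the check: 2*size >= big and 2*size <= 3*big.
count_peers() (
  big=$1
  shift
  peers=0
  for size in "$@"; do
    if [ $(( size * 2 )) -ge "$big" ] && [ $(( size * 2 )) -le $(( big * 3 )) ]; then
      peers=$(( peers + 1 ))
    fi
  done
  echo "$peers"
)

# A 500000 MB SSTable left behind by a major compaction, next to newer small ones.
count_peers 500000 160 170 640 2500   # prints 0
```

With min_threshold at 4, the large SSTable would need three similar-sized peers before STCS picks it up, and in this made-up layout it has none.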
We understand that cluster administrators use major compaction as a way of evicting tombstones that have accumulated under high-delete workloads, which in most cases is the result of an incorrect data model.
The recommendation in this post does NOT constitute a solution to the underlying issue users face. It should not be considered a long-term fix to the data model problem.
In Apache Cassandra 2.2, CASSANDRA-7272 introduced a major improvement for tables using STCS: the output of nodetool compact can be split into multiple files that are 50%, then 25%, then 12.5% of the original table size, and so on until the smallest chunk is 50MB.
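As a worked example of that halving pattern, the sketch below prints the approximate chunk sizes for a given table size. It is illustrative arithmetic only, not Cassandra's implementation: split_sizes is a hypothetical helper, sizes are whole megabytes, and putting the remainder into one final chunk is an assumption of this sketch.

```shell
#!/bin/sh
# Illustrative arithmetic only, NOT Cassandra's implementation.

# split_sizes TOTAL_MB [MIN_MB]
# Prints one chunk size (in MB) per line: half of the remaining data each time,
# stopping once the next half would fall below MIN_MB (50 MB unless overridden).
split_sizes() (
  remaining=$1
  min_mb=${2:-50}
  while [ $(( remaining / 2 )) -ge "$min_mb" ]; do
    chunk=$(( remaining / 2 ))
    echo "$chunk"
    remaining=$(( remaining - chunk ))
  done
  # Assumption of this sketch: whatever is left becomes the final chunk.
  echo "$remaining"
)

# A 1600 MB table: prints 800, 400, 200, 100, 50, 50 (one per line).
split_sizes 1600
```

The point of the graduated sizes is that the smaller output files remain close enough in size to newly flushed SSTables to be picked up by later STCS compactions.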
When using major compaction as a last resort for evicting tombstones, use the --split-output option (or its shorthand -s) to take advantage of this feature:
$ nodetool compact --split-output -- <keyspace> <table>
NOTE – This feature is only available in Cassandra 2.2 and newer versions.
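If you script this, a guard along the following lines can keep the flag from being passed to an older node. It is a minimal sketch: supports_split_output and major_compact are hypothetical helper names, the version string is assumed to look like major.minor[.patch], and nodetool version is assumed to print a line of the form "ReleaseVersion: 3.11.4".

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not part of nodetool.

# supports_split_output VERSION
# Succeeds when VERSION (assumed "major.minor[.patch]") is 2.2 or newer.
supports_split_output() (
  version=$1
  major=${version%%.*}        # "3.11.4" -> "3"
  rest=${version#*.}          # "3.11.4" -> "11.4"
  minor=${rest%%[!0-9]*}      # "11.4"   -> "11"
  [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "${minor:-0}" -ge 2 ]; }
)

# major_compact KEYSPACE TABLE
# Runs the split-output major compaction only on a new enough node.
major_compact() (
  keyspace=$1
  table=$2
  # Assumes output such as "ReleaseVersion: 3.11.4".
  version=$(nodetool version | awk '/ReleaseVersion/ {print $2}')
  if supports_split_output "$version"; then
    nodetool compact --split-output -- "$keyspace" "$table"
  else
    echo "Cassandra $version predates --split-output; not compacting" >&2
    return 1
  fi
)

# Usage (replace the placeholders with real names):
#   major_compact <keyspace> <table>
```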
Also see How to split large SSTables on another server. Cheers!