Upgrade Cassandra in all environments
The version of Cassandra we are using (4.0.8) is several releases behind the latest ones:
- in the 4.0 branch: 4.0.13 (unsupported since 2024-05-20)
- in the 4.1 branch: 4.1.7
- in the 5.0 branch: 5.0.2 (GA)
It could be a good idea to test an upgrade before the switch to Cassandra in production.
We should upgrade our various testing environments first and validate that the migration goes well there, then finish with the production environment if everything is fine.
1st:
- docker environment is already in the 4.1 branch
- next-version: 4.0.1 -> 4.0.8 -> 4.0.15
- staging: 4.0.8 -> 4.0.15
- production: 4.0.8 -> 4.0.15
2nd:
- next-version: 4.0.15 -> 4.1.7
- staging: 4.0.15 -> 4.1.7
- production: 4.0.15 -> 4.1.7
3rd:
- next-version: 4.1 -> 5.0
- staging: 4.1 -> 5.0
- production: 4.1 -> 5.0
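The three phases above can also be written down as data, which makes the intended ordering explicit: within each phase, next-version is upgraded first, then staging, then production, and the next phase only starts afterwards. The following is a minimal sketch, not a deployment tool. The names `PHASE_TARGETS`, `ENVIRONMENTS` and `rollout_steps` are hypothetical, and the target versions are simply the ones quoted in this issue.

```python
from typing import Iterator, List, Tuple

# Target version of each phase, taken from the plan above
# (5.0.2 is the 5.0 patch release named elsewhere in this issue).
PHASE_TARGETS: List[str] = ["4.0.15", "4.1.7", "5.0.2"]

# Environments in the order they are upgraded within a phase.
ENVIRONMENTS: List[str] = ["next-version", "staging", "production"]


def rollout_steps() -> Iterator[Tuple[int, str, str]]:
    """Yield (phase, environment, target_version) in execution order."""
    for phase, target in enumerate(PHASE_TARGETS, start=1):
        for env in ENVIRONMENTS:
            yield phase, env, target


if __name__ == "__main__":
    for phase, env, target in rollout_steps():
        print(f"phase {phase}: upgrade {env} to {target}")
```

Iterating phase by phase (rather than environment by environment) means production never skips an intermediate version that has not already been validated on next-version and staging.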
What are the important changes:
- 4.0.8 -> 4.0.15:
- https://github.com/apache/cassandra/blob/cassandra-4.0/NEWS.txt
- https://github.com/apache/cassandra/blob/cassandra-4.0/CHANGES.txt
- CASSANDRA-18773: compaction speedup
- CASSANDRA-18736: transaction log issue on startup (we had some issues that look like this (#5297 (closed)))
- 4.0 -> 4.1:
- https://github.com/apache/cassandra/blob/cassandra-4.1/NEWS.txt
- https://github.com/apache/cassandra/blob/cassandra-4.1/CHANGES.txt
- A lot of Paxos improvements (see the upgrade procedure)
- CASSANDRA-15234: unit change. Unit suffixes should be provided for all rate (B/s|KiB/s|MiB/s), memory (B|KiB|MiB|GiB) and duration (d|h|m|s|ms|us|µs|ns) parameters.
- Some MBean methods are deprecated; we should make sure the monitoring still works as expected
- https://cassandra.apache.org/_/blog/Apache-Cassandra-4.1-New-SSTable-Identifiers.html
- 4.1 -> 5.0:
The recommended upgrade path is to first upgrade to the latest patch release of the current Major.Minor version, so in our case: 4.0.8 -> 4.0.15 -> 4.1.7 -> 5.0.2
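Among the 4.0 -> 4.1 changes listed above, the unit-suffix change (CASSANDRA-15234) lends itself to a mechanical check before the upgrade. The sketch below only validates a single value against the suffixes quoted in this issue. The helper name `has_unit_suffix` is hypothetical, and it assumes that values are written as a non-negative integer immediately followed by the unit, with no space in between.

```python
import re

# Unit suffixes listed for CASSANDRA-15234 in this issue.
UNITS = {
    "rate": ("B/s", "KiB/s", "MiB/s"),
    "memory": ("B", "KiB", "MiB", "GiB"),
    "duration": ("d", "h", "m", "s", "ms", "us", "µs", "ns"),
}


def has_unit_suffix(value: str, kind: str) -> bool:
    """Return True if value is an integer followed by a unit of the given kind.

    Assumption: no space between the number and the unit, e.g. "64MiB/s".
    """
    units = "|".join(re.escape(unit) for unit in UNITS[kind])
    return re.fullmatch(rf"\d+(?:{units})", value) is not None


if __name__ == "__main__":
    # Illustrative values only, not taken from our configuration.
    for value, kind in [("64MiB/s", "rate"), ("16MB", "memory"), ("10000", "duration")]:
        print(value, kind, has_unit_suffix(value, kind))
```

Which configuration parameters belong to which kind is not encoded here; that mapping would have to come from the 4.1 NEWS.txt and our own configuration files.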
Upgrade procedure:
- 4.0.x -> 4.0.15
- Check reaper
- 4.0 -> 4.1
- Steps for upgrading Paxos:
- Set paxos_variant: v2 across the cluster. This may be set via JMX, but should also be written persistently to any yaml.
- Ensure paxos repairs are running regularly, either as part of normal incremental repair workflows or on their own separate schedule. These operations are cheap and better to run frequently (e.g. once per hour)
- Set paxos_state_purging: repaired across the cluster. This may be set via JMX, but should also be written persistently to any yaml. NOTE: once this has been set, you must not restore paxos_state_purging: legacy. If this setting must be disabled you must instead set paxos_state_purging: gc_grace. This may be necessary if paxos repairs must be disabled for some reason on an extended basis, but in this case your applications must restore default commit consistency to ensure correctness.
- Applications may now safely be updated to use ANY commit consistency level (or LOCAL_QUORUM, as preferred). Uncontended writes should now take 2 round-trips, and uncontended reads should typically take one round-trip.
- 4.1 -> 5.0.2
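Since the Paxos steps above ask for `paxos_variant` and `paxos_state_purging` to be written persistently to the yaml on every node, a small check of the configuration text can help confirm the end state after the 4.0 -> 4.1 step. This is a minimal sketch under stated assumptions: the function names are hypothetical, the parsing is a simple line-based scan rather than a real YAML parser, and it only checks the final values, not the order in which the steps were carried out.

```python
import re
from typing import Dict, List

# End-state settings named in the Paxos upgrade steps above.
EXPECTED = {"paxos_variant": "v2", "paxos_state_purging": "repaired"}


def top_level_settings(yaml_text: str) -> Dict[str, str]:
    """Collect simple top-level `key: value` pairs from YAML text.

    Deliberately not a YAML parser: commented-out lines, indented
    (nested) lines and empty values are ignored, and quoted values
    are returned with their quotes.
    """
    settings: Dict[str, str] = {}
    for line in yaml_text.splitlines():
        match = re.match(r"^([A-Za-z_][A-Za-z0-9_]*):\s*([^#\s]+)", line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def paxos_problems(yaml_text: str) -> List[str]:
    """Return one message per expected setting that is missing or different."""
    found = top_level_settings(yaml_text)
    problems = []
    for key, wanted in EXPECTED.items():
        actual = found.get(key)
        if actual != wanted:
            problems.append(f"{key}: expected {wanted!r}, found {actual!r}")
    return problems
```

Note that the sketch also reports `paxos_state_purging: gc_grace` as a problem, although the steps above describe it as the acceptable fallback when paxos repairs have to be disabled; in that situation the reported message would need to be reviewed by hand.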
Interesting documentation: