We should upgrade to the latest version: a couple of UI improvements were implemented in 3.3.0 and 3.3.2.
Currently, old running repairs are not always displayed in the repair list because of a filter.
If 100 repairs run before the end of a long repair (mostly full repairs), the long repair disappears from the list and there is no way to tell whether it is still running or not.
The reaper user is missing some permissions needed to apply the migrations:
cassandra/reaper-78785bf47f-t56mn[reaper]: org.cognitor.cassandra.migration.MigrationException: Error during migration of script 032_add_2i_status.cql while executing 'CREATE INDEX IF NOT EXISTS state2i ON repair_run_by_cluster_v2 (repair_run_state);'
...
cassandra/reaper-78785bf47f-t56mn[reaper]: Caused by: com.datastax.driver.core.exceptions.UnauthorizedException: User reaper has no ALTER permission on <table reaper_db.repair_run_by_cluster_v2> or any of its parents
For production, adding the permission is not necessary, as security is not activated there yet.
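For staging, where authentication is enabled, a grant along these lines should unblock the migration (a sketch: the reaper role and reaper_db keyspace come from the error above; the host and credentials are placeholders):

# Grant the reaper role ALTER on its keyspace so the migration can create the index.
cqlsh -u cassandra -p "$PASS" cassandra1.internal.staging.swh.network \
  -e 'GRANT ALTER ON KEYSPACE reaper_db TO reaper;'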
Upgrade:
Snapshot of the reaper_db taken:
root@pergamon:~# clush -b -w @cassandra "/opt/cassandra/bin/nodetool -u cassandra --password $PASS snapshot reaper_db"
---------------
cassandra01
---------------
Requested creating snapshot(s) for [reaper_db] with snapshot name [1693296484854] and options {skipFlush=false}
Snapshot directory: 1693296484854
---------------
cassandra02
---------------
Requested creating snapshot(s) for [reaper_db] with snapshot name [1693296484948] and options {skipFlush=false}
Snapshot directory: 1693296484948
---------------
cassandra03
---------------
Requested creating snapshot(s) for [reaper_db] with snapshot name [1693296484809] and options {skipFlush=false}
Snapshot directory: 1693296484809
---------------
cassandra04
---------------
Requested creating snapshot(s) for [reaper_db] with snapshot name [1693296484951] and options {skipFlush=false}
Snapshot directory: 1693296484951
---------------
cassandra05
---------------
Requested creating snapshot(s) for [reaper_db] with snapshot name [1693296484852] and options {skipFlush=false}
Snapshot directory: 1693296484852
---------------
cassandra06
---------------
Requested creating snapshot(s) for [reaper_db] with snapshot name [1693296485256] and options {skipFlush=false}
Snapshot directory: 1693296485256
---------------
cassandra07
---------------
Requested creating snapshot(s) for [reaper_db] with snapshot name [1693296484959] and options {skipFlush=false}
Snapshot directory: 1693296484959
---------------
cassandra08
---------------
Requested creating snapshot(s) for [reaper_db] with snapshot name [1693296484975] and options {skipFlush=false}
Snapshot directory: 1693296484975
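Before going further, one way to confirm the snapshot landed on every node (same clush/nodetool flags as above):

# Each node should print one line per reaper_db table in the new snapshot.
clush -b -w @cassandra "/opt/cassandra/bin/nodetool -u cassandra --password $PASS listsnapshots | grep reaper_db"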
Reaper upgraded by argo
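(For the record, re-triggering the deployment by hand would look something like the following; assuming Argo CD and an application named reaper, both hypothetical:)

# Hypothetical Argo CD application name.
argocd app sync reaper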
Still OK.
The old (and stuck) repairs are now present in the UI:
Looks like the upgrade also unblocked the scheduling: no segments had been repaired on the repair from August 14th, and a segment is now being repaired. Cool.
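As a quick cross-check outside the UI, running repairs can also be listed through Reaper's REST API. A sketch, with assumptions flagged: the URL is a placeholder, and the /repair_run endpoint, its state parameter, and the segments_repaired/total_segments fields should be verified against the deployed version:

# Hypothetical URL; endpoint and field names to be checked against the deployed Reaper.
curl -s "https://reaper.example.org/repair_run?state=RUNNING" \
  | jq '.[] | {id, cluster_name, segments_repaired, total_segments}'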
Let's remove the snapshots as everything looks OK:
staging:
root@pergamon:~# clush -b -w @staging-cassandra "/opt/cassandra/bin/nodetool -u cassandra --password $PASS clearsnapshot --all"
---------------
cassandra[1-3].internal.staging.swh.network (3)
---------------
Requested clearing snapshot(s) for [all keyspaces] with [all snapshots]
root@pergamon:~# clush -b -w @staging-cassandra "/opt/cassandra/bin/nodetool -u cassandra --password $PASS listsnapshots"
---------------
cassandra[1-3].internal.staging.swh.network (3)
---------------
Snapshot Details:
There are no snapshots
production:
root@pergamon:~# clush -b -w @cassandra "/opt/cassandra/bin/nodetool -u cassandra --password $PASS clearsnapshot --all"
---------------
cassandra[01-08] (8)
---------------
Requested clearing snapshot(s) for [all keyspaces] with [all snapshots]
root@pergamon:~# clush -b -w @cassandra "/opt/cassandra/bin/nodetool -u cassandra --password $PASS listsnapshots"
---------------
cassandra[01-08] (8)
---------------
Snapshot Details:
There are no snapshots
(Snapshot removal is asynchronous; it can take some time before listsnapshots comes back empty.)
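A small loop can wait for that instead of re-running listsnapshots by hand (a sketch reusing the flags above; the 30s interval is arbitrary):

# Poll until no node lists a reaper_db snapshot anymore.
while clush -b -w @cassandra "/opt/cassandra/bin/nodetool -u cassandra --password $PASS listsnapshots" | grep -q reaper_db; do
    sleep 30
done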