Validate data replayed in winery
We want to be able to validate the data replayed in winery so far.
The checker must take into account that not all the content has been replayed yet, and must not generate false positives if the check catches up with the replayers.
Activity
- Vincent Sellier added platform::CEA label
- Vincent Sellier assigned to @vsellier
- Vincent Sellier mentioned in commit swh/devel/snippets@f00cfc1b
- Author Owner
- extract the replayer current positions per partition:
/opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server $SERVER --describe --group swh-archive-prod-winery-content-replayer | grep -v PARTITION | awk '{print $3,$4}' | sort -n
0 3857735
1 5088832
2 4253136
3 4071316
4 3694360
...
252 4651455
253 5356356
254 5669851
255 4231367
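The positions listed above can be used to bound the check so that it never flags an object the replayer has not reached yet. The following is a minimal sketch, not the actual checker script: `parse_positions` and `already_replayed` are hypothetical helper names, and it assumes the second column is the committed offset of the consumer group (the position of the next message to read) and that offsets are committed only after the object has been written.

```python
def parse_positions(text: str) -> dict[int, int]:
    """Parse 'partition offset' lines into {partition: offset}."""
    positions = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the elided "..." line and anything non-numeric.
        if len(fields) != 2 or not all(f.isdigit() for f in fields):
            continue
        positions[int(fields[0])] = int(fields[1])
    return positions


def already_replayed(partition: int, offset: int, positions: dict[int, int]) -> bool:
    """True if the replayer has committed past this message offset."""
    # Assumption: a committed offset is the position of the next message to
    # read, so only strictly smaller offsets have been consumed.
    # Unknown partitions are treated as not replayed at all.
    return offset < positions.get(partition, 0)
```

A checker would stop reading a partition (or skip the message) as soon as `already_replayed` returns false, instead of reporting the object as missing.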
As a first try to check the behavior, the script was launched from the production kubernetes cluster. It seems to take ~2 days to check a partition:
partition=0 object_checked=25212 not_found=0 incorrect_hash=0 to_check=3288889 time_to_completion=1 day, 22:00:00.589842 (24.97/s)
partition=0 object_checked=25229 not_found=0 incorrect_hash=0 to_check=3288872 time_to_completion=1 day, 22:00:18.669682 (16.98/s)
partition=0 object_checked=25256 not_found=0 incorrect_hash=0 to_check=3288845 time_to_completion=1 day, 21:59:30.632848 (26.97/s)
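As a rough sanity check of these figures, the remaining time is the number of objects left divided by a throughput. The logged estimates are consistent with a rate of about 19.9 objects/s rather than with the instantaneous rate shown in parentheses, which suggests (an assumption, not confirmed by the script) that the estimate uses an average rate. The function name below is hypothetical.

```python
from datetime import timedelta


def time_to_completion(to_check: int, rate: float) -> timedelta:
    """Estimate the remaining time from objects left and objects per second."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return timedelta(seconds=to_check / rate)


# Objects left in the first log line, at an assumed average of 19.86 objects/s.
print(time_to_completion(3288889, 19.86))
```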
The script still needs to be updated to save the swhid of the objects in error.
This first attempt went through haproxy, which is certainly not the optimal way to do it.
The script could be split in two parts: a first part serializing the ids to check into a file, which can be launched at rocquencourt, and a second part reusing that list of ids to query winery directly on the cea side.
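The two-part split described above could look like the sketch below. It is illustrative only and is not the script from the snippets repository: `dump_ids`, `check_ids` and the `fetch` callable are hypothetical names, and it assumes that ids are hex SHA-1 digests of the object content and that `fetch` returns `None` for a missing object. The ids of failing objects are kept in the report so they can be saved afterwards.

```python
import hashlib
from pathlib import Path
from typing import Callable, Iterable, Optional


def dump_ids(ids: Iterable[str], path: Path) -> int:
    """Part 1: write one hex id per line; returns the number of ids written."""
    count = 0
    with path.open("w") as f:
        for obj_id in ids:
            f.write(obj_id + "\n")
            count += 1
    return count


def check_ids(path: Path, fetch: Callable[[str], Optional[bytes]]) -> dict:
    """Part 2: re-read the ids and verify each object obtained through `fetch`."""
    report = {"object_checked": 0, "not_found": [], "incorrect_hash": []}
    with path.open() as f:
        for line in f:
            obj_id = line.strip()
            if not obj_id:
                continue
            report["object_checked"] += 1
            data = fetch(obj_id)
            if data is None:
                report["not_found"].append(obj_id)
            # Assumption: the id is the hex SHA-1 of the content.
            elif hashlib.sha1(data).hexdigest() != obj_id:
                report["incorrect_hash"].append(obj_id)
    return report
```

In this layout the first part only needs access to the list of ids, while `fetch` in the second part would wrap whatever client talks to winery directly.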
- Vincent Sellier mentioned in commit swh/devel/snippets@343b10c4
- Vincent Sellier mentioned in commit swh/devel/snippets@7df8c5e5
- Vincent Sellier mentioned in commit swh/devel/snippets@ac4a0270
- Vincent Sellier mentioned in commit swh/devel/snippets@c7b5cb89
- Vincent Sellier mentioned in commit swh/devel/snippets@c3360afa
- Vincent Sellier mentioned in commit swh/devel/snippets@4e41eba1
- Vincent Sellier mentioned in commit swh/devel/snippets@b247e282
- Vincent Sellier mentioned in commit swh/devel/snippets@ed2afc85
- Vincent Sellier mentioned in commit swh/devel/snippets@043c6ad4
- Vincent Sellier mentioned in commit swh/devel/snippets@fbfec8e8
- Vincent Sellier mentioned in commit swh/devel/snippets@004e622d
- Vincent Sellier mentioned in commit swh/devel/snippets@69f9e4b2
- Vincent Sellier mentioned in commit swh/infra/ci-cd/swh-charts@b20b148d
- Author Owner
No alerts were raised by the tests. The objects found missing by the tests were replayed by resetting the replayer offsets.
Future runtime checks will be handled by the object storage checker.
- Vincent Sellier closed
- Vincent Sellier mentioned in commit swh/devel/snippets@c862c883
- Vincent Sellier mentioned in commit swh/devel/snippets@6464313c
- Vincent Sellier mentioned in commit swh/devel/snippets@6194e5b1