Deploy swh-search v0.6.0 in **staging**
This new version comes with a mapping change on the metadata, so a few actions are required:
- Stop the journal clients and swh-search
- upgrade the packages
- delete the origin index
- Recreate the index with the new mapping
- Restart swh-search service
- Copy the backup of the index done in #2780 (closed)
- Restore the swh.search.journal_client consumer group offsets to migrated/migration$941
- Reset the swh.search.journal_client.indexed consumer group offsets to the beginning
- restart the service and the journal_client
- wait for the backfill completion
Migrated from T3060 (view on Phabricator)
Activity
- Phabricator Migration user marked this issue as related to swh/devel/swh-search#3058 (closed)
- Phabricator Migration user marked this issue as related to swh/meta#2590 (closed)
- Vincent Sellier added Archive search System administration ~119 state:wip labels
- Vincent Sellier changed the description
- Author Owner
stop the journal clients and swh-search
root@search0:~# puppet agent --disable "swh-search upgrade"
root@search0:~# systemctl stop swh-search-journal-client@objects.service
root@search0:~# systemctl stop swh-search-journal-client@indexed.service
root@search0:~# systemctl stop gunicorn-swh-search.service
update the packages
root@search0:~# apt update && apt list --upgradable
...
python3-swh.search/unknown 0.6.0-1~swh1~bpo10+1 all [upgradable from: 0.5.0-1~swh1~bpo10+1]
...
root@search0:~# apt dist-upgrade
...
Preparing to unpack .../python3-swh.search_0.6.0-1~swh1~bpo10+1_all.deb ...
Unpacking python3-swh.search (0.6.0-1~swh1~bpo10+1) over (0.5.0-1~swh1~bpo10+1) ...
Setting up python3-swh.search (0.6.0-1~swh1~bpo10+1) ...
- Vincent Sellier marked the checklist item Stop the journal clients and swh-search as completed
- Vincent Sellier marked the checklist item upgrade the packages as completed
- Author Owner
delete current index
- Make a backup first
- index
vsellier@search-esnode0 ~ % export NEW_INDEX=origin-v0.5.0
vsellier@search-esnode0 ~ % curl -XPUT http://$ES_SERVER/${NEW_INDEX}
{"acknowledged":true,"shards_acknowledged":true,"index":"origin-v0.5.0"}
vsellier@search-esnode0 ~ % curl http://${ES_SERVER}/origin/_mapping\?pretty | jq '.origin.mappings' > /tmp/mapping.json
vsellier@search-esnode0 ~ % curl -XPUT -H "Content-Type: application/json" http://${ES_SERVER}/${NEW_INDEX}/_mapping -d @/tmp/mapping.json
{"acknowledged":true}%
vsellier@search-esnode0 ~ % cat >reindex-origin.json <<EOF
{
  "source": {
    "index": "origin"
  },
  "dest": {
    "index": "${NEW_INDEX}"
  }
}
EOF
vsellier@search-esnode0 ~ % curl -XPOST -H "Content-Type: application/json" http://${ES_SERVER}/_reindex\?pretty\&refresh=true\&requests_per_second=-1\&\&wait_for_completion=true -d @reindex-origin.json
{
  "took" : 246426,
  "timed_out" : false,
  "total" : 503339,
  "updated" : 0,
  "created" : 503339,
  "deleted" : 0,
  "batches" : 504,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}
vsellier@search-esnode0 ~ % curl -s http://${ES_SERVER}/_cat/indices\?v
health status index                       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin                      xBl67YKsQbWAt7V78UeDLA  80   0     868121        82710        1gb            1gb
green  open   origin-backup-20210209-1736 P1CKjXW0QiWM5zlzX46-fg  80   0     496619            0    156.6mb        156.6mb
green  open   origin-v0.5.0               SGplSaqPR_O9cPYU4ZsmdQ  80   0     868121            0    987.7mb        987.7mb
- kafka offsets
vsellier@journal0 ~ % /opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server $SERVER --reset-offsets --all-topics --to-current --dry-run --export --group swh.search.journal_client 2>&1 > journal_client_offsets.csv
Values stored on migrated/migration$952
- delete the index
vsellier@search-esnode0 ~ % curl -s -XDELETE http://${ES_SERVER}/origin
{"acknowledged":true}%
Recreate the index with the new mapping
swhstorage@search0:~$ swh search --config-file=/etc/softwareheritage/search/journal_client_objects.yml initialize
INFO:elasticsearch:PUT http://search-esnode0.internal.staging.swh.network:9200/origin [status:200 request:3.136s]
INFO:elasticsearch:PUT http://search-esnode0.internal.staging.swh.network:9200/origin/_mapping [status:200 request:0.036s]
Done.
vsellier@search-esnode0 ~ % curl -s -H "Content-Type: application/json" http://${ES_SERVER}/origin/_mapping\?pretty | grep date
"date_detection" : false,
restart swh-search service
root@search0:~# systemctl start gunicorn-swh-search.service
- Vincent Sellier marked the checklist item delete the origin index as completed
- Vincent Sellier marked the checklist item Recreate the index with the new mapping as completed
- Vincent Sellier marked the checklist item Restart swh-search service as completed
- Author Owner
Copy the backup of the index done in #2780 (closed)
vsellier@search-esnode0 ~ % cat >reindex-origin.json <<EOF
{
  "source": {
    "index": "origin-backup-20210209-1736"
  },
  "dest": {
    "index": "origin"
  }
}
EOF
vsellier@search-esnode0 ~ % curl -XPOST -H "Content-Type: application/json" http://${ES_SERVER}/_reindex\?pretty\&refresh=true\&requests_per_second=-1\&\&wait_for_completion=true -d @reindex-origin.json
{
  "took" : 134042,
  "timed_out" : false,
  "total" : 496619,
  "updated" : 0,
  "created" : 496619,
  "deleted" : 0,
  "batches" : 497,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}
vsellier@search-esnode0 ~ % curl -s http://${ES_SERVER}/_cat/indices\?v
health status index                       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin                      HthJj42xT5uO7w3Aoxzppw  80   0     496619            0    329.5mb        329.5mb
green  open   origin-backup-20210209-1736 P1CKjXW0QiWM5zlzX46-fg  80   0     496619            0    156.6mb        156.6mb
green  open   origin-v0.5.0               SGplSaqPR_O9cPYU4ZsmdQ  80   0     868121            0    987.7mb        987.7mb
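Both reindex operations in this task (the backup to origin-v0.5.0 and the restore from origin-backup-20210209-1736) send the same request shape to the `_reindex` endpoint. As a minimal sketch, assuming one wanted to script it, the URL and JSON body could be built in Python as below. The function name `build_reindex_request` is hypothetical, and the code only builds the request without contacting any server.

```python
import json
from typing import Tuple
from urllib.parse import urlencode


def build_reindex_request(es_server: str, source: str, dest: str) -> Tuple[str, str]:
    """Return the (url, json_body) pair for an Elasticsearch _reindex call.

    The query parameters mirror the curl command used in this task:
    refresh the destination, do not throttle, and block until completion.
    """
    params = urlencode(
        {
            "refresh": "true",
            "requests_per_second": "-1",
            "wait_for_completion": "true",
        }
    )
    url = f"http://{es_server}/_reindex?{params}"
    body = json.dumps({"source": {"index": source}, "dest": {"index": dest}})
    return url, body


if __name__ == "__main__":
    # Hypothetical server address, for illustration only.
    url, body = build_reindex_request(
        "localhost:9200", "origin-backup-20210209-1736", "origin"
    )
    print(url)
    print(body)
```

The returned pair can be passed to any HTTP client as a POST with a `Content-Type: application/json` header.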
Restore the swh.search.journal_client consumer group offsets to migrated/migration$941
vsellier@journal0 ~ % /opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server $SERVER --reset-offsets --all-topics --from-file offsets.csv --group swh.search.journal_client --execute
GROUP                     TOPIC                                   PARTITION NEW-OFFSET
swh.search.journal_client swh.journal.objects.origin_visit_status 26        335718
swh.search.journal_client swh.journal.objects.origin_visit_status 12        336502
swh.search.journal_client swh.journal.objects.origin_visit_status 35        335346
swh.search.journal_client swh.journal.objects.origin              54        8082
swh.search.journal_client swh.journal.objects.origin_visit        55        169851
...
Reset the swh.search.journal_client.indexed consumer group offsets to the beginning
vsellier@journal0 ~ % /opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server $SERVER --reset-offsets --all-topics --to-earliest --group swh.search.journal_client.indexed --execute
GROUP                             TOPIC                                         PARTITION NEW-OFFSET
swh.search.journal_client.indexed swh.journal.indexed.origin_intrinsic_metadata 0         13598025
vsellier@journal0 ~ % /opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server $SERVER --describe --group swh.search.journal_client.indexed
Consumer group 'swh.search.journal_client.indexed' has no active members.

GROUP                             TOPIC                                         PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG     CONSUMER-ID HOST CLIENT-ID
swh.search.journal_client.indexed swh.journal.indexed.origin_intrinsic_metadata 0         13598025       15051694       1453669 -           -    -
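The LAG column in the `--describe` output is the difference between LOG-END-OFFSET and CURRENT-OFFSET, and it is the number to watch while waiting for the backfill. As a rough sketch, assuming the whitespace-separated column layout shown above, a small Python helper (the name `parse_consumer_lag` is hypothetical) could recompute the lag per partition from captured output:

```python
from typing import Dict, Tuple


def parse_consumer_lag(describe_output: str) -> Dict[Tuple[str, int], int]:
    """Map (topic, partition) to lag, from `kafka-consumer-groups.sh --describe` text.

    Assumes the column order GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET ...
    """
    lags: Dict[Tuple[str, int], int] = {}
    for line in describe_output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row and anything too short to be a data row.
        if len(fields) < 6 or fields[0] == "GROUP":
            continue
        topic, partition, current, log_end = fields[1:5]
        # Informational messages and partitions without a committed offset ("-")
        # do not have numeric values in these columns, so they are ignored.
        if not (partition.isdigit() and current.isdigit() and log_end.isdigit()):
            continue
        lags[(topic, int(partition))] = int(log_end) - int(current)
    return lags


if __name__ == "__main__":
    sample = (
        "GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID\n"
        "swh.search.journal_client.indexed swh.journal.indexed.origin_intrinsic_metadata"
        " 0 13598025 15051694 1453669 - - -\n"
    )
    for key, lag in parse_consumer_lag(sample).items():
        print(key, lag)
```

Summing the returned values gives the total number of messages left to consume for the group.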
restart the service and the journal_client
root@search0:~# systemctl start swh-search-journal-client@objects.service
root@search0:~# systemctl start swh-search-journal-client@indexed.service
- Vincent Sellier marked the checklist item Copy the backup of the index done in #2780 (closed) as completed
- Vincent Sellier marked the checklist item Restore the swh.search.journal_client consumer group offsets to migrated/migration$941 as completed
- Vincent Sellier marked the checklist item Reset the swh.search.journal_client.indexed consumer group offsets to the beginning as completed
- Vincent Sellier marked the checklist item restart the service and the journal_client as completed
- Vincent Sellier marked the checklist item wait for the backfill completion as completed
- Author Owner
The journal clients have caught up, so the index is up to date. A few points to check before closing:
- The index size looks huge (~10 GB) compared to before the deployment
- Some documents seem to have no origin_visit_type populated, although they should:
swh=> select * from origin where url='deb://Debian/packages/node-response-time';
  id   |                   url
-------+------------------------------------------
 15552 | deb://Debian/packages/node-response-time
(1 row)

swh=> select * from origin_visit where origin=15552 limit 1;
 origin | visit |             date              | type
--------+-------+-------------------------------+------
  15552 |     1 | 2020-11-03 06:16:19.962182+00 | deb

{
  "_index": "origin",
  "_type": "_doc",
  "_id": "17e7984da6467e6b56e7c7caff01821a8143bb58",
  "_version": 1,
  "_seq_no": 1783,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "url": "deb://Debian/packages/node-response-time",
    "sha1": "17e7984da6467e6b56e7c7caff01821a8143bb58",
    "has_visits": true
  }
}
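The check above was done by hand on one origin. A minimal sketch of the same check in Python, assuming the document has already been fetched as JSON, is shown below. The helper name `missing_source_fields` is hypothetical, and the field name `visit_type` is an assumption taken from the wording of this task rather than from the actual swh-search mapping.

```python
from typing import Iterable, List


def missing_source_fields(es_doc: dict, expected: Iterable[str]) -> List[str]:
    """Return the expected field names that are absent from the document's _source."""
    source = es_doc.get("_source", {})
    return [field for field in expected if field not in source]


if __name__ == "__main__":
    # Document as returned by Elasticsearch for the Debian origin shown above.
    doc = {
        "_index": "origin",
        "_id": "17e7984da6467e6b56e7c7caff01821a8143bb58",
        "found": True,
        "_source": {
            "url": "deb://Debian/packages/node-response-time",
            "sha1": "17e7984da6467e6b56e7c7caff01821a8143bb58",
            "has_visits": True,
        },
    }
    # "visit_type" is an assumed field name, used only for illustration.
    print(missing_source_fields(doc, ["url", "has_visits", "visit_type"]))
```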
- Author Owner
Regarding the index size, it seems to be due to a huge number of deleted documents, probably because the documents were updated at each change while the backlog was being consumed.
% curl -s http://${ES_SERVER}/_cat/indices\?v
health status index                       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin                      HthJj42xT5uO7w3Aoxzppw  80   0     868634      8577610     10.5gb         10.5gb
green  close  origin-backup-20210209-1736 P1CKjXW0QiWM5zlzX46-fg  80   0
green  open   origin-v0.5.0               SGplSaqPR_O9cPYU4ZsmdQ  80   0     868121            0    987.7mb        987.7mb
green  open   origin-toremove             PL7WEs3FTJSQy4dgGIwpeQ  80   0     868610            0    987.5mb        987.5mb <-- A clean copy of the origin index has almost the same size as yesterday
Forcing a merge seems to restore a decent size:
% curl -XPOST -H "Content-Type: application/json" http://${ES_SERVER}/origin/_forcemerge
{"_shards":{"total":80,"successful":80,"failed":0}}%
% curl -s http://${ES_SERVER}/_cat/indices\?v
health status index                       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin                      HthJj42xT5uO7w3Aoxzppw  80   0     868684         3454        1gb            1gb
green  close  origin-backup-20210209-1736 P1CKjXW0QiWM5zlzX46-fg  80   0
green  open   origin-v0.5.0               SGplSaqPR_O9cPYU4ZsmdQ  80   0     868121            0    987.7mb        987.7mb
green  open   origin-toremove             PL7WEs3FTJSQy4dgGIwpeQ  80   0     868610            0    987.5mb        987.5mb
This is probably something to schedule regularly on the production index if size matters.
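If a force merge were to be scheduled, one possible trigger is the share of deleted documents per index, which can be read from the same `_cat/indices?v` output. The following Python sketch parses captured output and computes that ratio. The function name `deleted_ratio_by_index` is hypothetical, and any threshold applied to the result is a choice left to the operator, not something defined in this task.

```python
from typing import Dict


def deleted_ratio_by_index(cat_indices_output: str) -> Dict[str, float]:
    """Return deleted / (live + deleted) per open index, from `_cat/indices?v` text."""
    lines = [line for line in cat_indices_output.splitlines() if line.strip()]
    if not lines:
        return {}
    header = lines[0].split()
    ratios: Dict[str, float] = {}
    for line in lines[1:]:
        fields = line.split()
        # Closed indices report no document counts, so their rows are shorter.
        if len(fields) < len(header):
            continue
        row = dict(zip(header, fields))
        live = int(row["docs.count"])
        deleted = int(row["docs.deleted"])
        total = live + deleted
        ratios[row["index"]] = deleted / total if total else 0.0
    return ratios


if __name__ == "__main__":
    sample = (
        "health status index uuid pri rep docs.count docs.deleted store.size pri.store.size\n"
        "green open origin HthJj42xT5uO7w3Aoxzppw 80 0 868634 8577610 10.5gb 10.5gb\n"
        "green close origin-backup-20210209-1736 P1CKjXW0QiWM5zlzX46-fg 80 0\n"
    )
    for index, ratio in deleted_ratio_by_index(sample).items():
        print(f"{index}: {ratio:.1%} deleted")
```

With the figures above, the origin index held far more deleted than live documents before the merge, which matches the jump from roughly 1 GB to 10.5 GB.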
- Author Owner
Regarding the missing visit_type, one of the topics carrying the visit_type needs to be consumed again to populate the field for all the origins. As the index was restored from the backup, the field was only set for the visits done in the last 15 days. Only the offsets of the origin_visit topic will be reset, to limit the work.
- Author Owner
- stop the journal client
root@search0:~# systemctl stop swh-search-journal-client@objects.service
root@search0:~# puppet agent --disable "stop search journal client to reset offsets"
- reset the offsets for the swh.journal.objects.origin_visit topic:
vsellier@journal0 ~ % /opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server $SERVER --reset-offsets --topic swh.journal.objects.origin_visit --to-earliest --group swh.search.journal_client --execute
GROUP                     TOPIC                            PARTITION NEW-OFFSET
swh.search.journal_client swh.journal.objects.origin_visit 16        0
swh.search.journal_client swh.journal.objects.origin_visit 10        0
...
- restart the journal client
root@search0:~# puppet agent --enable
root@search0:~# systemctl start swh-search-journal-client@objects.service
The backlog recovery is in progress.
- Owner
Comment discarded (unrelated to this task) and reported in a dedicated task [1]
- [1] #3067 (closed)
- Author Owner
The backfill is done, and the search on metadata seems to work correctly.
The index statistics:
% curl -s http://${ES_SERVER}/_cat/indices\?v
health status index  uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin HthJj42xT5uO7w3Aoxzppw  80   0     907324      1972532        8gb            8gb
- Vincent Sellier removed state:wip label
- Vincent Sellier closed
- Antoine R. Dumont mentioned in issue #3067 (closed)
- Antoine R. Dumont mentioned in issue #4397 (closed)