Is this supposed to be persistent (and keep the full history of all messages), or transient (and used for "real-time" clients)? IOW, what are the storage requirements for this?
Where is the list of topics that need to be created?
I think we should definitely use a different prefix than swh.storage's, as the ACLs for third parties should be separate.
It's unclear what the prefix should be. swh.storage uses swh.journal.objects; we can either use that one too, or a new one, e.g. swh.journal.indexed.
So, heads up: the topic prefix swh.journal.indexed has been chosen and declared in the current staging diff infra/puppet/puppet-swh-site!267.
In #2780 (closed), @olasd wrote:
Is this supposed to be persistent (and keep the full history of all messages), or transient (and used for "real-time" clients)? IOW, what are the storage requirements for this?
I'd say transient, as we can always recompute it. But this means backfilling the journal every time we add a new client that needs to get all the messages, so I don't know.
Where is the list of topics that need to be created?
I propose meeting in the middle and having the following policies:
content topics: transient, bound by volume
revision / origin topics: persistent
I expect the content topics to be the most "volatile" and heavy, and the revision / origin topics to be the most useful to keep in the long term for third party clients.
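As a minimal sketch of what those per-topic policies could look like with the Kafka CLI (the topic names and the size bound are placeholders, and the actual values would rather be managed through puppet):
# content topics: plain deletion, bounded by volume (placeholder value)
/opt/kafka/bin/kafka-configs.sh --bootstrap-server $SERVER --alter \
  --entity-type topics --entity-name swh.journal.indexed.content_mimetype \
  --add-config cleanup.policy=delete,retention.bytes=107374182400
# revision / origin topics: log compaction, kept in the long term
/opt/kafka/bin/kafka-configs.sh --bootstrap-server $SERVER --alter \
  --entity-type topics --entity-name swh.journal.indexed.origin_intrinsic_metadata \
  --add-config cleanup.policy=compact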
Back up the offsets as well, just in case (migrated/migration$941). Puppet will then restart swh-search-journal-client@objects...
(so if something goes wrong down the line, we can reset those alongside the backup index we'd restore).
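A minimal sketch of that offsets backup, assuming the journal client's consumer group is named swh.search.journal_client (an assumption; the actual group id should be checked in the service configuration):
# dump the current offsets (and lag) of the consumer group to a file, just in case
/opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server $SERVER \
  --describe --group swh.search.journal_client > offsets-backup-$(date +%Y%m%d).txt
If a reset is needed later, kafka-consumer-groups.sh --reset-offsets (e.g. with --to-earliest or --to-offset) can bring the group back to a known position.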
Now, after landing and deploying the diff above, apply it and check that everything runs fine:
Snapshot of the current indices' status:
ardumont@search0:~% curl http://search-esnode0.internal.staging.swh.network:9200/_cat/indices\?v
health status index                       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin                      xBl67YKsQbWAt7V78UeDLA  80   0     496619         5145    348.7mb        348.7mb
green  open   origin-backup-20210209-1736 P1CKjXW0QiWM5zlzX46-fg  80   0     496619            0    156.6mb        156.6mb
After deployment, everything is running fine.
BUT the index is growing quite large and fast...
health status index  uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin xBl67YKsQbWAt7V78UeDLA  80   0     622296        54024        1gb            1gb
Note: the consumer group's lag is subsiding (as expected).
Note: regarding the partitioning (only 1 partition here), we'll need to create the consumer group first-hand to get a better partition configuration for production.
And the index size stabilized at 1GB (up from an initial 156.6MB).
health status index                       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin                      xBl67YKsQbWAt7V78UeDLA  80   0     786803        85285        1gb            1gb
green  open   origin-backup-20210209-1736 P1CKjXW0QiWM5zlzX46-fg  80   0     496619            0    156.6mb        156.6mb
Note that the "docs.count" grew though (from 496619 to 786803) and the reason are
unclear.
The same index is used to store the metadata out of the indexer with the same origin url
as key [1] and we are computing index metadata on origins already seen (thus already present
in the index afaiui). So I would have expect the docs.count stay roughly (or even
exactly?) the same as before?
[1] well the sha1 of the origin computed by search but still
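One possible explanation (not confirmed): _cat/indices reports Lucene-level document counts, which also include nested documents, so if some of the intrinsic-metadata fields are mapped as nested, updating existing origin documents would still inflate docs.count. The number of top-level documents can be cross-checked with the _count API, e.g.:
curl http://search-esnode0.internal.staging.swh.network:9200/origin/_count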
Note that the "docs.count" grew though (from 496619 to 786803) and the reason are
unclear.
The same index is used to store the metadata out of the indexer with the same origin url
as key [1] and we are computing index metadata on origins already seen (thus already present
in the index afaiui). So I would have expect the docs.count stay roughly (or even
exactly?) the same as before?
root@kafka1:~# for topic in content_mimetype content_language content_ctags content_fossology_license content_metadata revision_intrinsic_metadata origin_intrinsic_metadata; do
>   /opt/kafka/bin/kafka-topics.sh --bootstrap-server $SERVER --create --config cleanup.policy=compact --partitions 256 --replication-factor 2 --topic "swh.journal.indexed.$topic"
> done
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic swh.journal.indexed.content_mimetype.
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic swh.journal.indexed.content_language.
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic swh.journal.indexed.content_ctags.
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic swh.journal.indexed.content_fossology_license.
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic swh.journal.indexed.content_metadata.
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic swh.journal.indexed.revision_intrinsic_metadata.
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic swh.journal.indexed.origin_intrinsic_metadata.
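To double-check the resulting configuration (partition count, replication factor, cleanup policy), the newly created topics can be described, e.g.:
/opt/kafka/bin/kafka-topics.sh --bootstrap-server $SERVER --describe \
  --topic swh.journal.indexed.origin_intrinsic_metadata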
Only noticing now that we have only one indexer currently running in staging
(so only one topic is currently being written to there).
So some more indexers were deployed there to check that the journal is holding up OK (it is [1]).