- Nov 29, 2024
-
-
Nicolas Dandrimont authored
This communication thread is in charge of pulling the messages from kafka and handing them off to a processing thread, as well as doing regular polling of the rdkafka client (which in turn notifies the brokers that the consumer is still alive). Doing this allows the kafka communication thread to pause the kafka consumption explicitly when processing a batch of messages takes too long. This can in turn avoid a lot of rebalance traffic on the kafka brokers, and overall avoids a bunch of internal rdkafka timeouts.
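The hand-off described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the swh.journal implementation: `FakeConsumer`, `communication_loop` and `run_demo` are hypothetical names, and the fake's `pause()` and `resume()` take no arguments, unlike a real Kafka client.

```python
import queue
import threading
import time


class FakeConsumer:
    """Hypothetical stand-in for a Kafka consumer; not the rdkafka client API."""

    def __init__(self, messages):
        self._messages = list(messages)
        self.paused = False
        self.poll_count = 0

    def poll(self, timeout):
        # Called on every loop iteration, paused or not, as a liveness signal.
        self.poll_count += 1
        if self.paused or not self._messages:
            time.sleep(timeout)
            return None
        return self._messages.pop(0)

    def pause(self):
        self.paused = True

    def resume(self):
        self.paused = False


def communication_loop(consumer, handoff, stop, poll_timeout=0.001):
    """Poll continuously; pause fetching while the processing side is full."""
    pending = []
    while not stop.is_set():
        msg = consumer.poll(poll_timeout)
        if msg is not None:
            pending.append(msg)
        while pending:
            try:
                handoff.put_nowait(pending[0])
            except queue.Full:
                consumer.pause()  # processing is behind: stop fetching
                break
            pending.pop(0)
        else:
            consumer.resume()  # nothing left to hand off: fetch again


def run_demo(n_messages=5):
    """Run the loop in a thread and process messages slowly in the caller."""
    consumer = FakeConsumer(range(n_messages))
    handoff = queue.Queue(maxsize=1)
    stop = threading.Event()
    thread = threading.Thread(
        target=communication_loop, args=(consumer, handoff, stop)
    )
    thread.start()
    processed = []
    try:
        for _ in range(n_messages):
            processed.append(handoff.get(timeout=5))
            time.sleep(0.01)  # simulate slow processing
    finally:
        stop.set()
        thread.join()
    return processed, consumer.poll_count
```

The point of the design is that `poll()` keeps being called while fetching is paused, so slow processing does not look like a dead consumer.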
-
- Nov 22, 2024
-
-
David Douard authored
-
- Nov 21, 2024
-
-
David Douard authored
Commit ac1a44f6 introduced a hard dependency on swh.model for any usage of the kafka writer. But there are other usages of the kafka journal than swh.storage.
-
- Aug 30, 2024
-
-
Antoine Lambert authored
-
Antoine Lambert authored
-
- Aug 27, 2024
-
-
David Douard authored
-
- Jul 22, 2024
-
-
Antoine Lambert authored
Fix broken tests due to the introduction of the ModelObjectType enum.
-
- Jun 03, 2024
-
-
Vincent Sellier authored
This reverts commit e625c64c, as the consumers now get stuck in an infinite loop after a partition rebalance. This commit simply reverts the change because the on_eof option is barely used by any consumer in production yet (only by the journal scrubber). The proper way to implement the change will be investigated later. Related to swh/devel/swh-journal#4659
-
- May 17, 2024
-
-
Pierre-Yves David authored
It is no longer possible to instantiate such classes, so these tests are no longer needed (nor are they working).
-
- Apr 24, 2024
-
-
Vincent Sellier authored
When a consumer group (CG) is rebalanced, no messages are delivered to the consumer until the group stabilizes, so a timeout occurs in the consume() method. The resulting empty list is interpreted as having reached the end of the partition. As a result, all the consumers of the group stop at the same time if the rebalancing takes more than 10s. The fix changes the behavior to wait until a list of partitions is assigned to the consumer before testing whether the end is reached.
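The check described here can be sketched as a small pure function. This is only an illustration of the idea, assuming the caller already has the batch returned by consume() and the consumer's current partition assignment; `end_reached` is a hypothetical name, not a function from swh.journal.

```python
def end_reached(messages, assigned_partitions):
    """Return True only when an empty batch can really mean end of partition."""
    if messages:
        return False
    # During a rebalance nothing is assigned yet, so an empty batch says nothing.
    return len(assigned_partitions) > 0
```

With an empty assignment the function keeps returning False, so a slow rebalance is not mistaken for the end of the partition.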
-
- Mar 29, 2024
-
-
David Douard authored
-
- Feb 06, 2024
-
-
Antoine Lambert authored
Related to swh/meta#5075.
-
- Dec 05, 2023
-
-
David Douard authored
-
- Dec 04, 2023
-
-
David Douard authored
-
- Dec 03, 2023
-
-
David Douard authored
-
- Nov 28, 2023
-
-
David Douard authored
-
- Nov 20, 2023
-
-
David Douard authored
Convert README from markdown to ReST to make it embeddable in docs/index.rst
-
- Nov 16, 2023
-
-
Jérémy Bobbio (Lunar) authored
“Deleting” an event in Kafka is a two-step process. First, a new event is added for the key to be deleted with `null` as its value. Such events are known as tombstones. When topics are configured to use compaction, older events will actually be deleted after specific thresholds have been reached.

Tombstones themselves usually also linger for a while in a topic. This gives a chance for consumers to learn that a given key has been deleted. This is configured by `delete.retention.ms`.

For Software Heritage, we should still not rely on consumers of the journal actually seeing these tombstones to handle object deletions. If they lag too much, the tombstone will eventually be removed (together with the actual data) from the journal. This shall be handled by #4658 instead.

Normally, compaction will be triggered when the ratio of dirty data to total data reaches the threshold set by the `min.cleanable.dirty.ratio` configuration. `min.compaction.lag.ms` can be set to prevent overly aggressive cleaning. This provides a minimum period of time for applications to see an event prior to its deletion. `max.compaction.lag.ms` sets the time limit before triggering a compaction, regardless of the amount of dirty data.

For more information see: https://developer.confluent.io/courses/architecture/compaction/

The `delete` method is only implemented for KafkaJournalWriter because the semantics are so closely aligned with Kafka’s.

Based on the initial merge request !233 written by olasd.

Closes: #4657
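The tombstone and compaction behavior described in this commit message can be illustrated with a toy model. It is a deliberate simplification and not how the broker works internally: it ignores the dirty-ratio and lag thresholds, compares event timestamps directly against the retention period, and `compact` is a hypothetical helper.

```python
def compact(log, now_ms, delete_retention_ms):
    """Toy compaction: keep the newest event per key, drop old tombstones.

    `log` is a list of (key, value, timestamp_ms) tuples in offset order;
    a value of None marks a tombstone.
    """
    newest = {key: offset for offset, (key, _value, _ts) in enumerate(log)}
    kept = []
    for offset, (key, value, ts) in enumerate(log):
        if newest[key] != offset:
            continue  # superseded by a later event for the same key
        if value is None and now_ms - ts > delete_retention_ms:
            continue  # tombstone lingered long enough, remove it too
        kept.append((key, value, ts))
    return kept


log = [("a", b"v1", 0), ("b", b"v2", 10), ("a", None, 20)]
recent = compact(log, now_ms=30, delete_retention_ms=100)
late = compact(log, now_ms=500, delete_retention_ms=100)
```

In `recent` the tombstone for key `"a"` is still visible to consumers; in `late` it is gone, which is why a lagging consumer cannot rely on seeing it.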
-
- Sep 04, 2023
-
-
Antoine Lambert authored
Make shared memory use optional in the journal writer memory backend and disable it by default. Such a backend is typically created in tests of swh packages, but only a single instance is used, so enabling shared memory is not required. This brings a great speedup when executing tests using a journal writer with the memory backend and also prevents flaky tests.
-
- Jul 24, 2023
-
-
Antoine Lambert authored
It raises ValueError otherwise.
-
- Jul 12, 2023
-
-
Antoine Lambert authored
-
Antoine Lambert authored
-
Antoine Lambert authored
This new parameter allows kafka topics to be created if they do not exist when initializing a journal client. It should not be set to True in production; its purpose is to delegate topic creation to journal clients in the SWH docker environment instead of creating all topics prior to starting the kafka consumer services. Proceeding like this brings a huge speedup when initializing the docker compose environment.
-
- May 15, 2023
-
-
vlorentz authored
-
- May 05, 2023
-
-
Nicolas Dandrimont authored
-
- May 04, 2023
-
-
vlorentz authored
Without it, it's impossible to monitor/visualize a specific group
-
- Mar 13, 2023
-
-
David Douard authored
A recent change broke the statsd reporting at the swh.journal.client level by omitting the namespace argument name while instantiating the Statsd class, thus making "swh_journal_client" the host of the Statsd instance. Add a test for the usage of statsd in the journal client.
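The failure mode can be reproduced with a stand-in class. `SketchStatsd` is hypothetical and its parameter list is an assumption made for illustration; the only property that matters is that the host comes first positionally, so a bare string lands there.

```python
class SketchStatsd:
    """Hypothetical stand-in whose first positional parameter is the host."""

    def __init__(self, host="localhost", port=8125, namespace=None):
        self.host = host
        self.port = port
        self.namespace = namespace


# Positional call: the string silently becomes the host.
broken = SketchStatsd("swh_journal_client")
# Keyword call: the string is used as the metric namespace, as intended.
fixed = SketchStatsd(namespace="swh_journal_client")
```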
-
- Mar 03, 2023
-
- Feb 17, 2023
-
-
Antoine Lambert authored
Related to swh/meta#4960
-
Antoine Lambert authored
Related to swh/meta#4960
-
- Feb 16, 2023
-
-
Jérémy Bobbio (Lunar) authored
Related to swh/meta#4959
-
- Feb 02, 2023
-
-
Antoine Lambert authored
This fixes Python 3.7 support, which broke because poetry, a dependency of isort, removed support for that Python version in a recent release.
-
- Dec 19, 2022
-
-
Antoine Lambert authored
In order to remove warnings about /apidoc/*.rst files being included multiple times in toc when building full swh documentation, prefer to include module indices only when building standalone package documentation. Also include them the proper sphinx way. Related to T4496
-
- Nov 15, 2022
-
-
Jérémy Bobbio (Lunar) authored
`make install` has not been working since commit ed9d6827 (Feb 2021), as the confluent-kafka-python mock broker removed the need for it and has been in use since April 2020.
-
- Oct 25, 2022
-
- Oct 21, 2022
-
-
David Douard authored
using a constructor 'auto_flush' boolean argument. The idea is that in a test session, each call to 'flush()' takes ~1s to run, so having the test handle a single call to flush() when needed (instead of n calls) makes test execution significantly faster. For example, swh-storage's test_replay.py execution went from ~140s to ~40s, and test_backfill.py from ~40s to ~15s.
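The effect of such a flag can be sketched with a small in-memory writer. `SketchWriter` and its method names are hypothetical and only count flush calls; they are not the journal writer classes from this repository.

```python
class SketchWriter:
    """Hypothetical writer that flushes after each write unless told not to."""

    def __init__(self, auto_flush=True):
        self.auto_flush = auto_flush
        self.pending = []
        self.flushed = []
        self.flush_calls = 0

    def write(self, obj):
        self.pending.append(obj)
        if self.auto_flush:
            self.flush()

    def flush(self):
        # In a real test session this is the slow call worth batching.
        self.flush_calls += 1
        self.flushed.extend(self.pending)
        self.pending.clear()


eager = SketchWriter(auto_flush=True)
lazy = SketchWriter(auto_flush=False)
for i in range(3):
    eager.write(i)
    lazy.write(i)
lazy.flush()  # the test decides when a flush is actually needed
```

Both writers end up with the same flushed objects; the lazy one pays for one flush instead of three.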
-
- Oct 18, 2022
-
-
David Douard authored
this actually speeds up tests quite a bit (preventing a 30s timeout when the fixture is actually reused several times in a test session).
-
David Douard authored
pre-commit from 4.1.0 to 4.3.0, codespell from 2.2.1 to 2.2.2, black from 22.3.0 to 22.10.0 and flake8 from 4.0.1 to 5.0.4. Also freeze flake8 dependencies. Also change flake8's repo config to github (the gitlab mirror being outdated).
-
- Jun 16, 2022
-
-
Antoine R. Dumont authored
So it can be cross-linked from another part of the documentation.
-
- Jun 08, 2022
-
-
David Douard authored
908f0154 introduced an unexpected change of behavior of the InMemoryJournalWriter, making it always anonymize objects. This broke tests for some dependencies (esp. swh-storage). This fix adds an 'anonymize' (bool) argument to the class constructor (similar to the KafkaJournalWriter).
-