Commits · 5fec9fdd331fa77d49b46946165d84a203657876 · Platform / Development / swh-journal

Mar 17, 2020
- journal.replay: Align _fix_content with other fix methods · 5fec9fdd
  Antoine R. Dumont authored 5 years ago
  
  5fec9fdd
- journal.replay: Align fix revision behavior to other fix methods · fbd82953
  Antoine R. Dumont authored 5 years ago
  
  fbd82953
- Remove extra 'perms' key of contents when replaying. · 4ce17626
  vlorentz authored 5 years ago
  
  It crashes when the dict is converted to an swh-model object.
  4ce17626
Mar 16, 2020
- Migrate to latest origin_visit_upsert/add api changes · d3d5b797
  Antoine R. Dumont authored 5 years ago
  
  Related to D2813 Related to D2820
  Tag: v0.0.27
  
  d3d5b797
Mar 13, 2020
- journal: Use swh-model objects instead of dicts in replay and writer · cad510eb
  Antoine R. Dumont authored 5 years ago
  
  cad510eb
- replay: Filter out colliding contents when replaying · 3c0e4913
  Antoine R. Dumont authored 5 years ago
  
  This reduces the number of contents that are not replayed when hash collisions happen.
  3c0e4913
- tox: Add optional py3-dev dependency to ease debugging · e2429199
  Antoine R. Dumont authored 5 years ago
  
  e2429199
Mar 10, 2020

Use better kafka producer semantics in the journal writers · 084ef272

What we really want is for the broker to acknowledge all messages before we go
on to the next step. That's accomplished by flushing the producer rather than
enabling idempotence (which has other side-effects, such as only-once delivery,
which we don't really care about as all our consumers are, in effect,
idempotent).

Setting acks to all means that the broker acknowledges that all in-sync replicas
have persisted the message, which is a stronger guarantee than what we had
before.

084ef272
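
The acknowledgement pattern described in 084ef272 can be sketched as follows. This is a minimal illustration rather than the swh-journal code: `write_batch` and `FakeProducer` are hypothetical names, the broker address is a placeholder, and the stand-in producer only mimics the `produce()`/`flush()` shape of a Kafka producer so the example runs without a broker.

```python
# Producer settings in the spirit of the commit message: wait for all in-sync
# replicas to acknowledge each message, without enabling idempotence.
PRODUCER_CONFIG = {
    "bootstrap.servers": "localhost:9092",  # assumption: placeholder address
    "acks": "all",
}


class FakeProducer:
    """Hypothetical stand-in mimicking produce()/flush() of a Kafka producer."""

    def __init__(self, config):
        self.config = config
        self.pending = []
        self.delivered = []

    def produce(self, topic, key, value):
        # Messages are only queued locally at this point.
        self.pending.append((topic, key, value))

    def flush(self):
        # Simulate the broker acknowledging everything that was queued.
        self.delivered.extend(self.pending)
        self.pending.clear()
        return len(self.pending)  # number of messages still unacknowledged


def write_batch(producer, topic, objects):
    """Queue every object, then block until all of them are acknowledged."""
    for key, value in objects:
        producer.produce(topic, key=key, value=value)
    remaining = producer.flush()
    if remaining:
        raise RuntimeError(f"{remaining} messages were not acknowledged")


producer = FakeProducer(PRODUCER_CONFIG)
write_batch(producer, "content", [(b"k1", b"v1"), (b"k2", b"v2")])
```

The point of flushing before moving on is that the caller only proceeds once nothing is left in the local queue, whatever the delivery semantics of individual messages.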

Make the number of messages processed at a time by journal clients configurable · b1b125fb

Nicolas Dandrimont authored 5 years ago

This setting needs to be tuned differently for different topics, as the size of
objects varies quite wildly. Bumping the default to 200 reduces chatter in the
consumer group which should reduce the amount of bandwidth used for consumption.

b1b125fb
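
A rough sketch of what a configurable batch size means for a consumer loop, assuming a consumer object whose `consume(num_messages, timeout)` method returns at most `num_messages` messages per call. `FakeConsumer` and `iter_batches` are hypothetical names used only for this illustration; the default of 200 is the one named in the commit message.

```python
class FakeConsumer:
    """Hypothetical stand-in for a Kafka consumer's consume() method."""

    def __init__(self, messages):
        self.messages = list(messages)

    def consume(self, num_messages, timeout):
        # Hand out at most num_messages messages per call.
        batch = self.messages[:num_messages]
        self.messages = self.messages[num_messages:]
        return batch


def iter_batches(consumer, batch_size=200, timeout=1.0):
    """Yield batches of at most batch_size messages until none are left."""
    while True:
        batch = consumer.consume(num_messages=batch_size, timeout=timeout)
        if not batch:
            return
        yield batch


sizes = [len(b) for b in iter_batches(FakeConsumer(range(450)), batch_size=200)]
```

Topics carrying large objects would presumably use a smaller `batch_size`, and topics carrying small objects a larger one.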

Drop deprecated cli options · a34bde05
Nicolas Dandrimont authored 5 years ago

a34bde05
Replace deprecated options with a config file override in cli tests · b4463acb
Nicolas Dandrimont authored 5 years ago
This paves the way for removal of all the deprecated cli options in
`swh journal`.
b4463acb

Clean up the signature of test_cli's invoke method · 9fe8f263

Nicolas Dandrimont authored 5 years ago

 - Drop catch_exception as it's not used anywhere (and we always assert on
 result.return_code)
 - Make the argument list a starargs instead of having to make a list

9fe8f263

Migrate test cli config to a dict instead of raw yaml · 4cf8f9d5
Nicolas Dandrimont authored 5 years ago
In preparation for being able to override parts of the configuration.
4cf8f9d5

kafka: normalize KafkaJournalWriter.write_addition[s] API · 82df6ace

David Douard authored 5 years ago

Remove the 'flush' argument from these methods' argument list: it
does not exist in the InMemoryJournalWriter version of this
service, and it is in fact useless.

82df6ace

Rename JournalClient.max_messages to JournalClient.stop_after_objects · 210d207e

Nicolas Dandrimont authored 5 years ago

After various refactorings, the meaning of `max_messages` got muddled and we're
at a point where it doesn't mean anything anymore. `stop_after_objects` is
clearer as to what behavior the parameter is actually trying to achieve. We also
rename the replay command-line argument to the same name.

These refactorings also had us end up with a loop inside JournalClient.process,
while some callers still had a loop around it which only ever ran once (or 0
times when the surrounding code was buggy). This commit removes all these outer
loops as well, keeping only the `JournalClient.process` inner loop.

While we're here, we use the opportunity to clarify and expand the documentation
of the JournalClient.

210d207e
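
The behavior that `stop_after_objects` names can be sketched with a single loop. This is an illustration under assumptions, not the `JournalClient.process` implementation: the free-standing `process` function, its arguments, and the choice to truncate the last batch are all hypothetical.

```python
def process(batches, worker_fn, stop_after_objects=None):
    """Feed batches to worker_fn; stop once stop_after_objects were handled.

    Returns the number of objects handed to worker_fn.
    """
    total = 0
    for batch in batches:
        if stop_after_objects is not None:
            # Assumption: never hand out more objects than the limit allows.
            batch = batch[: stop_after_objects - total]
        worker_fn(batch)
        total += len(batch)
        if stop_after_objects is not None and total >= stop_after_objects:
            break
    return total


seen = []
handled = process([[1, 2, 3], [4, 5, 6], [7]], seen.extend, stop_after_objects=5)
```

With a single loop owning the stop condition, callers no longer need an outer loop of their own.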

Be more careful with content generation in test_write_replay · dd5d25b4

Nicolas Dandrimont authored 5 years ago

When using the object_dicts() strategy, we often end up not generating any
contents. This works because the `replayer.process` call is within a loop,
guarded with the length of the queue.

If the queue is empty, `replayer.process` is never called, but the test is also
useless. So we add an assertion to that effect.

dd5d25b4

Add type annotations to swh.journal.client arguments · fd453b47
Nicolas Dandrimont authored 5 years ago

fd453b47

Mar 06, 2020
- Unify tense for content replay statistics log entry · ef913e29
  Nicolas Dandrimont authored 5 years ago
  
  ef913e29
- Add missing log4j.properties file from MANIFEST.in · 5f0f311c
  Nicolas Dandrimont authored 5 years ago
  
  This allows getting more verbose logs from kafka during tests on packaged versions of the module (e.g. when using tox).
  5f0f311c
- Unify retry/error handling for content replay · dad19f86
  Nicolas Dandrimont authored 5 years ago
  
  This uses a custom wrapper exception and tenacity callbacks to log exceptions when the copy of a given content fails several times. This makes the consumer more robust (fewer crashes), which in turn leads to fewer consumer rebalances, which finally drastically reduces the consumer bandwidth consumption. At this point, the retry of "definitely" failed content replays needs to be handled separately.
  dad19f86
- Make the flaky object storage generic · 06688ba0
  Nicolas Dandrimont authored 5 years ago
  
  This will allow us to reuse it in more tests.
  Tag: v0.0.26
  
  06688ba0
- Better structure for object copy logs · 0a2b6010
  Nicolas Dandrimont authored 5 years ago
  
  Uses logging keyword arguments for filtering, instead of plain arguments
  0a2b6010
- Handle ObjNotFoundErrors separately from other replay errors · 361b64f4
  Nicolas Dandrimont authored 5 years ago
  
  361b64f4
- journal.replay: Batch insert contents/skipped_contents · 632f1719
  Antoine R. Dumont authored 5 years ago
  
  632f1719
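
The retry and logging approach from dad19f86 and 0a2b6010 above can be sketched roughly as follows. The commits use tenacity; this illustration swaps in a plain standard-library loop so that it runs without dependencies. `ReplayError`, `copy_with_retry`, the logger name, and the `extra` keys are hypothetical and not taken from swh-journal.

```python
import logging

logger = logging.getLogger("replay_sketch")  # hypothetical logger name


class ReplayError(Exception):
    """Hypothetical wrapper carrying the operation, object id and root cause."""

    def __init__(self, operation, obj_id, exc):
        super().__init__(f"{operation}({obj_id}) failed: {exc!r}")
        self.operation = operation
        self.obj_id = obj_id
        self.exc = exc


def copy_with_retry(copy_fn, obj_id, attempts=3):
    """Call copy_fn(obj_id); retry on failure, log and give up after attempts.

    Returns True on success, False when every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            copy_fn(obj_id)
            return True
        except Exception as exc:
            error = ReplayError("copy", obj_id, exc)
            if attempt == attempts:
                # Structured fields go through `extra` so that log handlers
                # can filter on them instead of parsing the message.
                logger.error(
                    "Failed operation %s on %s after %s attempts",
                    error.operation,
                    error.obj_id,
                    attempts,
                    extra={"replay_obj_id": obj_id, "replay_attempts": attempts},
                )
    return False


calls = []


def flaky_copy(obj_id):
    # Fails twice, then succeeds.
    calls.append(obj_id)
    if len(calls) < 3:
        raise IOError("transient failure")


succeeded = copy_with_retry(flaky_copy, "abc")
```

Returning `False` instead of raising keeps the consumer alive; objects that failed every attempt still need separate handling, as the commit message notes.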
Mar 04, 2020
- Add some tenacity to checking whether an object is in the destination · ff741ad0
  Nicolas Dandrimont authored 5 years ago
  
  Containment checking has a tendency to fail with an error 500 on S3; retrying smooths that out.
  ff741ad0
Mar 03, 2020

Actually document the new flag for process_replay_objects_contents · 8daf401d
Nicolas Dandrimont authored 5 years ago
Replaces docstring annotations with type annotations as well.
8daf401d

Generate a new kafka consumer group name for each test · 8edb539c

Nicolas Dandrimont authored 5 years ago

This avoids reusing the same consumer group name on subsequent tests, which is a
problem when some of the tests change the broker-side consumer group rebalance
timeouts.

8edb539c

Add a flag to copy objects only if they don't exist in the destination · f6495420
Nicolas Dandrimont authored 5 years ago
This trades bandwidth/processing time for more API queries, which can be a win
if your exclusion file is a bit stale.
f6495420
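
A minimal sketch of the trade-off behind f6495420, using plain dicts in place of object storages. `copy_object` and its `check_dst` parameter are hypothetical names; the real flag and object storage API are not shown in this log.

```python
def copy_object(obj_id, src, dst, check_dst=True):
    """Copy obj_id from src to dst; return "skipped" or "copied"."""
    if check_dst and obj_id in dst:
        # One extra containment query, but no transfer of the object body.
        return "skipped"
    dst[obj_id] = src[obj_id]
    return "copied"


src = {"a": b"aaa", "b": b"bbb"}
dst = {"a": b"aaa"}
results = [copy_object(obj_id, src, dst) for obj_id in ("a", "b")]
```

The containment check costs one query per object, which only pays off when a meaningful share of objects already exist in the destination.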
Stop hardcoding the number of contents in the replay cli tests · a25c773b
Nicolas Dandrimont authored 5 years ago

a25c773b
Give better granularity to the CONTENT_OPERATIONS_METRIC statsd probe · e3ad04b5
Nicolas Dandrimont authored 5 years ago
This allows us to get a better sense of the actual work done by the content
replayer.
e3ad04b5

Add support for the static consumer group feature to journal client · 6098d93f

Nicolas Dandrimont authored 5 years ago

The new KAFKA_GROUP_INSTANCE_ID env variable can be set on journal clients to
set group.instance.id and enable support for Kafka Static Consumer Groups.

https://github.com/edenhill/librdkafka/pull/2525
https://cwiki.apache.org/confluence/display/KAFKA/KIP-345%3A+Introduce+static+membership+protocol+to+reduce+consumer+rebalances

Combined with larger values for the session.timeout.ms and max.poll.interval.ms
settings, this setting informs the Group Coordinator broker that the consumer
group has static membership, and that the disappearance of a given member of the
consumer group should not immediately trigger a rebalance. This allows crashing
consumers to re-join the consumer group and start consuming from their assigned
partitions immediately.

This setting is implemented as an environment variable so that several consumers
in the same group can share a configuration file, and still override the
value (e.g. by setting `Environment=KAFKA_GROUP_INSTANCE_ID=groupname-%i` in a
systemd template unit).

When this setting is enabled, we also up the relevant values for
session.timeout.ms and max.poll.interval.ms.

6098d93f
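
How the environment variable could feed into the consumer configuration, as a sketch. `consumer_config` is a hypothetical helper and the two timeout values are placeholders chosen for illustration; only the `KAFKA_GROUP_INSTANCE_ID` variable and the three setting names come from the commit message.

```python
import os


def consumer_config(base, environ=None):
    """Return a copy of base, enabling static membership when requested."""
    environ = os.environ if environ is None else environ
    config = dict(base)
    instance_id = environ.get("KAFKA_GROUP_INSTANCE_ID")
    if instance_id:
        config["group.instance.id"] = instance_id
        # Assumption: placeholder values, larger than typical defaults, so a
        # restarting consumer can rejoin before a rebalance is triggered.
        config["session.timeout.ms"] = 10 * 60 * 1000
        config["max.poll.interval.ms"] = 90 * 60 * 1000
    return config


base = {"group.id": "replayer", "bootstrap.servers": "localhost:9092"}
static = consumer_config(base, {"KAFKA_GROUP_INSTANCE_ID": "replayer-1"})
dynamic = consumer_config(base, {})
```

Reading the value from the environment keeps the shared configuration file identical across all members of the group.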

Upgrade kafka to 2.4.0 · a3cc5772

Nicolas Dandrimont authored 5 years ago

This matches the production environment.

We need to add some bits to the zookeeper configuration, as the zookeeper
bundled with kafka 2.4.0 starts the adminserver, which tries to bind to port
8080, by default.

a3cc5772

Mar 02, 2020

replay: drop the hand-rolled retry contextmanager · 1e277b41

Nicolas Dandrimont authored 5 years ago

It makes the consumer crash with `generator didn't stop after throw()`
exceptions. We'll reintroduce retry behavior at a later stage.

1e277b41
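
The failure mode mentioned in 1e277b41 is easy to reproduce with a context manager that tries to retry by yielding more than once. The `naive_retry` helper below is a hypothetical reconstruction for illustration, not the code that was removed.

```python
from contextlib import contextmanager


@contextmanager
def naive_retry(attempts=3):
    """Broken on purpose: a with-block body cannot be re-run by yielding again."""
    for _ in range(attempts):
        try:
            yield
            return
        except ValueError:
            # Swallow the error and loop, which yields a second time.
            continue


def demo():
    try:
        with naive_retry():
            raise ValueError("transient failure")
    except RuntimeError as exc:
        return str(exc)
    return None


message = demo()
```

`contextlib` throws the body's exception into the generator and expects it to finish; because the generator yields again instead, the original `ValueError` is replaced by a `RuntimeError` and the body is never retried.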

Punch through the storage.objstorage "collaborator" to get to the actual objstorages · 82a0a748

Nicolas Dandrimont authored 5 years ago

This change introduced subtly confusing behavior, where the storage.objstorage
__getattr__ proxy hides itself until you try a dunder method (e.g. `foo in
storage.objstorage` or `iter(storage.objstorage)`), at which point it blows up
in your face with a confusing exception (because the collaborator is _also_
called ObjStorage).

Using the underlying objstorage works around that issue.

82a0a748
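
The underlying Python behavior can be shown in a few lines: implicit special-method lookups such as `in` go through the type and bypass `__getattr__`. `Backend` and `Proxy` are hypothetical stand-ins for the actual object storage and its collaborator.

```python
class Backend:
    """Hypothetical object storage supporting `in` and a regular method."""

    def __init__(self):
        self.objects = {"abc": b"data"}

    def __contains__(self, obj_id):
        return obj_id in self.objects

    def get(self, obj_id):
        return self.objects[obj_id]


class Proxy:
    """Forwards attribute access to the backend through __getattr__ only."""

    def __init__(self, backend):
        self.backend = backend

    def __getattr__(self, name):
        return getattr(self.backend, name)


proxy = Proxy(Backend())
forwarded = proxy.get("abc")  # regular methods are forwarded transparently

try:
    "abc" in proxy  # __contains__ is looked up on type(proxy), not forwarded
    proxy_supports_in = True
except TypeError:
    proxy_supports_in = False

found = "abc" in proxy.backend  # going to the underlying object works
```

Reaching for the underlying object, as the commit does, sidesteps the problem without having to define every dunder method on the proxy.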

Mar 01, 2020
- swh.journal.tests: Fix missing new attribute on mock journal client · 198f5bdd
  Antoine R. Dumont authored 5 years ago
  
  This fixes the current CI build failure [1].
  [1] https://jenkins.softwareheritage.org/job/DJNL/job/tests/717/console
  198f5bdd
Feb 26, 2020
- JournalClient: add a stop_at_eof boolean to read the log only once · 6ca43d5c
  Antoine Pietri authored 5 years ago
  
  6ca43d5c
Feb 25, 2020

JournalClient: split main loop in three functions · eea69820

Antoine Pietri authored 5 years ago

Allow clients to override this behavior more easily in the client,
notably the deserialization step where message metadata can be added to
the objects.

eea69820
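
A sketch of why splitting the loop helps subclasses, assuming a fetch step, a batch-handling step, and a per-message deserialization step. The class and method names, the `ListConsumer` stand-in, and the use of JSON as the wire format are assumptions made for this example.

```python
import json


class ListConsumer:
    """Hypothetical consumer returning pre-built batches, then nothing."""

    def __init__(self, batches):
        self.batches = list(batches)

    def consume(self):
        return self.batches.pop(0) if self.batches else []


class Client:
    def __init__(self, consumer):
        self.consumer = consumer

    def process(self, worker_fn):
        """Main loop: fetch batches and hand them to handle_messages."""
        total = 0
        while True:
            messages = self.consumer.consume()
            if not messages:
                return total
            total += self.handle_messages(messages, worker_fn)

    def handle_messages(self, messages, worker_fn):
        """Group deserialized objects by topic and pass them to worker_fn."""
        objects = {}
        for topic, payload in messages:
            obj = self.deserialize_message(topic, payload)
            objects.setdefault(topic, []).append(obj)
        worker_fn(objects)
        return len(messages)

    def deserialize_message(self, topic, payload):
        """Decode one message; JSON is a stand-in for the real format."""
        return json.loads(payload)


class AnnotatingClient(Client):
    def deserialize_message(self, topic, payload):
        # Only this step is overridden, to attach message metadata.
        obj = super().deserialize_message(topic, payload)
        obj["_topic"] = topic
        return obj


received = []
consumer = ListConsumer([[("origin", '{"url": "https://example.org"}')]])
total = AnnotatingClient(consumer).process(received.append)
```

A subclass that wants message metadata on its objects overrides one small method and inherits the rest of the loop unchanged.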

Feb 12, 2020
- Use swh-storage validation proxy. · 038418a6
  vlorentz authored 5 years ago
  
  Required by swh-storage >= v0.0.172.
  038418a6
Feb 11, 2020
- Add comment explaining the 10ms extra timeout before consumer.consume(). · a5a4768b
  vlorentz authored 5 years ago
  
  a5a4768b
Feb 10, 2020
- Use skipped_content_add instead of content_add_metadata for skipped content. · b377a3b4
  vlorentz authored 5 years ago
  
  This new endpoint is now required by swh-storage for skipped contents.
  b377a3b4