- Feb 01, 2021
-
-
vlorentz authored
self._index_contents was called multiple times in a loop with the same arguments, except for the set of hashes to exclude. As a result, if there were N pages of hashes to exclude, each content was indexed N times, and the first N-1 iterations didn't even exclude all the hashes they had to exclude.
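
A simplified sketch of the pattern described above, not the actual swh.indexer code: the function names, the plain-string "hashes", and the list-of-sets pagination are all assumptions made for illustration. The buggy variant indexes inside the pagination loop; the fixed variant collects every page of exclusions first and indexes once.

```python
from typing import Iterable, List, Set


def index_contents_buggy(contents: List[str], excluded_pages: Iterable[Set[str]]) -> List[str]:
    """Hypothetical: index once per page, with a partially built exclusion set."""
    indexed: List[str] = []
    excluded: Set[str] = set()
    for page in excluded_pages:
        excluded |= page
        # Indexing happens here, before all pages have been read.
        indexed.extend(c for c in contents if c not in excluded)
    return indexed


def index_contents_fixed(contents: List[str], excluded_pages: Iterable[Set[str]]) -> List[str]:
    """Hypothetical: gather all exclusions first, then index exactly once."""
    excluded: Set[str] = set()
    for page in excluded_pages:
        excluded |= page
    return [c for c in contents if c not in excluded]


contents = ["a", "b", "c", "d"]
pages = [{"a"}, {"b"}]  # two pages of hashes to exclude
buggy = index_contents_buggy(contents, pages)
fixed = index_contents_fixed(contents, pages)
```

With two pages, the buggy variant indexes "c" and "d" twice and also indexes "b" during the first pass, even though "b" is meant to be excluded.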
- Jan 04, 2021
-
-
David Douard authored
-
- Dec 04, 2020
-
-
Antoine R. Dumont authored
The reason for this is to avoid surprises like the indexer journal client stuck in limbo for a while. Related to T2821 Related to T2814
-
- Dec 02, 2020
-
-
Antoine R. Dumont authored
Related to D4638
-
Antoine R. Dumont authored
This detected some paper cuts within the cli tests, for example. The main goal is to decrease friction when actually deploying indexer-related services (backend, indexers, ...). The pg backend tests should still be reasonably fast as they use the swh.core.db.pytest_plugin (which truncates tables in between tests). Related to T2821
-
- Nov 27, 2020
-
-
Antoine R. Dumont authored
According to the value_sanitizer docstring, this takes 2 parameters: the first is the object type, the second is the actual dict value to sanitize. As a somewhat default identity function, this discards the object type and returns the dict value directly, unchanged. [1] https://forge.softwareheritage.org/source/swh-journal/browse/master/swh/journal/writer/kafka.py$97-100
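
A minimal sketch of such an identity sanitizer, assuming only the two-parameter shape described above. The parameter names and the sample object type are hypothetical and not taken from swh-journal.

```python
from typing import Any, Dict


def value_sanitizer(object_type: str, value_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Identity sanitizer: ignore the object type, return the dict untouched."""
    return value_dict


# Hypothetical sample value; the keys are illustrative only.
sample = {"id": b"\x01\x02", "mimetype": "text/plain"}
result = value_sanitizer("content_mimetype", sample)
```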
-
vlorentz authored
This always happens when writing to Kafka, as the Kafka writer sets it to None at the same time it injects the 'tool' data. This was not caught by tests because they use the in-mem writer, which did not call unique_key() at all in swh-journal<=v0.5.1 (but future versions will).
-
Antoine R. Dumont authored
This fixes the indexer debian package build.
-
Antoine R. Dumont authored
This simplifies reading and is more consistent with other similar tests.
-
- Nov 26, 2020
-
-
Antoine R. Dumont authored
-
Antoine R. Dumont authored
... instead of OriginVisit. OriginVisit model objects no longer hold status information, so the filtering currently done on the journal client side could not work. Related to T2814 Related to P882
-
Vincent Sellier authored
The minimum configuration is provided either by the --config-file or the --broker parameters. Related to T2814
-
- Nov 16, 2020
-
-
Nicolas Dandrimont authored
swh.storage and swh.objstorage, as well as swh.indexer itself, have deprecated using an explicit `args` in their factories for a while; we can drop them now.
-
Nicolas Dandrimont authored
vcversioner was already removed months ago.
-
- Nov 10, 2020
-
-
vlorentz authored
-
- Nov 05, 2020
-
-
vlorentz authored
postgresql kindly returns the results in the order the test expected... most of the time.
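
One common way to make such a test independent of row order is to sort both sides on a stable key before comparing. The sketch below is an assumption about the general technique, not the change made in this commit; the helper name and the row fields are hypothetical.

```python
from typing import Any, Dict, List


def sorted_rows(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Order rows on a stable key so comparisons ignore the backend's row order."""
    return sorted(rows, key=lambda r: (r["id"], r["indexer_configuration_id"]))


# Hypothetical rows, as a query without ORDER BY might return them.
actual = [
    {"id": 2, "indexer_configuration_id": 1},
    {"id": 1, "indexer_configuration_id": 1},
]
expected = [
    {"id": 1, "indexer_configuration_id": 1},
    {"id": 2, "indexer_configuration_id": 1},
]
same = sorted_rows(actual) == sorted_rows(expected)
```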
- Nov 03, 2020
-
-
vlorentz authored
Commits in the last month made these endpoints more consistent with the other ones (though not completely), so these tests don't need to be skipped anymore.
-
vlorentz authored
By removing the False behavior, which we didn't use in practice and which was removed from the indexers in the previous commit. The main motivation is to make _add endpoints write to Kafka in a future commit, as Kafka's semantics are closer to conflict_update=True than conflict_update=False.
-
vlorentz authored
By removing the False/ignore-dups behavior, which we didn't use. The main motivation is to make _add endpoints write to Kafka in a future commit, as Kafka's semantics are closer to True/update-dups than False/ignore-dups.
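
To illustrate the difference between the two behaviors, here is a toy in-memory sketch. It is not the indexer storage API: the function names and the dict-backed store are hypothetical. The update-dups variant behaves like a keyed log where the most recent write for a key wins, which is the sense in which it is assumed here to resemble Kafka.

```python
from typing import Any, Dict


def add_update_dups(store: Dict[str, Any], key: str, value: Any) -> None:
    """conflict_update=True style: a later write for the same key replaces the earlier one."""
    store[key] = value


def add_ignore_dups(store: Dict[str, Any], key: str, value: Any) -> None:
    """conflict_update=False style: the first write for a key is kept, later ones are dropped."""
    store.setdefault(key, value)


updating: Dict[str, Any] = {}
ignoring: Dict[str, Any] = {}
for value in ("old", "new"):
    add_update_dups(updating, "sha1:abc", value)
    add_ignore_dups(ignoring, "sha1:abc", value)
```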
-
- Nov 02, 2020
-
-
vlorentz authored
This was expected to be used in these two cases:
1. if we remove mappings or file detection from a metadata indexer
2. if an origin removes all its metadata files

but:
1. if we do so, then we should bump the indexer version, so the old metadata will be preserved anyway, as different indexer versions get different indexer_configuration_ids
2. this should be a rather rare event, and even if it happens, we might want to keep the old metadata anyway rather than nothing (even if it's outdated), for search purposes.

Additionally, this commit is motivated by:
* fewer issues to deal with when writing to Kafka (the journal writer currently doesn't support suppression; and we would also have to add support for deletion in all consumers)
* less code (~250 lines)
-
- Oct 30, 2020
-
-
Antoine R. Dumont authored
This should fix [1] [1] https://jenkins.softwareheritage.org/job/DENV/job/tests/lastBuild/artifact/swh-indexer.log
-
- Oct 29, 2020
-
-
Antoine R. Dumont authored
According to the task schema, that parameter should not be null and defaults to 0. But the current scheduler backend implementation expects a value nonetheless, and a null value breaks the call. This should fix [1].
[1] https://jenkins.softwareheritage.org/job/DENV/job/tests/lastFailedBuild/artifact/swh-indexer.log
-
Antoine R. Dumont authored
So the `swh db *` commands work as expected. Related to T2736
-
- Oct 16, 2020
-
-
Antoine R. Dumont authored
-
Antoine R. Dumont authored
-
Antoine R. Dumont authored
-
- Oct 15, 2020
-
-
Antoine R. Dumont authored
Related to T1410
-
- Oct 08, 2020
-
-
vlorentz authored
They are all migrated to attr classes now.
- Oct 07, 2020
-
-
vlorentz authored
also add typing to some test functions
-
vlorentz authored
also add typing to some test functions
-
vlorentz authored
-
vlorentz authored
-
vlorentz authored
-
vlorentz authored
For consistency with the main storage.
-
vlorentz authored
-
vlorentz authored
1. it was wrongly annotated as '-> TResult' even though some indexers can return None
2. in a future commit, the fossology indexer will need to return multiple results.
-