- Sep 10, 2021
-
-
vlorentz authored
-
- Sep 09, 2021
-
-
vlorentz authored
This should make it run up to 100 times faster, even on average directories.
-
vlorentz authored
Instead of fetching them one-by-one, with the very high latency this entails. This is preliminary work to make `directory_ls` less painfully slow.
-
vlorentz authored
And fall back to concurrent insertion.
-
- Sep 08, 2021
-
-
vlorentz authored
By reusing the 'steady state' main statement (which is quite large) across calls.
-
vlorentz authored
This adds a new config option for the cassandra backend, 'directory_entries_insert_algo', with three possible values:
* 'one-per-one' is the default, and preserves the current naive behavior
* 'concurrent' and 'batch' are attempts at being more efficient
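A minimal sketch of how the three values could select an insertion strategy. Only the option name and its three values come from the commit message; `insert_directory_entries`, `insert_one` and `insert_many` are hypothetical names, and a thread pool stands in for whatever concurrency mechanism the Cassandra backend really uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

# Values of 'directory_entries_insert_algo' listed in the commit message.
INSERT_ALGOS = ("one-per-one", "concurrent", "batch")


def insert_directory_entries(
    entries: Sequence[dict],
    algo: str,
    insert_one: Callable[[dict], None],         # hypothetical: writes a single entry
    insert_many: Callable[[List[dict]], None],  # hypothetical: writes a whole batch
) -> None:
    """Dispatch on the configured algorithm (illustrative only)."""
    if algo not in INSERT_ALGOS:
        raise ValueError(f"unknown directory_entries_insert_algo: {algo!r}")
    if algo == "one-per-one":
        # Naive behavior: one round-trip per entry, executed sequentially.
        for entry in entries:
            insert_one(entry)
    elif algo == "concurrent":
        # Same per-entry writes, but several are in flight at once.
        with ThreadPoolExecutor(max_workers=8) as pool:
            list(pool.map(insert_one, entries))
    else:
        # 'batch': hand all entries to the backend in a single call.
        insert_many(list(entries))
```

With 'concurrent', the order in which entries reach the backend is not guaranteed, so this sketch assumes that insertion is order-independent.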
-
- Sep 06, 2021
-
-
vlorentz authored
This will be used as a second pass on objects that failed with older versions of the script.
-
- Sep 03, 2021
-
-
vlorentz authored
-
- Aug 31, 2021
-
-
vlorentz authored
They were inaccurate and a performance bottleneck. We can, and should, use swh-counters instead now.
-
- Aug 30, 2021
-
-
Vincent Sellier authored
resulting in OriginVisitStatus trying to put a snapshot id in the metadata field. Related to T3539
-
Vincent Sellier authored
Related to T3517
-
- Aug 27, 2021
-
-
vlorentz authored
It will be used in the Cassandra experiment. Currently we use the built-in counters of the Cassandra backend; but in addition to being inaccurate, they seem to be a bottleneck. This proxy will be a lightweight solution for counting object insertion, without needing to run Kafka on the test cluster.
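The idea of a lightweight counting proxy can be sketched as follows. This is an assumption-laden illustration, not the proxy added by the commit: the class name `CountingProxy`, the `*_add` method naming convention, and the choice to count submitted objects (rather than newly inserted ones) are all hypothetical.

```python
from collections import Counter
from typing import Any


class CountingProxy:
    """Wraps a storage-like backend and counts objects passed to `*_add` methods."""

    def __init__(self, backend: Any) -> None:
        self.backend = backend
        self.counts: Counter = Counter()

    def __getattr__(self, name: str) -> Any:
        # Only called for attributes not found on the proxy itself.
        attr = getattr(self.backend, name)
        if not name.endswith("_add"):
            return attr

        def wrapper(objects, *args, **kwargs):
            objects = list(objects)
            result = attr(objects, *args, **kwargs)
            # Count what was submitted, keyed by object type ("content", ...).
            self.counts[name[: -len("_add")]] += len(objects)
            return result

        return wrapper
```

Because the counts live in the proxy's memory, this approach needs no extra service, at the cost of counting duplicates submitted more than once.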
-
- Aug 24, 2021
-
-
vlorentz authored
It was unclear this actually worked; I had to write this test to realize the code wasn't buggy. Also replaced a conditional that is always False (because Cassandra always returns results in the order of the clustering key) with an assertion, so the code is less confusing.
-
vlorentz authored
When called by a replayer, the visit.visit field is set; but origin.next_visit_id was never incremented, so on the next loader run, the visit id would be 1 even if there is already a visit with that id.
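The bug and its fix can be illustrated with a small in-memory model. The class and method names below are hypothetical; only the `next_visit_id` counter and the distinction between replayer-supplied and loader-allocated visit ids come from the commit message.

```python
from typing import Dict, Optional


class OriginVisits:
    """Toy model of one origin's visits and its next_visit_id counter."""

    def __init__(self) -> None:
        self.next_visit_id = 1
        self.visits: Dict[int, str] = {}

    def add_visit(self, visit_id: Optional[int] = None) -> int:
        if visit_id is None:
            # Loader case: allocate the next free id.
            visit_id = self.next_visit_id
        # Fix: always move the counter past the id just used, including
        # when the id was supplied by the caller (replayer case).
        self.next_visit_id = max(self.next_visit_id, visit_id + 1)
        self.visits[visit_id] = "created"
        return visit_id
```

Without the `max(...)` update on the replayer path, a later loader run would allocate id 1 again and collide with the replayed visit.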
-
vlorentz authored
Instead of calling content_find() for each object, which needs to make two queries for each. Given the latency of Cassandra queries, this should be a significant speed-up (possibly up to 100 times faster, as this is the value of PARTITION_KEY_RESTRICTION_MAX_SIZE). This also changes the schema, because CQL does not allow doing `IN` queries on compound partition keys.
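A rough sketch of where the "up to 100 times" figure comes from: ids are grouped into chunks of at most PARTITION_KEY_RESTRICTION_MAX_SIZE, and each chunk becomes one `IN` query. The helper name `chunk_ids` is hypothetical, and the value 100 is the one quoted in the commit message.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Value quoted in the commit message.
PARTITION_KEY_RESTRICTION_MAX_SIZE = 100


def chunk_ids(
    ids: Sequence[T], size: int = PARTITION_KEY_RESTRICTION_MAX_SIZE
) -> Iterator[List[T]]:
    """Yield consecutive groups of at most `size` ids, one group per IN query."""
    for start in range(0, len(ids), size):
        yield list(ids[start : start + size])
```

For 250 ids this yields three groups, so three `IN` queries, where calling content_find() per object would have issued two queries for each of the 250 objects.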
-
Vincent Sellier authored
Related to T3485
- Aug 06, 2021
-
- Jul 27, 2021
-
- Jul 23, 2021
-
-
Nicolas Dandrimont authored
This field allows having multiple versions of the ExtID -> SWHID mapping, for instance when the implementation of a loader changes in a backwards-incompatible way. For now, we don't change the API used to query or store ExtIDs. When querying for the SWHIDs corresponding to a given external object, all versions are returned, and the client is expected to do the filtering.
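Since the filtering is left to the client, it could look like the sketch below. The `ExtIDRow` dataclass and its field names are assumptions made for illustration and are not the actual model class; only the existence of a version on each mapping comes from the commit message.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ExtIDRow:
    """Hypothetical stand-in for one ExtID -> SWHID mapping."""

    extid_type: str
    extid: bytes
    extid_version: int
    target: str  # SWHID, kept as a plain string in this sketch


def filter_by_version(rows: Iterable[ExtIDRow], version: int) -> List[ExtIDRow]:
    """Keep only the mappings produced by the given loader version."""
    return [row for row in rows if row.extid_version == version]
```

A client that understands only one version of a loader's mapping would apply this to the full result set returned by the query.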
-
- Jul 07, 2021
-
-
Vincent Sellier authored
The default ONE level is used to keep the previous behaviour. Related to T3396
-
- Jun 28, 2021
-
-
vlorentz authored
This allows mypy to actually type-check calls to db methods. This commit also fixes an issue found by mypy.
-
vlorentz authored
-
vlorentz authored
This will make it easier for users of swh-web to discover metadata on a given SWHID, as you otherwise need to specify an authority to fetch metadata.
-
- Jun 25, 2021
-
-
vlorentz authored
We agreed a while ago they are IRIs, and we have some of them in the postgresql database already.
-
- Jun 15, 2021
-
-
vlorentz authored
This will be used by swh-web to allow downloading them from a non-JSON endpoint.
-
- Jun 09, 2021
-
-
Antoine Lambert authored
-
- May 21, 2021
-
-
vlorentz authored
All features work except snapshot_count_branches, because ScyllaDB does not support user-defined aggregates yet. Migration tests hang when run after the regular tests, but I can't figure out why. This should not be an issue for now, as we won't run Scylla tests on the CI.
-
Antoine R. Dumont authored
This will remove further deprecation warnings from the tests, especially the ones from other modules depending on the storage's pytest-plugin. This also fixes some edge case configuration for the backfill and the storage rpc backend which would have been broken if we switched to that new name prior to this. Related to b487a21f
-
- May 19, 2021
- May 18, 2021
-
- May 14, 2021
-
- May 11, 2021
-
-
vlorentz authored
Before this commit, the only way to get Content objects from their sha1_git was to call content_find for each object. This was obviously neither convenient nor efficient. Using this endpoint to batch calls reduces the runtime of the git-bare vault cooker by 30%.
-
vlorentz authored
It spares a join with the content table, which should hopefully make the vault (and possibly other users) faster when they don't need this join.
-
vlorentz authored
-
- May 10, 2021
-
-
David Douard authored
-
David Douard authored
to clean up the swh.storage namespace a bit.
-
- May 07, 2021
-
-
David Douard authored
give a chance to one-object batches to be ingested, and reduce the number of objects wrongly reported as non-ingested, e.g. during a replayer session, where this situation can occur.
-
- May 06, 2021
-
-
vlorentz authored
It renamed db_name to dbname, which is a breaking change.
-