- Apr 06, 2021
-
-
David Douard authored
This attribute is deprecated and on the verge of being replaced by RawExtrinsicMetadata objects, and the kafka journal currently in production contains a few invalid metadata entries that make the replayer unhappy. Closes T3201.
-
David Douard authored
-
David Douard authored
and explicitly check for extid objects in the journal in TestStorage.
-
- Mar 30, 2021
-
-
vlorentz authored
They can't have any extrinsic metadata, so fetching git revisions wastes a lot of time.
-
- Mar 26, 2021
-
-
vlorentz authored
It did not make sense for multiple reasons:
1. two extids can point to the same target (e.g. extids with type git and git-sha256, or two package managers with different checksums)
2. inserting two objects with the same target or extid in a single call actually wrote both, but would crash when reading
3. inserting extid1 then extid2 would write both to Kafka, but only extid1 would be inserted; when replaying on a new DB, extid2 may be inserted and extid1 ignored
Points 2 and 3 are simply fixable bugs, but 1 is an issue by design, and this commit fixes all of them at once.
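Point 1 above can be sketched with a multi-valued mapping: since two ExtIDs of different types can point at the same target, neither side of the mapping can be treated as a unique key. This is an illustrative sketch, not the real swh.storage code; all names here are assumptions.

```python
from collections import defaultdict

# Both directions of the mapping are set-valued, because neither the
# extid nor the target is unique on its own.
extid_to_targets = defaultdict(set)   # (extid_type, extid) -> {target}
target_to_extids = defaultdict(set)   # target -> {(extid_type, extid)}

def extid_add(extid_type: str, extid: str, target: str) -> None:
    extid_to_targets[(extid_type, extid)].add(target)
    target_to_extids[target].add((extid_type, extid))

# E.g. the same revision known under both its git and git-sha256 ids:
extid_add("git", "abc123", "swh:1:rev:0001")
extid_add("git-sha256", "def456", "swh:1:rev:0001")
```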
-
vlorentz authored
-
- Mar 22, 2021
-
-
vlorentz authored
For now, this has absolutely no effect on the API users, as rows are already deduplicated based on a subset of the fields hashed by the id.
- Mar 15, 2021
-
-
vlorentz authored
Content must be added to the objstorage before the DB and journal. Otherwise:
1. in case of a crash, the DB may "believe" we have the content, even though we did not have time to write it to the objstorage before the crash
2. the objstorage mirroring, which reads from the journal, may attempt to read from the objstorage before we finished writing it
This was already done unintentionally in the postgresql backend since 209de5db. This commit documents the behavior, makes the cassandra backend behave the same way, and adds a test.
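The write ordering described above can be sketched with a minimal in-memory backend (purely illustrative; the names and structures are assumptions, not the real swh.storage implementation):

```python
# Minimal sketch of the write ordering: objstorage first, then DB,
# then journal, so a crash can never leave the DB or journal claiming
# a content that the objstorage does not actually hold.

objstorage = {}   # hash -> raw bytes
db = set()        # hashes the database knows about
journal = []      # messages that mirrors replay from

def content_add(sha1: bytes, data: bytes) -> None:
    # 1. objstorage first: a crash after this step only leaks a blob
    objstorage[sha1] = data
    # 2. then the database...
    db.add(sha1)
    # 3. ...and finally the journal, so a mirror reading the journal
    # can safely fetch the blob from the objstorage.
    journal.append(sha1)

content_add(b"\x01", b"hello")
```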
- Mar 12, 2021
-
-
Antoine Lambert authored
Add an optional branch_name_exclude_prefix parameter to the snapshot_count_branches method of the Storage interface. It makes it possible to filter out branches whose name starts with a given prefix when counting. The purpose is to get accurate counters in swh-web, as pull request branches will be filtered out by default. Related to T2782
-
Antoine Lambert authored
Add an optional branch_name_include_substring parameter to snapshot_get_branches: if provided, only branches whose name contains the given substring will be returned. Add an optional branch_name_exclude_prefix parameter to snapshot_get_branches: if provided, branches whose name starts with the given prefix will not be returned. The purpose of these new features is to add a search form to the branches view of swh-web and to filter out pull request branches (whose names start with "refs/pull/") by default. Related to T2782
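The filtering semantics of the two parameters can be sketched over a plain dict of branches (an illustrative stand-in; the real parameters live on the swh.storage Storage interface and this helper is hypothetical):

```python
# Sketch of the branch-name filters: an optional substring that a
# branch name must contain, and an optional prefix that excludes it.

def filter_branches(branches, include_substring=None, exclude_prefix=None):
    names = sorted(branches)
    if include_substring is not None:
        names = [n for n in names if include_substring in n]
    if exclude_prefix is not None:
        names = [n for n in names if not n.startswith(exclude_prefix)]
    return names

branches = {
    "refs/heads/main": "rev1",
    "refs/heads/feature-x": "rev2",
    "refs/pull/42/head": "rev3",
}

# Filtering out pull request branches, as swh-web does by default:
no_prs = filter_branches(branches, exclude_prefix="refs/pull/")
```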
-
- Mar 11, 2021
-
-
David Douard authored
These endpoints allow adding ExtIDs to the storage and querying it for known ExtIDs from a SWHID (typically, retrieving the original VCS revision's intrinsic identifier from a SWHID). The underlying data structure is meant to be filled by loaders, using the `extid_add()` endpoint. This only provides the Postgresql implementation. Related to T2849.
-
- Mar 10, 2021
-
-
David Douard authored
-
David Douard authored
the latter has been deprecated for a while now.
-
David Douard authored
being miraculously listed the same.
-
Nicolas Dandrimont authored
This also checks the basic raw_extrinsic_metadata codepaths in the backfiller tests.
-
Nicolas Dandrimont authored
-
Nicolas Dandrimont authored
-
Nicolas Dandrimont authored
We convert the target attribute to a hashed ExtendedSWHID before returning the object.
-
- Mar 03, 2021
-
-
Antoine Lambert authored
With small limits (< 10), the snapshot branches query can degenerate into using the deduplication index on snapshot_branch (name, target, target_type), and the postgresql planner happily scans several hundred million rows. So ensure a minimum limit value of 10 before executing the query, for optimal performance when a small branches_count value is provided to the snapshot_get_branches method of the Storage interface. Related to P966
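The guard described above amounts to clamping the SQL LIMIT and trimming the extra rows afterwards, so callers still get exactly what they asked for. A minimal sketch, with hypothetical helper names (not the actual swh.storage query code):

```python
# Clamp the query limit to at least 10 so the planner does not pick
# the degenerate deduplication-index scan for tiny limits.
MIN_QUERY_LIMIT = 10

def snapshot_branches_query_limit(branches_count: int) -> int:
    return max(branches_count, MIN_QUERY_LIMIT)

def get_branches(all_branches, branches_count):
    # Fetch with the clamped limit, then trim to the requested count.
    fetched = all_branches[:snapshot_branches_query_limit(branches_count)]
    return fetched[:branches_count]
```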
-
vlorentz authored
-
Antoine Lambert authored
Ensure tests can be executed using hypothesis >= 6 by suppressing the function_scoped_fixture health check on tests that use a function-scoped fixture in combination with @given and do not need the fixture to be reset between individual hypothesis examples.
-
- Mar 01, 2021
-
- Feb 25, 2021
-
-
vlorentz authored
For now this does nothing as RawExtrinsicMetadata has no 'id' field, but the equality assertions will become errors when the next version of swh.model is released.
-
- Feb 19, 2021
-
-
Antoine Lambert authored
Enable filtering searched origins by visit types. Add a new optional visit_types parameter to the origin_search method in StorageInterface. Implement visit type filtering in the storage backends: an origin will be returned if it has any of the requested visit types. This is clearly not designed to be used in production due to performance issues, but rather in testing environments with a small archive dataset. Related to T2869
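The "any of the requested visit types" semantics can be sketched as a simple set intersection (an illustrative stand-in for StorageInterface.origin_search; the data layout here is an assumption):

```python
# Sketch of visit-type filtering: an origin matches when it has at
# least one of the requested visit types.

def origin_search(origins, url_pattern, visit_types=None):
    results = []
    for url, types in origins.items():
        if url_pattern not in url:
            continue
        if visit_types is not None and not set(visit_types) & set(types):
            continue
        results.append(url)
    return sorted(results)

origins = {
    "https://example.com/repo.git": {"git"},
    "https://example.com/pkg": {"pypi"},
}
```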
-
- Feb 17, 2021
-
-
Antoine R. Dumont authored
-
- Feb 16, 2021
-
-
Nicolas Dandrimont authored
This allows us to only read the kafka topics once instead of twice in the same tests, which is apparently a hard thing to do in a way compatible with both confluent-kafka 1.5 and 1.6.
-
- Feb 09, 2021
-
-
Antoine R. Dumont authored
-
Antoine R. Dumont authored
-
Antoine R. Dumont authored
This stops using origin_visit.type as a fallback value, now that the database has been migrated, and makes the origin_visit_status.type column not nullable. This also drops now-redundant joins on the origin_visit table when reading. Related to T2968
-
Antoine Lambert authored
Side effect of the following commit in librdkafka 1.6: https://github.com/edenhill/librdkafka/commit/f418e0f721518d71ff533759698b647cb2e89b80 Tests were relying on a buggy behavior of the mocked kafka cluster: two subsequent consumers set up with the same group id should receive a different set of messages, rather than the same set. Also explicitly commit messages once consumed.
-
- Feb 08, 2021
-
-
vlorentz authored
-
- Feb 04, 2021
-
-
Nicolas Dandrimont authored
This new integration test checks that, when flushing the buffer storage, the addition functions of the underlying storage backend are called in topological order (content, directory, revision, release then snapshot). This reduces the probability of "data consistency" regressions caused by the use of the buffering storage proxy alone.
-
Nicolas Dandrimont authored
The earlier implementation would only return summary data from keys that existed in the last `_add` backend method run, rather than collating all the results.
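The collation fix described above amounts to merging the per-flush summary counters instead of keeping only the last one. A minimal sketch with a hypothetical helper name (not the real buffer proxy code):

```python
from collections import Counter

# Merge the summary dicts returned by successive backend _add calls,
# summing counts per key instead of keeping only the last summary.
def collate_summaries(summaries):
    total = Counter()
    for summary in summaries:
        total.update(summary)
    return dict(total)
```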
-
Nicolas Dandrimont authored
This is mostly a consistency addition, considering that most (if not all) loaders will only add a single snapshot. The common pattern of loading objects in topological order (content > directory > revision > release > snapshot), then flushing the storage, is now fully consistent; without this addition, the snapshot addition would reach the backend storage before all other objects are added, leading to potential inconsistencies if the flush of other object types fails.
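The flush ordering described above can be sketched as a tiny buffering proxy that always drains its buffers in topological order, so a snapshot only reaches the backend after the objects it refers to. This is an illustrative sketch under assumed names, not the real swh.storage buffer proxy:

```python
# Object types in topological order: a snapshot refers to revisions
# and releases, which refer to directories, which refer to contents.
FLUSH_ORDER = ["content", "directory", "revision", "release", "snapshot"]

class BufferingProxy:
    def __init__(self, backend):
        self.backend = backend  # records (object_type, objects) calls
        self.buffers = {otype: [] for otype in FLUSH_ORDER}

    def add(self, object_type, obj):
        self.buffers[object_type].append(obj)

    def flush(self):
        # Drain buffers in topological order, regardless of the order
        # in which objects were added.
        for otype in FLUSH_ORDER:
            if self.buffers[otype]:
                self.backend.append((otype, list(self.buffers[otype])))
                self.buffers[otype].clear()

backend = []
proxy = BufferingProxy(backend)
proxy.add("snapshot", "snp1")   # added first...
proxy.add("content", "cnt1")
proxy.flush()                   # ...but flushed last
```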
-
Nicolas Dandrimont authored
-
- Feb 01, 2021
-
-
Antoine R. Dumont authored
This returned a Tuple[OriginVisit, OriginVisitStatus]. This was required to have the missing "type" information for visit statuses. It is no longer needed, as OriginVisitStatus now holds the type information.
-
Antoine R. Dumont authored
This returned a Tuple[OriginVisit, OriginVisitStatus], which is no longer needed as OriginVisitStatus now holds the type information.
-