- Nov 23, 2022
Jenkins for Software Heritage authored
Update to upstream version '2.8.0' with Debian dir 8b4ffcb03cd177c9149fcfd09e3a4cf499a1b201
- Nov 21, 2022
vlorentz authored
Some snapshots are really large. Rather than fetching them entirely only to discard most of the branches, this commit fetches only a first set of branches (to check existence and to use fewer queries on small snapshots), then requests specific branches as needed (usually only 2). This should improve performance and reduce timeout exceptions from the storage.
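The strategy described above can be sketched as follows. This is a minimal illustration, not the actual swh-indexer code: `FakeStorage`, `get_first_branches`, `get_branch`, `find_branches` and the page size of 100 are all hypothetical names and values chosen for the example.

```python
from typing import Dict, Iterable, Optional


class FakeStorage:
    """Hypothetical in-memory stand-in for a snapshot storage (not the swh API)."""

    def __init__(self, branches: Dict[bytes, bytes]):
        # Branches are kept sorted by name, as a paginated API would return them.
        self._branches = dict(sorted(branches.items()))
        self.queries = 0

    def get_first_branches(self, count: int) -> Dict[bytes, bytes]:
        self.queries += 1
        return dict(list(self._branches.items())[:count])

    def get_branch(self, name: bytes) -> Optional[bytes]:
        self.queries += 1
        return self._branches.get(name)


def find_branches(
    storage: FakeStorage, wanted: Iterable[bytes], first_page: int = 100
) -> Dict[bytes, bytes]:
    """Fetch one page of branches, then look up only the missing ones by name."""
    page = storage.get_first_branches(first_page)
    found: Dict[bytes, bytes] = {}
    for name in wanted:
        if name in page:
            found[name] = page[name]
        elif len(page) == first_page:
            # The page may be truncated, so ask for this branch specifically.
            target = storage.get_branch(name)
            if target is not None:
                found[name] = target
        # Otherwise the whole snapshot fit in one page: the branch does not exist.
    return found
```

On a small snapshot this costs a single query; on a large one it costs one query plus one per wanted branch that was not in the first page.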
- Nov 03, 2022
Nicolas Dandrimont authored
This code was flushing Kafka messages and waiting for the brokers on every message, instead of doing so only once per batch.
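The difference between the two behaviours can be sketched like this. `CountingProducer` is a hypothetical stand-in that only counts `flush()` calls; it is not a real Kafka client, and the function names are made up for the example.

```python
from typing import Iterable, List, Tuple


class CountingProducer:
    """Hypothetical stand-in for a Kafka producer; it only records calls."""

    def __init__(self) -> None:
        self.sent: List[Tuple[str, bytes]] = []
        self.flushes = 0

    def produce(self, topic: str, value: bytes) -> None:
        self.sent.append((topic, value))

    def flush(self) -> None:
        # With a real client, this is where we would block on the brokers.
        self.flushes += 1


def write_batch_flush_each(producer, topic: str, messages: Iterable[bytes]) -> None:
    """Old behaviour: one broker round-trip per message."""
    for message in messages:
        producer.produce(topic, message)
        producer.flush()


def write_batch_flush_once(producer, topic: str, messages: Iterable[bytes]) -> None:
    """Fixed behaviour: queue every message, then wait for the brokers once."""
    for message in messages:
        producer.produce(topic, message)
    producer.flush()
```

Both variants send the same messages; only the number of blocking waits differs.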
-
- Nov 02, 2022
Antoine Lambert authored
-
Jenkins for Software Heritage authored
Update to upstream version '2.7.3' with Debian dir 7da9a21feb7239b589c6d53d33ca7baf0dc3504f
-
- Oct 27, 2022
Jenkins for Software Heritage authored
Update to upstream version '2.7.2' with Debian dir 0a3e9b68e3a9a078d1a53cb19a27f9c8e117938e
- Oct 26, 2022
vlorentz authored
Codemeta reexports schema:url, schema:dateCreated, ... with `"@type": "@id"` and `"@type": "schema:Date"` so that

```
{
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "url": "http://example.org",
    "dateCreated": "2022-10-26"
}
```

expands to:

```
{
    "http://schema.org/url": {
        "@type": "@id",
        "@value": "http://example.org"
    },
    "http://schema.org/dateCreated": {
        "@type": "http://schema.org/Date",
        "@value": "2022-10-26"
    }
}
```

However, our translation tried to translate directly to a partially expanded form, like this:

```
{
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "url": {
        "@value": "http://example.org"
    },
    "dateCreated": {
        "@value": "2022-10-26"
    }
}
```

which prevents the compaction and expansion algorithms from adding a type themselves, causing the document to be compacted to:

```
{
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "schema:url": "http://example.org",
    "schema:dateCreated": "2022-10-26"
}
```

or expanded to:

```
{
    "http://schema.org/url": {
        "@value": "http://example.org"
    },
    "http://schema.org/dateCreated": {
        "@value": "2022-10-26"
    }
}
```

neither of which is what we want.

This commit replaces the hack for `@type` with the right solution that works for all properties.
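To illustrate why a pre-wrapped `{"@value": ...}` loses its type, here is a toy model of type coercion for `dateCreated` only. It is deliberately not the JSON-LD expansion algorithm and not the swh-indexer code; `CONTEXT` and `expand` are hypothetical names, and the only behaviour modelled is that coercion applies to plain values but leaves existing value objects untouched.

```python
from typing import Any, Dict

# Hypothetical, hand-written excerpt of a context: one term with a coerced type.
CONTEXT = {
    "dateCreated": {
        "@id": "http://schema.org/dateCreated",
        "@type": "http://schema.org/Date",
    },
}


def expand(document: Dict[str, Any]) -> Dict[str, Any]:
    """Toy expansion: add the context's type to plain values only."""
    expanded = {}
    for term, value in document.items():
        definition = CONTEXT[term]
        if isinstance(value, dict) and "@value" in value:
            # Already a value object: it is kept as-is, so no type is added.
            value_object = dict(value)
        else:
            # Plain value: the type declared in the context is applied.
            value_object = {"@type": definition["@type"], "@value": value}
        expanded[definition["@id"]] = value_object
    return expanded
```

Under this model, emitting the plain string lets the context do its job, while emitting `{"@value": "2022-10-26"}` yields an untyped value.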
- Oct 25, 2022
- Oct 24, 2022
vlorentz authored
Without this, some Sentry issues were tagged with the wrong object, which can be very confusing
-
- Oct 18, 2022
David Douard authored
- pre-commit from 4.1.0 to 4.3.0,
- codespell from 2.2.1 to 2.2.2,
- black from 22.3.0 to 22.10.0 and
- flake8 from 4.0.1 to 5.0.4.

Also freeze flake8 dependencies.

Also change flake8's repo config to GitHub (the GitLab mirror being outdated).
-
- Oct 07, 2022
Jenkins for Software Heritage authored
Update to upstream version '2.7.1' with Debian dir 3e6c8e43699958132eba9d7c620d7871d989a731
- Sep 28, 2022
- Sep 27, 2022
vlorentz authored
-
vlorentz authored
It was only fixed as a side-effect of other changes, but it's good to have a regression test
-
vlorentz authored
They are semantically closer, as 'html_url' is the main page of the repository, so it is the best way to identify it; and 'clone_url' is the URL that should be given to 'git clone', as documented by https://schema.org/codeRepository

Additionally, that property was missing so far, but a future commit will need to use it to identify fork relationships (node ids are required to represent relationships between documents, as we cannot use blank nodes for that).
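One possible reading of this mapping is sketched below. It assumes that 'html_url' becomes the node id (`@id`) and 'clone_url' becomes `codeRepository`; that pairing is an interpretation of the message above, and `repository_node` is a hypothetical helper, not the swh-indexer mapping code.

```python
from typing import Any, Dict


def repository_node(api_repo: Dict[str, Any]) -> Dict[str, str]:
    """Build a minimal compact document from a GitHub-style repository record.

    Assumption: 'html_url' identifies the node and 'clone_url' is what
    one would pass to 'git clone'.
    """
    return {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@id": api_repo["html_url"],
        "codeRepository": api_repo["clone_url"],
    }


node = repository_node(
    {
        "html_url": "https://github.com/example/project",
        "clone_url": "https://github.com/example/project.git",
    }
)
```

Because the node has an explicit id rather than being a blank node, another document could refer to it, which is what a fork relationship would need.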
-
vlorentz authored
-
vlorentz authored
-
- Sep 12, 2022
Antoine Lambert authored
They have been moved into a swh-core pytest plugin to share them with other swh packages that might need them.
-
Jenkins for Software Heritage authored
Update to upstream version '2.6.0' with Debian dir 132f86a3595679ad6ca88ad2ca01b29bc4fc100b
-
- Sep 08, 2022
vlorentz authored
Sentry uses repr() by default, which does not look good in a UI
-
vlorentz authored
persist_index_computations deduplicated row entries based on the entire content of the row, but PostgreSQL enforces that 'id' is unique.

This was not an issue in older versions of swh-indexer, because all operations were deterministic, given a specific directory as input. The recent switch to rdflib introduced non-determinism, so different outputs may be returned for the same directory id, which made the deduplication insufficient to avoid duplicate ids.

With this commit, deduplication is now done on 'id', as expected.

As a side-effect, persist_index_computations is now more efficient because:

1. it runs in linear time instead of quadratic in the number of metadata items
2. it only compares dir ids, instead of the content of indexed metadata (which is arbitrarily large JSON-like data)
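A minimal sketch of id-based deduplication in linear time, assuming rows are dictionaries with an 'id' key. The row shape, the `deduplicate_by_id` name, and the choice to keep the last row seen for each id are assumptions made for the example, not details taken from the commit.

```python
from typing import Any, Dict, Iterable, List


def deduplicate_by_id(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep one row per 'id', using a single pass and a dictionary lookup."""
    by_id: Dict[bytes, Dict[str, Any]] = {}
    for row in rows:
        # Assumption: the last row seen for a given id wins.
        # Only the id is compared; the metadata content is never inspected.
        by_id[row["id"]] = row
    return list(by_id.values())


rows = [
    {"id": b"dir1", "metadata": {"name": "first output"}},
    {"id": b"dir2", "metadata": {"name": "other"}},
    {"id": b"dir1", "metadata": {"name": "second output"}},
]
unique_rows = deduplicate_by_id(rows)
```

Two rows with the same id but different metadata would both survive a content-based comparison, which is the situation that would violate the uniqueness constraint; keying on the id alone avoids it.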
-
vlorentz authored