- Dec 05, 2023
-
-
David Douard authored
-
- Dec 04, 2023
-
-
David Douard authored
And replace comment type annotations by explicit ones.
-
- Dec 03, 2023
-
- Nov 29, 2023
-
-
David Douard authored
-
- Nov 22, 2023
-
-
Antoine Lambert authored
A docstring is mandatory for a Celery task function, as its value is inserted in the required description column of the task_type table in the scheduler database. The Celery task function name is also used as the task name (with underscores replaced by dashes), so ensure task function names match the task types registered in the production scheduler database.
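
The naming rule described above can be sketched as follows. This is a minimal illustration, not the actual registration code: `task_type_from_function` and `index_origin_metadata` are hypothetical names, and the real insertion into the task_type table is not shown.

```python
def task_type_from_function(func):
    """Derive a (task name, description) pair from a task function.

    Hypothetical helper: the name is the function name with underscores
    replaced by dashes, and the description is the function's docstring.
    """
    description = (func.__doc__ or "").strip()
    if not description:
        # The description column is required, so a missing docstring is an error.
        raise ValueError(f"{func.__name__} needs a docstring")
    return func.__name__.replace("_", "-"), description


def index_origin_metadata():
    """Example description of a hypothetical indexer task."""


name, description = task_type_from_function(index_origin_metadata)
```

With this sketch, `name` is the dashed form of the function name, which is the value that has to match a task type registered in the scheduler database.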
-
- Nov 21, 2023
-
- Nov 08, 2023
-
-
David Douard authored
So that we can get rid of indexer task types being created by swh-scheduler's SQL init scripts.
-
- Oct 26, 2023
-
-
vlorentz authored
-
- Oct 18, 2023
-
-
Antoine Lambert authored
SingleFileIntrinsicMapping.detect_metadata_files compared lowercase versions of filenames with the value of SingleFileIntrinsicMapping.filename to detect metadata files. But as SingleFileIntrinsicMapping.filename holds the canonical name of a metadata file, it can contain uppercase characters. So compare lowercase versions of both filenames to avoid metadata files going undetected. This fixes indexing of Python intrinsic metadata.
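
A minimal sketch of the comparison fix, assuming plain string filenames. The standalone function below is illustrative and does not reproduce the real method's signature or return type; `PKG-INFO` is used only as an example of a canonical name with uppercase characters.

```python
def detect_metadata_files(filenames, canonical_name):
    """Return the filenames matching canonical_name, ignoring case.

    Both sides are lowercased: lowercasing only the candidate filenames
    would never match a canonical name containing uppercase characters.
    """
    wanted = canonical_name.lower()
    return [name for name in filenames if name.lower() == wanted]


matches = detect_metadata_files(["README", "PKG-INFO", "pkg-info"], "PKG-INFO")
```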
-
- Jul 07, 2023
-
-
David Douard authored
It now needs types-click which is indeed a dependency of swh.core[testing].
-
- Jun 20, 2023
-
-
- Jun 07, 2023
-
-
vlorentz authored
-
- May 15, 2023
-
-
vlorentz authored
-
- Apr 25, 2023
-
-
Antoine R. Dumont authored
Refs. #4733
-
- Apr 18, 2023
-
-
vlorentz authored
-
- Apr 17, 2023
-
-
vlorentz authored
-
- Mar 21, 2023
-
-
- Mar 13, 2023
-
- Feb 21, 2023
-
-
- Feb 17, 2023
-
-
Antoine Lambert authored
Related to swh/meta#4960
-
- Feb 16, 2023
-
-
Jérémy Bobbio (Lunar) authored
Related to swh/meta#4959
-
- Feb 13, 2023
-
-
- Feb 02, 2023
-
-
Antoine Lambert authored
This fixes Python 3.7 support, which broke because poetry, a dependency of isort, removed support for that Python version in a recent release.
-
- Dec 19, 2022
-
-
Antoine Lambert authored
In order to remove warnings about /apidoc/*.rst files being included multiple times in the toc when building the full swh documentation, prefer to include module indices only when building standalone package documentation. Also include them the proper Sphinx way. Related to T4496
-
- Dec 07, 2022
-
-
vlorentz authored
1. they are internal to the DB so they do not belong in Kafka
2. on unrelated errors, they cause swh.journal to crash because it does not know how to handle integers in the output of unique_key()
-
- Nov 30, 2022
-
-
-
vlorentz authored
-
- Nov 29, 2022
-
-
vlorentz authored
-
Nicolas Dandrimont authored
-
vlorentz authored
-
- Nov 28, 2022
-
-
vlorentz authored
This avoids having a transaction inserting row A then B, while another inserts row B then A, which (probably) leads to deadlocks like this:

```
DeadlockDetected: deadlock detected
DETAIL: Process 1842336 waits for ShareLock on transaction 1051957280; blocked by process 64261.
Process 64261 waits for ShareLock on transaction 1051957281; blocked by process 1842336.
HINT: See server log for query details.
CONTEXT: while inserting index tuple (1972253,5) in relation "origin_extrinsic_metadata"
SQL statement "insert into origin_extrinsic_metadata (id, metadata, indexer_configuration_id, from_remd_id, metadata_tsvector, mappings)
```

https://sentry.softwareheritage.org/share/issue/52b06caae89f4235a758887fd6817656/

This was already mitigated by sorting before inserting in temporary tables, then expecting PostgreSQL to read from temporary tables in the same order rows were inserted. This is often true, but not guaranteed.

No test for this, because I do not see a way to replicate this more than existing deadlock tests do.
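
The ordering idea can be illustrated in plain Python. This is a sketch of the principle only, not the actual change to the SQL: `sorted_for_insert` is a hypothetical helper, and the key columns are an assumption taken from the column list in the error message above.

```python
def sorted_for_insert(rows, key_columns=("id", "indexer_configuration_id")):
    """Return rows in a deterministic order based on their key columns.

    If every writer inserts rows in the same order, two transactions
    cannot each hold a lock the other one is waiting for.
    """
    return sorted(rows, key=lambda row: tuple(row[col] for col in key_columns))


# Two writers receive the same rows in opposite orders.
batch_1 = [
    {"id": "B", "indexer_configuration_id": 1},
    {"id": "A", "indexer_configuration_id": 1},
]
batch_2 = list(reversed(batch_1))

order_1 = [row["id"] for row in sorted_for_insert(batch_1)]
order_2 = [row["id"] for row in sorted_for_insert(batch_2)]
```

Both writers end up inserting in the same order. The ordering has to be enforced at the point where the final table is written, since reading back from a temporary table does not guarantee insertion order.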
- Nov 21, 2022
-
-
vlorentz authored
REMD from deposits target a directory, with an origin in its context, so this workaround allows indexing deposits easily, without significantly changing swh-search.
-
vlorentz authored
Some snapshots are really large. Rather than fetching them entirely only to discard most of the branches, this commit only fetches some branches (to check existence and to use fewer queries on small snapshots), then requests specific branches as needed (usually only 2). This should improve performance and reduce timeout exceptions from the storage.
-
- Nov 03, 2022
-
-
Nicolas Dandrimont authored
This code was flushing Kafka messages and waiting for the brokers on every message, instead of doing so once per batch.
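
A minimal sketch of the per-batch flush, using a hypothetical in-memory stand-in rather than a real Kafka client. `RecordingProducer` and `write_batch` are illustrative names, not the project's API.

```python
class RecordingProducer:
    """Hypothetical stand-in for a Kafka producer that counts flush calls."""

    def __init__(self):
        self.sent = []
        self.flush_count = 0

    def produce(self, topic, value):
        # Queue the message; a real producer would buffer it for sending.
        self.sent.append((topic, value))

    def flush(self):
        # A real producer would block here until the brokers acknowledge.
        self.flush_count += 1


def write_batch(producer, topic, messages):
    """Queue every message, then flush once for the whole batch."""
    for message in messages:
        producer.produce(topic, message)
    producer.flush()


producer = RecordingProducer()
write_batch(producer, "example-topic", [b"a", b"b", b"c"])
```

Calling `flush()` inside the loop would wait for the brokers once per message. Moving it after the loop keeps the same delivery guarantee for the batch with a single wait.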
-
- Nov 02, 2022
-
- Oct 26, 2022
-
-
vlorentz authored
Codemeta reexports schema:url, schema:dateCreated, ... with `"@type": "@id"` and `"type": "schema:Date"` so that

```
{
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "url": "http://example.org",
    "dateCreated": "2022-10-26"
}
```

expands to:

```
{
    "http://schema.org/url": {
        "@type": "@id",
        "@value": "http://example.org"
    },
    "dateCreated": {
        "@type": "http://schema.org/Date",
        "@value": "2022-10-26"
    }
}
```

However, our translation tried to translate directly to a partially expanded form, like this:

```
{
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "url": {
        "@value": "http://example.org"
    },
    "dateCreated": {
        "@value": "2022-10-26"
    }
}
```

which prevents the compaction and expansion algorithms from adding a type themselves, causing the document to be compacted to:

```
{
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "schema:url": "http://example.org",
    "schema:dateCreated": "2022-10-26"
}
```

or expanded to:

```
{
    "http://schema.org/url": {
        "@value": "http://example.org"
    },
    "http://schema.org/dateCreated": {
        "@value": "2022-10-26"
    }
}
```

which are not what we want.

This commit replaces the hack for `@type` with the right solution that works for all properties.
- Oct 25, 2022
-
-
vlorentz authored
This is hopefully the definitive workaround for the PyLD issue.
-