- Aug 10, 2022
-
-
vlorentz authored
They cause PostgreSQL to crash because it does not allow them in text fields. They seem to be present only accidentally in source documents, so stripping them does not really impact the quality of metadata.
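A minimal sketch of this kind of stripping, assuming the characters in question are NUL characters (U+0000), which PostgreSQL rejects in text values; the commit title is not shown here, so that is an assumption. The helper name `strip_nul` is hypothetical and is not the indexer's actual code.

```python
from typing import Any


def strip_nul(value: Any) -> Any:
    """Recursively remove NUL characters from every string in a JSON-like value.

    Hypothetical helper for illustration; assumes the problematic
    characters are NUL characters (U+0000).
    """
    if isinstance(value, str):
        return value.replace("\x00", "")
    if isinstance(value, list):
        return [strip_nul(item) for item in value]
    if isinstance(value, dict):
        return {strip_nul(key): strip_nul(item) for key, item in value.items()}
    # Numbers, booleans and None cannot contain NUL characters.
    return value


# Made-up metadata document with stray NUL characters.
doc = {"name": "exam\x00ple", "keywords": ["a\x00", "b"], "stars": 3}
cleaned = strip_nul(doc)
```

Only strings are modified, so the structure of the document and its non-text values are left untouched.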
-
- Aug 08, 2022
-
-
vlorentz authored
When capture_exceptions=False, the indexer's caller reports the exception to Sentry itself. However, because indexers added their tags within a scope internal to the indexers, that scope was closed before returning to the caller, so these tags were never sent to Sentry.
-
vlorentz authored
Without these tags, it is often impossible to find what object caused a given crash without guesswork based on the object's content and swh-graph.
-
vlorentz authored
-
vlorentz authored
-
- Aug 04, 2022
-
-
vlorentz authored
Invalid URLs are a common source of crashes
-
- Aug 03, 2022
-
-
vlorentz authored
-
- Jul 29, 2022
-
-
Antoine R. Dumont authored
Detected through T4406. Related to T4412.
-
Antoine R. Dumont authored
So that services deployed in the future can match on that name. Related to T4406
-
- Jul 22, 2022
-
-
Antoine R. Dumont authored
The language indexer has not been running in production for years (and the related indexer code was pruned years ago as well). The related endpoints are not consumed by anyone, so we can drop them. That will ease code maintenance and save some CI time when running the full test suite. Related to T4273
-
Antoine R. Dumont authored
It has not been running in production for years. Dropping it will ease code maintenance. Related to T4273
-
Antoine R. Dumont authored
Related to T4273
-
Antoine R. Dumont authored
-
Antoine R. Dumont authored
Related to T4273
-
- Jul 21, 2022
- Jul 12, 2022
-
-
Satvik authored
-
- Jul 11, 2022
-
-
vlorentz authored
-
- Jul 06, 2022
-
-
vlorentz authored
1. indexers call themselves directly instead of going through the scheduler
2. metadata is attached to directories instead of revisions
-
- Jul 05, 2022
-
-
vlorentz authored
-
vlorentz authored
-
vlorentz authored
-
vlorentz authored
It makes resulting documents (usually) shorter, and tests more readable.
-
vlorentz authored
detect_metadata_files and extrinsic_metadata_formats (respectively) are somewhat mutually exclusive, so it does not make much sense to have them in the same class and MAPPINGS dict
-
vlorentz authored
It is already set by _translate_dict itself.
-
- Jul 04, 2022
-
-
Satvik authored
-
Satvik authored
-
vlorentz authored
Which calls the GitHub mapping for RawExtrinsicMetadata objects coming from GitHub
-
vlorentz authored
-
vlorentz authored
-
vlorentz authored
-
vlorentz authored
-
vlorentz authored
This introduces the scaffolding for extrinsic metadata mappings
-
vlorentz authored
We have many of those now, and keeping all their tests in the same file is messy. This causes these tests to run after Celery tests, which breaks them, so this commit also renames Celery tests to make them run last.
-
vlorentz authored
-
vlorentz authored
Extrinsic metadata indexers will not use a 'file' as input, but will typically use RawExtrinsicMetadata containing formats in JSON.
-
vlorentz authored
This also moves the call to `detect_metadata()` to `translate_directory_intrinsic_metadata` so type annotations make more sense, and removes a dead/broken code branch in `DirectoryMetadataIndexer.index()` that was detected by mypy.
-
- Jun 29, 2022
-
-
Satvik authored
-
- Jun 28, 2022
-
-
vlorentz authored
_STORAGE_CLASSES will be renamed to OBJSTORAGE_IMPLEMENTATIONS.
-