- Mar 23, 2021
- Mar 08, 2021
-
-
Antoine Lambert authored
Due to an incomplete command passed to the HelpFormatter.format method, the text alignment in the usage_prefix variable was not the same as the command output generated by click. The tests started failing once the list of available loaders grew, because click wraps help lines at 80 columns.
-
- Mar 05, 2021
-
-
Antoine R. Dumont authored
The main change from using tar in the deposit has been released so we can flip from archive.zip to archive.tar. Related to T3094 Related to T3070
-
vlorentz authored
Needed to support swh-model 2.0.0.
-
- Mar 02, 2021
-
- Feb 25, 2021
-
-
Antoine R. Dumont authored
This will require a deployment configuration change to specify the default_filename. Related to T3070
-
vlorentz authored
Before this commit, load status 'success' and 'failed' updated the deposit status accordingly, but other statuses were ignored.
-
vlorentz authored
I don't see the point of it being a nested function; it just makes load() larger.
-
- Feb 19, 2021
-
-
Antoine R. Dumont authored
-
- Feb 16, 2021
-
-
Antoine R. Dumont authored
This also simplifies the docstring to make it more relevant, and drops the spurious {*args, **kwargs}, which are no longer used in the core loaders.
-
Antoine R. Dumont authored
This unifies and centralizes loader instantiation the same way the lister does.

This introduces a new base class swh.loader.core.loader.Loader for all loaders. Its only concern for now is to instantiate loaders from either a configuration dict or a configuration file. This simplifies instantiation in the celery task code and avoids duplicating the configuration load in each loader constructor.

The end goal is to simplify the future refactoring of the configuration. With this change, we will only have to adapt the Loader class when we start simplifying the configuration uniformly.

Also note that I mostly reused the equivalent `swh.lister.pattern.Lister.from_config*`. I did not refactor the common behavior (to avoid throwing another dependency in the mix). That could always be refactored later.

(inspired by both the work on listers and the configuration system work)

Related to T1410
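A minimal sketch of the instantiation pattern described above, not the actual swh.loader.core code. The method names `from_config` and `from_configfile` are assumed from the `Lister.from_config*` reference. `ExampleLoader`, the JSON file format and the explicit `path` argument are hypothetical, chosen so the example only needs the standard library.

```python
import json
from typing import Any, Dict


class Loader:
    """Simplified stand-in for a base loader class (illustration only)."""

    def __init__(self, storage: Dict[str, Any], **kwargs: Any) -> None:
        # Here the storage is kept as a plain configuration dict.
        self.storage = storage
        self.options = kwargs

    @classmethod
    def from_config(cls, storage: Dict[str, Any], **config: Any) -> "Loader":
        # Build a loader from an already-parsed configuration dict.
        return cls(storage=storage, **config)

    @classmethod
    def from_configfile(cls, path: str, **kwargs: Any) -> "Loader":
        # Read the configuration once, then delegate to from_config.
        # Assumption: the file is JSON and its path is passed explicitly.
        with open(path) as f:
            config = json.load(f)
        config.update(kwargs)  # per-call arguments override file values
        return cls.from_config(**config)


class ExampleLoader(Loader):
    """Hypothetical concrete loader taking an origin url."""

    def __init__(self, storage: Dict[str, Any], url: str, **kwargs: Any) -> None:
        super().__init__(storage, **kwargs)
        self.url = url


loader = ExampleLoader.from_config(
    storage={"cls": "memory"}, url="https://example.org/repo"
)
```

With this shape, task code only calls one of the two class methods, and a later change to configuration handling touches the base class alone.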
-
- Feb 11, 2021
-
-
Antoine R. Dumont authored
This also adds a comment to clarify the test mechanism for said scenarios.
-
Antoine R. Dumont authored
This will allow git, mercurial or svn loader implementations to mark the visit status as not_found when such an event occurs (e.g. failing to communicate with the remote server, archive not found, etc.). Technically, this opens a means to trap a NotFound exception from the main loop, which then finalizes the visit and marks its status as "not_found". Related to T3030
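A rough sketch of the trapping mechanism described above, assuming a much simplified main loop. `SketchLoader` and `fetch_data` are hypothetical names, the `NotFound` class here is a local stand-in, and the "full" status on success is an assumption.

```python
from typing import Callable, Optional


class NotFound(ValueError):
    """Stand-in exception: the origin is unreachable or does not exist."""


class SketchLoader:
    """Hypothetical loader reduced to the part that traps NotFound."""

    def __init__(self, fetch_data: Callable[[], None]) -> None:
        self.fetch_data = fetch_data
        self.visit_status: Optional[str] = None

    def load(self) -> str:
        try:
            # A concrete loader would talk to the remote server here.
            self.fetch_data()
        except NotFound:
            # Finalize the visit and record that the origin was not found.
            self.visit_status = "not_found"
        else:
            # Assumption: a visit that completes is marked "full".
            self.visit_status = "full"
        return self.visit_status


def missing_archive() -> None:
    raise NotFound("archive not found")


status = SketchLoader(missing_archive).load()
```

Other exceptions are deliberately left to propagate in this sketch; only the NotFound case is illustrated.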
-
Antoine R. Dumont authored
If any error happens during the communication with the origin to retrieve package information, the visit fails and its status is marked as not_found. Only the pypi, npm and nixguix package loaders are impacted. The other loaders do not read anything from their url, so there is no way to trigger this case. Note that the nixguix loader got refactored to avoid the side effect of reading data out of the url within the constructor. This was necessary so that the check fails and the visit status is dealt with as described. It also unifies it with how the pypi and npm loaders deal with the url communication. Related to T3030
-
- Feb 08, 2021
-
-
Antoine R. Dumont authored
When a visit ends up with no snapshot within the exception block, mark the visit as failed. Otherwise, let it be a partial visit. Related to T3030
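The rule above can be sketched as follows, under the assumption that the load runs as a sequence of steps which may each produce a snapshot. `visit`, `status_on_error` and the step functions are hypothetical names, and the "full" status on success is an assumption.

```python
from typing import Callable, Iterable, Optional

Step = Callable[[Optional[str]], Optional[str]]


def status_on_error(snapshot_id: Optional[str]) -> str:
    # No snapshot at all means the visit failed; otherwise it is partial.
    return "failed" if snapshot_id is None else "partial"


def visit(steps: Iterable[Step]) -> str:
    snapshot_id: Optional[str] = None
    try:
        for step in steps:
            # Each step receives the latest snapshot id and may return a new one.
            snapshot_id = step(snapshot_id)
    except Exception:
        # The exception block decides between "failed" and "partial".
        return status_on_error(snapshot_id)
    return "full"


def write_snapshot(previous: Optional[str]) -> Optional[str]:
    return "snapshot-1"


def broken_step(previous: Optional[str]) -> Optional[str]:
    raise RuntimeError("simulated loading error")
```

For instance, `visit([write_snapshot, broken_step])` ends as a partial visit, while `visit([broken_step])` ends as a failed one.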
-
- Feb 05, 2021
-
-
Antoine R. Dumont authored
When:
  - failure to communicate internally with the storage
  - absolutely no revision got loaded during a visit

Related to T3030
-
Antoine R. Dumont authored
Currently the latest storage (> 0.21) does it. We want to move away from that behavior and let the loaders explicitly set it. Related to T3030
-
- Feb 01, 2021
-
-
Antoine R. Dumont authored
-
- Jan 04, 2021
-
-
David Douard authored
-
- Nov 24, 2020
-
-
Antoine Lambert authored
-
- Nov 17, 2020
-
-
David Douard authored
-
- Nov 03, 2020
-
-
Nicolas Dandrimont authored
This will allow us to change the name of the id argument to target without going through a deprecation cycle.
-
Nicolas Dandrimont authored
This argument was deprecated in swh.model v0.7.2
-
- Oct 29, 2020
-
-
vlorentz authored
For consistency with the change in the previous commit, writing other extrinsic metadata on directories.
-
vlorentz authored
They are more useful on directories, as directory ids are more intrinsic than synthetic revision ids. This also adds the revision swhid in the context, so the revision relationship is still available when it's useful (e.g. because the same directory can be referenced from multiple revisions).
-
- Oct 27, 2020
-
-
Nicolas Dandrimont authored
This was deprecated in swh.objstorage 0.2.2.
-
- Oct 15, 2020
-
-
Antoine R. Dumont authored
This reverts a small part of 777ea446, which mistakenly introduced that conversion to json.
-
- Oct 13, 2020
-
-
vlorentz authored
Writing them on snapshot allowed us to write the raw metadata from the API, but it causes a lot of duplication; after running for only a couple of months, the metadata storage is already 700GB in size, mostly because of NPM metadata, but also because of these (e.g. many over 1MB each).

The metadata we wrote on snapshots was made of:
* intrinsic metadata that PyPI extracted from the last upload
* info on each file (sdist or otherwise)

We don't need to archive the former like this (as it is intrinsic). We keep loading the latter, but only for source files; extrinsic metadata for binary files is discarded, as it is not useful.
-
- Oct 05, 2020
-
-
vlorentz authored
Writing them on snapshot allowed us to write the raw metadata from the API, but it causes a lot of duplication; after running for only a couple of months, the metadata storage is already 700GB in size, mostly because of these (e.g. there are 150k over 1MB each).

The metadata we wrote on snapshots was made of:
* a 'versions' dict, whose content is moved to revisions
* a 'time' dict, with one timestamp per version, which is used as the data of revision objects
* 'dist-tags', which is currently ignored, but should be converted to ALIAS branches in a future commit
* a '_rev' property, which is internal to NPM, so not useful to archive
* everything else can be recomputed from the metadata of the latest version
-
vlorentz authored
Under some conditions, mypy can detect it is declared as Optional[str].
-
- Oct 02, 2020
-
-
Antoine R. Dumont authored
-
Antoine R. Dumont authored
Fix master build [1]

[1] https://jenkins.softwareheritage.org/job/DLDBASE/job/tests/1132/console
-
Stefano Zacchiroli authored
-
Antoine R. Dumont authored
-
Antoine R. Dumont authored
Related to T1532 Related to T1410 Related to D3965
-
Antoine R. Dumont authored
Related to T1532 Related to T1410 Related to D3965
-
- Oct 01, 2020
-
-
Antoine R. Dumont authored
-
Antoine R. Dumont authored
This keeps the metadata written in the revision in the same format as before though (json dict). Related to T2649
-