package/archive: Handle tarball artifact with null time
An artifact without time information can be provided in the artifacts list parameter of the loader.
For instance, the last modification date is not available for tarballs coming from GitHub tags
(the date header below corresponds to the request time, not the tarball's last modification).
$ curl -Li https://github.com/chromium/chromium/archive/refs/tags/104.0.5106.1.tar.gz
HTTP/2 302
server: GitHub.com
date: Tue, 07 Jun 2022 13:10:44 GMT
content-type: text/html; charset=utf-8
vary: X-PJAX, X-PJAX-Container, Turbo-Visit, Turbo-Frame, Accept-Encoding, Accept, X-Requested-With
permissions-policy: interest-cohort=()
location: https://codeload.github.com/chromium/chromium/tar.gz/refs/tags/104.0.5106.1
cache-control: max-age=0, private
strict-transport-security: max-age=31536000; includeSubdomains; preload
x-frame-options: deny
x-content-type-options: nosniff
x-xss-protection: 0
referrer-policy: no-referrer-when-downgrade
expect-ct: max-age=2592000, report-uri="https://api.github.com/_private/browser/errors"
content-security-policy: default-src 'none'; base-uri 'self'; block-all-mixed-content; child-src github.com/assets-cdn/worker/ gist.github.com/assets-cdn/worker/; connect-src 'self' uploads.github.com objects-origin.githubusercontent.com www.githubstatus.com collector.github.com raw.githubusercontent.com api.github.com github-cloud.s3.amazonaws.com github-production-repository-file-5c1aeb.s3.amazonaws.com github-production-upload-manifest-file-7fdce7.s3.amazonaws.com github-production-user-asset-6210df.s3.amazonaws.com cdn.optimizely.com logx.optimizely.com/v1/events *.actions.githubusercontent.com wss://*.actions.githubusercontent.com online.visualstudio.com/api/v1/locations github-production-repository-image-32fea6.s3.amazonaws.com github-production-release-asset-2e65be.s3.amazonaws.com insights.github.com wss://alive.github.com; font-src github.githubassets.com; form-action 'self' github.com gist.github.com objects-origin.githubusercontent.com; frame-ancestors 'none'; frame-src render.githubusercontent.com viewscreen.githubusercontent.com notebooks.githubusercontent.com; img-src 'self' data: github.githubassets.com identicons.github.com github-cloud.s3.amazonaws.com secured-user-images.githubusercontent.com/ github-production-user-asset-6210df.s3.amazonaws.com *.githubusercontent.com; manifest-src 'self'; media-src github.com user-images.githubusercontent.com/; script-src github.githubassets.com; style-src 'unsafe-inline' github.githubassets.com; worker-src github.com/assets-cdn/worker/ gist.github.com/assets-cdn/worker/
content-length: 0
x-github-request-id: swh/devel/swh-lister!12:4A4C:9CBB6E:BCAB87:629F4E54
HTTP/2 200
access-control-allow-origin: https://render.githubusercontent.com
content-disposition: attachment; filename=chromium-104.0.5106.1.tar.gz
content-security-policy: default-src 'none'; style-src 'unsafe-inline'; sandbox
content-type: application/x-gzip
etag: "2ebec60c73390de10b6e84d75838466d939f03a7b468f10873c9023f549a5242"
strict-transport-security: max-age=31536000
vary: Authorization,Accept-Encoding,Origin
x-content-type-options: nosniff
x-frame-options: deny
x-xss-protection: 1; mode=block
date: Tue, 07 Jun 2022 13:10:45 GMT
x-github-request-id: 867A:7031:7EED7:179E4C:629F4E54
Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.
That case was not handled by the archive loader, which resulted in the loading error shown below, so add a fix for it.
swh-loader_1 | [2022-06-07 10:00:56,998: INFO/MainProcess] Task swh.loader.package.archive.tasks.LoadArchive[d61d54e5-3163-439a-95a5-2ab57bd75a7d] received
swh-loader_1 | [2022-06-07 10:00:57,001: DEBUG/ForkPoolWorker-1] Loading config file /loader.yml
swh-loader_1 | [2022-06-07 10:00:59,059: DEBUG/ForkPoolWorker-1] last snapshot: None
swh-loader_1 | [2022-06-07 10:00:59,064: DEBUG/ForkPoolWorker-1] package_info: ArchivePackageInfo(url='https://github.com/chromium/chromium/archive/refs/tags/104.0.5106.1.tar.gz', filename='104.0.5106.1.tar.gz', version='104.0.5106.1', directory_extrinsic_metadata=[], raw_info={'url': 'https://github.com/chromium/chromium/archive/refs/tags/104.0.5106.1.tar.gz', 'time': None, 'length': None, 'version': '104.0.5106.1'}, length=None, time=None)
swh-loader_1 | [2022-06-07 10:01:00,790: DEBUG/ForkPoolWorker-1] filename: 104.0.5106.1.tar.gz
swh-loader_1 | [2022-06-07 10:01:00,791: DEBUG/ForkPoolWorker-1] filepath: /tmp/tmpgnd1w9fy/104.0.5106.1.tar.gz
swh-loader_1 | [2022-06-07 10:08:40,664: DEBUG/ForkPoolWorker-1] extrinsic_metadata
swh-loader_1 | [2022-06-07 10:10:02,826: DEBUG/ForkPoolWorker-1] uncompressed_path: /tmp/tmpgnd1w9fy/src
swh-loader_1 | [2022-06-07 10:11:38,076: DEBUG/ForkPoolWorker-1] Number of skipped contents: 0
swh-loader_1 | [2022-06-07 10:11:38,076: DEBUG/ForkPoolWorker-1] Number of contents: 367501
swh-loader_1 | [2022-06-07 10:11:38,558: DEBUG/ForkPoolWorker-1] Flushing 367501 objects of type content (3423607967 bytes)
swh-loader_1 | [2022-06-07 10:32:41,504: DEBUG/ForkPoolWorker-1] Number of directories: 34530
swh-loader_1 | [2022-06-07 10:32:41,542: DEBUG/ForkPoolWorker-1] Flushing 34530 objects of type directory (432087 entries)
swh-loader_1 | [2022-06-07 10:33:20,750: ERROR/ForkPoolWorker-1] Failed to load branch releases/104.0.5106.1 for https://github.com/chromium/chromium/tags
swh-loader_1 | Traceback (most recent call last):
swh-loader_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/package/loader.py", line 648, in load
swh-loader_1 | res = self._load_release(p_info, origin)
swh-loader_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/package/loader.py", line 826, in _load_release
swh-loader_1 | p_info, uncompressed_path, directory=directory.hash
swh-loader_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/package/archive/loader.py", line 148, in build_release
swh-loader_1 | normalized_time = TimestampWithTimezone.from_datetime(parsed_time)
swh-loader_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/model/model.py", line 488, in from_datetime
swh-loader_1 | return cls.from_dict(dt)
swh-loader_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/model/model.py", line 482, in from_dict
swh-loader_1 | f"TimestampWithTimezone.from_dict received non-integer timestamp: "
swh-loader_1 | ValueError: TimestampWithTimezone.from_dict received non-integer timestamp: None
swh-loader_1 | [2022-06-07 10:33:20,752: DEBUG/ForkPoolWorker-1] default version: 104.0.5106.1
swh-loader_1 | [2022-06-07 10:33:20,755: DEBUG/ForkPoolWorker-1] extra branches: {}
swh-loader_1 | [2022-06-07 10:33:20,755: DEBUG/ForkPoolWorker-1] releases: {'104.0.5106.1': []}
swh-loader_1 | [2022-06-07 10:33:20,755: DEBUG/ForkPoolWorker-1] snapshot: {'branches': {}}
swh-loader_1 | [2022-06-07 10:33:20,755: DEBUG/ForkPoolWorker-1] snapshot: Snapshot(branches=ImmutableDict({}), id=hash_to_bytes('1a8893e6a86f444e8be8e7bda6cb34fb1735a00e'))
swh-loader_1 | [2022-06-07 10:33:20,755: DEBUG/ForkPoolWorker-1] Flushing 1 objects of type snapshot
swh-loader_1 | [2022-06-07 10:33:22,355: WARNING/ForkPoolWorker-1] 1 failed branches
swh-loader_1 | [2022-06-07 10:33:22,356: WARNING/ForkPoolWorker-1] Failed branches: releases/104.0.5106.1
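The traceback above shows the time conversion being called with a None value. A minimal sketch of the kind of guard that avoids this follows. It assumes the artifact time is either None, an ISO 8601 string, or a datetime. The helper name `normalize_artifact_time` is hypothetical and is not the loader's actual code. The conversion to `TimestampWithTimezone` is left out so the example only needs the standard library.

```python
from datetime import datetime, timezone
from typing import Optional, Union


def normalize_artifact_time(
    time: Union[str, datetime, None]
) -> Optional[datetime]:
    """Return a timezone-aware datetime, or None when no time is known.

    Hypothetical helper for illustration only.
    """
    if time is None:
        # No last modification date available (e.g. GitHub tag tarballs):
        # skip the conversion instead of raising.
        return None
    parsed = datetime.fromisoformat(time) if isinstance(time, str) else time
    if parsed.tzinfo is None:
        # Assumption: naive timestamps are interpreted as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


# An artifact with 'time': None no longer triggers a ValueError.
print(normalize_artifact_time(None))
print(normalize_artifact_time("2022-06-07T13:10:44+00:00"))
```

With such a guard, the caller can build the release with a null date when the helper returns None, rather than failing the whole branch.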
Migrated from D7967 (view on Phabricator)