
maven: Handle null mtime value in index for jar archive

In some cases the modification time of a jar archive in a maven index is null, which caused a processing error in the lister.

Handle that case so the listing process does not exit prematurely.

swh-lister_1                        | [2022-04-29 10:26:13,222: DEBUG/ForkPoolWorker-1] * Yielding jar http://apps.geomajas.org/nexus/content/repositories/public/org/mobicents/protocols/mgcp/mgcp-impl/2.0.0.GA/mgcp-impl-2.0.0.GA-sources.jar: {'type': 'maven', 'url': 'http://apps.geomajas.org/nexus/content/repositories/public/org/mobicents/protocols/mgcp/mgcp-impl/2.0.0.GA/mgcp-impl-2.0.0.GA-sources.jar', 'doc': 547574, 'gid': 'org.mobicents.protocols.mgcp', 'aid': 'mgcp-impl', 'version': '2.0.0.GA', 'time': 0}
swh-lister_1                        | [2022-04-29 10:26:13,227: ERROR/ForkPoolWorker-1] Task swh.lister.maven.tasks.FullMavenLister[6551d966-a28e-42fb-9efb-fb56e48093f8] raised unexpected: ValueError("invalid literal for int() with base 10: ''")
swh-lister_1                        | Traceback (most recent call last):
swh-lister_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/celery/app/trace.py", line 451, in trace_task
swh-lister_1                        |     R = retval = fun(*args, **kwargs)
swh-lister_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/scheduler/task.py", line 61, in __call__
swh-lister_1                        |     result = super().__call__(*args, **kwargs)
swh-lister_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/celery/app/trace.py", line 734, in __protected_call__
swh-lister_1                        |     return self.run(*args, **kwargs)
swh-lister_1                        |   File "/src/swh-lister/swh/lister/maven/tasks.py", line 16, in list_maven_full
swh-lister_1                        |     return lister.run().dict()
swh-lister_1                        |   File "/src/swh-lister/swh/lister/pattern.py", line 130, in run
swh-lister_1                        |     full_stats.origins += self.send_origins(origins)
swh-lister_1                        |   File "/src/swh-lister/swh/lister/pattern.py", line 233, in send_origins
swh-lister_1                        |     for batch_origins in grouper(origins, n=1000):
swh-lister_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/utils.py", line 53, in grouper
swh-lister_1                        |     for _data in itertools.zip_longest(*args, fillvalue=stop_value):
swh-lister_1                        |   File "/src/swh-lister/swh/lister/maven/lister.py", line 309, in get_origins_from_page
swh-lister_1                        |     last_update_dt = datetime.fromtimestamp(int(last_update_seconds))
swh-lister_1                        | ValueError: invalid literal for int() with base 10: ''
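
The failing call in the traceback is `datetime.fromtimestamp(int(last_update_seconds))`, which raises when the value is an empty string. A minimal sketch of one way to guard it follows. The helper name `parse_last_update` is hypothetical, and returning `None` for a missing mtime and using a UTC-aware datetime are assumptions, not necessarily what the actual patch does.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_last_update(last_update_seconds: Optional[str]) -> Optional[datetime]:
    """Hypothetical helper: convert an mtime in seconds to a datetime.

    Returns None when the index carries no usable mtime (None or empty
    string), instead of letting int('') raise ValueError.
    """
    if last_update_seconds is None or last_update_seconds == "":
        return None
    # Assumption: interpret the timestamp as UTC rather than local time.
    return datetime.fromtimestamp(int(last_update_seconds), tz=timezone.utc)


# An empty value no longer aborts the listing; a valid one still converts.
print(parse_last_update(""))
print(parse_last_update("0"))
```

With a guard like this, the caller can skip setting the last update date for that jar and keep yielding the remaining origins.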

Related to T3874


Migrated from D7716 (view on Phabricator)
