
maven: Remove extraction of groupId and artifactId from pom files

When parsing pom files, we are only interested in extracting a VCS URL (git, hg, svn) in order to create the associated loading tasks.

The groupId and artifactId are not used by the lister in that case, so it is better to stop extracting them. This also prevents errors when that information is missing from pom files.
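To illustrate the idea, here is a minimal sketch of extracting only the SCM connection from a pom file without ever reading groupId or artifactId. It assumes the standard library's `xml.etree.ElementTree` rather than whatever parser the lister actually uses. The function names `extract_scm_url` and `split_scm_connection` are hypothetical, and only `scm:git:`, `scm:hg:` and `scm:svn:` connection strings are handled.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Default XML namespace declared by Maven pom files.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Hypothetical filter: Maven SCM connection strings look like
# "scm:<provider>:<provider-specific URL>"; only git, hg and svn are kept.
SCM_RE = re.compile(r"^scm:(git|hg|svn):(.+)$")


def extract_scm_url(pom_xml: str) -> Optional[str]:
    """Return the <scm><connection> value of a pom file, or None.

    groupId and artifactId are deliberately not read, so a pom file that
    omits them cannot trigger a KeyError here.
    """
    try:
        root = ET.fromstring(pom_xml)
    except ET.ParseError:
        return None
    # Try the namespaced layout first, then a pom without a namespace.
    for path, ns in (("m:scm/m:connection", POM_NS), ("scm/connection", {})):
        node = root.find(path, ns)
        if node is not None and node.text:
            return node.text.strip()
    return None


def split_scm_connection(connection: str) -> Optional[Tuple[str, str]]:
    """Split "scm:git:https://..." into ("git", "https://...")."""
    match = SCM_RE.match(connection)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

A pom file lacking groupId, such as `<project><scm><connection>scm:git:https://example.org/repo.git</connection></scm></project>`, then yields `("git", "https://example.org/repo.git")` instead of failing, and a pom file with no `<scm>` section simply yields `None`.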

See, for instance, this error when listing the JBoss Maven repository:

swh-lister_1                        | [2022-04-29 09:02:04,598: INFO/ForkPoolWorker-1] Fetching URL https://repository.jboss.org/maven2/org/jboss/ejb3/jboss-ejb3-tutorial-enterprise_webapp/0.1.0/jboss-ejb3-tutorial-enterprise_webapp-0.1.0.pom with params {}
swh-lister_1                        | [2022-04-29 09:02:04,748: ERROR/ForkPoolWorker-1] Task swh.lister.maven.tasks.FullMavenLister[45b54b16-ed7a-4b9c-80a3-b8adb25b8fe0] raised unexpected: KeyError('groupId')
swh-lister_1                        | Traceback (most recent call last):
swh-lister_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/celery/app/trace.py", line 451, in trace_task
swh-lister_1                        |     R = retval = fun(*args, **kwargs)
swh-lister_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/scheduler/task.py", line 61, in __call__
swh-lister_1                        |     result = super().__call__(*args, **kwargs)
swh-lister_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/celery/app/trace.py", line 734, in __protected_call__
swh-lister_1                        |     return self.run(*args, **kwargs)
swh-lister_1                        |   File "/src/swh-lister/swh/lister/maven/tasks.py", line 16, in list_maven_full
swh-lister_1                        |     return lister.run().dict()
swh-lister_1                        |   File "/src/swh-lister/swh/lister/pattern.py", line 127, in run
swh-lister_1                        |     for page in self.get_pages():
swh-lister_1                        |   File "/src/swh-lister/swh/lister/maven/lister.py", line 256, in get_pages
swh-lister_1                        |     gid = project_d["groupId"]
swh-lister_1                        | KeyError: 'groupId'

Migrated from D7715 (view on Phabricator)
