pypi: Use BeautifulSoup for parsing HTML instead of xmltodict
Another issue found while retesting the listers locally: xmltodict now raises
an error when parsing the HTML content of the https://pypi.org/simple/ page,
see below:
Traceback (most recent call last):
  File "/home/anlambert/.virtualenvs/swh/bin/swh", line 8, in <module>
    sys.exit(main())
  File "/home/anlambert/.virtualenvs/swh/lib/python3.7/site-packages/swh/core/cli/__init__.py", line 135, in main
    return swh(auto_envvar_prefix="SWH")
  File "/home/anlambert/.virtualenvs/swh/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/anlambert/.virtualenvs/swh/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/anlambert/.virtualenvs/swh/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/anlambert/.virtualenvs/swh/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/anlambert/.virtualenvs/swh/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/anlambert/.virtualenvs/swh/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/anlambert/.virtualenvs/swh/lib/python3.7/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/anlambert/swh/swh-environment/swh-lister/swh/lister/cli.py", line 65, in run
    get_lister(lister, **config).run()
  File "/home/anlambert/swh/swh-environment/swh-lister/swh/lister/pattern.py", line 121, in run
    for page in self.get_pages():
  File "/home/anlambert/swh/swh-environment/swh-lister/swh/lister/pypi/lister.py", line 57, in get_pages
    page_xmldict = xmltodict.parse(response.text)
  File "/home/anlambert/.virtualenvs/swh/lib/python3.7/site-packages/xmltodict.py", line 327, in parse
    parser.Parse(xml_input, True)
xml.parsers.expat.ExpatError: mismatched tag: line 6, column 4
So use the BeautifulSoup HTML parser instead, as it is already a requirement
of swh-lister and it does not fail when parsing the PyPI HTML page.

Also drop the no longer used xmltodict from the requirements.
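
For illustration, a minimal sketch of the difference between the two parsing
approaches. It is not the actual lister code: SAMPLE_HTML is a made-up stand-in
for the PyPI simple index, and the helper names strict_xml_parse_fails and
extract_package_names are hypothetical. The strict parse uses expat directly,
which is the parser that raises in the traceback above.

```python
from typing import List
from xml.parsers import expat

from bs4 import BeautifulSoup

# Made-up sample, not the real page: the <meta> element is never closed,
# which is valid HTML but not well-formed XML.
SAMPLE_HTML = """<!DOCTYPE html>
<html>
  <head>
    <meta name="pypi:repository-version" content="1.0">
    <title>Simple index</title>
  </head>
  <body>
    <a href="/simple/requests/">requests</a>
    <a href="/simple/click/">click</a>
  </body>
</html>
"""


def strict_xml_parse_fails(html: str) -> bool:
    """Return True if a strict XML (expat) parse rejects the document."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(html, True)
    except expat.ExpatError:
        return True
    return False


def extract_package_names(html: str) -> List[str]:
    """Collect the text of every <a> element using a lenient HTML parser."""
    soup = BeautifulSoup(html, "html.parser")
    return [link.text for link in soup.find_all("a")]


if __name__ == "__main__":
    print(strict_xml_parse_fails(SAMPLE_HTML))
    print(extract_package_names(SAMPLE_HTML))
```

The "html.parser" backend is chosen here only because it ships with Python;
which backend the lister should use is a separate decision.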
Migrated from D5027 (view on Phabricator)