
package/utils: Improve downloaded filename extraction

This diff improves filename extraction for download URLs.

Two specific cases are considered, each in a dedicated commit:

  • requests follows URL redirections by default for GET requests, so when a redirection has been performed the filename should be extracted from the targeted URL rather than from the original one.

This should fix that kind of Sentry-reported issue.

  • some download URLs do not contain any filename and instead provide it in the "content-disposition" response header, so the filename is extracted from that header when it is available, to avoid possible file processing issues afterwards.

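For instance, this should fix the extraction of some tarballs downloaded by the opam loader, such as the GitHub archive below, whose URL ends with v2.0.2 while the response header provides the filename abella-2.0.2.tar.gz: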

```
anlambert@carnavalet:/tmp$ curl -i https://codeload.github.com/abella-prover/abella/tar.gz/v2.0.2
HTTP/2 200 
access-control-allow-origin: https://render.githubusercontent.com
content-disposition: attachment; filename=abella-2.0.2.tar.gz
content-security-policy: default-src 'none'; style-src 'unsafe-inline'; sandbox
content-type: application/x-gzip
etag: "66393ca915087abb7e474f0d976918630ebb8d23250de2bd70bab0752c01708a"
strict-transport-security: max-age=31536000
vary: Authorization,Accept-Encoding,Origin
x-content-type-options: nosniff
x-frame-options: deny
x-xss-protection: 1; mode=block
date: Tue, 14 Sep 2021 11:51:16 GMT
x-github-request-id: CE86:E15A:45CCD8:573AE8:61408CB3

Warning: Binary output can mess up your terminal. Use "--output -" to tell 
Warning: curl to output it to your terminal anyway, or consider "--output 
Warning: <FILE>" to save to a file.
```
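The two cases described above can be combined in a small helper. The following is a minimal sketch, not the code from this diff: `extract_filename` is a hypothetical name, and it assumes the caller passes the final URL after redirections (with requests, `response.url`) and the raw "content-disposition" header value (`response.headers.get("content-disposition")`). It reuses the standard library `email.message.Message.get_filename()` to parse the header parameters and falls back to the last path segment of the final URL.

```python
import posixpath
from email.message import Message
from typing import Optional
from urllib.parse import unquote, urlsplit


def extract_filename(final_url: str, content_disposition: Optional[str] = None) -> str:
    """Guess the name of a downloaded file (hypothetical helper, sketch only).

    final_url: URL after any redirections (assumed to be e.g. requests' response.url).
    content_disposition: raw "content-disposition" response header value, if any.
    """
    if content_disposition:
        # Reuse the stdlib MIME header parser: get_filename() reads the
        # "filename" parameter (including RFC 2231 encoded "filename*")
        # and returns None when the header carries no filename.
        msg = Message()
        msg["content-disposition"] = content_disposition
        filename = msg.get_filename()
        if filename:
            # Keep only the last path component so that a crafted header
            # cannot point outside the download directory.
            filename = posixpath.basename(filename)
            if filename:
                return filename
    # Fall back to the last path segment of the (possibly redirected) URL,
    # percent-decoded first so that an encoded "/" cannot end up in the name.
    return posixpath.basename(unquote(urlsplit(final_url).path))
```

With the GitHub archive shown above, the header wins over the URL:

    extract_filename(
        "https://codeload.github.com/abella-prover/abella/tar.gz/v2.0.2",
        "attachment; filename=abella-2.0.2.tar.gz",
    )  # "abella-2.0.2.tar.gz", whereas the URL alone gives "v2.0.2"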

Related to swh/infra/sysadm-environment#3468 (closed)


Migrated from D6252 (view on Phabricator)
