nixguix: Improve is_tarball detection pattern
This now includes all query parameter values as paths to check. When a path has an extension, it is pattern-matched against the tarball patterns. When no extension is detected, the behavior is unchanged: it falls back to a HEAD query on the URL to get more information about the file.
Prior to this commit, this only looked at the values of a hard-coded list of keys (file, f, name, url) that had been detected through docker runs. The new approach should reduce future misdetections when new, unknown keys show up in the wild.
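For illustration, the detection described above could be sketched roughly as follows. This is a minimal sketch, not the lister's actual code: the names `TARBALL_EXTENSIONS`, `candidate_paths` and `is_tarball_sketch` are hypothetical, the extension list is an assumed subset, and the HEAD fallback is only signalled by returning `None` rather than performed.

```python
from pathlib import PurePosixPath
from typing import List, Optional
from urllib.parse import parse_qsl, urlparse

# Assumed, illustrative subset; the real pattern set may differ.
TARBALL_EXTENSIONS = (
    ".tar", ".tar.gz", ".tgz", ".tar.bz2", ".tbz2", ".tar.xz", ".txz", ".zip",
)


def candidate_paths(url: str) -> List[str]:
    """Return the URL path plus every query parameter value, whatever its key."""
    parsed = urlparse(url)
    values = [value for _key, value in parse_qsl(parsed.query)]
    return [parsed.path, *values]


def is_tarball_sketch(url: str) -> Optional[bool]:
    """Decide from extensions alone.

    Returns True or False when at least one candidate path has an extension,
    and None when none does, meaning a HEAD query would be needed.
    """
    with_ext = [p for p in candidate_paths(url) if PurePosixPath(p).suffix]
    if not with_ext:
        return None  # no extension anywhere: caller falls back to a HEAD query
    # str.endswith accepts a tuple, so this checks all assumed patterns at once.
    return any(p.lower().endswith(TARBALL_EXTENSIONS) for p in with_ext)
```

Because no key is inspected, a value under a previously unseen key such as `foo` is checked the same way as one under `file` or `url`. Note that this simplified suffix test would also treat a version-like value such as `v1.0` as having an extension, which a real implementation would likely need to handle.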
Related to T3781
Migrated from D8626 (view on Phabricator)
Activity
Build is green
Patch application report for D8626 (id=31138)
Rebasing onto 2ee103e2...
Current branch diff-target is up to date.
Changes applied before test
commit 202a571dee7c14fba37729104e707e85f2d25cef
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date: Wed Oct 5 11:52:43 2022 +0200

nixguix: Refactor is_tarball to simplify

Related to [T3781](https://forge.softwareheritage.org/T3781 'view original for T3781 on Phabricator')
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/756/ for more details.
Some references in the commit message have been migrated:
- T3781 is now swh/meta#3781 (closed)
Build is green
Patch application report for D8626 (id=31142)
Rebasing onto 2ee103e2...
Current branch diff-target is up to date.
Changes applied before test
commit 3da112ce060a2de9178cf62b79eba4558c928f0e
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date: Wed Oct 5 11:52:43 2022 +0200

nixguix: Improve is_tarball detection pattern

This actually includes all query param values as paths to check. It then checks for file pattern matching against "tarball" patterns. When no extension is detected, it's doing as before, fallbacks to head query the url to have more information on the file.

Prior to this commit, this only looked over a hard-coded list of keys (file, f, name, url) detected through docker runs. This way of doing it should decrease future misdetections.

Related to [T3781](https://forge.softwareheritage.org/T3781 'view original for T3781 on Phabricator')
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/757/ for more details.
Build is green
Patch application report for D8626 (id=31143)
Rebasing onto 2ee103e2...
Current branch diff-target is up to date.
Changes applied before test
commit f2377c283ac542a5b492a9d75ccce6d86b07c54a
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date: Wed Oct 5 11:52:43 2022 +0200

nixguix: Improve is_tarball detection pattern

This actually includes all query param values as paths to check. When paths have extensions, it then pattern matches against tarballs if any. When no extension is detected, it's doing as before, fallbacks to head query the url to have more information on the file.

Prior to this commit, this only looked over a hard-coded list of values (for hard-coded keys: file, f, name, url) detected through docker runs. This way of doing it should decrease future misdetections (when new unknown "keys" show up in the wild).

Related to [T3781](https://forge.softwareheritage.org/T3781 'view original for T3781 on Phabricator')
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/758/ for more details.
mentioned in issue swh/meta#3781 (closed)