properly handle ingestion of archives within archives (recursive extraction)
Currently, HAL accepts software deposits in .zip or .tar.gz formats, but if the deposit is not in .zip format, it wraps it into a .zip before sending it to us. This fools our ingestion process: the deposit enters the Merkle tree as a single .tar.gz blob instead of as its contents.
This is clearly an error on the HAL side, and we will try to have it fixed there. But as software deposits become more widespread, we may be confronted with many mistakes like this one, and while we wait for them to be fixed, we pollute our archive. We need to decide whether to fix this behavior on our side by recursively opening wrappers, or to simply reject the deposit when we detect the double wrapping.
Wrappers should be easy to spot: a .zip file containing just a .tar.gz file, a .tar file containing just a .tar file, and so on.
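As a rough illustration of that detection rule, here is a minimal sketch using only the Python standard library. The function names (`single_inner_archive`, `unwrap`), the suffix list, and the depth limit are assumptions made for this example and are not part of the existing loader code. It treats an archive as a wrapper only when it holds exactly one member, that member is a regular file, and its name looks like an archive.

```python
import io
import tarfile
import zipfile
from typing import Optional

# Hypothetical list of suffixes we treat as "an archive"; adjust as needed.
ARCHIVE_SUFFIXES = (".zip", ".tar", ".tar.gz", ".tgz")


def single_inner_archive(data: bytes) -> Optional[bytes]:
    """Return the bytes of the sole inner archive if `data` is a pure
    wrapper (exactly one member, and that member is an archive), else None."""
    # Try tar first (plain or compressed): a tar holding a stored zip can
    # otherwise be mistaken for a zip by zipfile.is_zipfile.
    try:
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tf:
            members = tf.getmembers()
            if (
                len(members) == 1
                and members[0].isfile()
                and members[0].name.lower().endswith(ARCHIVE_SUFFIXES)
            ):
                return tf.extractfile(members[0]).read()
            return None
    except tarfile.TarError:
        pass  # not a tar archive, fall through to the zip check

    if zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            infos = zf.infolist()
            if (
                len(infos) == 1
                and not infos[0].is_dir()
                and infos[0].filename.lower().endswith(ARCHIVE_SUFFIXES)
            ):
                return zf.read(infos[0])
    return None


def unwrap(data: bytes, max_depth: int = 5) -> bytes:
    """Peel off wrapper layers until the real archive is reached.
    The depth limit is an arbitrary guard against pathological nesting."""
    for _ in range(max_depth):
        inner = single_inner_archive(data)
        if inner is None:
            return data
        data = inner
    raise ValueError("too many nested wrapper archives")
```

The same check supports either policy discussed above: call `unwrap` to open wrappers recursively, or reject the deposit whenever `single_inner_archive` returns something other than `None`.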
Migrated from T1122 (view on Phabricator)