loader core should not rely on SHA1 only to decide whether some content is missing or not
The culprit lines seem to use only sha1 to decide whether some content is missing. So, in case of sha1 collision, we might end up //not// adding something to the storage, but believing we have done so. Instead, we should always check for the presence of something using //all// checksums at once, as we have discussed many times during the initial design of loaders.
This is not problematic yet, as it has not been deployed in production for any VCS yet. (The Git loader always uses the add API entrypoint of storage, which uses all checksums at once and will bail in case of collisions.) But it is a blocker for deploying loaders that use the base loader (e.g., SVN).
Also, we need to review if that code was around at the time we did the initial test injection of GNU tarballs and Debian Source packages. If so, we should double check our local mirrors for SHA1 collisions.
Migrated from T519 (view on Phabricator)