save code now: also add new origins for unknown repos

When a Save code now request targets an origin we don't know yet, we schedule a one-shot task for its ingestion, but we don't register the origin for future recurring crawling. It might make sense to do both.

This is possibly also the only reasonable place to apply heuristics that de-duplicate URLs pointing to the same repository, e.g., non-canonical GitHub repository URLs.
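As an illustration of the kind of heuristic meant here, below is a minimal sketch that maps common GitHub URL variants to one canonical form. The function name `canonicalize_github_url` is hypothetical and not part of any existing Software Heritage module. The chosen canonical form (lowercase `https://github.com/owner/repo`, no `.git` suffix, no trailing path) is an assumption for the example, relying on GitHub treating owner and repository names case-insensitively.

```python
import re
from urllib.parse import urlsplit


def canonicalize_github_url(url: str) -> str:
    """Hypothetical helper: return one canonical URL for GitHub repositories.

    URLs that do not point at github.com are returned unchanged (stripped).
    """
    url = url.strip()

    # scp-like SSH syntax, e.g. git@github.com:owner/repo.git
    match = re.fullmatch(r"git@github\.com:(?P<path>.+)", url, flags=re.IGNORECASE)
    if match:
        path = match.group("path")
    else:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
        if host not in ("github.com", "www.github.com"):
            return url
        path = parts.path

    # Keep only the owner and repository segments; drop anything after them.
    segments = [s for s in path.split("/") if s]
    if len(segments) < 2:
        return url
    owner, repo = segments[0], segments[1]
    if repo.lower().endswith(".git"):
        repo = repo[:-4]
    if not repo:
        return url

    # Assumption: owner and repository names are case-insensitive on GitHub.
    return f"https://github.com/{owner.lower()}/{repo.lower()}"
```

With such a helper, a Save code now request could be normalized before checking whether the origin is already known, so that variants like `git@github.com:Owner/Repo.git` and `https://github.com/owner/repo/` end up as the same origin.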

(Thanks @singpolyma for the heads-up.)

Related to T1110
Related to swh-model#2187


Migrated from T1524 (view on Phabricator)

Edited by Phabricator Migration user