Handling missing DAG nodes
This is a long-standing and well-known issue, but I don't think a task has been opened for it yet.
When ingesting an origin, some nodes of the DAG may be missing, for various reasons:
- corrupted data (e.g. a commit in the git history does not match its hash)
- a directory must be found "somewhere else" (e.g. SVN externals, swh-loader-svn#611)
- revisions must be found "somewhere else" (e.g. Bazaar stacked branches)
- the ingestion of a (potentially large) repository might stop or crash after ingesting only some of its objects, and the repository might have disappeared by the time we try again
Currently, what happens is:
- if the missing object is a git object, then we know its sha1_git, and it is just a dangling reference (though this will become an issue when we want to implement generation numbers, swh-storage#1617)
- even in this (fortunate) case, other objects transitively referenced might remain completely unknown
- otherwise, the objects referencing the missing object cannot even be represented in the SWH data model (nor, recursively, can any object that references them)
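To make the two cases above concrete, here is a minimal sketch of a hash-identified DAG. It is not the swh-model or swh-storage API: `Node`, `node_id` and `dangling` are hypothetical names, and the hashing scheme is a simplified stand-in for the real intrinsic identifiers. It only shows why a missing child with a known identifier leaves a dangling reference, while a missing child with no identifier prevents its ancestors from being identified at all.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    """Hypothetical DAG node; not an SWH model class."""
    name: str
    # Identifiers of the children. None means the child's identifier
    # itself is unknown (e.g. content that lives "somewhere else").
    children: List[Optional[str]] = field(default_factory=list)


def node_id(node: Node) -> Optional[str]:
    """Return a hash over the node's name and child ids, or None if any
    child id is unknown (the node then cannot be identified)."""
    if any(child is None for child in node.children):
        return None
    h = hashlib.sha1()
    h.update(node.name.encode())
    for child in node.children:
        h.update(bytes.fromhex(child))
    return h.hexdigest()


def dangling(archive: Dict[str, Node]) -> Set[str]:
    """Return the ids referenced by archived nodes but absent from the archive."""
    return {
        child
        for node in archive.values()
        for child in node.children
        if child not in archive
    }


# Case 1: the missing object has a known id (as with a git object).
missing_id = hashlib.sha1(b"object we never received").hexdigest()
commit = Node("commit", [missing_id])
commit_id = node_id(commit)
archive = {commit_id: commit}  # storable, but with a dangling reference

# Case 2: the missing object has no known id at all.
directory = Node("directory-with-external", [None])
directory_id = node_id(directory)              # None: cannot be identified
revision = Node("revision", [directory_id])
revision_id = node_id(revision)                # None as well, recursively
```

In the first case `dangling(archive)` reports the missing identifier, so the gap is at least visible and can be filled later. In the second case neither the directory nor the revision gets an identifier, so nothing can be stored for them.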
Migrated from T1957 (view on Phabricator)