[FAQ] Proposed answer about the impact of a project migration
@bchauvet @moranegg Could you please review the proposed answer below? Thanks in advance!
Question
Suppose someone submits a piece of software that is hosted, for example, on github/bidule today. A few years later, the same software changes "parent company" and is now hosted on gitlab/truc.
How does SWH deal with such cases? Do we have to save the software again? If so, is there a new SWHID? Is there a way to indicate that it's a migration?
Source
- ambassador
- question retrieved from the email info
Audience & Location - which FAQ to use?
- visitor - on website
- user - in user documentation (the contents already exist)
- contributor - in devel documentation
Answer draft
The question breaks down into the following sub-questions:
- How does SWH deal with such cases?
- Do we have to save the software again?
- So is there a new SWHID?
- Is there a way to say if it’s a migration?
When several projects share identical files (same contents, same version of the contents), SWH does not archive duplicate copies, whether they come from a project migration, a fork, or anything else. Because SWHIDs are computed from intrinsic data, the SWHIDs of these objects do not change: they are the same objects. For instance, if an article refers to a content of the initial project through a SWHID permalink, and that content has not changed after the migration, the relationship between the object and its identifier still holds. The link does not break as long as the author used the permalink form of the SWHID.
If the content changed after the migration, it is a new artifact from SWH's perspective, with a different SWHID. So there is no risk that a SWHID ends up pointing to modified content after a migration.
Rather than archiving duplicates, SWH preserves the structure of a project by recording, through its data model, how each content is linked to the other objects in the archive.
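To illustrate why the identifier depends only on the bytes and not on the hosting location, here is a minimal sketch of how a content SWHID (`swh:1:cnt:...`) can be computed. It assumes the content identifier is the Git blob hash (SHA-1 over a `blob <length>\0` header followed by the data); the function name `content_swhid` is hypothetical and this is not the `swh-model` implementation.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return the SWHID of a file content, computed from its bytes only.

    Illustrative helper (hypothetical name): hashes the content the same
    way Git hashes a blob, then prefixes the hex digest with 'swh:1:cnt:'.
    """
    header = b"blob %d\0" % len(data)
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


# The same bytes give the same SWHID, wherever the project is hosted.
on_github = content_swhid(b"print('hello')\n")
on_gitlab = content_swhid(b"print('hello')\n")

# Modified bytes give a different SWHID: a new artifact for the archive.
after_change = content_swhid(b"print('hello, world')\n")

print(on_github == on_gitlab)     # identical content, identical identifier
print(on_github == after_change)  # changed content, different identifier
```

The hosting URL never enters the computation, which is why a migration alone cannot invalidate an existing SWHID.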
For further information: https://docs.softwareheritage.org/devel/swh-model/data-model.html#data-model
- Case 1: the "host forge" is already archived in SWH. There is no need to submit a "Save code now" request for the code repository or an "Add forge now" request, because the forge is already harvested. You can check which origins are harvested here: https://archive.softwareheritage.org/
  The only action to consider is a "Save again" if the archived version is out of date, which is easy to check with the browser extensions: https://www.softwareheritage.org/browser-extensions/
- Case 2: the "host forge" is not archived in SWH yet. In that case, the code repository must be archived using one of the "Save code now" options, as described here: https://docs.softwareheritage.org/#landing-preserve.
  Another option is to request that the whole host forge be archived, but this takes longer because it requires validation by the technical team. So it depends on your needs: if you are only interested in the project that migrated, a "Save code now" request is more appropriate.
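Both cases can also be handled through the archive's public Web API rather than the web interface. The sketch below only builds the request URLs and sends nothing. The endpoint paths are given as I recall them from the Web API and should be checked against the API documentation before use; the helper names and the repository URL are hypothetical.

```python
# Base URL of the public Web API (assumption: version 1 of the API).
ARCHIVE_API = "https://archive.softwareheritage.org/api/1"


def origin_lookup_url(origin_url: str) -> str:
    """URL to GET in order to check whether an origin is already archived (case 1)."""
    return f"{ARCHIVE_API}/origin/{origin_url}/get/"


def save_request_url(origin_url: str, visit_type: str = "git") -> str:
    """URL to POST to in order to submit a "Save code now" request (case 2)."""
    return f"{ARCHIVE_API}/origin/save/{visit_type}/url/{origin_url}/"


# Hypothetical new location of the migrated project.
new_origin = "https://gitlab.example.org/truc/project"

print(origin_lookup_url(new_origin))
print(save_request_url(new_origin))
```

A typical flow would be to query the lookup URL first and submit a save request only when the origin is unknown or its last visit is out of date.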
For further information: Pietri, A., Rousseau, G., & Zacchiroli, S. (2020). Forking Without Clicking: On How to Identify Software Repository Forks. Proceedings of the 17th International Conference on Mining Software Repositories, 277–287. https://doi.org/10.1145/3379597.3387450
The developers of a project can also add links to the archived version on the original forge, in the README for example. Conversely, the README of the migrated project can include the SWHID of the initial project.
Validation workflow
- identify question
- identify source
- verify duplication in faq.md
- draft answer
- get feedback from team
- validate by Morane / Benoît / Roberto (depends on scope)