@@ -28,7 +28,7 @@ For more information about the Software Heritage [mission](https://www.softwareh
### 1.2 What is the Software Heritage archive?
The Software Heritage archive is the largest public collection of source code in existence. Visit the archive at https://archive.softwareheritage.org.
### 1.3 What is the size of the archive?
The archive is growing over time as we crawl new source code from software projects and development forges. You can see live counters of the archive contents, as well as a breakdown by crawled origins, on https://archive.softwareheritage.org.
...
...
@@ -62,7 +62,7 @@ Here is an excerpt of this list:
### 2.2 If my code is on GitHub/GitLab/Bitbucket, is it already archived in Software Heritage?
It might be, as we crawl these and other popular forges regularly.
Search for your code repository on https://archive.softwareheritage.org/browse/search/.
<details>
...
...
@@ -73,7 +73,7 @@ If it is not there yet, or if the latest snapshot is not the most recent state o
https://archive.softwareheritage.org/save/ or by clicking on the "Save again" button in the browse view.
A [GitHub action](https://github.com/marketplace/actions/save-to-software-heritage) is available to automatically submit a "Save Code Now" request. Here is an [example](https://github.com/patrickfuchs/buildH/blob/master/.github/workflows/archive-to-software-heritage.yml) of this action configured to run each time a new release is issued.
You can also use the [browser extension](https://www.softwareheritage.org/2022/08/02/updateswh-browser-extension/).
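
A save request can also be scripted. The following is a minimal sketch, assuming the public Web API accepts save requests at `/api/1/origin/save/<visit_type>/url/<origin_url>/` (a `POST` submits a request, a `GET` lists existing ones). The helper names `save_request_url` and `submit_save_request` are hypothetical, and anonymous requests may be rate limited.

```python
import json
import urllib.request

# Assumption: root of the public Software Heritage Web API.
API_ROOT = "https://archive.softwareheritage.org/api/1"


def save_request_url(origin_url: str, visit_type: str = "git") -> str:
    """Build the (assumed) Save Code Now endpoint for a given origin."""
    return f"{API_ROOT}/origin/save/{visit_type}/url/{origin_url}/"


def submit_save_request(origin_url: str, visit_type: str = "git") -> dict:
    """POST a save request and return the decoded JSON response.

    This performs a real network call; it is not executed on import.
    """
    request = urllib.request.Request(
        save_request_url(origin_url, visit_type), method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `submit_save_request("https://example.org/my/repo.git")` (a placeholder URL) would submit the request; the same URL queried with `GET` should then report its progress.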
...
...
@@ -91,7 +91,7 @@ We do not inspect or filter the source code, and archive anything that we can ge
<summary> Expand for details </summary>
<br>
The reason for this approach is that the value of software source code cannot be known in advance. When a project starts, one cannot predict whether it will become a key software component or not. For example, when [Rasmus Lerdorf](https://fr.wikipedia.org/wiki/Rasmus_Lerdorf) released [the first version of PHP back in 1995](https://groups.google.com/g/comp.infosystems.www.authoring.cgi/c/PyJ25gZ6z7A/m/M9FkTUVDfcwJ), who could have predicted that it would become one of the most popular tools for the Web?
It also happens that [very precious pieces of source code may go unnoticed for decades](https://en.wikipedia.org/wiki/OpenSSL), until one day [some unexpected bug reveals that a big part of our digital infrastructure relies on them](https://en.wikipedia.org/wiki/Heartbleed).
</details>
...
...
@@ -120,8 +120,8 @@ Our core mission is to preserve source code, because it is human readable and co
### 2.7 I can't find all my "releases" in a git repository in Software Heritage, what should I do?
Do not worry, your repository has been saved in full.
What you are witnessing is just a terminological difference between what
platforms like GitHub call "releases" (any non-annotated git tag) and what we call "releases" (a node in the Merkle tree, which corresponds to a git annotated tag). This is a common issue, as you can see for example in [this discussion thread](https://stackoverflow.com/questions/11514075/what-is-the-difference-between-an-annotated-and-unannotated-tag).
<details>
...
...
@@ -223,7 +223,7 @@ Notice that the Permalinks tab offers a plurality of options to pick a SWHID (yo
### 3.4 Which type of SWHID should I use in my article/documentation?
It really depends on your use case, but as a general suggestion we recommend using *the full SWHID of a directory (with the contextual information)*.
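
To make the distinction between the core identifier and its contextual information concrete, here is a minimal sketch that splits a full SWHID into its parts. It assumes the usual syntax: a core of the form `swh:1:<type>:<40 hex digits>`, followed by optional `;key=value` qualifiers such as `origin` or `path`. The names `ParsedSWHID` and `parse_swhid` are hypothetical, and the identifier used in the usage note is a placeholder, not a real archived object.

```python
import re
from typing import Dict, NamedTuple

# Core SWHID: scheme, version 1, object type, 40 hexadecimal digits.
CORE_RE = re.compile(r"swh:1:(snp|rel|rev|dir|cnt):([0-9a-f]{40})")


class ParsedSWHID(NamedTuple):
    object_type: str            # e.g. "dir" for a directory
    object_id: str              # intrinsic hash of the object
    qualifiers: Dict[str, str]  # contextual information, may be empty


def parse_swhid(swhid: str) -> ParsedSWHID:
    """Split a SWHID into its core part and its qualifiers."""
    core, *rest = swhid.split(";")
    match = CORE_RE.fullmatch(core)
    if match is None:
        raise ValueError(f"not a valid core SWHID: {core!r}")
    qualifiers = {}
    for item in rest:
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"malformed qualifier: {item!r}")
        qualifiers[key] = value
    return ParsedSWHID(match.group(1), match.group(2), qualifiers)
```

For instance, parsing `"swh:1:dir:" + "0" * 40 + ";origin=https://example.org/repo.git;path=/src/"` yields the object type `dir`, the 40-digit identifier, and a dictionary holding the `origin` and `path` qualifiers. Dropping the qualifiers leaves the short form, which still identifies the same directory.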
...
...
@@ -234,7 +234,7 @@ It really depends on your use case, but as a general suggestion we recommend to
When writing a research article, a blog post or technical documentation, one may face some tension between the need to provide the maximum amount of information, using the full SWHID, and the need to keep the reference short (for example due to page limitations).
Here is the recommended best practice to address this issue:
1) get the full SWHID for the 'directory' containing the version of the code you want to reference. Here is an example of such a full SWHID:
@@ -257,7 +257,7 @@ In the digital version *the clickable link* uses *the full SWHID* to let the rea
If your code (or the latest version of it) is not yet in the archive, you need first to trigger its archival. This can be done with a ["Save Code Now"](https://save.softwareheritage.org) request, or via the deposit API.
Once a Save Code Now request is issued, the ingestion of the code is usually completed in a few minutes, depending on the size of the repository. Once it's done, the status of the save request is updated and you can get the SWHID as shown before.
When a deposit is submitted, the ingestion is also usually completed in a few minutes and the SWHID is accessible through the SWORD status response.
...
...
@@ -284,7 +284,7 @@ Please do not clone a full repository directly from Software Heritage: it is an
<summary> Expand for details </summary>
<br>
Software Heritage stores all the software artifacts in a massive shared [Merkle tree](https://en.wikipedia.org/wiki/Merkle_tree), so that exporting (a specific version of) an archived repository implies traversing the graph to get all the relevant contents and packaging them up for your consumption. This operation is much more expensive than downloading an existing tar file or cloning a repository from a forge.
If Software Heritage really is your last resort, and you cannot find the source code of interest elsewhere, we recommend that you download only the version you are interested in, using the "*directory*" option of the Download button that you find when you browse the archive.
...
...
@@ -360,7 +360,7 @@ Last but not least, [Software Heritage indexes](https://www.softwareheritage.org
Credit: Gruenpeter M. and Thornton K. (2018) Pathways for Discovery of Free Software (slide deck from LibrePlanet 2018). https://en.wikipedia.org/wiki/File:Pathways-discovery-free.pdf