Updated version 0.0.30 from 'debian/upstream/0.0.30'

with Debian dir b14d6f7130e69b223e510b35603cbfa2f06c9e5d

Commit 0499700c authored 6 years ago by Jenkins for Software Heritage
--- a/.gitignore
+++ b/.gitignore
-*~
-build
-.coverage
-dist
-*.egg-info/
-.eggs/
-.hypothesis
-*.pyc
-__pycache__
-.pytest_cache
-*.sw?
-.tox
-version.txt
--- a/AUTHORS
+++ b/AUTHORS
-Copyright (C) 2015 The Software Heritage developers
-
-See http://www.softwareheritage.org/ for more information.
--- a/LICENSE
+++ b/LICENSE
--- a/Makefile.local
+++ b/Makefile.local
-FLAG=-v
-NOSEFLAGS=-v -s
--- a/PKG-INFO
+++ b/PKG-INFO
 Metadata-Version: 2.1
 Name: swh.model
-Version: 0.0.29
+Version: 0.0.30
 Summary: Software Heritage data model
 Home-page: https://forge.softwareheritage.org/diffusion/DMOD/
 Author: Software Heritage developers
 Author-email: swh-devel@inria.fr
 License: UNKNOWN
 Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest
-Project-URL: Funding, https://www.softwareheritage.org/donate
 Project-URL: Source, https://forge.softwareheritage.org/source/swh-model
+Project-URL: Funding, https://www.softwareheritage.org/donate
 Description: swh-model
        =========
        

--- a/bin/git-revhash
+++ b/bin/git-revhash
-#!/usr/bin/env bash
-
-# Use
-# git-revhash 'tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\nparent 22c0fa5195a53f2e733ec75a9b6e9d1624a8b771\nauthor seanius <seanius@3187e211-bb14-4c82-9596-0b59d67cd7f4> 1138341044 +0000\ncommitter seanius <seanius@3187e211-bb14-4c82-9596-0b59d67cd7f4> 1138341044 +0000\n\nmaking dir structure...\n'  # noqa
-# output: 17a631d474f49bbebfdf3d885dcde470d7faafd7
-
-echo -ne $* | git hash-object --stdin -t commit
--- a/bin/swh-hashtree
+++ b/bin/swh-hashtree
-#!/usr/bin/env python3
-
-# Use sample:
-# swh-hashtree --path . --ignore '.svn' --ignore '.git-svn' \
-#    --ignore-empty-folder
-# 38f8d2c3a951f6b94007896d0981077e48bbd702
-
-import click
-import os
-
-from swh.model import from_disk, hashutil
-
-
-def combine_filters(*filters):
-    """Combine several ignore filters"""
-    if len(filters) == 0:
-        return from_disk.accept_all_directories
-    elif len(filters) == 1:
-        return filters[0]
-
-    def combined_filter(*args, **kwargs):
-        return all(filter(*args, **kwargs) for filter in filters)
-
-    return combined_filter
-
-
-@click.command()
-@click.option('--path', default='.',
-              help='Optional path to hash.')
-@click.option('--ignore-empty-folder', is_flag=True, default=False,
-              help='Ignore empty folder.')
-@click.option('--ignore', multiple=True,
-              help='Ignore pattern.')
-def main(path, ignore_empty_folder=False, ignore=None):
-
-    filters = []
-    if ignore_empty_folder:
-        filters.append(from_disk.ignore_empty_directories)
-    if ignore:
-        filters.append(
-            from_disk.ignore_named_directories(
-                [os.fsencode(name) for name in ignore]
-            )
-        )
-
-    try:
-        d = from_disk.Directory.from_disk(path=os.fsencode(path),
-                                          dir_filter=combine_filters(*filters))
-        hash = d.hash
-    except Exception as e:
-        print(e)
-        return
-    else:
-        print(hashutil.hash_to_hex(hash))
-
-
-if __name__ == '__main__':
-    main()
--- a/bin/swh-revhash
+++ b/bin/swh-revhash
-#!/usr/bin/env python3
-
-# Use:
-# swh-revhash 'tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\nparent 22c0fa5195a53f2e733ec75a9b6e9d1624a8b771\nauthor seanius <seanius@3187e211-bb14-4c82-9596-0b59d67cd7f4> 1138341044 +0000\ncommitter seanius <seanius@3187e211-bb14-4c82-9596-0b59d67cd7f4> 1138341044 +0000\n\nmaking dir structure...\n'  # noqa
-# output: 17a631d474f49bbebfdf3d885dcde470d7faafd7
-
-# To compare with git:
-# git-revhash 'tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\nparent 22c0fa5195a53f2e733ec75a9b6e9d1624a8b771\nauthor seanius <seanius@3187e211-bb14-4c82-9596-0b59d67cd7f4> 1138341044 +0000\ncommitter seanius <seanius@3187e211-bb14-4c82-9596-0b59d67cd7f4> 1138341044 +0000\n\nmaking dir structure...\n'   # noqa
-# output: 17a631d474f49bbebfdf3d885dcde470d7faafd7
-
-
-import sys
-
-from swh.model import identifiers, hashutil
-
-
-def revhash(revision_raw):
-    """Compute the revision hash.
-
-    """
-    if b'\\n' in revision_raw:  # HACK: string have somehow their \n
-                                # expanded to \\n
-        revision_raw = revision_raw.replace(b'\\n', b'\n')
-
-    h = hashutil.hash_git_data(revision_raw, 'commit')
-    return identifiers.identifier_to_str(h)
-
-
-if __name__ == '__main__':
-    revision_raw = sys.argv[1].encode('utf-8')
-    print(revhash(revision_raw))
--- a/docs/.gitignore
+++ b/docs/.gitignore
-_build/
-apidoc/
-*-stamp
--- a/docs/Makefile
+++ b/docs/Makefile
-include ../../swh-docs/Makefile.sphinx
-include Makefile.local
--- a/docs/Makefile.local
+++ b/docs/Makefile.local
-sphinx/html: images
-sphinx/clean: clean-images
-
-images:
-	make -C images/
-clean-images:
-	make -C images/ clean
-
-.PHONY: images clean-images
-
-
-# Local Variables:
-# mode: makefile
-# End:
--- a/docs/_static/.placeholder
+++ b/docs/_static/.placeholder
--- a/docs/_templates/.placeholder
+++ b/docs/_templates/.placeholder
--- a/docs/conf.py
+++ b/docs/conf.py
-from swh.docs.sphinx.conf import *  # NoQA
--- a/docs/data-model.rst
+++ b/docs/data-model.rst
-.. _data-model:
-
-Data model
-==========
-
-.. note:: The text below is adapted from §7 of the article `Software Heritage:
-  Why and How to Preserve Software Source Code
-  <https://hal.archives-ouvertes.fr/hal-01590958/>`_ (in proceedings of `iPRES
-  2017 <https://ipres2017.jp/>`_, 14th International Conference on Digital
-  Preservation, by Roberto Di Cosmo and Stefano Zacchiroli), which also
-  provides a more general description of Software Heritage for the digital
-  preservation research community.
-
-In any archival project the choice of the underlying data model—at the logical
-level, independently from how data is actually stored on physical media—is
-paramount. The data model adopted by Software Heritage to represent the
-information that it collects is centered around the notion of *software
-artifact*, described below.
-
-It is important to notice that according to our principles, we must store with
-every software artifact full information on where it has been found
-(provenance), that is also captured in our data model, so we start by providing
-some basic information on the nature of this provenance information.
-
-
-Source code hosting places
--------------------------
-
-Currently, Software Heritage uses a curated list of source code hosting
-places to crawl. The most common entries we expect to place in such a list are
-popular collaborative development forges (e.g., GitHub, Bitbucket), package
-manager repositories that host source packages (e.g., CPAN, npm), and FOSS
-distributions (e.g., Fedora, FreeBSD). But we may of course also allow more
-niche entries, such as URLs of personal or institutional project collections
-not hosted on major forges.
-
-While currently entirely manual, the curation of such a list might easily be
-made semi-automatic, with entries suggested by fellow archivists and/or concerned
-users who want to notify Software Heritage of the need to archive specific
-pieces of endangered source code. This approach is entirely compatible with
-Web-wide crawling approaches: crawlers capable of detecting the presence of
-source code might enrich the list. In both cases the list will remain curated,
-with (semi-automated) review processes that will need to pass before a hosting
-place starts to be used.
-
-
-Software artifacts
------------------
-
-Once the hosting places are known, they will need to be visited periodically
-in order to add missing software artifacts to the archive. Which software
-artifacts will be found there?
-
-In general, each software distribution mechanism hosts multiple releases of a
-given software at any given time. For VCS (Version Control Systems), this is
-the natural behaviour; for software packages, while a single version of a
-package is just a snapshot of the corresponding software product, one can often
-retrieve both current and past versions of the package from its distribution
-site.
-
-By reviewing and generalizing existing VCS and source package formats, we have
-identified the following recurrent artifacts as commonly found at source code
-hosting places. They form the basic ingredients of the Software Heritage
-archive. As the terminology varies quite a bit from technology to technology,
-we provide below both the canonical name used in Software Heritage and popular
-synonyms.
-
-**contents** (AKA "blobs")
-  the raw content of (source code) files as a sequence of bytes, without file
-  names or any other metadata.  File contents are often recurrent, e.g., across
-  different versions of the same software, different directories of the same
-  project, or different projects all together.
-
-**directories**
-  a list of named directory entries, each of which points to other artifacts,
-  usually file contents or sub-directories. Directory entries are also
-  associated with arbitrary metadata, which varies with technologies, but usually
-  includes permission bits, modification timestamps, etc.
-
-**revisions** (AKA "commits")
-  software development within a specific project is essentially a time-indexed
-  series of copies of a single "root" directory that contains the entire
-  project source code. Software evolves when a developer modifies the content
-  of one or more files in that directory and records their changes.
-
-  Each recorded copy of the root directory is known as a "revision". It points
-  to a fully-determined directory and is equipped with arbitrary metadata. Some
-  of those are added manually by the developer (e.g., commit message), others
-  are automatically synthesized (timestamps, preceding commit(s), etc).
-
-**releases** (AKA "tags")
-  some revisions are more equal than others and get selected by developers as
-  denoting important project milestones known as "releases". Each release
-  points to the last commit in project history corresponding to the release and
-  might carry arbitrary metadata—e.g., release name and version, release
-  message, cryptographic signatures, etc.
-
-
-Additionally, the following pieces of crawling-related information are stored
-as provenance information in the Software Heritage archive:
-
-**origins**
-  code "hosting places" as previously described are usually large platforms
-  that host several unrelated software projects. For software provenance
-  purposes it is important to be more specific than that.
-
-  Software origins are fine grained references to where source code artifacts
-  archived by Software Heritage have been retrieved from. They take the form of
-  ``(type, url)`` pairs, where ``url`` is a canonical URL (e.g., the address at
-  which one can ``git clone`` a repository or download a source tarball) and
-  ``type`` the kind of software origin (e.g., git, svn, or dsc for Debian
-  source packages).
-
-..
-   **projects**
-     as commonly intended are more abstract entities than precise software
-     origins. Projects relate together several development resources, including
-     websites, issue trackers, mailing lists, as well as software origins as
-     intended by Software Heritage.
-
-     The debate around the most apt ontologies to capture project-related
-     information for software hasn't settled yet, but the place projects will take
-     in the Software Heritage archive is fairly clear. Projects are abstract
-     entities, which will be arbitrarily nestable in a versioned
-     project/sub-project hierarchy, and that can be associated to arbitrary
-     metadata as well as origins where their source code can be found.
-
-**snapshots**
-  any kind of software origin offers multiple pointers to the "current" state
-  of a development project. In the case of VCS this is reflected by branches
-  (e.g., master, development, but also so called feature branches dedicated to
-  extending the software in a specific direction); in the case of package
-  distributions by notions such as suites that correspond to different maturity
-  levels of individual packages (e.g., stable, development, etc.).
-
-  A "snapshot" of a given software origin records all entry points found there
-  and where each of them was pointing at the time. For example, a snapshot
-  object might track the commit that the master branch was pointing to at any
-  given time, as well as the most recent release of a given package in the
-  stable suite of a FOSS distribution.
-
-**visits**
-  links together software origins with snapshots. Every time an origin is
-  consulted a new visit object is created, recording when (according to
-  Software Heritage clock) the visit happened and the full snapshot of the
-  state of the software origin at the time.
-
-
-Data structure
--------------
-
-.. _swh-merkle-dag:
-.. figure:: images/swh-merkle-dag.svg
-   :width: 1024px
-   :align: center
-
-   Software Heritage archive as a Merkle DAG, augmented with crawling
-   information (click to zoom).
-
-
-With all the bits of what we want to archive in place, the next question is how
-to organize them, i.e., which logical data structure to adopt for their
-storage. A key observation for this decision is that source code artifacts are
-massively duplicated. This is so for several reasons:
-
-* code hosting diaspora (i.e., project development moving to the most
-  recent/cool collaborative development technology over time);
-* copy/paste (AKA "vendoring") of parts or entire external FOSS software
-  components into other software products;
-* large overlap between revisions of the same project: usually only a very
-  small number of files/directories are modified by a single commit;
-* emergence of DVCS (distributed version control systems), which natively work
-  by replicating entire repository copies around. GitHub-style pull requests
-  are the pinnacle of this, as they result in creating an additional repository
-  copy at each change done by a new developer;
-* migration from one VCS to another—e.g., migrations from Subversion to Git,
-  which are really popular these days—resulting in additional copies, but in a
-  different distribution format, of the very same development histories.
-
-These trends seem to be neither stopping nor slowing down, and it is reasonable
-to expect that they will be even more prominent in the future, due to the
-decreasing costs of storage and bandwidth.
-
-For this reason we argue that any sustainable storage layout for archiving
-source code in the very long term should support deduplication, so that the
-cost of storing a source code artifact encountered more than once is paid
-only once. For storage efficiency, deduplication should be supported for
-all the software artifacts we have discussed, namely: file contents,
-directories, revisions, releases, snapshots.
-
-Realizing that principle, the Software Heritage archive is conceptually a
-single (big) `Merkle Directed Acyclic Graph (DAG)
-<https://en.wikipedia.org/wiki/Merkle_tree>`_, as depicted in Figure
-:ref:`Software Heritage Merkle DAG <swh-merkle-dag>`. In such a graph each of
-the artifacts we have described (from file contents up to entire
-snapshots) corresponds to a node.  Edges between nodes emerge naturally:
-directory entries point to other directories or file contents; revisions point
-to directories and previous revisions, releases point to revisions, snapshots
-point to revisions and releases. Additionally, each node contains all metadata
-that are specific to the node itself rather than to pointed nodes; e.g., commit
-messages, timestamps, or file names. Note that the structure is really a DAG,
-and not a tree, due to the fact that the line of revision nodes might be
-forked and merged back.
-
-..
-   directory: fff3cc22cb40f71d26f736c082326e77de0b7692
-   parent: e4feb05112588741b4764739d6da756c357e1f37
-   author: Stefano Zacchiroli <zack@upsilon.cc>
-   date: 1443617461 +0200
-   committer: Stefano Zacchiroli <zack@upsilon.cc>
-   committer_date: 1443617461 +0200
-   message:
-     objstorage: fix tempfile race when adding objects
-
-     Before this change, two workers adding the same
-     object will end up racing to write <SHA1>.tmp.
-     [...]
-
-     revisionid: 64a783216c1ec69dcb267449c0bbf5e54f7c4d6d
-     A revision node in the Software Heritage DAG
-
-In a Merkle structure each node is identified by an intrinsic identifier
-computed as a cryptographic hash of the node content. In the case of Software
-Heritage, identifiers are computed taking into account both node-specific
-metadata and the identifiers of child nodes.
-
-Consider the revision node in the picture whose identifier starts with
-`c7640e08d..`. It points to a directory (identifier starting with
-`45f0c078..`), which has also been archived. That directory contains a full
-copy, at a specific point in time, of a software component—in the example the
-`Hello World <https://forge.softwareheritage.org/source/helloworld/>`_ software
-component available on our forge. The revision node also points to the
-preceding revision node (`43ef7dcd..`) in the project development history.
-Finally, the node contains revision-specific metadata, such as the author and
-committer of the given change, its timestamps, and the message entered by the
-author at commit time.
-
-The identifier of the revision node itself (`c7640e08d..`) is computed as a
-cryptographic hash of a (canonical representation of) all the information shown
-in the figure. A change in any of them (metadata and/or pointed nodes) would result
-in an entirely different node identifier. All other types of nodes in the
-Software Heritage archive behave similarly.
-
-The Software Heritage archive inherits useful properties from the underlying
-Merkle structure. In particular, deduplication is built-in. Any software
-artifact encountered in the wild gets added to the archive only if a
-corresponding node with a matching intrinsic identifier is not already
-available in the graph—file content, commits, entire directories or project
-snapshots are all deduplicated incurring storage costs only once.
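
The identifier and deduplication behaviour described above can be illustrated with a minimal sketch. It is not the swh.model implementation: the `MerkleStore` class, its `add` method, and the byte layout fed to the hash are hypothetical choices made for illustration only. What it shows is that a node identifier depends on both the node's own metadata and its children's identifiers, and that a node already present is never stored twice.

```python
import hashlib


class MerkleStore:
    """Toy content-addressed store (hypothetical, for illustration only)."""

    def __init__(self):
        # Maps node identifier -> (node type, metadata, child identifiers).
        self.nodes = {}

    def add(self, node_type, metadata, children=()):
        """Add a node and return (identifier, was_newly_stored)."""
        children = tuple(children)
        # Assumed canonical form: type, metadata, then child identifiers
        # in the order given. Real formats differ per object type.
        h = hashlib.sha1()
        h.update(node_type.encode("ascii") + b"\0")
        h.update(metadata + b"\0")
        for child_id in children:
            h.update(child_id.encode("ascii"))
        node_id = h.hexdigest()

        # Deduplication: an identical node hashes to the same identifier,
        # so it is stored only the first time it is encountered.
        is_new = node_id not in self.nodes
        if is_new:
            self.nodes[node_id] = (node_type, metadata, children)
        return node_id, is_new


store = MerkleStore()
blob_id, _ = store.add("content", b"print('hello')")
first_id, first_new = store.add("directory", b"hello.py", [blob_id])
again_id, again_new = store.add("directory", b"hello.py", [blob_id])
```

In this sketch the second `add` call returns the same identifier as the first and reports that nothing new was stored, while changing either the metadata or a child identifier would produce a different identifier.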
-
-Furthermore, as a side effect of this data model choice, the entire development
-history of all the source code archived in Software Heritage, which aims to
-match all published source code in the world, is available as a unified whole,
-making emergent structures, such as code reuse across different projects or
-software origins, readily available. Further reinforcing the Software Heritage
-use cases, this object could become a veritable "map of the stars" of our
-entire software commons.
--- a/docs/images/.gitignore
+++ b/docs/images/.gitignore
-swh-merkle-dag.pdf
-swh-merkle-dag.svg
--- a/docs/images/Makefile
+++ b/docs/images/Makefile
-
-MERKLE_DAG =  swh-merkle-dag.pdf swh-merkle-dag.svg
-
-BUILD_TARGETS =
-BUILD_TARGETS += $(MERKLE_DAG)
-
-all: $(BUILD_TARGETS)
-
-
-%.svg: %.dia
-	inkscape -l $@ $<
-
-%.pdf: %.dia
-	inkscape -A $@ $<
-
-clean:
-	-rm -f $(BUILD_TARGETS)
--- a/docs/images/swh-merkle-dag.dia
+++ b/docs/images/swh-merkle-dag.dia
--- a/docs/index.rst
+++ b/docs/index.rst
-.. _swh-model:
-
-Software Heritage - Data model
-==============================
-
-Implementation of the :ref:`data-model` to archive source code artifacts.
-
-
-.. toctree::
-   :maxdepth: 2
-   :caption: Contents:
-
-
-Overview
--------
-
-* :ref:`data-model`
-* :ref:`persistent-identifiers`
-
-
-Indices and tables
-==================
-
-* :ref:`genindex`
-* :ref:`modindex`
-* :ref:`search`
--- a/docs/persistent-identifiers.rst
+++ b/docs/persistent-identifiers.rst
-.. _persistent-identifiers:
-
-Persistent identifiers
-======================
-
-You can point to objects present in the Software Heritage archive by means
-of **persistent identifiers** that are guaranteed to remain stable (persistent)
-over time. Their syntax, meaning, and usage are described below. Note that they
-are identifiers and not URLs, even though a URL-based resolver for Software
-Heritage persistent identifiers is also provided.
-
-A persistent identifier can point to any software artifact (or "object")
-available in the Software Heritage archive. Objects come in different types,
-and most notably:
-
-* contents
-* directories
-* revisions
-* releases
-* snapshots
-
-Each object is identified by an intrinsic, type-specific object identifier that
-is embedded in its persistent identifier as described below. Object identifiers
-are strong cryptographic hashes computed on the entire set of object properties
-to form a `Merkle structure <https://en.wikipedia.org/wiki/Merkle_tree>`_.
-
-See :ref:`data-model` for an overview of object types and how they are linked
-together. See :py:mod:`swh.model.identifiers` for details on how intrinsic
-object identifiers are computed.
-
-
-Syntax
------
-
-Syntactically, persistent identifiers are generated by the ``<identifier>``
-entry point of the grammar:
-
-.. code-block:: bnf
-
-  <identifier> ::= "swh" ":" <scheme_version> ":" <object_type> ":" <object_id> ;
-  <scheme_version> ::= "1" ;
-  <object_type> ::=
-      "snp"  (* snapshot *)
-    | "rel"  (* release *)
-    | "rev"  (* revision *)
-    | "dir"  (* directory *)
-    | "cnt"  (* content *)
-    ;
-  <object_id> ::= 40 * <hex_digit> ;  (* intrinsic object id, as hex-encoded SHA1 *)
-  <dec_digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
-  <hex_digit> ::= <dec_digit> | "a" | "b" | "c" | "d" | "e" | "f" ;
-
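
As a rough illustration of the grammar above, the following sketch validates and splits a persistent identifier with a regular expression. The names `SWHID_RE` and `parse_identifier` are hypothetical and are not part of swh.model; the pattern simply transcribes the grammar (scheme version `1`, five object types, 40 lowercase hex digits).

```python
import re

# Direct transcription of the <identifier> rule; lowercase hex only.
SWHID_RE = re.compile(
    r"swh:(?P<scheme_version>1)"
    r":(?P<object_type>snp|rel|rev|dir|cnt)"
    r":(?P<object_id>[0-9a-f]{40})"
)


def parse_identifier(text):
    """Return the parts of a persistent identifier, or raise ValueError."""
    match = SWHID_RE.fullmatch(text)
    if match is None:
        raise ValueError("not a valid persistent identifier: %r" % text)
    return match.groupdict()


parts = parse_identifier("swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2")
```

Using `fullmatch` rather than `match` rejects trailing characters, so an identifier followed by extra text is not silently accepted.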
-
-Semantics
---------
-
-``:`` is used as separator between the logical parts of identifiers. The
-``swh`` prefix makes explicit that these identifiers are related to *SoftWare
-Heritage*. ``1`` (``<scheme_version>``) is the current version of this
-identifier *scheme*; future editions will use higher version numbers, possibly
-breaking backward compatibility (but without breaking the resolvability of
-identifiers that conform to previous versions of the scheme).
-
-A persistent identifier points to a single object, whose type is explicitly
-captured by ``<object_type>``:
-
-* ``snp`` identifiers point to **snapshots**,
-* ``rel`` to **releases**,
-* ``rev`` to **revisions**,
-* ``dir`` to **directories**,
-* ``cnt`` to **contents**.
-
-The actual object pointed to is identified by the intrinsic identifier
-``<object_id>``, which is a hex-encoded (using lowercase ASCII characters) SHA1
-computed on the content and metadata of the object itself, as follows:
-
-* for **snapshots**, intrinsic identifiers are computed as per
-  :py:func:`swh.model.identifiers.snapshot_identifier`
-
-* for **releases**, as per
-  :py:func:`swh.model.identifiers.release_identifier`
-
-* for **revisions**, as per
-  :py:func:`swh.model.identifiers.revision_identifier`
-
-* for **directories**, as per
-  :py:func:`swh.model.identifiers.directory_identifier`
-
-* for **contents**, the intrinsic identifier is the ``sha1_git`` hash among the
-  multiple hashes returned by
-  :py:func:`swh.model.identifiers.content_identifier`, i.e., the SHA1 of a byte
-  sequence obtained by juxtaposing the ASCII string ``"blob"`` (without
-  quotes), a space, the length of the content as decimal digits, a NUL byte,
-  and the actual content of the file.
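
The content case can be sketched directly from the description above using only the standard library. The function names `content_sha1_git` and `content_persistent_identifier` are hypothetical helpers written for this example; they are not the swh.model API, which should be preferred in real code.

```python
import hashlib


def content_sha1_git(data: bytes) -> str:
    """SHA1 of b"blob", a space, the decimal length, a NUL byte, then data."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def content_persistent_identifier(data: bytes) -> str:
    """Wrap the intrinsic identifier in the swh:1:cnt: prefix."""
    return "swh:1:cnt:" + content_sha1_git(data)


# The empty content hashes to Git's well-known empty blob identifier,
# e69de29bb2d1d6434b8b29ae775ad8c2e48c5391.
empty_id = content_persistent_identifier(b"")
```

Because the header encodes the length, two contents that differ only by trailing bytes never share a prefix of hashed input, which is the same construction Git uses for blobs.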
-
-
-Git compatibility
-~~~~~~~~~~~~~~~~~
-
-Intrinsic object identifiers for contents, directories, revisions, and releases
-are, at present, compatible with the `Git <https://git-scm.com/>`_ way of
-`computing identifiers
-<https://git-scm.com/book/en/v2/Git-Internals-Git-Objects>`_ for its objects.
-A Software Heritage content identifier will be identical to a Git blob
-identifier of any file with the same content, a Software Heritage revision
-identifier will be identical to the corresponding Git commit identifier, etc.
-This is not the case for snapshot identifiers as Git doesn't have a
-corresponding object type.
-
-Note that Git compatibility is incidental and is not guaranteed to be
-maintained in future versions of this scheme (or Git).
-
-
-Examples
--------
-
-* ``swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2`` points to the content
-  of a file containing the full text of the GPL3 license
-* ``swh:1:dir:d198bc9d7a6bcf6db04f476d29314f157507d505`` points to a directory
-  containing the source code of the Darktable photography application as it was
-  at some point on 4 May 2017
-* ``swh:1:rev:309cf2674ee7a0749978cf8265ab91a60aea0f7d`` points to a commit in
-  the development history of Darktable, dated 16 January 2017, that added
-  undo/redo support for masks
-* ``swh:1:rel:22ece559cc7cc2364edc5e5593d63ae8bd229f9f`` points to Darktable
-  release 2.3.0, dated 24 December 2016
-* ``swh:1:snp:c7c108084bc0bf3d81436bf980b46e98bd338453`` points to a snapshot
-  of the entire Darktable Git repository taken on 4 May 2017 from GitHub
-
-
-Contextual information
-======================
-
-It is often useful to complement persistent identifiers with **contextual
-information** about where the identified object has been found as well as which
-specific parts of it are of interest. To that end it is possible, via a
-dedicated syntax, to extend persistent identifiers with the following pieces of
-information:
-
-* the **software origin** where an object has been found/observed
-* the **line number(s)** of interest, usually within a content object
-
-
-Syntax
------
-
-The full syntax to complement identifiers with contextual information is given
-by the ``<identifier_with_context>`` entry point of the grammar:
-
-.. code-block:: bnf
-
-  <identifier_with_context> ::= <identifier> [<lines_ctxt>] [<origin_ctxt>]
-  <lines_ctxt> ::= ";" "lines" "=" <line_number> ["-" <line_number>]
-  <origin_ctxt> ::= ";" "origin" "=" <url>
-  <line_number> ::= <dec_digit> +
-  <url> ::= (* RFC 3986 compliant URLs *)
-
-
-Semantics
---------
-
-``;`` is used as separator between persistent identifiers and additional
-optional contextual information. Each piece of contextual information is
-specified as a key/value pair, using ``=`` as a separator.
-
-The following pieces of contextual information are supported:
-
-* line numbers: it is possible to specify a single line number or a line range,
-  separating two numbers with ``-``. Note that line numbers are purely
-  indicative and are not meant to be stable, as in some degenerate cases
-  (e.g., text files which mix different types of line terminators) it is
-  impossible to resolve them unambiguously.
-
-* software origin: where a given object has been found or observed in the wild,
-  as the URI that was used by Software Heritage to ingest the object into the
-  archive
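
To make the contextual syntax concrete, here is a minimal parsing sketch. It assumes the ordering given by the grammar (optional `lines` first, then optional `origin`) and does not validate the origin against RFC 3986; `CONTEXT_RE` and `parse_with_context` are hypothetical names, and the origin URL in the usage line is a placeholder rather than a real archived origin.

```python
import re

CONTEXT_RE = re.compile(
    r"(?P<identifier>swh:1:(?:snp|rel|rev|dir|cnt):[0-9a-f]{40})"
    r"(?:;lines=(?P<first>[0-9]+)(?:-(?P<last>[0-9]+))?)?"
    # Everything after ";origin=" is taken as the URL, unvalidated.
    r"(?:;origin=(?P<origin>.+))?"
)


def parse_with_context(text):
    """Split an identifier with context into identifier, lines, origin."""
    match = CONTEXT_RE.fullmatch(text)
    if match is None:
        raise ValueError("cannot parse: %r" % text)
    lines = None
    if match.group("first") is not None:
        first = int(match.group("first"))
        # A single line number is treated as a one-line range.
        last = int(match.group("last")) if match.group("last") else first
        lines = (first, last)
    return {
        "identifier": match.group("identifier"),
        "lines": lines,
        "origin": match.group("origin"),
    }


parsed = parse_with_context(
    "swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2"
    ";lines=9-15;origin=https://example.org/repo.git"
)
```

Both context parts are optional, so a bare identifier parses with `lines` and `origin` set to `None`.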
-
-
-Resolution
-==========
-
-
-Dedicated resolvers
-------------------
-
-Persistent identifiers can be resolved using the Software Heritage Web
-application (see :py:mod:`swh.web`).  In particular, the **root endpoint**
-``/`` can be given a persistent identifier and will lead to the browsing page
-of the corresponding object, like this:
-``https://archive.softwareheritage.org/<identifier>``.
-
-A **dedicated** ``/resolve`` **endpoint** of the HTTP API is also available to
-explicitly request persistent identifier resolution; see:
-:http:get:`/api/1/resolve/(swh_id)/`.
-
-Examples:
-
-* `<https://archive.softwareheritage.org/swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2>`_
-* `<https://archive.softwareheritage.org/swh:1:dir:d198bc9d7a6bcf6db04f476d29314f157507d505>`_
-* `<https://archive.softwareheritage.org/api/1/resolve/swh:1:rev:309cf2674ee7a0749978cf8265ab91a60aea0f7d>`_
-* `<https://archive.softwareheritage.org/api/1/resolve/swh:1:rel:22ece559cc7cc2364edc5e5593d63ae8bd229f9f>`_
-* `<https://archive.softwareheritage.org/api/1/resolve/swh:1:snp:c7c108084bc0bf3d81436bf980b46e98bd338453>`_
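
The two resolution routes described above differ only in the path placed before the identifier. A small sketch of building both URLs follows; `browse_url` and `resolve_api_url` are hypothetical helper names, the base URL is taken from the examples, and the trailing slash on the API route follows the documented route pattern rather than the example links.

```python
ARCHIVE_BASE = "https://archive.softwareheritage.org"


def browse_url(identifier: str) -> str:
    """URL of the browsing page, via the root endpoint."""
    return "%s/%s" % (ARCHIVE_BASE, identifier)


def resolve_api_url(identifier: str) -> str:
    """URL of the dedicated resolve endpoint of the HTTP API."""
    return "%s/api/1/resolve/%s/" % (ARCHIVE_BASE, identifier)


swhid = "swh:1:rev:309cf2674ee7a0749978cf8265ab91a60aea0f7d"
page = browse_url(swhid)
api = resolve_api_url(swhid)
```

Neither helper performs a network request; they only assemble the strings that a browser or HTTP client would then fetch.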
-
-
-External resolvers
------------------
-
-The following **independent resolvers** support resolution of Software
-Heritage persistent identifiers:
-
-* `Identifiers.org <https://identifiers.org>`_; see:
-  `<http://identifiers.org/swh/>`_ (registry identifier `MIR:00000655
-  <https://www.ebi.ac.uk/miriam/main/datatypes/MIR:00000655>`_).
-
-* `Name-to-Thing (N2T) <https://n2t.net/>`_
-
-Examples:
-
-* `<https://identifiers.org/swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2>`_
-* `<https://identifiers.org/swh:1:dir:d198bc9d7a6bcf6db04f476d29314f157507d505>`_
-* `<https://identifiers.org/swh:1:rev:309cf2674ee7a0749978cf8265ab91a60aea0f7d>`_
-* `<https://n2t.net/swh:1:rel:22ece559cc7cc2364edc5e5593d63ae8bd229f9f>`_
-* `<https://n2t.net/swh:1:snp:c7c108084bc0bf3d81436bf980b46e98bd338453>`_
-
-Note that resolution via Identifiers.org does not support contextual
-information, due to `syntactic incompatibilities
-<http://identifiers.org/documentation#custom_requests>`_.
-
-
-References
-==========
-
-* Roberto Di Cosmo, Morane Gruenpeter, Stefano Zacchiroli. `Identifiers for
-  Digital Objects: the Case of Software Source Code Preservation
-  <https://hal.archives-ouvertes.fr/hal-01865790v4>`_. In Proceedings of `iPRES
-  2018 <https://ipres2018.org/>`_: 15th International Conference on Digital
-  Preservation, Boston, MA, USA, September 2018, 9 pages.
-
-