From 0767c811c859a798a7cfaa4247c6b2b15cc7e8aa Mon Sep 17 00:00:00 2001
From: Roberto Di Cosmo <roberto@dicosmo.org>
Date: Sat, 28 Mar 2020 15:16:04 +0100
Subject: [PATCH] Extend SWH PID definition with additional context qualifiers.

---
 docs/persistent-identifiers.rst | 74 +++++++++++++++++++++++----------
 1 file changed, 52 insertions(+), 22 deletions(-)

diff --git a/docs/persistent-identifiers.rst b/docs/persistent-identifiers.rst
index 03d213ee..c34273fe 100644
--- a/docs/persistent-identifiers.rst
+++ b/docs/persistent-identifiers.rst
@@ -135,15 +135,13 @@ Examples
 Contextual information
 ======================
 
-It is often useful to complement persistent identifiers with **contextual
-information** about where the identified object has been found as well as which
-specific parts of it are of interest. To that end it is possible, via a
-dedicated syntax, to extend persistent identifiers with the following pieces of
-information:
-
-* the **software origin** where an object has been found/observed
-* the **line number(s)** of interest, usually within a content object
+Persistent identifiers may be equipped with **qualifiers** that provide *contextual information* about the object designated by the identifier. There are five kinds of qualifiers:
 
+* origin
+* visit
+* anchor
+* path
+* lines
 
 Syntax
 ------
@@ -153,32 +151,64 @@ by the ``<identifier_with_context>`` entry point of the grammar:
 
 .. code-block:: bnf
 
-  <identifier_with_context> ::= <identifier> [<lines_ctxt>] [<origin_ctxt>]
-  <lines_ctxt> ::= ";" "lines" "=" <line_number> ["-" <line_number>]
+  <identifier_with_context> ::= <identifier> [ <qualifierlist> ]
+  <qualifierlist> ::= <qualifier> [ <qualifierlist> ]
+  <qualifier> ::= <origin_ctxt> | <visit_ctxt> | <anchor_ctxt> | <path_ctxt> | <lines_ctxt>
   <origin_ctxt> ::= ";" "origin" "=" <url>
+  <visit_ctxt> ::= ";" "visit" "=" <identifier>
+  <anchor_ctxt> ::= ";" "anchor" "=" <identifier>
+  <path_ctxt> ::= ";" "path" "=" <path_absolute>
+  <lines_ctxt> ::= ";" "lines" "=" <line_number> ["-" <line_number>]
   <line_number> ::= <dec_digit> +
   <url> ::= (* RFC 3986 compliant URLs *)
+  <path_absolute> ::= (* RFC 3986 compliant absolute file path *)
 
+For ``<path_absolute>``, see `Section 3.3 of RFC 3986 <https://tools.ietf.org/html/rfc3986#section-3.3>`_.
 
 Semantics
 ---------
 
-``;`` is used as separator between persistent identifiers and additional
-optional contextual information. Each piece of contextual information is
+``;`` is used as a separator before each optional contextual information qualifier,
+both after the persistent identifier and between successive qualifiers. Each qualifier is
 specified as a key/value pair, using ``=`` as a separator.
 
 The following piece of contextual information are supported:
 
-* line numbers: it is possible to specify a single line number or a line range,
-  separating two numbers with ``-``. Note that line numbers are purely
-  indicative and are not meant to be stable, as in some degenerate cases
-  (e.g., text files which mix different types of line terminators) it is
-  impossible to resolve them unambiguously.
-
-* software origin: where a given object has been found or observed in the wild,
-  as the URI that was used by Software Heritage to ingest the object into the
-  archive
-
+* **origin**: the *software origin* where an object has been found or observed in the wild,
+  as the URI that was used by Software Heritage to ingest the object into the archive;
+* **visit**: the *state of a full repository* containing the designated object, as a *snapshot*
+  corresponding to a specific *visit* of that repository;
+* **anchor**: a *designated node* in the Merkle DAG relative to which a *path to the object* is specified,
+  as a persistent identifier of a directory, a revision, a release or a snapshot;
+* **path**: the *absolute file path* to the object, starting from the *root directory* associated with the *anchor node*;
+  the root directory is uniquely determined when the anchor denotes a directory or a revision, and
+  almost always when it denotes a release (releases usually point to a revision); when the anchor denotes
+  a snapshot, the root directory is taken to be the one associated with the main branch of that snapshot;
+* **lines**: the *line number(s)* of interest, usually within a content object, as a single number or a range such as ``9-15``.
+
+We recommend equipping identifiers that are meant to be shared with as many
+qualifiers as possible. Redundant information should be omitted: for example, if the *visit*
+is present and the *path* is relative to the snapshot indicated there, then
+the *anchor* qualifier is superfluous.
+
+Example
+-------
+
+The following `fully qualified identifier <https://archive.softwareheritage.org/swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b;anchor=swh:1:rev:2db189928c94d62a3b4757b3eec68f0a4d4113f0;path=/Examples/SimpleFarm/simplefarm.ml;visit=swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9;origin=https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git;lines=9-15>`_
+denotes the lines 9 to 15 of a file content that
+can be found at absolute path ``/Examples/SimpleFarm/simplefarm.ml`` from the root directory
+of the revision ``swh:1:rev:2db189928c94d62a3b4757b3eec68f0a4d4113f0`` that is contained
+in the snapshot ``swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9`` taken from
+the origin ``https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git``.
+
+.. code-block:: url
+
+   swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b;
+   anchor=swh:1:rev:2db189928c94d62a3b4757b3eec68f0a4d4113f0;
+   path=/Examples/SimpleFarm/simplefarm.ml;
+   visit=swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9;
+   origin=https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git;
+   lines=9-15
 
 Resolution
 ==========
-- 
GitLab
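A minimal Python sketch of a checker for the ``<identifier_with_context>`` grammar and the qualifier semantics in this patch, run against the example identifier. It is not the Software Heritage implementation. ``parse_qualified_pid`` and ``line_range`` are hypothetical names. The core-identifier regex (object types ``cnt``, ``dir``, ``rev``, ``rel``, ``snp`` followed by 40 lowercase hex digits) is an assumption about the syntax defined earlier in the document. The sketch also assumes that no qualifier value contains ``;``, because the grammar does not say how such a character inside an origin URL would be escaped.

```python
import re
from typing import Dict, Tuple

# Assumed core syntax: "swh:1:<type>:<40 lowercase hex digits>".
CORE_RE = re.compile(r"swh:1:(cnt|dir|rev|rel|snp):[0-9a-f]{40}")
# <lines_ctxt> value: <line_number> ["-" <line_number>]
LINES_RE = re.compile(r"(\d+)(?:-(\d+))?")
KNOWN_KEYS = {"origin", "visit", "anchor", "path", "lines"}
# Object types allowed by the Semantics section for identifier-valued qualifiers.
ID_QUALIFIER_TYPES = {"visit": {"snp"}, "anchor": {"dir", "rev", "rel", "snp"}}


def parse_qualified_pid(text: str) -> Tuple[str, Dict[str, str]]:
    """Split a qualified identifier into its core identifier and a qualifier dict."""
    core, *parts = text.split(";")  # assumes no ";" inside qualifier values
    if not CORE_RE.fullmatch(core):
        raise ValueError(f"invalid core identifier: {core!r}")
    qualifiers: Dict[str, str] = {}
    for part in parts:
        key, sep, value = part.partition("=")  # split on the first "=" only
        if not sep or key not in KNOWN_KEYS:
            raise ValueError(f"unknown or malformed qualifier: {part!r}")
        if key in qualifiers:
            # This sketch's own choice: the grammar does not forbid repeats.
            raise ValueError(f"repeated qualifier: {key!r}")
        if key in ID_QUALIFIER_TYPES:
            m = CORE_RE.fullmatch(value)
            if m is None or m.group(1) not in ID_QUALIFIER_TYPES[key]:
                allowed = sorted(ID_QUALIFIER_TYPES[key])
                raise ValueError(f"{key} must identify one of {allowed}: {value!r}")
        elif key == "path" and not value.startswith("/"):
            raise ValueError(f"path must be absolute: {value!r}")
        elif key == "lines" and LINES_RE.fullmatch(value) is None:
            raise ValueError(f"malformed lines qualifier: {value!r}")
        qualifiers[key] = value
    return core, qualifiers


def line_range(qualifiers: Dict[str, str]) -> Tuple[int, int]:
    """Return (first, last) from a validated lines qualifier; "n" gives (n, n)."""
    m = LINES_RE.fullmatch(qualifiers["lines"])
    first = int(m.group(1))
    last = int(m.group(2)) if m.group(2) else first
    return first, last


# The fully qualified identifier from the Example section, joined on one line.
EXAMPLE = (
    "swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b;"
    "anchor=swh:1:rev:2db189928c94d62a3b4757b3eec68f0a4d4113f0;"
    "path=/Examples/SimpleFarm/simplefarm.ml;"
    "visit=swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9;"
    "origin=https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git;"
    "lines=9-15"
)

if __name__ == "__main__":
    core, quals = parse_qualified_pid(EXAMPLE)
    print(core)               # swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b
    print(line_range(quals))  # (9, 15)
```

Splitting on ``;`` first and then on the first ``=`` keeps any ``=`` inside an origin URL intact. The type checks on ``visit`` and ``anchor`` come from the Semantics section, because the BNF itself accepts any ``<identifier>`` in those positions.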