Commits on Source (483)
# Changes here will be overwritten by Copier
_commit: v0.3.3
_src_path: https://gitlab.softwareheritage.org/swh/devel/swh-py-template.git
description: Software Heritage data model
distribution_name: swh-model
have_cli: true
have_workers: false
package_root: swh/model
project_name: swh.model
python_minimal_version: '3.7'
readme_format: rst
# python: Reformat code with black
bf3f1cec8685c8f480ddd95027852f8caa10b8e3
4c39334b2aa9f782950aaee72781dc1df9d37550
5ff7c5b592ce1d76f5696a7f089680807ad557a6
*.egg-info/
*.pyc
*.sw?
*~
.coverage
.eggs/
.hypothesis
.mypy_cache
.tox
__pycache__
*.egg-info/
dist
version.txt
build/
dist/
# these are symlinks created by a hook in swh-docs' main sphinx conf.py
docs/README.rst
docs/README.md
# this should be a symlink for people who want to build the sphinx doc
# without using tox, generally created by the swh-env/bin/update script
docs/Makefile.sphinx
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0
    hooks:
      - id: trailing-whitespace
      - id: check-json
      - id: check-yaml
  - repo: https://github.com/python/black
    rev: 25.1.0
    hooks:
      - id: black
  - repo: https://github.com/PyCQA/isort
    rev: 6.0.0
    hooks:
      - id: isort
  - repo: https://github.com/pycqa/flake8
    rev: 7.1.1
    hooks:
      - id: flake8
        additional_dependencies: [flake8-bugbear==24.12.12, flake8-pyproject]
  - repo: https://github.com/codespell-project/codespell
    rev: v2.4.1
    hooks:
      - id: codespell
        name: Check source code spelling
        stages: [pre-commit]
        args: [-L assertIn, -L anc]
      - id: codespell
        name: Check commit message spelling
        stages: [commit-msg]
  - repo: local
    hooks:
      - id: mypy
        name: mypy
        entry: mypy
        args: [swh]
        pass_filenames: false
        language: system
        types: [python]
      - id: twine-check
        name: twine check
        description: call twine check when pushing an annotated release tag
        entry: bash -c "ref=$(git describe) &&
          [[ $ref =~ ^v[0-9]+\.[0-9]+\.[0-9]+$ ]] &&
          (python3 -m build --sdist && twine check $(ls -t dist/* | head -1)) || true"
        pass_filenames: false
        stages: [pre-push]
        language: python
        additional_dependencies: [twine, build]
# Software Heritage Code of Conduct
## Our Pledge
In the interest of fostering an open and welcoming environment, we as Software
Heritage contributors and maintainers pledge to making participation in our
project and our community a harassment-free experience for everyone, regardless
of age, body size, disability, ethnicity, sex characteristics, gender identity
and expression, level of experience, education, socioeconomic status,
nationality, personal appearance, race, religion, or sexual identity and
orientation.
## Our Standards
Examples of behavior that contributes to creating a positive environment
include:
* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members
Examples of unacceptable behavior by participants include:
* The use of sexualized language or imagery and unwelcome sexual attention or
advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Our Responsibilities
Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.
## Scope
This Code of Conduct applies within all project spaces, and it also applies when
an individual is representing the project or its community in public spaces.
Examples of representing a project or community include using an official
project e-mail address, posting via an official social media account, or acting
as an appointed representative at an online or offline event. Representation of
a project may be further defined and clarified by project maintainers.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at `conduct@softwareheritage.org`. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an
incident. Further details of specific enforcement policies may be posted
separately.
Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
[homepage]: https://www.contributor-covenant.org
For answers to common questions about this code of conduct, see
https://www.contributor-covenant.org/faq
Daniele Serafini
Ishan Bhanuka
Antoine Cezar
Pierre-Yves David
include Makefile
include requirements.txt
include requirements-swh.txt
include version.txt
Software Heritage - Data model
==============================
Implementation of the Data model of the Software Heritage project, used to
archive source code artifacts.
This module defines the notion of SoftWare Hash persistent IDentifiers
(SWHIDs) and provides tools to compute them:
.. code-block:: shell

   $ swh-identify fork.c kmod.c sched/deadline.c
   swh:1:cnt:2e391c754ae730bd2d8520c2ab497c403220c6e3 fork.c
   swh:1:cnt:0277d1216f80ae1adeed84a686ed34c9b2931fc2 kmod.c
   swh:1:cnt:57b939c81bce5d06fa587df8915f05affbe22b82 sched/deadline.c
   $ swh-identify --no-filename /usr/src/linux/kernel/
   swh:1:dir:f9f858a48d663b3809c9e2f336412717496202ab
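
The identifiers printed above share one shape: the ``swh`` scheme, a version number, an object-type tag, and a 40-digit hexadecimal hash. The sketch below parses that shape using only the standard library. ``ParsedSWHID`` and ``parse_core_swhid`` are hypothetical names chosen for illustration; they are not the ``swh.model`` API, which may differ.

```python
import re
from typing import NamedTuple

# Object-type tags that can appear in a core SWHID.
OBJECT_TYPES = {
    "cnt": "content",
    "dir": "directory",
    "rev": "revision",
    "rel": "release",
    "snp": "snapshot",
}

_CORE_SWHID_RE = re.compile(r"swh:1:(cnt|dir|rev|rel|snp):([0-9a-f]{40})")


class ParsedSWHID(NamedTuple):
    """Hypothetical container, for illustration only."""

    object_type: str  # long name, e.g. "content"
    object_id: bytes  # 20-byte hash


def parse_core_swhid(text: str) -> ParsedSWHID:
    """Split a core SWHID string into its object type and raw hash."""
    match = _CORE_SWHID_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a core SWHID: {text!r}")
    tag, hex_id = match.groups()
    return ParsedSWHID(OBJECT_TYPES[tag], bytes.fromhex(hex_id))
```

For example, ``parse_core_swhid("swh:1:dir:f9f858a48d663b3809c9e2f336412717496202ab")`` yields an object of type ``directory`` with a 20-byte identifier.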
......@@ -5,16 +5,17 @@
# --ignore-empty-folders
# 38f8d2c3a951f6b94007896d0981077e48bbd702
-import click
 import os
+import click
 from swh.model import from_disk, hashutil
 def combine_filters(*filters):
     """Combine several ignore filters"""
     if len(filters) == 0:
-        return from_disk.accept_all_directories
+        return from_disk.accept_all_paths
     elif len(filters) == 1:
         return filters[0]
......@@ -25,27 +26,24 @@ def combine_filters(*filters):
 @click.command()
-@click.option('--path', default='.',
-              help='Optional path to hash.')
-@click.option('--ignore-empty-folder', is_flag=True, default=False,
-              help='Ignore empty folder.')
-@click.option('--ignore', multiple=True,
-              help='Ignore pattern.')
+@click.option("--path", default=".", help="Optional path to hash.")
+@click.option(
+    "--ignore-empty-folder", is_flag=True, default=False, help="Ignore empty folder."
+)
+@click.option("--ignore", multiple=True, help="Ignore pattern.")
 def main(path, ignore_empty_folder=False, ignore=None):
     filters = []
     if ignore_empty_folder:
         filters.append(from_disk.ignore_empty_directories)
     if ignore:
         filters.append(
-            from_disk.ignore_named_directories(
-                [os.fsencode(name) for name in ignore]
-            )
+            from_disk.ignore_named_directories([os.fsencode(name) for name in ignore])
         )
     try:
-        d = from_disk.Directory.from_disk(path=os.fsencode(path),
-                                          dir_filter=combine_filters(*filters))
+        d = from_disk.Directory.from_disk(
+            path=os.fsencode(path), path_filter=combine_filters(*filters)
+        )
         hash = d.hash
     except Exception as e:
         print(e)
......@@ -54,5 +52,5 @@ def main(path, ignore_empty_folder=False, ignore=None):
     print(hashutil.hash_to_hex(hash))
-if __name__ == '__main__':
+if __name__ == "__main__":
     main()
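
The multi-filter branch of `combine_filters` is elided from the hunks above. A minimal sketch of one way to combine such filters follows, assuming each filter is a predicate that returns `True` to keep a path. The two-argument signature and the helper names (`accept_all`, `ignore_named`, `combine`) are assumptions made for illustration; the real `from_disk` filters may take different arguments.

```python
from typing import Callable, Iterable

# Assumed filter shape: (full path, entry name) -> keep this entry?
PathFilter = Callable[[bytes, bytes], bool]


def accept_all(path: bytes, name: bytes) -> bool:
    """Default filter: keep everything."""
    return True


def ignore_named(names: Iterable[bytes]) -> PathFilter:
    """Build a filter that rejects entries whose name is in `names`."""
    blocked = set(names)

    def path_filter(path: bytes, name: bytes) -> bool:
        return name not in blocked

    return path_filter


def combine(*filters: PathFilter) -> PathFilter:
    """Keep an entry only if every filter keeps it."""
    if not filters:
        return accept_all
    if len(filters) == 1:
        return filters[0]

    def combined(path: bytes, name: bytes) -> bool:
        return all(f(path, name) for f in filters)

    return combined
```

Using `all` means the filters compose as a logical AND: a single rejecting filter is enough to drop an entry, which matches the "ignore" semantics of the command-line options.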
......@@ -11,21 +11,19 @@
 import sys
-from swh.model import identifiers, hashutil
+from swh.model import hashutil, identifiers
 def revhash(revision_raw):
-    """Compute the revision hash.
+    """Compute the revision hash."""
+    # HACK: string have somehow their \n expanded to \\n
+    if b"\\n" in revision_raw:
+        revision_raw = revision_raw.replace(b"\\n", b"\n")
-    """
-    if b'\\n' in revision_raw:  # HACK: string have somehow their \n
-        # expanded to \\n
-        revision_raw = revision_raw.replace(b'\\n', b'\n')
-    h = hashutil.hash_git_data(revision_raw, 'commit')
+    h = hashutil.hash_git_data(revision_raw, "commit")
     return identifiers.identifier_to_str(h)
-if __name__ == '__main__':
-    revision_raw = sys.argv[1].encode('utf-8')
+if __name__ == "__main__":
+    revision_raw = sys.argv[1].encode("utf-8")
     print(revhash(revision_raw))
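
The body of `hashutil.hash_git_data` is not part of this diff. The sketch below shows git's own object-hashing convention with the standard library only, on the assumption that the helper follows it: the payload is prefixed with a `<type> <length>` header and a NUL byte, then hashed with SHA-1. `git_object_sha1` and `unescape_newlines` are hypothetical names.

```python
import hashlib


def git_object_sha1(data: bytes, git_type: str) -> bytes:
    """Return the SHA-1 digest of `data` framed as a git object."""
    # Header is "<type> <payload length in bytes>" followed by a NUL byte.
    header = f"{git_type} {len(data)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + data).digest()


def unescape_newlines(raw: bytes) -> bytes:
    """Turn literal backslash-n sequences back into newline bytes."""
    return raw.replace(b"\\n", b"\n")
```

The unescaping step mirrors the HACK in `revhash`: a commit passed on the command line arrives with its newlines escaped, and the hash only matches git's once they are restored.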
swh-model (0.0.1-1) unstable; urgency=low

  * Create swh-model package

 -- Nicolas Dandrimont <olasd@debian.org>  Mon, 07 Dec 2015 15:41:28 +0100
9
Source: swh-model
Maintainer: Software Heritage developers <swh-devel@inria.fr>
Section: python
Priority: optional
Build-Depends: debhelper (>= 9),
               dh-python (>= 2),
               python3 (>= 3.5) | python3-pyblake2,
               python3-all,
               python3-click,
               python3-nose,
               python3-setuptools,
               python3-vcversioner
Standards-Version: 3.9.6
Homepage: https://forge.softwareheritage.org/diffusion/DMOD/
Package: python3-swh.model
Architecture: all
Depends: ${misc:Depends}, ${python3:Depends}
Breaks: python3-swh.loader.core (<< 0.0.16~),
        python3-swh.loader.dir (<< 0.0.28~),
        python3-swh.loader.svn (<< 0.0.28~)
Description: Software Heritage data model
Format: http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Files: *
Copyright: 2015 The Software Heritage developers
License: GPL-3+
License: GPL-3+
 This program is free software: you can redistribute it and/or modify
 it under the terms of the GNU General Public License as published by
 the Free Software Foundation; either version 3 of the License, or
 (at your option) any later version.
 .
 This program is distributed in the hope that it will be useful,
 but WITHOUT ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 GNU General Public License for more details.
 .
 You should have received a copy of the GNU General Public License
 along with this program. If not, see <http://www.gnu.org/licenses/>.
 .
 On Debian systems, the complete text of the GNU General Public
 License version 3 can be found in `/usr/share/common-licenses/GPL-3'.
#!/usr/bin/make -f
export PYBUILD_NAME=swh.model
export PYBUILD_TEST_ARGS=--with-doctest -sv -a !db,!fs
%:
	dh $@ --with python3 --buildsystem=pybuild
override_dh_install:
	dh_install
	rm -v $(CURDIR)/debian/python3-*/usr/lib/python*/dist-packages/swh/__init__.py
3.0 (quilt)
include ../../swh-docs/Makefile.sphinx
include Makefile.sphinx
-include Makefile.local
sphinx/html: images
sphinx/clean: clean-images
assets: images
images:
	make -C images/
......
Command-line interface
======================
.. click:: swh.model.cli:identify
   :prog: swh identify
   :nested: full
......@@ -74,8 +74,7 @@ synonyms.
 **directories**
   a list of named directory entries, each of which points to other artifacts,
   usually file contents or sub-directories. Directory entries are also
-  associated to arbitrary metadata, which vary with technologies, but usually
-  includes permission bits, modification timestamps, etc.
+  associated to some metadata stored as permission bits.

 **revisions** (AKA "commits")
   software development within a specific project is essentially a time-indexed
......@@ -92,8 +91,8 @@ synonyms.
   some revisions are more equal than others and get selected by developers as
   denoting important project milestones known as "releases". Each release
   points to the last commit in project history corresponding to the release and
-  might carry arbitrary metadata, e.g., release name and version, release
-  message, cryptographic signatures, etc.
+  carries metadata: release name and version, release message, cryptographic
+  signatures, etc.

 Additionally, the following crawling-related information is stored as
......@@ -145,6 +144,11 @@ provenance information in the Software Heritage archive:
   Software Heritage clock) the visit happened and the full snapshot of the
   state of the software origin at the time.

+.. note::
+  This model currently records visits as a single point in time. However, the
+  actual visit process is not instantaneous. Loaders can record successive
+  changes to the state of the visit, as their work progresses, as updates to
+  the visit object.

 Data structure
 --------------
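
The note on visits added in the hunk above describes loaders recording successive state changes as updates to a visit. A minimal sketch of that idea follows, assuming each update carries a timestamp and a status and that the most recent update gives the current state. `VisitUpdate`, `latest_state`, and the status strings are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass(frozen=True)
class VisitUpdate:
    """One recorded change to the state of a visit (illustrative)."""

    date: datetime
    status: str  # illustrative values: "created", "ongoing", "full"
    snapshot: Optional[str] = None  # snapshot SWHID, once known


def latest_state(updates: List[VisitUpdate]) -> VisitUpdate:
    """Return the most recent update; raises ValueError if there are none."""
    return max(updates, key=lambda update: update.date)


# Example: a loader reports progress three times during one visit.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
history = [
    VisitUpdate(start, "created"),
    VisitUpdate(start + timedelta(minutes=1), "ongoing"),
    VisitUpdate(start + timedelta(minutes=5), "full", "swh:1:snp:" + "0" * 40),
]
current = latest_state(history)
```

Keeping every update, rather than overwriting a single record, preserves how the visit progressed while still letting readers treat the latest entry as the visit's state.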
......@@ -255,3 +259,39 @@ making emergent structures such as code reuse across different projects or
 software origins, readily available. Further reinforcing the Software Heritage
 use cases, this object could become a veritable "map of the stars" of our
 entire software commons.
+
+
+Extended data model
+-------------------
+
+In addition to the artifacts detailed above used to represent original software
+artifacts, the Software Heritage archive stores information about these
+artifacts.
+
+**extid**
+  a relationship between an original identifier of an artifact, in its
+  native/upstream environment, and a `core SWHID <persistent-identifiers>`,
+  which is specific to Software Heritage. As such, it includes:
+
+  * the external identifier, stored as bytes whose format is opaque to the
+    data model
+  * a type (a simple name and a version), to identify the type of relationship
+  * the "target", which is a core SWHID
+
+  An extid may also include a "payload", which is arbitrary data about the
+  relationship. For example, an extid might link a directory to the
+  cryptographic hash of the tarball that originally contained it. In this
+  case, the payload could include data useful for reconstructing the
+  original tarball from the directory. The payload data is stored
+  separately. An extid refers to it by its ``sha1_git`` hash.
+
+**raw extrinsic metadata**
+  an opaque bytestring, along with its format (a simple name), an identifier
+  of the object the metadata is about and in which context (similar to a
+  `qualified SWHID <persistent-identifiers>`), and provenance information
+  (the authority who provided it, the fetcher tool used to get it, and the
+  date it was discovered at).
+
+  It provides both a way to store information about an artifact contributed by
+  external entities, after the artifact was created, and an escape hatch to
+  store metadata that would not otherwise fit in the data model.
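
The two extended objects described in the added text can be pictured as plain records. The sketch below is illustrative only: the class and field names are assumptions derived from the prose and are not taken from the `swh.model` source, which may name or type them differently.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ExtIDSketch:
    """Link between an upstream identifier and a core SWHID (illustrative)."""

    extid: bytes  # external identifier, opaque to the data model
    extid_type: str  # simple name of the relationship type
    extid_version: int  # version of that type
    target: str  # core SWHID of the archived object
    payload_sha1_git: Optional[bytes] = None  # reference to separately stored payload


@dataclass(frozen=True)
class RawExtrinsicMetadataSketch:
    """Opaque metadata about an object, with provenance (illustrative)."""

    target: str  # identifier of the object the metadata is about
    format: str  # simple name of the metadata format
    metadata: bytes  # opaque bytestring
    authority: str  # who provided the metadata
    fetcher: str  # tool used to get it
    discovered_at: datetime  # assumed here to be a discovery timestamp


# Example from the text: link a directory to the hash of its original tarball.
# The type name "tarball-sha256" and the all-zero values are placeholders.
tarball_link = ExtIDSketch(
    extid=bytes(32),
    extid_type="tarball-sha256",
    extid_version=0,
    target="swh:1:dir:f9f858a48d663b3809c9e2f336412717496202ab",
)
```

Storing only the payload's hash on the extid keeps the relationship record small, in line with the statement that payload data is stored separately.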