Commit f3c24776, authored 4 years ago by vlorentz
package-loader-tutorial: Add 'Loading metadata' and 'Final words' sections.
Parent: 320938cf
Tags containing commit: v0.23.0
Merge request: !206
Showing 2 changed files with 63 additions and 3 deletions:
docs/package-loader-tutorial.rst: 60 additions, 2 deletions
swh/loader/package/loader.py: 3 additions, 1 deletion
docs/package-loader-tutorial.rst
...
...
@@ -357,7 +357,7 @@ Making your loader incremental
In the previous sections, you wrote a fully functional loader for a new type of
package repository. This is great! Please tell us about it, and
:ref:`submit it for review <patch-submission>` so we can give you some feedback.
:ref:`submit it for review <patch-submission>` so we can give you some feedback early.
Now, we will see a key optimization for any package loader: skipping packages
it already downloaded, using :term:`extids <extid>`.
...
...
@@ -589,4 +589,62 @@ to use as an example::
Loading metadata
----------------
TODO
Finally, an optional step: collecting and loading :term:`extrinsic metadata`.
This is metadata that your loader may collect while loading an origin.
For example, the PyPI loader collects some parts of the API response
(e.g. https://pypi.org/pypi/requests/json).
They are stored as a raw bytestring, along with a format (an ASCII string) and
a date of discovery (usually the time your loader ran).
This is done by adding them to the ``directory_extrinsic_metadata`` attribute of
your ``NewPackageInfo`` object when creating it in ``get_package_info``
as :py:class:`swh.loader.package.loader.RawExtrinsicMetadataCore` objects::
    NewPackageInfo(
        ...,
        directory_extrinsic_metadata=[
            RawExtrinsicMetadataCore(
                format="new-format",
                metadata=b"foo bar baz",
                discovery_date=datetime.datetime(...),
            )
        ]
    )
``format`` should be a human-readable ASCII string that unambiguously describes
the format. Readers of the metadata object will have a built-in list of formats
they understand, and will check whether your metadata object's format is among them.
You should use one of the :ref:`known metadata formats <extrinsic-metadata-format>`
if possible, or add yours to this list.
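
A minimal sketch of the check a metadata reader might perform, assuming a
hypothetical ``KNOWN_FORMATS`` set and ``can_read`` helper. Neither name comes
from swh-loader-core, and the format names in the set are illustrative only:

```python
# Hypothetical stand-in for the built-in list of formats a reader understands.
KNOWN_FORMATS = frozenset({"pypi-project-json", "new-format"})


def can_read(metadata_format: str) -> bool:
    """Return True if a reader with KNOWN_FORMATS understands this format."""
    if not metadata_format.isascii():
        # The tutorial requires ``format`` to be an ASCII string.
        raise ValueError("format must be an ASCII string")
    return metadata_format in KNOWN_FORMATS
```

A reader built this way silently skips any object whose format it does not
list, which is why reusing a known format, or registering a new one, matters.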
``metadata`` is the metadata object itself. When possible, it should be copied verbatim
from the source object you got, and should not be created by the loader.
If this is not possible, for example because it is extracted from a larger
JSON or XML document, make as few modifications as possible to reduce
the risk of corruption.
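
A minimal sketch of extracting metadata from a larger JSON document while
changing as little as possible. The ``extract_release_metadata`` helper and the
``releases`` key are hypothetical, chosen for illustration; they do not
describe any real API response layout:

```python
import json


def extract_release_metadata(api_response: bytes, version: str) -> bytes:
    """Return the sub-document for one release, re-serialized without
    renaming, filtering, or reordering any of its keys."""
    document = json.loads(api_response)
    release = document["releases"][version]  # hypothetical layout
    # Key order is preserved by default; ensure_ascii=False avoids rewriting
    # non-ASCII characters as \u escapes.
    return json.dumps(release, ensure_ascii=False).encode("utf-8")


response = b'{"releases": {"1.0": {"name": "caf\xc3\xa9", "files": ["a.tar.gz"]}}}'
metadata = extract_release_metadata(response, "1.0")
```

Note that even this round trip may change whitespace, so copying the source
bytes verbatim remains preferable whenever the source offers them.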
``discovery_date`` is optional, and defaults to the time your loader started working.
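
As a rough illustration of that default, the sketch below uses a hypothetical
stand-in class (``MetadataStandIn``, not the real ``RawExtrinsicMetadataCore``)
and a hypothetical ``resolve_discovery_date`` helper. Using timezone-aware UTC
datetimes is an assumption made for the example:

```python
import datetime
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetadataStandIn:
    """Hypothetical stand-in for RawExtrinsicMetadataCore."""

    format: str
    metadata: bytes
    discovery_date: Optional[datetime.datetime] = None


def resolve_discovery_date(
    item: MetadataStandIn, loader_start: datetime.datetime
) -> datetime.datetime:
    """Use the explicit date when one was given, else the loader's start time."""
    if item.discovery_date is not None:
        return item.discovery_date
    return loader_start


loader_start = datetime.datetime.now(tz=datetime.timezone.utc)
item = MetadataStandIn(format="new-format", metadata=b"foo bar baz")
```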
In theory, you can write extrinsic metadata on any kind of object, e.g. by implementing
:py:meth:`swh.loader.package.loader.PackageLoader.get_extrinsic_origin_metadata` or
:py:meth:`swh.loader.package.loader.PackageLoader.get_extrinsic_snapshot_metadata`,
but this is rarely relevant in practice.
Be sure to check whether your loader can find any potentially interesting metadata, though!
Final words
-----------
Congratulations, you made it to the end.
If you have not already, please `contact us`_ to tell us about your new loader,
and :ref:`submit your loader for review <patch-submission>` on our forge
so we can merge it and run it alongside our other loaders to archive more repositories.
And if you have any changes in mind to improve this tutorial for future readers,
please submit them too.
Thank you for your contributions!
.. _contact us: https://www.softwareheritage.org/community/developers/
swh/loader/package/loader.py
...
...
@@ -121,8 +121,10 @@ class BasePackageInfo:
    directory_extrinsic_metadata = attr.ib(
        type=List[RawExtrinsicMetadataCore], default=[], kw_only=True,
    )
    """
    :term:`extrinsic metadata` collected by the loader, that will be attached to the
    loaded directory and added to the Metadata storage.
    """

    # TODO: add support for metadata for directories and contents
    # TODO: add support for metadata for revisions and contents

    def extid(self) -> Optional[PartialExtID]:
        """
        Returns a unique intrinsic identifier of this package info,
...
...