Commit 75502b55, authored 1 year ago by David Douard
devel: Fix a few reST issues in object-storage.rst
Parent: 1f073aa0
Pipeline #5823 failed 1 year ago (stage: external)
docs/devel/architecture/object-storage.rst: 31 additions, 30 deletions
@@ -6,55 +6,56 @@ Object Storage Overview
The Object Storage: Contents
----------------------------
All the history and context of the archive is represented by
a graph (Merkle DAG) with the following node types:

- releases
- revisions (commits)
- directories
- directory entries (file names)

This graph is stored in a database, commonly called "Graph" or
"Storage".
This database is currently based on PostgreSQL, and is going
to be migrated to Cassandra, which is more efficient in terms of
concurrent writing.
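
The Merkle DAG idea above can be illustrated with a minimal sketch. The ``node_id`` helper and its hashing layout are hypothetical and are not the identifier scheme actually used by the archive; the sketch only shows that each node's identifier depends on the identifiers of the nodes it points to.

```python
import hashlib


def node_id(kind: str, payload: bytes, children: list[str]) -> str:
    """Hypothetical identifier: hash of the node kind, its payload and its child ids."""
    h = hashlib.sha1()
    h.update(kind.encode() + b"\0")
    h.update(payload + b"\0")
    for child in children:
        h.update(child.encode() + b"\0")
    return h.hexdigest()


# Build a tiny graph bottom-up: directory <- revision <- release.
directory = node_id("directory", b"README.md", [])
revision = node_id("revision", b"initial commit", [directory])
release = node_id("release", b"v1.0", [revision])

# Changing the directory changes the identifier of every node above it.
directory_changed = node_id("directory", b"README.rst", [])
revision_changed = node_id("revision", b"initial commit", [directory_changed])
release_changed = node_id("release", b"v1.0", [revision_changed])
```

Because identifiers are derived from content, two identical subgraphs get the same identifier and need to be stored only once.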

The source code itself (the content of the files) represents a huge
volume of data, and exactly the same content can be found in many
different files. To avoid storing the same content several times,
contents are deduplicated: a given content is stored only once,
and all the file entries having this exact content refer to the
same content.
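
The deduplication described above can be sketched as a content-addressed store. The ``DedupStore`` class and the choice of SHA-256 are assumptions made for illustration only, not the archive's actual implementation.

```python
import hashlib


class DedupStore:
    """Hypothetical store: each distinct content is kept once, keyed by its hash."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}  # content id -> content bytes
        self.entries: dict[str, str] = {}    # file path -> content id

    def add_file(self, path: str, data: bytes) -> str:
        content_id = hashlib.sha256(data).hexdigest()
        # Store the bytes only if this exact content was never seen before.
        self.objects.setdefault(content_id, data)
        self.entries[path] = content_id
        return content_id


store = DedupStore()
id_a = store.add_file("project-a/LICENSE", b"MIT License\n")
id_b = store.add_file("project-b/LICENSE", b"MIT License\n")
id_c = store.add_file("project-b/main.py", b"print('hello')\n")
```

Here the two ``LICENSE`` entries point to the same stored object, so three file entries result in only two stored contents.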
Ceph
----
These contents are stored in a customized file system, called
"Object Storage", each content being considered as an object.
Until now, the object storage has been based on an open source
file system technology called ZFS.

The growth of the archive requires a better-suited technology,
and a few years ago, we chose Ceph, a distributed storage
technology created by Red Hat.

A specificity of Software Heritage is that most contents are
small (half of our contents are less than 3KB), which is
much smaller than the minimum space used by Ceph to store a
single file (16KB).
Using Ceph directly would hence result in a massive waste of space.
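
As a rough worked example of the figures above, the following sketch computes how much space is lost when a small content occupies the 16KB minimum. The helper names are hypothetical, and the model only accounts for the minimum allocation stated in the text, ignoring any other overhead.

```python
MIN_ALLOCATION = 16 * 1024  # minimum space per stored file, from the text (16KB)


def stored_size(content_size: int) -> int:
    """Space occupied by one content under the simplified minimum-allocation model."""
    return max(content_size, MIN_ALLOCATION)


def wasted_fraction(content_size: int) -> float:
    """Share of the occupied space that holds no actual data."""
    used = stored_size(content_size)
    return (used - content_size) / used


content_size = 3 * 1024  # a 3KB content, the median upper bound quoted above
print(stored_size(content_size))                 # 16384
print(f"{wasted_fraction(content_size):.2%}")    # 81.25%
```

Under this simplified model, a 3KB content stored on its own wastes 13KB out of 16KB, and smaller contents waste even more.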
Winery
------
We therefore needed to create a custom layer on top of Ceph to group
the data we store, using sharding techniques: a shard is a Ceph
object that contains many contents. To be able to retrieve
an individual content, we need a mechanism that records
where each content is located in its shard.

This layer is called Winery, and was developed especially for
Software Heritage by Easter Eggs.
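
The sharding principle can be sketched as follows: many small contents are packed into one large blob, and an index records the offset and length of each content. The ``Shard`` class below is a hypothetical in-memory illustration and does not reflect Winery's actual storage format or API.

```python
import hashlib


class Shard:
    """Hypothetical shard: one large blob holding many contents, plus an index."""

    def __init__(self) -> None:
        self.blob = bytearray()                       # would be a single Ceph object
        self.index: dict[str, tuple[int, int]] = {}   # content id -> (offset, length)

    def add(self, data: bytes) -> str:
        content_id = hashlib.sha256(data).hexdigest()
        if content_id not in self.index:
            # Append the content and remember where it starts and how long it is.
            self.index[content_id] = (len(self.blob), len(data))
            self.blob += data
        return content_id

    def get(self, content_id: str) -> bytes:
        offset, length = self.index[content_id]
        return bytes(self.blob[offset:offset + length])


shard = Shard()
first = shard.add(b"print('hello')\n")
second = shard.add(b"MIT License\n")
```

Retrieving a content is then a lookup in the index followed by a read of the corresponding byte range, for example ``shard.get(second)``.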

.. thumbnail:: ../images/object-storage.svg