Commit 75502b55 authored by David Douard

devel: Fix a few reST issues in object-storage.rst

parent 1f073aa0
Pipeline #5823 failed
@@ -6,55 +6,56 @@ Object Storage Overview
The Object Storage: Contents
----------------------------
All the history and context of the archive is represented by
a graph (Merkle DAG) with the following node types:

- releases
- revisions (commits)
- directories
- directory entries (file names)

This graph is stored in a database, commonly called "Graph" or
"Storage".
This database is currently based on PostgreSQL, and is going
to be migrated to Cassandra, which is more efficient in terms of
concurrent writing.
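
To illustrate what "Merkle DAG" means here, the following is a minimal sketch in which each node's identifier is a hash of its own data and of its children's identifiers. The ``node_id`` helper, the SHA-1 choice and the byte layout are assumptions made for this example only; they are not the identifier scheme actually used by the archive.

```python
import hashlib


def node_id(node_type: str, payload: bytes, children: list[str]) -> str:
    """Hash a node together with the identifiers of its children (toy scheme)."""
    h = hashlib.sha1()
    h.update(node_type.encode())
    h.update(b"\0")
    h.update(payload)
    # Sort children so the identifier does not depend on insertion order.
    for child in sorted(children):
        h.update(b"\0")
        h.update(child.encode())
    return h.hexdigest()


# A directory referring to two entries, and a revision referring to the directory.
entry_a = node_id("entry", b"README", [])
entry_b = node_id("entry", b"main.c", [])
directory = node_id("directory", b"", [entry_a, entry_b])
revision = node_id("revision", b"initial commit", [directory])

# Changing one leaf changes the identifier of every node above it.
entry_b2 = node_id("entry", b"main.cpp", [])
directory2 = node_id("directory", b"", [entry_a, entry_b2])
revision2 = node_id("revision", b"initial commit", [directory2])
```

Because identifiers depend only on content, two identical subgraphs get the same identifier and can be stored once.
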
The source code itself (the content of the files) represents a huge
volume of data, and exactly the same content can appear in different
files. To avoid storing the same content several times,
contents are deduplicated: a given content is stored only once,
and all the file entries having this exact content refer to the
same content.
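
The deduplication described above can be sketched as a content-addressed store, where the key of a content is a hash of its bytes. ``DedupStore``, the file paths and the use of SHA-256 are hypothetical choices for this illustration, not the project's actual implementation.

```python
import hashlib


class DedupStore:
    """Toy in-memory store keyed by the hash of each content (hypothetical)."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def add(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Store the bytes only if this exact content has not been seen before.
        self.objects.setdefault(key, data)
        return key


store = DedupStore()
files = {
    "project-a/LICENSE": b"MIT License\n",
    "project-b/LICENSE": b"MIT License\n",
    "project-a/main.py": b"print('hello')\n",
}
# Each file entry keeps only a reference (the key) to its content.
file_entries = {path: store.add(data) for path, data in files.items()}
```

Here three file entries end up referring to only two stored objects, since both ``LICENSE`` files have identical bytes.
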
Ceph
----
These contents are stored in a customized file system, called
"Object Storage", each content being considered as an object.
Until now, the object storage has been based on an open source
file system technology called ZFS.

The growth of the archive requires a better-suited technology,
and a few years ago we chose Ceph, a distributed storage
technology backed by Red Hat.
A particularity of Software Heritage is that each content is
small (half of our contents are smaller than 3 KB), which is
much smaller than the minimum space used by Ceph to store a
single file (16 KB).
Using Ceph directly would hence result in a massive waste of space.
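
As a rough back-of-the-envelope check of the figures above, the sketch below computes how much space is lost when a 3 KB content occupies a 16 KB minimum allocation. The ``allocated_size`` helper, and the assumption that larger contents are rounded up to whole 16 KB units, are illustrative only.

```python
import math

MIN_ALLOCATION = 16 * 1024  # minimum space per stored file, from the text (16 KB)


def allocated_size(content_size: int, unit: int = MIN_ALLOCATION) -> int:
    """Space consumed if each content is rounded up to whole units (assumption)."""
    return max(1, math.ceil(content_size / unit)) * unit


content_size = 3 * 1024  # a content at the 3 KB median quoted above
used = allocated_size(content_size)
wasted_fraction = 1 - content_size / used
```

Under these assumptions, a 3 KB content uses 16384 bytes on disk, so about 81% of the allocated space holds no data.
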
Winery
------
So we needed to create a custom layer on top of Ceph to group
the data we store, using sharding techniques: a shard is a Ceph
object that contains many contents. To be able to retrieve
an individual content, we need a mechanism that records
where each content is located in the shard.

This layer is called Winery, and was developed especially for
Software Heritage by Easter Eggs.
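
The idea of packing many small contents into one shard, together with an index that records where each content sits, can be sketched as follows. The ``Shard`` class, its in-memory layout and the SHA-256 keys are hypothetical and are not Winery's actual format or API.

```python
import hashlib


class Shard:
    """Toy shard: one byte buffer plus an index of (offset, length) per content."""

    def __init__(self) -> None:
        self.data = bytearray()
        self.index: dict[str, tuple[int, int]] = {}

    def add(self, content: bytes) -> str:
        key = hashlib.sha256(content).hexdigest()
        if key not in self.index:
            # Record where the content starts and how long it is, then append it.
            self.index[key] = (len(self.data), len(content))
            self.data += content
        return key

    def get(self, key: str) -> bytes:
        offset, length = self.index[key]
        return bytes(self.data[offset:offset + length])


shard = Shard()
k1 = shard.add(b"int main(void) { return 0; }\n")
k2 = shard.add(b"# README\n")
```

Calling ``shard.get(k2)`` returns the second content by slicing the buffer at the recorded offset, so the whole shard can be stored as a single large object while each content stays individually retrievable.
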
.. thumbnail:: ../images/object-storage.svg