58d088a4f7e688786949e686bda1452b39c9affb to 68b3578b3dcd9f666530aba883ef877d731307da
Compare revisions
Changes are shown as if the source revision was being merged into the target revision.
Source
swh/devel/swh-graph
68b3578b3dcd9f666530aba883ef877d731307da
Target
swh/devel/swh-graph
58d088a4f7e688786949e686bda1452b39c9affb
Commits on Source (3)
Remove section on Direct Loading · 261a2ea4
vlorentz authored 6 months ago
This wasn't the right syntax for commenting out a block of text in ReST
docs: Add anchor to each compression step · 89e3c2a1
vlorentz authored 6 months ago
Update docs/compression.rst to the post-Java steps and file formats · 68b3578b
vlorentz authored 6 months ago
Showing 2 changed files with 211 additions and 186 deletions
docs/compression.rst: 211 additions, 144 deletions
docs/memory.rst: 0 additions, 42 deletions
docs/compression.rst @ 68b3578b
This diff is collapsed.
docs/memory.rst @ 68b3578b
@@ -23,48 +23,6 @@ using this environment variable::
TMPDIR=/srv/softwareheritage/ssd/tmp
..
Direct Loading is currently not available in Rust, only memory-mapping.
Memory mapping vs Direct loading
--------------------------------
The main dial you can use to manage your memory usage is to choose between
memory-mapping and direct-loading the graph data. The different loading modes
available when loading the graph are documented in :ref:`swh-graph-java-api`.
Loading in mapped mode will not load any extra data in RAM, but will instead
use the ``mmap(2)`` syscall to put the graph file located on disk in the
virtual address space. The Linux kernel will then be free to arbitrarily cache
the file, either partially or in its entirety, depending on the available
memory space.
In our experiments, memory-mapping a small graph from an SSD only incurs a
relatively small slowdown (about 15-20%). However, when the graph is too big to
fit in RAM, the kernel has to constantly invalidate pages to cache newly
accessed sections, which incurs a very large performance penalty. A full
traversal of a large graph that usually takes about 20 hours when loaded in
main memory could take more than a year when mapped from a hard drive!
When deciding what to direct-load and what to memory-map, here are a few rules
of thumb:
- If you don't need random access to the graph edges, you can consider using
  the "offline" loading mode. The offsets won't be loaded which will save
  dozens of gigabytes of RAM.
- If you only need to query some specific nodes or run trivial traversals,
  memory-mapping the graph from a HDD should be a reasonable solution that
  doesn't take an inordinate amount of time. It might be bad for your disks,
  though.
- If you are constrained in available RAM, memory-mapping the graph from an SSD
  offers reasonable performance for reasonably complex algorithms.
- If you have a heavy workload (i.e. running a full traversal of the entire
  graph) and you can afford the RAM, direct loading will be orders of magnitude
  faster than all the above options.
Sharing mapped data across processes