Use a Merkle discovery algorithm with archives (798f749e) · Commits · Platform / Development / swh-loader-core

Commit 798f749e authored 2 years ago by Raphaël Gomès

Use a Merkle discovery algorithm with archives

"Discovery" is the term used to find out the differences between two
Merkle graphs. Using such an algorithm is useful in that it drastically
reduces the amount of data that needs to be transferred.

This commit introduces an efficient but simple algorithm that is a good
starting point for improved performance: random sampling of directories,
the details of which are explained in the docstrings.

Mercurial uses a more sophisticated algorithm for its discovery, but it
is quite a bit more involved and would introduce too much complexity at
once. Also, the constraints for speed that Mercurial has (in the order
of milliseconds) don't apply as obviously to this context without
further investigation.

Benchmarks
==========

Setup
-----
- With a local postgresql storage (so no network overhead), a local
  tmpfs obstorage on a fast NVME SSD, all of which should make this
  improvement look less good than it will be in production
- With a tarball of the linux kernel at commit
  d96d875ef5dd372f533059a44f98e92de9cf0d42 already loaded
- Loading a tarball of 20 commits earlier
  (bf3f401db6cbe010095fe3d1e233a5fde54e8b78)
- Only taking into account the loading (not the downloading of the
  tarball, or its decompression)

Result
------

before: ~30s
after: ~17s

Reproduced 4 times.

parent 26fe954b

No related branches found

No related tags found

Hide whitespace changes

Inline Side-by-side

Showing with 337 additions and 0 deletions

Phabricator Migration user @phabricator-migration
mentioned in merge request !313 (closed)
· 2 years ago

mentioned in merge request !313 (closed)

mentioned in merge request !313

Toggle commit list
Phabricator Migration user @phabricator-migration
mentioned in merge request !321 (closed)
· 2 years ago

mentioned in merge request !321 (closed)

mentioned in merge request !321

Toggle commit list
Phabricator Migration user @phabricator-migration
mentioned in merge request !450 (closed)
· 2 years ago

mentioned in merge request !450 (closed)

mentioned in merge request !450

Toggle commit list
Phabricator Migration user @phabricator-migration
mentioned in merge request !322 (closed)
· 2 years ago

mentioned in merge request !322 (closed)

mentioned in merge request !322

Toggle commit list
Phabricator Migration user @phabricator-migration
mentioned in merge request !366 (closed)
· 2 years ago

mentioned in merge request !366 (closed)

mentioned in merge request !366

Toggle commit list

Please register or to comment