
collections: Add index to ImmutableDict to speed up lookup by key

Previously, when looking up data by key in an ImmutableDict, the inner tuple storing keys and values was iterated until the requested key was found.

This linear scan is inefficient when the ImmutableDict contains a lot of entries, typically for an origin snapshot containing a lot of branches.

So add an index to speed up lookups by key and improve loader performance.
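To illustrate the idea, here is a minimal sketch of an immutable mapping that keeps its entries in a tuple and adds a key-to-position index built once at construction. This is not the actual swh.model code: the class name `IndexedImmutableDict` and the attribute names `_data` and `_index` are hypothetical, and the sketch assumes hashable keys.

```python
from collections.abc import Mapping
from typing import Dict, Generic, Hashable, Iterator, Tuple, TypeVar

KT = TypeVar("KT", bound=Hashable)
VT = TypeVar("VT")


class IndexedImmutableDict(Mapping, Generic[KT, VT]):
    """Hypothetical sketch of a tuple-backed mapping with a lookup index."""

    def __init__(self, items=()) -> None:
        # Accept a dict or an iterable of (key, value) pairs; going through
        # dict() removes duplicate keys (the last value wins).
        self._data: Tuple[Tuple[KT, VT], ...] = tuple(dict(items).items())
        # Index built once: key -> position of its pair in the tuple.
        self._index: Dict[KT, int] = {
            key: pos for pos, (key, _) in enumerate(self._data)
        }

    def __getitem__(self, key: KT) -> VT:
        # Average O(1) lookup through the index instead of scanning the tuple;
        # a missing key raises KeyError from the index dict.
        return self._data[self._index[key]][1]

    def __iter__(self) -> Iterator[KT]:
        return (key for key, _ in self._data)

    def __len__(self) -> int:
        return len(self._data)


# Usage with made-up branch names and targets
branches = IndexedImmutableDict({b"refs/heads/main": "rev1", b"refs/tags/v1": "rev2"})
target = branches[b"refs/tags/v1"]
```

The trade-off in this sketch is extra memory for the index in exchange for lookups that no longer depend on the number of entries, which matters when a snapshot holds tens of thousands of branches.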

Before these changes, this is the timing obtained when performing a new visit of the v8 repository (for the record, I am using a homemade storage proxy that reads from production storage and writes to an in-memory storage):

14:04 $ time swh loader -C ~/.config/swh/loader.yml run git https://github.com/v8/v8
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/v8/v8' with type 'git'
Enumerating objects: 106, done.
Counting objects: 100% (106/106), done.
Compressing objects: 100% (96/96), done.
Total 106 (delta 48), reused 56 (delta 8), pack-reused 0
INFO:swh.loader.git.loader:Listed 30888 refs for repo https://github.com/v8/v8
INFO:swh.loader.git.loader.GitLoader:Fetched 107 objects; 107 are new
{'status': 'eventful'} for origin 'https://github.com/v8/v8'

real    0m59,440s
user    0m33,682s
sys     0m0,847s

And this is the timing we obtain after applying these changes:

14:07 $ time swh loader -C ~/.config/swh/loader.yml run git https://github.com/v8/v8
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/v8/v8' with type 'git'
Enumerating objects: 106, done.
Counting objects: 100% (106/106), done.
Compressing objects: 100% (96/96), done.
Total 106 (delta 48), reused 56 (delta 8), pack-reused 0
INFO:swh.loader.git.loader:Listed 30888 refs for repo https://github.com/v8/v8
INFO:swh.loader.git.loader.GitLoader:Fetched 107 objects; 107 are new
{'status': 'eventful'} for origin 'https://github.com/v8/v8'

real    0m28,338s
user    0m3,613s
sys     0m0,945s

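So roughly a 2x speedup in wall-clock time (59.4 s down to 28.3 s), with user CPU time dropping from 33.7 s to 3.6 s!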
