model: optimize dictify()

  1. reorder conditionals to minimize the number of tests needed
  2. use hasattr() instead of the expensive isinstance()
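
The two changes above could look roughly like the sketch below. This is not the code from this commit: Model and Kind are hypothetical stand-ins, and the order of the branches is an assumption chosen only to illustrate the idea (probe for to_dict() with hasattr() first, then fall back to isinstance() for the built-in containers).

```python
from enum import Enum


class Kind(Enum):
    """Hypothetical enum field type."""
    FILE = "file"


class Model:
    """Hypothetical stand-in for a model class that exposes to_dict()."""

    def __init__(self, **fields):
        self.fields = fields

    def to_dict(self):
        return {k: dictify(v) for k, v in self.fields.items()}


def dictify(value):
    # Probe for the method instead of calling isinstance() against a base class.
    if hasattr(value, "to_dict"):
        return value.to_dict()
    if isinstance(value, Enum):
        return value.value
    if isinstance(value, dict):
        return {k: dictify(v) for k, v in value.items()}
    if isinstance(value, tuple):
        return tuple(dictify(v) for v in value)
    # Plain leaves (bytes, str, int, None) fall through unchanged.
    return value
```

Whether hasattr() actually beats isinstance() depends on the class hierarchy involved, so it is worth re-running the benchmark below after any change of this kind.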

Motivation: Ironically, this is the bottleneck of my checksumming script.

Benchmark:

In [1]: from swh.storage import get_storage

In [2]: from swh.model.identifiers import CoreSWHID, _BaseSWHID

In [3]: s = get_storage('remote', url='http://moma.internal.softwareheritage.org:5002/')

In [4]: rev = s.revision_get([bytes.fromhex("747675816d815e86b7482b5a0acb9110eeeec590")])[0]

Before this commit:

In [18]: %timeit rev.to_dict()
10000 loops, best of 5: 70.4 µs per loop

In [19]: %timeit rev.to_dict()
10000 loops, best of 5: 69.3 µs per loop

After this commit:

In [5]: %timeit rev.to_dict()
10000 loops, best of 5: 48.4 µs per loop

In [6]: %timeit rev.to_dict()
10000 loops, best of 5: 45.7 µs per loop

In [7]: %timeit rev.to_dict()
10000 loops, best of 5: 47.5 µs per loop

Unfortunately, there isn't much more we can do: 90% of the time is spent constructing a dict, and replacing the dictcomp {k: dictify(v) for k, v in value.items()} with map() + dict() does not help.
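
For anyone who wants to reproduce the dictcomp vs. map() + dict() comparison outside of swh.model, a minimal sketch follows. The dictify() here is a hypothetical identity stand-in, the sample dict is made up, and the exact map() + dict() spelling is an assumption; timings will vary by machine and Python version.

```python
import timeit


def dictify(value):
    # Hypothetical stand-in: treats every value as a plain leaf.
    return value


def with_dictcomp(value):
    return {k: dictify(v) for k, v in value.items()}


def with_map(value):
    # One possible map() + dict() spelling; other variants exist.
    return dict(zip(value.keys(), map(dictify, value.values())))


def compare(sample, number=10_000, repeat=5):
    """Return the best time in seconds for each variant."""
    return {
        fn.__name__: min(
            timeit.repeat(lambda: fn(sample), number=number, repeat=repeat)
        )
        for fn in (with_dictcomp, with_map)
    }


sample = {f"field{i}": i for i in range(10)}
```

Calling compare(sample) returns a dict mapping each variant's name to its best time, which can be printed side by side.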


Migrated from D6319 (view on Phabricator)