time tox -e py3 -- --basetemp=/mnt/pytest -s --shard-size $((100 * 1024)) --object-max-size $((4 * 1024)) -k test_build_speed
number of objects = 45973118
baseline 163.73826217651367, write_duration 300.58917450904846, build_duration 26.01908826828003, total_duration 326.6082627773285
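As a quick sanity check on the figures above, the following Python sketch recomputes the total from its two parts and derives average per-object costs. The variable names are only illustrative; the numbers are copied from the output above.

```python
# Figures copied from the benchmark output above.
number_of_objects = 45973118
write_duration = 300.58917450904846
build_duration = 26.01908826828003
total_duration = 326.6082627773285

# total_duration is the sum of the write and build phases.
assert abs((write_duration + build_duration) - total_duration) < 1e-6

# Average cost per object, in microseconds.
write_us_per_object = write_duration / number_of_objects * 1e6
build_us_per_object = build_duration / number_of_objects * 1e6

print(f"write: {write_us_per_object:.2f} us/object")
print(f"build: {build_us_per_object:.2f} us/object")
print(f"build share of total: {build_duration / total_duration:.1%}")
```

Writing costs roughly 6.5 microseconds per object on average, while building the hash table costs well under one microsecond per object.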
Conclusions:
Writing the content of the objects takes longer because it requires about 46 million Python function calls, but this is acceptable.
Creating the perfect hash table and writing it to the file takes seconds (about 26 s here) even in the worst-case scenario, i.e. when there are only small objects and therefore millions of them.
An mmap error went undetected, so there was no information on why it failed. This has been fixed.
# PYTHONMALLOC=malloc valgrind --tool=memcheck .tox/py3/bin/pytest --basetemp=/mnt/pytest -k test_build_speed --shard-size $((100 * 1024)) --object-max-size $((16 * 1024 * 1024)) swh/perfecthash/tests/test_hash.py
==17519== Memcheck, a memory error detector
==17519== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==17519== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==17519== Command: .tox/py3/bin/pytest --basetemp=/mnt/pytest -k test_build_speed --shard-size 102400 --object-max-size 16777216 swh/perfecthash/tests/test_hash.py
==17519==
============================================== test session starts ===============================================
platform linux -- Python 3.9.2, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
rootdir: /root/swh-perfecthash, configfile: pytest.ini
plugins: cov-3.0.0
collected 2 items / 1 deselected / 1 selected
swh/perfecthash/tests/test_hash.py
==17519== Invalid write of size 8
==17519==    at 0x8DF92A1: memcpy (string_fortified.h:34)
==17519==    by 0x8DF92A1: shard_object_write (hash.c:104)
==17519==    by 0x8DF86E5: _cffi_f_shard_object_write (_hash_cffi.c:898)
==17519==    by 0x53F389: ??? (in /usr/bin/python3.9)
==17519==    by 0x51D89A: _PyObject_MakeTpCall (in /usr/bin/python3.9)
==17519==    by 0x5175B9: _PyEval_EvalFrameDefault (in /usr/bin/python3.9)
==17519==    by 0x528B62: _PyFunction_Vectorcall (in /usr/bin/python3.9)
==17519==    by 0x53BCFA: ??? (in /usr/bin/python3.9)
==17519==    by 0x511FB4: _PyEval_EvalFrameDefault (in /usr/bin/python3.9)
==17519==    by 0x5106EC: ??? (in /usr/bin/python3.9)
==17519==    by 0x528D20: _PyFunction_Vectorcall (in /usr/bin/python3.9)
==17519==    by 0x53C360: PyObject_Call (in /usr/bin/python3.9)
==17519==    by 0x513E8A: _PyEval_EvalFrameDefault (in /usr/bin/python3.9)
==17519==  Address 0x1ff is not stack'd, malloc'd or (recently) free'd
==17519==
Fatal Python error: Segmentation fault
And with debug output enabled:
============================================= test session starts ===============================================
platform linux -- Python 3.9.2, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
rootdir: /root/swh-perfecthash, configfile: pytest.ini
plugins: cov-3.0.0
collected 2 items / 1 deselected / 1 selected
swh/perfecthash/tests/test_hash.py hnumber of objects = 12814, total size = 107373772352
shard_object_write: object_size = 7806490 n_object_size = 1882072536171151360
shard_object_write: object_offset = 512
==21356== Invalid write of size 8
==21356==    at 0x8DF92E1: memcpy (string_fortified.h:34)
==21356==    by 0x8DF92E1: shard_object_write (hash.c:104)
==21356==    by 0x8DF86F5: _cffi_f_shard_object_write (_hash_cffi.c:898)
==21356==    by 0x53F389: ??? (in /usr/bin/python3.9)
==21356==    by 0x51D89A: _PyObject_MakeTpCall (in /usr/bin/python3.9)
==21356==    by 0x5175B9: _PyEval_EvalFrameDefault (in /usr/bin/python3.9)
==21356==    by 0x528B62: _PyFunction_Vectorcall (in /usr/bin/python3.9)
==21356==    by 0x53BCFA: ??? (in /usr/bin/python3.9)
==21356==    by 0x511FB4: _PyEval_EvalFrameDefault (in /usr/bin/python3.9)
==21356==    by 0x5106EC: ??? (in /usr/bin/python3.9)
==21356==    by 0x528D20: _PyFunction_Vectorcall (in /usr/bin/python3.9)
==21356==    by 0x53C360: PyObject_Call (in /usr/bin/python3.9)
==21356==    by 0x513E8A: _PyEval_EvalFrameDefault (in /usr/bin/python3.9)
==21356==  Address 0x1ff is not stack'd, malloc'd or (recently) free'd
==21356==
Fatal Python error: Segmentation fault
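Two numbers in the debug output can be checked by hand. The Python sketch below shows that n_object_size is exactly object_size with its eight bytes reversed, which is what a host-to-network (big-endian) conversion would produce on a little-endian machine. It also shows that the faulting address 0x1ff equals object_offset added to an all-ones 64-bit pointer, the value of MAP_FAILED ((void *) -1) on Linux. Both readings are inferences from the numbers, not something the output states. The second one is consistent with the undetected mmap error mentioned in the conclusions.

```python
import struct

# Values copied from the debug output above.
object_size = 7806490
n_object_size = 1882072536171151360
object_offset = 512
faulting_address = 0x1FF

# Pack the size as a big-endian 64-bit integer, then read the same
# bytes back as little-endian: this reverses the byte order.
swapped = struct.unpack("<Q", struct.pack(">Q", object_size))[0]
assert swapped == n_object_size

# Assumption: the base pointer was MAP_FAILED, i.e. (void *) -1,
# which is all ones in a 64-bit address space.
MAP_FAILED = 2**64 - 1

# Pointer arithmetic wraps around modulo 2**64.
computed_address = (MAP_FAILED + object_offset) % 2**64
assert computed_address == faulting_address

print(hex(swapped), hex(computed_address))
```

If this reading is right, the large n_object_size value is not itself a sign of corruption, and the crash comes from writing through the return value of a failed mmap call.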