Persistent readonly perfect hash table: benchmarks

assigned to @dachary

marked this issue as related to #3104 (closed)

added RedHat collaboration priority:Normal labels

The implementation of the benchmarks is prepared at:

https://git.easter-eggs.org/biceps/swh-perfecthash/-/tree/wip-benchmark

Created a project in https://portal.fed4fire.eu/ with the intention of using grid5000. It is pending approval from an administrator (see swh/devel/swh-objstorage#3670 (closed)).

Running benchmarks directly on grid5000

oarsub -I -l "{cluster='dahu'}/host=1,walltime=1" -t deploy
kadeploy3 -f $OAR_NODE_FILE -e debian11-x64-base -k
ssh root@$(tail -1 $OAR_NODE_FILE)
mkfs.ext4 /dev/sdb1
mount /dev/sdb1 /mnt
apt-get install -y python3-venv libcmph-dev gcc git
git clone https://git.easter-eggs.org/biceps/swh-perfecthash/
python3 -m venv bench
source bench/bin/activate
cd swh-perfecthash
pip install -r requirements.txt -r requirements-test.txt
tox -e py3
time tox -e py3 -- --basetemp=/mnt/pytest -s --shard-size $((100 * 1024)) --object-max-size $((100 * 1024)) -k test_build_speed
rm -fr /mnt/pytest

time tox -e py3 -- --basetemp=/mnt/pytest -s --shard-size $((100 * 1024)) --object-max-size $((4 * 1024)) -k test_build_speed
number of objects = 45973118
baseline 163.73826217651367, write_duration 300.58917450904846, build_duration 26.01908826828003, total_duration 326.6082627773285

Conclusions:

Writing the content of the objects takes longer than the baseline because there are about 46 million Python function calls, but it is acceptable.
Creating the perfect hash table and writing it to file takes only seconds (about 26 s here) even in the worst case scenario, i.e. when there are only small objects and therefore millions of them.
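
As a rough check of the first conclusion, the per-object cost can be estimated from the figures reported above. This is only a sketch: it assumes that `baseline` is the time needed to write the same volume of data without going through the shard API, which is not stated explicitly here, and the variable names are illustrative.

```python
# Figures copied from the run with --object-max-size 4 KiB reported above.
n_objects = 45_973_118
baseline = 163.73826217651367        # seconds
write_duration = 300.58917450904846  # seconds
build_duration = 26.01908826828003   # seconds

# Assumption: the time spent beyond the baseline is per-object call overhead.
overhead = write_duration - baseline
per_object_us = overhead / n_objects * 1e6

# Share of the total run spent building and writing the perfect hash table.
build_share = build_duration / (write_duration + build_duration)

print(f"overhead per object: {per_object_us:.2f} us")
print(f"build share of total: {build_share:.1%}")
```

Under that assumption the overhead comes to roughly 3 microseconds per object, and building the hash table accounts for about 8% of the total duration.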

There was an error on mmap which was not detected, so there was no information on why it failed: the subsequent write crashed with a segmentation fault, as the valgrind traces below show. This was fixed.

# PYTHONMALLOC=malloc valgrind --tool=memcheck .tox/py3/bin/pytest --basetemp=/mnt/pytest -k test_build_speed --shard-size $((100 * 1024)) --object-max-size $((16 * 1024 * 1024)) swh/perfecthash/tests/test_hash.py
==17519== Memcheck, a memory error detector
==17519== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==17519== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==17519== Command: .tox/py3/bin/pytest --basetemp=/mnt/pytest -k test_build_speed --shard-size 102400 --object-max-size 16777216 swh/perfecthash/tests/test_hash.py
==17519==
============================================== test session starts ===============================================
platform linux -- Python 3.9.2, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
rootdir: /root/swh-perfecthash, configfile: pytest.ini
plugins: cov-3.0.0
collected 2 items / 1 deselected / 1 selected

swh/perfecthash/tests/test_hash.py ==17519== Invalid write of size 8
==17519==    at 0x8DF92A1: memcpy (string_fortified.h:34)
==17519==    by 0x8DF92A1: shard_object_write (hash.c:104)
==17519==    by 0x8DF86E5: _cffi_f_shard_object_write (_hash_cffi.c:898)
==17519==    by 0x53F389: ??? (in /usr/bin/python3.9)
==17519==    by 0x51D89A: _PyObject_MakeTpCall (in /usr/bin/python3.9)
==17519==    by 0x5175B9: _PyEval_EvalFrameDefault (in /usr/bin/python3.9)
==17519==    by 0x528B62: _PyFunction_Vectorcall (in /usr/bin/python3.9)
==17519==    by 0x53BCFA: ??? (in /usr/bin/python3.9)
==17519==    by 0x511FB4: _PyEval_EvalFrameDefault (in /usr/bin/python3.9)
==17519==    by 0x5106EC: ??? (in /usr/bin/python3.9)
==17519==    by 0x528D20: _PyFunction_Vectorcall (in /usr/bin/python3.9)
==17519==    by 0x53C360: PyObject_Call (in /usr/bin/python3.9)
==17519==    by 0x513E8A: _PyEval_EvalFrameDefault (in /usr/bin/python3.9)
==17519==  Address 0x1ff is not stack'd, malloc'd or (recently) free'd
==17519==
Fatal Python error: Segmentation fault

And with debug output enabled:

============================================= test session starts ===============================================
platform linux -- Python 3.9.2, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
rootdir: /root/swh-perfecthash, configfile: pytest.ini
plugins: cov-3.0.0
collected 2 items / 1 deselected / 1 selected

swh/perfecthash/tests/test_hash.py hnumber of objects = 12814, total size = 107373772352
shard_object_write: object_size = 7806490 n_object_size = 1882072536171151360
shard_object_write: object_offset = 512
==21356== Invalid write of size 8
==21356==    at 0x8DF92E1: memcpy (string_fortified.h:34)
==21356==    by 0x8DF92E1: shard_object_write (hash.c:104)
==21356==    by 0x8DF86F5: _cffi_f_shard_object_write (_hash_cffi.c:898)
==21356==    by 0x53F389: ??? (in /usr/bin/python3.9)
==21356==    by 0x51D89A: _PyObject_MakeTpCall (in /usr/bin/python3.9)
==21356==    by 0x5175B9: _PyEval_EvalFrameDefault (in /usr/bin/python3.9)
==21356==    by 0x528B62: _PyFunction_Vectorcall (in /usr/bin/python3.9)
==21356==    by 0x53BCFA: ??? (in /usr/bin/python3.9)
==21356==    by 0x511FB4: _PyEval_EvalFrameDefault (in /usr/bin/python3.9)
==21356==    by 0x5106EC: ??? (in /usr/bin/python3.9)
==21356==    by 0x528D20: _PyFunction_Vectorcall (in /usr/bin/python3.9)
==21356==    by 0x53C360: PyObject_Call (in /usr/bin/python3.9)
==21356==    by 0x513E8A: _PyEval_EvalFrameDefault (in /usr/bin/python3.9)
==21356==  Address 0x1ff is not stack'd, malloc'd or (recently) free'd
==21356==
Fatal Python error: Segmentation fault
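
Two numbers in the debug trace can be checked with plain arithmetic. The sketch below is only an illustration and does not reproduce what `hash.c` does internally. It shows that `n_object_size` is consistent with `object_size` stored in big-endian (network) byte order and printed as a native little-endian integer. It also shows that the faulting address `0x1ff` is consistent with mmap having returned `MAP_FAILED`, which is `(void *) -1` on Linux, with `object_offset` then added to it.

```python
import struct

object_size = 7806490
object_offset = 512

# Pack as a big-endian unsigned 64-bit integer, then reinterpret the same
# 8 bytes as little-endian, as an x86-64 host would when printing the value.
n_object_size = struct.unpack("<Q", struct.pack(">Q", object_size))[0]
print(n_object_size)

# MAP_FAILED is (void *) -1; adding the offset wraps around in 64-bit
# pointer arithmetic, which is emulated here with a mask.
MAP_FAILED = -1
address = (MAP_FAILED + object_offset) & (2**64 - 1)
print(hex(address))
```

The first value matches the `n_object_size` printed above, and the second matches the address reported by valgrind.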

$ time tox -e py3 -- --basetemp=/mnt/pytest -s --shard-size $((100 * 1024)) --object-max-size $((4 * 1024)) -k test_build_speed
number of objects = 45973694, total size = 105903024192                                                          
baseline 165.74853587150574, write_duration 495.07564210891724, build_duration 24.210500478744507, total_duration 519.2861425876617

$ time tox -e py3 -- --basetemp=/mnt/pytest -s --shard-size $((100 * 1024)) --object-max-size $((100 * 1024 * 1024)) -k test_build_speed
number of objects = 2057, total size = 107374116576                                                               
baseline 165.85373330116272, write_duration 327.1912658214569, build_duration 0.0062100887298583984, total_duration 327.19747591018677
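
To compare the two runs above, the timing lines can be parsed and turned into an effective throughput. This is a minimal sketch that assumes `total size` is in bytes and the durations are in seconds, which the output does not state. `parse_timings` and the run labels are hypothetical names introduced for the illustration.

```python
import re

# (total size, timing line) copied from the two runs above.
runs = {
    "object-max-size 4 KiB": (
        105903024192,
        "baseline 165.74853587150574, write_duration 495.07564210891724, "
        "build_duration 24.210500478744507, total_duration 519.2861425876617",
    ),
    "object-max-size 100 MiB": (
        107374116576,
        "baseline 165.85373330116272, write_duration 327.1912658214569, "
        "build_duration 0.0062100887298583984, total_duration 327.19747591018677",
    ),
}


def parse_timings(line):
    """Turn 'name value, name value, ...' into a dict of floats."""
    pairs = re.findall(r"(\w+)\s+([\d.]+)", line)
    return {name: float(value) for name, value in pairs}


throughput = {}
for label, (total_size, line) in runs.items():
    timings = parse_timings(line)
    # Assumption: total_size is in bytes and durations are in seconds.
    throughput[label] = total_size / timings["total_duration"] / 1e6
    ratio = timings["write_duration"] / timings["baseline"]
    print(f"{label}: {throughput[label]:.0f} MB/s, write is {ratio:.2f}x baseline")
```

With these figures the run with small objects reaches roughly 204 MB/s and the run with large objects roughly 328 MB/s, which fits the observation that the per-object call overhead dominates when there are millions of objects.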

removed state:wip label

closed
