package.loader: Flush and clear regularly internal proxy storage states
To avoid the internal states of proxy storages to grow out of proportions. swh-storage!924 (closed) and this diff should be enough to replace !379 (closed) altogether.
Related to T2352
Test Plan
tox
Migrated from D3014 (view on Phabricator)
Merge request reports
Activity
Build is green
Patch application report for D3014 (id=10700)
Rebasing onto 0c35beca...
Current branch diff-target is up to date.
Changes applied before test
commit 3cef5b1ba32c4507cd5228486efb0662bb2bcd1d Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Date: Sat Apr 11 12:10:20 2020 +0200 package.loader: Flush and clear regularly inner worker state To avoid the inner state of proxy storages to grow out of proportions. Related to [T2352](https://forge.softwareheritage.org/T2352 'view original for T2352 on Phabricator')
See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/19/ for more details.
Isn't it already configurable in the buffering proxy?
... ¯_(ツ)_/¯ ...
but yeah, it is already so it's redundant with the buffer proxy behavior.
... (more thinking) ...
Not the filtering proxy though (right now it can grow infinitely).
But it'd be better to flush at the same time we clean the filter inner state.
TIL that the filter has an internal state too.
It keeps a set of objects (
objects_seen: Dict[str, Set[bytes]] = {}
, object_type to 1 hash id) already seen through the loading (to filter prior to actual call to themissing
endpoint).Right now, it's never cleared. That diff fixes that when calling
self.storage.clear_buffers
.! In !380 (closed), @ardumont wrote: TIL that the filter has an internal state too.
It keeps a set of objects (
objects_seen: Dict[str, Set[bytes]] = {}
, object_type to 1 hash id) already seen through the loading (to filter prior to actual call to themissing
endpoint).Right now, it's never cleared. That diff fixes that when calling
self.storage.clear_buffers
.Yeah I saw that. What I'm saying is, we can remove
objects_seen
entirelyDoes objects_seen have a purpose, other than avoiding duplicates in the buffer proxy?
Its initial purpose was to avoid calling too often
self.storage.<object-type>_missing
with the same list of objects over and over. But that might be completely peanuts, no idea at what point it's useful or not.I'm tempted to cleanup the filter indeed. That's be one less state management worry.
;)
Hmm, good point. But the relevant pages are probably cached by pg or the server kernel; so that's only sparing the network requests
Yes, that's also a good point ;)
I'm tempted to cleanup the filter indeed. That's be one less state management worry.
Well, I've opened swh-storage!925 (closed) for that purpose ;)
Abandon in favor of swh-storage!925 (closed) (there is no longer state in the filter proxy). And the buffer proxy already flushes when threshold per object type is reached. So no longer needed.
mentioned in merge request swh-storage!925 (closed)