Limit the number of entries in the cache
After each revision, the implementation checks whether the cache holds more than 100,000 entries of any kind. If so, it flushes the cache content. The approach is quite naive, but memory usage seems to stay around 5 GB at most on big repositories like linux.
This can still be an issue with the current zeromq client: because origins are sorted by repository, many snapshots of big repositories can be ingested in parallel. With the journal implementation, the load should be spread more evenly across repositories of different sizes.
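As a rough illustration of the check described above, here is a minimal sketch of a cache that is flushed once any kind of entry exceeds a limit. Only the 100,000 threshold and the method name `flush_if_necessary` come from this merge request. The `BoundedCache` class, its other methods, and the strict "greater than" comparison are assumptions made for the example, not the actual swh-provenance code.

```python
from collections import defaultdict
from typing import Dict, Set

FLUSH_LIMIT = 100_000  # threshold quoted in the description


class BoundedCache:
    """Hypothetical stand-in for the provenance cache (illustration only)."""

    def __init__(self, flush_limit: int = FLUSH_LIMIT) -> None:
        self.flush_limit = flush_limit
        # One set of keys per kind of entry (e.g. "content", "directory").
        self.entries: Dict[str, Set[str]] = defaultdict(set)
        self.flush_count = 0

    def add(self, kind: str, key: str) -> None:
        self.entries[kind].add(key)

    def stats(self) -> Dict[str, int]:
        """Number of cached entries per kind."""
        return {kind: len(keys) for kind, keys in self.entries.items()}

    def flush_limit_reached(self) -> bool:
        # Assumption: "more than" is read as a strict comparison.
        return any(count > self.flush_limit for count in self.stats().values())

    def flush(self) -> None:
        # A real implementation would write the entries to storage first;
        # this sketch only drops them and counts the flush.
        self.entries.clear()
        self.flush_count += 1

    def flush_if_necessary(self) -> bool:
        """Flush when any kind of entry exceeds the limit."""
        if self.flush_limit_reached():
            self.flush()
            return True
        return False


# Usage with a tiny limit so the behaviour is easy to follow:
# each simulated revision adds two "content" entries, then runs the check.
cache = BoundedCache(flush_limit=3)
for rev in range(5):
    cache.add("content", f"rev{rev}-a")
    cache.add("content", f"rev{rev}-b")
    cache.flush_if_necessary()
```

Because the check runs only after a whole revision has been processed, the cache can temporarily grow past the limit while a single large revision is being handled. This is one reason the limit bounds memory only approximately.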
Related to swh/infra/sysadm-environment#4313 (closed)
Migrated from D8040 (view on Phabricator)
Activity
Build is green
Patch application report for D8040 (id=28949)
Rebasing onto 80434e3b...
Current branch diff-target is up to date.
Changes applied before test
commit 6bf00a395eca96eaa04c07a2168389a8d6ab85e6
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Mon Jun 27 15:48:47 2022 +0200

    Limit the number of entries in the cache
See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/609/ for more details.
Build is green
Patch application report for D8040 (id=28950)
Rebasing onto 80434e3b...
Current branch diff-target is up to date.
Changes applied before test
commit 571477be0498f2fbaf35fcca4b307447eeb430b5
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Mon Jun 27 15:48:47 2022 +0200

    Limit the number of entries in the cache
See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/610/ for more details.
  LOGGER.debug("Adding revision to origin")
  provenance.revision_add_to_origin(origin, revision)

+ cache_flush_start = datetime.now()
+ if provenance.flush_if_necessary():
+     LOGGER.info(
+         "Intermediate cache flush in %s", (datetime.now() - cache_flush_start)
+     )

      self.flush_origin_revision_layer()
      self.clear_caches()

+ def flush_if_necessary(self) -> bool:
+     """Flush if the number of cached information reached a limit."""
+     LOGGER.info("Cache stats: %s", self._get_cache_stats())
+     if self._flush_limit_reached():
+         self.flush()
+         return True

Build has FAILED
Patch application report for D8040 (id=28958)
Rebasing onto 80434e3b...
Current branch diff-target is up to date.
Changes applied before test
commit b4f9226cffe69a211dbe0204541bd49e89daa8df
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Mon Jun 27 15:48:47 2022 +0200

    Limit the number of entries in the cache
Link to build: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/611/ See console output for more information: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/611/console
Build has FAILED
Patch application report for D8040 (id=28959)
Rebasing onto 80434e3b...
Current branch diff-target is up to date.
Changes applied before test
commit 35c92963798c96475165aa0e132f5936be66e6f5
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Mon Jun 27 15:48:47 2022 +0200

    Limit the number of entries in the cache
Link to build: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/612/ See console output for more information: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/612/console
Some references in the commit message have been migrated:
- T4313 is now swh/infra/sysadm-environment#4313 (closed)
Build is green
Patch application report for D8040 (id=28962)
Rebasing onto 80434e3b...
Current branch diff-target is up to date.
Changes applied before test
commit f5f741366383cff7e7f173a79f656e9c6e159602
Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
Date:   Mon Jun 27 15:48:47 2022 +0200

    Limit the number of entries in the cache
See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/613/ for more details.