
Limit the number of entries in the cache

2 unresolved threads

After each revision, the implementation tests whether the cache holds more than 100,000 entries of any kind. If so, it flushes the cache content. This is quite naive, but memory usage seems to stay at around 5 GB at most on big repositories like linux.

This can still be an issue with the current zeromq client: because the origins are sorted by repository, many snapshots of big repositories can be ingested in parallel. With the journal implementation, the load should be better distributed across repositories of different sizes.
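The flush check described above can be sketched roughly as follows. This is a minimal illustration, not the code from this merge request: the method names `flush_if_necessary`, `_flush_limit_reached`, `_get_cache_stats` and `flush` appear in the diff, but their bodies here, the `CachedProvenance` class, the `flush_limit` parameter, the `flush_count` counter and the cache layout (one set per entry kind) are assumptions made for the example. It also assumes that "of any kind" means the limit applies to each kind separately.

```python
import logging
from typing import Dict, Set

LOGGER = logging.getLogger(__name__)

# Threshold quoted in the description; a constant here is an assumption.
FLUSH_LIMIT = 100_000


class CachedProvenance:
    """Hypothetical stand-in for the real provenance object."""

    def __init__(self, flush_limit: int = FLUSH_LIMIT) -> None:
        self.flush_limit = flush_limit
        # One set per kind of cached entry; the kinds are made up for this sketch.
        self.cache: Dict[str, Set[bytes]] = {
            "content": set(),
            "directory": set(),
            "revision": set(),
        }
        self.flush_count = 0

    def _get_cache_stats(self) -> Dict[str, int]:
        """Return the number of cached entries for each kind."""
        return {kind: len(entries) for kind, entries in self.cache.items()}

    def _flush_limit_reached(self) -> bool:
        """True if any single kind holds more entries than the limit."""
        return max(self._get_cache_stats().values()) > self.flush_limit

    def flush(self) -> None:
        """Empty the cache (a real flush would write to storage first)."""
        for entries in self.cache.values():
            entries.clear()
        self.flush_count += 1

    def flush_if_necessary(self) -> bool:
        """Flush only when the limit is exceeded; report whether a flush ran."""
        LOGGER.debug("Cache stats: %s", self._get_cache_stats())
        if self._flush_limit_reached():
            self.flush()
            return True
        return False
```

Calling `flush_if_necessary()` once per processed revision keeps the check cheap, since it only compares set sizes. The cache can still overshoot the limit within a single large revision.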

Related to swh/infra/sysadm-environment#4313 (closed)


Migrated from D8040 (view on Phabricator)

  • Author Maintainer

    update license headers

  • Build is green

    Patch application report for D8040 (id=28949)

    Rebasing onto 80434e3b...

    Current branch diff-target is up to date.
    Changes applied before test
    commit 6bf00a395eca96eaa04c07a2168389a8d6ab85e6
    Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
    Date:   Mon Jun 27 15:48:47 2022 +0200
    
        Limit the number of entries in the cache
        
        The implementation test after each revision if there are
        more than 100 000 entries of any kind in the cache. If yes,
        it flush the content.
        It's quite naive but the memory seems to stay around 5Go
        max on big repositories like linux.
        
        This can be still an issue with the current zeromq client because
        the origin are sorted by repositories a lot of snapshots of big
        repositories can be ingested in parallel.
        With the journal implementation, it should be more distributed on
        repositories of different sizes
        
        Related to swh/infra/sysadm-environment#4313

    See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/609/ for more details.

  • Build is green

    Patch application report for D8040 (id=28950)

    Rebasing onto 80434e3b...

    Current branch diff-target is up to date.
    Changes applied before test
    commit 571477be0498f2fbaf35fcca4b307447eeb430b5
    Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
    Date:   Mon Jun 27 15:48:47 2022 +0200
    

    See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/610/ for more details.

  • vlorentz @vlorentz started a thread on the diff

          LOGGER.debug("Adding revision to origin")
          provenance.revision_add_to_origin(origin, revision)

        + cache_flush_start = datetime.now()
        + if provenance.flush_if_necessary():
        +     LOGGER.info(
        +         "Intermediate cache flush in %s", (datetime.now() - cache_flush_start)
        +     )
  • vlorentz @vlorentz started a thread on the diff

              self.flush_origin_revision_layer()
              self.clear_caches()

        + def flush_if_necessary(self) -> bool:
        +     """Flush if the number of cached information reached a limit."""
        +     LOGGER.info("Cache stats: %s", self._get_cache_stats())
        +     if self._flush_limit_reached():
        +         self.flush()
        +         return True
  • lgtm, especially better if you attend to val's good suggestions ;)

    A couple of docstring suggestions inline.

  • Merge request was accepted

  • Antoine R. Dumont approved this merge request


  • Author Maintainer

    update according to the reviews

    • simplify the cache management
    • fix the doc strings
  • Build has FAILED

    Patch application report for D8040 (id=28958)

    Rebasing onto 80434e3b...

    Current branch diff-target is up to date.
    Changes applied before test
    commit b4f9226cffe69a211dbe0204541bd49e89daa8df
    Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
    Date:   Mon Jun 27 15:48:47 2022 +0200
    

    Link to build: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/611/ See console output for more information: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/611/console

  • Author Maintainer

    add missing parenthesis

  • Build has FAILED

    Patch application report for D8040 (id=28959)

    Rebasing onto 80434e3b...

    Current branch diff-target is up to date.
    Changes applied before test
    commit 35c92963798c96475165aa0e132f5936be66e6f5
    Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
    Date:   Mon Jun 27 15:48:47 2022 +0200
    

    Link to build: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/612/ See console output for more information: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/612/console

  • Author Maintainer

    make mypy happy

  • Build is green

    Patch application report for D8040 (id=28962)

    Rebasing onto 80434e3b...

    Current branch diff-target is up to date.
    Changes applied before test
    commit f5f741366383cff7e7f173a79f656e9c6e159602
    Author: Vincent SELLIER <vincent.sellier@softwareheritage.org>
    Date:   Mon Jun 27 15:48:47 2022 +0200
    

    See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/613/ for more details.

  • Author Maintainer

    Merge request was merged
