Skip to content
Snippets Groups Projects

Add patching proxy

Merged vlorentz requested to merge object-patching-proxy into master

This is an alternative to modifying the person table, as that technique only worked on the PostgreSQL backend and not with Cassandra, as it does not have a person table (authors and committers are inlined in the revision and release tables).

Co-written with @douardda

Edited by vlorentz

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
    • Looks like the proxy's __getattr__ is completely unused.

      I'm not fully convinced by the tradeoffs for introducing a whole new proxy; I see a few pros and cons:

      pros:

      • (I think it's a big one) cleaner separation of concerns

      cons:

      • if we fully separate the two result-altering proxies, multiplication of the result-altering databases
      • stacked overheads for method accesses when chaining the proxies (probably mitigated by the setattr workaround)
      • Double the number of tests (at least for the "no altering" codepaths)
      • I don't see places where we would want to use one without the other

      With !1113 (merged) maybe this should all end up living in or alongside swh.alter as "the swh.alter storage proxy"?

      Edited by Nicolas Dandrimont
    • multiplication of the result-altering databases

      I'd say we want to be able to use the same DB for all these small and specific use cases, using proper configuration and "namespace" support in the database (using tablespace?)

    • Author Maintainer

      Looks like the proxy's __getattr__ is completely unused.

      What do you mean? It is used for all methods not acting on revisions or releases.

      multiplication of the result-altering databases

      I assumed we would end up with both proxies in the same package, so they can safely use the same database.

      With !1113 (merged) maybe this should all end up living in or alongside swh.alter as "the swh.alter storage proxy"?

      👍 that would solve the DB question above.

      stacked overheads for method accesses when chaining the proxies (probably mitigated by the setattr workaround)

      Even assuming the backend requests take 1ms (I can't find the data on Grafana, but IIRC most requests are in the 100ms range), the overhead is less than 1% once setattr is done:

      In [1]: import time
      
      In [2]: class Backend:
         ...:     def f(self, arg1, arg2):
         ...:         time.sleep(0.001)
         ...: 
      
      In [3]: class Proxy:
         ...:     def f(self, arg1, arg2):
         ...:         self.storage.f(arg1, arg2)
         ...: 
      
      In [4]: backend = Backend()
      
      In [5]: proxy = Proxy()
      
      In [6]: proxy.storage = backend
      
      In [7]: %timeit backend.f(123, 456)
      1.16 ms ± 2.07 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
      
      In [8]: %timeit proxy.f(123, 456)
      1.16 ms ± 4.85 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
      
      In [9]: %timeit backend.f(123, 456)
      1.16 ms ± 4.74 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
      
      In [10]: %timeit proxy.f(123, 456)
      1.16 ms ± 3.58 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

      Double the number of tests (at least for the "no altering" codepaths)

      Yeah that's a bit of an issue because it adds 10s; but I think it will be acceptable once outside swh-storage and its super-long test suite.

      I don't see places where we would want to use one without the other

      I don't think that's a big deal compared to the separation of concerns. We could make the same argument about the buffering and filtering proxies.

      Edited by vlorentz
    • Looks like the proxy's getattr is completely unused.

      What do you mean? It is used for all methods not acting on revisions or releases.

      Sorry, I meant that _get_method always returns getattr(self.storage, method_name)

      Edited by Nicolas Dandrimont
    • multiplication of the result-altering databases

      I assumed we would end up with both proxies in the same package, so they can safely use the same database.

      With !1113 (merged) maybe this should all end up living in or alongside swh.alter as "the swh.alter storage proxy"?

      👍 that would solve the DB question above.

      Meh; that's still two connection pools to manage, double the number of connections to establish to the same database, duplicate configs, etc. As I said, I'm not convinced the separation of concerns trumps all of this overhead.

    • Author Maintainer

      Sorry, I meant that _get_method always returns getattr(self.storage, method_name)

      or raises NotImplementedError. It's meant as an allow-list so we don't accidentally forget to implement new methods on revisions and releases.

      Meh; that's still two connection pools to manage, double the number of connections to establish to the same database, duplicate configs, etc. As I said, I'm not convinced the separation of concerns trumps all of this overhead.

      Hmm... I don't know

    • (__getattr__ could just raise NotImplementedError if revision or release is in method_name)

    • Author Maintainer

      Done in efb60ed8.

    • Please register or sign in to reply
  • Jenkins job DSTO/gitlab-builds #532 failed .
    See Console Output and Coverage Report for more details.

  • mentioned in issue #4707 (closed)

  • Nicolas Dandrimont mentioned in merge request !1118 (closed)

    mentioned in merge request !1118 (closed)

  • vlorentz added 1 commit

    added 1 commit

    Compare with previous version

  • Jenkins job DSTO/gitlab-builds #543 succeeded .
    See Console Output and Coverage Report for more details.

  • vlorentz added 1 commit

    added 1 commit

    • efb60ed8 - Make allowlist in get_method more compact

    Compare with previous version

  • Jenkins job DSTO/gitlab-builds #544 succeeded .
    See Console Output and Coverage Report for more details.

  • vlorentz added 27 commits

    added 27 commits

    Compare with previous version

  • vlorentz changed the description

    changed the description

  • Jenkins job DSTO/gitlab-builds #550 succeeded .
    See Console Output and Coverage Report for more details.

  • vlorentz added 1 commit

    added 1 commit

    • 2da8b456 - Merge patching proxy into masking proxy

    Compare with previous version

  • vlorentz changed title from Introduce object patching proxy to Make masking proxy apply display names

    changed title from Introduce object patching proxy to Make masking proxy apply display names

    • Author Maintainer

      As requested by @olasd, I merged the patching proxy this MR introduced into the existing masking proxy.

      I kept the MaskingProxy/MaskingAdmin/MaskingQuery terminology, but we may want to bikeshed these.

    • Thanks.

      As mentioned on IRC (and as expected, I guess ;-)) the separated code was cleaner, so it probably makes sense to retain it.

      A couple alternatives were mentioned:

      • making one proxy inherit the other and have the shorter one use super()
      • sharing the underlying masking connection pool by adding a bool argument to __init__ or special-casing masking_db="shared" (and using self._masking_pool = self.storage._masking_pool)

      Either option would keep the separation of concerns and allow DRYing the database/connection management, which makes me happier :-)

      (the pool sharing is a bit of a hack but it probably the option that better keeps the concerns separate)

      Edited by Nicolas Dandrimont
    • Please register or sign in to reply
  • Jenkins job DSTO/gitlab-builds #556 succeeded .
    See Console Output and Coverage Report for more details.

  • Nicolas Dandrimont
  • Nicolas Dandrimont
  • vlorentz added 3 commits

    added 3 commits

    • 3b9e4158 - Revert "Merge patching proxy into masking proxy"
    • 21b23346 - Merge Patching{Query,Admin} into Masking{Query,Admin}
    • d3eea0b6 - Add masking_db=shared

    Compare with previous version

  • vlorentz changed title from Make masking proxy apply display names to Add patching proxy

    changed title from Make masking proxy apply display names to Add patching proxy

  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Please register or sign in to reply
    Loading