Add support for a privileged "channel" of topics for non-anonymized objects

The idea is to publish anonymized objects under a "public" prefix and non-anonymized ones under a privileged prefix. Every model object whose ".anonymize()" method returns an anonymized model object will be sent to both channels: the anonymized version on the regular channel, and the original version on the privileged channel.

Currently only Release and Revision objects are anonymizable.

For anonymizable objects:

  • regular (non-anonymized) objects are sent to the topic

    "{prefix}_privileged.{object_type}"

  • the anonymized versions of these objects are sent to

    "{prefix}.{object_type}"
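The routing rule above can be sketched as follows. This is a minimal illustration, not the swh-journal writer. "route_object", "ToyRevision" and "ToyContent" are hypothetical names. The toy objects only mimic the ".anonymize()" contract described here: they return None when the object is not anonymizable, and an anonymized copy otherwise.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Protocol, Tuple


class Anonymizable(Protocol):
    # Contract assumed from the description: None, or an anonymized copy.
    def anonymize(self) -> Optional["Anonymizable"]: ...


def route_object(
    prefix: str, object_type: str, obj: Anonymizable
) -> List[Tuple[str, Anonymizable]]:
    """Return the (topic, object) pairs an object should be published to."""
    anonymized = obj.anonymize()
    if anonymized is None:
        # Not anonymizable: a single regular topic.
        return [(f"{prefix}.{object_type}", obj)]
    return [
        # Original (non-anonymized) object goes to the privileged channel...
        (f"{prefix}_privileged.{object_type}", obj),
        # ...and the anonymized copy goes to the regular channel.
        (f"{prefix}.{object_type}", anonymized),
    ]


@dataclass(frozen=True)
class ToyRevision:  # hypothetical stand-in for an anonymizable model object
    author: bytes

    def anonymize(self) -> "ToyRevision":
        return replace(self, author=b"<anonymized>")


@dataclass(frozen=True)
class ToyContent:  # hypothetical stand-in for a non-anonymizable object
    sha1: bytes

    def anonymize(self) -> None:
        return None
```

An anonymizable object therefore produces two messages, and a non-anonymizable one produces a single message on the regular topic.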

On the client side, a new boolean "privileged" config parameter selects the privileged topics, if they are visible on the Kafka broker at subscription time.

Replaces !172 (closed).


Migrated from Phabricator revision D3172.

Merged (Apr 21, 2025 6:26am UTC)

Merge details

  • Changes merged into with .
  • Deleted the source branch.

Activity

  • Build has FAILED

    Patch application report for D3172 (id=11260)

    Rebasing onto 89cd8f7b...

    Current branch diff-target is up to date.
    Changes applied before test
    commit 12f224e11a00827db38f9328b022ccf565416bb8
    Author: David Douard <david.douard@sdfa3.org>
    Date:   Mon May 18 11:35:22 2020 +0200
    
    Link to build: https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/73/ See console output for more information: https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/73/console

     fetched_messages += 1
     topic = msg.topic()
    -assert topic.startswith(kafka_prefix + "."), "Unexpected topic"
    +assert topic.startswith(f"{kafka_prefix}.") or topic.startswith(
    +    f"{kafka_prefix}_privileged."
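The updated assertion accepts topics under either prefix. The following hypothetical helper makes the same check explicit and also reports which channel a topic belongs to. The name "parse_topic" is illustrative and is not part of swh-journal.

```python
from typing import Tuple


def parse_topic(prefix: str, topic: str) -> Tuple[str, bool]:
    """Split a journal topic into (object_type, is_privileged)."""
    regular = f"{prefix}."
    privileged = f"{prefix}_privileged."
    if topic.startswith(privileged):
        return topic[len(privileged):], True
    if topic.startswith(regular):
        return topic[len(regular):], False
    raise ValueError(f"Unexpected topic: {topic!r}")
```

The privileged prefix inserts "_privileged" before the dot, so a privileged topic can never match the regular "{prefix}." test. As a result, the order of the two checks does not matter.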
  •      stop_after_objects=1,
         object_types=["else"],
     )
    +
    +
    +def test_client_subscriptions_with_anonymized_topics(
    +    kafka_prefix: str, kafka_consumer_group: str, kafka_server_base: str
    +):
    +    producer = Producer(
    • Maybe mention that this test only covers the client subscription to anonymized topics (and, respectively, to privileged topics for the next one).

      Maybe make it loop over the privileged set of object types (revision, release)?

    • Author Maintainer

      Will do (not sure yet which fix, either rename the test or consume all the topics).

  • +    prefix=kafka_prefix,
    +    stop_after_objects=1,
    +    privileged=False,
    +)
    +# we only subscribed to the standard prefix
    +assert client.subscription == [kafka_prefix + ".revision"]
    +
    +# with privileged channel activated on the client side
    +client = JournalClient(
    +    brokers=[kafka_server_base],
    +    group_id=kafka_consumer_group,
    +    prefix=kafka_prefix,
    +    stop_after_objects=1,
    +    privileged=True,
    +)
    +# we still only subscribe to the standard prefix, since there is no privileged prefix
    • Out of curiosity, what is expected to happen if we set privileged=True for object types other than revision and release?

      Shouldn't it be tested as well?

    • Author Maintainer

      The client's behavior (which should be clear from the commit messages and docstrings, but...) is that, when this config flag is set, it subscribes to the privileged topic rather than the regular one whenever both exist in the advertised topic list. It applies this logic to each object type independently.

      So for object types that are not anonymizable, it subscribes to the regular topics.

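The following is a minimal sketch of the per-object-type selection rule the author describes. "select_topics" is a hypothetical helper, not the JournalClient API. It assumes the list of existing topics has already been fetched from the broker. It also skips object types that have no topic at all; the real client may handle that case differently.

```python
from typing import Iterable, List


def select_topics(
    prefix: str,
    object_types: Iterable[str],
    existing_topics: Iterable[str],
    privileged: bool,
) -> List[str]:
    """Choose one topic per object type, preferring the privileged one if asked."""
    available = set(existing_topics)
    selected: List[str] = []
    for object_type in object_types:
        regular_topic = f"{prefix}.{object_type}"
        privileged_topic = f"{prefix}_privileged.{object_type}"
        if privileged and privileged_topic in available:
            # Both channels may exist; the flag makes us pick the privileged one.
            selected.append(privileged_topic)
        elif regular_topic in available:
            # Non-anonymizable types (or privileged=False) use the regular topic.
            selected.append(regular_topic)
        # Otherwise there is no topic for this object type; skipped in this sketch.
    return selected


# Mirrors the test above: only the regular revision topic exists,
# so privileged=True still yields the standard topic.
print(select_topics("swh.test", ["revision"], ["swh.test.revision"], privileged=True))
# ['swh.test.revision']
```

With a confluent-kafka consumer, the existing topic names could be taken from "consumer.list_topics().topics", which maps topic names to their metadata.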
  •  consumed_messages = consume_messages(consumer, kafka_prefix, expected_messages)
     assert_all_objects_consumed(consumed_messages)

    +for key, obj_dict in consumed_messages["revision"]:
    +    obj = Revision.from_dict(obj_dict)
    +    for person in (obj.author, obj.committer):
    +        assert not (
    +            len(person.fullname) == 32
    +            and person.name is None
    +            and person.email is None
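The assertion above relies on a heuristic: an anonymized person has a 32-byte "fullname" and no "name" or "email". Below is a self-contained sketch of that check. The "Person" class is a simplified stand-in, not the swh-model class. Its "anonymize()" assumes the fullname is replaced by its SHA-256 digest (which is 32 bytes long). This mirrors the heuristic rather than quoting swh-model.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:  # simplified, hypothetical stand-in for a model Person
    fullname: bytes
    name: Optional[bytes]
    email: Optional[bytes]

    def anonymize(self) -> "Person":
        # Assumed scheme: hash the fullname (32-byte SHA-256 digest), drop the rest.
        return Person(
            fullname=hashlib.sha256(self.fullname).digest(), name=None, email=None
        )


def looks_anonymized(person: Person) -> bool:
    """Heuristic used by the test: 32-byte fullname, no name, no email."""
    return len(person.fullname) == 32 and person.name is None and person.email is None
```

A genuine 32-byte fullname with no parsed name or email would also match, so this check is a heuristic rather than a proof.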
  •  def _write_addition(self, object_type: str, object_: ModelObject) -> None:
         """Write a single object to the journal"""
    -    topic = f"{self._prefix}.{object_type}"
         key = object_key(object_type, object_)
    +
    +    if self.anonymize:
    +        anon_object_ = object_.anonymize()
    +        if anon_object_:  # can be either None, or an anonymized object
    +            # if the object is anonymizable, send the non-anonymized version in the
    +            # privileged channel
    +            topic = f"{self._prefix_privileged}.{object_type}"
    +    dict_ = self._sanitize_object(object_type, object_)
    +    logger.debug("topic: %s, key: %s, value: %s", topic, key, dict_)
    +    self.send(topic, key=key, value=dict_)
  • Looks good.

    I have a few questions in the diff, nothing blocking.

    About my last remark, though, I think there is an issue: I don't think you are sending the stock object when anonymization is on. Hence I am requesting changes to ensure that.

  • Merge request was returned for changes

  • mentioned in merge request swh-model!251 (closed)

  • Antoine R. Dumont mentioned in merge request !172 (closed)

  • mentioned in merge request swh-storage!400 (closed)

  • Merge request was accepted

  • Antoine R. Dumont approved this merge request

  • Author Maintainer

    typos, rename a couple of tests, and better comments

  • Build has FAILED

    Patch application report for D3172 (id=11269)

    Rebasing onto 89cd8f7b...

    Current branch diff-target is up to date.
    Changes applied before test
    commit cc4af0e0baece37aa2b0b9054e3c647ffa4b84b5
    Author: David Douard <david.douard@sdfa3.org>
    Date:   Mon May 18 11:35:22 2020 +0200
    
    Link to build: https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/74/ See console output for more information: https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/74/console

  • Author Maintainer

    bump dep on swh-model to 0.2

  • Build is green

    Patch application report for D3172 (id=11281)

    Rebasing onto 89cd8f7b...

    Current branch diff-target is up to date.
    Changes applied before test
    commit 5210495975af9a7a0320d8f6f60986a3a495d427
    Author: David Douard <david.douard@sdfa3.org>
    Date:   Mon May 18 11:35:22 2020 +0200
    
    See https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/75/ for more details.

  • Author Maintainer

    typo

  • Build is green

    Patch application report for D3172 (id=11282)

    Rebasing onto 89cd8f7b...

    Current branch diff-target is up to date.
    Changes applied before test
    commit 3cac9cffe243b0893f408606005c464f24e84412
    Author: David Douard <david.douard@sdfa3.org>
    Date:   Mon May 18 11:35:22 2020 +0200
    
    See https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/76/ for more details.

  • closed
