Add support for model object anonymization

Simply add a BaseModel.anonymize() method. Default implementation returns None, meaning the object is not anonymizable.

For Revision, Release and Person, the method returns an anonymized version of the object.

Partially replaces the pair swh-journal!172 (closed) / swh-storage!400 (closed).

See swh-journal!173 (closed) for the part in swh.journal.
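
As a rough illustration of the behaviour described above, here is a minimal sketch using simplified stand-in dataclasses. These are not the actual swh.model classes: the field lists are reduced, `Content` is only a placeholder for "some non-anonymizable object", and the use of SHA-256 for the hashed fullname is an assumption (it is consistent with the 32-byte length checked in the `persons_d()` strategy further down).

```python
import hashlib
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class BaseModel:
    def anonymize(self) -> Optional["BaseModel"]:
        # Default implementation: None means "this object is not anonymizable".
        return None


@dataclass(frozen=True)
class Person(BaseModel):
    fullname: bytes
    name: Optional[bytes] = None
    email: Optional[bytes] = None

    def anonymize(self) -> "Person":
        # Assumption: the fullname is replaced by its SHA-256 digest;
        # name and email are left unset.
        return Person(fullname=hashlib.sha256(self.fullname).digest())


@dataclass(frozen=True)
class Revision(BaseModel):
    # Simplified stand-in: the real Revision has many more fields.
    message: bytes
    author: Person
    committer: Person

    def anonymize(self) -> "Revision":
        # Same revision, with every Person replaced by its anonymized version.
        return replace(
            self,
            author=self.author.anonymize(),
            committer=self.committer.anonymize(),
        )


@dataclass(frozen=True)
class Content(BaseModel):
    # Placeholder for a model object that carries no personal data.
    data: bytes
```

With these stand-ins, `Content(data=b"x").anonymize()` returns `None`, while `Revision(...).anonymize()` returns a new `Revision` whose `author` and `committer` only carry a hashed `fullname`.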


Migrated from D3171 (view on Phabricator)

Closed by Phabricator Migration user 4 years ago (May 20, 2020 2:33pm UTC)

Merge details

  • The changes were not merged into generated-differential-D3171-target.

Activity

  •              d["date"] = TimestampWithTimezone.from_dict(d["date"])
             return cls(target_type=ObjectType(d.pop("target_type")), **d)

    +    def anonymize(self) -> "Release":
    +        """Returns an anonymized version of the Release object.
  •              **d,
             )

    +    def anonymize(self) -> "Revision":
    +        """Returns an anonymized version of the Revision object.
  • mentioned in merge request swh-journal!172 (closed)

  • mentioned in merge request swh-storage!400 (closed)

  • Neat.

    Some typos to fix.

  • Merge request was accepted

  • Antoine R. Dumont approved this merge request


  •          return Person(name=name or None, email=email or None, fullname=fullname,)

    +    def anonymize(self) -> "Person":
    +        """Returns an anonymized version of the Person object.
    +
    +        Anonymization is simply a Person which fullname is the hashed, with unset name
    +        or email.
  • Shouldn't we make anonymized objects error when their compute_hash() method is called?

  • Author Maintainer

    ! In !251 (closed), @vlorentz wrote: Shouldn't we make anonymized objects error when their compute_hash() method is called?

    Maybe, but that would require that we keep the information "this is an anonymized object" somewhere, which is not the case for now. This idea can be dealt with later, maybe?

  • ! In !251 (closed), @douardda wrote: Maybe, but that would require that we keep the information "this is an anonymized object" somewhere, which is not the case for now. This idea can be dealt with later, maybe?

    That's why I'm asking now, so you/we don't have to do some code changes later. But if you're comfortable with it, then fine.
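
    To make the trade-off in this thread concrete, here is a minimal sketch of what "keeping the info" could look like. It is not part of this merge request: the `anonymized` field is hypothetical, the classes are simplified stand-ins, SHA-256 for the hashed fullname is an assumption, and the body of `compute_hash()` is a placeholder rather than the real identifier computation.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    fullname: bytes
    name: Optional[bytes] = None
    email: Optional[bytes] = None
    anonymized: bool = False  # hypothetical marker, not in the merge request

    def anonymize(self) -> "Person":
        # Assumption: SHA-256 of the fullname; the marker records the change.
        return Person(
            fullname=hashlib.sha256(self.fullname).digest(), anonymized=True
        )


@dataclass(frozen=True)
class Release:
    # Simplified stand-in with only two fields.
    name: bytes
    author: Person

    def anonymize(self) -> "Release":
        return Release(name=self.name, author=self.author.anonymize())

    def compute_hash(self) -> bytes:
        # Refuse to hash anonymized data: the result would not match the
        # identifier of the original object.
        if self.author.anonymized:
            raise ValueError("cannot compute the hash of an anonymized object")
        # Placeholder hash, NOT the real identifier algorithm.
        return hashlib.sha1(self.name + self.author.fullname).digest()
```

    The cost pointed out in the reply is visible here: every anonymizable class needs an extra piece of state (or a wrapper type) that has to survive serialization, which is why the idea was deferred.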

  • Author Maintainer

    Typos + comments/docstrings + hash on the fullname in Person.anonymize()

    Also ensures that the persons_d() strategy does not generate data that looks like an anonymized person.

  • Build is green

    Patch application report for D3171 (id=11267)

    Rebasing onto cce30366...

    Current branch diff-target is up to date.
    Changes applied before test
    commit e40fe471031bc85f9d40be163cba9d7351a02888
    Author: David Douard <david.douard@sdfa3.org>
    Date:   Tue May 19 16:04:30 2020 +0200
    
        Add support for model object anonymization
        
        Simply add a BaseModel.anonymize() method. Default implementation returns
        None, meaning the object is not anonymizable.
        
        For Person, the method returns a Person whith hashed fullname (and unset
        name and email).
        
        For Revision and Release, the method returns an anonymized version of
        the object, i.e. with instance of Person replaced by anonymized ones.

    See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/61/ for more details.

  •      return "%s://%s" % (protocol, domain)


    -def persons_d():
    -    return builds(
    -        dict, fullname=binary(), email=optional(binary()), name=optional(binary()),
    -    )
    +@composite
    +def persons_d(draw):
    +    fullname = draw(binary())
    +    email = draw(optional(binary()))
    +    name = draw(optional(binary()))
    +    assume(not (len(fullname) == 32 and email is None and name is None))
    +    return dict(fullname=fullname, name=name, email=email)
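
    The `assume()` call in the new strategy rejects generated persons that cannot be told apart from anonymized ones. The standard-library sketch below spells out that condition. The helper names `looks_anonymized` and `anonymize_dict` are hypothetical, and the 32-byte length is assumed to come from a SHA-256 digest of the fullname.

```python
import hashlib
from typing import Dict, Optional

PersonDict = Dict[str, Optional[bytes]]


def looks_anonymized(person: PersonDict) -> bool:
    """Return True for exactly the dicts that persons_d() filters out."""
    fullname = person["fullname"] or b""
    return (
        len(fullname) == 32
        and person["email"] is None
        and person["name"] is None
    )


def anonymize_dict(person: PersonDict) -> PersonDict:
    # Assumption: anonymization hashes the fullname with SHA-256
    # (32-byte digest) and unsets name and email.
    digest = hashlib.sha256(person["fullname"] or b"").digest()
    return dict(fullname=digest, name=None, email=None)


jane = dict(
    fullname=b"Jane Doe <jane@example.org>",
    name=b"Jane Doe",
    email=b"jane@example.org",
)
# A regular person is not mistaken for an anonymized one...
assert not looks_anonymized(jane)
# ...while the output of anonymization always matches the filtered shape.
assert looks_anonymized(anonymize_dict(jane))
```

    Without the filter, a randomly generated 32-byte fullname with no name and no email would make tests comparing an object with its anonymized version ambiguous.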
  • Author Maintainer

    properly annotate BaseModel.anonymize()

  • Build is green

    Patch application report for D3171 (id=11268)

    Rebasing onto cce30366...

    Current branch diff-target is up to date.
    Changes applied before test
    commit 0f3af381835fc2f1e3e420519d0bba7aef3d8ce6
    Author: David Douard <david.douard@sdfa3.org>
    Date:   Tue May 19 16:04:30 2020 +0200
    
        Add support for model object anonymization
        
        Simply add a BaseModel.anonymize() method. Default implementation returns
        None, meaning the object is not anonymizable.
        
        For Person, the method returns a Person whith hashed fullname (and unset
        name and email).
        
        For Revision and Release, the method returns an anonymized version of
        the object, i.e. with instance of Person replaced by anonymized ones.

    See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/62/ for more details.

  • Author Maintainer

    use ModelType instead of T for type annotation

  • Build is green

    Patch application report for D3171 (id=11270)

    Rebasing onto cce30366...

    Current branch diff-target is up to date.
    Changes applied before test
    commit 29312dff6d96ac1c9bc18bf98de1d2e27a76c334
    Author: David Douard <david.douard@sdfa3.org>
    Date:   Tue May 19 16:04:30 2020 +0200
    
        Add support for model object anonymization
        
        Simply add a BaseModel.anonymize() method. Default implementation returns
        None, meaning the object is not anonymizable.
        
        For Person, the method returns a Person whith hashed fullname (and unset
        name and email).
        
        For Revision and Release, the method returns an anonymized version of
        the object, i.e. with instance of Person replaced by anonymized ones.

    See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/63/ for more details.

  • closed
