Are we planning to add a way to notify the mirrors of the takedown notices?
I'm wondering whether it would be worth subscribing the staging environment to it, to ensure the content is also removed from it (and also flagged to avoid any further ingestion).
! In #3087 (moved), @vsellier wrote:
Are we planning to add a way to notify the mirrors of the takedown notices?
Yeah, we'll have to do that.
What we (@rdicosmo and I) have been thinking of so far is providing mirrors with a feed of the following information:
- reference of the takedown request
- SWHID of the affected object
- reason for takedown (maybe; it could be derived from the reference of the takedown request, if we find a way to structure it properly; useful for automated processing, I guess)
- decision taken by Software Heritage (hide / remove once / blocklist forever)
We'd expect mirror operators to follow the feed and to make their own decisions about which actions to enact on their own infrastructure.
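A minimal sketch of what one entry of such a feed could look like, and of how a mirror might consume it. Nothing here is an existing Software Heritage API: the names `TakedownFeedEntry`, `Decision`, `mirror_action`, and the `TDN-0001` reference format are all hypothetical, chosen only to mirror the four fields listed above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    """The three decisions mentioned in the thread (names are assumptions)."""
    HIDE = "hide"
    REMOVE_ONCE = "remove-once"
    BLOCKLIST_FOREVER = "blocklist-forever"


@dataclass(frozen=True)
class TakedownFeedEntry:
    request_ref: str               # reference of the takedown request (hypothetical format)
    swhid: str                     # SWHID of the affected object
    decision: Decision             # decision taken by Software Heritage
    reason: Optional[str] = None   # optional: might be derivable from request_ref


def mirror_action(entry: TakedownFeedEntry, follow_upstream: bool = True) -> str:
    """Hypothetical mirror-side policy.

    A mirror either applies the upstream decision as-is, or queues the
    entry for its own review; the feed itself does not force anything.
    """
    if not follow_upstream:
        return "review"
    return entry.decision.value


entry = TakedownFeedEntry(
    request_ref="TDN-0001",          # made-up reference
    swhid="swh:1:cnt:" + "0" * 40,   # placeholder content SWHID
    decision=Decision.HIDE,
)
print(mirror_action(entry))
print(mirror_action(entry, follow_upstream=False))
```

Keeping `reason` optional reflects the open question above about whether it belongs in the feed or can be looked up from the request reference.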
I'm wondering whether it would be worth subscribing the staging environment to it, to ensure the content is also removed from it
Once this scaffolding exists, it would certainly make sense to use it to push the decisions from prod to staging.
(and also flagged to avoid any further ingestion).
For now my working assumption is that we'll remove objects once but we won't make the decision sticky. But I can see how having a sticky ingestion blocklist could be useful in some cases.
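To make the difference between "remove once" and a sticky ingestion blocklist concrete, here is a toy in-memory sketch. `ToyArchive` and its methods are hypothetical and do not correspond to any real storage or loader code.

```python
class ToyArchive:
    """Toy model of an archive, for illustration only."""

    def __init__(self) -> None:
        self.objects: set[str] = set()
        self.blocklist: set[str] = set()

    def ingest(self, swhid: str) -> bool:
        """Add an object unless it is blocklisted; return True if stored."""
        if swhid in self.blocklist:
            return False
        self.objects.add(swhid)
        return True

    def remove(self, swhid: str, sticky: bool = False) -> None:
        """Remove an object; with sticky=True, also block future ingestion."""
        self.objects.discard(swhid)
        if sticky:
            self.blocklist.add(swhid)


archive = ToyArchive()
swhid = "swh:1:cnt:" + "0" * 40  # placeholder SWHID

archive.ingest(swhid)
archive.remove(swhid)                 # "remove once": nothing prevents re-ingestion
reingested = archive.ingest(swhid)    # True, the object is back

archive.remove(swhid, sticky=True)    # sticky: the decision persists
blocked = not archive.ingest(swhid)   # True, ingestion is refused
```

With the non-sticky variant, a later crawl of the same origin would silently bring the object back, which is the case the sticky blocklist would address.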
! In #3087 (moved), @douardda wrote:
So what about exports of the archive available on git-annex?
In the most serious cases, we will be obliged to remove the offending content from these exports too.
One can imagine at least two ways to go:
1. open up the export, track down the offending content, remove it or zero it out, then repack and replace the original export
2. rebuild the export after removing the content from the archive
For 2., it would be handy to have timestamps on all objects (a feature mentioned in another thread), so that one could rebuild an export with the same content as the original export, minus what was removed.
Any thoughts on this? Any other ways to handle this issue (short of simply removing the exports)?
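The second option above (rebuilding an export from the archive using per-object timestamps) could be sketched as follows. This assumes each object carries an `added_at` timestamp, which is the feature mentioned as not yet available; `rebuild_export` and the `(swhid, added_at)` pair layout are hypothetical.

```python
from datetime import datetime, timezone
from typing import Iterable


def rebuild_export(
    objects: Iterable[tuple[str, datetime]],
    export_date: datetime,
    removed: set[str],
) -> list[str]:
    """Return the SWHIDs that belonged to the export taken at `export_date`,
    minus those that have since been taken down.

    `objects` is an iterable of (swhid, added_at) pairs (assumed layout).
    """
    return sorted(
        swhid
        for swhid, added_at in objects
        if added_at <= export_date and swhid not in removed
    )


def utc(year: int, month: int, day: int) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)


archive_objects = [
    ("swh:1:rev:" + "a" * 40, utc(2020, 1, 10)),
    ("swh:1:rev:" + "b" * 40, utc(2020, 2, 15)),  # later taken down
    ("swh:1:rev:" + "c" * 40, utc(2021, 6, 1)),   # added after the export
]

rebuilt = rebuild_export(
    archive_objects,
    export_date=utc(2020, 12, 31),
    removed={"swh:1:rev:" + "b" * 40},
)
```

Here `rebuilt` contains only the first object: the second was removed, and the third did not exist yet when the original export was made.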
! In #3087 (moved), @douardda wrote:
So what about exports of the archive available on git-annex?
Those exports do not contain blobs, so if the takedowns to be handled only concern file contents, the exports should not be impacted.
They might be impacted by takedowns related to metadata, e.g., commit messages.
In that case we can go with what Roberto suggests (in short: "hot fixing" the exports), but that will take a significant amount of processing. For instance, graph compression will need to be redone from scratch. An alternative option, assuming that takedowns impacting metadata will be rare enough, would be to just pull (i.e., withdraw) the entire graph exports. Once we have regular graph exports (which can happen as often as on a monthly basis), the impact of doing so will be fairly limited.