From c86307a96912936fd58bd2caf579c41b1b6584df Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?J=C3=A9r=C3=A9my=20Bobbio=20=28Lunar=29?= <lunar@softwareheritage.org> Date: Mon, 12 Jun 2023 16:00:26 +0200 Subject: [PATCH] Document takedown notices on mirrors This document Software Heritage policy regarding how takedown notices must be handled on mirrors. Currently we are missing tooling to implement it in full, but we can still document the stopgap process and what we are aiming. Thanks to ardumont and vlorentz for proofreading and spotting several typos, syntax and grammar issues. Ref swh/meta#4975 --- docs/sysadm/mirror-operations/index.rst | 1 + docs/sysadm/mirror-operations/planning.rst | 4 +- .../mirror-operations/takedown-notices.rst | 91 +++++++++++++++++++ 3 files changed, 95 insertions(+), 1 deletion(-) create mode 100644 docs/sysadm/mirror-operations/takedown-notices.rst diff --git a/docs/sysadm/mirror-operations/index.rst b/docs/sysadm/mirror-operations/index.rst index 7039fdf4..f34af064 100644 --- a/docs/sysadm/mirror-operations/index.rst +++ b/docs/sysadm/mirror-operations/index.rst @@ -133,5 +133,6 @@ You may want to read: planning deploy + takedown-notices onboard monitor diff --git a/docs/sysadm/mirror-operations/planning.rst b/docs/sysadm/mirror-operations/planning.rst index 0f5459b1..386565b0 100644 --- a/docs/sysadm/mirror-operations/planning.rst +++ b/docs/sysadm/mirror-operations/planning.rst @@ -13,7 +13,9 @@ Namely, hosting a mirror requires: - a dedicated infrastructure with enough compute (s/computing) power and storage - enough network bandwidth (both ingress and egress) -- good IT tooling (supervision, alerting). +- good IT tooling (supervision, alerting) +- a legal and operational structure to :ref:`handle takedown requests + <mirror_takedown_requests>` The mirror operator is not required to run the Software Heritage `full software stack <https://docs.softwareheritage.org/devel>`_, however it is possible to diff --git a/docs/sysadm/mirror-operations/takedown-notices.rst b/docs/sysadm/mirror-operations/takedown-notices.rst new file mode 100644 index 00000000..949cc724 --- /dev/null +++ b/docs/sysadm/mirror-operations/takedown-notices.rst @@ -0,0 +1,91 @@ +.. _mirror_takedown_requests: + +Takedown notices on mirrors +=========================== + +Software Heritage creates a universal archive of all publicly available source +code. As most of the data collection is done without supervision, some files +turn out to be unwarranted. The organization might then receive take down +requests for copyright infringements or application of data protection laws. +If deemed legitimate, these requests are acted upon by the Software Heritage +team and targeted data is removed from the main archive. + +Each mirror is under its own specific set of laws. Therefore data removal +from the main archive is not propagated automatically to the mirrors and +must be reviewed by mirror operators. This also means that mirror +operators might receive specific takedown requests which they must be +able to handle. + +Motivations +----------- + +Software Heritage might receive a take down requests for an infringement not +applicable to the mirror jurisdiction. As the mission of Software Heritage is +to preserve as much source code as possible, we want to avoid deleting source +code–even if it is only from mirrors–unless it is really needed. Example: +GitHub had deleted the code of ``youtube-dl`` but we would not. + +Policy +------ + +1. Takedown requests sent to mirror operators must be handled in a timely + fashion. +2. Data removed from the main Software Heritage archive is not to be + automatically removed from any mirrors. +3. For online mirrors, data removed from the main archive should be made + inaccessible, but not deleted, as soon as possible. +4. Mirror operators are responsible for reviewing the reasons of removals and + decide if they should be propagated. If legitimate, data should be removed + from the mirror as well. Otherwise, it should be made accessible again. + +Rationale +--------- + +Software Heritage needs to provide a feed of objects removed from the archive +so it can eventually be propagated to the mirrors. But such a list of objects +can easily be weaponized. + +Imagine “doxxingâ€â€“a form of harassment by revealing private information (like +names, addresses or phone numbers)–is made using a Git repository, then +archived by Software Heritage. After receiving a request, Software Heritage +proceeds to remove the copy of this repository from the main archive. With +access to a running mirror and the list of removed objects from the main +archive, an adversary could create preemptive copy of any information removed +from the main archive. In the case of “doxxing†and other abuse of the archive, +such backups could be browsed for further abuse, like blackmail. + +While datasets are not exported everyday, the list of removed origins can still +be inferred with some delay by looking at the ones removed between two exports +(publicly available on S3). Therefore, we should assume that such information +is already public. + +We can embrace the situation by publishing a list of removed objects which is +good for transparency and for users of exports (for example users of +`swh-graph`) to avoid referring to stale data. + +Such a list must be consumed by mirrors, as soon as possible, to prevent rogue +access to the removed objects. Another channel, only available to mirror +operators, provide the reasons behind the removals to allow them to evaluate +the request legitimacy according to their own contexts. + +Make data inaccessible as soon as possible has the added benefit of relieving +mirror operators from having to review take down notices with any strong +time-critical constraints. This helps for holidays that are not aligned between +France and countries hosting the mirrors. + +Software Heritage will not relay the request itself–it would be a breach of +privacy–but only the reasons for which the organization deemed the request +legitimate (“copyright violationâ€, “harmful contentâ€, etc.) + +Processes +--------- + +The necessary tooling to implement this policy is `not yet available +<https://gitlab.softwareheritage.org/swh/meta/-/issues/4976>`_. +Until then, the process to propagate takedown requests to mirrors will be the following: + +- When the Software Heritage team acts on a takedown request, an email will be + sent to mirrors with the origins (as URL) and the reasons why we deemed the + request legitimate. +- Mirror operators can use :command:`swh alter remove` to also remove the origins (see + :ref:`swh-alter`). -- GitLab