Content moderation in end-to-end encrypted systems

The Center for Democracy and Technology (CDT) published research and recommendations on performing content moderation in end-to-end encrypted (E2EE) systems. We use E2EE systems every day, for example in encrypted chat applications such as Signal, Wire, or WhatsApp. Because the contents on these platforms are encrypted, operators cannot easily moderate spam or fake news and need dedicated techniques to do so. Mallory Knodel of CDT presented the research findings at the USENIX conference. I created a set of watercolor illustrations to make the slides of the talk more visually appealing and understandable. I wrote the text below each illustration for this website, to give you the context needed to understand them.

The perennial problem with this type of moderation is: who defines what counts as "fraudulent"? Most of the techniques described below can also be used for surveillance and censorship. That is why we need to study them carefully, to prevent techniques that favor this kind of misuse from being implemented on platforms, or worse, enshrined in legal frameworks.

For: Center for Democracy and Technology, Washington/USA
Year: 2021

A robot is pointing to 6 screens with the words: definition, detection, evaluation, information, appeal, and education.

Content moderation phases. Moderating content in end-to-end encrypted systems involves several phases: platform operators must define the content to be moderated, detect that content, evaluate their findings, inform users about moderation decisions, allow users to appeal those decisions, and educate them.

A sphere with three axes: active/proactive, before/after, content/metadata.

Detection phase of content moderation. Detecting content to be moderated takes place in a space of possibilities: it can happen before or after messages are sent; it can be done actively or proactively; and fraudulent messages can be detected using either content or metadata.

Message franking scheme. A woman with blue hair, likely Alice, the sender, sends a message to a cat named Catnip via a platform. Her message carries a red franking stamp, the cryptographic signature, who calls himself Frank, and is visibly encrypted: there is a lock on it. The message passes through the platform, a grey block with grass growing on its head; its eyes are closed to show that it cannot see the message contents it forwards between sender and receiver. The sender signs their message, and the receiver can verify it using public-key cryptography. The receiver can also report the message to an independent moderator, who can judge whether it is fraudulent, again using public-key cryptography.

1- User reporting. This technique is called message franking, or more precisely asymmetric message franking, and it also works in decentralized messaging systems. Users can report fraudulent messages to a moderator, who can transparently verify the reported message, its sender, and its receiver using a cryptographic signature and public-key cryptography. (Moderation space: active, after sending)
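The core idea of franking, that the sender cryptographically commits to the plaintext so a moderator can later verify a report, can be sketched in a few lines. This is a minimal illustration of the simpler symmetric (MAC-based) variant, not the asymmetric scheme the talk describes; all function names are hypothetical.

```python
import hmac
import hashlib
import os

def send_message(sender_key: bytes, plaintext: bytes):
    """Sender computes a 'franking tag' committing to the plaintext.
    The tag travels alongside the (separately encrypted) message."""
    nonce = os.urandom(16)
    tag = hmac.new(sender_key, nonce + plaintext, hashlib.sha256).digest()
    return nonce, tag

def report_message(sender_key: bytes, plaintext: bytes,
                   nonce: bytes, tag: bytes) -> bool:
    """Moderator checks that the reported plaintext matches the tag,
    proving the report shows what the sender actually sent."""
    expected = hmac.new(sender_key, nonce + plaintext, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)
```

In the real asymmetric scheme, signatures and public-key cryptography replace the shared key, so the moderator does not need any secret shared with the sender.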

An envelope with a lock, sent by a woman with blue hair holding a smartphone in her hands. Around the message we see bubbles of metadata: the size of the message, the time it was sent, the receiver, the sender, and the frequency of sending.

2- Metadata analysis. Even encrypted messages leave traces. The metadata associated with a message is generally not encrypted and can be used to detect fraudulent messages. Metadata is "data about data". (Moderation space: active or proactive, after sending)
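As a toy example of what metadata analysis can look like without ever reading message contents, the sketch below flags senders whose per-minute message frequency exceeds a threshold, a common spam signal. The function name and threshold are illustrative assumptions, not part of any real platform's system.

```python
from collections import defaultdict

def flag_high_frequency_senders(events, max_per_minute=30):
    """events: iterable of (sender, timestamp_in_seconds) pairs.
    Returns the set of senders whose message count in any one-minute
    bucket exceeds max_per_minute -- using only metadata, no contents."""
    buckets = defaultdict(int)
    flagged = set()
    for sender, ts in events:
        key = (sender, int(ts // 60))  # bucket messages by minute
        buckets[key] += 1
        if buckets[key] > max_per_minute:
            flagged.add(sender)
    return flagged
```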

A robot on wheels with a red alarm signal on its head is searching through a big file cabinet.

3- Traceability. Content moderation systems can also rely on storing information about sent messages; we can imagine this storage as a huge file cabinet. Reported fraudulent messages can then be compared against the stored records. This technique can also be layered on top of message franking, making it easier to reveal forwarded messages, such as the spreading of fake news. Facebook's first implementation of message franking was based on traceability. (Moderation space: active, after sending)
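The "file cabinet" can be sketched as a store that records only hashes of the encrypted messages it forwards, plus who sent them to whom. When a message is reported, the platform looks up the identical ciphertext to see how it propagated. This is a simplified illustration under my own assumptions; real traceability proposals are considerably more involved.

```python
import hashlib

class ForwardTrace:
    """Toy trace store: the platform keeps hashes of ciphertexts it
    relays (never the plaintexts), with sender/receiver pairs."""

    def __init__(self):
        self.seen = {}  # hash -> list of (sender, receiver) hops

    def record(self, ciphertext: bytes, sender: str, receiver: str):
        h = hashlib.sha256(ciphertext).hexdigest()
        self.seen.setdefault(h, []).append((sender, receiver))

    def trace(self, reported_ciphertext: bytes):
        """On a report, reveal every hop of the identical message."""
        h = hashlib.sha256(reported_ciphertext).hexdigest()
        return self.seen.get(h, [])
```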

On the right side there are three images of cats in provocative poses, each with a hash assigned to it. On the left side there is one provocative cat image; an arrow links it to one of the three known, hashed images.

4- Perceptual hashing. This technique consists of storing fraudulent content, for example abuse imagery, in a database, with a unique hash for each piece of content. Content about to be sent can then be compared against the database of previously identified problematic material. This is the technique Apple wanted to introduce for its cloud storage but abandoned under pressure from civil society. The European Union now wants to use this technique for chat control. It circumvents end-to-end encryption, as it scans content before it is sent. (Moderation space: proactive, before sending)
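Unlike a cryptographic hash, a perceptual hash stays similar when the image is only slightly changed (resized, recompressed). A minimal sketch using the well-known "average hash" idea: compare each pixel of a downscaled grayscale image to the mean brightness, then match hashes by Hamming distance. The threshold value here is an arbitrary assumption for illustration.

```python
def average_hash(pixels):
    """pixels: 2-D list of grayscale values (e.g. an 8x8 downscaled
    image). Returns a bit string: '1' where brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return ''.join('1' if p > mean else '0' for p in flat)

def hamming(h1, h2):
    """Number of differing bits between two equal-length hashes."""
    return sum(a != b for a, b in zip(h1, h2))

def matches_database(image_hash, database, threshold):
    """Near-duplicates of known content keep a small Hamming distance,
    so a small threshold catches re-encoded copies."""
    return any(hamming(image_hash, known) <= threshold for known in database)
```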

On the left we see three squares, two of them containing provocative cat images, and one just the word 'cat'. An arrow points to the head of a robot who is drawing a cat in a provocative pose.

5- Predictive modeling. This technique consists of teaching an algorithm what fraudulent messages or images look like. The algorithm can then predict whether future messages or images are fraudulent and flag them for moderation. (Moderation space: proactive or active, before or after sending)
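To make "teaching an algorithm what fraudulent messages look like" concrete, here is a deliberately tiny word-frequency classifier in the spirit of Naive Bayes spam filters. Real systems use far more sophisticated machine-learning models; everything here (function names, labels, scoring) is a simplified assumption for illustration.

```python
from collections import Counter

def train(examples):
    """examples: list of (text, label) pairs with label 'spam' or 'ok'.
    Learns per-label word counts from the labeled training data."""
    counts = {'spam': Counter(), 'ok': Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def predict(counts, text):
    """Scores each label by summed smoothed word frequencies and
    returns the more likely label for the new message."""
    def score(label):
        total = sum(counts[label].values()) + 1
        return sum((counts[label][w] + 1) / total for w in text.lower().split())
    return 'spam' if score('spam') > score('ok') else 'ok'
```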