Content moderation in end-to-end encrypted systems

The Center for Democracy and Technology (CDT) published research and recommendations on performing content moderation in end-to-end encrypted (E2EE) systems. We use E2EE systems every day, for example in encrypted chat applications such as Signal, Wire, or WhatsApp. Because the contents on these platforms are encrypted, operators cannot easily moderate spam or fake news and need dedicated techniques to do so. Mallory Knodel of CDT presented the research findings at the USENIX conference. I created a set of watercolor illustrations to make the slides of the talk more visually appealing and understandable. I wrote the text below each illustration for this website, to give you the context needed to understand them.

The perennial problem with this type of moderation is: who defines what counts as "fraudulent"? Most of the techniques described below can also be used for surveillance and censorship. That is why we need to study them carefully, to prevent techniques that favor this kind of misuse from being implemented on platforms, or worse, enshrined in legal frameworks.

For: Center for Democracy and Technology, Washington/USA
Year: 2021

A robot is pointing to 6 screens with the words: definition, detection, evaluation, information, appeal, and education.

Content moderation phases. Moderating content in end-to-end encrypted systems involves several phases: platform operators must define the content to be moderated, detect that content, evaluate their findings, inform users about moderation decisions, allow users to appeal those decisions, and educate them.

A sphere with three axes: active/proactive, before/after, content/metadata.

Detection phase of content moderation. Detecting content to be moderated takes place in a space of possibilities: it can happen before or after messages are sent; it can be done actively or proactively; and fraudulent messages can be detected using either content or metadata.

Message franking scheme. A woman with blue hair, likely Alice, the sender, sends a message to a cat named Catnip via a platform. Her message carries a red franking stamp, the cryptographic signature, who calls himself Frank, and is visibly encrypted: there is a lock on it. The message passes through the platform, a grey block with grass growing on its head; its eyes are closed to show that it cannot see the message contents it forwards between sender and receiver. The sender signs their message, and the receiver can verify it using public-key cryptography. The receiver can also report the message to an independent moderator, who can judge whether it is fraudulent, again using public-key cryptography.

1- User reporting. This technique is called message franking, or more precisely asymmetric message franking, and it also works in decentralized messaging systems. Users can report fraudulent messages to a moderator, who can transparently verify the reported message, its sender, and its receiver using a cryptographic signature and public-key cryptography. (Moderation space: active, after sending)
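The core idea of franking, that the sender cryptographically commits to the plaintext so a moderator can later verify a report, can be sketched in a few lines. This is a minimal illustration of the simpler symmetric (MAC-based) variant, not the asymmetric scheme the talk describes; all function names are hypothetical.

```python
import hmac
import hashlib
import os

def send_message(sender_key: bytes, plaintext: bytes):
    """Sender computes a 'franking tag' committing to the plaintext.
    The tag travels alongside the (separately encrypted) message."""
    nonce = os.urandom(16)
    tag = hmac.new(sender_key, nonce + plaintext, hashlib.sha256).digest()
    return nonce, tag

def report_message(sender_key: bytes, plaintext: bytes,
                   nonce: bytes, tag: bytes) -> bool:
    """Moderator checks that the reported plaintext matches the tag,
    proving the report shows what the sender actually sent."""
    expected = hmac.new(sender_key, nonce + plaintext, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)
```

In the real asymmetric scheme, signatures and public-key cryptography replace the shared key, so the moderator does not need any secret shared with the sender.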

An envelope with a lock, sent by a woman with blue hair holding a smartphone in her hands. Around the message we see bubbles of metadata: the size of the message, the time it was sent, the receiver, the sender, and the frequency of sending.

2- Metadata analysis. Even encrypted messages leave traces. The metadata associated with a message is generally not encrypted and can be used to detect fraudulent messages. Metadata is "data about data". (Moderation space: active or proactive, after sending)
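As a toy example of what metadata analysis can look like without ever reading message contents, the sketch below flags senders whose per-minute message frequency exceeds a threshold, a common spam signal. The function name and threshold are illustrative assumptions, not part of any real platform's system.

```python
from collections import defaultdict

def flag_high_frequency_senders(events, max_per_minute=30):
    """events: iterable of (sender, timestamp_in_seconds) pairs.
    Returns the set of senders whose message count in any one-minute
    bucket exceeds max_per_minute -- using only metadata, no contents."""
    buckets = defaultdict(int)
    flagged = set()
    for sender, ts in events:
        key = (sender, int(ts // 60))  # bucket messages by minute
        buckets[key] += 1
        if buckets[key] > max_per_minute:
            flagged.add(sender)
    return flagged
```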

A robot on wheels with a red alarm signal on its head is searching through a big file cabinet.

3- Traceability. Content moderation systems can also rely on storing information about sent messages; we can imagine this storage as a huge file cabinet. Reported fraudulent messages can then be compared against the stored records. This technique can also be layered on top of message franking, making it easier to reveal forwarded messages, such as the spreading of fake news. Facebook's first implementation of message franking was based on traceability. (Moderation space: active, after sending)
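The "file cabinet" can be sketched as a store that records only hashes of the encrypted messages it forwards, plus who sent them to whom. When a message is reported, the platform looks up the identical ciphertext to see how it propagated. This is a simplified illustration under my own assumptions; real traceability proposals are considerably more involved.

```python
import hashlib

class ForwardTrace:
    """Toy trace store: the platform keeps hashes of ciphertexts it
    relays (never the plaintexts), with sender/receiver pairs."""

    def __init__(self):
        self.seen = {}  # hash -> list of (sender, receiver) hops

    def record(self, ciphertext: bytes, sender: str, receiver: str):
        h = hashlib.sha256(ciphertext).hexdigest()
        self.seen.setdefault(h, []).append((sender, receiver))

    def trace(self, reported_ciphertext: bytes):
        """On a report, reveal every hop of the identical message."""
        h = hashlib.sha256(reported_ciphertext).hexdigest()
        return self.seen.get(h, [])
```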

On the right side there are three images of cats in provocative poses, each with a hash assigned to it. On the left side there is one provocative cat image; an arrow links it to one of the three known, hashed images.

4- Perceptual hashing. This technique consists of storing fraudulent content, for example abuse imagery, in a database, with a unique hash for each piece of content. Content about to be sent can then be compared against the database of previously identified problematic material. This is the technique Apple wanted to introduce for its cloud storage but abandoned under pressure from civil society. The European Union now wants to use this technique for chat control. It circumvents end-to-end encryption, as it scans content before it is sent. (Moderation space: proactive, before sending)
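Unlike a cryptographic hash, a perceptual hash stays similar when the image is only slightly changed (resized, recompressed). A minimal sketch using the well-known "average hash" idea: compare each pixel of a downscaled grayscale image to the mean brightness, then match hashes by Hamming distance. The threshold value here is an arbitrary assumption for illustration.

```python
def average_hash(pixels):
    """pixels: 2-D list of grayscale values (e.g. an 8x8 downscaled
    image). Returns a bit string: '1' where brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return ''.join('1' if p > mean else '0' for p in flat)

def hamming(h1, h2):
    """Number of differing bits between two equal-length hashes."""
    return sum(a != b for a, b in zip(h1, h2))

def matches_database(image_hash, database, threshold):
    """Near-duplicates of known content keep a small Hamming distance,
    so a small threshold catches re-encoded copies."""
    return any(hamming(image_hash, known) <= threshold for known in database)
```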

On the left we see three squares, two of them containing provocative cat images, and one just the word 'cat'. An arrow points to the head of a robot who is drawing a cat in a provocative pose.

5- Predictive modeling. This technique consists of teaching an algorithm what fraudulent messages or images look like. The algorithm can then predict whether future messages or images are fraudulent and flag them for moderation. (Moderation space: proactive or active, before or after sending)
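To make "teaching an algorithm what fraudulent messages look like" concrete, here is a deliberately tiny word-frequency classifier in the spirit of Naive Bayes spam filters. Real systems use far more sophisticated machine-learning models; everything here (function names, labels, scoring) is a simplified assumption for illustration.

```python
from collections import Counter

def train(examples):
    """examples: list of (text, label) pairs with label 'spam' or 'ok'.
    Learns per-label word counts from the labeled training data."""
    counts = {'spam': Counter(), 'ok': Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def predict(counts, text):
    """Scores each label by summed smoothed word frequencies and
    returns the more likely label for the new message."""
    def score(label):
        total = sum(counts[label].values()) + 1
        return sum((counts[label][w] + 1) / total for w in text.lower().split())
    return 'spam' if score('spam') > score('ok') else 'ok'
```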