Signal Search

by Joe Engelman, Seny Kamara, Tarik Moataz and Sam Zhao.

Overview

Signal is an end-to-end encrypted messaging app made by Open Whisper Systems. It is based on the Signal protocol, which was designed by Trevor Perrin and Moxie Marlinspike. In addition to underlying the Signal App, the Signal protocol is also used by WhatsApp, Facebook Messenger and Google Allo.

Why is Signal important? Together with PGP-encrypted email and OTR-based messaging, the Signal App is one of the most secure ways to communicate. Unlike PGP, however, Signal is really easy to use. The combination of strong privacy guarantees and usability is what makes Signal so important and impactful. Signal is used by journalists, dissidents, politicians, government officials and whistleblowers, including Edward Snowden. However, it is also used by everyday people. In fact, we use it ourselves to communicate with each other and with our families and friends.

What this project was about. As everyday Signal users, we noticed that the Signal App was missing a feature we expected: the ability to search through our messages. As we started using Signal as our default messenger, our usage patterns changed. In particular, in addition to sending ephemeral messages, we were also using the app to communicate more permanent data such as pictures, addresses and phone numbers. It was for this more permanent data that the lack of search functionality affected us the most.

This made us wonder why the Signal App did not support search. After thinking about it and reading through the code, we realized that adding search functionality is a non-trivial problem. Before we can discuss why in more depth, however, we first have to understand how the Signal App stores messages.

The Signal Storage Architecture

When Signal is installed, it first sets up its keying material. This includes a password $P$, a 128-bit AES key $K$, an HMAC-SHA1 key $G$ and an elliptic curve Diffie-Hellman (ECDH) public/private key pair $(pk = g^x, sk = x)$. The tuple $(K, G, sk)$, which we refer to as the master secrets, is then encrypted with $P$ using password-based encryption and stored on disk, while $pk$ is stored in plaintext. Note that these master secrets are only used to encrypt messages at rest, not in transit. The password $P$ is either user-chosen (if the password option is enabled) or fixed to the string “unencrypted” (if the password option is disabled).
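
To make this setup concrete, here is a minimal sketch of generating the master secrets and wrapping them under $P$. It is an illustration under stated assumptions, not Signal's code: Python's standard library has no AES, so an HMAC-SHA256 keystream stands in for the cipher; PBKDF2-HMAC-SHA256, the iteration count, the 32-byte stand-in for $sk$ and all function names are our own choices.

```python
import hashlib
import hmac
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Stand-in for AES: HMAC-SHA256 in counter mode. Illustration only."""
    blocks = [
        hmac.new(key, nonce + i.to_bytes(8, "big"), hashlib.sha256).digest()
        for i in range((length + 31) // 32)
    ]
    return b"".join(blocks)[:length]


def generate_master_secrets() -> bytes:
    """K (128-bit AES key) || G (HMAC-SHA1 key) || sk (opaque 32-byte stand-in)."""
    return os.urandom(16) + os.urandom(20) + os.urandom(32)


def _derive(password: str, salt: bytes, iterations: int) -> tuple[bytes, bytes]:
    # PBKDF2 yields 64 bytes: the first half encrypts, the second half MACs.
    dk = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations, dklen=64)
    return dk[:32], dk[32:]


def password_encrypt(password: str, secrets_blob: bytes, iterations: int = 100_000) -> bytes:
    salt, nonce = os.urandom(16), os.urandom(16)
    enc_key, mac_key = _derive(password, salt, iterations)
    ks = _keystream(enc_key, nonce, len(secrets_blob))
    ct = bytes(a ^ b for a, b in zip(secrets_blob, ks))
    tag = hmac.new(mac_key, salt + nonce + ct, hashlib.sha256).digest()
    return salt + nonce + ct + tag  # what would be written to disk


def password_decrypt(password: str, wrapped: bytes, iterations: int = 100_000) -> bytes:
    salt, nonce, ct, tag = wrapped[:16], wrapped[16:32], wrapped[32:-32], wrapped[-32:]
    enc_key, mac_key = _derive(password, salt, iterations)
    expected = hmac.new(mac_key, salt + nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong password or corrupted master secrets")
    return bytes(a ^ b for a, b in zip(ct, _keystream(enc_key, nonce, len(ct))))


# With the password option disabled, P is the fixed string "unencrypted".
master = generate_master_secrets()
on_disk = password_encrypt("unencrypted", master)
assert password_decrypt("unencrypted", on_disk) == master
```

The sketch also shows why the default password offers little protection: anyone who knows the fixed string “unencrypted” can run the same derivation and unwrap the master secrets.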

Signal operates in two modes: open mode, in which the password $P$ is cached, and closed mode, in which $P$ is not cached.

Open mode. If a message arrives while Signal is in open mode, Signal uses the password to decrypt the master secrets $(K, G, sk)$ and then encrypts and MACs the message with $K$ and $G$, respectively. The encrypted message is then stored in a SQLite table under a unique message ID. The table, depicted in Fig. 1 below, also holds metadata such as a timestamp, the sender and a device ID. The table itself is encrypted on disk only if the phone uses full-disk encryption (FDE).

Fig. 1: The Signal message table.
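
The open-mode write path can be sketched as follows, assuming the master secrets have already been unwrapped. Since Fig. 1 is not reproduced here, the table and column names (`messages`, `timestamp`, `sender`, `device_id`, `body`) are hypothetical. An HMAC-SHA256 keystream stands in for AES. The tag is HMAC-SHA1 under $G$ over the nonce and ciphertext, which is our choice of composition.

```python
import hashlib
import hmac
import os
import sqlite3
import time


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Stand-in for AES: HMAC-SHA256 in counter mode. Illustration only."""
    blocks = [
        hmac.new(key, nonce + i.to_bytes(8, "big"), hashlib.sha256).digest()
        for i in range((length + 31) // 32)
    ]
    return b"".join(blocks)[:length]


def seal(K: bytes, G: bytes, text: str) -> bytes:
    """Encrypt under K, then append an HMAC-SHA1 tag under G."""
    nonce, data = os.urandom(16), text.encode()
    ct = bytes(a ^ b for a, b in zip(data, _keystream(K, nonce, len(data))))
    tag = hmac.new(G, nonce + ct, hashlib.sha1).digest()  # 20 bytes
    return nonce + ct + tag


def unseal(K: bytes, G: bytes, blob: bytes) -> str:
    nonce, ct, tag = blob[:16], blob[16:-20], blob[-20:]
    if not hmac.compare_digest(tag, hmac.new(G, nonce + ct, hashlib.sha1).digest()):
        raise ValueError("MAC check failed")
    return bytes(a ^ b for a, b in zip(ct, _keystream(K, nonce, len(ct)))).decode()


def open_message_table() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")  # on a phone this would be a file on disk
    db.execute(
        "CREATE TABLE messages (id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "timestamp INTEGER, sender TEXT, device_id INTEGER, body BLOB)"
    )
    return db


def store_message(db: sqlite3.Connection, K: bytes, G: bytes,
                  sender: str, device_id: int, text: str) -> int:
    """Open mode: K and G are available, so the body is sealed before insertion."""
    cur = db.execute(
        "INSERT INTO messages (timestamp, sender, device_id, body) VALUES (?, ?, ?, ?)",
        (int(time.time()), sender, device_id, seal(K, G, text)),
    )
    return cur.lastrowid  # the unique message ID
```

Only the `body` column is protected by the app in this sketch. The timestamp, sender and device ID sit in the clear unless the whole file is covered by FDE.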

Closed mode. If a message arrives while Signal is in closed mode, Signal cannot recover $K$ and $G$, since $P$ is not available. To get around this, it encrypts the message with temporary keys until the password is entered. Specifically, it generates an ephemeral ECDH key pair $(pk' = g^y, sk' = y)$ and executes a Diffie-Hellman key exchange using $sk' = y$ and $pk = g^x$. A symmetric encryption key $K'$ and a MAC key $G'$ are then derived from the resulting group element $g^{xy}$, with HMAC-SHA256 used as a key derivation function. The message is then encrypted and MACed with $K'$ and $G'$, respectively. The pair $(ct, pk')$, where $ct$ is the encrypted message, is stored in temporary storage. The next time the password is entered, Signal recovers $sk$ and executes a Diffie-Hellman key exchange using $sk = x$ and $pk' = g^y$ to regenerate $K'$ and $G'$. With these keys, it can decrypt $ct$ and re-encrypt the message under $K$ (and MAC it under $G$) before storing it in the table.
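
The closed-mode key agreement can be sketched with a toy group. The sketch replaces the elliptic curve with multiplication modulo the Mersenne prime $2^{127}-1$. That group is NOT secure; it only demonstrates that both sides arrive at the same $K'$ and $G'$. The labels `enc` and `mac` fed to HMAC-SHA256 and all function names are hypothetical, not Signal's actual KDF inputs. Encrypting $ct$ itself is omitted.

```python
import hashlib
import hmac
import secrets

# Toy group: integers modulo the Mersenne prime 2^127 - 1 with base 3.
# This stands in for the elliptic curve and is NOT secure.
P = 2**127 - 1
GEN = 3


def dh_keypair() -> tuple[int, int]:
    sk = secrets.randbelow(P - 2) + 1
    return pow(GEN, sk, P), sk  # (pk, sk)


def kdf(shared: int) -> tuple[bytes, bytes]:
    """Derive (K', G') from the shared group element with HMAC-SHA256 (hypothetical labels)."""
    s = shared.to_bytes(16, "big")  # shared < 2^127, so 16 bytes suffice
    return (hmac.new(s, b"enc", hashlib.sha256).digest()[:16],
            hmac.new(s, b"mac", hashlib.sha256).digest())


def closed_mode_keys(pk: int) -> tuple[bytes, bytes, int]:
    """Message arrives in closed mode: only the long-term pk is usable."""
    pk_eph, sk_eph = dh_keypair()
    k_tmp, g_tmp = kdf(pow(pk, sk_eph, P))
    return k_tmp, g_tmp, pk_eph  # pk_eph is stored next to ct; sk_eph is discarded


def reopened_keys(sk: int, pk_eph: int) -> tuple[bytes, bytes]:
    """Password entered: sk is unwrapped and the same (K', G') are re-derived."""
    return kdf(pow(pk_eph, sk, P))
```

The design point is that the ephemeral secret $y$ is thrown away after use. Only the holder of $sk = x$, which requires $P$, can later recompute $g^{xy}$.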

Some Possible Solutions

We first considered three natural ways to add search to the current Signal design: (1) decrypt and scan all the messages; (2) use an in-memory index; (3) use a full-database encryption (FDBE) solution like SQLCipher. The first approach requires decrypting and scanning every stored message on every query, so its cost grows with the entire message history. It is clearly not an option, so we will not discuss it further and only consider the second and third options.

In-memory index. One possible solution is for Signal to create an index over the messages before encrypting and storing them in the table. This would allow for fast search, but the index would remain unencrypted in memory, and possibly on disk if the user is not using FDE. In addition, in the event of a privilege escalation attack or a memory disclosure attack on the device or app, the contents of the index could be stolen.
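
For comparison, a plaintext in-memory index is just an inverted index from keywords to message IDs. The tokenizer (lowercased runs of letters and digits) and the class name below are assumptions. The point of the sketch is that `postings` holds every keyword in the clear, which is exactly what a memory disclosure would expose.

```python
import re
from collections import defaultdict


def keywords(text: str) -> set[str]:
    """Hypothetical tokenizer: lowercase runs of letters and digits."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class PlaintextIndex:
    """Inverted index: keyword -> IDs of messages containing it, all in the clear."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, msg_id: int, text: str) -> None:
        # Called before the message is encrypted and stored in the table.
        for w in keywords(text):
            self.postings[w].add(msg_id)

    def search(self, w: str) -> set[int]:
        # .get avoids inserting an empty entry for unknown keywords.
        return set(self.postings.get(w.lower(), set()))
```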

Full-database encryption. FDBE transparently encrypts and decrypts the database (and any associated indexes) when it is written to and read from disk. With this approach, the Signal App would not have to encrypt the messages itself, and search would be handled by the database. How much of the database would appear unencrypted in memory, however, is unclear, as it depends on the page size of the FDBE solution, whether an index is used and the location of the matching rows. In the best case, only a small amount of the data and index would be exposed in memory, but in the worst case a large fraction of it could be. Another limitation of this approach is that it would require a non-trivial redesign of the Signal App.

Since both the in-memory and FDBE approaches had limitations, we wanted to explore whether encrypted search techniques could provide a viable third alternative. In particular, we wanted to achieve the best of both worlds: fast search while keeping messages protected on disk (even without full-disk encryption) and in memory. To achieve this, we extended Signal in several ways.

Enabling search. We added an option in the Signal settings to enable search. This option is shown in Fig. 2 below. When the option is set, an encrypted index is created over the message column of the Signal message table. The column is parsed to extract the set of all keywords it contains, and an encrypted index EDB is created that maps each keyword to the IDs of the messages that contain it. The EDB key is stored with the master secrets and, like them, is encrypted under $P$ using password-based encryption before being written to disk.

Fig. 2: Turning on encrypted message search.
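
Building EDB can be sketched with the basic dictionary construction of Cash et al. This is a simpler, non-forward-secure scheme shown only to illustrate the idea; it is not the 2Lev variant our implementation uses. Assumptions: HMAC-SHA256 as the PRF $F$, the tokenizer, 8-byte message IDs and all helper names. For each keyword $w$, two keys $K_1, K_2$ are derived from the EDB key. The $c$-th ID for $w$ is stored under the label $F(K_1, c)$ and masked with $F(K_2, c)$.

```python
import hashlib
import hmac
import random
import re
from collections import defaultdict


def prf(key: bytes, data: bytes) -> bytes:
    """PRF instantiated with HMAC-SHA256 (an assumption for this sketch)."""
    return hmac.new(key, data, hashlib.sha256).digest()


def keywords(text: str) -> set[str]:
    """Hypothetical tokenizer: lowercase runs of letters and digits."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_keys(k_edb: bytes, w: str) -> tuple[bytes, bytes]:
    # K1 derives labels; K2 derives the masks that hide message IDs.
    return prf(k_edb, b"\x01" + w.encode()), prf(k_edb, b"\x02" + w.encode())


def build_edb(k_edb: bytes, messages: dict[int, str]) -> dict[bytes, bytes]:
    """Map each keyword to the IDs of messages containing it, then encrypt the map."""
    postings: dict[str, list[int]] = defaultdict(list)
    for msg_id, text in messages.items():
        for w in keywords(text):
            postings[w].append(msg_id)

    entries = []
    for w, ids in postings.items():
        k1, k2 = keyword_keys(k_edb, w)
        for c, msg_id in enumerate(ids):
            ctr = c.to_bytes(8, "big")
            masked = bytes(a ^ b for a, b in zip(msg_id.to_bytes(8, "big"), prf(k2, ctr)))
            entries.append((prf(k1, ctr), masked))

    # Shuffle so insertion order does not group entries by keyword.
    random.SystemRandom().shuffle(entries)
    return dict(entries)
```

Without the EDB key, the resulting dictionary is a set of pseudorandom labels and masked IDs. It reveals the total number of (keyword, message) pairs but no keywords.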

New messages. If a message arrives when Signal is in open mode, the message and the EDB key are used to generate an update token, which is then used to update EDB. If a message arrives when Signal is in closed mode, it is processed and stored as before, under temporary keys. When the password is re-entered, the message is processed as in open mode.
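
An update token can be sketched as one (label, masked ID) pair per keyword of the new message. It uses a per-keyword counter that we assume is kept alongside the EDB key (hypothetical client state). This is a simple counter-based update, not the forward-secure scheme in our implementation. Anyone who has seen a search token for $w$ can compute the labels of future additions for $w$, which is precisely the leak forward security prevents. The PRF (HMAC-SHA256) and all names are assumptions.

```python
import hashlib
import hmac
import random
import re


def prf(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()


def keywords(text: str) -> set[str]:
    """Hypothetical tokenizer: lowercase runs of letters and digits."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def update_token(k_edb: bytes, counters: dict[str, int],
                 msg_id: int, text: str) -> list[tuple[bytes, bytes]]:
    """Client side: one (label, masked ID) pair per keyword of the new message.

    `counters` (keyword -> number of messages indexed so far) is hypothetical
    client state kept next to the EDB key.
    """
    pairs = []
    for w in keywords(text):
        k1 = prf(k_edb, b"\x01" + w.encode())
        k2 = prf(k_edb, b"\x02" + w.encode())
        c = counters.get(w, 0)
        counters[w] = c + 1
        ctr = c.to_bytes(8, "big")
        masked = bytes(a ^ b for a, b in zip(msg_id.to_bytes(8, "big"), prf(k2, ctr)))
        pairs.append((prf(k1, ctr), masked))
    random.SystemRandom().shuffle(pairs)
    return pairs


def apply_update(edb: dict[bytes, bytes], token: list[tuple[bytes, bytes]]) -> None:
    """Index side: insert the new entries; no keyword appears in the clear."""
    for label, masked in token:
        edb[label] = masked
```

Messages that arrived in closed mode would simply be passed through `update_token` once the password is re-entered and the EDB key becomes available.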

Search. To support search, we added two UX elements so that users can enter search queries and view the results. The first is a search box in which users enter their queries. The second is a results screen that displays the matching messages. These elements are shown in Fig. 3. When a search query is made, the keyword and the EDB key are used to generate a search token. The token is then used with EDB to recover the IDs of the messages that contain the keyword. These messages are then retrieved from the message table, decrypted using the message encryption key $K$ and displayed to the user.

Fig. 3: Searching over encrypted messages.
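
Search can be sketched as follows. The search token is the pair of per-keyword keys derived from the keyword and the EDB key. The lookup walks the counters $c = 0, 1, 2, \dots$ until a label is missing and unmasks each ID it finds. The indexing helper repeats a simple counter-based insertion so the sketch runs on its own. The scheme, names and PRF (HMAC-SHA256) are assumptions, not our Clusion-based implementation. Fetching the matching rows from the message table and decrypting them under $K$ is omitted.

```python
import hashlib
import hmac
import re


def prf(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def keywords(text: str) -> set[str]:
    """Hypothetical tokenizer: lowercase runs of letters and digits."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_token(k_edb: bytes, w: str) -> tuple[bytes, bytes]:
    """Client side: derived from the keyword and the EDB key only."""
    w = w.lower()
    return prf(k_edb, b"\x01" + w.encode()), prf(k_edb, b"\x02" + w.encode())


def index_message(edb: dict[bytes, bytes], k_edb: bytes,
                  counters: dict[str, int], msg_id: int, text: str) -> None:
    # Counter-based insertion, repeated here so this sketch is self-contained.
    for w in keywords(text):
        k1, k2 = search_token(k_edb, w)
        c = counters.get(w, 0)
        counters[w] = c + 1
        ctr = c.to_bytes(8, "big")
        edb[prf(k1, ctr)] = xor(msg_id.to_bytes(8, "big"), prf(k2, ctr))


def search(edb: dict[bytes, bytes], token: tuple[bytes, bytes]) -> list[int]:
    """Walk counters 0, 1, 2, ... until a label is missing; unmask each ID found."""
    k1, k2 = token
    ids, c = [], 0
    while True:
        ctr = c.to_bytes(8, "big")
        masked = edb.get(prf(k1, ctr))
        if masked is None:
            return ids
        ids.append(int.from_bytes(xor(masked, prf(k2, ctr)), "big"))
        c += 1
```

Each query costs one dictionary lookup per matching message plus one miss, rather than a pass over every stored message.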

EDB instantiation. In our current implementation, the encrypted index is generated using an unpublished forward-secure and response-hiding variant of the 2Lev scheme of Cash et al. that we designed ourselves. The implementation can be found in Clusion v0.2.0. This scheme only handles single-keyword search, however, so we plan to replace it with IEX, which handles boolean queries.

Discussion. The above design has several advantages. First, because it uses an index, it avoids a sequential scan and achieves fast search. Second, the index is always encrypted on disk, even if FDE is not turned on. In open mode, the design offers no additional security over a plaintext in-memory index, since the EDB key is stored with the master secrets, which are also in memory in open mode. Note that, with the current Signal design, hiding the EDB key in open mode would not add any security anyway, since the message encryption key $K$ is also in memory in that mode. In closed mode, however, our approach provides stronger security than a plaintext in-memory index, since the EDB key is not available.

Code

You can find the search-enabled variant of Signal here. We stress that this is a research project and that this version of Signal should not be used in practice. This is not the official version of Signal produced by Open Whisper Systems. The code has not been reviewed and is only made available for experimentation and research purposes. PLEASE DO NOT USE THIS!

Conclusions and Future Work

In this project, we set out to explore whether encrypted search could be used to add search functionality to the Signal messaging app securely and without compromising efficiency. The design we proposed requires minimal changes to Signal's current architecture and has several advantages over the naive solutions, namely a plaintext in-memory index or a full-database encryption solution like SQLCipher.

In the future, we plan to improve the project in two ways. First, we would like to store the EDB and message encryption keys either in the Android Keystore or, depending on the device, in secure hardware. This would keep the encrypted index secure even while the app is in open mode. Second, we would like to use encrypted search techniques to also hide some of the metadata that Signal currently stores in plaintext, such as timestamps, senders and device IDs.
