Contents

# What is Structured Encryption?

The best abstraction I know of to clearly understand the area of encrypted algorithms and to be able to talk about it coherently is the notion of structured encryption. I find it to be a useful abstraction not only to design new schemes but also to highlight some of the subtleties in the area (this is in part why it was introduced).

Roughly speaking, a structured encryption scheme is an encryption scheme that encrypts a data structure and allows the encryptor to generate tokens with which one can query the encrypted data structure. So if we take the case of an array, then an array encryption scheme is a scheme $(\setup, \rtoken, \read, \wrtoken, \write, \resolve)$ that works as follows:

• $\setup$ takes as input a security parameter $k$ and an array $A$ and outputs a secret key $\k$ and an encrypted array $EA$;
• $\rtoken$ takes as input the key $\k$ and an array index $i$ and outputs a token $\tk$;
• $\read$ takes as input an encrypted array $\A$ and a token $\tk$ and outputs an encrypted result $\ct$;
• $\wtoken$ takes as input a secret key $\k$, an index $i$ and an element $e$ and outputs a write token $\wtk$;
• $\write$ takes as input an encrypted array $EA$ and write token $\wtk$ and outputs a new encrypted array $EA'$;
• $\resolve$ takes as input the secret key $\k$ and a result $\ct$ and outputs a plaintext results $a$;

Similarly, if we wanted to encrypt a graph with support for neighbor queries instead of an array with support for read and write queries then we would need to design a graph encryption scheme scheme $(\setup, \token, \neigh)$.

There are many structured encryption schemes. We know how to design them for arrays with support for read and write queries, for graphs with support for neighbor, adjacency and approximate shortest paths, for dictionaries with support for get and put queries and for multi-maps also with support for get and put queries.

graph, document or key/value database that

First we need to be clear about what we mean by an “encrypted database”. In the real world, there many kinds of databases including: hierarchical, object-oriented, [relational]( and [NoSQL]( databases. Hierarchical databases aren’t really used in practice anymore so we’ll ignore them and mostly focus on relational and NoSQL databases. A relational DB is a collection of tables that are queried using [SQL]( or the lower-level relational algebra. NoSQL databases are usually based on a simpler data representation like [dictionary data structure]( which store key/value pairs and support get and put operations. Other kinds of NoSQL databases include [graph databases]( document databases, etc.

encryption One that is often cited is FHE. One of the great things about using FHE to design encrypted algorithms is that: (1) it is simple, you just design a search circuit and then let FHE do its thing; (2) your encrypted algorithm will not leak anything.

# What is Leakage?

As I mentioned above there are many approaches one could use to design encrypted algorithms. One that is often cited is FHE. One of the great things about using FHE to design encrypted algorithms is that: (1) it is simple, you just design a search circuit and then let FHE do its thing; (2) your encrypted algorithm will not leak anything.

The current generation of FHE schemes are not practical yet for large datasets, but cryptographers have been making great strides and designing increasingly efficient schemes. While the progress has been very impressive, it is important to understand that there are fundamental limits to how fast FHE-based computing can be. More precisely, because FHE schemes evaluate circuits, their runtime is $\Omega(n)$, where $n$ is the size of the input. For many applications, however, linear time is too slow—especially in the setting of “Big Data”. In fact, even for basic algorithmic problems like search, linear time is completely infeasible on modern datasets. I want to stress here that I am not saying that FHE cannot be used at all in the design of encrypted algorithms; I am only saying that purely FHE-based solutions will be inefficient for Big Data. It could very well be, however, that FHE is used as a lower-level building block and combined with other primitives (in fact, we use homomorphic encryption in this way in this paper for example). I am also not saying that FHE can’t be used for smaller datasets or to solve approximation problems (e.g., find all the documents that contain keyword $w$ with a false positive rate of $p$ and a false negative rate of $n$).

Because of this, the area of encrypted algorithms is mostly focused on finding sub-linear solutions and in this regime it turns out that all the solutions we are aware of leak some information. This includes solutions based on ORAM, STE, PPE, garbled circuits etc. The leakages are varied and in different adversarial models. They depend on what underlying primitive is used and, more importantly, on how exactly that primitive is used in the design of the higher-level encrypted algorithm. For this reason, it is important to be clear about what leakage profile a solution achieves. It is also important to understand that the presence of leakage doesn’t necessarily mean we can exploit it.

Since sub-linear solutions leak information, it is natural to wonder whether this leakage can be exploited to recover information about the data and/or queries.

There are basically two ways we can