^{1}/Jan 0001

# What is Structured Encryption?

The best abstraction I know of to clearly understand the area of encrypted algorithms and to be able to talk about it coherently is the notion of structured encryption. I find it to be a useful abstraction not only to design new schemes but also to highlight some of the subtleties in the area (this is in part why it was introduced).

Roughly speaking, a structured encryption scheme is an encryption scheme that encrypts
a data structure and allows the encryptor to generate tokens with which one can
query the encrypted data structure. So if we take the case of an array, then
an *array encryption scheme* is a scheme $(\setup, \rtoken, \read, \wrtoken, \write, \resolve)$
that works as follows:

- $\setup$ takes as input a security parameter $k$ and an array $A$ and outputs a secret key $\k$ and an encrypted array $EA$;
- $\rtoken$ takes as input the key $\k$ and an array index $i$ and outputs a token $\tk$;
- $\read$ takes as input an encrypted array $\A$ and a token $\tk$ and outputs an encrypted result $\ct$;
- $\wtoken$ takes as input a secret key $\k$, an index $i$ and an element $e$ and outputs a write token $\wtk$;
- $\write$ takes as input an encrypted array $EA$ and write token $\wtk$ and outputs a new encrypted array $EA'$;
- $\resolve$ takes as input the secret key $\k$ and a result $\ct$ and outputs a plaintext results $a$;

Similarly, if we wanted to encrypt a graph with support for neighbor queries
instead of an array with support for read and write queries then we would need
to design a *graph encryption scheme* scheme $(\setup, \token, \neigh)$.

There are many structured encryption schemes. We know how to design them for arrays with support for read and write queries, for graphs with support for neighbor, adjacency and approximate shortest paths, for dictionaries with support for get and put queries and for multi-maps also with support for get and put queries.

graph, document or key/value database that

First we need to be clear about what we mean by an “encrypted database”. In the real world, there many kinds of databases including: hierarchical, object-oriented, [relational]( and [NoSQL]( databases. Hierarchical databases aren’t really used in practice anymore so we’ll ignore them and mostly focus on relational and NoSQL databases. A relational DB is a collection of tables that are queried using [SQL]( or the lower-level relational algebra. NoSQL databases are usually based on a simpler data representation like [dictionary data structure]( which store key/value pairs and support get and put operations. Other kinds of NoSQL databases include [graph databases]( document databases, etc.

encryption One that is often cited is FHE. One of the great things about using FHE to design encrypted algorithms is that: (1) it is simple, you just design a search circuit and then let FHE do its thing; (2) your encrypted algorithm will not leak anything.

# What is Leakage?

As I mentioned above there are many approaches one could use to design encrypted algorithms. One that is often cited is FHE. One of the great things about using FHE to design encrypted algorithms is that: (1) it is simple, you just design a search circuit and then let FHE do its thing; (2) your encrypted algorithm will not leak anything.

The current generation of FHE schemes are not practical yet for large datasets,
but cryptographers have been making great strides and designing increasingly
efficient schemes. While the progress has been very impressive, it is
important to understand that there are fundamental limits to how fast FHE-based
computing can be. More precisely, because FHE schemes evaluate circuits, their
runtime is $\Omega(n)$, where $n$ is the size of the input. For many
applications, however, linear time is too slow—especially in the setting of
“Big Data”. In fact, even for basic algorithmic problems like search, linear
time is completely infeasible on modern datasets. I want to stress here that I
am not saying that FHE cannot be used at all in the design of encrypted
algorithms; I am only saying that *purely* FHE-based solutions will be
inefficient for Big Data. It could very well be, however, that FHE is used as
a lower-level building block and combined with other primitives (in fact, we
use homomorphic encryption in this way in this
paper for example). I am also not
saying that FHE can’t be used for smaller datasets or to solve approximation
problems (e.g., find all the documents that contain keyword $w$ with a false
positive rate of $p$ and a false negative rate of $n$).

Because of this, the area of encrypted algorithms is mostly focused on finding
*sub-linear* solutions and in this regime it turns out that all the solutions
we are aware of leak *some* information. This includes solutions based on ORAM,
STE, PPE, garbled circuits etc. The leakages are varied and in different
adversarial models. They depend on what underlying primitive is used and, more
importantly, on how exactly that primitive is used in the design of the
higher-level encrypted algorithm. For this reason, it is important to be clear
about what leakage profile a solution achieves. It is also important to
understand that the presence of leakage doesn’t necessarily mean we can exploit
it.

Since sub-linear solutions leak information, it is natural to wonder whether this leakage can be exploited to recover information about the data and/or queries.

There are basically two ways we can