Workshop on Encryption for Secure Search and Other Algorithms

I just got back from the Workshop on Encryption for Secure Search and Other Algorithms (ESSA), which was held in Bertinoro, Italy, and was organized by Sasha Boldyreva and Bogdan Warinschi. It was a great event and I'd like to thank the organizers for putting this together and doing such a great job. It was really nice to see all the excitement and enthusiasm behind this topic, both from the research community and from industry.

Since a few people have already asked me for details about the event, I figured I would just write brief summaries of the talks. I think the slides will be posted soon, so if you are interested you should be able to find more details on the ESSA workshop page.

The first talk was by Christopher Bosch who gave a survey of encrypted search. The talk was based on a paper Christopher published last year. This is a really extensive and thorough survey and a great contribution to the field. The authors go over a large number of papers and try to organize and categorize them, drawing conclusions and research directions from the broad perspective they gained from writing it. It is a great reference for anyone interested in this field.

Kaoru Kurosawa gave a talk on two of his papers. In the first paper, the authors describe a universally composable (UC) variant of adaptive semantic security (i.e., CKA2-security) for SSE. The main difference from the standard definition is that the UC variant requires correctness and that the simulation is strengthened by requiring it to be black-box (i.e., there exists a single simulator that works for all adversaries). Kaoru then described a construction that achieves this strong notion of security. In the second part of the talk Kaoru discussed a more recent paper of his that describes an SSE scheme that handles very expressive queries without revealing the expression itself (i.e., it hides not just the keywords in the query but also the form/structure of the query). This is accomplished using a new variant of garbled circuits which is very interesting in its own right.
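
To make the quantifier flip concrete, here is roughly how I would write the two notions (my own paraphrase, not the formal definitions from the paper):

```latex
% Standard (CKA2-style) simulation-based security: for every adversary there
% is a (possibly different) simulator.
\[ \forall \mathcal{A} \;\; \exists \mathcal{S} : \quad
   \mathbf{Real}_{\mathcal{A}}(k) \;\approx_c\; \mathbf{Ideal}_{\mathcal{A},\mathcal{S}}(k) \]
% UC-style (black-box) simulation: a single simulator works for all adversaries.
\[ \exists \mathcal{S} \;\; \forall \mathcal{A} : \quad
   \mathbf{Real}_{\mathcal{A}}(k) \;\approx_c\; \mathbf{Ideal}_{\mathcal{A},\mathcal{S}}(k) \]
```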

Emily Shen talked about her work on substring matching on encrypted data. This is done using an encrypted suffix tree, i.e., using an (interactive) structured encryption scheme for suffix trees. In this work, however, she was concerned with a stronger model of security where the server can be malicious. This last constraint required her to strengthen the standard definition of security for structured encryption. The construction was very nice but a bit too involved to describe here so I recommend reading the paper for more details.
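
For readers who haven't seen the underlying data structure, here is a toy plaintext (and completely non-private) sketch of substring search via a suffix index; the actual construction encrypts a suffix-tree-like structure and adds interaction, so this is only meant to illustrate what is being protected:

```python
import bisect

def build_suffix_array(text):
    """Naive suffix index: sorted list of (suffix, start_position) pairs."""
    return sorted((text[i:], i) for i in range(len(text)))

def substring_positions(suffix_array, pattern):
    """Find all occurrences of `pattern`: they are exactly the suffixes
    that start with `pattern`, located via binary search."""
    lo = bisect.bisect_left(suffix_array, (pattern,))
    positions = []
    for suffix, start in suffix_array[lo:]:
        if not suffix.startswith(pattern):
            break
        positions.append(start)
    return sorted(positions)

sa = build_suffix_array("banana")
print(substring_positions(sa, "ana"))  # [1, 3]
```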

Nathan Chenette gave a nice overview of the state of the art of fuzzy search on encrypted data in the property-preserving model. After describing the different approaches, he gave stronger definitions for this primitive. Unfortunately, achieving this stronger notion seems to require an inherent ciphertext expansion, so he also described a weaker notion that allows for more space-efficient constructions.

Kevin Lewi talked about his work on order-revealing encryption (ORE). ORE is similar to order-preserving encryption (OPE) in that it allows for comparisons over ciphertexts. But there is one important distinction: unlike OPE, ORE does not require the comparison operation over ciphertexts to be "less than". In other words, to compare two OPE ciphertexts one can simply execute a "greater/lesser than" operation, whereas with ORE one might have to execute some other arbitrary operation. This is an important relaxation and allows the authors to overcome impossibility results for OPE, which say that OPE schemes have to leak more than just the order. The construction Kevin presented is based on obfuscation techniques but does not require the full power of obfuscation. In particular, it avoids the use of Barrington's theorem (though it makes use of multi-linear maps) which, as Kevin said, makes the scheme at least "implementable" but not practical.
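
A toy way to see the interface difference (with made-up placeholder "encryption" and no security whatsoever) is something like:

```python
# Toy illustration of the OPE vs. ORE interfaces. The "encryption" below has
# no security at all; it only shows how the comparison step differs.

def ope_encrypt(key, m):
    # OPE: ciphertexts must be ordered exactly like plaintexts,
    # so comparing two ciphertexts is just numeric "<".
    return key + 2 * m  # placeholder for any strictly increasing map

def ope_compare(c1, c2):
    return c1 < c2

def ore_encrypt(key, m):
    # ORE: ciphertexts need not be numerically ordered; here we just
    # reverse the digits as a stand-in for an opaque encoding.
    return str(m).zfill(8)[::-1]

def ore_compare(c1, c2):
    # Comparison is some public procedure on ciphertexts, not plain "<".
    return c1[::-1] < c2[::-1]

print(ope_compare(ope_encrypt(7, 5), ope_encrypt(7, 12)))   # True, since 5 < 12
print(ore_encrypt(7, 5) < ore_encrypt(7, 12))               # False: plain "<" fails
print(ore_compare(ore_encrypt(7, 5), ore_encrypt(7, 12)))   # True, since 5 < 12
```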

Murat Kantarcioglu described a few of his works including a paper that initiated research on concrete inference attacks; that is, attacks on encrypted search solutions that use statistical and optimization techniques to exploit leakage. He described the attack he and his co-authors used to try to recover the queries a user makes by exploiting the access pattern leakage of SSE. This attack is known as the IKK attack and is currently the best inference attack we have against access pattern leakage. The second part of the talk covered ways to mitigate these kinds of attacks and Murat described a clever way of using differential privacy for this.
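
Very roughly, the attack starts from the fact that access-pattern leakage reveals how often pairs of queries touch the same documents. A sketch of just that counting step might look like the following; the actual IKK attack then matches these statistics against background knowledge of keyword co-occurrence (using simulated annealing), which is not shown here:

```python
from itertools import combinations

# Each observed query leaks the set of (encrypted) document identifiers it matched.
observed_access_patterns = {
    "q1": {"doc3", "doc7", "doc9"},
    "q2": {"doc7", "doc9"},
    "q3": {"doc1", "doc3"},
}

def cooccurrence_matrix(patterns):
    """Jaccard similarity of the document sets matched by each pair of queries."""
    matrix = {}
    for (qa, docs_a), (qb, docs_b) in combinations(patterns.items(), 2):
        matrix[(qa, qb)] = len(docs_a & docs_b) / max(len(docs_a | docs_b), 1)
    return matrix

# An attacker with background knowledge of keyword co-occurrence in the corpus
# can try to match this observed matrix against the plaintext statistics.
print(cooccurrence_matrix(observed_access_patterns))
```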

David Cash also discussed inference attacks. In addition to standard inference attacks, however, he also described attacks where the adversary exploits more than leakage and, in particular, knows or chooses some of the documents. The findings were very interesting. One thing that came out of this study was that the IKK attack, while very interesting in theory, is not really practical. There are several technical reasons for this but I'll leave you to read David's paper when it appears if you are interested. This study also looked at a new class of schemes (not SSE/structured schemes) that have appeared in the literature recently and showed that they were vulnerable to adversaries who know and/or can choose documents (though to be fair they were not designed with that adversarial model in mind).

Unfortunately, Hugo Krawczyk couldn't make it to the workshop at the last minute, so Stas Jarecki gave his talk. This was a nice overview of the work on SSE done by the IBM/Rutgers/UC Irvine team (from now on referred to as IRI) for the IARPA SPAR project. It covered a series of papers including their paper from CRYPTO '13 that shows how to achieve conjunctive queries in sub-linear time. The talk then continued with more recent papers that focused on schemes with good I/O complexity and even more expressive queries. The talk had a nice blend of theory and systems, in particular illustrating how systems constraints like I/O complexity can sometimes force you to find new and interesting solutions.

Vlad Kolesnikov talked about the system that Columbia and Bell Labs designed for the IARPA competition. This system, called Blind Seer, even had a cool logo which we learned was designed by Vlad himself! At a high level, this system makes use of garbled circuits and Bloom filters and is designed to work in a 3/4-party model that includes a data owner, a policy server and an index server. Vlad described several bottlenecks they encountered and all the clever optimizations they had to design to make the system perform well. There was some discussion about how Blind Seer compared to the IRI system. In the end, it seemed that the two were incomparable and achieved different tradeoffs between leakage and efficiency.
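
As a reminder of one of the building blocks, here is a minimal (plaintext, non-private) Bloom filter; my understanding is that Blind Seer organizes such filters into a search tree and evaluates them under garbled circuits, which this toy obviously does not capture:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash functions over an m-bit array."""

    def __init__(self, m=1024, k=4):
        self.m, self.k = m, k
        self.bits = [0] * m

    def _positions(self, item):
        # Derive k positions from k "different" hashes of the item.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def maybe_contains(self, item):
        # False positives are possible; false negatives are not.
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("privacy")
print(bf.maybe_contains("privacy"), bf.maybe_contains("leakage"))  # True False (w.h.p.)
```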

Adam O'Neill presented his recent work on modular OPE (MOPE). MOPE is a variant of OPE where a random shift and modular operation is applied to the plaintext before an OPE encryption is done. It turns out this can improve the security of OPE, but not when OPE is used to do range queries. Adam described a few techniques to address this that didn't seem to affect the efficiency of the schemes. He also showed experimental results to back this up.
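
In symbols, the idea is roughly the following (my notation, with a generic OPE scheme as a black box):

```latex
% Rough sketch of modular OPE: a secret random shift j is applied modulo the
% domain size N before encrypting with an ordinary OPE scheme.
\[ \mathsf{MOPE.Enc}_{K,j}(m) \;=\; \mathsf{OPE.Enc}_{K}\bigl((m + j) \bmod N\bigr) \]
```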

Radu Sion talked about the new cloud security startup he's working on. He couldn't say much about the technical aspects of what they are doing but he went over some of the services they are providing and showed demos, some of which included searching on encrypted data. Since this was a "sensitive" talk and Radu himself had to be careful not to reveal too much, I'll stop here rather than risk revealing things he may not want made public on a larger scale.

Paul Grubbs gave a talk that went over what he's been working on at SkyHigh Networks. He talked about ongoing projects SkyHigh was doing with OPE, deterministic encryption and format-preserving encryption. In addition he discussed future projects the company was planning on doing with SSE. This talk was nice in that it provided a different perspective on crypto than what you typically get in academic settings. In particular, Paul described how the solutions they considered and worked on had to fit various business and legal/regulatory constraints. This is something I've been exposed to at MSR, and I definitely think that seeing how technology gets (or doesn't get) deployed in the real world is very useful for developing and sharpening your intuition about which research areas are more or less promising in terms of impact.

Mayank Varia gave a great talk on the testing framework Lincoln Labs built to evaluate the encrypted search systems for the IARPA competition. I have to say this was one of my favorite talks. The scale of what they built was truly impressive. The system is composed of various frameworks. One part of the system is just for generating realistic data and queries, and they do this using machine learning techniques on real data. The query generation is very flexible, however, and you can use it to generate data and queries with specific characteristics for your tests. The second component is a measurement framework. The third component is an automated system for generating graphs and visualizations of the experimental results in LaTeX! Overall what they built sounded very impressive and I think that we should try to adopt it as a standard way of testing/evaluating encrypted search solutions. I think the encrypted search community is lucky to have such a framework so we should take advantage of it. Mayank said that they are working on getting the code up on GitHub, so I'll update this post as soon as it's up.

David Wu talked about a new protocol for privacy-preserving location services. Suppose you want to find out how to get from point A to point B but don't want to disclose your location to the server that stores the maps, and the server doesn't want to reveal its own data. Without privacy, one can solve this problem by representing the map as a graph and computing a shortest path, so the question David was interested in was whether one can design a practical two-party protocol for shortest paths. David showed how to do this by first proposing a very nice way to compress the representation of the graph in a way that doesn't affect the shortest paths and then computing the shortest paths on the new representation via oblivious transfer. David then presented benchmarks of their protocol for the city of Los Angeles.
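
To give a flavor of the underlying (non-private) computation, here is a sketch of shortest-path routing via a precomputed next-hop table; in the actual protocol, as I understand it, the routing information is compressed and each lookup is performed obliviously (e.g., via oblivious transfer), which this plaintext toy ignores:

```python
# Precompute, for every pair (u, v), the next hop on a shortest path from u to v,
# then answer "how do I get from A to B" by repeatedly following next hops.

def all_pairs_next_hop(nodes, edges):
    """Floyd-Warshall that also records the next hop on each shortest path."""
    INF = float("inf")
    dist = {u: {v: 0 if u == v else INF for v in nodes} for u in nodes}
    nxt = {u: {v: None for v in nodes} for u in nodes}
    for u, v, w in edges:
        dist[u][v] = w
        nxt[u][v] = v
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return nxt

def route(nxt, source, target):
    path, node = [source], source
    while node != target:
        node = nxt[node][target]  # in the protocol: one oblivious table lookup
        path.append(node)
    return path

nodes = ["A", "B", "C", "D"]
edges = [("A", "B", 1), ("B", "C", 1), ("A", "C", 5), ("C", "D", 1)]
print(route(all_pairs_next_hop(nodes, edges), "A", "D"))  # ['A', 'B', 'C', 'D']
```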

Florian Bourse presented new constructions of functional encryption schemes for inner products. Unlike previous general-purpose FE schemes, the goal of this work was to provide simple and efficient constructions. Florian discussed two constructions, one based on DDH and another based on LWE. Note that the functionality considered by Florian is slightly different from the "inner-product encryption" of Katz, Sahai and Waters and of Shen, Shi and Waters. In the latter works, the decryption returns one bit of information: whether the inner product is equal to 0 or not. Here, the decryption returns the actual inner product.
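
In my notation, the functional difference is roughly:

```latex
% Predicate-style "inner-product encryption" (the Katz-Sahai-Waters /
% Shen-Shi-Waters functionality): decryption only reveals whether the
% inner product is zero.
\[ \mathsf{Dec}\bigl(\mathsf{sk}_{\vec{y}},\ \mathsf{Enc}(\vec{x})\bigr) \;=\;
   \begin{cases} 1 & \text{if } \langle \vec{x}, \vec{y} \rangle = 0\\
                 0 & \text{otherwise} \end{cases} \]
% Inner-product functional encryption (the schemes Florian presented):
% decryption reveals the inner product itself.
\[ \mathsf{Dec}\bigl(\mathsf{sk}_{\vec{y}},\ \mathsf{Enc}(\vec{x})\bigr) \;=\;
   \langle \vec{x}, \vec{y} \rangle \]
```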

Tarik Moataz talked about ORAM with constant bandwidth. What is meant in the literature by constant-bandwidth ORAM is a bit technical but, roughly speaking, one can think of it as the requirement that the metadata exchanged with the server is smaller than the data blocks. Previous work on constant-bandwidth ORAM had two limitations. The first is that they achieved only amortized constant bandwidth. The second is that they only work with very large blocks and as such only make sense for limited kinds of applications (using standard parameters, they would have 4MB blocks). Tarik showed how to get around these two limitations, giving a worst-case constant-bandwidth ORAM with a much smaller block size. In addition, the scheme also improves the computational cost at the server.
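
A rough way to read the bandwidth claim (my paraphrase, not the paper's exact statement):

```latex
% For block size B and N blocks, "constant bandwidth" roughly means the
% per-access communication is
\[ O(B) \ \text{bits per access}
   \qquad \text{vs.} \qquad O(B \log N) \ \text{bits for tree-based ORAMs,} \]
% i.e., the bandwidth blowup is O(1) rather than O(\log N).
```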

Stas Jarecki talked about RAM-based MPC (i.e., MPC protocols that work in the RAM model as opposed to over circuits). The standard way to do this is to use two-party computation (2PC) to securely compute the client algorithm of an ORAM scheme. Roughly speaking, this requires the ORAM client algorithm to be MPC-friendly so that the resulting solution is efficient. While most schemes consider only the two-party setting, Stas argued that it is interesting to look at three parties as well since better efficiency could be achieved in that setting. In fact, Stas described a protocol for this setting which was a lot more efficient than protocols for the two-party setting.

Leo Reyzin gave a survey of entropy notions in cryptography. Leo went over Shannon entropy, min-entropy and average conditional min-entropy, in each case giving a very nice and intuitive explanation of why and when these notions should be applied. He also discussed computational variants of entropy, including HILL entropy, and what is known and not known about them. Entropy notions in crypto are a bit subtle and can be hard to work with, and unfortunately there isn't much material to learn from, so Leo's survey was extremely useful.
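
For reference, the three information-theoretic notions are usually defined as follows:

```latex
% Shannon entropy of a random variable X:
\[ H(X) \;=\; -\sum_{x} \Pr[X = x]\,\log_2 \Pr[X = x] \]
% Min-entropy, governed by the single most likely outcome:
\[ H_{\infty}(X) \;=\; -\log_2 \max_{x} \Pr[X = x] \]
% Average conditional min-entropy of X given side information Z:
\[ \widetilde{H}_{\infty}(X \mid Z) \;=\;
   -\log_2 \, \mathbb{E}_{z \leftarrow Z}\Bigl[\max_{x}\Pr[X = x \mid Z = z]\Bigr] \]
```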