Small Summaries for Big Data
Start date: May 1, 2015
End date: Apr 30, 2020
A fundamental challenge in processing the massive quantities of information generated by modern applications is extracting suitable representations of the data that can be stored, manipulated and interrogated on a single machine. A promising approach is the design and analysis of compact summaries: data structures which capture key features of the data, and which can be created effectively over distributed data sets. Popular summary structures include the Bloom filter, which compactly represents a set of items, and sketches, which allow vector norms and products to be estimated. These are very attractive, since they can be computed in parallel and combined to yield a single, compact summary of the data. Yet the potential of summaries is far from being fully realized. The Principal Investigator will lead a team working on important problems around creating Small Summaries for Big Data. The goal is to substantially advance the state of the art in data summarization, to the point where accurate and effective summaries are available for a wide array of problems, and can be used seamlessly in applications that process big data. Several directions will be pursued, including: designing and evaluating new summaries for fundamental computations such as tracking the data distribution; summary techniques for complex structures, such as massive matrices, massive graphs, and beyond; and summaries that allow the verification of outsourced computation over big data. Success in any one of these areas could lead to substantial impact on practice, as evidenced by the influence of existing summary techniques. Support in the form of a five-year research grant will allow the PI to consolidate his research in this area, and build an expert team to focus on these challenging algorithmic questions.
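To illustrate the claim that summaries can be computed in parallel and then combined, here is a minimal Python sketch of a mergeable Bloom filter. It is not taken from the project itself: the class name `BloomFilter`, the parameters `m` (number of bits) and `k` (number of hash functions), and the use of salted SHA-256 as the hash family are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: m bits probed by k salted hashes (assumed design)."""

    def __init__(self, m=1024, k=4):
        self.m = m
        self.k = k
        self.bits = 0  # a Python int serves as the bit array

    def _positions(self, item):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # Can return false positives, but never false negatives.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def merge(self, other):
        # Filters built with the same m and k combine by bitwise OR.
        if (self.m, self.k) != (other.m, other.k):
            raise ValueError("filters must share m and k to be merged")
        merged = BloomFilter(self.m, self.k)
        merged.bits = self.bits | other.bits
        return merged


# Two hypothetical machines each summarize their own shard of the data.
shard_a = BloomFilter()
shard_b = BloomFilter()
for word in ["alpha", "beta"]:
    shard_a.add(word)
for word in ["gamma", "delta"]:
    shard_b.add(word)

# The merged summary answers membership queries for the union of both shards.
combined = shard_a.merge(shard_b)
```

Because each filter only sets bits, the merged filter is identical to one built over all the data on a single machine, which is the property that makes such summaries attractive for distributed data sets.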