Massive data no match for computer science’s Sangmi Pallickara

Big data can be immensely powerful, but only if it can be efficiently used. Otherwise its sheer volume – whether from satellites, genomes or Twitter – becomes a bottleneck in the scientific process. So when scientists are faced with the sorts of massive datasets now available, knowing how to start analyzing them can be daunting, if not downright impossible.

That’s where Sangmi Pallickara and her research come in. A recent winner of a National Science Foundation CAREER Award and an assistant professor of computer science in Colorado State University’s College of Natural Sciences, Pallickara is working on numerous programs to help make big data usable.

Show me the data

“When you have a peta-scale dataset, you have no idea what data you have,” she says from her sunny office on the fourth floor of the Computer Science Building on campus. (For scale, a petabyte is 1,000 terabytes. A whole lot of data.) And to make matters more challenging, with such a large quantity, “if you want to run an analysis task, you have to identify portions of the dataset that you want to run it on.”

So how do you even find a starting place? Pallickara has several solutions. One of the projects she leads, GeoLens, creates interactive visual interfaces for these huge datasets, making them easier to slice and dice into analysis-ready pieces.

Agile models

Having an approachable interface for the data is key, but users must then be able to analyze it. This is where powerful models are needed – to turn tremendous amounts of observational data into larger, comprehensible pictures and predictions. Pallickara’s CAREER Award will focus on building these sorts of large models. She will be working specifically with spatiotemporal data – data that have both location and time properties, such as greenhouse gas measurements or traffic jam information.

The research supported by this award will allow for the rapid construction of ad hoc models that can be quickly assessed, refined and deployed. The end goal is to build models that can predict what will happen, and when.

Where the rubber meets the road

Pallickara, who came to CSU by way of Florida, New York and Korea, has a background in distributed systems and distributed data. That background immersed her early on in interdisciplinary research, which has bolstered the impact of her work. Her research has been supported by the Department of Homeland Security, the National Science Foundation, the Environmental Defense Fund, HP, Amazon and Google.

She partnered with Google and the Environmental Defense Fund and, working with Joe von Fischer, an associate professor of biology in the College of Natural Sciences, developed a program to map methane leaks, a major contributor to climate change, in multiple dimensions and over time. The Google Street View mapping program has since been collecting and displaying these data in several large cities around the country.

Another application for Pallickara’s work is turning “smart city” data into useful, real-world lessons. For example, if a city gathers air quality, weather and traffic data, Pallickara’s programs could help reduce air pollution in real time by identifying small, targeted changes rather than large infrastructure overhauls. Or if a city monitors local social media posts and traffic around a major public event, a usable interface and streamlined analysis of the data “can really give tremendous power to planners,” she says.

With the right models, programs and interface, she says, “you can really take the data to the next level, which is knowledge discovery.”

Pallickara’s CAREER Award will provide $491,243 over five years to support her work on new models for data. CAREER awards are the NSF’s most prestigious awards in support of early-career faculty.