The Michigan Engineer News Center

Finding meaning in varied data

Jie Song devised a method to combine summarized datasets that group information by incompatible units.

CSE grad student Jie Song earned the runner-up Best Paper Award at the 2018 Extending Database Technology conference for her paper “GeoAlign: Interpolating Aggregates over Unaligned Partitions.” Song, working with H.V. Jagadish, the Bernard A. Galler Collegiate Professor of Electrical Engineering and Computer Science, and Prof. Danai Koutra, devised a method to combine summarized datasets that group information by incompatible units.

Large organizations that gather lots of data, from companies to government agencies, typically format that data to meet their own needs, so its structure varies from group to group. On top of that, they typically make this data public only as geographic or time-period summaries to protect the privacy of the individuals it describes.

Taken together, much of the data from these varied sources could provide new insights into important social problems. Until now, however, it has been challenging to neatly reformat and combine datasets that are structured very differently or aggregated with different units (ZIP code vs. county, or even geography vs. time).

IMAGE: An example of two different units that can represent summarized data, counties and ZIP codes, and the process needed to convert one to the other.

To solve this, the researchers devised GeoAlign, an algorithm that converts an aggregate to the desired target units. The solution adapts to new attributes without needing knowledge of the spatial properties of the original and target units, and it can easily be extended to realign aggregate data in multi-dimensional space for general use. Experiments on real, public government datasets show that GeoAlign achieves equal or better accuracy than the leading approach without sacrificing scalability or robustness. It also makes its predictions in a reasonably short time, with a runtime that scales linearly with the number of units in the source and target datasets.
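To make the idea of converting an aggregate between units concrete, here is a minimal Python sketch of a simple, generic approach: splitting each source unit's total across target units in proportion to a reference quantity. This is not GeoAlign itself, whose details are in the paper. The function name `realign`, the unit labels, and all numbers are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def realign(source_totals, overlap_weights):
    """Redistribute source-unit aggregates onto target units.

    source_totals:   {source_unit: aggregate value}
    overlap_weights: {(source_unit, target_unit): reference quantity
                      (for example, population) in the overlap}
    Assumption: the attribute is spread like the reference quantity.
    """
    # Total reference quantity per source unit, used to normalize shares.
    ref_per_source = defaultdict(float)
    for (src, _tgt), weight in overlap_weights.items():
        ref_per_source[src] += weight

    target_totals = defaultdict(float)
    for (src, tgt), weight in overlap_weights.items():
        if ref_per_source[src] == 0:
            continue  # nothing to split for this source unit
        share = weight / ref_per_source[src]
        target_totals[tgt] += source_totals.get(src, 0.0) * share
    return dict(target_totals)


# Hypothetical data: counts reported by ZIP code, wanted by county.
zip_cases = {"Z1": 120.0, "Z2": 80.0}
overlap_population = {
    ("Z1", "CountyA"): 3000,
    ("Z1", "CountyB"): 1000,
    ("Z2", "CountyB"): 2000,
}
county_cases = realign(zip_cases, overlap_population)
print(county_cases)  # {'CountyA': 90.0, 'CountyB': 110.0}
```

In this toy case, three quarters of Z1's reference population falls in CountyA, so 90 of its 120 cases go there and the remaining 30 join Z2's 80 in CountyB. The overall total of 200 is preserved, which is the basic property any such realignment should keep.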



Steve Crang
CSE Marketing and Communications Manager

Michigan Engineering

(734) 763-9996

3832 Beyster Bldg
