The Michigan Engineer News Center

Building better coronavirus databases with automatic quality checks

The team will build high-quality datasets to enable automatic quality checking and fraud detection for new coronavirus data.



Amid a growing coronavirus crisis, experts in all fields have begun compiling massive datasets to track the impact of the contagion. These datasets capture everything from society-wide virus response information to medical needs data, available medical resources across the country, and buyer interest in medical equipment that could drive financing for new production.

IMAGE: Artistic rendering of the SARS-CoV-2 virus. Freepik.

To make these datasets as accurate and timely as possible, Prof. Michael Cafarella is leading an NSF-funded project that will build high-quality auxiliary datasets to enable automatic quality checking and fraud detection for the new data. These safeguards are essential to ensuring that coronavirus decision-making is driven by clean, accurate data.

Rapid analytical efforts by policymakers, scientists, and journalists rely on coronavirus data being complete and accurate. But like all dataset construction projects, those chronicling the coronavirus are prone to shortcomings that limit their effectiveness if left unaddressed. These issues include messy or unusable data, fraudulent data, and data that lacks necessary context.

Automatically checking coronavirus datasets against the related auxiliary datasets provided by Cafarella’s team can make them more effective and insightful. For example, an auxiliary database about hospitals might contain each hospital’s staff count, so a hospital resource allocator can test whether the resources requested for coronavirus treatment are consistent with the level of staffing.
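As a rough illustration of the hospital example above, the following Python sketch flags resource requests that look out of line with staffing. It is not the project's actual method: the record layouts, the field names (`hospital_id`, `staff_count`, `ventilators_requested`), and the `max_per_staff` threshold are all hypothetical and chosen only to show the idea of checking incoming data against an auxiliary database.

```python
from dataclasses import dataclass


@dataclass
class HospitalRecord:
    """One row of a hypothetical auxiliary hospital database."""
    hospital_id: str
    staff_count: int


@dataclass
class ResourceRequest:
    """One row of a hypothetical incoming coronavirus resource dataset."""
    hospital_id: str
    ventilators_requested: int


def flag_inconsistent_requests(requests, auxiliary, max_per_staff=0.5):
    """Return (hospital_id, reason) pairs for requests that fail a check.

    max_per_staff is an assumed, illustrative threshold: the largest
    number of ventilators per staff member treated as plausible.
    """
    # Index the auxiliary data by hospital ID for fast lookup.
    by_id = {rec.hospital_id: rec for rec in auxiliary}
    flagged = []
    for req in requests:
        rec = by_id.get(req.hospital_id)
        if rec is None:
            # The requester does not appear in the auxiliary database.
            flagged.append((req.hospital_id, "unknown institution"))
        elif req.ventilators_requested > max_per_staff * rec.staff_count:
            # The request is large relative to the known staffing level.
            flagged.append((req.hospital_id, "request exceeds staffing-based limit"))
    return flagged


auxiliary = [HospitalRecord("H1", 100), HospitalRecord("H2", 20)]
requests = [
    ResourceRequest("H1", 30),
    ResourceRequest("H2", 40),
    ResourceRequest("H3", 5),
]
for hospital_id, reason in flag_inconsistent_requests(requests, auxiliary):
    print(hospital_id, reason)
```

A flagged request is not necessarily wrong or fraudulent. In a sketch like this, a flag only marks a record for human review.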

Cafarella’s proposed datasets would be easy to combine with the fast-moving coronavirus data construction projects.

The team will build two large auxiliary databases. The “unified medical institution auxiliary database” will be a database of all known United States medical institutions, and will include rich background information for quality-checking, as well as an easy method for data integration. The “unified government office auxiliary database” will be a database of all known government offices in the United States, such as city halls, courts, or licensing offices, at any level of government. The team will release both databases regularly, and the first release will be approximately one month after the project begins.


Contact

Steve Crang
CSE Marketing and Communications Manager

Michigan Engineering

(734) 763-9996

3832 Beyster Bldg
