Representing data as networks is an essential way to learn about our digital and physical worlds – by doing so, we encode tons of different phenomena using these specialized graphs and draw insights from the connections within. Such networks have aided the study of exchanges between sensors or autonomous systems, communications between machines in computer networks, movement patterns in transportation engineering, communications and interactions between humans, and more.
But as our data gets bigger, so do our networks. Networks of datasets in a number of fields can amass millions or billions of entities, and drawing useful information from them, or mining them, is a tough computational problem. Particularly difficult is when the data in these networks is distributed across many machines, as is always the case when examining data drawn from the internet.
Prof. Danai Koutra has earned an Army Young Investigator Award to tackle this problem and speed up graph methods for these distributed applications. The goal of her project, called “A New Perspective for Fast Distributed Computations Over Network,” is to speed up in particular those problems that involve making a lot of queries about the network very quickly.
A multi-query, distributed setting can describe many scenarios. For instance, in a network of roads being monitored by several sensors (the nodes of the network), updated data from each of the sensors needs to be registered constantly to see changes in traffic in order to offer better route planning. These updates are queries to the network to change each node’s value.
The project will make improvements to existing graph techniques that are based on linear systems, and redesign them to deal with data distributed across different machines. In the project’s four main objectives, Koutra’s research team will address how to solve these systems using data from different machines, how to determine whether storing data with some overlap on multiple machines can speed up network operations, how to handle “streaming” networks where changes and additions are being made constantly, and how to tailor data storage on a network to work best in these distributed settings. Ultimately, Koutra says they can speed up these operations by minimizing communications across machines and building a framework specifically with distributed systems in mind.
These solutions can be useful for applications like Google’s PageRank, which the company developed to order its search results. The graph algorithm treats websites as nodes and the connections they have to other websites, via links, as edges. It’s now the most commonly used graph algorithm, used in most applications as a building block to accomplish something else – like understanding how important different nodes are or anomaly detection.
The system would also benefit semi-supervised machine learning, which can be expressed over a network in the same format as PageRank, and a variety of national security applications. For example, given a few malicious nodes in a large-scale cyber network, graph-based techniques solved in a distributed environment can be used to efficiently infer other malicious nodes and potential attacks. Along the same lines, given a small set of trustworthy agents in a multi-agent network, it would be possible to infer the reputation of other agents and inform coordination plans.
Koutra’s group has published one SIAM International Conference on Data Mining (SDM) paper related to this work, “Fast Flow-based Random Walk with Restart in a Multi-query Setting,” which tackles the project’s first two objectives with a case study of the graph method Random Walk with Restart (RWR). RWR is related to personalized PageRank, and a basis for more complex tasks.
Koutra received the 2016 ACM SIGKDD Dissertation Award for her thesis on “Exploring and Making Sense of Large Graphs”, an honorable mention for the SCS Doctoral Dissertation Award (CMU), an Adobe Data Science Research Faculty Award (2018), an NSF CAREER award (2019), an Amazon Research Faculty Award (2019), a WSDM 2019 Outstanding PC Award (2019), and has several best paper awards and nominations.