CSE PhD candidate Xinchen Yan has been selected for a Rackham Predoctoral Fellowship and a Google PhD Fellowship to support his research in machine learning and its application in computer vision, graphics and robotics.
In his thesis, “Learning controllable and structured representation with deep neural networks,” Xinchen investigates the conditional generation problem: synthesizing structured sensory data (e.g., images, video sequences, and 3D object shapes) from a given conditioning variable.
Being able to perceive, reason, and plan at a level comparable or even superior to human performance is one of the ultimate goals of machine intelligence. The perception stage often requires a bottom-up process, in which an intelligent system recognizes the state of the physical world from sensory inputs (e.g., images, video sequences, and 3D shapes). Building on this, reasoning and planning usually require some form of top-down process, in which the system generates a plan and receives feedback through an analysis-by-synthesis loop or through interaction. In recent years, great advances have been made in bottom-up understanding of the physical world with deep neural networks. At the same time, top-down understanding, or generative modeling, has gained increasing attention with the emergence of variational auto-encoders (VAEs) and generative adversarial networks (GANs).
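To make the generative-modeling idea concrete, here is a minimal sketch of the reparameterization trick at the heart of VAEs. The function name, array shapes, and toy values below are illustrative assumptions, not taken from Xinchen's work.

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    """Sample z ~ N(mu, exp(logvar)) in a differentiable way.

    Instead of sampling z directly (which would block gradients), we sample
    noise eps ~ N(0, I) and compute z = mu + sigma * eps, so gradients can
    flow through mu and logvar back into the encoder network.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

# Toy encoder outputs: a batch of 4 inputs with a 2-dimensional latent code.
rng = np.random.default_rng(0)
mu = np.zeros((4, 2))
logvar = np.zeros((4, 2))  # log variance 0 means sigma = 1
z = reparameterize(mu, logvar, rng)
print(z.shape)  # (4, 2)
```

Because the sampled noise is separated from the learned parameters, the same trick underlies conditional variants, where the conditioning variable simply becomes an extra input to the networks producing `mu` and `logvar`.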
Xinchen’s past and ongoing research focuses on three aspects of the conditional generation problem, which involves both bottom-up and top-down processes: first, how to model the top-down generation process from a given conditioning variable with a minimal level of human supervision; second, how to improve the generation process with in-network controllable and structured representations for better generalization to unseen data; and third, how to apply the learned models in applications such as semantic manipulation, analogy-making, and planning.
Generative image modeling is of fundamental interest in machine learning and computer vision. In his ECCV ‘16 work “Attribute2Image: Conditional Image Generation with Visual Attributes,” Xinchen studied the problem of generating images with semantically controllable constraints such as visual attributes.
In his NIPS ‘16 work, “Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision,” Xinchen introduced Perspective Transformer Nets (PTN), a model built on a differentiable projection operation that enables unsupervised 3D shape learning from 2D perceptual inputs. The results further demonstrated that the unsupervised approach performs as well as supervised methods, but generalizes far better when generating novel object shapes from unseen image categories.
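The role of differentiability in such a projection can be illustrated with a simple pinhole-camera projection of 3D points. This is a simplified stand-in for PTN's voxel-grid projection, not its actual operation, and the focal length and points below are made up.

```python
import numpy as np

def project_points(points, f=1.0, cx=0.0, cy=0.0):
    """Pinhole perspective projection of Nx3 camera-frame points to Nx2 pixels.

    Every step (division by depth, scaling by focal length f, shifting by the
    principal point cx, cy) is smooth in the inputs; this smoothness is what
    allows a 2D reprojection loss to be backpropagated to the 3D shape.
    """
    X, Y, Z = points[:, 0], points[:, 1], points[:, 2]
    u = f * X / Z + cx
    v = f * Y / Z + cy
    return np.stack([u, v], axis=1)

pts = np.array([[0.0, 0.0, 2.0],   # on the optical axis -> projects to center
                [1.0, 1.0, 2.0]])  # off-axis point
print(project_points(pts, f=2.0))
# [[0. 0.]
#  [1. 1.]]
```

In a learning setup, an operation of this kind sits between a predicted 3D representation and the observed 2D views, so that the only supervision needed is the 2D images themselves.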
Building upon this research, Xinchen extended and applied PTN to the task of 6-DOF robotic grasping using a deep geometry-aware representation. In his ICRA ‘18 work, “Learning 6-DOF Grasping Interaction with Deep Geometry-aware 3D Representations,” Xinchen proposed a deep model that can hallucinate a global 3D shape and a local view of an object’s geometric surface from a single RGB-D image. The results further illustrated the benefit of this geometry-aware representation for grasping success classification and grasping point planning.
Xinchen Yan is advised by Prof. Honglak Lee.
About the Rackham Predoctoral Fellowship
The Rackham Predoctoral Fellowship supports outstanding doctoral students who have achieved candidacy and are actively working on dissertation research and writing. The fellowship seeks to support students whose dissertations are unusually creative, ambitious, and risk-taking.
About the Google PhD Fellowship
The Google PhD Fellowship program supports PhD students in computer science and closely related fields, and reflects Google’s commitment to building strong relationships with the global academic community. Recipients are recognized for their creativity, knowledge, and skills, and represent some of the most outstanding graduate researchers in computer science across the globe.