Research

General Direction

My research focuses on tools and programming methodologies for highly parallel processors, from a heavily application-driven perspective.

The capabilities and limitations of modern semiconductor manufacturing are forcing a shift to parallel computing, which promises very high computational performance at the cost of more complicated software development. Successful parallel programming remains a difficult undertaking that requires understanding everything from algorithms to hardware architectures. This complexity threatens to inhibit the widespread application of highly parallel processors.

My research aims to ease this problem through a series of frameworks, each consisting of library elements together with code generators that compose them. I'm currently constructing an application framework targeted at computer vision applications. This framework consists of high-level components, such as clusterers and classifiers, as well as code generators that build optimized versions of these components and compose them into an application.

So far, we have implemented several of these components in CUDA, achieving speedups of 5x to 300x over serial CPU implementations. Some of these components have been manually composed into an image contour detector which provides extremely high-quality image contours at semi-interactive rates. More specifically, on a GeForce GTX280, we have reduced the runtime of the globalPb algorithm (Maire et al., CVPR 2008), which is currently the most accurate image contour detector known, from 4 minutes to 1.8 seconds on a 0.15 MP image. The reduced runtime significantly enlarges the scope of practical applications for this algorithm (ICCV 2009 paper). We have also implemented high-performance Support Vector Machine routines for both training and classification (ICML 2008 paper). While building these components, we explore both efficient implementations of existing algorithms and algorithmic changes that make the computation better suited to parallel platforms. Going forward, my research will generalize these components into a framework that can be used to build many computer vision applications, providing both high developer productivity and high end-user performance.
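
As a quick check on the contour detector figures quoted above, the following Python sketch works out the implied speedup and throughput. The function name `speedup` and the variable names are illustrative assumptions and not part of the work described here; only the 4 minute, 1.8 second, and 0.15 MP figures come from the text.

```python
def speedup(baseline_seconds: float, optimized_seconds: float) -> float:
    """Return how many times faster the optimized run is than the baseline."""
    if baseline_seconds <= 0 or optimized_seconds <= 0:
        raise ValueError("runtimes must be positive")
    return baseline_seconds / optimized_seconds


# Figures quoted in the text (names here are hypothetical).
baseline_s = 4 * 60       # original globalPb runtime: 4 minutes
gpu_s = 1.8               # runtime on the GeForce GTX280
image_megapixels = 0.15   # size of the test image

contour_speedup = speedup(baseline_s, gpu_s)
throughput_mp_per_s = image_megapixels / gpu_s

print(f"speedup: {contour_speedup:.0f}x")
print(f"throughput: {throughput_mp_per_s:.3f} MP/s")
```

The result, roughly 133x, falls inside the 5x to 300x range reported for the individual components.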

Acknowledgements

My research has been generously supported by an NSF Graduate Research Fellowship, the Gigascale Research Center, the ParLab at Berkeley, an NVIDIA Fellowship, and a Qualcomm Innovation Fellowship.