I architect systems that make analyzing large data sets easier and more efficient. My current focus is simplifying the design of hardware accelerators for specific tasks, including data-parallel pipelines and image processing. In the past, I've worked on reducing the cost of using CNNs for computer vision tasks and on using distributed computing to quality-control and model large financial data sets. I'm supported by an NSF Graduate Research Fellowship and a Stanford Graduate Fellowship in Science and Engineering.
Data analytics pipelines can achieve higher throughput and better energy efficiency when run on well-designed, specialized hardware such as FPGAs and CGRAs rather than on general-purpose CPUs and GPUs. However, the tools for targeting CPUs and GPUs, like pandas and PyTorch, make writing performant code much easier than the tools for FPGAs, which demand low-level hardware knowledge to schedule parallel algorithms efficiently. Aetherling is a library of operations, including map and reduce, that lets users express data analytics pipelines as compositions of data-parallel operations and then optimally schedule those pipelines in hardware. The resulting pipelines should perform comparably to expert-designed specialized hardware.
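The composition style described above can be sketched in a few lines of Python. This is a hypothetical illustration of the map/reduce pipeline idea, not Aetherling's actual API (Aetherling targets hardware, and its interface differs); the `Pipeline` class and its methods are invented for this example.

```python
from functools import reduce

class Pipeline:
    """Hypothetical sketch: compose data-parallel stages, then run them.

    Illustrates the idea of building a pipeline as a composition of map
    and reduce operations; it is NOT Aetherling's real interface.
    """
    def __init__(self):
        self.stages = []

    def map(self, fn):
        # Apply fn independently to every element -- a data-parallel stage.
        self.stages.append(lambda xs: [fn(x) for x in xs])
        return self

    def reduce(self, fn, init):
        # Combine all elements into one value with an associative operator.
        self.stages.append(lambda xs: [reduce(fn, xs, init)])
        return self

    def run(self, xs):
        for stage in self.stages:
            xs = stage(xs)
        return xs

# Example: brighten pixel values, then sum them -- the kind of small
# kernel that could be scheduled onto hardware at varying parallelism.
pipeline = Pipeline().map(lambda px: px * 2).reduce(lambda a, b: a + b, 0)
print(pipeline.run([1, 2, 3, 4]))  # [20]
```

Because each `map` stage is elementwise and each `reduce` uses an associative operator, a scheduler is free to trade area for throughput, e.g. processing one element per cycle or many in parallel, without changing the result.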
Spark Summit East 2016 - February 2016
TopNotch is a framework for quality-controlling big data through data quality metrics that scale up to large data sets, across schemas, and throughout large teams. TopNotch's SQL-based interface enables users across the technical spectrum to quality-control data sets in their own areas of expertise and to understand data sets from other areas. I was the project lead and main developer for TopNotch while I worked at BlackRock.
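The core rule-plus-threshold idea behind such data quality metrics can be sketched as follows. This is a minimal standalone Python illustration; the `check_rule` helper and the sample data are invented for this example, whereas TopNotch itself expresses rules in SQL and runs them on Spark.

```python
# Hypothetical sketch of a threshold-based data quality check: a rule is a
# row-level predicate, and the data set passes if enough rows satisfy it.
def check_rule(rows, predicate, min_pass_rate):
    """Return (pass_rate, passed) for a predicate over a data set."""
    passing = sum(1 for row in rows if predicate(row))
    rate = passing / len(rows) if rows else 1.0
    return rate, rate >= min_pass_rate

# Toy financial records; the second row has an invalid negative price.
trades = [
    {"ticker": "AAPL", "price": 187.2},
    {"ticker": "MSFT", "price": -1.0},
    {"ticker": "GOOG", "price": 141.8},
]

# Rule: at least 90% of rows must have a positive price.
rate, passed = check_rule(trades, lambda r: r["price"] > 0, 0.9)
print(rate, passed)  # about 0.67, False
```

Reporting a pass rate rather than a single boolean lets reviewers distinguish a data set with one bad record from one that is mostly broken, which matters when checks run across many schemas and teams.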
Spark-NYC Meetup - September 2015
This presentation addresses the gap between the current and the desired big data user experience. In it, I demonstrate a web application with a scatterplot-matrix visualization that lets non-technical users use Spark to analyze large data sets.