I’m an Assistant Professor in the Manning College of Information and Computer Sciences at the University of Massachusetts, Amherst, and a visiting researcher at the Cornell Lab of Ornithology. My research lies at the intersection of computer vision and machine learning, with an emphasis on crafting real-world machine learning systems that integrate human expertise, state-of-the-art machine learning methodologies, and large-scale datasets. Merlin Sound ID is my latest contribution in this space, following the success of Seek, the iNaturalist computer vision system, and Merlin Photo ID. I completed my PhD at Caltech in 2019, advised by Pietro Perona. My thesis work focused on efficient dataset collection through human-in-the-loop systems and on fine-grained visual categorization. I completed my BS and MS at UCSD, where I was advised by Serge Belongie. Most of my research falls under the broad agenda of Visipedia.
Use My Research
The following apps are all free and accessible on both iPhone and Android. I completed the R&D for the machine learning components that power these apps. In the case of Merlin Sound ID, I also did the engineering work for deploying the model efficiently on iOS and Android.
Turn on your phone’s microphone and recognize bird vocalizations in real time. This feature is part of the Merlin Bird ID app.
Turn on your phone’s camera, point it at wildlife, and get real-time classification results.
Submit observations of wildlife and get identification assistance from a computer vision system as well as a global community of wildlife enthusiasts.
Identify birds in photographs. This feature is part of the Merlin Bird ID app.
I am available for industry consulting. However, my current schedule may prevent me from handling all consulting requests, so I apologize in advance if I do not respond. My expertise covers all aspects of a machine learning system: data collection, data annotation, metric specification, model research and development, evaluation, and deployment. My prior project experience covers image, audio, video, and geospatial modalities.