I work in natural language processing and machine learning. I explore how we can build equitable language technologies through cheap, small, and specialized language models that can be deployed at the edge, i.e., nearest to the communities who need these technologies the most.

I believe that good data is the foundation for building these models, especially when working with low-resource languages where data is scarce and quality matters more. I’m excited about techniques that involve creative (or in Tagalog, ma-diskarte) ways to extract high-quality signals given these extreme constraints. That said, I’m always open to learning new approaches!

Below is a selection of work that reflects my current interests. My work has been published in top NLP conferences such as ACL, NAACL, and EMNLP. I’m always excited about potential internships or research visits, so just reach out if you think I’d be a good match!

Selected Publications

Keywords: data-centric NLP, multilinguality, resources & evaluation

I also care a lot about advancing Filipino NLP and representing my native language. This involves:

I write a lot about Filipino NLP on this blog and organize researchers on collaborative projects through the FilBench collective.