I’m interested in machine learning for text, scientific data systems, and visualization-driven exploration. This page collects the themes that show up across my work.

Current directions

  • Natural language processing for extracting structured signals from unstructured text.
  • Automatic speech recognition and speech-adjacent modeling questions.
  • Bilingualism, language variation, and how language data can be modeled computationally.
  • Time-series, EEG, and biomedical datasets with a focus on reusable data standards.

Recent research-adjacent work

  • Curating and validating EEG datasets with BIDS (Brain Imaging Data Structure) and HED (Hierarchical Event Descriptors) for cross-study reuse.
  • Building NLP pipelines that extract regulatory risk signals from board meeting minutes.
  • Designing visualization systems for long-running, exploratory analysis of climate and text data.