Push the frontier of video and document understanding. Design benchmarks, run experiments, and ship models that work on messy, real-world government data—not curated academic datasets.
What you'll do
- Design and implement multimodal architectures for video and document understanding
- Build rigorous evaluation suites for unstructured data at scale
- Collaborate with applied engineers to bridge research and production
- Publish findings and contribute to the team's technical direction
- Mentor junior researchers and engineers on experimental design
What we're looking for
- Strong track record in ML research—publications, open-source, or shipped systems
- Deep experience with vision-language models, video understanding, or document AI
- Proficiency in PyTorch and modern training frameworks
- Ability to design experiments that produce actionable insights
- Comfort working with ambiguous, real-world data
Nice to have
- Experience with government or regulated-domain data
- Background in OCR, layout analysis, or temporal video reasoning
- PhD in ML, CV, or NLP (not required—we care about what you've built)