Member of Technical Staff (Frontier AI) | $100-$130/hr Remote
Overview
This is a hands-on technical ownership role at the intersection of research, data, and real-world AI systems, at a company that provides frontier evaluations and reinforcement learning environments to improve LLM capabilities. The MTS will own research and evaluation initiatives end-to-end, design ML-oriented data systems, analyze model failures, and translate ambiguous behaviors into structured evaluation frameworks. They will work closely with researchers and domain experts to ensure experimental work produces clean, defensible research signal that translates into meaningful improvements in deployed systems.
What You'll Do
- Own research and evaluation initiatives end-to-end: problem framing, data design, quality calibration, and signal validation.
- Design ML-oriented data systems, including task definitions, annotation schemas, rubrics, incentives, and pipelines optimized for downstream model performance.
- Analyze model and system failures to identify root causes, edge cases, and opportunities for improvement.
- Translate ambiguous, real-world behavior into structured evaluation frameworks and new data categories.
- Act as a quality gate: block claims, pause work, or force scope changes when signal strength or data integrity is insufficient.
- Partner with cross-functional and client-facing teams to translate research progress into clear, credible narratives grounded in evidence.
Requirements
- Strong judgment around research signal quality and when work is ready to be externalized.
- Experience designing ML-oriented datasets, evaluation frameworks, and QA processes.
- Ability to translate messy, real-world system behavior into structured research and evaluation opportunities.
- Comfort operating in ambiguity with a bias toward ownership and decisive action.
- Clear written and verbal communication for explaining tradeoffs and limitations to technical and non-technical stakeholders.
Who Should Apply
The ideal candidate has a systems-level mindset with proven experience in ML data design, evaluation frameworks, and failure analysis. They thrive in ambiguous environments, take ownership, and can communicate clearly across technical and non-technical audiences. Prior work with reinforcement learning, agentic systems, or applied research environments is strongly preferred.
Salary Insight
The national pay range for this full-time position is a base salary of $280,000 – $350,000 USD, plus equity compensation and performance-based bonuses. The listing title also mentions $100-$130/hr, but the description gives only the full-time salary range.
Application Tip
Highlight specific examples where you designed evaluation frameworks or data pipelines that directly improved model performance, and demonstrate your ability to translate ambiguous system behavior into structured research questions.