Evals & Continuous Learning Engineer

Pydantic

Posted on Jun 25, 2026

Shipping reliable AI applications means closing the loop: capture what happened in production, turn it into evaluation data, measure quality, and feed improvements back into the system.

We already have a real foundation in production: evals and experiments built into Logfire, our observability platform, plus our open-source pydantic-evals library. We're looking for someone who has worked on evaluation or LLM-observability platforms to own this end to end and to push it toward genuine continuous learning, where systems measurably improve from their own production data.

This might be you if you've worked on a product like Braintrust, Langfuse, LangSmith, Arize/Phoenix, or Humanloop, or if you've built serious internal eval tooling.