LLMOps & MLOps Engineer (AI Platform)

Rohlik

Software Engineering, Data Science
Prague, Czechia
Posted on Jul 10, 2025

At Rohlik Group, we are architecting the future of retail through intelligent automation. Our vision is to create a truly autonomous organization, where AI-driven systems optimize every facet of our business. This ambition extends beyond our own operations; we are commercializing our proprietary technology through Veloq, our AI-driven fulfillment platform, to power the next generation of e-grocery worldwide.

We are seeking a pioneering LLMOps & MLOps Engineer to build the foundational AI platform that will power this transformation. This is not a role for an end-user of ML/AI platforms; this is an opportunity for a true platform architect who is passionate about creating the "paved paths" and scalable infrastructure that empower our AI and ML teams to innovate at speed. You will have the autonomy to design, build, and own the core operational systems—from CI/CD pipelines to model serving infrastructure, MCP servers and vector databases—that ensure our AI services run smoothly, reliably, and efficiently in production.

What You Will Build & Own (Responsibilities)

As an LLMOps & MLOps Engineer, your mission is to build and operate the centralized platform that enables our entire AI organization. You will bridge the gap between data science and operations, providing the tools and infrastructure that make our AI vision a reality.

Architect and Build the Central AI Platform:

  • Design, build, and maintain the scalable infrastructure for our ML and LLM workloads on a major cloud platform (AWS, GCP, or Azure).

  • Create "paved paths" and standardized templates for our AI and ML teams, developing repeatable solutions and reference architectures that enable them to easily train, test, and deploy models with confidence and speed.

  • Develop and manage core platform components, including the infrastructure for feature stores, vector databases, and model registries to support advanced applications like Retrieval-Augmented Generation (RAG); a minimal retrieval sketch follows this list.
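
To illustrate the kind of retrieval path a vector database supports in a RAG service, here is a minimal sketch using an in-memory cosine-similarity index as a stand-in for a managed vector store; the embed() placeholder and the sample documents are hypothetical, not part of our stack.

```python
# Minimal sketch of the retrieval step behind a RAG service.
# The in-memory index stands in for a managed vector database;
# embed() is a placeholder for a real embedding model call.
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    """Toy embedding: a pseudo-random unit vector derived from the text."""
    rng = np.random.default_rng(sum(map(ord, text)))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

class VectorIndex:
    """Toy cosine-similarity index standing in for a vector database."""
    def __init__(self):
        self.docs: list[str] = []
        self.vectors: list[np.ndarray] = []

    def add(self, doc: str) -> None:
        self.docs.append(doc)
        self.vectors.append(embed(doc))

    def search(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        scores = np.array([float(q @ v) for v in self.vectors])
        top = scores.argsort()[::-1][:k]
        return [self.docs[i] for i in top]

index = VectorIndex()
for doc in ["Orders ship from the Prague warehouse.",
            "Substitutions are suggested when an item is out of stock.",
            "Delivery slots can be changed up to two hours in advance."]:
    index.add(doc)

# Retrieved passages would be injected into the LLM prompt as context.
print(index.search("How do substitutions work?", k=2))
```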

Develop and Operationalize the MLOps/LLMOps Lifecycle:

  • Design and operate a scalable API gateway for Large Language Models, providing a unified, secure, and observable interface for all AI services. You will implement governance policies for model access, including token-based rate limiting, caching, and routing to ensure cost-effective and compliant use of LLMs across the organization (a minimal rate-limiting sketch follows this list).

  • Develop and operate Model Context Protocol (MCP) servers to standardize how our AI agents and models securely interact with external tools and data sources, creating a universal, plug-and-play format for seamless integration.

  • Build and own robust, automated CI/CD pipelines for the entire ML lifecycle, covering the versioning and deployment of code, data, prompts, and models.

  • Collaborate with AI engineers to containerize and orchestrate services using technologies like Docker and Kubernetes, ensuring smooth transitions from development to production.
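
As a rough illustration of the governance layer described above, here is a minimal per-client, token-based rate limiter of the kind an LLM gateway might apply; the class, budgets, and API-key names are hypothetical and not tied to any specific gateway product.

```python
# Minimal sketch of per-client, token-based rate limiting in an LLM gateway.
# Budgets are expressed in model tokens; all names and figures are illustrative.
import time
from dataclasses import dataclass, field

@dataclass
class TokenBucket:
    """Refills continuously up to `capacity` model tokens."""
    capacity: int              # max tokens that may be consumed in a burst
    refill_per_sec: float      # sustained tokens-per-second budget
    tokens: float = field(init=False)
    last: float = field(init=False)

    def __post_init__(self):
        self.tokens = self.capacity
        self.last = time.monotonic()

    def allow(self, requested_tokens: int) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if requested_tokens <= self.tokens:
            self.tokens -= requested_tokens
            return True
        return False  # the gateway would return HTTP 429 to the caller

# One bucket per API key: 1,000 tokens/second sustained, 10,000-token burst.
buckets: dict[str, TokenBucket] = {}

def admit(api_key: str, prompt_tokens: int) -> bool:
    bucket = buckets.setdefault(
        api_key, TokenBucket(capacity=10_000, refill_per_sec=1_000.0))
    return bucket.allow(prompt_tokens)

print(admit("team-forecasting", 2_500))   # True: within budget
print(admit("team-forecasting", 9_000))   # False: burst capacity exhausted
```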

Ensure Production Reliability and Efficiency:

  • Implement comprehensive observability and monitoring for the AI platform and its services, tracking performance, cost, and quality metrics using tools like Langfuse and Datadog.

  • Ensure the platform runs smoothly by managing load balancing, auto-scaling, and resource allocation for GPU-intensive workloads.

  • Drive cost-efficiency by optimizing infrastructure, implementing intelligent resource allocation, and providing visibility into the unit economics of our AI services (a small cost-attribution sketch follows this list).
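
To make the unit-economics point concrete, here is a small sketch of per-request cost attribution from token usage; the model names and prices are placeholders, not real provider rates.

```python
# Sketch of per-request cost attribution for LLM calls.
# Model names and prices are illustrative placeholders, not real rates.
from dataclasses import dataclass

# USD per 1,000 tokens (hypothetical figures).
PRICE_PER_1K = {
    "small-model": {"input": 0.0005, "output": 0.0015},
    "large-model": {"input": 0.0100, "output": 0.0300},
}

@dataclass
class LLMCall:
    team: str
    model: str
    input_tokens: int
    output_tokens: int

    @property
    def cost_usd(self) -> float:
        p = PRICE_PER_1K[self.model]
        return (self.input_tokens / 1000 * p["input"]
                + self.output_tokens / 1000 * p["output"])

calls = [
    LLMCall("search", "small-model", 1_200, 300),
    LLMCall("search", "large-model", 800, 500),
    LLMCall("forecasting", "large-model", 2_000, 1_000),
]

# Roll costs up per team to expose the unit economics of each AI service.
by_team: dict[str, float] = {}
for c in calls:
    by_team[c.team] = by_team.get(c.team, 0.0) + c.cost_usd

for team, cost in sorted(by_team.items()):
    print(f"{team}: ${cost:.4f}")
```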

The Profile We're Looking For (Qualifications)

We are looking for a pragmatic and product-focused platform engineer who combines deep operational expertise with a strategic mindset and a passion for enabling others to build impactful AI solutions.

Foundational Experience:

  • A strong background in software engineering, DevOps, or Site Reliability Engineering (SRE) with proven proficiency in Python.

  • Experience building and deploying infrastructure on a major cloud platform (AWS, GCP, or Azure).

  • A solid understanding of AI and ML fundamentals and the model development lifecycle.

Platform & Ops Expertise (Must-Have):

  • Proven, hands-on experience building and managing the infrastructure for production-grade systems using Machine Learning and Large Language Models (LLMs).

  • Deep expertise in MLOps/LLMOps platform development, including building and maintaining CI/CD pipelines and automation using tools like GitHub Actions, GitLab Pipelines, or TeamCity.

  • Expert knowledge of infrastructure-as-code (e.g., Terraform, Ansible) and containerization and orchestration technologies (Docker, Kubernetes).

  • Experience implementing and managing robust monitoring, logging, and observability solutions for production systems.

Generative AI Acumen (Good-to-Have):

  • A strong understanding of the infrastructure requirements for RAG and Agentic systems, and experience building platforms to support them.

  • Familiarity with the operational challenges of agentic frameworks (e.g., LangGraph, LangChain) and experience designing systems to deploy and monitor them reliably.

  • Knowledge of LLM-specific observability tools (e.g., Langfuse) for tracing, evaluation, and prompt management.

  • Experience developing Model Context Protocol (MCP) servers (a minimal server sketch follows this list).
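
For candidates new to MCP, here is a minimal sketch of a tool-exposing server, assuming the official MCP Python SDK and its FastMCP helper; the inventory tool and its data are hypothetical.

```python
# Minimal MCP server sketch exposing one tool, assuming the official
# MCP Python SDK (`pip install mcp`) and its FastMCP helper.
# The tool and its data are hypothetical placeholders.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("inventory-demo")

@mcp.tool()
def check_stock(sku: str) -> str:
    """Return a stock status for a SKU (placeholder lookup)."""
    fake_inventory = {"MILK-1L": 42, "BREAD-RYE": 0}
    count = fake_inventory.get(sku)
    if count is None:
        return f"Unknown SKU: {sku}"
    return f"{sku}: {count} units in stock"

if __name__ == "__main__":
    # Serve over stdio so an MCP-capable agent or client can attach.
    mcp.run()
```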

Why You'll Thrive Here

  • Build the Engine, Not Just the Car: You won't just use the tools; you will build the entire factory. You will have the ownership to create the foundational platform that powers all AI innovation at Rohlik.

  • High-Leverage Impact: Your work will directly enable dozens of engineers and data scientists, amplifying their productivity and accelerating the delivery of our most strategic AI initiatives, including the commercial Veloq platform.

  • Modern Tech Stack: You will work with the latest and most powerful tools in the MLOps and cloud-native ecosystem, with the freedom to choose and implement the best technologies for building a world-class AI platform.

  • Solve Foundational Problems: You will tackle the core challenges of reliability, scalability, and efficiency for cutting-edge AI workloads, setting the standard for how we operate AI in a high-stakes, real-time retail environment.

About Rohlik

Rohlik is the leading Central European e-grocer, making customers happy in the Czech Republic, Hungary, Austria, Germany, and Romania.

Are we a good fit?

Our goal is to make other people's lives better. Such a mission is difficult, and life at Rohlik is difficult – it is not for everyone.

Is it the right one for you?

Are you in?