Engineering Leader – AI & Machine Learning Operations (AIOps)
Fully Remote United States - Remote Product
Description

Job Title: Engineering Leader – AI & Machine Learning Operations (AIOps)
Job Type: Full-time
Location: Remote - United States (PST)


About CloudBees

CloudBees enables enterprises to deliver scalable, compliant, and secure software, empowering developers to do their best work. 


Seamlessly integrating into any hybrid and heterogeneous environment, CloudBees is more than a tool—it's a strategic partner in your cloud transformation journey, ensuring security, compliance, and operational efficiency while enhancing the developer experience across your entire software development lifecycle. It allows developers to bring and execute their code anywhere, providing greater flexibility and freedom through fast, self-serve, and secure workflows.


CloudBees supports organizations at every step of their DevSecOps journey, whether using Jenkins on-premises or transitioning software delivery to the cloud. We’re helping customers build the future, today.


About the Role

CloudBees is seeking a visionary and hands-on Engineering Leader to drive our Agentic & AI Operations (AIOps) strategy. You will lead the development of the CloudBees AI platform, which is designed to support the fine-tuning, deployment, and management of AI, ML, and Agentic Services, and guide the strategic direction of the engineering team, with a primary focus on platform reliability, scalability, and maintainability. You will build and oversee robust systems that empower our customers to optimize and personalize AI and Agents to their specific needs, and you will lead a growing team of engineers focused on building reliable, scalable AI & ML infrastructure and pipelines that power intelligent features across our platform.


We’re looking for someone with startup experience, a passion for AI & ML tooling, and a deep understanding of operationalizing AI/ML workflows. You'll partner closely with data scientists, product managers, and platform engineers to transform AI ideas into production-grade, secure, and efficient systems.


As the founding Engineering Leader of the AI Foundations team, you will hold a high-impact role that sits at the intersection of artificial intelligence and software delivery. It is ideal for someone passionate about pushing the boundaries of developer productivity and intelligent automation.


Key Responsibilities

  • Lead and scale a team responsible for AIOps, including model deployment, monitoring, and lifecycle management.
  • Architect and implement AI/ML pipelines that are scalable, observable, and reproducible.
  • Collaborate with cross-functional teams (data science, DevOps, product) to integrate AI/ML systems into our SaaS platform.
  • Establish best practices for AI/ML experimentation, CI/CD for models, data versioning, and model governance.
  • Own the full stack of AIOps infrastructure, from data ingestion to real-time inference systems.
  • Drive technical vision and roadmap for ML platform development.
  • Act as a mentor and coach, helping engineers grow in a fast-paced startup environment.
  • Manage a team of 5+ engineers.
  • Launch new platforms from zero to one (0 to 1) and drive adoption internally and externally with partner teams.

Qualifications

Required:

  • 7+ years of engineering experience, including platform engineering, system development, or related roles, with at least 3 years in leadership roles.
  • 3 years of experience with large-scale systems, with a focus on reliability, scalability, and maintainability, and 1 year of experience with AI/ML systems.
  • Strong hands-on experience with MLOps tools (e.g., MLflow, Kubeflow, SageMaker, Airflow, Metaflow).
  • Proven track record building ML pipelines in production environments.
  • Experience with cloud infrastructure (AWS, GCP, or Azure) and container orchestration (Kubernetes).
  • Deep knowledge of CI/CD practices as they relate to the ML lifecycle.
  • Prior experience in a startup or fast-paced SaaS environment.
  • Strong collaboration and communication skills.
  • Experience deploying and managing LLM services such as Amazon Bedrock or Vertex AI.

Preferred (not required):

  • Experience integrating ML capabilities into developer-centric tools or platforms.
  • Familiarity with data observability and ML monitoring tools (e.g., EvidentlyAI, Prometheus/Grafana for models).
  • Knowledge of data privacy, compliance, and security in ML systems.

Compensation & Benefits

  • Base Pay Range: $210,000 – $250,000 annually. In accordance with applicable law, this represents a reasonable estimated compensation range for this role. Actual compensation will be determined based on skills, experience, and geographic location and may be more or less than the amount shown above. Outside of base compensation, CloudBees also offers stock options and variable bonuses.
  • What CloudBees Offers:
    • Health Insurance
    • Dental Insurance
    • Vision Insurance
    • Short- & Long-Term Disability
    • Life Insurance
    • HSA/FSA
    • Remote Work Environment
    • Flexible Time Off 
    • Paid Company Holidays
    • Parental Leave
    • Variable Bonus Plan dependent on your role
    • Stock grant opportunities dependent on your role
    • 401(k) with Company Match

EEO Statement

CloudBees is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. All employment decisions are made based on qualifications, merit, and business need, without regard to race, color, religion, sex, sexual orientation, gender identity, age, national origin, disability, veteran status, or any other protected characteristic as outlined by federal, state, or local laws.


Disclaimer

This job description is intended to describe the general nature and level of work being performed. It is not an exhaustive list of all duties, responsibilities, and qualifications required of employees assigned to this position. Duties, responsibilities, and activities may change at any time with or without notice.