Principal SaaS Capacity Engineer

Oracle
Published 27 days ago

Required Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, Cloud/Systems Engineering, or a related field.
  • 5+ years of experience in cloud infrastructure, SaaS operations, or capacity engineering roles.
  • Hands-on experience with large-scale distributed systems, OCI (or AWS, Azure, GCP), and SaaS production environments.
  • Strong programming and scripting experience (Python, Go, Shell, SQL) for automation and AI/ML model deployment.
  • Proven experience deploying AI/ML solutions for capacity forecasting, anomaly detection, and intelligent workload tuning.
  • Deep understanding of cloud capacity topology and distributed service dependencies.
  • Proficiency with infrastructure-as-code and orchestration tooling (Terraform, Ansible, Helm, Kubernetes).
  • Familiarity with AIOps tools and AI-driven observability platforms (Datadog, Dynatrace, Splunk, or similar).
  • Knowledge of resiliency and disaster recovery strategies, including AI-simulated failure modeling.

Preferred Qualifications

  • Advanced degree (Master’s/PhD) with specialization in AI, ML, Data Science, or distributed systems engineering.
  • Experience building and deploying self-healing, AI-driven automation at scale in a SaaS environment.
  • Domain expertise in reinforcement learning applications for automated resource optimization.
  • Direct exposure to Oracle Cloud Infrastructure (OCI) systems and tools.
  • Experience with cloud-native AI/ML services, MLOps, and continuous model monitoring.

Competencies and Skills

  • Expertise in designing, developing, and deploying AI/ML models for cloud infrastructure use cases (demand forecasting, anomaly detection, workload optimization).
  • Advanced proficiency in automation, orchestration, and self-healing system architectures.
  • Skilled in communicating technical concepts, AI-powered analytics, and strategic insights to engineering and executive audiences.
  • Strong analytical and critical thinking skills, with a deep data-driven mindset.
  • Curiosity and initiative to explore APIs, system profiles, and operational anomalies, translating technical findings into impactful business outcomes.
  • Highly collaborative, adaptive, and passionate about operational excellence and continuous learning.
  • Ability to influence cross-team priorities and drive best practices in AI-enhanced capacity engineering.

Qualifications

Career Level - IC4

Key Responsibilities

  • Service Accountability: Ensure SaaS production capacity availability, optimization, scaling automation, reserve management, and quota governance.
  • AI/ML Integration: Apply AI/ML models for predictive capacity forecasting, anomaly detection, and workload auto-tuning to anticipate demand spikes and prevent outages.
  • Observability & AIOps: Leverage AI-powered observability and AIOps platforms for end-to-end system monitoring, intelligent alerting, and automated incident mitigation.
  • Strategic Partnership: Collaborate with Product and Development teams to design, validate, and align AI-driven scaling and capacity planning strategies with new launches and initiatives.
  • Automation & Orchestration: Design, implement, and optimize automation and orchestration pipelines, including self-healing systems, policy-driven provisioning, and disaster recovery simulations, using AI to enhance reliability and operational resilience.
  • Data-Driven Decision Support: Deliver advanced instrumentation, AI-powered analytics, and actionable dashboards to inform executives, engineering teams, and stakeholders.
  • Technical Leadership: Translate complex OCI stack and cloud platform resources (compute, storage, DB, networking) into business-ready, AI-enhanced capacity solutions and performance profiles.
  • Simulation & Resiliency: Use AI/ML models to simulate, validate, and improve resiliency and disaster recovery scenarios for faster, more robust recovery response.
  • Collaboration & Communication: Present AI-driven insights, risks, and recommendations to engineering teams, ICs, and executives to illuminate capacity trends and data-driven priorities.
  • Continuous Innovation: Assess new AI/ML techniques, AIOps platforms, and automation tools for ongoing improvements in infrastructure reliability, scalability, and cost optimization.

As a world leader in cloud solutions, Oracle uses tomorrow’s technology to tackle today’s challenges. We’ve partnered with industry leaders in almost every sector, and we continue to thrive after 40+ years of change by operating with integrity.

We know that true innovation starts when everyone is empowered to contribute. That’s why we’re committed to growing an inclusive workforce that promotes opportunities for all.

Oracle careers open the door to global opportunities where work-life balance flourishes. We offer competitive benefits based on parity and consistency and support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.

We’re committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing accommodation-request_mb@oracle.com or by calling +1 888 404 2494 in the United States.

Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.