SiteOps Global Production Systems & Software Engineering Manager
Description
Meta is seeking a Production Systems & Software Engineering Manager to join our Data Center Site Operations (SiteOps) team. This role leads the Systems & Software Engineering team, which drives the integration, performance, and alignment of tooling, automation, break/fix triage, and related workflows critical to Site Operations. As a leader at the forefront of the global data center industry, you will thrive in environments where adaptability and flexibility are key.

You will manage Tooling and Systems engineers, leveraging your technical expertise and attention to detail to support and develop individual contributors. By building trust and credibility across cross-functional teams, you will empower Fleet and Strategic Tooling teams to deliver signals-based support for tooling, automation, software, and mass production workflows that enable Global Operations.

Collaboration is central to this role. You will partner with Production Engineering (PE), Core Systems, Enterprise Engineering (EE), and other stakeholders to deliver actionable signals and influence their roadmaps, ensuring SiteOps Global Operations are effectively supported. We are looking for a leader who can rapidly understand complex technical challenges faced by subject-matter experts and local site teams, align globally distributed teams and partner organizations around shared goals, and set clear priorities and direction, securing buy-in and commitment from all relevant stakeholders.
Responsibilities
Develop and collaboratively own the roadmap for all tooling, automation, processes, and workflows for compute, storage, and accelerator delivery from Infra into mass production (MP) deployments. Serve as the central point of contact representing these functions across SiteOps.
Develop and collaboratively own the processes and workflows required to support Global Operations in maintaining a high SLA for our compute, storage, and accelerator platforms.
Build relationships and collaboration with engineering and cross-functional teams across the company. Actively solicit feedback from teams, and use that feedback to improve operational effectiveness as infrastructure scales.
Lead the team to identify and root-cause systemic issues in the fleet and drive resolution. Deliver maximum server fleet uptime and utilization rates by leveraging data to understand hardware failure conditions and root causes.
Provide people management, mentorship, coaching, and career development to build an environment fostering commitment to impact.
Support leadership meetings and facilitate alignment on key issues and opportunities. Provide timely alerts and data that enable cross-functional teams to develop requisite corrective actions and forward-looking implementations.
Collaborate with stakeholders, functional owners, and subject-matter experts to interpret and articulate business and operations needs.
Travel up to 30% is required.
Qualifications
BS or BA in a technical field or commensurate experience
10+ years of experience managing teams in software design, workflows, and validation, working with cross-functional teams to deliver products to production
Experience working across a global organization and building partnerships with cross-functional teams inside and outside of the organization
Demonstrated success in developing and executing a strategic roadmap that supports organizational scaling
Experience in processing and analyzing large sets of data
Demonstrated knowledge of server and storage platforms, principles, technologies, protocols, and standards
Experience managing multiple concurrent projects and tight timelines
Large-scale data center environment experience, including tooling and automation deployments
Experience in data center system and workflow development and deployment
Leadership presence and presentation skills
Compensation: $173,000/year to $245,000/year + bonus + equity + benefits