Software Engineering Manager, Diagnostics, Cloud Platforms
The CSCA Diagnostics team is a cornerstone of Google’s infrastructure, responsible for developing the hardware-focused tests and diagnostics that ensure the reliability and health of our global fleet. We design and implement end-to-end software solutions to validate machine state during manufacturing, data center turnup, and repair operations.
In this role, your core mission is to lead the engineering efforts that safeguard Google's hardware integrity while supporting the rapid growth of machine deployments. You will be responsible for managing critical projects involving Intel and AMD CPU servers, as well as high-performance GPU server projects. Your central tasks involve defining the technical direction for diagnostic infrastructure that scales with the fleet, managing high-performing engineering teams, and ensuring that AI/ML platforms are diagnosed and repaired with high precision at unprecedented scale.
The AI and Infrastructure team is redefining what’s possible. We empower Google customers with breakthrough capabilities and insights by delivering AI and Infrastructure at unparalleled scale, efficiency, reliability and velocity. Our customers include Googlers, Google Cloud customers, and billions of Google users worldwide.
We're the driving force behind Google's groundbreaking innovations, empowering the development of our cutting-edge AI models, delivering unparalleled computing power to global services, and providing the essential platforms that enable developers to build the future. From software to hardware, our teams are shaping the future of world-leading hyperscale computing, with key teams working on the development of our TPUs, Vertex AI for Google Cloud, Google Global Networking, Data Center operations, systems research, and much more.
Responsibilities
- Set and communicate team priorities that support the broader organization's goals. Align strategy, processes, and decision-making across teams.
- Set clear expectations with individuals based on their level and role and aligned to the broader organization's goals. Meet regularly with individuals to discuss performance and development and provide feedback and coaching.
- Develop the mid-term technical goal and roadmap within the scope of your (often multiple) team(s). Evolve the roadmap to meet anticipated future requirements and infrastructure needs.
- Design, guide, and vet system designs within the scope of the broader area, and write product or system development code to solve ambiguous problems.
- Review code developed by other engineers and provide feedback to ensure best practices (e.g., style guidelines, checking code in, accuracy, testability, and efficiency).
Minimum qualifications:
- Bachelor's degree in Engineering, Computer Science, or equivalent practical experience.
- 8 years of experience in software development, programming in C, C++, or Python.
- 3 years of experience with developing large-scale infrastructure, distributed systems, or networks, or with compute technologies, storage, or hardware architecture.
- 3 years of experience in a technical leadership role.
- 2 years of experience in a people management or team leadership role.
Preferred qualifications:
- Master’s degree or PhD in Engineering, Computer Science, or a related technical field.
- 8 years of experience with data structures/algorithms.
- 3 years of experience in a technical leadership role leading project teams and setting technical direction.
- 3 years of experience working in a complex, matrixed organization involving cross-functional or cross-business projects.