Senior Software Engineer, ML Networking
Our team supports the development and implementation of the Google Cloud GPU roadmap, including the integration of shared capabilities to enhance ML fungibility. We have a robust pipeline of programs scheduled for General Availability in 2026, including A4X+ and A5.
We're not just maintaining the status quo. We're innovators delivering novel capabilities such as GPU RDMA, ML networking for virtual machines (VMs) and bare metal, monitoring, and packet telemetry. Our team is growing rapidly, and with you on board, it will grow even stronger. As we deliver these key projects, you will have a unique opportunity to shape the future of Google's network. Your ideas and expertise will directly influence how we build and evolve the infrastructure that powers everything at Google.
Responsibilities
- Understand the capabilities provided by the ConnectX (CX) series of Network Interface Cards (NICs).
- Design features that seamlessly integrate GPU-to-GPU communication capabilities into the unique Google Cloud Infrastructure.
- Bring your designs to life by meticulously coding and implementing the features that enable efficient and reliable GPU-to-GPU communication on the new VM families.
- Deliver virtual ML networking infrastructure enabling ML workloads to run in Google Cloud Platform (GCP).
- Enable fast GPU RDMA networking for VMs and bare metal by exposing the NICs.
Minimum qualifications:
- Bachelor's degree or equivalent practical experience.
- 5 years of experience with one or more general-purpose programming languages, including but not limited to Java, C/C++, Python, or Go.
Preferred qualifications:
- Experience with software architecture, software engineering, networking protocols, network virtualization, or networking.