Network Production Engineering, Infrastructure (DC)
Description
The Network Infrastructure team is responsible for designing, building, and operating one of the largest networks in the world. Networking is at the core of all Meta products and experiences, and we are looking for Network Production Engineers who are interested in solving complex technical challenges in the Backbone, Datacenter, and AI Network domains. The scale of the network and its continuous expansion present an opportunity to work on and solve interesting engineering challenges in the datacenter network domain.
We constantly push the boundaries of what is possible. We create new and innovative ways of designing and operating our global datacenter networks, and we do it efficiently at scale. We imagine what our tomorrow is going to be and make it a reality.
Network Production Engineers at Meta are hybrid software and network engineers who design, build, and operate our worldwide network. This team owns the complete lifecycle of the network, including planning, design, product definition, QA, deployment, and monitoring. Simple, elegant, and scalable network design, automation, and data analytics are the keys to meeting our demands.
In this role, you will be part of a team that is responsible for conceiving design solutions and for developing and deploying the network software, systems, and tools that keep the network operating at maximum reliability, scalability, and efficiency. This role offers an opportunity to solve challenges ranging from the scale of supporting billions of people using our family of apps to the cutting-edge AI workloads that power new Meta products.
Responsibilities
Establish and implement global best practices and design new scalable network solutions
Conceptualize, build, and maintain automation and tools to support New Product Introductions, network deployment, release engineering, and operations
Work closely with our hardware, software, and sourcing teams to develop new networking solutions and influence the future of networking and its associated infrastructure
Conduct thorough investigations into complex technical issues across networks, ranging from automated tooling to hardware failures and network issues
Be on-call to learn from real-world production challenges and apply the lessons to improve current and future generation products
Help increase operational efficiency between peers and cross-functional teams by identifying roadblocks, designing and delivering automation solutions, and driving change
Develop operational process improvements and implement them in scalable, automated workflows to enhance operational efficiency
Proactively find gaps that impact multiple teams, develop an execution plan, drive the project, and influence other teams to close those gaps
Qualifications
Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
6+ years of experience with planning, designing, building and/or operating scalable systems/networks
4+ years of coding experience in at least one programming language (e.g. Python, Go, C++, or Java), and the ability to rapidly learn new development languages
Demonstrated knowledge of TCP, IPv4/6, Routing Protocols (one or more of BGP, MPLS, ISIS, or similar), and related network services (e.g. DHCP and DNS)
Experience developing and understanding network device configuration in multi-vendor environments (Juniper, Cisco, Arista, Brocade, etc.)
Hardware evaluation and vendor management
Knowledge of IB/RDMA/RoCE Networks, including RDMA congestion control mechanisms, AI training workloads, and the demands they place on networks
Experience in network automation software leveraging software defined networking principles
Experience in building software solutions for managing network infrastructure, with a focus on scalability and reliability
Experience in operation of modern data center architectures (spine-leaf, fabric)
Experience operating and designing Software Defined Networking-based backbone networks
Experience in large-scale systems design and operation
Compensation: $154,000/year to $217,000/year + bonus + equity + benefits