Manager, Network Production Engineering

Meta
Published 23 days ago; first seen 8 days ago

Description

Meta's data center network is at the heart of connecting servers and GPUs. Demand for data center network capacity is growing rapidly, requiring the development of new data center network products. As the speed of capacity deployment increases, so does the demand for that capacity to be reliable. Network interruptions inside Meta's data center network have an outsized negative impact on Meta's goals. In this role, you'll lead and support the team responsible for the reliability and operations of Meta's production data center network. Working with your team, you'll drive reliability initiatives to ensure a stable data center network for both our AI initiatives and all of our products. You'll focus on streamlining operations through automation and software development while preparing your team to support new network topologies and platforms. This role will place you in the critical path for enabling Meta's business objectives and give you the opportunity to work at unprecedented network scale.

Responsibilities

Support and lead engineers working on Meta's production data center network, focusing on challenges related to the reliability, scalability, and efficiency of operations and of the data center network
Understand and contribute to technical architectures, tooling needs, automation plans, and network platform launch plans, and create comprehensive plans for prioritizing technical and resourcing challenges
Actively drive and participate in the handling of incidents on Meta's data center network
Work with your team to develop automation software to streamline network operations
Develop lasting partnerships with organizational leaders across Meta's network and infrastructure teams
Empower engineers to develop their careers, matching their strengths with projects tailored to their skill levels, long-term skill development, personalities, and work styles
Help build and enrich a collaborative work environment composed of people with a broad range of experiences, perspectives, approaches, and backgrounds
Assess employee performance on an ongoing basis, address under-performance, and recognize and promote strong performance
Work closely with dedicated recruiting staff to expand the team, including interviewing candidates, participating in conferences/events, and on-boarding new employees
Balance the need to “keep things running” with allocating time to long-term, high-impact projects

Qualifications

4+ years of direct management experience in a technology role
BS or MS in Computer Science, Engineering, or a related technical discipline, or equivalent experience
Experience supporting network devices in a production setting
Experience with building teams and/or organizations, including hiring and managing performance
Communication and cross-functional collaboration experience
Experience developing software to support network operations
Intimate knowledge of common data center network topologies and routing (BGP)
Familiarity with common network switch hardware architecture