Production Systems Engineer
Description
The RTP team is responsible for the end-to-end Hardware Lifecycle of all Meta servers, from exploration and development to production health. We work closely with various teams to ensure the smooth operation of systems across the planet, troubleshooting complex issues at various scales, from microscopic to fleet-wide, and driving projects to successful business outcomes.
Responsibilities
Build and develop tooling solutions to automate business-critical processes in service of managing the health of the Meta production fleet
Troubleshoot, diagnose and root-cause system failures, working with key partners to identify and deliver solutions
Proactively identify opportunities to fix or enhance tooling, hardware and processes
Build subject matter expertise in one or more of the specialist areas covered by the RTP (Reliability and Testability Platform) team in Dublin - Firmware Deployment
Scientific approach to troubleshooting, root-cause analysis and investigation
Qualifications
Bachelor's degree in computer science, a related technical discipline, or equivalent work experience
Experience coding in a higher-level language (Python, PHP, Java, Go, Rust, C++)
Experience building, maintaining and debugging production services or platforms - usually but not necessarily in a Linux/Unix environment
Edge/CDN (Content Delivery Network) hardware or Silicon Sustainment
8+ years of experience coding in a higher-level language (Python, PHP, Java, Go, Rust, C++)
Knowledge of server architecture and components across Compute/Storage/AI Systems/Networking
Demonstrated experience in communication and collaboration