Job Description
We are looking for a Lead HPC Network Engineer to drive the strategy, architecture, and engineering excellence behind advanced AI, research, and Kubernetes-based GPU infrastructure for a major global technology client.
The role focuses on defining the technical vision, leading architecture decisions, and setting engineering standards for high-performance network fabrics supporting large-scale LLM and distributed AI workloads, including InfiniBand/RDMA, high-speed Ethernet, Kubernetes networking, host‑side GPU networking, SmartNIC/DPU technologies, and deep network observability. As a technical leader, you will mentor senior engineers, influence client roadmaps, and own end‑to‑end delivery of mission‑critical network platforms.
The ideal candidate combines deep expertise across InfiniBand NDR/HDR and next‑generation fabrics, RDMA/RoCE, NVIDIA/Mellanox networking, NCCL/MSCCL communication patterns, Linux host networking, PCIe/GPU/NIC topology, and Kubernetes...
Submit your application for Lead HPC Network Engineer - AI Infrastructure at EPAM Systems