[Remote] Senior Systems Engineer - Network Infrastructure

Work from home, full-time role

Note: This is a remote job open to candidates in the USA. The company is building AI infrastructure from the ground up, aiming to deliver reliable and scalable network clusters for large-scale training and inference. The Senior Systems Engineer will handle the deployment of network clusters, ensuring they are validated and production-ready, while also contributing to automation and process improvement.

Responsibilities

Execute end-to-end bringup of network nodes and racks from installation to production readiness
Validate BIOS/BMC/firmware configurations and network health
Handle rack-level integration, including power, cabling, and airflow validation
Bring up and validate high-speed network fabrics (InfiniBand, RoCE, 100–400G Ethernet)
Configure and validate leaf/spine network connectivity
Run cluster-wide burn-in and stress testing
Validate node-to-node performance (NCCL, RDMA, GPUDirect)
Troubleshoot hardware, firmware, and system-level issues
Contribute to automation for provisioning and cluster validation
Improve deployment playbooks and documentation
Identify reliability issues early and drive corrective actions
Help turn deployments into repeatable systems
Work closely with networking, systems software, and data center teams
Coordinate with hardware vendors to resolve bringup issues
Support rapid expansion as we scale

Skills

5–8+ years in infrastructure engineering, hardware deployment, or data center operations
Hands-on experience deploying network servers (HGX/DGX or similar platforms)
Experience with high-speed networking (InfiniBand, RoCE, Ethernet fabrics)
Strong Linux systems knowledge
Experience troubleshooting distributed systems performance issues
Comfortable working onsite in data center environments as needed
Experience in AI/ML infrastructure or HPC environments
Familiarity with NCCL, CUDA, RDMA
Automation experience (Python, Ansible, Terraform, Bash)
Experience in high-density power and cooling environments

Company Overview

The company builds AI data centers and provides GPU infrastructure that companies use to train, run, and scale large AI models. It was founded in 2024 and is headquartered in London, England, GBR, with a workforce of 201-500 employees. Its website is https://www.reputed company.com.
