Site Reliability Engineer | $40-$70/hr Remote

$40–$70/hr

Posted April 26, 2026

contract

Overview

This Site Reliability Engineer role involves deploying, monitoring, and recovering containerized AI training environments. You will use advanced terminal techniques to manage infrastructure, automate tasks, and keep systems stable. The work directly supports training next-generation AI systems and draws on your domain expertise; prior AI experience is not required.

What You'll Do

1. Lead deployment, monitoring, and recovery of containerized AI training environments using advanced terminal techniques
2. Proactively identify, diagnose, and resolve infrastructure bottlenecks and failures in long-running processes
3. Orchestrate resilient system builds and infrastructure management to ensure stability and optimal resource utilization
4. Collaborate with engineering teams to refine CI/CD pipelines and automate routine operational tasks
5. Manage and optimize filesystem structures, networked storage, and process scheduling in Dockerized sandboxes
6. Conduct rapid mid-execution replanning during error states and unforeseen runtime issues
7. Document best practices and emergent solutions, and contribute to knowledge transfer across the team

Requirements

1. Demonstrated expert proficiency with terminal-based problem solving and complex system administration
2. Deep expertise in containerized environments (e.g., Docker, Kubernetes) and sandbox orchestration
3. Strong Python skills for scripting, automation, and debugging production systems
4. Proficiency in Bash and familiarity with JavaScript/TypeScript, Go, Rust, or C/C++
5. Experience with build systems, package managers, databases, version control, and cryptography tools

Who Should Apply

This role is ideal for an experienced SRE with deep terminal skills and container orchestration expertise. You should be comfortable with dynamic infrastructure recovery, long-running process management, and scripting in Python. A background in MLOps or AI infrastructure is a plus but not required.

Salary Insight

$40-$70 per hour (contract position).

Required Skills

Terminal-Native Problem Solving, Dynamic Infrastructure Recovery, Containerized Environment Mastery, Python

Application Tip

Highlight your experience with terminal-based problem solving and containerized environments (Docker/Kubernetes) by providing specific examples of complex infrastructure recovery scenarios you've handled.
