Micro1
Remote | Hot | Site Reliability Engineering | DevOps & Cloud

Site Reliability Engineer | $40-$70/hr Remote

Posted April 26, 2026
Contract

Overview

This Site Reliability Engineer role involves deploying, monitoring, and recovering containerized AI training environments. You will use advanced terminal techniques to manage infrastructure, automate tasks, and ensure system stability. The work directly supports training next-generation AI systems, leveraging your domain expertise rather than prior AI experience.

What You'll Do

  • Lead deployment, monitoring, and recovery of containerized AI training environments using advanced terminal techniques
  • Proactively identify, diagnose, and resolve infrastructure bottlenecks and failures in long-running processes
  • Orchestrate resilient system builds and infrastructure management to ensure stability and optimal resource utilization
  • Collaborate with engineering teams to refine CI/CD pipelines and automate routine operational tasks
  • Manage and optimize filesystem structures, networked storage, and process scheduling in Dockerized sandboxes
  • Conduct rapid mid-execution replanning during error states and unforeseen runtime issues
  • Document best practices and emergent solutions, and contribute to knowledge transfer across the team

Requirements

  • Demonstrated expert proficiency with terminal-based problem solving and complex system administration
  • Deep expertise in containerized environments (e.g., Docker, Kubernetes) and sandbox orchestration
  • Strong Python skills for scripting, automation, and debugging production systems
  • Proficiency in Bash and familiarity with JavaScript/TypeScript, Go, Rust, or C/C++
  • Experience with build systems, package managers, databases, version control, and cryptography tools

Who Should Apply

This role is ideal for an experienced SRE with deep terminal skills and container orchestration expertise. You should be comfortable with dynamic infrastructure recovery, long-running process management, and scripting in Python. A background in MLOps or AI infrastructure is a plus but not required.

Salary Insight

$40-$70 per hour (contract position).

Required Skills

Terminal-Native Problem Solving, Dynamic Infrastructure Recovery, Containerized Environment Mastery, Python

Application Tip

Highlight your experience with terminal-based problem solving and containerized environments (Docker/Kubernetes) by providing specific examples of complex infrastructure recovery scenarios you've handled.
