
Cloud Systems Architect (Remote)
Quik Hire Staffing · NAMER
- Hybrid
- Contract
- $120,000 / year
- NAMER
Job highlights
- Design and maintain scalable Linux/Kubernetes infrastructure.
- Monitor systems and analyze performance metrics.
- Automate operations and improve incident response.
- Collaborate with development and operations teams.
- Contribute to next-generation AI model development.
About the role
Site Reliability Engineer (LInE) - Contractor
We are hiring for one of our clients, seeking a Site Reliability Engineer (LInE) to work on a contractor basis. As a Site Reliability Engineer, you will apply your expertise to help train next-generation AI systems, shaping how models learn, reason, and perform through high-quality, real-world input. This role offers a unique opportunity to contribute to the development of frontier AI models, leveraging your domain knowledge to drive innovation in the AI industry.
Key Responsibilities:
- Design, implement, and maintain scalable infrastructure using Linux, Kubernetes, and Prometheus, ensuring seamless deployments and high system availability.
- Monitor system health, analyze performance metrics, and proactively address bottlenecks and potential failures.
- Automate operational processes to minimize manual intervention and increase system reliability.
- Collaborate closely with development and operations teams, and create comprehensive documentation and clear runbooks for operational excellence.
- Respond swiftly to incidents, conduct root cause analysis, and drive continuous improvements in incident response procedures to minimize downtime.
Required Skills & Qualifications:
- Proven experience designing, implementing, and maintaining scalable infrastructure using Linux, Kubernetes, and Prometheus, with a strong understanding of system health monitoring and performance metrics analysis.
- Strong understanding of automation tools and technologies, with experience in automating operational processes to minimize manual intervention and increase system reliability.
- Excellent problem-solving skills, with the ability to analyze complex system issues, identify root causes, and develop effective solutions.
- Strong communication and collaboration skills, with the ability to work closely with development and operations teams to deliver seamless deployments and high system availability.
- Experience writing comprehensive documentation and clear, concise runbooks, with strong attention to detail.
More About the Opportunity:
This role offers an opportunity to work with a global leader in the AI industry, applying your domain knowledge to shape the development of next-generation AI systems. You will collaborate with top experts on a global scale and contribute to the creation of cutting-edge AI models.
Equal Opportunity Employer:
We hire based on skills and expertise. All qualified candidates are welcome regardless of background, experience, or prior employment history. Applications are reviewed solely on demonstrated technical ability and qualifications.
Key skills/competencies
- Site Reliability Engineering (SRE)
- Linux
- Kubernetes
- Prometheus
- System Design
- Infrastructure Automation
- Performance Monitoring
- Incident Response
- Root Cause Analysis
- AI Model Training
Skills & topics
- Site Reliability Engineer
- SRE
- Linux
- Kubernetes
- Prometheus
- Cloud Infrastructure
- Automation
- System Monitoring
- Incident Response
- AI
How to get hired
- Tailor your resume: Highlight Linux, Kubernetes, Prometheus, and automation experience for the Site Reliability Engineer role.
- Showcase problem-solving: Provide specific examples of analyzing complex system issues and implementing solutions.
- Demonstrate collaboration: Emphasize experience working with development and operations teams.
- Prepare for technical questions: Be ready to discuss infrastructure design, monitoring, and incident response.
- Emphasize AI interest: Articulate your understanding of AI model training and its importance.
Frequently asked questions
- What is the work arrangement for the Site Reliability Engineer role at Quik Hire Staffing?
  - The Site Reliability Engineer position at Quik Hire Staffing is a remote role, offering the flexibility to work from anywhere. This allows for a global reach in talent acquisition and project collaboration.
- What are the primary responsibilities of a Site Reliability Engineer at Quik Hire Staffing?
  - The primary responsibilities include designing, implementing, and maintaining scalable infrastructure using Linux, Kubernetes, and Prometheus. You will also monitor system health, automate operational processes, and respond to incidents to ensure high system availability and reliability for AI model training.
- What technical skills are essential for the Site Reliability Engineer position?
  - Essential technical skills include proven experience with Linux, Kubernetes, and Prometheus for infrastructure design and maintenance. A strong understanding of system health monitoring, performance metrics analysis, and automation tools is also crucial.
- How does Quik Hire Staffing ensure an equitable hiring process for the Site Reliability Engineer role?
  - Quik Hire Staffing is committed to an equal opportunity employer policy. Hiring for the Site Reliability Engineer role is based solely on demonstrated skills and qualifications, regardless of background or prior employment history.
- What makes this Site Reliability Engineer opportunity unique?
  - This role offers a unique chance to contribute to the development of next-generation AI systems and frontier AI models. You'll work with a global leader in the AI industry, collaborating with top experts to drive innovation.
- Can I apply for the Site Reliability Engineer role if I have prior employment gaps?
  - Yes, Quik Hire Staffing hires based on skills and expertise. All qualified candidates are welcome, and prior employment history is not a determining factor for the Site Reliability Engineer position.
