AI Systems Administration Specialist
Argonne National Laboratory

Job Description
The Argonne Leadership Computing Facility (ALCF) is home to Aurora, one of the world's pioneering exascale supercomputers. With its extraordinary computing speed and advanced artificial intelligence capabilities, Aurora is set to revolutionize scientific research. ALCF is dedicated to supporting high-performance computing (HPC) and adjacent services that are crucial to the research workflow.
ALCF is seeking a skilled AI Systems Administration Specialist to join its team and support the AI Testbed, which evaluates emerging hardware and software platforms for artificial intelligence (AI) and machine learning (ML) for science.
About the Role
- Work directly with first-class systems alongside scientific staff and research colleagues within the division.
- Serve as a systems administrator on Argonne’s AI Testbed, installing and managing diverse AI and machine learning hardware and software.
- Work directly with other subject matter experts to ensure the sustainability and availability of the testbed infrastructure.
- Support machines in a mixed operating system environment and work efficiently with other operations groups.
- Provide guidance on the computing environment that researchers rely on, directly impacting research productivity.
- Work in a hybrid environment with 2+ days onsite in Lemont, Illinois, with the ability to work fully onsite if preferred.
Required Skills and Qualifications
- Experience in UNIX systems administration, especially Linux, with an emphasis on OS installation and upgrading, package building and management, common services and applications, and troubleshooting.
- Experience with Salt, Ansible or similar configuration management tools.
- Experience with Git, or other modern version control platforms.
- Effective problem-solving skills.
- Working knowledge of scripting languages, particularly Python.
- Ability to write concise documentation.
- Ability to work effectively as a member of a team.
- Flexibility in handling assignments and working on several projects simultaneously.
- Knowledge and understanding of how to safely operate within a datacenter, including tasks such as mounting and unmounting server hardware.
- Ability to handle the physical labor of installing racks and servers in a datacenter, including lifting up to 20 pounds independently and heavier loads with additional help.
- Understanding of IPv4 networking.
- Ability to model Argonne’s core values: impact, safety, respect, integrity and teamwork.
- Proof of U.S. citizenship is required to comply with federal regulations and contract.
Preferred Skills and Qualifications
- Knowledge of AI/ML systems architecture and workflows (Groq, Cerebras, Graphcore, SambaNova).
- Working knowledge of Kubernetes management.
- Knowledge of scientific applications.
- Knowledge of high-performance networking technologies such as InfiniBand and Slingshot.
- Knowledge of Storage Area Networking and storage arrays, such as NetApp.
- Knowledge of parallel and distributed file systems such as Lustre, and their associated hardware.
- Knowledge of high-performance computing techniques, graphics, and visualization.
- Experience with software packaging, building software from source, and dynamic linking.
- Understanding of MPI and its implementations.
- Ability to gather site requirements and represent them to design and development teams.
- Experience implementing CI or CD workflows.
Compensation and Level
This position can be hired at one of two levels (PT3 or PT4) based on relevant knowledge and skills. The minimum requirements are:
- PT3: Bachelor's degree and 4+ years of experience, or a Master's degree and 2+ years of experience, or equivalent. The expected pay range for this position is $86,299 - $134,626.
- PT4: Bachelor's degree and 6+ years of experience, or a Master's degree and 4+ years of experience, or equivalent. The expected pay range for this position is $106,455 - $166,070.
Key skills/competencies
- UNIX Systems Administration
- Linux
- Configuration Management
- AI/ML Platform Support
- Python Scripting
- Datacenter Operations
- High-Performance Networking
- Kubernetes
- Parallel File Systems
- Troubleshooting
How to Get Hired at Argonne National Laboratory
- Research Argonne National Laboratory's mission: Study its mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor to align with its scientific endeavors.
- Tailor your resume effectively: Highlight specific experience in UNIX/Linux administration, AI/ML platform support, configuration management tools like Salt or Ansible, and Python scripting for the AI Systems Administration Specialist role.
- Showcase problem-solving skills: Prepare concrete examples of complex technical challenges you've resolved, demonstrating your analytical thinking and troubleshooting capabilities during interviews.
- Prepare for technical depth: Be ready to discuss Linux internals, networking protocols, datacenter operations, and potentially specific AI/ML architectures and HPC technologies relevant to ALCF's environment.
- Emphasize collaboration and documentation: Highlight instances where you've worked effectively in teams, communicated complex technical information clearly, and produced concise system documentation.