6 days ago

AI Systems Administration Specialist

Argonne National Laboratory

On Site
Full Time
$140,000
Lemont, IL

Job Overview

Job Title: AI Systems Administration Specialist
Job Type: Full Time
Category: Commerce
Experience: 5 Years
Degree: Master
Offered Salary: $140,000
Location: Lemont, IL

Job Description

AI Systems Administration Specialist at Argonne National Laboratory

The Argonne Leadership Computing Facility (ALCF) is home to Aurora, one of the world's pioneering exascale supercomputers. With its extraordinary computing speed and advanced artificial intelligence capabilities, Aurora is set to revolutionize scientific research. ALCF is dedicated to supporting high-performance computing (HPC) and adjacent services that are crucial to the research workflow.

ALCF is seeking a skilled AI Systems Administration Specialist to support the AI Testbed, which evaluates emerging hardware and software platforms for artificial intelligence (AI) and machine learning (ML) for science.

About the Role

  • Work directly with first-class systems alongside scientific staff and research colleagues within the division.
  • Serve as a systems administrator working on Argonne’s AI Testbed, installing and managing diverse AI and machine learning related hardware and software.
  • Work directly with other subject matter experts to ensure the sustainability and availability of the testbed infrastructure.
  • Support machines in a mixed operating system environment and work efficiently with other operations groups.
  • Guide researchers in using the environment; your advice directly affects research productivity.
  • Work in a hybrid environment with 2+ days onsite in Lemont, Illinois, with the ability to work fully onsite if preferred.

Required Skills and Qualifications

  • Experience in UNIX systems administration, especially Linux, with an emphasis on OS installation and upgrading, package building and management, common services and applications, and troubleshooting.
  • Experience with Salt, Ansible or similar configuration management tools.
  • Experience with Git, or other modern version control platforms.
  • Effective problem-solving skills.
  • Working knowledge of scripting languages, particularly Python.
  • Ability to write concise documentation.
  • Ability to work effectively as a member of a team.
  • Flexibility in handling assignments and working on several projects simultaneously.
  • Knowledge and understanding of how to safely operate within a datacenter, including tasks such as mounting and unmounting server hardware.
  • Ability to handle the physical labor of installing racks and servers in a datacenter, including lifting up to 20 pounds independently and heavier loads with additional help.
  • Understanding of IPv4 networking.
  • Ability to model Argonne’s core values: impact, safety, respect, integrity and teamwork.
  • Proof of U.S. citizenship is required to comply with federal regulations and contract terms.

Preferred Skills and Qualifications

  • Knowledge of AI/ML systems architecture and workflows (Groq, Cerebras, Graphcore, SambaNova).
  • Working knowledge of Kubernetes management.
  • Knowledge of scientific applications.
  • Knowledge of high-performance networking technologies such as InfiniBand and Slingshot.
  • Knowledge of Storage Area Networking and storage arrays, such as NetApp.
  • Knowledge of parallel and distributed file systems such as Lustre, and their associated hardware.
  • Knowledge of high-performance computing techniques, graphics, and visualization.
  • Experience with software packaging, building software from source, and dynamic linking.
  • Understanding of MPI and its implementations.
  • Ability to gather site requirements and represent them to design and development teams.
  • Experience implementing CI or CD workflows.

Compensation and Level

This position can be hired at one of two levels (PT3 or PT4) based on relevant knowledge and skills. The minimum requirements are:

  • PT3: Bachelor's degree and 4+ years of experience, or a Master's degree and 2+ years of experience, or equivalent. The expected pay range for this position is $86,299 - $134,626.
  • PT4: Bachelor's degree and 6+ years of experience, or a Master's degree and 4+ years of experience, or equivalent. The expected pay range for this position is $106,455 - $166,070.

Key skills/competencies

  • UNIX Systems Administration
  • Linux
  • Configuration Management
  • AI/ML Platform Support
  • Python Scripting
  • Datacenter Operations
  • High-Performance Networking
  • Kubernetes
  • Parallel File Systems
  • Troubleshooting

Tags:

AI Systems Administrator
Systems administration
AI/ML support
Infrastructure management
Troubleshooting
Scientific computing
Hardware installation
Software management
Configuration management
Network administration
Linux
UNIX
Salt
Ansible
Git
Python
Kubernetes
InfiniBand
Lustre
NetApp
MPI

How to Get Hired at Argonne National Laboratory

  • Research Argonne National Laboratory's mission: Study its mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor so you can show how your work aligns with its scientific goals.
  • Tailor your resume effectively: Highlight specific experience in UNIX/Linux administration, AI/ML platform support, configuration management tools like Salt or Ansible, and Python scripting for the AI Systems Administration Specialist role.
  • Showcase problem-solving skills: Prepare concrete examples of complex technical challenges you've resolved, demonstrating your analytical thinking and troubleshooting capabilities during interviews.
  • Prepare for technical depth: Be ready to discuss Linux internals, networking protocols, datacenter operations, and possibly specific AI/ML architectures and HPC technologies relevant to ALCF's environment.
  • Emphasize collaboration and documentation: Highlight instances where you've worked effectively in teams, communicated complex technical information clearly, and produced concise system documentation.