HPC Systems Administrator @ Vector Institute
Job Details
Position Summary
The Vector Institute is seeking an HPC Systems Administrator to join our growing team in Toronto as we continue our work of establishing Canada as a centre of expertise for AI. You will be involved in building and maintaining High-Performance Computing environments for world-class research in Machine Learning.
Key Responsibilities
- Support HPC compute clusters with 250+ nodes, 10,000+ cores, and 1,200+ GPUs.
- Support the GPU-enabled workstation office environment.
- Provide guidance and support to the research community.
- Develop and maintain tools for automatic installation and configuration.
- Perform hardware/software upgrades and maintenance.
- Install scientific software and libraries across various operating systems.
- Support researchers with all their computing needs.
- Maintain network infrastructure and system security.
- Handle enterprise IT operations.
Key Success Measures
- Ensure smooth system functioning with proactive troubleshooting and maintenance.
- Deliver strong support for both research and enterprise IT needs.
- Build and maintain tools for local and cloud infrastructure administration.
Profile of the Ideal Candidate
A degree or diploma in computer science or engineering and more than three years of hands-on Linux/UNIX systems administration experience in a research environment are required. The role demands experience managing HPC grids and job schedulers such as Slurm, strong programming and scripting skills, and a problem-solving attitude. Excellent communication skills and the ability to work autonomously in a fast-paced environment are essential.
Qualifications And Assets
- Experience with HPC workload management systems (Slurm, SGE, Moab/Torque).
- Experience with large scale-out storage (SAN/NAS) and file systems (ZFS, GPFS).
- Good understanding of high-speed networking (100GE, InfiniBand).
- Experience supporting data management, backups, archives and monitoring.
- Familiarity with application tools/databases (MySQL, PostgreSQL) and open source infrastructure (OpenLDAP, NFS, OpenZFS, 2FA systems).
Equal Opportunity
At the Vector Institute, we support diversity and welcome candidates from all backgrounds including underrepresented groups. If you require accommodations during the recruitment process, please contact hr@vectorinstitute.ai.
Key Skills/Competencies
- HPC
- Linux
- Systems Administration
- Slurm
- Networking
- Automation
- Scripting
- Security
- Storage
- Research Support
How to Get Hired at Vector Institute
🎯 Tips for Getting Hired
- Research Vector Institute: Understand their AI and research focus.
- Customize your resume: Highlight HPC and Linux skills.
- Showcase relevant projects: Emphasize automation and troubleshooting.
- Prepare technical insights: Review HPC and scheduler systems.
- Practice communication: Be clear about problem-solving examples.