Senior Infrastructure Software Engineer
WEKA
Job Description
WEKA is transforming how organizations build, run, and scale AI and accelerated compute workflows with NeuralMesh™, our intelligent, adaptive mesh storage system. Unlike traditional data infrastructures, which become more fragile as compute environments grow and performance demands increase, NeuralMesh becomes faster, stronger, and more efficient as it scales. It provides a flexible, adaptable foundation for enterprise and agentic AI innovation that maximizes GPU utilization, accelerates time to first token, and lowers the cost of innovation.
WEKA is a growth-stage company backed by world-class venture capital investors and AI infrastructure industry leaders. Our technology, purpose-built for AI, has garnered over 140 patents and is trusted by more than 30% of Fortune 50 enterprises, as well as the world’s leading hyperscalers, neoclouds, and AI innovators. Our team is customer-obsessed and works accountably, boldly, and collaboratively to ensure customer success. If we sound like your kind of people, join us!
About The Role
At WEKA, we’re building a next-generation platform for validating large-scale distributed systems. Our goal is to continuously ensure the correctness, performance, and resilience of the WEKA Data Platform across every layer of the stack.
As a Senior Infrastructure Software Engineer, you’ll work hands-on with the systems and frameworks that test, stress, and validate complex distributed infrastructure under real-world conditions. You’ll help design and build automated environments that simulate scale, concurrency, and failure scenarios, and you’ll help evolve how we ensure reliability and correctness in modern infrastructure systems.
This role is ideal for engineers with a strong distributed systems background who enjoy deep technical problem-solving, working close to the system, and building tools that improve quality, stability, and confidence at scale.
What You’ll Do
- Design and implement core components of a distributed testing infrastructure and quality platform.
- Build automated frameworks to validate functionality, performance, and resilience at scale.
- Collaborate closely with infrastructure, storage, and platform teams to ensure quality is built into the development lifecycle.
- Contribute to improving tooling, test coverage, and engineering best practices across the organization.
What You Bring
- Strong experience (5+ years) building or working on large-scale distributed systems in areas such as storage, networking, cloud infrastructure, or backend platforms.
- Solid understanding of concurrency, system correctness, and reliability in production systems.
- Hands-on programming experience in one or more of the following languages: Go, C++, Rust, or Python.
- Experience building test frameworks, infrastructure tooling, or internal platforms is a strong advantage.
- Curiosity and interest in modern approaches to testing, automation, and system validation (including AI-assisted techniques).
- Ability to work independently on complex technical problems while collaborating effectively with cross-functional teams.
Nice to Have
- Experience with observability, performance testing, fault injection, or chaos engineering.
- Familiarity with CI/CD pipelines for large-scale systems.
- Exposure to AI/ML-driven testing or automation tools.
Why Join Us
- Work on cutting-edge AI infrastructure and distributed systems at scale.
- Build platforms that directly impact product quality, reliability, and customer trust.
- Collaborate with deeply technical engineers across storage, infrastructure, and platform teams.
- Solve challenging problems at the intersection of scale, performance, and correctness.
Key Skills/Competencies
- Distributed Systems
- Testing Infrastructure
- System Validation
- Software Engineering
- Go/C++/Rust/Python
- Concurrency
- Reliability Engineering
- Automated Testing
- Cloud Infrastructure
- Performance Engineering
How to Get Hired at WEKA
- Research WEKA's culture: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor.
- Tailor your resume: Highlight your distributed systems and testing experience, and your proficiency in Go, C++, Rust, or Python.
- Showcase problem-solving skills: Prepare specific examples of how you tackled complex infrastructure challenges and improved system quality.
- Demonstrate system knowledge: Articulate your deep understanding of concurrency, system correctness, and reliability in production environments.
- Highlight collaboration: Emphasize experience working effectively with cross-functional teams to achieve engineering goals.