11 days ago

Cloud Technical Solutions Engineer, Compute

Google

On Site
Full Time
$150,000
Bengaluru, Karnataka, India

Job Overview

Job Title: Cloud Technical Solutions Engineer, Compute
Job Type: Full Time
Category: Commerce
Experience: 5 Years
Degree: Master
Offered Salary: $150,000
Location: Bengaluru, Karnataka, India

Job Description

Cloud Technical Solutions Engineer, Compute

The Google Cloud team helps companies, schools, and governments seamlessly make the switch to Google products and supports them along the way. You listen to the customer and swiftly problem-solve technical issues to show how our products can make businesses more productive, collaborative, and innovative. You work closely with a cross-functional team of web developers and systems administrators, as well as a variety of regional and international customers. Your relationships with customers are crucial in helping Google grow its Cloud business and helping companies around the world innovate.

In this role, you will own customer issues and provide specialized support to other teams. You will be part of a global team that provides support to ensure customers can deploy their Artificial Intelligence (AI) and Machine Learning (ML) workloads on AI Infrastructure products. You will troubleshoot technical problems using hardware and software debugging, networking, Linux system administration, and coding/scripting, and you will keep documentation up to date. You will contribute to customers’ success in the AI/ML space by improving the product, internal tools, processes, and documentation. You will help drive business growth by recognizing and advocating for the customers’ tests related to AI deployments.

Minimum qualifications:

  • Bachelor’s degree in Science, Technology, Engineering, Mathematics, or equivalent practical experience.
  • 6 years of experience writing code in one or more general-purpose programming languages (e.g., C++, Java, Python, Go).
  • Experience with Linux/Unix systems, including debugging issues across the hardware/software boundary on enterprise-grade server infrastructure.
  • Experience in troubleshooting for customer needs, and triaging technical issues across the stack (e.g., hardware faults, networking, virtualization, kernel drivers, firmware, performance).

Preferred qualifications:

  • Experience working with distributed systems, with knowledge of common solutions, design patterns, or best practices.
  • Experience working with Artificial Intelligence/Machine Learning (AI/ML) computing hardware, including Graphics Processing Units (GPUs) or other accelerators.
  • Experience with containerization and orchestration technologies like Kubernetes or Slurm.
  • Experience with ML frameworks (e.g., TensorFlow, PyTorch), with knowledge of the AI/ML training and inference lifecycle.
  • Excellent troubleshooting and communication skills with attention to detail.

Responsibilities

  • Manage customers’ problems through diagnosis, resolution, or implementation of new investigation tools to increase productivity for customer issues on AI/ML infrastructure.
  • Develop an understanding of AI/ML workloads and underlying hardware architectures by troubleshooting, reproducing, and determining the root cause of customer-reported issues, and by building tools for diagnosis.
  • Act as a consultant and subject matter expert for internal stakeholders in Engineering, Business, and customer organizations to resolve deployment and operational obstacles in AI infrastructure environments.
  • Work with multiple Product and Engineering teams to find ways to improve the product, and interact with our Site Reliability Engineering (SRE) teams to drive production.
  • Be available for non-standard work hours or shifts, which may include weekends, as needed.

Key skills/competency

  • AI/ML Infrastructure
  • Technical Troubleshooting
  • Linux System Administration
  • Hardware/Software Debugging
  • Networking
  • Distributed Systems
  • Kubernetes
  • TensorFlow/PyTorch
  • Customer Support
  • Root Cause Analysis

Tags:

Cloud Technical Solutions Engineer, Compute
troubleshooting
customer support
debugging
AI/ML
Linux
networking
virtualization
root cause analysis
documentation
stakeholder management
C++
Java
Python
Go
Kubernetes
Slurm
TensorFlow
PyTorch
GPUs
Linux/Unix

How to Get Hired at Google

  • Research Google's culture: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor.
  • Tailor your resume for Google: Highlight extensive experience in AI/ML, cloud solutions, troubleshooting, and programming.
  • Showcase technical depth: Emphasize expertise in Python/Go, Linux/Unix, Kubernetes, GPUs, and ML frameworks.
  • Prepare for technical interviews: Practice system design, advanced debugging, and coding challenges relevant to cloud infrastructure.
  • Demonstrate customer focus: Share concrete examples of resolving complex technical problems and improving customer success.
