Senior Manager, Cloud Operations
Oracle
Job Description
Senior Manager, Cloud Operations at Oracle
As a Senior Manager, Cloud Operations in the OCI AI/GPU Compute Host Management Group at Oracle, you will lead a team responsible for the ownership, operation, maintenance, and optimization of AI/GPU host management within Oracle Cloud Infrastructure (OCI). Leveraging your expertise in management, hardware, Linux, software architecture, and distributed systems, you will guide your team in delivering robust and reliable OCI services.
The OCI AI/GPU Compute Host Management Group is critical to deploying servers in new data center builds and directly contributes to the reliability and scalability of OCI cloud infrastructure. You will play a pivotal role in ensuring that operational plans comply with corporate policies, government regulations, and compliance standards.
Your responsibilities will include overseeing the resolution of issues within the Host Management Group suite of services, planning and executing service patching and upgrades, and identifying opportunities to enhance automation and optimization across OCI infrastructure and services. Your team will manage service queues, triage and diagnose AI compute hardware and software issues, and drive ongoing process improvements to ensure operational excellence.
Key Responsibilities
- Lead a team of Operators and Developers, providing strategic direction and hands-on support.
- Bring proven leadership, people management, and communication skills.
- Apply strong analytical abilities and deep understanding of large-scale distributed systems.
- Comprehend complex system interactions and dive deep into any part of the stack.
- Establish, develop, and provide ongoing direction for your team.
- Collaborate with geographically distributed teams to drive organizational success.
- Oversee resolution of issues within Host Management Group services.
- Plan and execute service patching and upgrades.
- Identify opportunities for automation and optimization across OCI.
- Manage service queues, and triage and diagnose AI compute hardware and software issues.
- Drive ongoing process improvements for operational excellence.
Required Qualifications
- BS or MS degree in a relevant field, or equivalent experience.
- 5+ years of experience in software engineering, operations, or a related domain.
- 5+ years of people management and/or technical leadership experience.
- Demonstrated experience building teams, including recruiting, hiring, and performance management.
- Strong organizational and planning abilities, including scheduling and resource management.
- Strong operational experience, including service team reporting on metrics for availability, operator/engineer performance, and ticket resolution analytics.
- Proficiency with scripting languages such as Python and Bash.
- Solid knowledge of distributed systems fundamentals.
- Working familiarity with networking protocols (e.g., TCP/IP, HTTP) and standard network architectures.
About Oracle
Only Oracle brings together the data, infrastructure, applications, and expertise to power everything from industry innovations to life-saving care. With AI embedded across products and services, Oracle helps customers turn that promise into a better future for all. Discover your potential at a company leading the way in AI and cloud solutions that impact billions of lives.
Key Skills/Competencies
- Cloud Operations
- AI/GPU Management
- Distributed Systems
- Team Leadership
- Linux Administration
- Python Scripting
- Bash Scripting
- Network Protocols
- Service Reliability
- Automation
How to Get Hired at Oracle
- Research Oracle's culture: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor, focusing on their AI and Cloud leadership.
- Tailor your resume: Highlight your experience in Cloud Operations, AI/GPU compute, Linux, distributed systems, and team leadership, aligning with Oracle's infrastructure needs.
- Prepare for technical challenges: Be ready to discuss your expertise in Python and Bash scripting, networking protocols, and problem-solving within large-scale distributed environments.
- Showcase leadership capabilities: Provide concrete examples of building and managing high-performing technical teams, fostering collaboration, and driving operational excellence.
- Demonstrate impact and scalability: Frame your experience around ensuring reliability, optimizing performance, and contributing to the scalability of critical cloud infrastructure.