Cloud Operations Engineer
- kennethcamarillo8
- May 25
Location: Richmond, VA (Hybrid – 3 Days/Week Onsite)
HiPaaS is seeking a Cloud Operations Engineer to join our infrastructure team and help manage our Azure-based production environments. This role will focus on system reliability, performance optimization, automation, and supporting mission-critical services for enterprise and healthcare solutions.
Key Responsibilities:
Participate in 24/7 on-call rotations for Azure production support.
Respond to and resolve production incidents and outages.
Conduct root cause analysis for major incidents and implement preventive solutions.
Provide advanced technical support for Azure Databricks and related services.
Monitor system logs and resolve issues in the Azure Kubernetes Service (AKS) infrastructure.
Maintain and optimize Azure infrastructure, ensuring performance, availability, and cost efficiency.
Implement and manage backup and disaster recovery strategies.
Perform capacity planning and drive cost optimization (FinOps).
Automate operational tasks and develop self-healing systems.
Implement and maintain security controls and ensure compliance.
Collaborate with development teams to enhance application performance and scalability.
Assist in the design and implementation of new Azure-based solutions.
Manage Azure resource utilization and monitor system health and security.
Create and maintain Terraform configurations and Bash and PowerShell scripts.
Document infrastructure processes, configurations, and architectures.
Stay current with new Azure capabilities and cloud operations best practices.
Mentor junior team members and foster a knowledge-sharing culture.
Required Qualifications:
Proven experience managing Azure cloud infrastructure in production environments.
Strong expertise in Azure Kubernetes Service (AKS) and container orchestration.
Hands-on experience with Databricks, system monitoring, and incident response.
Proficiency in Bash and PowerShell scripting and in Terraform.
Understanding of FinOps principles for cloud cost optimization.
Familiarity with backup, disaster recovery, and cloud security practices.
Excellent troubleshooting and documentation skills.
Ability to work onsite in Richmond, VA at least 3 days per week.