Company Profile
• Provide application support and management, including scoped application metrics and reports.
• Conduct root cause analysis of incidents and implement measures to prevent recurrence.
• Deliver changes in accordance with guidelines, representing them in Change Advisory Board (CAB) meetings as required.
• Complete tasks to maintain a stable and resilient service.
• Drive continuous improvement in service health and resilience.
• Provide data and documentation to internal and external auditors, including walkthroughs and reports.
• Research, develop, and implement new solutions to reduce manual work and effort.
• Maintain proactive relationships with external third-party vendors.
• Foster strong relationships with business stakeholders and internal technical teams.
• Proven experience in reliability engineering or a similar role.
• Strong understanding of system architecture, design principles, and cloud platforms.
• Proficiency in scripting languages (e.g., Python) for automation purposes.
• Familiarity with monitoring tools and incident management systems.
• Effective communication skills to convey technical concepts to non-technical stakeholders.
• Adaptability to learn and implement new technologies and tools as required.