
Operations Engineer – Fault & Performance Mgt
- Saudi Arabia
- Permanent
- Full-time
- Own end-to-end operations of the UFPM platform (IBM Netcool, MYCOM OSI, or equivalent) to sustain 99.9% availability.
- Enable rapid fault detection, correlation, and escalation to meet or beat SLA targets for MTTD/MTTR.
- Optimize thresholds, service-impact models, and noise suppression to reduce false alarms and misses.
- Drive data-driven performance monitoring and trend analysis across IP/MPLS, 4G/5G, FTTH, and data center domains.
- Lead RCAs for major incidents and implement preventive actions that harden stability and resilience.
- Build and maintain integrations between NMS/OSS and ITSM (ServiceNow/Remedy), championing automation.
- Uphold security, documentation, SOPs, and operational standards; collaborate effectively across teams in a 24×7 rotation.
- Operate and maintain UFPM platforms (e.g., IBM Netcool, MYCOM OSI, or equivalent), including patching and upgrades, while sustaining 99.9%+ availability.
- Ensure proactive fault detection, event correlation/de-duplication, and SLA-aligned alert escalation.
- Monitor KPIs across IP/MPLS, 4G/5G, FTTH, and data center/cloud; build dashboards and health checks.
- Define and optimize thresholds, suppression, and service-impact models to reduce noise and missed alerts.
- Lead incident response for P1/P2 events; perform RCAs and drive corrective/preventive actions.
- Integrate UFPM with NMS/EMS/OSS and ITSM tools (ServiceNow/Remedy) via APIs/webhooks; maintain CI/topology mapping.
- Develop/run automations and runbooks (e.g., event enrichment, auto-ticketing, remediation); script as needed.
- Perform trend analysis and capacity planning using historical data; recommend performance optimizations.
- Maintain governance: SOPs, configuration/version backups, access controls, and security/compliance adherence.
- Participate in a 24×7 on-call rotation; act as an escalation point and mentor NOC/operations teams.
- Bachelor's in Telecommunications, Computer Science, Electrical/Electronics Engineering, or related field.
- 7+ years in Network Operations, Service Assurance, or UFPM systems management.
- Hands-on with IBM Netcool, MYCOM OSI, or equivalent (operations, administration, configuration).
- Strong multi-vendor/multi-technology exposure: IP/MPLS, 4G/5G (Core/RAN), FTTH, and data centers.
- Proficient in network monitoring & performance management, KPI dashboards, SNMP/telemetry.
- Track record of leading P1/P2 incident response and RCAs with demonstrable improvements to MTTD/MTTR.
- Experience integrating NMS/OSS with ITSM tools (ServiceNow/Remedy) via APIs/webhooks (REST/JSON).
- Scripting for automation and correlation/threshold tuning (e.g., Python/SQL); exposure to Prometheus/Grafana/ELK/AIOps is a plus.
- Experience maintaining SOPs, config/version backups, and access controls, and adhering to security/compliance standards.
- Strong analytical/problem-solving and clear English communication; effective cross-team collaboration and readiness for on-call rotation.
- Good command of English; Arabic is a plus.