Job Description
Responsibilities
. Monitor and maintain production data pipelines to ensure 99.9% uptime and optimal performance
. Implement comprehensive logging, alerting, and monitoring systems using application monitoring tools
. Perform regular health checks on pipeline performance, job execution times, and resource utilization to identify and resolve bottlenecks proactively
. Manage incident response procedures for pipeline failures, including root cause analysis, resolution, and post-incident reviews
. Establish and maintain disaster recovery procedures and backup strategies for critical data assets within the Databricks environment
. Conduct regular performance tuning of Spark jobs and Databricks cluster configurations to optimize cost and execution efficiency
. Maintain comprehensive documentation for operational procedures, runbooks, and troubleshooting guides
. Coordinate scheduled maintenance windows and system upgrades wit...
Submit your application for Senior Systems Engineer - L3 Operations (Data Analytics & AI) (Ref 26210a) at jobline resources pte. ltd.