Observability Operations Engineer (m/f/d)

Location: Frankfurt (50%) and Remote
Start: 01.07.2026 (ASAP)
Posted: 1 week ago

Job type: Project
Duration: 31.12.2026 + Option
Scope of work: full-time (100%)
Languages: German, English

ID: 178980

Westhouse is one of the leading international recruitment agencies for the procurement of highly qualified experts in fields such as IT lifecycle management, SAP, engineering, commerce and specialist consultancy.

For our client, we are currently looking for an Observability Operations Engineer (m/f/d) - Frankfurt (50%) and Remote.

Your tasks

Supporting CI/CD pipelines and ensuring operational readiness for deployments.
Validating deployment artifacts from an operations perspective.
Defining and enforcing quality assurance measures (e.g. required documentation of standard operating procedures, successful test reports, …) to ensure the high quality of delivered products and services.
Ensuring rollback strategies and operational monitoring (observability) are in place for production deployments.
Monitoring, Incident, Problem and Change Management in the specific context of providing managed Kubernetes.
Monitoring system health, performance metrics, and service availability across multi-tenant environments.
Identifying, analyzing, and resolving incidents, minimizing service disruption.
Triggering root cause analysis and implementation of corrective and preventive actions.
Automating operations-critical standard processes following established software development lifecycles.
Validating all automated procedures against the established software development lifecycle, including staging, testing, and validation reviews.
Implementing monitoring and logging strategies to support audit and compliance requirements.
Performing routine security scans and remediating identified vulnerabilities.

Interested?

Tobias Gollmann

Tel.: +49-89-383772-4135
Email: t.gollmann@westhouse-consulting.com

Your qualifications

At least 3 years of operational experience with self-managed Kubernetes clusters, with self-managed services that provide Kubernetes clusters, and with production applications or systems running on Kubernetes in on-premises environments.
Hands-on experience with monitoring and logging tools (e.g. Prometheus, Grafana, Datadog, Mimir, Loki, OpenTelemetry Collector), from both a usage and an administration/operations perspective.
Deep understanding of networking concepts, including protocols, load balancing, and security.
Profound knowledge of and implementation experience with CI/CD processes, tooling (e.g. GitLab, Jenkins, Tekton, Argo Workflows, Argo CD), concepts, and the associated quality and security assurance for software delivery.
Fundamental understanding of core operations processes (incident management, change management, problem management, IT Service Management) as well as SRE concepts.
Experience in gathering operational insights from monitoring or observability data, including SLI/SLA/SLO management and tracking.
Hands-on experience in documenting procedures properly and enforcing clear runbooks or playbooks.