Operations & Support Expertise — Storage (m/f/d)

Location: Frankfurt (50%) and Remote
Start: 01.07.2026 (ASAP)
Posted: 2 weeks ago

Job type: Project
Duration: 31.12.2026 + Option
Scope of work: full-time (100%)
Languages: German, English

ID: 178976

Westhouse is one of the leading international recruitment agencies for the procurement of highly qualified experts in fields such as IT lifecycle management, SAP, engineering, commerce and specialist consultancy.

For our client, we are currently looking for an Operations & Support Expert - Storage (m/f/d), based in Frankfurt (50%) and remote.

Your tasks

Provide Tier-3 operational ownership for Storage Products for Local Production (DE).
Handle complex incidents, deep troubleshooting, and root cause analysis; drive permanent fixes and preventive measures.
Ensure operational readiness for storage changes: monitoring/alerting coverage, performance baselines, hardening, patch strategy, rollback and recovery procedures, and runbooks.
Execute and improve standard operational procedures through automation (reduce toil, improve MTTR and stability).
Automate standard operational tasks (capacity checks, validation procedures, provisioning workflows where applicable).
Ensure operational readiness for deployments:
Validating deployment artifacts from an operations perspective.
Defining and enforcing quality assurance measures (e.g. required documentation of standard operating procedures, successful test reports, …) to ensure the high quality of delivered products and services.
Ensuring that rollback strategies and operational monitoring (observability) are in place for production deployments.
Ensure operational stability and responsiveness for the managed Kubernetes platform:
Monitoring system health, performance metrics, and service availability across multi-tenant environments.
Identifying, analyzing, and resolving incidents to minimize service disruption.
Triggering root cause analysis and implementing corrective and preventive actions.
Reduce operational toil and improve service reliability:
Addressing recurring operational issues by automating remedial standard operating procedures.
Validating all automated procedures according to the established software development lifecycle, including staging, testing, and validation reviews.
Ensure platform operations adhere to security and compliance standards:
Implementing monitoring and logging strategies to support audit and compliance requirements.
Performing routine security scans and remediating identified vulnerabilities.

Interested?

Tobias Gollmann

Tel.: +49-89-383772-4135
Email: t.gollmann@westhouse-consulting.com

Your qualifications

5+ years in IT storage operations / service delivery / platform operations with demonstrated leadership in mission-critical environments.
Proven experience implementing or leading Incident, Problem, Change, and Release governance in production.
Experience supporting platform workloads that rely on shared storage services.
Storage types: file storage, block storage, and object storage from NetApp (ONTAP).
Protocols/services: NFS; object storage operations (S3-like concepts).
Kubernetes storage integration: CSI driver concepts and troubleshooting (PV/PVC lifecycle understanding).
Virtualization (Storage): Experience operating storage virtualization in enterprise environments.
ITSM / Collaboration: Jira Service Management (JSM), Jira, Confluence.
Fundamental understanding of core operations processes (incident management, change management, problem management, IT Service Management) as well as SRE concepts.
Experience in gathering operational insights from monitoring or observability including SLI/SLA/SLO management and tracking.
Hands-on experience in documenting procedures properly and enforcing clear runbooks or playbooks.
Observability: hands-on experience with monitoring and logging tools (e.g., Prometheus, Grafana, Datadog, Mimir, Loki).
Familiarity with enterprise DevOps toolchains is a plus (GitLab, JFrog Artifactory, Backstage, Harness).
Understanding of modern platform operations (Kubernetes/containers, automation, observability), sufficient to govern specialists.
Platform delivery concepts: GitOps and IaC awareness (Terraform/OpenTofu, ArgoCD, Helm) to govern deployment/readiness standards.