Job Title: Confluent Kafka Infrastructure Administrator
Location: Remote (Pak) / Hybrid (Riyadh, KSA)
Employment Type: Full-Time / Contract-to-Hire
Experience Level: 5+ years

Role Summary

As a Confluent Kafka Infrastructure Administrator, you'll be the operational backbone for our Kafka ecosystem, handling the full lifecycle of clusters from installation to ongoing maintenance.
You'll deploy and manage Confluent Platform in multi-environment setups (dev/test/UAT/preprod/prod/DR), ensuring high availability, security, and performance.
This hands-on role involves automation, troubleshooting, and collaboration with engineering teams to support real-time data pipelines in production-scale environments.

Key Responsibilities

Installation & Setup: Lead the deployment and initial configuration of Confluent Platform (Enterprise/Cloud editions) and Apache Kafka clusters across multiple environments (dev, test, UAT, preprod, prod, DR), including:
- Installing brokers, ZooKeeper/KRaft controllers, Schema Registry, Kafka Connect, and ksqlDB using automated tools like Ansible, Terraform, or Confluent CLI/Helm charts on Kubernetes, bare metal, or cloud (GCP).
- Configuring topics, partitions, replication factors, and security (SSL/TLS encryption, SASL authentication, ACLs/RBAC via ZooKeeper or KRaft) while avoiding downtime.
- Integrating auxiliary services like MirrorMaker 2 or Confluent Replicator for cross-environment replication, and performing initial load testing and health checks with tools such as Kafka's bundled performance-test scripts.
- Automating environment provisioning for reproducibility, including VPC peering, load balancers, and integration with existing infrastructure.
Maintenance & Operations: Oversee day-to-day cluster management, upgrades, and patching in multi-environment landscapes, ensuring >99.9% uptime and compliance with SLAs.
Monitoring & Troubleshooting: Implement and maintain observability with Prometheus, Grafana, and Confluent Control Center; set up alerts for metrics like lag, throughput, and broker health; perform root-cause analysis for incidents (e.g., replication failures, disk exhaustion).
Optimization & Scaling: Conduct capacity planning, performance tuning (e.g., JVM tweaks, partition balancing), and horizontal/vertical scaling; implement Tiered Storage for cost efficiency in large-scale setups.
Security & Compliance: Enforce security best practices (OAuth/Kerberos, auditing); manage access controls and ensure GDPR/HIPAA compliance across environments.
Automation & Documentation: Develop scripts (Python/Bash) for routine tasks like backups, rollouts, and failover; create runbooks and contribute to IaC pipelines (CI/CD with Jenkins/GitHub Actions).
Collaboration: Partner with data engineers, DevOps, and architects to support POCs, integrations (e.g., Flink/Spark), and environment migrations; provide 24/7 on-call support for prod incidents.

Required Qualifications & Skills

Bachelor's degree in Computer Science/Engineering; Confluent Certified Administrator for Apache Kafka (CCAAK) or equivalent is a plus.
5+ years in infrastructure ops; 3+ years hands-on with Confluent Platform & Apache Kafka (KRaft mode preferred).
Proven track record in deploying and managing Kafka clusters across multiple environments (dev/test/UAT/preprod/prod/DR); expertise in Ansible/Terraform/Helm for automated installs.
Proficiency in Bash/Python scripting; Kafka admin APIs, Connect, Streams, ksqlDB; Confluent CLI, REST Proxy.
Containerization (Docker/K8s); CI/CD (Jenkins/GitHub Actions); multi-cloud (AWS/GCP/Azure) with managed services like MSK.
Tools like Prometheus/Grafana/ELK; experience with upgrades, backups, DR (MirrorMaker/Replicator), and performance tuning.
SSL/TLS, SASL, ACLs/RBAC; auditing and compliance in distributed systems.
Strong troubleshooting, documentation, and communication; ability to handle on-call rotation.
Administrator • Islamabad, Federal Territory, PK