Lead/Architect-level role to help build out a Kafka platform, with a specific focus on Flink. The candidate should be able to design, propose, and evaluate solutions, and should also be hands-on with development and lead the Kafka development team.
Description:
- Expert-level experience architecting and implementing Flink applications on Confluent Platform, specifically for high-volume, low-latency stream processing.
- Extensive experience architecting, implementing, and administering the Confluent Cloud Kafka and Flink platform in production environments.
- Advanced proficiency in core Flink concepts including state management (Keyed/Operator State, RocksDB), Exactly-Once semantics, and configuring checkpointing and savepoints for fault tolerance.
- Deep knowledge of Event Time processing, Watermarks (Bounded Out-of-Orderness), and complex Windowing (Tumbling, Sliding, Session) for accurate stream analytics.
- Advanced knowledge of ksqlDB and Kafka Streams for rapid development of real-time stream processing/analytics alongside Flink.
- Proven proficiency in Kafka Connectors (including Change Data Capture/CDC) from configuration to end-to-end integration in cloud environments.
- Demonstrated experience applying Flink and Kafka in the Retail Industry for use cases such as real-time inventory management, dynamic pricing, fraud detection, and personalized customer experience (e.g., clickstream analysis).
- Strong background in platform governance: schema registry, RBAC, audit logging, retention, and compliance.
- Deep expertise with Terraform and the Confluent Terraform provider; adherence to Infrastructure-as-Code (IaC) methodology and automation.
- Practical experience designing and managing Harness CI/CD pipelines (or similar tools) for automated deployment and configuration management of Flink jobs.
- Advanced knowledge of GCP networking, including Private Service Connect (PSC), DNS, Firewalls, and enterprise security.
- Track record in implementing cloud-native monitoring and observability solutions; troubleshooting, Flink performance tuning, and incident response.
- Thorough experience with Disaster Recovery (DR), High Availability (HA) strategies, backup/restore, and multi-region design.
- Practical experience with cost optimization, resource monitoring, and right-sizing specifically for Flink and Kafka resources in Confluent Cloud.
- Strong abilities in schema management, version compatibility, and data governance.
- Demonstrated capability in capacity planning, partitioning, and scaling high-throughput streaming architectures.
- Experienced in Agile/DevOps methodologies.
- Experience providing hands-on production support for mission-critical streaming platforms.