Businesses increasingly depend on immediate access to diverse, real-time data. We set out to build a robust PostgreSQL-based data lake, using Qlik Replicate to handle high-velocity change data with low latency and high throughput. This post describes how we built it and the impact it had on our organization.

Our primary goal was to establish a centralized data repository that could ingest and process real-time data from various sources using Qlik Replicate. The solution we implemented involved a streamlined pipeline:

[Figure: real-time data pipeline]
  • Diverse Sources: We configured Qlik Replicate to capture change data from Oracle, MSSQL, and MySQL databases.
  • Real-Time Streaming: The captured data was streamed in real time to Apache Kafka, a distributed event streaming platform.
  • Kafka's Role: Kafka acted as a critical buffer, providing low-latency, reliable message queuing for high-volume data streams.
  • Unified Data Lake: The processed data was then ingested into PostgreSQL, creating a unified and easily accessible data lake for our business teams.
  • Issue: While a Qlik Replicate task was continuously capturing and applying changes, we could not get a daily count of the changes captured from the source and the changes applied to the target.
  • Fix: We developed a framework that tracks both the changes captured from the source and the changes applied to the target. With it, we can audit the two sets of counts against each other every day, which gives us confidence in data integrity and makes the pipeline's behavior transparent.
  • Near-Zero Latency: We achieved near real-time data processing through optimized hardware and network configurations.
  • High Throughput: Our efficient infrastructure design allowed us to handle large volumes of data seamlessly.
  • Versatile Data Support: The solution supports both transactional and non-transactional data, catering to a wide range of business needs.
  • High Availability: Implementing Qlik Replicate in an Active/Passive setup with shared storage ensured continuous pipeline operation, even during outages or planned maintenance.
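
The daily audit described under "Fix" can be illustrated with a minimal sketch. It is not our framework's code and does not use any Qlik Replicate or Kafka API: the `ChangeEvent` record, its fields, and the `daily_counts` and `reconcile` helpers are hypothetical names chosen for illustration. The sketch assumes that captured and applied changes are each available as a list of records carrying a table name, an operation type, and a timezone-aware commit timestamp, and it reports every day, table, and operation for which the two counts differ.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change record; not a Qlik Replicate or Kafka message format."""
    table: str
    operation: str          # "INSERT", "UPDATE", or "DELETE"
    committed_at: datetime  # assumed to be timezone-aware


def daily_counts(events):
    """Count events per (UTC date, table, operation)."""
    counts = Counter()
    for event in events:
        day = event.committed_at.astimezone(timezone.utc).date()
        counts[(day, event.table, event.operation)] += 1
    return counts


def reconcile(captured, applied):
    """Return {key: (captured_count, applied_count)} for keys whose counts differ."""
    src = daily_counts(captured)
    tgt = daily_counts(applied)
    return {
        key: (src[key], tgt[key])  # a Counter yields 0 for a missing key
        for key in sorted(src.keys() | tgt.keys())
        if src[key] != tgt[key]
    }


if __name__ == "__main__":
    ts = datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc)
    captured = [
        ChangeEvent("orders", "INSERT", ts),
        ChangeEvent("orders", "INSERT", ts),
        ChangeEvent("orders", "UPDATE", ts),
    ]
    applied = [
        ChangeEvent("orders", "INSERT", ts),
        ChangeEvent("orders", "UPDATE", ts),
    ]
    # Print one line per mismatch between source and target counts.
    for (day, table, op), (n_src, n_tgt) in reconcile(captured, applied).items():
        print(f"{day} {table} {op}: captured={n_src} applied={n_tgt}")
```

An empty result means the counts agree for every day, table, and operation. The sketch groups both sides by the same commit timestamp; if the target side were grouped by apply time instead, changes applied just after midnight would land on a different day than their capture and would need separate handling.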