Realtime_Data_Streaming_Pipeline

  • Orchestrated an end-to-end real-time data pipeline with Airflow, Kafka, Spark, Cassandra, and Postgres, automating API ingestion and moving data through each stage in a fully containerized Docker Compose setup.
  • Streamed and buffered continuous user data with Kafka and ZooKeeper, decoupling ingestion from processing and improving reliability, fault tolerance, and replayability.
  • Processed streaming records with Apache Spark in a master-worker architecture and persisted outputs to Cassandra, enabling low-latency distributed storage for high-volume events.
  • Measured sub-500 ms end-to-end latency and sustained throughput of 10,000+ messages per second in testing, demonstrating a production-ready pipeline for scalable real-time analytics.
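
The containerized setup described above can be sketched as a minimal Docker Compose fragment. This is an illustrative configuration, not the project's actual file: the Confluent images, port mappings, and environment values shown are common defaults and are assumptions here.

```yaml
# Minimal sketch of the Kafka + ZooKeeper portion of the stack.
# Image names and environment values are assumed defaults, not the
# project's actual configuration.
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.4.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181

  kafka:
    image: confluentinc/cp-kafka:7.4.0
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
```

The Airflow, Spark, and Cassandra services would be declared alongside these in the same file, which is what lets the whole pipeline start with a single `docker compose up`.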
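
The ingestion stage above (API data published into Kafka for downstream Spark processing) can be sketched in Python. This is a minimal illustration assuming the `kafka-python` client; the topic name `users_created`, the broker address, and the field names in `shape_user_event` are hypothetical, not taken from the project.

```python
import json
from datetime import datetime, timezone


def shape_user_event(raw: dict) -> bytes:
    """Normalize a raw API record into the JSON payload sent to Kafka.

    Field names here are illustrative; a real pipeline would match
    the upstream API's schema.
    """
    event = {
        "user_id": raw["id"],
        "name": raw.get("name", "unknown"),
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(event).encode("utf-8")


def publish_events(events, topic="users_created", bootstrap="localhost:9092"):
    """Publish shaped events to Kafka; requires a running broker."""
    from kafka import KafkaProducer  # kafka-python client, assumed

    producer = KafkaProducer(bootstrap_servers=bootstrap)
    for raw in events:
        producer.send(topic, shape_user_event(raw))
    producer.flush()  # block until all buffered records are sent
```

An Airflow task would typically call `publish_events` on a schedule, which is how ingestion stays decoupled from the Spark consumers reading the same topic.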