Emerging Tech - Data Story: What has Changed?

Published:

Data Story: What has Changed?

  • Business Intelligence -> Big Data
  • Traditional Data Warehouse -> Data Lake
  • Applications -> Microservices
  • ETL -> ETL is next on list of obsolete terms
  • Rise of Spark -> Spark provides an ideal middleware framework for writing code that gets the job done fast, reliable, readable.

The Future of ETL Tools?

  • Only develop connectors for integration?
  • Re-build entire backend to Hadoop/Spark/Flink?
  • Provide a GUI with code genration?

What is ETL Hell?

  • Data getting out of sync
  • Performance issues
  • Waste of server resources (peak performance)
  • Plain text codes in hidden stages
  • Click, click, click, click (RSI danger!)
  • CSV files are not type safe
  • All-or-nothing approach in batch jobs
  • Legacy code

Is NoETL the future?

Why NoETL? ETL is an intermediary step, and at each ETL step you can introduce errors and risk:

  • ETL can lose data
  • ETL can duplicate data after failover
  • ETL tools can cost millions of dollars
  • ETL decreases throughput
  • ETL increases the complexity of the pipeline
    source: noetl.org

Key Takeaways

  • Pick one: ETL, ELTm ELTL,..
  • Treat all data equal: batch and stream
  • Continuous ETL: don’t wait for a phase to complete
  • Don’t just transform: enrich, alert, predict
  • Build for scale: distribute data nad logic
  • Automate everything
  • Think about NoETL: each copy is a risk!