Refactor ETL

To achieve the intended ROI from Hadoop and modern Big Data approaches, a strong first step is refactoring all ETL so that it is done on Hadoop rather than elsewhere.


  • Identify go-forward ETL candidates/phases and begin the design/refactoring architecture document phase

  • Document the current ETL solution strategy and implementations

  • Hold collaboration and design meetings with data source staff and current ETL staff


    • Candidate implementation phase

    • Parallel run phase

    • Production turn-over phase

    • Technical debt recovery phase
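
The parallel run phase above is the gate before production turn-over: the legacy ETL and the refactored Hadoop-based job process the same source data side by side, and turn-over proceeds only when their outputs match. A minimal sketch of that parity check, with hypothetical pipeline functions and record layout standing in for the client's actual jobs:

```python
# Hypothetical parallel-run parity check. The function names (legacy_etl,
# hadoop_etl) and the record layout are illustrative stand-ins, not the
# client's real pipelines.

def legacy_etl(records):
    """Stand-in for the existing ETL: trim and upper-case names,
    drop rows with an empty name, cast amounts to float."""
    out = []
    for rec in records:
        name = rec.get("name", "").strip()
        if name:
            out.append({"name": name.upper(), "amount": float(rec["amount"])})
    return out

def hadoop_etl(records):
    """Stand-in for the refactored Hadoop-based job (e.g. a Spark or
    Hive transform) that must reproduce the legacy output exactly."""
    return [
        {"name": rec["name"].strip().upper(), "amount": float(rec["amount"])}
        for rec in records
        if rec.get("name", "").strip()
    ]

def parallel_run_diff(source):
    """Return (legacy, new) pairs that disagree; an empty list plus
    equal row counts means parity, so turn-over can proceed."""
    old, new = legacy_etl(source), hadoop_etl(source)
    if len(old) != len(new):
        return [("row-count mismatch", len(old), len(new))]
    return [pair for pair in zip(old, new) if pair[0] != pair[1]]

source = [
    {"name": " acme ", "amount": "10.5"},
    {"name": "", "amount": "3"},
    {"name": "Globex", "amount": "7"},
]

print(parallel_run_diff(source))  # empty list: pipelines agree
```

In practice the comparison would run over full production extracts (or hashed partitions of them) for an agreed soak period, and any non-empty diff sends the candidate back to the implementation phase.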


In an Agile-driven phased approach, the client's existing ETL is moved, source by source, to Hadoop, freeing staff and resources and reducing the complexity of the existing ETL solution space. When completed, the identified ETL processes are Hadoop-based and able to fully participate in Big Data archive, extract, machine learning, and correlation activities on Hadoop, as well as automated extraction to traditional BI layers if needed.

