Refactor ETL

Consolidate and recover your BI-related technical debt: refactor ETL by moving it to Hadoop

To fully achieve the intended ROI from both Hadoop and 21st-century approaches to Big Data, a great start is to refactor all ETL so that it runs on Hadoop instead of elsewhere.


  • Identify go-forward ETL candidates and phases, and begin the design/refactoring architecture document phase

  • Document current ETL solution strategy and implementations

  • Hold collaboration and design meetings with data source staff and current ETL staff


    • Candidate Implementation phase

    • Parallel run phase

    • Production turn-over phase

    • Technical Debt recovery phase


In an Agile-driven phased approach, the client’s existing ETL is moved source by source to Hadoop, freeing staff and resources and reducing the complexity of the existing ETL solution space.  When completed, the identified ETL processes are Hadoop-based and able to participate fully in Big Data archive, extract, machine learning and corollary activities on Hadoop, as well as automated extraction to traditional BI layers if needed.
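
As an illustration of what the parallel run phase could check for each source, the sketch below reconciles the output of the existing ETL job against the output of its Hadoop-based replacement. It is a minimal example under stated assumptions, not a prescribed implementation: the function names (row_fingerprint, reconcile) and the sample rows are hypothetical, and both outputs are assumed to be already loaded as lists of tuples with the same column order and value types.

```python
import hashlib
from collections import Counter


def row_fingerprint(row):
    """Return a stable hash of one output row (a tuple of column values).

    Assumption: both systems emit the same column order and value types.
    repr() keeps None distinct from the string 'None'.
    """
    joined = "\x1f".join(repr(value) for value in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def reconcile(legacy_rows, hadoop_rows):
    """Compare two row sets as multisets, ignoring row order."""
    legacy = Counter(row_fingerprint(r) for r in legacy_rows)
    hadoop = Counter(row_fingerprint(r) for r in hadoop_rows)
    # Counter subtraction keeps only rows with a surplus on the left side.
    only_legacy = sum((legacy - hadoop).values())
    only_hadoop = sum((hadoop - legacy).values())
    return {
        "legacy_count": sum(legacy.values()),
        "hadoop_count": sum(hadoop.values()),
        "only_in_legacy": only_legacy,
        "only_in_hadoop": only_hadoop,
        "match": only_legacy == 0 and only_hadoop == 0,
    }


if __name__ == "__main__":
    # Hypothetical sample data standing in for one source's daily output.
    legacy_output = [(1, "alice", 10.5), (2, "bob", None)]
    hadoop_output = [(2, "bob", None), (1, "alice", 10.5)]
    print(reconcile(legacy_output, hadoop_output))
```

A source would be a candidate for production turn-over once a check of this kind reports a match over an agreed number of consecutive runs; on a real cluster the same comparison would more likely be expressed as a distributed job than as an in-memory script.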


  • Duration:  2 – 4 weeks per source
  • Staffing:  1 team per source [0.25 Sr. Architect, 0.25 Business Analyst, 1 Sr. Hadoop Data Process Engineer]
  • Cost:  $xx.xx - $xx.xx
  • Delivery Input:
    • Design/Collaboration meetings with ETL Staff and Data Source Staff/Vendor
    • Functioning Hadoop cluster with dependent tooling layers configured and in place
  • Delivery Artifacts:
    • Guided meetings with client and vendors
    • Solution Architecture Document [one per source]:
      • Approach Description and Goal Alignment Statement
      • Sequence Diagrams/Descriptions
      • Layer Collaboration Diagrams/Descriptions
      • Layer Elaboration/Configuration Diagrams/Descriptions
      • Job Flow and Recovery Design
      • Description of, and process for, producing downstream data artifacts
      • Example of generating and using an analytic or relational data artifact from a specific source data flow
    • Delivery artifact education and meetings at hand-off points