Recently I have come across three or four debates (both at work and on the web) about whether you need an ETL tool when you already have an ESB (or an EAI tool). The question keeps coming up because, if you look only at connectivity and transformation capabilities, it is nigh on impossible to tell them apart. (Update: there is a discussion on LinkedIn about this very topic.)
To my mind the key difference is the volume of data each is designed for. ETL tools lean towards high-volume, batch-oriented capabilities: job scheduling and management, and the ability to split a job into parallel streams through configuration rather than the custom code you would write in an ESB. They also know natively how to use the bulk-load facilities of the databases they typically work against (again, in an ESB you would likely have to code this yourself). Processes in the ETL space are often time-critical, but on a scale of minutes to hours rather than seconds (there was a slide on this at the recent Informatica 9 world tour).
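To make the contrast concrete, here is a minimal Python sketch, assuming a hypothetical `sales` table, a hypothetical `JOB_CONFIG` dictionary and an illustrative `transform` step (none of these come from any real product). `esb_style` handles one message at a time and commits each one. `etl_style` reads its parallelism and batch size from configuration, splits the input into streams, transforms them concurrently and bulk-loads in chunks. SQLite's `executemany` merely stands in for a database's bulk-load path; real ETL tools use vendor-specific loaders. Threads also stand in for parallel streams here, so Python's GIL limits any real speed-up.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Hypothetical job configuration: parallelism and batch size are set here,
# not hard-coded into the flow logic.
JOB_CONFIG = {"parallelism": 4, "batch_size": 500}


def transform(row):
    """Illustrative transformation: tidy the name and convert cents to a decimal amount."""
    customer_id, name, amount_cents = row
    return customer_id, name.strip().title(), amount_cents / 100


def esb_style(conn, messages):
    """Message-at-a-time: each record is transformed, written and committed on its own."""
    for msg in messages:
        conn.execute("INSERT INTO sales VALUES (?, ?, ?)", transform(msg))
        conn.commit()


def partition(rows, n):
    """Split the input into n roughly equal streams (round-robin)."""
    return [rows[i::n] for i in range(n)]


def etl_style(conn, rows, config=JOB_CONFIG):
    """Batch: transform the streams concurrently, then bulk-load in fixed-size chunks."""
    streams = partition(rows, config["parallelism"])
    with ThreadPoolExecutor(max_workers=config["parallelism"]) as pool:
        # Threads only transform; all database writes happen on the calling thread.
        transformed = list(pool.map(lambda s: [transform(r) for r in s], streams))
    size = config["batch_size"]
    for stream in transformed:
        for start in range(0, len(stream), size):
            # executemany stands in for a database's bulk-load facility.
            conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", stream[start:start + size])
    conn.commit()


def make_db():
    """Create an in-memory database holding the hypothetical sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer_id INTEGER, name TEXT, amount REAL)")
    return conn


if __name__ == "__main__":
    rows = [(i, f"  customer {i} ", i * 150) for i in range(10_000)]
    etl_conn, esb_conn = make_db(), make_db()
    etl_style(etl_conn, rows)
    esb_style(esb_conn, rows[:100])
    print(etl_conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0])
```

The point is not raw speed in Python. It is where the parallelism and bulk loading live: in configuration and in the tool's loader, rather than in flow logic you write and maintain yourself.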
There are probably a few more differences, but these should suffice for the purposes of this discussion.
Interestingly, in recent months there have been several announcements of data integration / ETL vendors adding real-time integration capabilities to their portfolios: Informatica with 29West, Oracle with GoldenGate, SAS with DataFlux and so on.
This leaves me wondering: what differentiates them from your garden-variety ESB? Why would I buy yet another tool for real-time integration just because it has the word ‘data’ in its name rather than ‘application’ or ‘service’?
But wait: just when you thought it was confusing enough, Informatica are heavily promoting the concept of “SOA-based data services” (complete with plenty of white papers and webinars by or with David Linthicum, for true SOA cred). These let you surface information from your warehouse directly in your operational systems without those systems needing to know where the data comes from. Oracle’s Data Service Integrator (formerly BEA Liquid Data) is similar.
The Ujuzi take? I haven’t figured this one out yet, but it does feel as though roughly three years from now we will see tools that can be applied to all of the above scenarios: the uber-integrator that can do service mediation, policy enforcement, transformation, maybe a bit of orchestration if you’re so inclined, some ETL, some data services, some real-time data shuffling and so on. There is simply too much overlap between these for it to make sense to have four or five different products doing very similar things. I want one modular product, with pluggable engines that you can bring to bear as required. One skillset to develop on it. One skillset to operate it.
What do you think?