After a few years of pushing into enterprise architecture I have taken a break and gone back to a solution/delivery-focussed role. This was a move made with some trepidation, but 5-6 weeks in I have to say it is good to be back in solution mode, working to create a practical rendition of someone else’s enterprise architecture vision. What better way to inject some reality into the PowerPoint-coloured glasses 🙂 Interestingly, even though I am officially “just” a delivery guy now, quite a few of the things I am having to do will have enterprise reach if successful. So watch this space for insights from “the other side”. Topics may include SOA in a COTS environment, data governance and possibly even some stuff on the middleware of the day (see my tag cloud for hints) if I manage to get close enough to it.
In recent times I have encountered 3 or 4 debates (both at work and on the web) on whether you need an ETL tool when you already have an ESB (or EAI tool?). The reason this comes up is that if you just look at the connectivity and transformation capabilities, it is nigh impossible to tell them apart. (Update – there is a discussion on LinkedIn about this very topic.)
To my mind the key point of difference is the volume of data they are designed for. ETL tools tend toward high-volume, batch-oriented capabilities such as job scheduling and management, as well as the ability to split jobs into parallel streams by configuration (rather than coding it in your ESB). They also have native intelligence to exploit the bulk-update abilities of the databases they typically work against (again, you’d likely have to code this into your ESB). Processes in the ETL space are often time-critical, but in the range of minutes to hours rather than seconds (there was a slide on this at the recent Informatica 9 world tour – todo:add link).
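To make the bulk-update point concrete, here is a rough, runnable sketch – not any vendor’s API, just an illustration using Python’s built-in sqlite3 as a stand-in for a real database. An ESB flow typically handles one message (row) at a time; an ETL engine batches rows and drives the database’s bulk path:

```python
import sqlite3
import time

# Hypothetical sample data standing in for a feed of 50,000 records.
rows = [(i, f"customer-{i}") for i in range(50_000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")

# Row-at-a-time, roughly what a message-oriented ESB flow does:
# one insert per message passing through the bus.
start = time.perf_counter()
for row in rows:
    conn.execute("INSERT INTO customers VALUES (?, ?)", row)
conn.commit()
row_at_a_time = time.perf_counter() - start

conn.execute("DELETE FROM customers")
conn.commit()

# Batched, which is what an ETL engine gives you out of the box
# (real ETL tools go further with native bulk loaders).
start = time.perf_counter()
conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
conn.commit()
batched = time.perf_counter() - start

print(f"row-at-a-time: {row_at_a_time:.3f}s, batched: {batched:.3f}s")
```

Even in this toy setup the batched path wins comfortably; against a real warehouse, where a native bulk loader bypasses row-level logging altogether, the gap is far wider – which is exactly the capability you would otherwise have to hand-build into an ESB.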
There are probably a few more reasons but the above should suffice for the purpose of this discussion.
Interestingly, in recent months there have been a few announcements of data integration / ETL-type vendors adding real-time integration capabilities to their portfolios. Informatica with 29West, Oracle with GoldenGate, SAS with DataFlux and so on.
This leaves me wondering – what differentiates them from your garden-variety ESB? Why would I buy yet another tool for real-time integration just because it has the word ‘data’ rather than ‘application’ or ‘service’?
But wait, just when you thought it was confusing enough: Informatica are heavily touting the concept of “SOA-based data services” (complete with lots of white papers and webinars by/with David Linthicum for true SOA cred) that let you surface information from your warehouse directly into your operational systems, without those systems needing to know where the data comes from. Oracle’s Data Service Integrator (formerly BEA Liquid Data) is similar.
The Ujuzi take? I haven’t figured this one out yet, but it does feel like, three years or so from now, we will see tools that can be applied to all of the above scenarios – the uber-integrator that can do service mediation, policy enforcement, transformation, maybe a bit of orchestration if you’re that way inclined, some ETL, some data services, some real-time data shuffling and so on. There is just too much commonality between these for it to make sense to have 4-5 different products that do very similar things. I want one modular product, with pluggable engines that you can bring to bear as required. One skillset to develop on it. One skillset to operate it.
What do you think?
Welcome – this blog is about sharing and growing our ujuzi:
- our experience – over 30 years in total in the domains of business process improvement and using technology for business advantage
- our knowledge and expertise – accumulated over time, and ever growing
- our skill and technique – how to put what we know into action
We don’t claim to know it all – far from it – this blog is more about an opportunity to get some good old 360-degree review on what we think we do know!