Some open-source gotchas

The increasing use of open source is a reality in our environments. Regardless of what you have defined as your standard operating environment (SOE), your technical teams are constantly playing with new tools and coming up with new ideas – and hopefully you encourage this. Every commercial vendor worth their salt embraces open source in some form or other, whether as a rubber-stamp exercise or as part of their portfolio offering. IBM use Geronimo to lower the barrier to entry to the WebSphere brand, and Oracle look set to do the same with GlassFish as an entry point to the WebLogic brand (yay!).

This is all fine for “commodity” capabilities such as app servers and operating systems (!), but what about the more specialised niches where standards are not so mature, such as ESBs and data integration tools? Well, open source is a great way to get familiar with the sorts of capabilities you should be looking for when you do get to commercial vendor selection, but equally the open source offerings have matured beyond the point where they are just a stepping stone to the real thing. There are quite a few decent lightweight ESBs (Mule, WSO2 etc.) and data integration tools (e.g. Pentaho Kettle) out there that, quite frankly, will address 80% of the functional use cases you might throw at them, and hopefully the non-functional ones too.

Here are a few things to look out for as you boldly embrace open source, either as an architectural stepping stone or as a core element of your IT environment:

  1. Pick a popular tool – there must be a good reason that people like to use it (and it won’t be because it has been mandated from on high to ensure that the maximum return is squeezed out of an over-priced commercial offering)
  2. Don’t try to support yourself – Even if the tool is open-source, rely on the community to fix bugs, and avoid deploying fixes you have built yourself as you will effectively have your own fork of the code if your fixes don’t make it into the mainstream.  Maintaining this discipline will also allow you to take up commercial support when the tool is a roaring success and becomes mission-critical.
  3. Don’t hack the tool – Just because you have the source code doesn’t mean you should customise the tool to do things it wasn’t designed to do. Why? Because when the tool fails because of a defect in your code, the tool itself will get the blame and the baby will be thrown out with the bathwater. On the other hand, if you are particularly keen on a commercial offering, by all means do take an open source tool, hack it to make it break and then use this as an excuse to get management to fork out for a commercial offering because open source sucks!
  4. Skills – Unfortunately, in an increasingly outsourced world where projects can no longer be delivered fully using in-house capability, open source presents another challenge. Outsource vendors tend to invest in the mainstream. This is natural, because that is where they are guaranteed to get the volume of work. It can be a bit of a blocker for open source, though: the more niche the technology you want to use, the less likely the outsourcer will be able to provide people to build with it, and if you contract them on a fixed-price basis to build it, you’d better have a good reference architecture to govern their deliverable! This isn’t only a problem with outsourcing. Niche technologies also raise the bar on internal hiring.
  5. Non-functionals – If you manage to steer through the above issues (and others I haven’t thought of today), your project will probably be a success, which means it will quickly be promoted to being a mission-critical part of the enterprise. Prepare for that early: follow your basic functional proof of concept with a few non-functional scenarios to test how the tool handles failover, load balancing and suchlike.
  6. Familiarise yourself with the product roadmap – particularly for major releases. In the open source world, decisions are sometimes made that favour architectural purity (or some other greater good) over basic functionality. Case in point – GlassFish 2.1 has clustering support but 3.0 doesn’t! (This is a personal annoyance for me at present – luckily my work with it is still at the exploratory stage.)

Finally, don’t forget that there will be use cases where a commercially supported product is better than “plain old” open source. Depending on your need, this could be open source with a commercial support offering, or a fully commercial product – just keep an open mind and choose based on what you really need.