Open Source ETL with Talend (SCALE 9x)

Many companies use data integration tools to reconcile large data sets or extract them into a data warehouse for analysis. Talend is a leading company specializing in open source data integration solutions. Ross Turk, Senior Director of Community, explains a bit about what data integration is and how companies are using it to enrich existing data sets and squeeze more insight and information out of them.

One of the more common data integration practices is ETL, which is short for extract, transform, load. In plainspeak, it’s taking data from one place, changing it in some way, and putting it somewhere else. Almost everyone who runs a technical system does this at some point, whether they know it or not.
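As a rough illustration of those three steps, here is a minimal hand-coded sketch in Python. It is not how Talend implements anything; the CSV columns (`name`, `amount`), the cleanup rules, and the in-memory `warehouse` list are all made up for the example.

```python
import csv
import io


def extract(source):
    """Extract: read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(source))


def transform(rows):
    """Transform: tidy up names and convert amounts to integers."""
    return [
        {"name": row["name"].strip().title(), "amount": int(row["amount"])}
        for row in rows
    ]


def load(rows, target):
    """Load: append the cleaned rows to the destination (a plain list here)."""
    target.extend(rows)
    return len(rows)


# Hypothetical input; a real job would read from a file, API, or database.
source = io.StringIO("name,amount\n alice ,10\nBOB,32\n")
warehouse = []
loaded = load(transform(extract(source)), warehouse)
```

Each stage is a separate function, so the source or the destination can be swapped without touching the transformation in the middle.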

Talend Open Studio lets people visually design the flow of data across multiple systems. These flows can be batched, to extract data for reporting purposes, or event-driven, to integrate data between two running systems.

For example, a small company could create a job to copy data from Salesforce.com to an Excel spreadsheet so they can report on their pipeline. A web developer could pull a user IP out of a MySQL database, combine it with a geo database, and load the resulting city/state into another database (or the same database, even). I’m using it right now to extract traffic patterns from our Talend forums and study the community’s behavior.
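The web developer example might look something like this when coded by hand. This is only a sketch under stated assumptions: SQLite stands in for MySQL so the example is self-contained, the table names (`users`, `user_locations`) are invented, and the `GEO` dictionary is a fictional stand-in for a real geo database.

```python
import sqlite3

# Fictional geo data keyed by the first three octets of an IPv4 address.
# A real job would query an actual geo database instead.
GEO = {
    "203.0.113": ("Los Angeles", "CA"),
    "198.51.100": ("Austin", "TX"),
}


def locate(ip):
    """Transform step: map an IP address to a (city, state) pair."""
    prefix = ip.rsplit(".", 1)[0]
    return GEO.get(prefix, ("Unknown", "Unknown"))


def run_job(conn):
    """Extract user IPs, combine them with geo data, load the result."""
    rows = conn.execute("SELECT id, ip FROM users ORDER BY id").fetchall()
    enriched = [(user_id, *locate(ip)) for user_id, ip in rows]
    conn.executemany(
        "INSERT INTO user_locations (user_id, city, state) VALUES (?, ?, ?)",
        enriched,
    )
    conn.commit()
    return len(enriched)


# Hypothetical schema and sample rows, loaded into the same database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, ip TEXT)")
conn.execute("CREATE TABLE user_locations (user_id INTEGER, city TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "203.0.113.7"), (2, "198.51.100.42"), (3, "192.0.2.1")],
)
count = run_job(conn)
```

In a visual tool the same job would be drawn as an input component, a lookup, and an output component rather than written out like this.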

Large companies use it to establish data warehouses: data from a billion different sources, collected together in one place, structured so that it’s easy to write queries against.

When I was in engineering at SourceForge, we spent a really long time building a system to collect data from Apache logs, SVN, mailing lists, and our site database, slice it a billion different ways, and load it into a centralized statistics system. Talend Open Studio could have helped, if it had existed at the time.

Medium and large companies are probably well aware of the data integration industry, and ETL tools in particular. Small companies, though, mostly do this stuff with custom Perl, Python, or Ruby code because they don’t know tools exist, or they think the tools are too expensive. Talend Open Studio is free and open source, and I think it could be useful to a lot of people who aren’t familiar with this kind of tool.
