How to Integrate With the Cloud

When businesses decide to go to the cloud for an enterprise application and open an account with a SaaS (software as a service) provider, they typically don't consider how that SaaS app will integrate with their existing software.

But integration is crucial. By now, every business understands you can't have multiple applications operating on different versions of the same customer record, for example, without those versions being updated and reconciled.

Without a solid integration strategy, data quality quickly becomes a problem. You don't want users of a new SaaS system to be hindered by having to enter data twice -- or worse, by not having the correct data available when a core business process requires it.

So how do enterprises that adopt SaaS applications develop an effective approach to integration? As always, the process begins with business requirements. The good news is that new, innovative integration technologies offer cost efficiencies unavailable just a few years ago -- although in some cases, requirements dictate that you opt for an old-school integration solution.

With SaaS, latency will be more of an issue, and impoverished APIs may limit integration benefits. In general, integration of SaaS apps is restricted to data integration and asynchronous process integration, ruling out the closely coupled application clusters some enterprises depend on. Within these constraints, how far you decide to push integration with SaaS apps depends on your business needs.

Making SaaS play nice with data

The beauty -- and the downside -- of SaaS is that the businesspeople don't need IT to establish accounts and to get up and running. IT has less work to do in the short term. But without integration, SaaS silos spring up, resulting in duplicate data, inaccurate reports, and ultimately, damaging data discrepancies.

Integration technology allows clouds and core enterprise systems to share data while dealing with the different ways that the data is structured. This is accomplished through data mediation subsystems that manage the underlying differences in both structure and content in flight. With SaaS in particular, you need a flexible integration solution, because both the source and target system interfaces change more frequently than those presented by traditional enterprise software.

Back in the '90s, integration technology was immature and expensive. These days, you can find lightweight open source integration solutions, such as that provided by Jitterbit, or cloud-delivered integration offered by the likes of Boomi (now a part of Dell) or Pervasive Software. Even integration appliances have emerged, such as that offered by Cast Iron Systems (now a part of IBM).

This is on top of the fifth- or sixth-generation, enterprise-class integration solutions sold by IBM, Informatica, Oracle, Software AG, and other established players that have been around for years.

So how do you choose the right solution from the dozens available? It helps to start by understanding typical integration patterns and the features they demand.

The fundamentals of integration

There are several ways to move data from one system to another, some more sophisticated than others. For example, many enterprises still rely on the primitive FTP method to transfer data -- even when integrating newfangled SaaS with local applications.

The typical way to accomplish this is to lay down data from the source systems into a file once a day. Next, transfer that file from the cloud provider to the enterprise server and load the data into the target application or database. While this may seem reasonable, no mechanisms deal with the differences in data structure or content. Also, the transfer can happen once or twice a day at most, so data latency is an issue. Finally, failures may leave the source or target systems with bad or inaccurate data. Although FTP seems like the simplest approach, it's never the right one.

In the same vein, some organizations opt to build integration technology themselves, in effect coding an integration server from scratch. While this keeps developers busy, the results are almost always ineffective and inefficient. Now that such a broad range of integration solutions is available and affordable, there's no excuse to go down the path of ground-up custom coding.

That leaves you with commercially available and open source solutions, which vary widely. Navigating the technology requires some basic understanding, including the concepts of semantic mediation, connectivity, validation, and routing.

Semantic mediation (also known as data transformation) is the process of dealing with the differences in data structures or data semantics as data moves from a source system to a target -- say, from a SaaS application to an SAP target system. The structures and data content are changed in flight while moving from source to target, such as First_Name (char 20) to F_Name (char 10). Data is sent to the target using the target's native structure, even though the structure consumed from the source is foreign to it.

The links between source and target structures are typically set up using maps that chart the structure from the source schema to the target schema. Within most integration engines, this is a visual, drag-and-drop process. Structures can be mediated in a matter of minutes, and information can flow between two very different data structures.
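
To make the idea of a map concrete, here is a minimal sketch in Python of what a mapping boils down to once the drag-and-drop step is done. The FIELD_MAP table, the mediate function, and the Last_Name/L_Name pair are hypothetical names invented for illustration; only the First_Name (char 20) to F_Name (char 10) pair comes from the example above. Truncating to the target width is one possible policy, not the only one.

```python
# Hypothetical field map: source field -> (target field, max length in target).
FIELD_MAP = {
    "First_Name": ("F_Name", 10),
    "Last_Name": ("L_Name", 10),
}


def mediate(source_record, field_map=FIELD_MAP):
    """Return a new record in the target structure.

    Values are coerced to strings and truncated to the target width;
    source fields that have no mapping are dropped.
    """
    target_record = {}
    for source_field, (target_field, max_len) in field_map.items():
        if source_field in source_record:
            value = str(source_record[source_field])
            target_record[target_field] = value[:max_len]
    return target_record


source = {"First_Name": "Bartholomew-James", "Last_Name": "Smith", "Notes": "VIP"}
print(mediate(source))  # {'F_Name': 'Bartholome', 'L_Name': 'Smith'}
```

Note that squeezing a 20-character field into 10 characters loses data. A production mapping would more likely flag or reject over-length values than silently cut them.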

Connectivity is the ability of the integration technology to adapt to the interfaces provided by the cloud or enterprise-based systems -- typically, APIs. Adapters account for the differences in the interfaces and the way the integration technology deals with the data. For example, you invoke a Web service that produces data bound to a structure, and the adapter consumes that data into the integration engine, where it is manipulated as required -- and then sent out through another adapter to a local application, such as an ERP or an inventory control system.
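
The adapter pattern can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the class names, the read/write methods, and the JSON payload shape are hypothetical and do not come from any particular product. The source adapter takes a canned payload rather than calling a real Web service, so the sketch runs without a network.

```python
import json


class WebServiceSourceAdapter:
    """Stand-in for an adapter that consumes a SaaS Web service response.

    A real adapter would make an HTTP call; here the payload is passed in.
    """

    def __init__(self, json_payload):
        self._json_payload = json_payload

    def read(self):
        # Convert the provider's wire format into plain dicts for the engine.
        return json.loads(self._json_payload)["records"]


class InMemoryTargetAdapter:
    """Stand-in for an adapter that writes to a local ERP or inventory system."""

    def __init__(self):
        self.written = []

    def write(self, record):
        self.written.append(record)


def run_integration(source, target, manipulate):
    """Pull records in through one adapter, manipulate them, push them out another."""
    count = 0
    for record in source.read():
        target.write(manipulate(record))
        count += 1
    return count


payload = '{"records": [{"sku": "a-100", "qty": "3"}]}'
source = WebServiceSourceAdapter(payload)
target = InMemoryTargetAdapter()
run_integration(source, target,
                lambda r: {"SKU": r["sku"].upper(), "QTY": int(r["qty"])})
print(target.written)  # [{'SKU': 'A-100', 'QTY': 3}]
```

Because the engine only sees read and write, swapping in a different cloud provider or a different local application means replacing one adapter, not the whole flow.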

Validation is the ability of an integration server to check data against rules, such as making sure a ZIP code is correct. Routing is the ability to make sure the right data ends up in the right system.
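
As a rough illustration, validation and routing can each be reduced to a small function. The sketch below assumes US-style ZIP codes, and the ROUTES table, the destination names, and the "rejects" queue are hypothetical. The check is a format check only; confirming that a ZIP code really exists would require a lookup against postal data.

```python
import re

# Assumption: US five-digit ZIP codes with an optional four-digit extension.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def is_valid(record):
    """Format check only: it cannot tell whether the ZIP code actually exists."""
    return ZIP_PATTERN.fullmatch(str(record.get("zip", ""))) is not None


# Hypothetical routing table: record type -> name of the destination system.
ROUTES = {"customer": "crm", "order": "erp"}


def route(record):
    """Pick a destination; invalid or unknown records go to a reject queue."""
    if not is_valid(record):
        return "rejects"
    return ROUTES.get(record.get("type"), "rejects")


print(route({"type": "customer", "zip": "94107"}))    # crm
print(route({"type": "order", "zip": "94107-1234"}))  # erp
print(route({"type": "customer", "zip": "9410"}))     # rejects
```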

The way this technology works is rather simple: It reacts to events, such as a customer record being updated or a sale being recorded. In reacting to the event, it carries out some preprogrammed function, such as extracting the changed data from the local enterprise system, accounting for the differences in structure and content, and updating the remote cloud-based system with the changed data, typically in less than a second. These events can occur at a rate of hundreds or thousands a minute, or just a few per day.
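
The event-driven flow described above can be sketched as a tiny in-process dispatcher. This is an illustration under stated assumptions, not how any particular integration server is built: EventBus, sync_customer, the event name, and the payload shape are all hypothetical, and a plain dictionary stands in for the remote cloud-based system.

```python
from collections import defaultdict


class EventBus:
    """Minimal in-process dispatcher: event name -> list of handler functions."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_name, handler):
        self._handlers[event_name].append(handler)

    def publish(self, event_name, payload):
        for handler in self._handlers[event_name]:
            handler(payload)


# Stand-in for the remote cloud-based system, keyed by customer id.
cloud_customers = {}


def sync_customer(event):
    """React to a customer update: extract changes, rename fields, push them."""
    changed = event["changed_fields"]  # extract only the changed data
    # Account for the structural difference between source and target.
    renamed = {("F_Name" if key == "First_Name" else key): value
               for key, value in changed.items()}
    cloud_customers.setdefault(event["customer_id"], {}).update(renamed)


bus = EventBus()
bus.subscribe("customer_updated", sync_customer)
bus.publish("customer_updated",
            {"customer_id": 42, "changed_fields": {"First_Name": "Ada"}})
print(cloud_customers)  # {42: {'F_Name': 'Ada'}}
```

Only the changed fields travel with each event, which is what keeps per-event work small enough to finish quickly even at high event rates.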
