Both of these approaches are fairly consistent regardless of the external data source. Whether you are using outside data to ensure customer address accuracy (for shipping products, for example), to apply foreign exchange rates (for localized pricing on e-commerce sites), to compute sales tax rates to complete online transactions, or for anything else that depends on an external data source, the process in both cases is generally the same.
The first option is to acquire, update, and maintain these external data sources yourself. This means you must acquire these datasets from vendors in their entirety, since in most cases you don't know which individual records you will need. You will then need to store and maintain this data, build the data access layer that applications will use to get the data, and perform ongoing data updates to keep it fresh and timely.
The other option is to "plug in" over the Internet to data sources that live out in the Cloud, typically via a Web services API built on XML-based SOAP or REST.
Here is a breakdown of what is required in both cases:
#1) Using traditional in-house managed datasets you must
- obtain/purchase the necessary data set
- extract it from its medium (CDs, reel tapes, FTP download)
- create a load/data mapping process
- create a data access mechanism to get the data into applications, a Web site, or a business process
- regularly update the data - daily, weekly, monthly, quarterly
- store the data within a database system
- obtain hardware resources for storing the data
- take any production system offline to refresh data when the updates occur (or else create a sophisticated data synchronization method)
- perform testing after a data refresh to make sure the data reload was done properly
- typically have a "data steward" to manage and perform these various tasks
or
#2) Using a Web Services API you must
- integrate the API once into an application, Web site, or business process with a couple of lines of code
This can often take a minute or two for experienced programmers. Less experienced programmers might need to refer to some code examples and get some assistance, which might take a few hours, or even a day.
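To give a sense of how small that integration step can be, here is a minimal Python sketch of a REST-style lookup for a currency rate. The endpoint `https://api.example.com/v1/rates`, its `base` and `quote` parameters, and the `{"rate": ...}` JSON response are all hypothetical stand-ins, not any particular vendor's API; a real provider's documentation will define its own URL, authentication, and response format.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint and response shape; substitute your vendor's real API.
API_URL = "https://api.example.com/v1/rates"


def build_rate_url(base: str, quote: str) -> str:
    """Build the request URL for a single currency pair."""
    return f"{API_URL}?{urlencode({'base': base, 'quote': quote})}"


def parse_rate(body: bytes) -> float:
    """Extract the rate from an assumed JSON body such as {"rate": 1.08}."""
    return float(json.loads(body)["rate"])


def fetch_rate(base: str, quote: str, open_url=urlopen) -> float:
    """Fetch the current rate over HTTP.

    open_url defaults to urllib's urlopen; it is a parameter only so the
    network call can be replaced with a stub when testing.
    """
    with open_url(build_rate_url(base, quote), timeout=10) as response:
        return parse_rate(response.read())
```

An application would then call something like `fetch_rate("USD", "EUR")` at the point where it needs the rate, with no local copy of the dataset to load or refresh.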
It should be clear which of the two approaches is much less complex, and also more time- and cost-effective. Because the Web Services API approach is generally transaction-oriented, you also pay only for what you use rather than for entire datasets as in the first case, which lets you "grow into" the use of external data sources rather than placing big, costly bets.
There are always scenarios where it does make sense to manage the data yourself, but these are generally few and far between. What I have found in conversations with organizations that still manage the data themselves is that, because of the arduous requirements of maintaining and managing external data sources, the data just doesn't get maintained, or at least is not updated as often as it should be. For example, I have had discussions with organizations that update currency rate tables only once a year. Those rates become inaccurate quite quickly, especially these days, and that could have serious repercussions.
Beyond the obvious cost and complexity analysis, the real difference is between outsourcing a task that is not part of your core business, which frees up resources to focus on what is, and trying to do everything yourself at the expense of that focus.
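The pay-per-use point can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names and every price in the usage note are made-up assumptions, and the dataset figure covers the licence alone, ignoring the storage, refresh, and staffing costs listed under option #1. Amounts are in whole cents to avoid floating-point rounding.

```python
def break_even_calls(dataset_cost_cents: int, price_per_call_cents: int) -> int:
    """Smallest number of API calls whose total cost reaches the flat dataset cost."""
    # Ceiling division on integers, so a partial call rounds up.
    return -(-dataset_cost_cents // price_per_call_cents)


def cheaper_option(calls: int, dataset_cost_cents: int, price_per_call_cents: int) -> str:
    """Return "api" while pay-per-use costs less than the flat dataset cost."""
    api_cost_cents = calls * price_per_call_cents
    return "api" if api_cost_cents < dataset_cost_cents else "dataset"
```

With a hypothetical $1,200 dataset licence (120,000 cents) and a hypothetical 2 cents per call, `break_even_calls(120_000, 2)` gives 60,000 calls per licence period; below that volume the API is the cheaper line item even before the in-house overhead is counted.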