Handling data is one of the most complicated and critical functions of an application. One of the key challenges is the heterogeneity of data formats: legacy data, data available from various data providers, and so on. ETL (Extract, Transform, Load) promises to resolve this heterogeneity. In this research project we analyze the ETL process and present a solution for continuous integration of heterogeneous data in an application.
The information and services available to applications extend beyond classical database and file-system access. Data also arrives through web syndication, web services, web APIs, legacy application interaction, and similar channels. These contribute to the enormous volume of data, and the numerous formats, that an application must handle. To facilitate the process, we present a framework that enables applications to consume data in various formats and convert it into one homogeneous format.
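The conversion step above can be sketched as a small normalization function. This is only an illustration of the idea, not the framework's actual API: the function name, the supported formats, and the choice of plain dictionaries as the homogeneous target format are all assumptions.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def normalize(payload: str, fmt: str) -> list[dict]:
    """Convert a raw payload in a known format into a list of plain dicts,
    so downstream code sees one homogeneous record shape."""
    if fmt == "json":
        data = json.loads(payload)
        return data if isinstance(data, list) else [data]
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(payload)))
    if fmt == "xml":
        # Assumes a flat <rows><row>...</row></rows> layout for simplicity.
        root = ET.fromstring(payload)
        return [{child.tag: child.text for child in item} for item in root]
    raise ValueError(f"unsupported format: {fmt}")


# Three sources, three formats, one common record shape:
records = normalize('[{"id": "1", "name": "Ada"}]', "json")
records += normalize("id,name\n2,Grace\n", "csv")
records += normalize("<rows><row><id>3</id><name>Alan</name></row></rows>", "xml")
```

After normalization, every record is a plain dictionary regardless of its original format, which is what makes the later aggregation step possible.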
Applications in various industries gather a high volume of data at a very high rate. Collecting that data from various sources is a complex task: the data is not always available as a standard database stream, but may come from file systems, databases, feeds, web services, interaction with legacy systems, and so on. In this section we discuss the data operations that occur when collecting data from these sources.
The first key factor is data loading. The purpose of this project is to provide a framework to integrate data, not to load it. A separate research project we are currently working on, Continuous Data Sources, covers data source definitions and data loading from various sources. For the purpose of this project, we assume the application already has the ability to load data from various sources.
For the application to collect data from various sources and then use it, the data must be aggregated; the application cannot work with independent pieces of data from each source.
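A minimal sketch of this aggregation step, assuming records from different sources share a common key field (the `aggregate` name, the `id` key, and the merge-by-update strategy are illustrative assumptions, not the framework's defined behavior):

```python
from collections import defaultdict


def aggregate(*sources: list[dict], key: str = "id") -> list[dict]:
    """Merge records that share the same key across sources into one
    combined record, so the application sees unified data."""
    merged: defaultdict[object, dict] = defaultdict(dict)
    for source in sources:
        for record in source:
            merged[record[key]].update(record)
    return list(merged.values())


# Two independent sources describing overlapping entities:
crm = [{"id": 1, "name": "Ada"}]
billing = [{"id": 1, "balance": 42.0}, {"id": 2, "balance": 7.5}]
combined = aggregate(crm, billing)
```

Here later sources win on conflicting fields; a real integration layer would need an explicit conflict-resolution policy.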
The solution we provide is part of an end-to-end solution for handling huge volumes of diversified data from numerous sources. In this project we focus only on the data integration part. The solution can either be implemented and used standalone for data integration, or used in conjunction with the other parts of the end-to-end solution. We are still in the process of updating our web pages with the end-to-end solution; you can subscribe here to be notified when we update our site.
Our framework consists of two parts:
1. Data provider - the source through which the data is made available to the application
2. Framework Library
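To make the two-part split concrete, the data provider side can be sketched as a small interface that the framework library consumes. Everything here is hypothetical (the class names `DataProvider` and `InMemoryProvider`, and the `fetch` method), intended only to show the separation of concerns, not the framework's actual API:

```python
from abc import ABC, abstractmethod
from typing import Iterable


class DataProvider(ABC):
    """Part 1: a source through which data is made available
    to the application."""

    @abstractmethod
    def fetch(self) -> Iterable[dict]:
        """Yield raw records from the underlying source."""


class InMemoryProvider(DataProvider):
    """A trivial provider backed by a list, useful for testing; real
    providers would wrap a file, database, feed, or web service."""

    def __init__(self, records: list[dict]):
        self._records = records

    def fetch(self) -> Iterable[dict]:
        return iter(self._records)


# Part 2, the framework library, would accept any DataProvider:
provider = InMemoryProvider([{"id": 1}])
rows = list(provider.fetch())
```

Keeping providers behind a single interface is what lets the framework library treat file systems, feeds, web services, and legacy systems uniformly.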