Home‎ > ‎Database‎ > ‎

Extraction transformation and loading


            Extraction transformation and loading is a process referred to as “ETL.”  The ETL process consists of a set of three steps. These steps include extracting data from internal and external sources, transforming the data to meet certain needs, and loading the data into a data warehouse. It is a process that supports business activities and decision making.

            Extraction is the first step in the ETL process. This step involves extracting the data from the sources. Only the data that can be useful for business activities and decision making should be extracted. Sometimes useless data can be extracted but the data the data will be checked for relevance at a later step in ETL. The data can come from many different sources and it may also come in many different formats.  A common data source formats are relational databases and flat files. The extraction of data with different formats is difficult so extraction must be able to understand data with different formats.

            Transformation is the second step in  the ETL process. It sets various rules for the extracted data. The rules make all the data be transformed into the same format. Sometimes the data will require little or no transformation. In the transformation step the data is also checked for relevance to business activities and decision making. Transformation activities can include merging data from different sources, encoding, sorting, filtering, data cleansing, and transposing. Once all of the data has a single format and it has been checked for relevance transformation is done and the data is ready for the loading process of ETL.

            Loading is the third and final step in the ETL process. It takes the transformed data and loads it into a data warehouse. Depending on how the data warehouse is going to be used the loading process can widely vary. Some data warehouses can overwrite existing data or others can add new data to the existing data.  After the loading step, ETL is complete. The data can then be used for various things like business activities and decision making.

            There are several challenges with ETL.  A challenge with ETL is that it is a time consuming process. It is difficult to save time. The step that takes the longest in ETL is loading the data to the data warehouse. Another challenge with ETL is that the steps are dependant of each other. Transformation cannot be done if extraction has not been done and loading cannot be done if transformation has not been done. Even though ETL is time consuming, it must be done properly. It must be done properly because it will be used for important things like business activities and decision making. Signs of good ETL include performance and scalability.

            There are many products and solutions in the market that deal with the challenges of the ETL to make it  a better process. A product that can  make ETL a better process is SAS Data Enterprise Integration Server. This solution can access data from many sources. It can extract, transform, load, cleanse, conform, aggregate, and manage data. It also supports data warehousing. Some of the benefits of using SAS Data Enterprise Integration Server are that it will always access the data you need, it will improve productivity, it will deliver consistent and verifiable information, and it will eliminate redundancy.

            I think the efficiency of ETL needs to be improved because it is too time consuming. I also think ETL is an important process because it provides good information for business activities and decision making. The better the ETL process is done, the better the users will be able to make good decisions.