We will be exploring a type of batch processing called Extract, Transform and Load (ETL). As the name implies, ETL is the process of extracting large amounts of data from multiple sources and formats, transforming it into a single, consistent format, and then loading it into a database or target file.
In this project, we will build a pipeline that extracts car dealership data from files of three different types: JSON, XML and CSV. The data sources are available in the dealership data folder above and contain four columns of interest: the car model, the year of manufacture, the price, and the fuel type.
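A minimal sketch of the extraction step follows, using only the Python standard library. The project itself appends the data to a data frame; here a plain list of dicts stands in for it. The column names `car_model`, `year_of_manufacture`, `price` and `fuel`, the one-object-per-line (JSON Lines) layout, and the one-`<row>`-per-car XML layout are all assumptions and may not match the actual files in the dealership data folder. The sample rows in the demo are made up.

```python
import csv
import glob
import json
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical column names; the real dealership files may use different ones.
COLUMNS = ["car_model", "year_of_manufacture", "price", "fuel"]


def extract_csv(path):
    """Read a CSV file with a header row into a list of dicts."""
    with open(path, newline="") as f:
        return [{col: row[col] for col in COLUMNS} for row in csv.DictReader(f)]


def extract_json(path):
    """Read a JSON Lines file (one object per line, an assumed layout) into dicts."""
    with open(path) as f:
        objects = (json.loads(line) for line in f if line.strip())
        return [{col: obj[col] for col in COLUMNS} for obj in objects]


def extract_xml(path):
    """Read <row> elements whose child elements hold the four fields (assumed layout)."""
    root = ET.parse(path).getroot()
    return [{col: row.findtext(col) for col in COLUMNS} for row in root.iter("row")]


def extract(folder):
    """Extract every CSV, JSON and XML file in `folder` into one combined list."""
    readers = {".csv": extract_csv, ".json": extract_json, ".xml": extract_xml}
    records = []
    for path in sorted(glob.glob(os.path.join(folder, "*"))):
        reader = readers.get(os.path.splitext(path)[1].lower())
        if reader is not None:  # skip files of any other type
            records.extend(reader(path))
    return records


# Demo: write one made-up file of each type to a temporary folder, then extract.
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "a.csv"), "w", newline="") as f:
        f.write("car_model,year_of_manufacture,price,fuel\nmodel_a,2014,5000.0,Petrol\n")
    with open(os.path.join(folder, "b.json"), "w") as f:
        f.write('{"car_model": "model_b", "year_of_manufacture": 2013, '
                '"price": 7089.55, "fuel": "Diesel"}\n')
    with open(os.path.join(folder, "c.xml"), "w") as f:
        f.write("<data><row><car_model>model_c</car_model>"
                "<year_of_manufacture>2017</year_of_manufacture>"
                "<price>10000.5</price><fuel>Petrol</fuel></row></data>")
    records = extract(folder)
```

CSV and XML values arrive as strings while JSON can carry numbers, so the combined records have mixed types; this is one reason the transform step needs to standardise the columns.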
After extracting the data from each file and appending it to a single data frame, we will transform and standardise the columns of interest before loading the resulting data frame into a new file. This file is then ready for analysis with any Business Intelligence (BI) tool.
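The transform and load steps can be sketched the same way, again with the standard library and the same hypothetical column names. The specific standardisation rules below (lower-cased model names, integer years, prices rounded to two decimal places, title-cased fuel labels) are illustrative assumptions, not necessarily the rules used in this project, and the input records are made up.

```python
import csv
import io

# Hypothetical column names, matching whatever the extract step produced.
COLUMNS = ["car_model", "year_of_manufacture", "price", "fuel"]


def transform(records):
    """Standardise types and formats; these particular rules are assumptions."""
    cleaned = []
    for rec in records:
        cleaned.append({
            "car_model": str(rec["car_model"]).strip().lower(),
            "year_of_manufacture": int(rec["year_of_manufacture"]),
            "price": round(float(rec["price"]), 2),
            "fuel": str(rec["fuel"]).strip().title(),
        })
    return cleaned


def load(records, stream):
    """Write the standardised records as CSV (with a header) to a text stream."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(records)


# Made-up records with the mixed types a multi-format extract can produce.
raw = [
    {"car_model": " Model_A ", "year_of_manufacture": "2014",
     "price": "5000.4567", "fuel": "petrol"},
    {"car_model": "model_b", "year_of_manufacture": 2013,
     "price": 7089.55, "fuel": "DIESEL"},
]
clean = transform(raw)
buffer = io.StringIO()  # stands in for the target file
load(clean, buffer)
output = buffer.getvalue()
```

To write an actual file, pass an open file instead of the in-memory buffer, for example `with open("transformed_data.csv", "w", newline="") as f: load(clean, f)`; the file name here is hypothetical. Keeping `transform` and `load` as separate functions means the output format can change without touching the cleaning rules.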