So we only want three columns from this spreadsheet: the product code, the date of sale and the number of goods sold (the metric).

In order to do that, we need to know the exact layout of the spreadsheet before we do the ETL.
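To make the dependency on the layout concrete, here is a minimal sketch in plain Python. It assumes a deliberately simple, hypothetical layout: one header row, one sale per row, and known column positions. The names `ColumnLayout` and `extract_columns` and the sample rows are made up for illustration; real spreadsheets are rarely this tidy.

```python
from typing import NamedTuple


class ColumnLayout(NamedTuple):
    """Zero-based positions of the three columns we care about (hypothetical)."""
    product_code: int
    sales_date: int
    units_sold: int


def extract_columns(rows, layout, header_rows=1):
    """Return (product code, date, units sold) tuples, skipping the header rows."""
    extracted = []
    for row in rows[header_rows:]:
        extracted.append(
            (row[layout.product_code], row[layout.sales_date], row[layout.units_sold])
        )
    return extracted


# Made-up sample data standing in for the cells read from a spreadsheet.
rows = [
    ["Region", "Product code", "Date", "Units sold", "Comment"],
    ["North", "P-001", "2020-01-01", 12, ""],
    ["South", "P-002", "2020-01-01", 7, "promo"],
]

# The extraction only works because the positions are known in advance.
layout = ColumnLayout(product_code=1, sales_date=2, units_sold=3)
records = extract_columns(rows, layout)
```

If a column moves or an extra header row is added, `layout` and `header_rows` have to change with it, which is exactly why the layout has to be known up front.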

Below I’ll describe an actual (obfuscated) example that you will probably recognize, as it looks simple but is hideous in its complexity.

Take a look at this file. Let’s assume that this spreadsheet describes the number of products sold on a given date.

