Workshop - MELI Data Challenge 2021

1. Fetching the data

Load train and test datasets

Load extra item data

Convert to a df and use sku as the index

Hydrate the initial datasets with the extra data
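
A minimal sketch of the hydration step on toy data. Only `sku` comes from the text above; the other column names (`date`, `sold_quantity`, `site_id`, `item_domain_id`) and the values are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-ins for the real datasets (values and most column names are assumed).
train = pd.DataFrame({
    "sku": [1, 1, 2, 3],
    "date": pd.to_datetime(["2021-02-01", "2021-02-02", "2021-02-01", "2021-02-01"]),
    "sold_quantity": [3, 0, 5, 1],
})
items = pd.DataFrame({
    "sku": [1, 2, 3],
    "site_id": ["MLB", "MLM", "MLA"],
    "item_domain_id": ["MLB-A", "MLM-B", "MLA-C"],
})

# Use sku as the index of the item metadata so it can be joined by key.
items = items.set_index("sku")

# A left join keeps every training row, even when a sku has no metadata.
train_hydrated = train.join(items, on="sku", how="left")
print(train_hydrated)
```

The same join can be applied to the test dataset, since it is also keyed by sku.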

2. Exploration

List all the columns

Get some stats for each column

Visualize the time series

Visualize daily sales grouped by site

Visualize weekly sales grouped by site
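
One way to build the daily and weekly series per site, sketched on toy data. The column names `date`, `site_id`, and `sold_quantity` are assumptions, not confirmed by the text above.

```python
import pandas as pd

# Toy sales records (assumed schema).
sales = pd.DataFrame({
    "date": pd.to_datetime(
        ["2021-02-01", "2021-02-03", "2021-02-08", "2021-02-01", "2021-02-09"]
    ),
    "site_id": ["MLB", "MLB", "MLB", "MLM", "MLM"],
    "sold_quantity": [2, 3, 4, 1, 6],
})

# Daily totals: one row per date, one column per site.
daily = (
    sales.groupby(["site_id", "date"])["sold_quantity"]
    .sum()
    .unstack("site_id")
    .fillna(0)
)

# Weekly totals: "W" bins end on Sunday and are labelled with that date.
weekly = daily.resample("W").sum()
print(weekly)
```

With matplotlib installed, `daily.plot()` and `weekly.plot()` draw one line per site.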

Get the top levels of a categorical variable for a site
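
A small helper along these lines could return the most frequent levels. The name `top_levels` and the columns `site_id` and `item_domain_id` are hypothetical.

```python
import pandas as pd

# Toy data (assumed schema).
df = pd.DataFrame({
    "site_id": ["MLB", "MLB", "MLB", "MLM", "MLB"],
    "item_domain_id": ["PHONES", "PHONES", "TVS", "PHONES", "SHOES"],
})

def top_levels(frame, site, column, n=10):
    """Return the n most frequent values of `column` within one site."""
    in_site = frame.loc[frame["site_id"] == site, column]
    return in_site.value_counts().head(n)

top = top_levels(df, "MLB", "item_domain_id", n=2)
print(top)
```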

Assess overlap between train and test skus
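
A quick overlap check can be done with Python sets. The sku values below are made up; in practice they would come from the sku column of each dataset.

```python
# Toy sku collections (assumed values).
train_skus = {1, 2, 3, 4}
test_skus = {3, 4, 5}

# Skus present in both datasets.
shared = train_skus & test_skus

# Fraction of test skus that also have training history.
coverage = len(shared) / len(test_skus)
print(f"{len(shared)} shared skus, {coverage:.1%} of test skus covered")
```

Low coverage would mean the model has to predict for skus without any sales history.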

Plot distributions

Plot distribution for a continuous variable
Plot distribution for a categorical variable

Plot the relationship between two continuous variables

Distribution of target stock

3. Building your validation set

Make a temporary split
Now let's build the validation dataset by calculating target stock and inventory days.
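
One possible construction, sketched on a 5-day toy window: take the held-out period, fix a target stock per sku, and define inventory days as the first day on which cumulative sales reach that stock. The column names, the way the target stock is chosen, and the window length are all assumptions, not necessarily what the workshop uses.

```python
import pandas as pd

# Toy hold-out window: one row per sku per day (assumed schema).
holdout = pd.DataFrame({
    "sku": [1] * 5 + [2] * 5,
    "day": list(range(1, 6)) * 2,
    "sold_quantity": [0, 2, 1, 0, 3, 4, 0, 0, 0, 0],
})

# Hypothetical target stock per sku, chosen so it sells out within the window.
target_stock = pd.Series({1: 3, 2: 4}, name="target_stock")

# Running total of units sold per sku.
holdout = holdout.sort_values(["sku", "day"])
holdout["cum_sold"] = holdout.groupby("sku")["sold_quantity"].cumsum()
holdout = holdout.join(target_stock, on="sku")

# inventory_days = first day on which cumulative sales reach the target stock.
reached = holdout[holdout["cum_sold"] >= holdout["target_stock"]]
inventory_days = reached.groupby("sku")["day"].min().rename("inventory_days")

validation = pd.concat([target_stock, inventory_days], axis=1)
print(validation)
```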

4. Modeling

Baseline #1: UNIFORM distribution

We need a baseline to establish our starting point. We will use it later to validate more complex models.
Besides, we can iterate on a simple baseline to obtain better models.

This is what the output of a uniform distribution baseline looks like
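
As a sketch, the uniform baseline assigns the same probability to every day for every sku. The 30-day horizon and the number of rows are assumptions here.

```python
import numpy as np

n_skus = 4   # toy number of observations (assumed)
n_days = 30  # assumed prediction horizon

# One row per sku, one column per day, every day equally likely.
uniform_pred = np.full((n_skus, n_days), 1.0 / n_days)
print(uniform_pred.shape, uniform_pred[0, :3])
```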

What the inventory_days probability distribution looks like for a random observation
Now let's score this model's prediction
Scoring function:
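
A sketch of a scoring function, assuming the metric is the Ranked Probability Score (RPS): the squared difference between the predicted and the true cumulative distributions, summed over days and averaged over observations. The function name and argument layout are hypothetical.

```python
import numpy as np

def ranked_probability_score(y_true, y_pred):
    """Mean RPS. y_true holds 1-based days; y_pred is (n, n_days) probabilities."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred, dtype=float)
    n_days = y_pred.shape[1]
    days = np.arange(1, n_days + 1)
    # True CDF is a step: 0 before the stock-out day, 1 from that day onward.
    true_cdf = (days[None, :] >= y_true[:, None]).astype(float)
    pred_cdf = np.cumsum(y_pred, axis=1)
    return float(np.mean(np.sum((pred_cdf - true_cdf) ** 2, axis=1)))

# Toy check on a 3-day horizon: a perfect prediction and a uniform one.
perfect_score = ranked_probability_score([2], [[0.0, 1.0, 0.0]])
uniform_score = ranked_probability_score([3], [[1 / 3, 1 / 3, 1 / 3]])
print(perfect_score, uniform_score)
```

Lower is better: a prediction that puts all its mass on the true day scores zero.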

On the public leaderboard, this approach scored 5.07

Baseline #2: Linear Model

As the uniform distribution works so well, the idea is to slightly shift the distribution toward the target day. To do so, we use a very wide normal distribution.
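
To illustrate the idea, the sketch below evaluates a normal density on each day and normalizes it so the probabilities sum to one. The function name, the 30-day horizon, and `sigma=10.0` are assumptions; `center_day` stands for whatever day the model predicts.

```python
import numpy as np

def wide_normal_distribution(center_day, sigma=10.0, n_days=30):
    """Discretized normal over days 1..n_days, normalized to sum to 1."""
    days = np.arange(1, n_days + 1)
    # Unnormalized Gaussian weights; a large sigma keeps the curve nearly flat.
    weights = np.exp(-0.5 * ((days - center_day) / sigma) ** 2)
    return weights / weights.sum()

probs = wide_normal_distribution(center_day=12.0)
print(probs.round(4))
```

A larger `sigma` brings the result closer to the uniform baseline, while a smaller one concentrates the mass around `center_day`.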

Model definition
Model Training
What the inventory_days probability distribution looks like for a random observation in this case

5. Error analysis

Here we see ....

6. Train model to submit

Now that we have validated that the approach works, we train the model on all the data in order to make a submission.

Generate predictions on test data
Finally, we generate a submission file with the model predictions.
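
A sketch of writing the predictions out as CSV. The layout (one row per test sku, one column per day, no header or index) is an assumption; check the challenge rules for the exact format. The example writes to an in-memory buffer so it has no side effects.

```python
import io

import numpy as np
import pandas as pd

# Toy predictions: 2 test skus, 30 days, uniform probabilities (assumed shape).
predictions = np.full((2, 30), 1.0 / 30)

# Write rows of probabilities without header or index (assumed layout).
buffer = io.StringIO()
pd.DataFrame(predictions).to_csv(buffer, header=False, index=False)

lines = buffer.getvalue().splitlines()
print(len(lines), "rows written")
```

Passing a file path instead of `buffer` to `to_csv` writes the same content to disk.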