Getting started with #MeliDataChallenge

Specify the path where your data is located
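A minimal sketch of the path setup; the folder name and file names below are assumptions — adjust them to wherever you unpacked the challenge files.

```python
import os

DATA_PATH = "./data"  # assumption: data lives in a local ./data folder
TRAIN_FILE = os.path.join(DATA_PATH, "train_dataset.jl.gz")  # hypothetical file name
ITEM_FILE = os.path.join(DATA_PATH, "item_data.jl.gz")       # hypothetical file name
```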

Load train data

Load item metadata
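Both files can be read with the same helper, assuming they are gzipped JSON-lines files (one JSON object per line); that format is an assumption about the dataset packaging.

```python
import gzip
import json

def load_jsonlines(path):
    """Load a gzipped JSON-lines file into a list of dicts."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# rows_train = load_jsonlines(TRAIN_FILE)
# item_data = load_jsonlines(ITEM_FILE)
```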

Different approaches to build a baseline model...

1) Top items of the most visited domain

Here the idea is the following: we find the domain the user visited most, and then we recommend the top-selling items of that domain.

First we generate a dict of the form: {'domain': {'item_id': no. of purchases } }.

This is the "learning" stage of this simple model (that's why we do it only with the train data!).
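The learning stage can be sketched as below, with toy stand-ins for the real data; the field names (`item_bought`, and the item-to-domain mapping) are assumptions about the dataset schema.

```python
from collections import defaultdict, Counter

# Toy training rows and item metadata (hypothetical schema).
rows_train = [
    {"item_bought": 10},
    {"item_bought": 10},
    {"item_bought": 20},
]
item_domain = {10: "PHONES", 20: "SHOES"}  # item_id -> domain_id

# {'domain': Counter({'item_id': no. of purchases})}
purchases_by_domain = defaultdict(Counter)
for row in rows_train:
    item = row["item_bought"]
    purchases_by_domain[item_domain[item]][item] += 1
```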

Then we define some auxiliary functions for making the predictions
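One possible shape for these auxiliary functions, assuming each row carries a `user_history` list of events with `event_type` and `event_info` fields (an assumption about the schema):

```python
from collections import Counter

def most_visited_domain(row, item_domain):
    """Domain the user viewed most often; None if there are no usable views."""
    domains = [item_domain[ev["event_info"]]
               for ev in row["user_history"]
               if ev["event_type"] == "view" and ev["event_info"] in item_domain]
    if not domains:
        return None
    return Counter(domains).most_common(1)[0][0]

def top_items(domain, purchases_by_domain, k=10):
    """The k best-selling items of a domain (empty list for unseen domains)."""
    counts = purchases_by_domain.get(domain, Counter())
    return [item for item, _ in counts.most_common(k)]
```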

Now we are ready to generate our recommendations for the test rows
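Putting the pieces together, the prediction step might look like the self-contained sketch below (toy data inline; padding with globally best-selling items is one reasonable fallback, not necessarily the author's):

```python
from collections import Counter

# Assumed artifacts from the learning stage (toy values here):
item_domain = {1: "PHONES", 2: "SHOES"}
purchases_by_domain = {"PHONES": Counter({1: 3}), "SHOES": Counter({2: 1})}
global_top = [1, 2]  # fallback when the user's history gives us nothing

def recommend(row, k=10):
    viewed = [ev["event_info"] for ev in row["user_history"]
              if ev["event_type"] == "view"]
    domains = Counter(item_domain[i] for i in viewed if i in item_domain)
    recs = []
    if domains:
        best = domains.most_common(1)[0][0]
        recs = [it for it, _ in purchases_by_domain[best].most_common(k)]
    # pad with globally best-selling items up to k
    recs += [it for it in global_top if it not in recs]
    return recs[:k]

rows_test = [{"user_history": [{"event_type": "view", "event_info": 2}]}]
y_pred = [recommend(row) for row in rows_test]
```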

We extract the target value for the test rows
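Assuming the bought item is stored under an `item_bought` field (a schema assumption), extraction is a one-liner:

```python
# Toy test rows; "item_bought" is an assumed field name.
rows_test = [
    {"user_history": [], "item_bought": 10},
    {"user_history": [], "item_bought": 20},
]
y_true = [row["item_bought"] for row in rows_test]
```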

Measure performance
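As a quick proxy metric (not necessarily the official challenge metric), hit rate at k — the fraction of test rows whose bought item appears among the top-k recommendations — is easy to compute:

```python
def hit_rate_at_k(y_true, y_pred, k=10):
    """Fraction of rows where the bought item appears in the top-k recommendations."""
    hits = sum(1 for true, recs in zip(y_true, y_pred) if true in recs[:k])
    return hits / len(y_true)
```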

2) Last viewed items

We simply recommend the last items viewed by the user
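A possible `last_viewed` helper, assuming events are stored oldest-first and carry `event_type`/`event_info` fields (schema assumptions); it returns the most recent views first, deduplicated:

```python
def last_viewed(row, k=10):
    """Most recently viewed items first, deduplicated, at most k of them."""
    recs = []
    for ev in reversed(row["user_history"]):  # assumption: newest events last
        if ev["event_type"] == "view" and ev["event_info"] not in recs:
            recs.append(ev["event_info"])
        if len(recs) == k:
            break
    return recs
```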

Now we are ready to generate the recommendations

```python
y_pred = []
for row in tqdm(rows_test):
    recom = last_viewed(row)
    y_pred.append(recom)
```

Measure performance

3) Views-purchases

The idea here is to recommend what most users who viewed the same items as a given user ended up buying.

First we build a dictionary that maps each viewed item to the items bought afterwards, together with their frequencies.

This is the "learning" stage of this simple model (that's why we do it only with the train data!).
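The co-occurrence table can be built like this (toy rows inline; the `user_history`/`item_bought` field names are schema assumptions):

```python
from collections import defaultdict, Counter

rows_train = [  # toy training rows with an assumed schema
    {"user_history": [{"event_type": "view", "event_info": 1}], "item_bought": 10},
    {"user_history": [{"event_type": "view", "event_info": 1}], "item_bought": 10},
    {"user_history": [{"event_type": "view", "event_info": 1},
                      {"event_type": "view", "event_info": 2}], "item_bought": 20},
]

# viewed item -> Counter of items that were eventually bought
bought_after_view = defaultdict(Counter)
for row in rows_train:
    viewed = {ev["event_info"] for ev in row["user_history"]
              if ev["event_type"] == "view"}
    for item in viewed:
        bought_after_view[item][row["item_bought"]] += 1
```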

Now we are ready to generate our recommendations for the test rows
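One way to turn that table into recommendations is to sum, over every item the user viewed, the purchase counts of other users, and keep the highest-scoring items (toy table inline; summing counts is one plausible aggregation, not necessarily the author's):

```python
from collections import Counter

# Toy co-occurrence table: viewed item -> Counter of items bought afterwards.
bought_after_view = {1: Counter({10: 2, 20: 1}), 2: Counter({20: 1})}

def recommend_from_views(row, k=10):
    """Score candidate items by summing purchase counts over the user's views."""
    scores = Counter()
    for ev in row["user_history"]:
        if ev["event_type"] == "view":
            scores.update(bought_after_view.get(ev["event_info"], Counter()))
    return [item for item, _ in scores.most_common(k)]
```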

Measure performance

How could these baselines be improved?