Data description
This section describes the datasets provided to address the Challenge's task. The links to download the data are at the end of the section.
Train Dataset
For training, one file is provided: train_data.parquet. The image below illustrates what a row of the train dataset looks like:
This dataset comprises two months (February and March 2021) of daily sales data for a subset of MercadoLibre's SKUs (stock keeping units). Each row corresponds to a particular date-SKU combination. You can think of a SKU as the combination of an item and a variation. An item could be, for example, a "t-shirt of X brand", and a variation of it could be "size M, color black". We are interested in predicting stock at the SKU level because you could run out of t-shirt X in black while still having plenty of it in brown. Besides SKU and date, the following fields are available for each row:
Attribute | Description |
---|---|
sold_quantity | number of units of the corresponding SKU that were sold on that particular date. |
current_price | the SKU's price as it stood on that particular date (point-in-time correct). |
currency | currency in which the price is expressed. |
listing_type | type of listing the SKU had on that particular date. Possible values are classic or premium; they relate to the exposure the item gets and the fee charged to the seller as a sales commission. Another important advantage of a premium listing is that buyers can pay for the item in interest-free installments. |
shipping_logistic_type | type of shipping method the SKU offered, for that particular date. Possible values are fulfillment, cross_docking and drop_off. |
shipping_payment | whether the shipping for the offered SKU at that particular date was free or paid, from the buyer's perspective. |
minutes_active | number of minutes the SKU was available for purchase on that particular date. |
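The fields above can be combined in many ways. As one illustration, the sketch below computes, for each SKU, the units sold per full day of availability by dividing total sold_quantity by total minutes_active (expressed in days). It assumes the rows have already been loaded from train_data.parquet into a list of dicts and that the SKU column is keyed as `sku`; both are assumptions, and the toy rows are made up for the example rather than taken from the dataset.

```python
from collections import defaultdict

MINUTES_PER_DAY = 24 * 60


def sales_per_active_day(rows):
    """Return {sku: units sold per full day of availability}.

    A SKU that was never active (0 minutes in total) maps to None,
    since no rate can be computed for it.
    """
    sold = defaultdict(int)
    minutes = defaultdict(float)
    for row in rows:
        sold[row["sku"]] += row["sold_quantity"]
        minutes[row["sku"]] += row["minutes_active"]
    return {
        sku: (sold[sku] / (minutes[sku] / MINUTES_PER_DAY)) if minutes[sku] > 0 else None
        for sku in sold
    }


# Toy rows (hypothetical values; the key "sku" is an assumed column name).
toy_rows = [
    {"sku": 1, "date": "2021-02-01", "sold_quantity": 3, "minutes_active": 1440},
    {"sku": 1, "date": "2021-02-02", "sold_quantity": 1, "minutes_active": 1440},
    {"sku": 2, "date": "2021-02-01", "sold_quantity": 0, "minutes_active": 0},
    {"sku": 2, "date": "2021-02-02", "sold_quantity": 2, "minutes_active": 720},
]
rates = sales_per_active_day(toy_rows)
print(rates)
```

Normalizing by minutes_active rather than by calendar days keeps a SKU that was only briefly available from looking like a slow seller.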
Test Dataset
For testing, the following file is provided: test_data.csv. This file contains only two columns:
Attribute | Description |
---|---|
SKU | indicates the SKU for which you have to make your prediction |
target_stock | inventory level (i.e., the number of units of the corresponding SKU) for which you have to provide your estimate of inventory days. |
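To make the relationship between target_stock and inventory days concrete, here is a deliberately naive sketch: it divides the target stock by an assumed constant daily sales rate and rounds up. This is not the Challenge's method or its expected submission format; the function name, the constant-rate assumption, and the `horizon` cap of 30 days are all hypothetical choices made for illustration.

```python
import math


def naive_inventory_days(target_stock, daily_rate, horizon=30):
    """Point estimate of the days needed to sell `target_stock` units.

    `daily_rate` is an assumed constant number of units sold per day.
    `horizon` is a hypothetical cap, also used when nothing is selling.
    """
    if daily_rate is None or daily_rate <= 0:
        return horizon
    return min(horizon, math.ceil(target_stock / daily_rate))


print(naive_inventory_days(10, 2.0))
print(naive_inventory_days(3, 2.0))
print(naive_inventory_days(10, 0))
```

A single point estimate like this ignores the uncertainty in daily demand, so it is best read as a starting baseline.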
Items Data
The file items_static_metadata.jl (note that this is a JSON Lines file) contains extra data about the SKUs' characteristics. It consists of a sequence of dicts, one per line, each holding the metadata of a specific SKU. The following fields are available:
Attribute | Description |
---|---|
SKU | stock keeping unit. This is a unique identifier for each physically distinct unit of inventory. |
item_id | unique identifier of the listing the SKU belongs to. The same listing can be associated with more than one SKU; for example, the SKUs "t-shirt X, size M, color black" and "t-shirt X, size M, color red" share the same item_id, which is that of "t-shirt X". |
item_domain_id | the listing's domain id. A domain is a grouping of similar listings within MercadoLibre. For example, "t-shirt X" could be in the domain MLB_SPORT_TSHIRTS. |
item_title | the listing's title. The title is at item level. So, in the previous example, one possible title could be "t-shirt X". |
site_id | the MercadoLibre site the listing belongs to. The labels MLB, MLA and MLM refer to Brazil, Argentina and Mexico respectively. |
product_id | the listing's product id. This field might be null for some listings. Because an item is listed by a seller, different sellers often sell the same thing (the same product). This could be the case for "t-shirt X" if it is sold by many sellers. In an attempt to catalog the same product consistently, MercadoLibre assigns the same product_id to all those items. |
product_id_family | the listing's product family id. This field might be null for some listings. It is the same as above, but refers to a catalog product at a higher level of the hierarchy. |
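Since a JSON Lines file stores one JSON object per line, it can be read with the standard library alone. The sketch below parses such a stream and groups SKUs by item_id, mirroring the item/SKU relationship described above. The lowercase key `sku` and the sample records are assumptions made up for the example; an in-memory string stands in for the real file.

```python
import io
import json
from collections import defaultdict


def read_json_lines(handle):
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


def skus_by_item(records):
    """Group SKU identifiers under the item_id of their listing."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["item_id"]].append(rec["sku"])
    return dict(groups)


# Hypothetical records; with the real file, pass an open file handle instead,
# e.g. open("items_static_metadata.jl", encoding="utf-8").
sample = io.StringIO(
    '{"sku": 1, "item_id": 10, "site_id": "MLB", "product_id": null}\n'
    '{"sku": 2, "item_id": 10, "site_id": "MLB", "product_id": null}\n'
    '{"sku": 3, "item_id": 11, "site_id": "MLA", "product_id": "P1"}\n'
)
records = list(read_json_lines(sample))
groups = skus_by_item(records)
print(groups)
```

Note that a JSON null, as in the nullable product_id field, is parsed to Python's None.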
Sample submission
We provide a sample submission file (sample_submission.csv.gzip) for you to visualize the expected format of a submission.
A detailed explanation of the expected format of your submission can be found in the Evaluation and rules section.