Data description

Datasets

Two datasets are provided: train_dataset.jl.gz and test_dataset.jl.gz.
The image below illustrates what a row of the dataset looks like:



Each row represents one user purchase and has two attributes: user_history and item_bought.


Attribute      Description
user_history   One week of the user's navigation records, up to 2 hours before the target purchase.
item_bought    Unique identifier of the purchased item. Absent from test_dataset.
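The .jl.gz extension conventionally denotes gzip-compressed JSON Lines (one JSON object per line). Assuming that layout, and assuming each object carries the keys user_history and item_bought, the datasets could be streamed as in the minimal sketch below. The helper names iter_rows and split_row are hypothetical.

```python
import gzip
import json
from typing import Any, Dict, Iterator, Optional, Tuple


def iter_rows(path: str) -> Iterator[Dict[str, Any]]:
    """Stream rows from a gzip-compressed JSON Lines file.

    Assumes one JSON object per line (the usual meaning of .jl.gz).
    """
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines, if any
                yield json.loads(line)


def split_row(row: Dict[str, Any]) -> Tuple[Any, Optional[Any]]:
    """Return (user_history, item_bought); item_bought is None for test rows."""
    return row["user_history"], row.get("item_bought")
```

Reading the key with .get lets the same code handle both files: on a test_dataset row, split_row returns the history paired with None.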

The user_history field is a list of 3-tuples, each representing one event in the user's navigation records, in the format:

(event_type, event_timestamp, event_info)


Element          Description
event_type       Either view or search.
event_timestamp  Exact moment when the event took place.
event_info       If event_type is view, the item_id of the viewed item; if it is search, the query string of the search.
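Given the event layout above, a user's history can be split into viewed item IDs and search queries. This sketch assumes each event arrives either as a 3-element sequence in the documented order or as a JSON object with the keys event_type, event_timestamp and event_info; the timestamp format is not specified here, so it is left unparsed. The names unpack_event and summarize_history are hypothetical.

```python
from typing import Any, Dict, List, Sequence, Tuple, Union

# An event is either a (type, timestamp, info) sequence or a dict (assumption).
Event = Union[Sequence[Any], Dict[str, Any]]


def unpack_event(event: Event) -> Tuple[str, Any, Any]:
    """Return (event_type, event_timestamp, event_info) for one event."""
    if isinstance(event, dict):
        return event["event_type"], event["event_timestamp"], event["event_info"]
    event_type, event_timestamp, event_info = event  # documented 3-tuple order
    return event_type, event_timestamp, event_info


def summarize_history(history: List[Event]) -> Tuple[List[Any], List[str]]:
    """Split a user_history into viewed item_ids and search query strings."""
    viewed: List[Any] = []
    searches: List[str] = []
    for event in history:
        event_type, _timestamp, event_info = unpack_event(event)
        if event_type == "view":
            viewed.append(event_info)  # event_info is an item_id
        elif event_type == "search":
            searches.append(event_info)  # event_info is the query string
        # any other event_type is ignored
    return viewed, searches
```

The input order of events is preserved in both output lists.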

The item_bought column is absent from the test dataset because it is the target variable.

Extra Data

The file items_data.jl.gz contains item information that can be used as model features. The following fields are available:

Attribute    Description
Item_id      Unique ID of the listing. This field is obfuscated.
Title        The listing's title.
Price        The listing's price (USD).
Category_id  ID of the listing's leaf category.
Product_id   Product ID of the listing. May be null for some listings.
Domain_id    Domain ID of the listing. A domain is a grouping of listings with no explicit relationship to the category tree.
Condition    Whether the listing is for a new or used item.
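One way to use these fields is to join them to the navigation events: build a lookup keyed by item ID, then count the domains of the items a user viewed. This minimal sketch assumes items_data.jl.gz uses the same gzip JSON Lines layout as the datasets and that its keys are lowercase (item_id, domain_id), mirroring user_history and item_bought; the table above capitalizes the field names, so check the actual keys. Since Product_id can be null, fields are read with .get. The names load_items and top_viewed_domain are hypothetical.

```python
import gzip
import json
from collections import Counter
from typing import Any, Dict, Iterable, Optional


def load_items(path: str) -> Dict[Any, Dict[str, Any]]:
    """Read items_data.jl.gz into a dict keyed by item_id (assumed lowercase key)."""
    items: Dict[Any, Dict[str, Any]] = {}
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                record = json.loads(line)
                items[record["item_id"]] = record
    return items


def top_viewed_domain(viewed_item_ids: Iterable[Any],
                      items: Dict[Any, Dict[str, Any]]) -> Optional[str]:
    """Most frequent domain_id among viewed items; None if none are known."""
    domains = Counter(
        items[item_id].get("domain_id")
        for item_id in viewed_item_ids
        if item_id in items and items[item_id].get("domain_id") is not None
    )
    if not domains:
        return None
    # Ties go to the domain encountered first (Counter.most_common ordering).
    return domains.most_common(1)[0][0]
```

If view events store item IDs with a different type than items_data (for example, strings versus integers), normalize them before the lookup.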

Sample submission

We provide a sample submission file (sample_submission.csv) so you can see the expected submission format.

A detailed explanation of the expected format of your submission can be found in the Evaluation and rules section.



Dataset license: Creative Commons Attribution-NonCommercial-ShareAlike