Two datasets are provided: train_dataset.jl.gz and test_dataset.jl.gz.
The image below illustrates what a row of the dataset looks like:
Each row represents a user’s purchase and has two attributes: user_history and item_bought.
|user_history||A week of user navigation records until 2 hours before the target purchase.|
|item_bought||Unique identifier of the purchased product. This column is absent from the test dataset because it is the value to be predicted.|
The field user_history comprises a list of 3-tuples, where each 3-tuple represents an event in the user’s navigation records with the format:
(event_type, event_timestamp, event_info)
|event_type||Either view or search.|
|event_timestamp||Exact moment when the event took place.|
|event_info||If event_type is view, this field is an item_id. Otherwise, it contains the query string of the search event.|
The column item_bought is missing from the test dataset because it constitutes the target variable.
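As an illustration of the row layout described above, the following Python sketch reads a gzipped JSON Lines file and splits each user_history into viewed items and search queries. It rests on assumptions not stated in this description: that each line of the file is a JSON object keyed by user_history and item_bought, and that each event is stored either as a 3-element array in the order (event_type, event_timestamp, event_info) or as an object keyed by those three names. The function names read_jsonl_gz, normalize_event and split_history are hypothetical helpers, not part of any provided tooling.

```python
import gzip
import json
from typing import Any, Dict, Iterator, List, Tuple


def read_jsonl_gz(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line of a gzipped JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def normalize_event(event: Any) -> Tuple[Any, Any, Any]:
    """Return (event_type, event_timestamp, event_info) for one history entry.

    Assumption: an event is either a 3-element array in that order or an
    object keyed by those three field names.
    """
    if isinstance(event, dict):
        return event["event_type"], event["event_timestamp"], event["event_info"]
    event_type, event_timestamp, event_info = event
    return event_type, event_timestamp, event_info


def split_history(row: Dict[str, Any]) -> Tuple[List[Any], List[Any]]:
    """Separate a row's history into viewed item ids and search query strings."""
    viewed: List[Any] = []
    searched: List[Any] = []
    for event in row.get("user_history", []):
        event_type, _, event_info = normalize_event(event)
        if event_type == "view":
            viewed.append(event_info)      # event_info is an item_id
        elif event_type == "search":
            searched.append(event_info)    # event_info is the query string
    return viewed, searched
```

Under these assumptions, iterating over read_jsonl_gz("train_dataset.jl.gz") and calling split_history on each row would give the viewed item ids and the search queries for that purchase. Reading line by line keeps memory use flat, which may matter if the files are large.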
The file items_data.jl.gz contains information about the listings that can be used as features in the models. The following fields are available:
|Item_id||Unique id no. of the listing. This field is obfuscated.|
|Title||The listing's title.|
|Price||The listing's price (USD).|
|Product_id||Listing product id. Field might be null for some listings.|
|Domain_id||Listing domain id. A domain is a grouping of listings that has no explicit relationship with the category tree.|
|Condition||Whether the listing refers to a new or used item.|
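One way these fields could be turned into features is to index the listings by id and look up the attributes of the items a user viewed. The sketch below is only an illustration under stated assumptions: the record keys are assumed to be the lowercase names item_id and domain_id (the exact key spelling in the file is not given here, so both are parameters), and index_items and most_viewed_domain are hypothetical helper names.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Optional


def index_items(items: Iterable[Dict[str, Any]], key: str = "item_id") -> Dict[Any, Dict[str, Any]]:
    """Map each listing's id to its full record for constant-time lookup."""
    return {item[key]: item for item in items}


def most_viewed_domain(
    viewed_item_ids: Iterable[Any],
    items_by_id: Dict[Any, Dict[str, Any]],
    domain_key: str = "domain_id",
) -> Optional[Any]:
    """Return the domain that occurs most often among the viewed items.

    Items that are unknown or have no domain are skipped; None is returned
    when no viewed item has a usable domain.
    """
    counts: Counter = Counter()
    for item_id in viewed_item_ids:
        item = items_by_id.get(item_id)
        if item is not None and item.get(domain_key) is not None:
            counts[item[domain_key]] += 1
    return counts.most_common(1)[0][0] if counts else None
```

The same lookup pattern would apply to the other fields, for example the price or condition of viewed listings. Fields that may be null, such as the product id, need the kind of guard shown for the domain.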
We provide a sample submission file (sample_submission.csv) for you to visualize the expected format of a submission.
A detailed explanation of the expected format of your submission can be found in the Evaluation and rules section.