Working with Kaggle datasets
Exploring the kaggle command API
We start by looking at the commands that the kaggle command-line tool provides:
- competitions (c) : explore the available competitions.
- datasets (d) : explore the available datasets.
- kernels (k) : explore the available kernels.
!kaggle -h
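The same commands can also be driven from plain Python instead of a notebook `!` line. The following is a minimal sketch, assuming the `kaggle` executable is on the PATH and credentials are already configured. The helper names `build_kaggle_command` and `run_kaggle` are hypothetical and not part of the Kaggle package.

```python
import shutil
import subprocess


def build_kaggle_command(*args):
    """Return the argument list for a kaggle CLI call, e.g. ['kaggle', 'c', 'list']."""
    return ["kaggle", *args]


def run_kaggle(*args):
    """Run a kaggle CLI command and return its stdout, or None if the CLI is missing."""
    if shutil.which("kaggle") is None:
        # Assumption: the CLI is normally installed with `pip install kaggle`.
        return None
    result = subprocess.run(
        build_kaggle_command(*args),
        capture_output=True,
        text=True,
        check=False,  # do not raise on a non-zero exit code; just return the output
    )
    return result.stdout
```

For example, `run_kaggle("-h")` would return the same help text as the cell above, or `None` when the tool is not installed.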
Exploring kaggle competitions
We can see that there are many competitions available. The listing has the following columns:
- ref : Name of the competition.
- deadline : When submissions for the competition are due.
- category : The category of the competition, one of the following:
  - featured : These competitions usually have prize money (e.g., 25k-100k) split between the top 3 teams.
  - playground : These competitions are just for playing and exploring.
  - gettingStarted : Beginners should try these competitions to pick up new skills.
  - masters : Machine learning experts can try these competitions and win prize money >100k.
  - research : These competitions are for research purposes.
  - recruitment : Firms use kaggle to identify new hires, so you can try these competitions to build up your profile.
- reward : Total prize money for the top 3 teams entering the competition.
- teamCount : Number of teams that have entered the competition.
- userHasEntered : Whether you have entered the competition.
!kaggle c list
gettingStarted competitions are shown below:
!kaggle c list --category gettingStarted
masters competitions are shown below:
!kaggle c list --category masters
playground competitions are shown below:
!kaggle c list --category playground
featured competitions are shown below:
!kaggle competitions list --category featured
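Instead of running one command per category, the listing can be grouped by category in Python. This is only a sketch under stated assumptions: that the listing is available as CSV text (the CLI appears to offer a `--csv` option for this) with a header row using the column names described earlier, such as `ref` and `category`. The function name `group_by_category` is hypothetical, and the sample rows are made-up placeholders, not real Kaggle output.

```python
import csv
import io
from collections import defaultdict


def group_by_category(csv_text):
    """Map each category to the list of competition refs found in the CSV text."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["category"]].append(row["ref"])
    return dict(groups)


# Hypothetical sample in the assumed column layout (placeholder values only).
SAMPLE_LISTING = """ref,deadline,category,reward,teamCount,userHasEntered
example-comp-a,2030-01-01 00:00:00,gettingStarted,Knowledge,100,False
example-comp-b,2030-01-01 00:00:00,featured,Prizes,50,False
example-comp-c,2030-01-01 00:00:00,gettingStarted,Knowledge,75,True
"""

for category, refs in group_by_category(SAMPLE_LISTING).items():
    print(category, refs)
```

Grouping on the parsed rows keeps the logic independent of how the listing was obtained, so the same function works on saved output.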
Example: View the titanic dataset leaderboard
We can view the best score of each team or user for a specific competition. Many entries have a top score of 1.0, which means their predictions were perfect!
!kaggle competitions leaderboard titanic --show
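The number of perfect scores can also be counted programmatically. A minimal sketch, assuming the leaderboard has been saved as CSV text with a `score` column. The column names, the function name `count_perfect_scores`, and the sample rows are assumptions for illustration, not real leaderboard data.

```python
import csv
import io


def count_perfect_scores(csv_text, perfect=1.0):
    """Count leaderboard rows whose score equals the perfect value."""
    count = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            score = float(row["score"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows without a usable numeric score
        if score == perfect:
            count += 1
    return count


# Hypothetical leaderboard excerpt in the assumed layout (placeholder values only).
SAMPLE_LEADERBOARD = """teamId,teamName,submissionDate,score
1,team-a,2030-01-01 00:00:00,1.00000
2,team-b,2030-01-01 00:00:00,1.00000
3,team-c,2030-01-01 00:00:00,0.81339
"""

print(count_perfect_scores(SAMPLE_LEADERBOARD))
```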
Example: Downloading the titanic dataset
We will explore one of the most well-known datasets, the titanic dataset. Always list the files associated with the competition of interest before downloading, as some of the required files can be >100MB. In the titanic dataset, the files are small (< 1MB each).
!kaggle competitions files titanic
We can then download the files. By default they are saved in the current working directory; the -p option selects a different directory.
!kaggle competitions download titanic
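Competition downloads typically arrive as a zip archive. The sketch below assumes the command above saved an archive named `titanic.zip` in the current directory. The archive name, the `data/titanic` target directory, and the helper name `extract_archive` are assumptions. It uses only the standard library to unpack the files.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Unpack a zip archive into target_dir and return the sorted member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # create the directory if needed
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        return sorted(archive.namelist())
```

A call such as `extract_archive("titanic.zip", "data/titanic")` would then leave the competition files ready to be read from `data/titanic`.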
Evaluating your submission scores
We can also view the scores of our machine learning submissions for a given competition. In the example below, the competition is home-credit-default-risk.
!kaggle competitions submissions -c home-credit-default-risk
Summary
We have learnt how to use the kaggle API to explore Kaggle competitions and download datasets. We also learnt how to retrieve the scores of the models we submitted to a competition. For more details, see the Kaggle API GitHub repository or the documentation on the Kaggle website.