House Prices: Advanced Regression Techniques challenge

My attempt at the House Prices: Advanced Regression Techniques Kaggle competition. This repo contains the code to predict house sale prices in dollars for the competition dataset. The code is provided as jupyter notebooks and it's divided into two main notebooks:

notebooks/Data Analysis and Visualizations.ipynb has all the data analysis and data visualization code used for the house prices dataset
notebooks/Machine Learning with Scikit-learn.ipynb has the code for building models that predict the sale prices of the houses in the test set and for generating the predicted results for submission on Kaggle using Scikit-learn

Note: The machine learning notebook requires that the data analysis and visualizations notebook has been run first, so that the processed data is available to it.
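
The run order described in the note can also be scripted. The following is a minimal sketch and not part of this repo: it assumes jupyter nbconvert is installed and that the script is started from the repository root. The helper names nbconvert_command and run_in_order are made up for illustration.

```python
import subprocess

# Notebook paths as listed above, in the order they must be run.
NOTEBOOKS = [
    "notebooks/Data Analysis and Visualizations.ipynb",
    "notebooks/Machine Learning with Scikit-learn.ipynb",
]


def nbconvert_command(notebook_path):
    """Build the argument list that executes one notebook in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        notebook_path,
    ]


def run_in_order(notebooks=NOTEBOOKS):
    """Execute each notebook in turn and stop at the first failure."""
    for path in notebooks:
        # check=True raises CalledProcessError if a notebook fails,
        # so the second notebook never runs on missing processed data.
        subprocess.run(nbconvert_command(path), check=True)
```

Calling run_in_order() runs the data analysis notebook first and the machine learning notebook second. Passing the arguments as a list avoids shell quoting problems with the spaces in the notebook file names.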

Requirements

Python3 (3.6 recommended)
jupyter
scipy stack (pandas, scipy, scikit-learn, etc.)

docker (optional, recommended)
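
To check whether the current Python environment covers the requirements above, something like the following sketch may help. It is an illustration only: the module list is an assumption derived from the packages named above (scikit-learn is imported as sklearn, and jupyter_core is the package that provides the jupyter command), and missing_modules is a hypothetical helper name.

```python
import importlib.util
import sys

# Assumed import names for the requirements listed above.
REQUIRED_MODULES = ["pandas", "scipy", "sklearn", "jupyter_core"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(minimum=(3,)):
    """Check that the running interpreter is Python 3 or newer."""
    return sys.version_info >= minimum
```

An empty list from missing_modules() means that all of the listed packages can be imported. The check does not import the packages, so it is fast and has no side effects.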

Getting started

The code is available via jupyter notebooks for easier use.

To run these notebooks, you need to start a jupyter server. Here, you can do it in two ways:

a) run a local jupyter server or
b) run a self-contained docker image.

Run a local jupyter server

To start the jupyter server you must first have python + jupyter installed. The quickest way to accomplish this is by installing anaconda.

After installing anaconda, you should create an environment:

$ conda create -n py36_jupyter python=3.6 anaconda

This command will install the recommended version of CPython and the necessary packages to run the code.

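Finally, with the new environment activated (conda activate py36_jupyter, or source activate py36_jupyter on older conda versions), you can start a jupyter server by running the following command: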

$ jupyter notebook

Run a self-contained docker image

To run the notebooks using docker, you first need to build the container's docker image. To do so, you just need to do the following:

i) Build the container using a Makefile macro:
```
$ make build
```

ii) Build the image using a command:

$ docker image build -t jupyter_spark_custom .

Then, to start the container you can:

i) Run the container using a Makefile macro:
```
$ make run
```

ii) Run the container using a command:

$ docker run --rm -p 8888:8888 -v "$PWD"/notebooks:/home/jovyan/work --name jupyter_kaggle_house_prices jupyter_spark_custom

Setting up the data

To run the cells in the notebooks, you must first download the data for the house prices challenge. You can get it from Kaggle directly and you should put the train.csv and test.csv files inside the notebooks/data/ directory.
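
Before opening the notebooks, it can be useful to confirm that both files are where they are expected. This is a minimal sketch under the assumption that it is run from the repository root; missing_data_files is a hypothetical helper and not part of this repo.

```python
from pathlib import Path

# File names and directory as described above.
EXPECTED_FILES = ("train.csv", "test.csv")


def missing_data_files(data_dir="notebooks/data"):
    """Return the expected CSV files that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (data_dir / name).is_file()]
```

An empty list means that both train.csv and test.csv were found in the directory.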

You can also install and set up the Kaggle API on your system and then run make download in the terminal to automatically download the data to the correct folder.
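
For reference, a download through the Kaggle command line tool can be sketched as below. This is not necessarily what make download does in this repo. It assumes the kaggle CLI is installed and configured with an API token, and that the competition slug is house-prices-advanced-regression-techniques; the function names are hypothetical.

```python
import subprocess

# Assumption: the competition's slug on Kaggle.
COMPETITION = "house-prices-advanced-regression-techniques"


def kaggle_download_command(data_dir="notebooks/data"):
    """Build the kaggle CLI call that downloads the competition files."""
    return [
        "kaggle", "competitions", "download",
        "-c", COMPETITION,  # competition to download from
        "-p", data_dir,     # target directory for the files
    ]


def download_data(data_dir="notebooks/data"):
    """Run the download and raise an error if the CLI call fails."""
    subprocess.run(kaggle_download_command(data_dir), check=True)
```

Depending on the version of the Kaggle CLI, the files may arrive as a zip archive that still has to be extracted into notebooks/data/.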

Note: The data needed to run the notebooks is not provided by this repo.

License

MIT License
