{ "cells": [ { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%matplotlib notebook\n", "import numpy as np\n", "import matplotlib\n", "import matplotlib.animation\n", "import matplotlib.pyplot as plt\n", "from mpl_toolkits.mplot3d import Axes3D\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Dataset and visualization\n", "----\n", "\n", "The goal of this notebook is to show you some data, define the basic terms of supervised learning, and give you the confidence to go out and grab data from the wild. Also, the first rule of machine learning: **LOOK AT YOUR DATA**.\n", "\n", "I cannot emphasize this maxim enough: **LOOK AT YOUR DATA**\n", "\n", "\n", "1. Housing-price data is one of the most popular kinds of dataset on Kaggle, and a classic. We're going to use the [Ames set](http://jse.amstat.org/v19n3/decock/DataDocumentation.txt), which was assembled as a modern alternative to the famous \"Boston Housing Dataset\" that has been used in statistics classes for many years.\n", "\n", "2. I also want to point out the [UCI machine learning datasets](https://archive.ics.uci.edu/ml/index.php), an excellent collection of ML datasets that could be valuable for your projects!\n", "\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
 | Order | PID | MS.SubClass | MS.Zoning | Lot.Frontage | Lot.Area | Street | Alley | Lot.Shape | Land.Contour | ... | Pool.Area | Pool.QC | Fence | Misc.Feature | Misc.Val | Mo.Sold | Yr.Sold | Sale.Type | Sale.Condition | SalePrice |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2 | 526350040 | 20 | RH | 80.0 | 11622 | Pave | NaN | Reg | Lvl | ... | 0 | NaN | MnPrv | NaN | 0 | 6 | 2010 | WD | Normal | 105000 |
2 | 3 | 526351010 | 20 | RL | 81.0 | 14267 | Pave | NaN | IR1 | Lvl | ... | 0 | NaN | NaN | Gar2 | 12500 | 6 | 2010 | WD | Normal | 172000 |
3 | 4 | 526353030 | 20 | RL | 93.0 | 11160 | Pave | NaN | Reg | Lvl | ... | 0 | NaN | NaN | NaN | 0 | 4 | 2010 | WD | Normal | 244000 |
4 | 5 | 527105010 | 60 | RL | 74.0 | 13830 | Pave | NaN | IR1 | Lvl | ... | 0 | NaN | MnPrv | NaN | 0 | 3 | 2010 | WD | Normal | 189900 |
5 | 6 | 527105030 | 60 | RL | 78.0 | 9978 | Pave | NaN | IR1 | Lvl | ... | 0 | NaN | NaN | NaN | 0 | 6 | 2010 | WD | Normal | 195500 |
6 | 7 | 527127150 | 120 | RL | 41.0 | 4920 | Pave | NaN | Reg | Lvl | ... | 0 | NaN | NaN | NaN | 0 | 4 | 2010 | WD | Normal | 213500 |
7 | 8 | 527145080 | 120 | RL | 43.0 | 5005 | Pave | NaN | IR1 | HLS | ... | 0 | NaN | NaN | NaN | 0 | 1 | 2010 | WD | Normal | 191500 |
8 | 9 | 527146030 | 120 | RL | 39.0 | 5389 | Pave | NaN | IR1 | Lvl | ... | 0 | NaN | NaN | NaN | 0 | 3 | 2010 | WD | Normal | 236500 |
9 | 10 | 527162130 | 60 | RL | 60.0 | 7500 | Pave | NaN | Reg | Lvl | ... | 0 | NaN | NaN | NaN | 0 | 6 | 2010 | WD | Normal | 189000 |
9 rows × 82 columns
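Since the first rule is to look at your data, here is a minimal sketch of the usual first checks in pandas: shape, inferred types, missing values, and summary statistics. To keep it self-contained, it rebuilds a handful of columns from the nine rows displayed above instead of reading the full file; with the complete Ames data you would load it with pd.read_csv from wherever you saved it (that path is your own assumption). The column subset and the variable names sample and missing are illustrative choices, not part of the original notebook.

```python
import pandas as pd

# A few columns copied from the first nine rows of the Ames table shown above.
# With the full dataset you would use pd.read_csv on your own local copy instead.
sample = pd.DataFrame({
    'Lot.Area': [11622, 14267, 11160, 13830, 9978, 4920, 5005, 5389, 7500],
    'Alley': [None] * 9,
    'Fence': ['MnPrv', None, None, 'MnPrv', None, None, None, None, None],
    'Misc.Feature': [None, 'Gar2', None, None, None, None, None, None, None],
    'SalePrice': [105000, 172000, 244000, 189900, 195500,
                  213500, 191500, 236500, 189000],
})

# 1. How big is the table, and what type did pandas infer for each column?
print(sample.shape)
print(sample.dtypes)

# 2. How many values are missing in each column?
missing = sample.isna().sum()
print(missing)

# 3. Summary statistics (count, mean, std, quartiles) for the numeric columns.
print(sample.describe())
```

Even on nine rows, the missing-value count shows that Alley is empty everywhere and that Fence and Misc.Feature are mostly empty. In the Ames data documentation, NA for these columns means the house has no alley access, no fence, or no miscellaneous feature, rather than that a value was lost, which is exactly the kind of detail you only catch by looking.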
\n", "