{ "metadata": { "anaconda-cloud": {}, "kernelspec": { "name": "python", "display_name": "Pyolite", "language": "python" }, "language_info": { "codemirror_mode": { "name": "python", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8" } }, "nbformat_minor": 4, "nbformat": 4, "cells": [ { "cell_type": "markdown", "source": "
\n\n# Data Analysis with Python\n\nEstimated time needed: **30** minutes\n\n## Objectives\n\nAfter completing this lab you will be able to:\n\n* Explore features or characteristics to predict the price of a car\n", "metadata": {} }, { "cell_type": "markdown", "source": "

Table of Contents

\n\n
\n
    \n
  1. Import Data from Module
  2. Analyzing Individual Feature Patterns using Visualization
  3. Descriptive Statistical Analysis
  4. Basics of Grouping
  5. Correlation and Causation
  6. ANOVA
\n\n
\n\n
\n", "metadata": {} }, { "cell_type": "markdown", "source": "

What are the main characteristics that have the most impact on the car price?

\n", "metadata": {} }, { "cell_type": "markdown", "source": "

1. Import Data from Module 2

\n", "metadata": {} }, { "cell_type": "markdown", "source": "

Setup

\n", "metadata": {} }, { "cell_type": "markdown", "source": "You are running the lab in your browser, so we will install the libraries using `piplite`.\n", "metadata": {} }, { "cell_type": "code", "source": "# You are running the lab in your browser, so we will install the libraries using `piplite`\nimport piplite\nawait piplite.install(['pandas'])\nawait piplite.install(['matplotlib'])\nawait piplite.install(['scipy'])\nawait piplite.install(['seaborn'])\n", "metadata": { "trusted": true }, "execution_count": 7, "outputs": [] }, { "cell_type": "markdown", "source": "Import libraries:\n", "metadata": {} }, { "cell_type": "markdown", "source": "If you run the lab locally using Anaconda, you can install the correct library versions by uncommenting the following:\n", "metadata": {} }, { "cell_type": "code", "source": "# If you run the lab locally using Anaconda, you can install the correct library versions by uncommenting the following:\n# Install the specific versions of the libraries used in this lab\n#! mamba install pandas==1.3.3\n#! mamba install numpy=1.21.2\n#! mamba install scipy=1.7.1 -y\n#! mamba install seaborn=0.9.0 -y", "metadata": { "trusted": true }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": "import pandas as pd\nimport numpy as np", "metadata": { "trusted": true }, "execution_count": 1, "outputs": [ { "name": "stderr", "text": "/lib/python3.9/site-packages/pandas/compat/__init__.py:124: UserWarning: Could not import the lzma module. Your installed Python is incomplete. 
Attempting to use lzma compression will result in a RuntimeError.\n warnings.warn(msg)\n", "output_type": "stream" } ] }, { "cell_type": "markdown", "source": "This function will download the dataset into your browser:\n", "metadata": {} }, { "cell_type": "code", "source": "# This function downloads the dataset into your browser\n\nfrom pyodide.http import pyfetch\n\nasync def download(url, filename):\n    response = await pyfetch(url)\n    if response.status == 200:\n        with open(filename, \"wb\") as f:\n            f.write(await response.bytes())\n", "metadata": { "trusted": true }, "execution_count": 2, "outputs": [] }, { "cell_type": "markdown", "source": "Load the data and store it in dataframe `df`:\n", "metadata": {} }, { "cell_type": "markdown", "source": "This dataset is hosted on IBM Cloud Object Storage.\n", "metadata": {} }, { "cell_type": "code", "source": "path='https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-SkillsNetwork/labs/Data%20files/automobileEDA.csv'", "metadata": { "trusted": true }, "execution_count": 3, "outputs": [] }, { "cell_type": "markdown", "source": "You will need to download the dataset. If you are running locally, comment out the following:\n", "metadata": {} }, { "cell_type": "code", "source": "await download(path, \"auto.csv\")\nfilename=\"auto.csv\"", "metadata": { "trusted": true }, "execution_count": 4, "outputs": [] }, { "cell_type": "code", "source": "\ndf = pd.read_csv(filename)\ndf.head()", "metadata": { "trusted": true }, "execution_count": 5, "outputs": [ { "execution_count": 5, "output_type": "execute_result", "data": { "text/plain": " symboling normalized-losses make aspiration num-of-doors \\\n0 3 122 alfa-romero std two \n1 3 122 
alfa-romero std two \n2 1 122 alfa-romero std two \n3 2 164 audi std four \n4 2 164 audi std four \n\n body-style drive-wheels engine-location wheel-base length ... \\\n0 convertible rwd front 88.6 0.811148 ... \n1 convertible rwd front 88.6 0.811148 ... \n2 hatchback rwd front 94.5 0.822681 ... \n3 sedan fwd front 99.8 0.848630 ... \n4 sedan 4wd front 99.4 0.848630 ... \n\n compression-ratio horsepower peak-rpm city-mpg highway-mpg price \\\n0 9.0 111.0 5000.0 21 27 13495.0 \n1 9.0 111.0 5000.0 21 27 16500.0 \n2 9.0 154.0 5000.0 19 26 16500.0 \n3 10.0 102.0 5500.0 24 30 13950.0 \n4 8.0 115.0 5500.0 18 22 17450.0 \n\n city-L/100km horsepower-binned diesel gas \n0 11.190476 Medium 0 1 \n1 11.190476 Medium 0 1 \n2 12.368421 Medium 0 1 \n3 9.791667 Medium 0 1 \n4 13.055556 Medium 0 1 \n\n[5 rows x 29 columns]", "text/html": "
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "

2. Analyzing Individual Feature Patterns Using Visualization

\n", "metadata": {} }, { "cell_type": "markdown", "source": "Seaborn was already installed in the Setup section (with `piplite` in the browser, or with `mamba` for a local Anaconda setup).\n", "metadata": {} }, { "cell_type": "markdown", "source": "Import the visualization packages \"Matplotlib\" and \"Seaborn\". Include \"%matplotlib inline\" so that plots render inside the Jupyter notebook.\n", "metadata": {} }, { "cell_type": "code", "source": "import matplotlib.pyplot as plt\nimport seaborn as sns\n%matplotlib inline ", "metadata": { "trusted": true }, "execution_count": 8, "outputs": [] }, { "cell_type": "markdown", "source": "

How to choose the right visualization method?

\n

When visualizing individual variables, it is important to first understand what type of variable you are dealing with. This will help you choose the right visualization method for that variable.
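
As a minimal sketch of that first step, the snippet below splits the columns of a DataFrame into numeric and non-numeric groups with `select_dtypes`. The small DataFrame `toy` and its values are made up for illustration and are not the lab's dataset.

```python
import pandas as pd

# Hypothetical toy data (not the lab's automobile dataset)
toy = pd.DataFrame({
    'engine-size': [130, 152, 109],
    'price': [13495.0, 16500.0, 13950.0],
    'body-style': ['convertible', 'hatchback', 'sedan'],
})

# Numeric columns are candidates for scatterplots with fitted lines
numeric_cols = toy.select_dtypes(include='number').columns.tolist()

# Everything else is treated here as categorical and needs another plot type
categorical_cols = toy.select_dtypes(exclude='number').columns.tolist()

print(numeric_cols)
print(categorical_cols)
```

The same two calls can be applied to `df` once it is loaded.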

\n", "metadata": {} }, { "cell_type": "code", "source": "# list the data types for each column\nprint(df.dtypes)", "metadata": { "trusted": true }, "execution_count": 9, "outputs": [ { "name": "stdout", "text": "symboling int64\nnormalized-losses int64\nmake object\naspiration object\nnum-of-doors object\nbody-style object\ndrive-wheels object\nengine-location object\nwheel-base float64\nlength float64\nwidth float64\nheight float64\ncurb-weight int64\nengine-type object\nnum-of-cylinders object\nengine-size int64\nfuel-system object\nbore float64\nstroke float64\ncompression-ratio float64\nhorsepower float64\npeak-rpm float64\ncity-mpg int64\nhighway-mpg int64\nprice float64\ncity-L/100km float64\nhorsepower-binned object\ndiesel int64\ngas int64\ndtype: object\n", "output_type": "stream" } ] }, { "cell_type": "markdown", "source": "
\n

Question #1:

\n\nWhat is the data type of the column \"peak-rpm\"? \n\n
\n", "metadata": {} }, { "cell_type": "code", "source": "# Write your code below and press Shift+Enter to execute \nprint(df[\"peak-rpm\"].dtypes)", "metadata": { "trusted": true }, "execution_count": 10, "outputs": [ { "name": "stdout", "text": "float64\n", "output_type": "stream" } ] }, { "cell_type": "markdown", "source": "
Click here for the solution\n\n```python\ndf['peak-rpm'].dtypes\n```\n\n
\n", "metadata": {} }, { "cell_type": "markdown", "source": "For example, we can calculate the correlation between variables of type \"int64\" or \"float64\" using the method \"corr\":\n", "metadata": {} }, { "cell_type": "code", "source": "df.corr()", "metadata": { "trusted": true }, "execution_count": 11, "outputs": [ { "execution_count": 11, "output_type": "execute_result", "data": { "text/plain": " symboling normalized-losses wheel-base length \\\nsymboling 1.000000 0.466264 -0.535987 -0.365404 \nnormalized-losses 0.466264 1.000000 -0.056661 0.019424 \nwheel-base -0.535987 -0.056661 1.000000 0.876024 \nlength -0.365404 0.019424 0.876024 1.000000 \nwidth -0.242423 0.086802 0.814507 0.857170 \nheight -0.550160 -0.373737 0.590742 0.492063 \ncurb-weight -0.233118 0.099404 0.782097 0.880665 \nengine-size -0.110581 0.112360 0.572027 0.685025 \nbore -0.140019 -0.029862 0.493244 0.608971 \nstroke -0.008245 0.055563 0.158502 0.124139 \ncompression-ratio -0.182196 -0.114713 0.250313 0.159733 \nhorsepower 0.075819 0.217299 0.371147 0.579821 \npeak-rpm 0.279740 0.239543 -0.360305 -0.285970 \ncity-mpg -0.035527 -0.225016 -0.470606 -0.665192 \nhighway-mpg 0.036233 -0.181877 -0.543304 -0.698142 \nprice -0.082391 0.133999 0.584642 0.690628 \ncity-L/100km 0.066171 0.238567 0.476153 0.657373 \ndiesel -0.196735 -0.101546 0.307237 0.211187 \ngas 0.196735 0.101546 -0.307237 -0.211187 \n\n width height curb-weight engine-size bore \\\nsymboling -0.242423 -0.550160 -0.233118 -0.110581 -0.140019 \nnormalized-losses 0.086802 -0.373737 0.099404 0.112360 -0.029862 \nwheel-base 0.814507 0.590742 0.782097 0.572027 0.493244 \nlength 0.857170 0.492063 0.880665 0.685025 0.608971 \nwidth 1.000000 0.306002 0.866201 0.729436 0.544885 \nheight 0.306002 1.000000 0.307581 0.074694 0.180449 \ncurb-weight 0.866201 0.307581 1.000000 0.849072 0.644060 \nengine-size 0.729436 0.074694 0.849072 1.000000 0.572609 \nbore 0.544885 0.180449 0.644060 0.572609 1.000000 \nstroke 0.188829 -0.062704 0.167562 
0.209523 -0.055390 \ncompression-ratio 0.189867 0.259737 0.156433 0.028889 0.001263 \nhorsepower 0.615077 -0.087027 0.757976 0.822676 0.566936 \npeak-rpm -0.245800 -0.309974 -0.279361 -0.256733 -0.267392 \ncity-mpg -0.633531 -0.049800 -0.749543 -0.650546 -0.582027 \nhighway-mpg -0.680635 -0.104812 -0.794889 -0.679571 -0.591309 \nprice 0.751265 0.135486 0.834415 0.872335 0.543155 \ncity-L/100km 0.673363 0.003811 0.785353 0.745059 0.554610 \ndiesel 0.244356 0.281578 0.221046 0.070779 0.054458 \ngas -0.244356 -0.281578 -0.221046 -0.070779 -0.054458 \n\n stroke compression-ratio horsepower peak-rpm \\\nsymboling -0.008245 -0.182196 0.075819 0.279740 \nnormalized-losses 0.055563 -0.114713 0.217299 0.239543 \nwheel-base 0.158502 0.250313 0.371147 -0.360305 \nlength 0.124139 0.159733 0.579821 -0.285970 \nwidth 0.188829 0.189867 0.615077 -0.245800 \nheight -0.062704 0.259737 -0.087027 -0.309974 \ncurb-weight 0.167562 0.156433 0.757976 -0.279361 \nengine-size 0.209523 0.028889 0.822676 -0.256733 \nbore -0.055390 0.001263 0.566936 -0.267392 \nstroke 1.000000 0.187923 0.098462 -0.065713 \ncompression-ratio 0.187923 1.000000 -0.214514 -0.435780 \nhorsepower 0.098462 -0.214514 1.000000 0.107885 \npeak-rpm -0.065713 -0.435780 0.107885 1.000000 \ncity-mpg -0.034696 0.331425 -0.822214 -0.115413 \nhighway-mpg -0.035201 0.268465 -0.804575 -0.058598 \nprice 0.082310 0.071107 0.809575 -0.101616 \ncity-L/100km 0.037300 -0.299372 0.889488 0.115830 \ndiesel 0.241303 0.985231 -0.169053 -0.475812 \ngas -0.241303 -0.985231 0.169053 0.475812 \n\n city-mpg highway-mpg price city-L/100km diesel \\\nsymboling -0.035527 0.036233 -0.082391 0.066171 -0.196735 \nnormalized-losses -0.225016 -0.181877 0.133999 0.238567 -0.101546 \nwheel-base -0.470606 -0.543304 0.584642 0.476153 0.307237 \nlength -0.665192 -0.698142 0.690628 0.657373 0.211187 \nwidth -0.633531 -0.680635 0.751265 0.673363 0.244356 \nheight -0.049800 -0.104812 0.135486 0.003811 0.281578 \ncurb-weight -0.749543 -0.794889 0.834415 
0.785353 0.221046 \nengine-size -0.650546 -0.679571 0.872335 0.745059 0.070779 \nbore -0.582027 -0.591309 0.543155 0.554610 0.054458 \nstroke -0.034696 -0.035201 0.082310 0.037300 0.241303 \ncompression-ratio 0.331425 0.268465 0.071107 -0.299372 0.985231 \nhorsepower -0.822214 -0.804575 0.809575 0.889488 -0.169053 \npeak-rpm -0.115413 -0.058598 -0.101616 0.115830 -0.475812 \ncity-mpg 1.000000 0.972044 -0.686571 -0.949713 0.265676 \nhighway-mpg 0.972044 1.000000 -0.704692 -0.930028 0.198690 \nprice -0.686571 -0.704692 1.000000 0.789898 0.110326 \ncity-L/100km -0.949713 -0.930028 0.789898 1.000000 -0.241282 \ndiesel 0.265676 0.198690 0.110326 -0.241282 1.000000 \ngas -0.265676 -0.198690 -0.110326 0.241282 -1.000000 \n\n gas \nsymboling 0.196735 \nnormalized-losses 0.101546 \nwheel-base -0.307237 \nlength -0.211187 \nwidth -0.244356 \nheight -0.281578 \ncurb-weight -0.221046 \nengine-size -0.070779 \nbore -0.054458 \nstroke -0.241303 \ncompression-ratio -0.985231 \nhorsepower 0.169053 \npeak-rpm 0.475812 \ncity-mpg -0.265676 \nhighway-mpg -0.198690 \nprice -0.110326 \ncity-L/100km 0.241282 \ndiesel -1.000000 \ngas 1.000000 ", "text/html": "
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "The diagonal elements are always one, because each variable is perfectly correlated with itself. We will study correlation, more precisely Pearson correlation, in depth at the end of the notebook.\n", "metadata": {} }, { "cell_type": "markdown", "source": "
\n

Question #2:

\n\n

Find the correlation between the following columns: bore, stroke, compression-ratio, and horsepower.

\n

Hint: if you would like to select those columns, use the following syntax: df[['bore','stroke','compression-ratio','horsepower']]

\n
\n", "metadata": {} }, { "cell_type": "code", "source": "# Write your code below and press Shift+Enter to execute \ndf[['bore','stroke','compression-ratio','horsepower']].corr()", "metadata": { "trusted": true }, "execution_count": 12, "outputs": [ { "execution_count": 12, "output_type": "execute_result", "data": { "text/plain": " bore stroke compression-ratio horsepower\nbore 1.000000 -0.055390 0.001263 0.566936\nstroke -0.055390 1.000000 0.187923 0.098462\ncompression-ratio 0.001263 0.187923 1.000000 -0.214514\nhorsepower 0.566936 0.098462 -0.214514 1.000000", "text/html": "
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "
Click here for the solution\n\n```python\ndf[['bore', 'stroke', 'compression-ratio', 'horsepower']].corr()\n```\n\n
\n", "metadata": {} }, { "cell_type": "markdown", "source": "

Continuous Numerical Variables:

\n\n

Continuous numerical variables are variables that may contain any value within some range. They can be of type \"int64\" or \"float64\". A great way to visualize these variables is by using scatterplots with fitted lines.

\n\n

In order to start understanding the (linear) relationship between an individual variable and the price, we can use \"regplot\" which plots the scatterplot plus the fitted regression line for the data.
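
For intuition about what a fitted line is, here is a minimal sketch that computes an ordinary least-squares line with `numpy.polyfit`. The points are made up and lie exactly on a line, so the recovered slope and intercept are easy to check. This only illustrates the idea of a least-squares fit and is not a description of how seaborn computes its line internally.

```python
import numpy as np

# Made-up points lying exactly on y = 100 * x + 500
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 100.0 * x + 500.0

# A degree-1 polynomial fit returns the coefficients (slope, intercept)
slope, intercept = np.polyfit(x, y, 1)

print(round(slope, 6), round(intercept, 6))
```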

\n", "metadata": {} }, { "cell_type": "markdown", "source": "Let's see several examples of different linear relationships:\n", "metadata": {} }, { "cell_type": "markdown", "source": "

Positive Linear Relationship

\n", "metadata": {} }, { "cell_type": "markdown", "source": "Let's find the scatterplot of \"engine-size\" and \"price\".\n", "metadata": {} }, { "cell_type": "code", "source": "# Engine size as potential predictor variable of price\nsns.regplot(x=\"engine-size\", y=\"price\", data=df)\nplt.ylim(0,)", "metadata": { "scrolled": true, "trusted": true }, "execution_count": 13, "outputs": [ { "execution_count": 13, "output_type": "execute_result", "data": { "text/plain": "(0.0, 53380.99820496612)" }, "metadata": {} }, { "output_type": "display_data", "data": { "text/plain": "
", "image/png": "\n" }, "metadata": { "needs_background": "light" } } ] }, { "cell_type": "markdown", "source": "

As engine size goes up, the price goes up: this indicates a positive correlation between these two variables. Engine size seems like a pretty good predictor of price, since the data points lie close to the upward-sloping regression line.
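
The strength of a linear relationship is usually summarized by the Pearson correlation coefficient. The sketch below, which uses made-up values rather than the lab's data, computes it by hand and compares the result with `Series.corr`.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
y = pd.Series([2.0, 4.1, 5.9, 8.2, 9.8])

# Deviations of each value from its column mean
dx = x - x.mean()
dy = y - y.mean()

# Pearson r: sum of products of deviations, scaled by both spreads
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# pandas computes the same quantity (Pearson is the default method)
r_pandas = x.corr(y)

print(r_manual, r_pandas)
```

A value close to 1 or -1 means the points sit near a straight line, while a value near 0 means there is little linear relationship.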

\n", "metadata": {} }, { "cell_type": "markdown", "source": "We can examine the correlation between 'engine-size' and 'price' and see that it's approximately 0.87.\n", "metadata": {} }, { "cell_type": "code", "source": "df[[\"engine-size\", \"price\"]].corr()", "metadata": { "trusted": true }, "execution_count": 14, "outputs": [ { "execution_count": 14, "output_type": "execute_result", "data": { "text/plain": " engine-size price\nengine-size 1.000000 0.872335\nprice 0.872335 1.000000", "text/html": "
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "Highway mpg is a potential predictor variable of price. Let's find the scatterplot of \"highway-mpg\" and \"price\".\n", "metadata": {} }, { "cell_type": "code", "source": "sns.regplot(x=\"highway-mpg\", y=\"price\", data=df)", "metadata": { "trusted": true }, "execution_count": 15, "outputs": [ { "execution_count": 15, "output_type": "execute_result", "data": { "text/plain": "" }, "metadata": {} }, { "output_type": "display_data", "data": { "text/plain": "
", "image/png": "\n" }, "metadata": { "needs_background": "light" } } ] }, { "cell_type": "markdown", "source": "

As highway-mpg goes up, the price goes down: this indicates an inverse/negative relationship between these two variables. Highway mpg could potentially be a predictor of price.

\n", "metadata": {} }, { "cell_type": "markdown", "source": "We can examine the correlation between 'highway-mpg' and 'price' and see it's approximately -0.704.\n", "metadata": {} }, { "cell_type": "code", "source": "df[['highway-mpg', 'price']].corr()", "metadata": { "trusted": true }, "execution_count": 16, "outputs": [ { "execution_count": 16, "output_type": "execute_result", "data": { "text/plain": " highway-mpg price\nhighway-mpg 1.000000 -0.704692\nprice -0.704692 1.000000", "text/html": "
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "

Weak Linear Relationship

\n", "metadata": {} }, { "cell_type": "markdown", "source": "Let's see if \"peak-rpm\" is a predictor variable of \"price\".\n", "metadata": {} }, { "cell_type": "code", "source": "sns.regplot(x=\"peak-rpm\", y=\"price\", data=df)", "metadata": { "trusted": true }, "execution_count": 17, "outputs": [ { "execution_count": 17, "output_type": "execute_result", "data": { "text/plain": "" }, "metadata": {} }, { "output_type": "display_data", "data": { "text/plain": "
", "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZIAAAEGCAYAAABPdROvAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAA/AElEQVR4nO29e5wc9XXg+z39mJc0I42kGSH0sCQjLBCJsRkT/FitYidr7HgRzhIb78Zw7xJQLviabOIE2CSEkPW9ECfBxl6zkh/X4DwAK8lay4K9AaIo3iBkyUa2BcKSR8KjsaQZSaN5z/Sjzv2jfj1T09M90z39njnfz6fV1aequuqnmq5T5/E7R1QVwzAMw5groUqfgGEYhlHbmCIxDMMwCsIUiWEYhlEQpkgMwzCMgjBFYhiGYRREpNInUG5WrFih69evr/RpGIZh1BSHDh06p6ptmdYtOEWyfv16Dh48WOnTMAzDqClE5I1s68y1ZRiGYRSEKRLDMAyjIEyRGIZhGAVhisQwDMMoCFMkhmEYRkEsuKythcDeoz3s3NdJV98Ia1ub2LF1I9s2t1f6tAzDmKeYRTLP2Hu0h/v3HKFncIyljVF6Bse4f88R9h7tqfSpGYYxTym5IhGRsIh8X0SecZ8fEJFuEXnFvT4Y2PY+ETkuIq+LyPsD8mtE5Idu3aMiIk5eLyJPOfnLIrK+1OOpdnbu6yQaFprqIoj479GwsHNfZ6VPzTCMeUo5LJK7gdfSZI+o6tXu9SyAiFwJ3AxsAa4HvigiYbf9Y8DtwCb3ut7JbwP6VPUy4BHg4ZKOpAbo6huhMRqeImuMhjnVN1KhMzIMY75TUkUiImuAXwG+nMPm24EnVXVcVU8Ax4FrRWQV0KKq+9XvwvUEcGNgn8fd8m7gfSlrZaGytrWJ0Xhyimw0nmRNa1OFzsgwjPlOqS2SzwK/B3hp8k+IyA9E5Ksi0upkq4GuwDannGy1W06XT9lHVRNAP7A8/SRE5A4ROSgiB3t7ewsbUZWzY+tG4kllJJZA1X+PJ5UdWzdW+tQMw5inlEyRiMiHgB5VPZS26jHgzcDVwGngz0t1DilUdZeqdqhqR1tbxppj84Ztm9t58IYttDc30D8ap725gQdv2GJZW4ZhlIxSpv++G7jBBdMbgBYR+UtV/fXUBiLyJeAZ97EbWBvYf42TdbvldHlwn1MiEgGWAOdLMJaaYtvmdlMchmGUjZJZJKp6n6quUdX1+EH0F1X1113MI8WHgR+55T3AzS4TawN+UP2Aqp4GBkTkOhf/uAX4ZmCfW93yTe4YWqoxGYZhGNOpxITEPxWRqwEFTgI7AFT1iIg8DbwKJIC7VDUVNb4T+BrQCDznXgBfAb4uIseBC/gKyzAMwygjstAe4Ds6OtT6kRiGYeSHiBxS1Y5M62xmu2EYhlEQpkgMwzCMgjBFYhiGYRSEKRLDMAyjIEyRGIZhGAVhisQwDMMoCFMkhmEYRkGYIjEMwzAKwhSJYRiGURDWs90wqoy9R3vYua+Trr4R1rY2sWPrRivCaVQ1ZpEYRhWx92gP9+85Qs/gGEsbo/QMjnH/niPsPdpT6VMzjKyYRTIPsSfa2mXnvk6iYaGpzv9pNtVFGIkl2Lmv066hUbWYRTLPsCfa2qarb4TGaHiKrDEa5lTfSIXOyDBmxxTJPCP4RCviv0fDws59nZU+NSMH1rY2MRpPTpGNxpOsaW2q0BkZxuyYIpln2BNtbbNj60biSWUklkDVf48nlR1bN1b61AwjK6ZI5hn2RFvbbNvczoM3bKG9uYH+0TjtzQ08eMMWi48YVU3Jg+0iEgYOAt2q+iERWQY8BazH75D4EVXtc9veB9wGJIFPquq3nfwaJjskPgvcraoqIvXAE8A1+L3aP6qqJ0s9pmpmx9aN3L/nCCOxBI3RMKPxpD3R1hjbNreb4jBqinJYJHcDrwU+3wu8oKqbgBfcZ0TkSvxWuVuA64EvOiUE8Bh
wO34f901uPfhKp09VLwMeAR4u7VCqH3uiNQyj3JTUIhGRNcCvAJ8GftuJtwPb3PLjwF7gHid/UlXHgROuD/u1InISaFHV/e47nwBuxO/bvh14wH3XbuALIiK60PoHp2FPtIZhlJNSWySfBX4P8AKylap62i2fAVa65dVAV2C7U0622i2ny6fso6oJoB9Ynn4SInKHiBwUkYO9vb2FjMcwDMNIo2SKREQ+BPSo6qFs2zjLoeTWg6ruUtUOVe1oa2sr9eEMwzAWFKV0bb0buEFEPgg0AC0i8pfAWRFZpaqnRWQVkJop1w2sDey/xsm63XK6PLjPKRGJAEvwg+6GYRhGmSiZRaKq96nqGlVdjx9Ef1FVfx3YA9zqNrsV+KZb3gPcLCL1IrIBP6h+wLnBBkTkOhER4Ja0fVLfdZM7xoKOjxiGYZSbStTaegh4WkRuA94APgKgqkdE5GngVSAB3KWqqQkRdzKZ/vucewF8Bfi6C8xfwFdYhmEYRhmRhfYA39HRoQcPHqz0aRiGYdQUInJIVTsyrbOZ7YZhGEZBmCIxDMMwCsIUiWEYhlEQpkgMwzCMgjBFYhiGYRSEtdo1DKMqsBbRtYspEsOoMhbiDTXVIjoaliktoh+EeT/2+YC5tgyjikjdUHsGx6bcUPce7Zl95xrGWkTXNqZIDKOKWKg3VGsRXduYIjGMKmKh3lCtRXRtY4rEmDfsPdrDx3bt5z0Pv8jHdu2vSXfQQr2h7ti6kXhSGYklUPXfrUV07WCKxJgXzJfYwkK9oVqL6NrGsraMeUEwtgDQVBdhJJZg577OmroZbdvczoP44znVN8KaBZK1BdYiupYxRWIUlUqlrnb1jbC0MTpFVquxBbuhGrWGubaMolFJ99JCjS0YRjVQyp7tDSJyQEQOi8gREfljJ39ARLpF5BX3+mBgn/tE5LiIvC4i7w/IrxGRH7p1j7pOibhuik85+csisr5U4zFmZ+e+TmKJJGf6x3j97CBn+seIJZJlSV1dqLEFw6gGSunaGgfeq6pDIhIFviMiqc6Gj6jqnwU3FpEr8TscbgEuBZ4Xkctdl8THgNuBl4FngevxuyTeBvSp6mUicjPwMPDREo7JmIEfnx1gYCxBCCEsQiKpnB+OkUgOlPzYCzm2YBiVpmSKxPVOH3Ifo+41UzvG7cCTqjoOnHDtc68VkZNAi6ruBxCRJ4Ab8RXJduABt/9u4AsiIta3vTLEk/5/eygkAIiA5ymxZHkuh8UWDKMylDRGIiJhEXkF6AH+QVVfdqs+ISI/EJGvikirk60GugK7n3Ky1W45XT5lH1VNAP3A8gzncYeIHBSRg729vcUZnDGNukgIFDxVFMVTBXVywzDmLSX9hatqUlWvBtbgWxdX4bup3gxcDZwG/ryU5+DOY5eqdqhqR1tbW6kPt2DZ1N5Mc0OEeNJjLO4RT3o0N0TY1N5c6VMzDKOElCX9V1Uvisg/AtcHYyMi8iXgGfexG1gb2G2Nk3W75XR5cJ9TIhIBlgDnSzIIY1beuXEZB05eIBwSogKeQv9YgnduXFbpUzNqgIVY9Xi+UMqsrTYRWeqWG4FfBo6KyKrAZh8GfuSW9wA3u0ysDcAm4ICqngYGROQ6l611C/DNwD63uuWbgBctPlI5Xuq8QNviOurCITyFunCItsV1vNR5odKnZlQ586UywUKllBbJKuBxEQnjK6ynVfUZEfm6iFyNH3g/CewAUNUjIvI08CqQAO5yGVsAdwJfAxrxg+yp7K+vAF93gfkL+FlfRoXo6hthxeJ62pobJmSqWpOTAo3yMl8qEyxUSpm19QPgbRnkH59hn08Dn84gPwhclUE+BvxaYWdqFIu1rU30DI5N3AzAJgUauTGfKhMsRCydxigaNinQmCtWmaC2MUViFA2r4GrMFXsIqW2saKNRVGxSoDEXrDJBbWMWiWEYVYWlXdYepkgMw6g4lv5b25hryygqNqnMmAuW/lvbmEViFA17qjTmSlffCI3R8BS
Zpf/WDqZIjKIRfKoU8d+jYSlLPxKjtrH039rGFEmVsPdoDx/btZ/3PPwiH9u1vyaf4u2p0pgrlv5b25giqQLmi0vIniqNuWJzkGobC7ZXAfMl0Lhj60bu33OEkViCxmiY0XjSniqNnLE5SLWLWSRVwHxxCdlTpWEsTMwiqQLmU7FDe6osHEuhNmoNs0iqAAs0GinmS7zMWFiYIqkCzCVkpLAUaqMWMddWlWAuIQOsL4dRm5Sy1W6DiBwQkcMickRE/tjJl4nIP4jIMffeGtjnPhE5LiKvi8j7A/JrROSHbt2jruUuri3vU07+soisL9V4DKMcWAq1UYuU0rU1DrxXVd8KXA1cLyLXAfcCL6jqJuAF9xkRuRK/Ve4W4Hrgi65NL8BjwO34fdw3ufUAtwF9qnoZ8AjwcAnHYxglx+JlRi1SMkWiPkPuY9S9FNgOPO7kjwM3uuXtwJOqOq6qJ4DjwLUisgpoUdX9qqrAE2n7pL5rN/C+lLViGLWIxcuMWqSkMRJnURwCLgP+q6q+LCIrVfW02+QMsNItrwb2B3Y/5WRxt5wuT+3TBaCqCRHpB5YD59LO4w7gDoB169YVZ3CGUSIsXmbUGiXN2lLVpKpeDazBty6uSluvlKGPjaruUtUOVe1oa2sr9eEMwzAWFGVJ/1XVi8A/4sc2zjp3Fe49lSDfDawN7LbGybrdcrp8yj4iEgGWAOdLMgjDMAwjI6XM2moTkaVuuRH4ZeAosAe41W12K/BNt7wHuNllYm3AD6ofcG6wARG5zsU/bknbJ/VdNwEvOivHMAzDKBOljJGsAh53cZIQ8LSqPiMiLwFPi8htwBvARwBU9YiIPA28CiSAu1Q1lQd5J/A1oBF4zr0AvgJ8XUSOAxfws74MwzCMMiIL7QG+o6NDDx48WOnTmIbVVzIMo5oRkUOq2pFpnZVIqQL2Hu3hU7sP8/2uPs4OjPH9rj4+tfuw1VcyDKMmMEVSBTz03GtcHImjHoRFUA8ujsR56LnXKn1qhmEYs2K1tqqAE+dHCAmEQv5cShFQTzlx3uorGYZR/ZhFYhiGYRSEKZIqYOOKRXgKniqK4qniqS83DMOodkyRVAH3XL+Z1qYoAiSSHgK0NkW55/rNlT41wzCMWTFFUgVs29zOZ256K29b18qqJY28bV0rn7nprZb+axhGTZBzsF1E3gRsUtXn3Uz1iKoOlu7UFhZWqM8wjFolJ4tERG7HL9O+04nWAP+9ROdkGIZh1BC5urbuAt4NDACo6jHAHp8NwzCMnBXJuKrGUh9cpd2FVVvFMAzDyEiuiuSfROQ/A40i8svAN4D/UbrTMgzDMGqFXBXJvUAv8ENgB/As8AelOinDMAyjdsg1a6sR+KqqfgkmWug2AlbDo0hY9V/DMGqVXC2SF/AVR4pG4Pnin87CZO/RHu7fc4SewTGWNkbpGRzj/j1HrPqvYRg1Qa6KpEFVh1If3HLTTDuIyFoR+UcReVVEjojI3U7+gIh0i8gr7vXBwD73ichxEXldRN4fkF8jIj906x51nRJx3RSfcvKXRWR9HmOvGnbu6yQaFprqIoj479GwsHNfZ6VPzTAMY1ZyVSTDIvL21AcRuQYYnWWfBPA7qnolcB1wl4hc6dY9oqpXu9ez7juvxO9wuAW/t/sXnQsN4DHgdvz2u5vceoDbgD5VvQx4BHg4x/FUFV19IzRGw1NkjdEwp/rMc2gYRvWTa4zkt4BviMjPAAEuAT460w6u1/pptzwoIq8Bq2fYZTvwpKqOAydc+9xrReQk0KKq+wFE5AngRvx2u9uBB9z+u4EviIhUQ9/2fGIea1ub6Bkco6lu8nKMxpOsaZ3R6DMMw6gKcrJIVPW7wGbg/wJ+E7hCVQ/lehDncnob8LITfUJEfiAiXxWRVidbDXQFdjvlZKvdcrp8yj6qmgD6geUZjn+HiBwUkYO9vb25nvacyTfmsWPrRuJJZSSWQNV/jyeVHVs3lvxcDcMwCmVGRSIi73Xvvwr
8W+By9/q3TjYrIrIY+Fvgt1R1AN9N9WbganyL5c/nevK5oqq7VLVDVTva2tpKfbi8Yx7bNrfz4A1baG9uoH80TntzAw/esMWytgzDqAlmc239a+BFfCWSjgJ/N9POIhLFVyJ/pap/B6CqZwPrvwQ84z52A2sDu69xsm63nC4P7nPKzbZfApyfZUwlp6tvhKWN0Smy2WIeVrTRMIxaZUZFoqp/JCIh4DlVfTqfL3aZVV8BXlPVvwjIV7n4CcCHgR+55T3AX4vIXwCX4gfVD6hqUkQGROQ6fNfYLcDnA/vcCrwE3AS8WA3xEYt5GOXE5iAZlWbWGImqesDvzeG73w18HHhvWqrvn7pU3h8Avwj8J3ecI8DTwKvAt4C7VDXpvutO4MvAceAn+IF28BXVcheY/238GfgVx2IeRrmwOUhGNSC5PMCLyEPAOeApYDglV9ULpTu10tDR0aEHDx4s+XFST4mn+kZYY0+JRon42K7906zfkViC9uYG/uaO6yp4ZsZ8Q0QOqWpHpnW5pv9+FD8mcmea3B6xs2AxD6MczCUeZxjFJtcJiVcC/xU4DLyCH6PYUqJzMgwjR9a2NjEaT06RWTzOKDe5KpLHgSuAR/GVyJVOZhhGBbF4nFEN5OrausqVOknxjyLyailOyDAWOvlkYW3b3M6DYPE4o6Lkqki+JyLXBcqU/AJQ+oi1MW+wFNXcSGVhRcMyJQvrQZhRmdj/pVFJcnVtXQP8i4icdLWvXgLeEUjjNYysWIpq7lglaKMWydUiuX72TQwjM8GbI0BTXYSRWIKd+zrtSTqNuWRhmbVnVJqcFImqvlHqEzHmL5aimjv5VkWYiyvMMIpNrq4tw5gzlqKaO/lmYZkrzKgGTJEYJcdSVHMn30rQXX0jJJIenb1DHD0zQGfvEImkZ9aeUVZyjZEYeWJ+60ksRTU/8snCWlwX5njvMGERwiIkkkr3xTEua1tU4rM0jElMkZSAvUd7+N3dhxkcS5DwPM4NjvO7uw/zmZveumBvnpaiWhr8Itv4fUvdIhqQG0YZMNdWCXj4W0fpG4mjQCQcQoG+kTgPf+topU/NmGcMjidYvbSBSEhIekokJKxe2sDQeKLSp2YsIMwiKQGd54YJCYTcU6EIqCid54Zn2dMw8iOV5bWxbfGELFX91zDKhVkkhlHDWCKDUQ2YIikBG5Y34Sl4nqKqeJ7iqS83jGKSb5aXYZSCkrm2RGQt8ASwEr+XyS5V/ZyILMNvkLUeOAl8RFX73D73AbcBSeCTqvptJ78G+BrQCDwL3K2qKiL17hjX4Pdq/6iqnizVmHLl3g9cwad2H2ZoPEHSU8IhYWl9lHs/cEWlT82Yh1gig1FpSmmRJIDfcVWDrwPuEpEr8dvhvqCqm4AX3Gfcupvx+5xcD3xRRMLuux4Dbsfv476JyZIttwF9qnoZ8AjwcAnHkzPbNrfzZze9lbetbeWSlgbetraVP1vAGVvzib1He/jYrv285+EX+diu/VYvzDAooUWiqqeB0255UEReA1YD24FtbrPHgb3APU7+pKqOAydcH/ZrXZHIlkDl4SeAG/H7tm8HHnDftRv4goiI5tI/uMTYU+L8w8qRGEZmyhIjEZH1wNuAl4GVTskAnMF3fYGvZLoCu51ystVuOV0+ZR9VTQD9wPIMx79DRA6KyMHe3t5iDMlYgFg5EsPITMkViYgsBv4W+C1VHQiuc5ZDya0HVd2lqh2q2tHW1lbqwxnzlK6+ERqj4SkyKz5pGCWeRyIiUXwl8leq+ndOfFZEVqnqaRFZBaSczN3A2sDua5ys2y2ny4P7nBKRCLAEP+huGLOSbxmbfCvzzuUY5RiHYRSbklkk4tdo+Arwmqr+RWDVHuBWt3wr8M2A/GYRqReRDfhB9QPODTYgIte577wlbZ/Ud90EvFgN8RGj+plLs61852yUo6GXNQ0zqoFSurbeDXwceK+IvOJeHwQeAn5ZRI4Bv+Q+o6pHgKeBV4FvAXepaqr2+J3
Al4HjwE/wA+3gK6rlLjD/27gMMMOYjbnEO/Kds1GOmMrOfZ3EEknO9I/x+tlBzvSPEUskLW5jlJVSZm19h8kycum8L8s+nwY+nUF+ELgqg3wM+LUCTrNgzK1QPeRzLbr6RggLdPYOEUt61IVDrFhcl3O8IxeztxwNvX58doCBsQQhJqv/nh+OkUgOzL6zYRQJm9leAOZWqB7yvRbN9RG6L46RcBNGE55ffn1xffZnq3yPUY6GXvGkXzkh7nmMJzzinofnKbFkdlVnc2GMYmOKJAey/fAsHbR6yPdaTITSNPAKyotwjHLVwUoqTAxH/c/ZsIcfoxRY9d9ZmGkSmvUiz51SuwDzvRZDsSSrlzZwbig24dq6ZHE9w7Fkxu3ncoxyNfQKCXg69XM2gsoQoKkuwkgswc59neaSNeaMKZJZmOmHN5d00IVIOWaE53st5lJ+fS7Xu9QVDlR1ihIBX6lks6zs4ccoBebamoWZJqFZCe/cKIcLMN9rMZdrV43XO5sbK5u8HHEbY+FhimQW1rY2cX54nM7eIV473c9PegbpHRpj1ZJG/tXlbVbCOwfKMSM839TcuZRf37a5nZvevprewXFeOzNI7+A4N719dUWvdyyR2RWXTV6NytCofcy1NQvv3LiMl0+cn3AfJDyP0XiMD25p5o3zw2xsW8wjH72aukjIf4VDqKr1zA5QLhdgvm6kfLffe7SH3d/rpq25nnXRMKPxJLu/183Pr1laVGWSTzwpHAqh6vn5Aup34xQnz0S54jaWFr+wMEUyC8/96Ay4H2jQ7fxPx87x8XetJ+F5JGIeI7HJdSJCJCREwkI0HCIaChEJu8+hEKGZoqHzkB1bN3L/niOMxBI0uhtwKZ6CH33+x3z5OycYjiVZVBfmN96zgU/+0uVF+/5yBKrzjSdtWN7E8d5hIiITf6NJ1RmbqJU6bmNVkhcepkhmofPcMJGwuCc/RQHP8+iawS2jqsSTSjwJo0x3MYRDQiQcIuoUS0rh1IXnp5Ipx1Pwo8//mEeePzYxUXBgLMEjzx8DKJoyKXQSYy7s3NdJPJnk/FBi4hgtjZGsyqoam6jlOwaj9jFFUgGSnpL0kozHp6+LhEITbrKoUzDhkP9ey5T6Kfjz/3h82mxzdfJiKZLFdWGO9w4TlslZ5N0Xx7isbVFRvh/gWM8g/SNxQiGZmCh5bjBGPDmYcftUE7VSu6ryId8xGLWPKZJZSLkOxPNnrXnqp1e+qbWxJMfL5CqDSXeZr2BC1E8om9pWMMUiniVNKZt8LkzEvYTJ4j9KUeNhsYQHAiH3nSLgifryLFRbE7W5jMGobUyRzELQdZBIKqGQ0FIX5Y6tby7reUy6y6b+GEMi1Ed9t1gw4G/B/uIzOJ6gtSnC+eE4nvoT/5YvijI0nijaMaJhYTimjMWTKL6+CoWgLlw71zMaFkbj4Hk6JbZYS2Mw8sMUySwEXQcnzw+zsrmBm9+xlms3Lqv0qQHgqTIaS06JxYjIhEJJKZmIi8sYc2dxXZhTfaMTnz2F3qE4m1fWF+0Y7c0N9A3HfQ3iNIl60DbDRMlq4/KVLZw4N8Tg2GSMpLkhyoYVi2ffOQ8sM6x6MEWSAynXwZn+MUZixXv6LBWqyng8yXg8yeDYpDyYTRYJ+QomHPb9/eGQv65Wg/0fvnoVf//K6YzyYtE7OJ5VXqybmqpv9YbTsrBqqc1OKkvvkiWRkmXpWWZYdWGKZAERzCYjQzYZ+MomLEI04lKXU/GYKs8o2371Gr79ag8jgVpZTXVhtl+9Zoa98uPCaIbsCODCSLxoN7W51ACrtifzcmTpWc2w6qJkikREvgp8COhR1auc7AHgdqDXbfafVfVZt+4+4Db8O9wnVfXbTn4N8DWgEXgWuFtVVUTqgSeAa/Db635UVU+WajwLBVUloUoiNj11ORqeDPBHqyxleee+TlYtaZgy6bHYN5aUURAMP6n6HqhipbvmWwOsWp/MS50AYDXDqotSOs2/BlyfQf6
Iql7tXiklciVwM7DF7fNFEUnV1HgMX/lscq/Ud94G9KnqZcAjwMOlGojhE096DI8nuDgSo3dwnJ9dHOXk+WHeOD9M98VRegbHuDgSY2g8wXgiiZdeTbCEdPWNkEh6dPYOcfTMAJ29QySSXlFvLE117k8yNazA8M4Nxqb0Njk3GONYT/7prvmWMFmorQysZlh1UTJFoqr7gAs5br4deFJVx1X1BH5L3WtFZBXQoqr7XS/2J4AbA/s87pZ3A+8TS1WqCEnPj8kMjSW4MByjZ2CM7r5JJXOqb4Qz/WOcGxqnfzTOaCxJPOkVVdHMpVFVvvzm1o2EhIlyJIqfuRUNMZHuKoif9irMKd013xpg5ahjVo1YzbDqohIxkk+IyC3AQeB3VLUPWA3sD2xzysnibjldjnvvAlDVhIj0A8uBc+kHFJE7gDsA1q1bN6eT/neP/Qs/6RlicUOE5oYIzfURFjdEWVzvPgdkk8u+vDEaXrDpuP7kSyVG5ptqKgEg5IL94cB7NBxC3A065BICsqEuIB1L6GTarMzcqCpfPvlLl3Pi3BB7fnBmYhb5DT9/CfuOnWNgLFG0dNd83EILtZVBuWqGGblRbkXyGPAn+A9zfwL8OfAfS31QVd0F7ALo6OiY053l/NA4F0fjXMwScJ2JcEimKJYJhZNJFvjc3BClITq/54SkEgBIQuacqElCMlm/LBISopHJOmY9g2MZZ7afG5rtW3Nn79EeDv20n/XLmyaykQ79tJ/25gai4VjJ010zUa46ZtVItU3ErFaCPWtmehgrhLIqElU9m1oWkS8Bz7iP3cDawKZrnKzbLafLg/ucEpEIsAQ/6F4S/p9f/Tl+0jPEuaEYQ+NxBscSDI0lGBxPBJbjDI0lpmXYJD0tixJqaXCyeaqEPGdxZHIZjSeUkEA4MFcm6XmMJTz6R+KEwy71uYD5NNkyhVSVuki4pOmu2bAn88qRukGnrN7gg0z6r27SHar+e2B51uOkHSv9XfF/GxOf3ff7sskDLK6P0N5SmvlIZVUkIrJKVVPJ/h8GfuSW9wB/LSJ/AVyKH1Q/oKpJERkQkeuAl4FbgM8H9rkVeAm4CXhRS5Rsn0qvzHVCYtJThsYnlcvgWGLiNZMSGhxLTElfTX3XXJUQQCQkLG2K0tZcP0UJNTdOVUrNNa6E6iIhxuNJPA24l9TPNDs/PN0qCYd8d1koJM4NJoSEiTkcqTpRIedaC4eEn14YprWpbsr3NEbD9I/G+ZPtV1X8Zl47M02KT9LTiRtn6iYavBuk/pSDstRi+l95UhXPUxKe/55U3z0b/F6vNLeamqWU6b9/A2wDVojIKeCPgG0icjX+NTwJ7ABQ1SMi8jTwKpAA7lLV1B31TibTf59zL4CvAF8XkeP4Qf2bSzGOYHplS0OE88PjfO7FY9zNpqzKJBwSljRGWdIYdaedO0lPJ62b8cSsSmhwzMnGpyshwM8gGopxbiiW4WjZiYSExWlWT3NDNKNlVA1K6E3LFtF9cZihcT+QHw2HWNwYYfXSzAUVk56SRLNNp8lI2+IGzg+PT1gkIjAaS7CypYGfX7uUL/6Ht7t+IH6wfWAsPqGkwiGZWBd8Ep14Z+oNSsRXcKl9Uv+dwSdPgO8cO8enn32NqLNcz/SP8gff/BH3J65k61vapuyTDNwYwZUMS9XDyjDe1JPwxNO0W562nU49/0z32FBAKUvwgDrlbdq+wfJmqe1SSuOl4+f5qwM/5XT/KKtaGquq4sRCQ2ppxmwx6Ojo0IMHD+a8/cd27Z8IZqYyjUbjSZYvqucvPvrWEp5p/iQ95T89+QrnhseJhkMTN43xhEdDJMy/unzFhNKZzRIqhJQ7LhXvyZqUUEQldKDzAg9/+yjDgXLqi+oj3PP+zUW7uRzovMCfPHOEkbg3EdBviob4ww9tKeoN7EDnBZ78bhenB2a/Qf72U4c5Pzw+JXOrWv8+i8mBzgt87sVjREJCQzTEWNwj4Sl3vzf7A95Cp1DXlog
cUtWOTOtsZvssBHtQjCeSRMMhWpuinBkYnX3nMhMOCb3D47Q0RJDAM6aiDI4l+M1/nb3QZNASCiqbzJZQPLCuuO64oCXU0jBDUkKaEhpPJP2ndHFPsSUwiF4/MzChRMB/Oh6Je7x+ZqCoyip1g8zFAj49MEpYoKtvfMISq9a/z2Ly5He7iIRkQoGmYlNPfrfLFEkFMEUyC831EY71DBEO9FY4OzDOm5YXrwdFMVnV0jjtCXUs7nFJy8wutnBIWNIUZUlTdMbtMpGuhNIV0BQrKG2b9ElliQJjQilE/TjVw986yta3tM1sCTVGaYjMbgk9fegU4dDUNrZJz+PpQ6f4+LvWF3S+KfK9QS6KhnnjwshEinQi6f4+l83v9N/TA6O0NEy9fTVEQ/NegVYrpkhmQTXdiTuxohKnMys3v2Mtn3vxGKPx5BST/+Z3rJ195zlSDiU0MBZneIqVNLM7TtXvRdI3Guebr/xs1vOIhMRXLvWT7rh0S2gkliQUCsQx3L7FdAvmbWFk6ZFChRMl8nHPzYW5PjAZpcEUySwEi+iNJ1I/7DpG4sW7eRSTazcu42428eR3uzgzMMolVR6ELEQJJZLehKXzG098l0yXJCTwrjevYGg8zkDKPZfFEuobidM3MrMllPTwg/Rp/LvH/mWaEgrGiDLFg5obItSnWUKL6iK8cX7YD07nYAEPxxKsbKmnbyQ+qXgW11W0SnW+7rm5UIkHJiM7pkhmIVhELxhsb19UvB4UxebajcuqVnEUk0g4xNKmOpY21ZHMptcVHty+ZZo4qISGAhlwQ2lWz8BYfCI2dLZ/LGsV3lyUUCaiYXEKyE9ION0/6mdlJXXCwlCF4fE4Pzh1cXLbhggNkdDEk/nawEz2Sv99liN+UWsPTPMdUySzEJw5HAkJo/GkPflUIdmqWmWTB5VQPnz9X07y1MEuRuMeDdEQ73tLO++5vI1XfnqR7xw/R99ojKZohHXLmmioC00ooWyWUDyZ2RLSiX98zg7G+K2nDk/ZJhoW6iNhNynSl4n4c2c2tTfzjUOncraEikm54hcL5YGpFjBFMgvBmcPV2CHRKC8ff9f6aYH1A50X+KdjvURCwqVLGhiLe/ysfzRjKmo2S2jQKZxnDp+ecEt5qiQVkklFmSxzkcLvLTPVhaUK4wmPF1/v4cXXe7KOI90SmnDDZXHPBbebTQlZ/GLhYYokB1I1ffqG/WZDEz+hwGSx1OSu1CQyf7VMpKOmr0+l505smzbxLBg3na2cQqqwYfoEs1TphGx9NIL7Zzr/9DwDYMrs4dQ2qWlqqYq4M5WMCJ5f8P8hSHDaW7AcBKkJbwRKRLhtwwLJDP8/YfGPWcr5Uk9+t4tEMsnFkcBkyPpwRlfObJbQW9qbs86PePubljp32+QcoP+2t5P+0TjhkEzMG4onPUIirFzSkLcllAuzKaE1rY2cOD/MaMyPX8TdrPBffdtqVLVollCpA/pG7pgiyYPWRfm5QaqBcAETKjL93uf6fY8+/2O+/J0TDMeSLKoL8xvv2cAnf+nyOZ9bOpcubaSrb7rr5NKljWxY4QeqJxScTiq59JIaqXIYqZuyp4rnTZbNSM2qDvLGhWEGR+NIIEDeNxwn4Q3nPY7ZfP/pSuizzx+jqS40NdjeFMVT+PItk3PH4kFLKFAXbjDN9TZF7j6Pxac6COeqhO7/H0eyKqHmtCy5yXWZLaFyBPSN3DFFYpScR5//MZ978TghgUjIDwZ/7sXjAEVTJovqwoTFWTBMlpFfVDfpXpmwhgK6cC6KUdWvw5RSKomkBwLhgDmZECWe8OZkDeXj+881yyuVbZheKywXUu64gdmUUCpZYTzB4GhxlZA/hkkldH5onKQq0VBooiaap/DFvT8h7nl5u+OMwjBFUiDV1i+7Gvnyd074T/MBmTh5sRTJUCzJmtbGKb3OVyyum7HX+VwREddq2P9cFwkxlvBQZaJgpIjvmtqwYtFET5akpyQ8L+3z5Puc3G9lmOc
018QEmMUSCsaJ5mgJjaWlU1wcjfOH3zwy7TyiYZmwehbVR2hpjEwopWyWUOqzKaHZMUVSANXaL7vaGBxLZOwVMjhWvLkOa1ubOHl+aIoslvRYv7z0PUEuX9nCiXNDWfuRpKoi+ISzfk9K0XgeEwonWIE2kZyucIbjyenzSKponlMhllC6EhoIpGd/4+Ap+kdjxBIenlPgqVhhWISxxHQldGE4xoXh/IqX+mPwLaGWQNHS2ZSQv32E+mj26z2fMEVSADv3dRJPJjk/NHkDaWmMsHNf57xQJMWytrI9Gxcz/P3Ojcs4cPKCqzDrK5GewRgfe0fp/eWpFPFC+5H4Cid145lZ4cSTfhB+XWsTPUPjvGl53UTiw2isuuc55cpMSmhoNM7j+9+YcJf6cS749V9Yx8fftZ540gtYPFPrx02pKTfmu+yGAxl0mZRQIe64zApnZkuoub62lJApkgI41jNI/0h8ondFwlPODcaIJwcrel7FUAC1Zm291HmBtsV1aVZBhJc6L/DJEh+73M2lggrnrl+8jPv3HCGe9JwS82+Cd257MytbGpwVM2ndJJK+dVPrVb+/39XPsqYow7HJTLlFdWG+39XPx/GV0LJFdSybQ4LMhBLKUMQ0aBkF5wilJq4W2xJKVzjTPqeW6yetpZaGKHWRuTVvmyumSAoglvCDrKFAENeTzB38ykWxFEC2boBzsbZSgdBM8mLR1TfCisX1tDVPlslWVU71jRTvIDNQqbav2za3c9Opi9My4v7NVZfMuF/CWTS+cvGIJ3WKpVPtiub0wCiti+pYtmhqletiTHospRKaXlPOJSkUWQnVRUK+kglMSG1tquOq1Uu46xcvy/v7ZsMUSQFEw8JoHDwv0JUPqAtXLjBXLAXQ1TfC0sap9a8ao+E53ZhbGyOcH5keD2ltLN6fX6qUTWrc4GeHrWktbhXcakuu2Hu0h93f66atuZ51zq22+3vd/PyapTOeVyQcIpLFc6KqxJ0lk/CUZFKJe55v3Tj5bCzUoo3lsIRyUUKxhMf5RIzzaUroZ/1jtaVIROSrwIeAHlW9ysmWAU8B6/E7JH5EVfvcuvuA2/B71n1SVb/t5Ncw2SHxWeBuVVURqQeeAK7B79X+UVU9WarxZOLylS0cPdNP/2gCT/0n7CWNETatbCnnaUyhWAqgqDfmLBkvxcyECZaymUucIhcFsfdoD7+7+zCDYwkSnse5wXF+d/dhPnPTWyumTHbu6+Tc4BgjgQynpmiooDidiFAXEerI7B7xPCWW9HzrJaBk4knfhWZFG+dGIUoolvBmbe89FvfYtLI0ySeltEi+BnwB/2af4l7gBVV9SETudZ/vEZEr8VvlbsHv2f68iFzu2u0+BtyO37P9WeB6/Ha7twF9qnqZiNwMPAx8tITjmUYqwBsOCVHnvhkYS/LOCk6IKpYCKPTGHCRbmfVipuYWEqfI1R348LeO0jfizyKPhEOo+sUaH/7W0Yopku/99DzjacbeSNzj+z89X7JjhkJCQyhMQ4ZgcCLpce/f/pD6SIiGaBhVaKwTRmNWtLGU1EVCLIvMrIQK7ZA4EyVTJKq6T0TWp4m34/dxB3gc2Avc4+RPquo4cML1Yb9WRE4CLaq6H0BEngBuxFck24EH3HftBr4gIqJldO5WMsCbjR1bN/Kp3Yfpvjg60XJ2cX2EP/yVK/P6nmIGkBOZAiQzyOfKXOMUuboDO88NA77bJzVnJCQpeWVIVyIpiphZnReRcIif9Y+ytDE6xeKMhoTeoTGWL6qfsGZSFsxcsaKNuXOg8wLfOHSKs4NjJXHJljtGslJVT7vlM8BKt7wa2B/Y7pSTxd1yujy1TxeAqiZEpB9YDpxLP6iI3AHcAbBu3bqiDAQqH+DNhgC4GlVopmpWuVG0AHI58n8LINhOOTiZMf06eqq4/ApwMbGEgkiVDKRKyGQVjyU81i1bNK3vTCrAH3cB/0TSc4qm+gP+tcKBzgs8/O2jjIwnSKqWxCVb3hy
xAM5yKMtfiqruUtUOVe1oa2sr2veubW2aVgyvFAHefNi5r5OWxiibVjZzxaolbFrZTEtjlJ37Oit2TtEsqYjZ5OWmuT5C98UxEs6CS3hK98UxFtdPfc6KuDSzyUKaU+WVINuRKzkPe8fWjQyMxjl2dpDXTvdz7OwgA6PxjG7RcEhoiIZpboiybFEd7S0NrGltYsOKRaxb1sSqJY0sX1xPS2OUproI0bDNMs+XXf/cycBoHMW3GJVJl2yxKLdFclZEVqnqaRFZBaTqXHcDwSjZGifrdsvp8uA+p0QkAizBD7qXjR1bN/K7uw/T3TdKwvOIhHzXVr5upGJSzGyrYpHtybLYT5xzLQyZrZ1y+vk11YUZi3vTqhoH63mVm1CWqscV1G2A+y90lZeRuT0xpjLLGjNMzgxaMqlAv1kymenqG3ETdWWiwriKFtUlW+5Hwj3ArW75VuCbAfnNIlIvIhuATcAB5wYbEJHrxH8MuSVtn9R33QS8WM74SIpi/GCKSTVaSbFMd7oZ5HMhVRhyNJ6cUhjy0ed/POu+qXbKkbCQVCUSFlYvbZiWDNDe3DC1TYArvx90bZadbE/nFXxq37mvk0hIJopYhkWIhKSoVnHQkmnNYsmsaK5nSWOURfVmyZSaUqb//g1+YH2FiJwC/gh4CHhaRG4D3gA+AqCqR0TkaeBVIAHc5TK2AO5kMv33OfcC+ArwdReYv4Cf9VVWdu7rZEljlFVLJnPX5zppr1gUM9uqWGQLqBYSaE3ny985Aa4RVEInb/K5FIYMtlNOMRJL0J6mIFTVr2IgMjFvKDVLvFLzSxbVhRl2HRJTCQAilbWSfnx2gIGxBCH8/6tEUjk/HCORHCjL8WeyZCbiMQllPJkklpj/VszapY28cWEE8ZRQoJzMZSuK93BZyqytj2VZ9b4s238a+HQG+UHgqgzyMeDXCjnHQqlGN1K5y3VUC0Pj/lyeiYZg6rfZHcqW1hQgV+WbslyCFYYvWVzPuaHxipWTed/mNv7+ldMTn1ORx/dtLl4sMF/iztIMhQIVHzwtqgU6V6LhENFwCOoAJn+7KQUTS0y6yBIFZpVVC3dsfbMfbI/5wfZwSFhaH+XeD1xRtGPYzPYCKNds6nypVLmOSiLOREj/2efizshV+WarMBxLKtEKFe88MxBjaWOEgbHJSbEtDRHODORfVqNY1EVCjMaSeBqo+KCUvf5TPqQUTHp9yGmxmLTJl7XAtRuXcc/7N/ONQ6foGRwrycOlKZICqEY3ElRfGY9yEA1BMkPljmiO965clO87Ny7j5RPnJ+qGxZNJN7M6TM/A+ERTrUQyyZj7Wyg1XX0jRNNK8kTDUlGreFN7M6+fGeDiaHxCuS1tjLKpvbli5zRXUgUyM02+DM7wn7BkErmVkCk3125cxnuvaK+9CYkLgXK4kfJVCnuP9vCp3YcZGk+Q9JRzQ+N8avdh/qyCZTzKQV0kzHgiMS2jqi5bQak58NyPzkzrF6XqW6Gqk33oUT+TKtuM/qKiSu/QZHlzT6F3KM6apZWLkWSq+NA/lqhoxYdSkG2Gv6pOcY+lluMJb1qb5vmCKZICKaUbaS61nR567jUujsQJuwY/6sHFkTgPPffaPFckISKu3WzKneJ5WlR3yvGeoYwNuibShQPLALFE6RVJ7+B4XvJyUI0VH8qJiFAfCVOf4e6acpXFgmnLidqouDwTpkiqmLnUdjpx3uWMBwKd6iknzld2tn2pKYc7ZaaSLmGZnKiYyhgLh0ofExjP4j7LJi8H5ar4MNd5Q5VkJldZMKMslvRIeNXrKkvHFEkVU421naqVcrhTZro1iwiRtLTgDcsrm3RRKcqRhPLo8z/msy8cm4hXDYwl+OwLxwCqXplkY2pG2SSe5wf5U+6x4HK1uMqqN43CmKjtpMpkbSePGf94Nq5YhKf+NoriqeKpL5/PvNR5geb6MElPGU/4TZqa68O81HmhLMdf2hRFQr4CkZD/uZjplbXEjq0biSeVkVg
CVf+92Eko/21f57RmaZ768vlGKOS7yhbXR/zJl80NrF7ayHo3+fLSpZOTLytVRsYskiomEhLfGoEpj8Mz1Xa65/rNk3GVpF+2pbUpyj3Xby75+VaS1CS41M3FL+mf4NjZ8kyC+7Ob3lqRuTtN0TAj8emxmKYK9vsuRxJKtkSGsiQ4VBGpyZeZAv6p5mQpV1l6dl9Rz6Nk32wUTFNdmPGEN1EgMJUVNNOs5W2b2/lMhW5qlWRoPJnxCXVwvDw3lmImXeSTqbduWQNHz053da5bVsGyLSzMuUzVxJTmZPn3ycobUyRVzOUrWzhxbmha9suGFTN3OVuIP+JYInNAMpt8LoTwZ8tnkheLfDP1jvVmDmBnkxtGKbAYSRWzY+tG6iJhLlnSwFtWNnPJkgbqIuGKT3isRsrR8iQaCU0rzy4Utxx+KlMv15Lf5ahjVo00Zplpmk1ulBb7X69itm1u58EbttDe3ED/aJz25gYevGHLgrM2cqEui/83m3wubFjeRDgk1IdDNERC1IdDhENS1OysznPDU0p+h0QsUy8Di+ojhJhaiTnk5Eb5sf/1Kmc+uKncZO+M8mLRvriOU/3TJ+G1Ly6eg/jeD1wxpWpAKYrfGbmxqb2Zk+EhBkan1jdbv3xmt69RGswiMUrO6qWZA7/Z5HMiFKJtcXSioVNIoG1xFCnipMBtm9u55bo3URcO4SnUhUPcct2biqroNyxv8tO33Uxnz/PTtxfqnJRs7Ni6kWh4qts3Gja3b6UwRWKUnP9y48/RlOa7boqG+C83/lzRjrG2tYmWxjq2XLqEn1u9hC2XLqGlsa6ok+D2Hu1h9/e6aWuu54pLmmlrrmf397rZe7Rn9p1z5N4PXJHXnJSGSGa7Lpt8vmBu3+qiIq4tETkJDAJJIKGqHSKyDHgKWA+cBD6iqn1u+/uA29z2n1TVbzv5NUw2vXoWuLsSXRKN2Wmqj+Ax6RJqmsWXnW+xyrm0Pc73GDv3dRINy8SM7aa6SNEbmW3b3J7XnJRsSWlFTFarWuaD23e+UMkYyS+q6rnA53uBF1T1IRG5132+R0SuxO9+uAW4FHheRC53HRQfA24HXsZXJNcz2UHRqBLy7SS592jPnBpF5dP2eC7HKFcjs3xukNnqf81UF8wwik01uba2A4+75ceBGwPyJ1V1XFVPAMeBa0VkFdCiqvudFfJEYB+jiujqG6ExbebtTDfg4JO/iLiyDzP3/E4pq03tzWy+pIVN7c0saYxm3Wcux1jb2sRo2izyamhkZhiVplKKRIH/JSKHROQOJ1upqqmeoWeAlW55NdAV2PeUk612y+nyaYjIHSJyUEQO9vb2FmsMRo7kewPOV/HMZZ+5HKMcNaTypRxpz4YxG5VSJO9R1bcDHwDuEpGtwZXOwiiaba6qu1S1Q1U72toq18t6oZLvDXguT/757jOXY1RjgPcTv3hZXnLDKAUViZGoard77xGRvweuBc6KyCpVPe3cVqlUmG5gbWD3NU7W7ZbT5UaVkW8Rv7m0MM53n7m2Sa62AG+qZHqt9eUw5hdS7iQnEVkEhFR10C3/A/Ag8D7gfCDYvkxVf09EtgB/ja9sLgVeADapalJEDgCfZDLY/nlVfXam43d0dOjBgwdLNj6jOKQyqvIpPJnvPnM5hmEsVETkkKp2ZFxXAUWyEfh79zEC/LWqflpElgNPA+uAN/DTfy+4fX4f+I9AAvgtVX3OyTuYTP99Dvi/Z0v/NUViGIaRP1WlSCqNKRLDMIz8mUmRVFP6r2EYhlGDmCIxDMMwCsIUiWEYhlEQpkgMwzCMglhwwXYR6cXPClsIrADOzbrV/MPGvbCwcZeHN6lqxhndC06RLCRE5GC2LIv5jI17YWHjrjzm2jIMwzAKwhSJYRiGURCmSOY3uyp9AhXCxr2wsHFXGIuRGIZhGAVhFolhGIZREKZIDMMwjIIwRVJjiEhYRL4vIs+4zw+ISLeIvOJeHwxse5+IHBeR10Xk/QH5NSLyQ7f
uURGp+nZ6InLSnfMrInLQyZaJyD+IyDH33hrYfl6MPcu45/01F5GlIrJbRI6KyGsi8s4Fcr0zjbv6r7eq2quGXsBv4/dnecZ9fgD4VIbtrgQOA/XABuAnQNitOwBcBwh++f0PVHpcOYz7JLAiTfanwL1u+V7g4fk29izjnvfXHHgc+A23XAcsXSDXO9O4q/56m0VSQ4jIGuBXgC/nsPl24ElVHVfVE8Bx4FrXfbJFVfer/xf3BHBjqc65xGzH/+Hh3m8MyOf72DMxL8YtIkuArcBXAFQ1pqoXmefXe4ZxZ6Nqxm2KpLb4LPB7gJcm/4SI/EBEvhow91cDXYFtTjnZarecLq92FPhfInJIRO5wspWqetotnwFWuuX5NPZM44b5fc03AL3A/+fcuF8Wv5vqfL/e2cYNVX69TZHUCCLyIaBHVQ+lrXoMeDNwNXAa+PMyn1q5eI+qvh34AHCXiGwNrnRPXvMxlz3TuOf7NY8AbwceU9W3AcP4rqwJ5un1zjbuqr/epkhqh3cDN4jISeBJ4L0i8peqelZVk6rqAV/C720P0A2sDey/xsm63XK6vKpR1W733oPfqvla4Kwz43HvPW7zeTP2TONeANf8FHBKVV92n3fj32Dn+/XOOO5auN6mSGoEVb1PVdeo6nrgZuBFVf311A/L8WHgR255D3CziNSLyAZgE3DAuQYGROQ6l8lxC/DN8o0kf0RkkYg0p5aBf4M/zj3ArW6zW5kcx7wYe7Zxz/drrqpngC4ReYsTvQ94lXl+vbONuyaud6WzFOyV/wvYxmTW1teBHwI/cH9YqwLb/T5+JsfrBLI2gA73x/gT4Au4CgfV+gI24menHAaOAL/v5MuBF4BjwPPAsvk09hnGvRCu+dXAQTfG/w60zvfrPcO4q/56W4kUwzAMoyDMtWUYhmEUhCkSwzAMoyBMkRiGYRgFYYrEMAzDKAhTJIZhGEZBmCIxjAogIl8TkZsqfR6GUQxMkRhGDSAikUqfg2FkwxSJYeSBiKx3vSL+yvWL2C0iTa7/wz+54orfDpTyuF1Evisih0Xkb0WkKcN3/omzUMJp8m0i8s8isgd/hnPGY7ttT4rI/+v6VRwUkbe78/iJiPxmWf5zjAWLKRLDyJ+3AF9U1SuAAeAu4PPATap6DfBV4NNu279T1Xeo6luB14Dbgl8kIp8B2oD/U1WTGY71duBuVb08y7HvDGz7U1W9Gvhn4GvATfg9Kf64sOEaxsyYIjGM/OlS1f/tlv8SeD9wFfAPIvIK8AdMFs27ylkVPwT+A7Al8D1/CCxR1d/U7CUmDqjfayLbsd8TWLfHvf8QeFlVB1W1FxgXkaV5j9IwcsT8roaRP+k3/UHgiKq+M8O2XwNuVNXDIvJ/4NdJS/Fd4BoRWaaqF0TkF4Cdbt39+BbH8CzHDn4ed+9eYDn12X7rRskwi8Qw8mediKSUxr8H9gNtKZmIREUkZXk0A6dFJIpvkQT5FvAQ8D9FpFlVX1bVq91rD5lJP/Z3ijUow5grpkgMI39ex28y9Rp+ddbP48cjHhaRw8ArwLvctn8IvAz8b+Bo+hep6jfwe0zsEZHGORz7scKGYhiFY9V/DSMPRGQ9fgn/qxbSsQ1jJswiMQzDMArCLBLDMAyjIMwiMQzDMArCFIlhGIZREKZIDMMwjIIwRWIYhmEUhCkSwzAMoyD+f5uAe88FnJgwAAAAAElFTkSuQmCC\n" }, "metadata": { "needs_background": "light" } } ] }, { "cell_type": "markdown", "source": "

Peak rpm does not seem like a good predictor of price at all, since the regression line is close to horizontal. The data points are also widely scattered around the fitted line, showing a lot of variability. Therefore, it is not a reliable variable.
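
To make the idea of a weak linear relationship concrete, here is a minimal plain-Python sketch of the Pearson correlation coefficient, which is what pandas' `corr()` computes by default. The helper name `pearson_r` and all the numbers are invented for illustration; they are not taken from the car dataset.

```python
import math

def pearson_r(xs, ys):
    # Pearson correlation coefficient of two equal-length sequences.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up values for illustration only.
rpm = [4800, 5000, 5200, 5500, 6000]
price_scattered = [9000, 15000, 8000, 16000, 10000]  # no clear trend
price_linear = [2 * x + 100 for x in rpm]            # exact straight line

r_scattered = pearson_r(rpm, price_scattered)
r_linear = pearson_r(rpm, price_linear)
print(round(r_scattered, 3), round(r_linear, 3))
```

A coefficient near 0 goes with a nearly flat, widely scattered regression plot, while a coefficient near 1 or -1 goes with points that hug the fitted line.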

\n", "metadata": {} }, { "cell_type": "markdown", "source": "We can examine the correlation between 'peak-rpm' and 'price' and see it's approximately -0.101616.\n", "metadata": {} }, { "cell_type": "code", "source": "df[['peak-rpm','price']].corr()", "metadata": { "trusted": true }, "execution_count": 18, "outputs": [ { "execution_count": 18, "output_type": "execute_result", "data": { "text/plain": " peak-rpm price\npeak-rpm 1.000000 -0.101616\nprice -0.101616 1.000000", "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
          peak-rpm     price
peak-rpm  1.000000 -0.101616
price    -0.101616  1.000000
\n
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "
\n

Question 3 a):

\n\n

Find the correlation between x=\"stroke\" and y=\"price\".

\n

Hint: if you would like to select those columns, use the following syntax: df[[\"stroke\",\"price\"]].

\n
\n", "metadata": {} }, { "cell_type": "code", "source": "# Write your code below and press Shift+Enter to execute\ndf[[\"stroke\",\"price\"]].corr()", "metadata": { "trusted": true }, "execution_count": 22, "outputs": [ { "execution_count": 22, "output_type": "execute_result", "data": { "text/plain": " stroke price\nstroke 1.00000 0.08231\nprice 0.08231 1.00000", "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
         stroke    price
stroke  1.00000  0.08231
price   0.08231  1.00000
\n
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "
Click here for the solution\n\n```python\n\n#The correlation is 0.0823, the non-diagonal elements of the table.\n\ndf[[\"stroke\",\"price\"]].corr()\n\n```\n\n
\n", "metadata": {} }, { "cell_type": "markdown", "source": "
\n

Question 3 b):

\n\n

Given the correlation results between \"price\" and \"stroke\", do you expect a linear relationship?

\n

Verify your results using the function \"regplot()\".

\n
\n", "metadata": {} }, { "cell_type": "code", "source": "# Write your code below and press Shift+Enter to execute \nsns.regplot(x=\"stroke\", y=\"price\", data=df)", "metadata": { "trusted": true }, "execution_count": 21, "outputs": [ { "execution_count": 21, "output_type": "execute_result", "data": { "text/plain": "" }, "metadata": {} }, { "output_type": "display_data", "data": { "text/plain": "
", "image/png": "\n" }, "metadata": { "needs_background": "light" } } ] }, { "cell_type": "markdown", "source": "
Click here for the solution\n\n```python\n\n#There is a weak correlation between the variables 'stroke' and 'price', so a linear regression will not fit well. We can demonstrate this with \"regplot\".\n\n#Code: \nsns.regplot(x=\"stroke\", y=\"price\", data=df)\n\n```\n\n
\n", "metadata": {} }, { "cell_type": "markdown", "source": "

Categorical Variables

\n\n

These are variables that describe a 'characteristic' of a data unit and take their values from a small set of categories. Categorical variables can have the type \"object\" or \"int64\". A good way to visualize categorical variables is by using boxplots.
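
As a rough sketch of what a boxplot summarizes, the snippet below groups prices by category and computes the quartiles that define each box (edges at the 25% and 75% values, a line at the median). The rows and the helper name `five_number_summary` are hypothetical and only serve the illustration.

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical (body_style, price) rows, invented for illustration.
rows = [
    ('sedan', 9000), ('sedan', 12000), ('sedan', 15000), ('sedan', 21000),
    ('hatchback', 6000), ('hatchback', 8000),
    ('hatchback', 9500), ('hatchback', 14000),
]

# Collect the prices belonging to each category.
prices_by_style = defaultdict(list)
for style, price in rows:
    prices_by_style[style].append(price)

def five_number_summary(values):
    # method='inclusive' interpolates linearly between data points.
    q1, q2, q3 = quantiles(values, n=4, method='inclusive')
    return {'min': min(values), 'q1': q1, 'median': q2,
            'q3': q3, 'max': max(values)}

summary = {style: five_number_summary(values)
           for style, values in prices_by_style.items()}
for style, stats in summary.items():
    print(style, stats)
```

Comparing these per-category summaries side by side is the numeric counterpart of comparing the boxes in the plot.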

\n", "metadata": {} }, { "cell_type": "markdown", "source": "Let's look at the relationship between \"body-style\" and \"price\".\n", "metadata": {} }, { "cell_type": "code", "source": "sns.boxplot(x=\"body-style\", y=\"price\", data=df)", "metadata": { "scrolled": true, "trusted": true }, "execution_count": 23, "outputs": [ { "execution_count": 23, "output_type": "execute_result", "data": { "text/plain": "" }, "metadata": {} }, { "output_type": "display_data", "data": { "text/plain": "
", "image/png": "\n" }, "metadata": { "needs_background": "light" } } ] }, { "cell_type": "markdown", "source": "

We see that the distributions of price between the different body-style categories overlap significantly, so body-style would not be a good predictor of price. Let's examine \"engine-location\" and \"price\":

\n", "metadata": {} }, { "cell_type": "code", "source": "sns.boxplot(x=\"engine-location\", y=\"price\", data=df)", "metadata": { "scrolled": true, "trusted": true }, "execution_count": 24, "outputs": [ { "execution_count": 24, "output_type": "execute_result", "data": { "text/plain": "" }, "metadata": {} }, { "output_type": "display_data", "data": { "text/plain": "
", "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZIAAAEGCAYAAABPdROvAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAAgNklEQVR4nO3df5AfdZ3n8eeLCQeDGhaGMRcnwaATyw2wG83IxgPPHyQwskrwDiX4I+OZI9wRIbprucTyFqxa9sRaZUksUBSPCf6AGF2JVBJNAi64muAEkZAAxZQEyRjCOCABCZGZvO+P/nzhO8MkmUynp2fyfT2qvjXd7+5P96enJnl/P/3p/nwUEZiZmQ3XEWVXwMzMxjYnEjMzy8WJxMzMcnEiMTOzXJxIzMwsl3FlV2CknXDCCTFlypSyq2FmNqZs2rTpDxHRONi2mkskU6ZMoaOjo+xqmJmNKZIe29c239oyM7NcnEjMzCwXJxIzM8vFicTMzHJxIrFh6+np4bLLLqOnp6fsqphZiZxIbNja29vZvHkzy5YtK7sqZlYiJxIblp6eHtasWUNEsGbNGrdKzGpY4YlEUp2kX0u6Pa1fKalL0n3pc07VvosldUp6WNLZVfEZkjanbUskKcWPknRrim+UNKXo67FMe3s7e/fuBaCvr8+tErMaNhItkkXAgwNi10TE9PRZBSBpGjAXOBloBa6TVJf2vx64CJiaPq0pPh94OiKagWuAqwu9EnvJunXr6O3tBaC3t5e1a9eWXCMzK0uhiUTSJOBvgW8OYfc5wC0RsSciHgU6gdMkTQTGR8SGyGbhWgacV1WmPS2vAM6stFasWLNmzWLcuGxghHHjxjF79uySa2RmZSm6RfKvwGeBvQPin5R0v6RvSTouxZqAx6v22Z5iTWl5YLxfmYjoBZ4BGgZWQtICSR2SOrq7u/NdkQHQ1tbGEUdkfz51dXXMmzev5BqZWVkKSySS3gc8GRGbBmy6HngjMB3YAXy5qDpURMQNEdESES2NjYOOOWYHqaGhgdbWViTR2tpKQ8Mr8reZ1YgiB208HTg3daYfDYyX9O2I+GhlB0nfAG5Pq13A5Kryk1KsKy0PjFeX2S5pHHAs4MeHRkhbWxvbtm1za8SsxhXWIomIxRExKSKmkHWi3xERH019HhUfAB5IyyuBuelJrJPIOtXviYgdwC5JM1P/xzzgtqoybWn5/HSOKOqarL+GhgaWLFni1ohZjStjGPkvSZoOBLANuBggIrZIWg5sBXqBhRHRl8pcAtwE1AOr0wfgRuBmSZ3AU2QJy8zMRpBq7Qt8S0tLeD4SM7ODI2lTRLQMts1vtpuZWS5OJGZmlosTiZmZ5eJEYmZmuTiRmJlZLk4kZmaWixOJmZnl4kRiZma5OJHYsHnOdjMDJxLLwXO2mxk4kdgw9fT0sHr1aiKC1atXu1ViVsOcSGxY2tvbX5pq98UXX3SrxKyGOZHYsKxdu5bKgJ8RwU9/+tOSa2RmZXEisWGZMGHCftfNrHY4kdiw7Ny5c7/rZlY7nEhsWGbPnk02YSVI4qyzziq5RmZWlsITiaQ6Sb+WdHtaP17SWkmPpJ/HVe27WFKnpIclnV0VnyFpc9q2JE25S5qW99YU3yhpStHXY5m2tjaOPPJIAI488kjP225Ww0aiRbIIeLBq/XJgfURMBdandSRNI5sq92SgFbhOUl0qcz1wEdk87lPTdoD5wNMR0QxcA1xd7KVYRUNDA62trUjive99r+dtN6thhSYSSZOAvwW+WRWeA7Sn5XbgvKr4LRGxJyIeBTqB0yRNBMZHxIbIHhNaNqBM5VgrgDMrrRUrXltbG6eeeqpbI2Y1rugWyb8CnwX2VsUmRMSOtPwEUHncpwl4vGq/7SnWlJYHxvuViYhe4BngFV+NJS2Q1CGpo7u7O8/1WJWGhgaWLFni1ohZjSsskUh6H/BkRGza1z6phRFF1aHqPDdEREtEtDQ2NhZ9OjO
zmjKuwGOfDpwr6RzgaGC8pG8DOyVNjIgd6bbVk2n/LmByVflJKdaVlgfGq8tslzQOOBbwWB1mZiOosBZJRCyOiEkRMYWsE/2OiPgosBJoS7u1Abel5ZXA3PQk1klkner3pNtguyTNTP0f8waUqRzr/HSOwls4Zmb2siJbJPvyRWC5pPnAY8CHACJii6TlwFagF1gYEX2pzCXATUA9sDp9AG4EbpbUCTxFlrDMzGwEqda+wLe0tERHR0fZ1TAzG1MkbYqIlsG2+c12MzPLxYnEzMxycSIxM7NcnEjMzCwXJxIzM8vFicSGraenh8suu8zztZvVOCcSG7b29nY2b97s+drNalwZLyTaYaCnp4c1a9YQEaxZs4Z58+Z58EYDYOnSpXR2dpZah66ubBSlpqamA+xZvObmZi699NKyq1Eot0hsWNrb29m7NxvUua+vz60SG1V2797N7t27y65GzfCb7TYs55xzDs8///xL68cccwyrVq0qsUZmL1u0aBEA1157bck1OXz4zXY75GbNmtVvzvbZs2eXXCMzK4sTiQ3LueeeS6U1GxG8//3vL7lGZlYWJxIblpUrV/Zrkfz4xz8uuUZmVhYnEhuWdevW9WuRrF27tuQamVlZnEhsWN7xjnfsd93Makdh75FIOhq4CzgqnWdFRFwh6UrgIqA77fq5iFiVyiwG5gN9wGUR8ZMUn8HLE1utAhZFREg6ClgGzCCbYveCiNhW1DXZy2rtab+xYDS8vzFaVH4Plae3al3R77IU+ULiHuA9EfGcpCOBn0uqzGx4TUT8S/XOkqaRzXB4MvA6YJ2kN6VZEq8nSz4byRJJK9ksifOBpyOiWdJc4GrgggKvyZK777673/pdd93F4sWLS6qNQfaf5yNbfs2Jr+478M6Huf/0YnazZc9jftT/d8/VFX6OwhJJmjv9ubR6ZPrs72vsHOCWiNgDPJqmzz1N0jZgfERsAJC0DDiPLJHMAa5M5VcAX5Ukz9tevAkTJrBt27Z+61a+E1/dx+feuqvsatgo8s/3ji/8HIX2kUiqk3Qf8CSwNiI2pk2flHS/pG9JOi7FmoDHq4pvT7GmtDww3q9MRPQCzwCvGKdD0gJJHZI6uru7B262Ydi5c+d+182sdhSaSCKiLyKmA5PIWhenkN2meiMwHdgBfLnIOqR63BARLRHR0tjYWPTpasLAFxDPOuuskmpiZmUbkae2IuKPwJ1Aa0TsTAlmL/AN4LS0WxcwuarYpBTrSssD4/3KSBoHHEvW6W4FO/fcc/ut+4VEs9pVWCKR1CjpL9JyPTAbeEjSxKrdPgA8kJZXAnMlHSXpJGAqcE9E7AB2SZqp7A24ecBtVWXa0vL5wB3uHxkZfiHRzCqKfGprItAuqY4sYS2PiNsl3SxpOlnH+zbgYoCI2CJpObAV6AUWpie2AC7h5cd/V6cPwI3Azalj/imyp75sBAz2QuKnP/3pkmtV27q6uvjTs3Uj0rlqY8djz9bxqq6uA++YQ5FPbd0PvGWQ+Mf2U+Yq4KpB4h3AKYPEXwA+mK+mNhyzZs1i1apV9Pb2Mm7cOA/aaFbDPLGVDUtbWxtr1qwBoK6ujnnz5pVcI2tqamJP7w4//mv9/PO94zmq4Am+PESKDUtDQwOtra1IorW11bMjmtUwt0hs2Nra2ti2bZtbI6PI755zHwnAzuez78gTjtlbck3K97vn6pha8DmcSGzYGhoaWLJkSdnVsKS5ubnsKowaf05jbR31ev9OplL834YTiQ1bT08PX/jCF7jiiit8a2sUKHJQvrHGU+2OLCcSG7b29nY2b97MsmXL/OivvWQ0jEI8mkb/LXrk3dHAne02LD09PaxZs4aIYM2aNfT0eEABGz3q6+upr68vuxo1wy0SG5b29nb6+rL3RXt7e90qsZcc7t++7ZXcIrFhWbdu3UuJpK+vz1PtmtUwJxIbljPOOKPfuqfaNatdTiQ2LJUBG83MnEhsWAZOtTtw3cxqhxOJDcusWbP
6DSPvQRvNapcTiQ3Lueee228YeU9sZVa7nEhsWDyxlZlVOJHYsAw2sZWZ1aYip9o9WtI9kn4jaYukL6T48ZLWSnok/TyuqsxiSZ2SHpZ0dlV8hqTNaduSNOUuaVreW1N8o6QpRV2P9Tdr1izGjcveZ/XEVma1rcgWyR7gPRHx18B0oFXSTOByYH1ETAXWp3UkTSObKvdkoBW4Lk3TC3A9cBHZQJZT03aA+cDTEdEMXANcXeD1WJW2tjaOOCL78/HEVma1rbBEEpnn0uqR6RPAHKA9xduB89LyHOCWiNgTEY8CncBpkiYC4yNiQ2T3UpYNKFM51grgzEprxYrlia3MrKLQPhJJdZLuA54E1kbERmBCROxIuzwBTEjLTcDjVcW3p1hTWh4Y71cmInqBZ4BX/I8maYGkDkkd3d3dh+LSjKxVcuqpp7o1YlbjCk0kEdEXEdOBSWSti1MGbA+yVkqhIuKGiGiJiJbGxsaiT1czKhNbuTViVttG5KmtiPgjcCdZ38bOdLuK9PPJtFsXMLmq2KQU60rLA+P9ykgaBxwLeDxzM7MRVORTW42S/iIt1wOzgYeAlUBb2q0NuC0trwTmpiexTiLrVL8n3QbbJWlm6v+YN6BM5VjnA3dE5ZlUMzMbEUXORzIRaE9PXh0BLI+I2yX9ElguaT7wGPAhgIjYImk5sBXoBRZGRF861iXATUA9sDp9AG4EbpbUCTxF9tSXmZmNINXaF/iWlpbo6OgouxpmZmOKpE0R0TLYNr/ZbmZmuXiq3TFo6dKldHZ2ll0NurqyZx6ampoOsGexmpubPb2rWYmcSGzYdu/eXXYVzGwUcCIZg0bLt+9FixYBcO2115ZcEzMrk/tIzMwsFycSMzPLxYnEzMxyGXIikfR6SbPScr2k1xRXLTMzGyuGlEgkXUQ2TPvXU2gS8KOC6mRmZmPIUFskC4HTgV0AEfEI8NqiKmVmZmPHUBPJnoj4c2UljbRbW2OrmJnZoIaaSP5d0ueAekmzge8DPy6uWmZmNlYMNZFcDnQDm4GLgVXA54uqlJmZjR1DfbO9HvhWRHwDsil0U+z5oipmZmZjw1BbJOvJEkdFPbDu0FfHzMzGmqEmkqMj4rnKSlo+Zn8FJE2WdKekrZK2SFqU4ldK6pJ0X/qcU1VmsaROSQ9LOrsqPkPS5rRtSZopkTSb4q0pvlHSlIO4djMzOwSGmkj+JOmtlRVJM4ADDf3aC/x9REwDZgILJU1L266JiOnpsyodcxrZDIcnk83tfl26hQZwPXAR2fS7U9N2gPnA0xHRDFwDXD3E6zEzs0NkqH0knwK+L+n3gID/DFywvwJprvUdaflZSQ8C+5u4Yg5wS0TsAR5N0+eeJmkbMD4iNgBIWgacRzbd7hzgylR+BfBVSfK87WZmI2dILZKI+BXwZuB/A/8L+MuI2DTUk6RbTm8BNqbQJyXdL+lbko5LsSbg8api21OsKS0PjPcrExG9wDNAwyDnXyCpQ1JHd3f3UKttZmZDsN9EIuk96ed/A94PvCl93p9iByTp1cAPgE9FxC6y21RvBKaTtVi+PNzKD1VE3BARLRHR0tjYWPTpzMxqyoFubb0TuIMsiQwUwA/3V1jSkWRJ5DsR8UOAiNhZtf0bwO1ptQuYXFV8Uop1peWB8eoy29Pb9scCPQe4JjMzO4T2m0gi4gpJRwCrI2L5wRw4PVl1I/BgRHylKj4x9Z8AfAB4IC2vBL4r6SvA68g61e+JiD5JuyTNJLs1Ng9YWlWmDfglcD5wh/tHzMxG1gE72yNir6TPAgeVSMgGefwYsFnSfSn2OeBCSdPJWjTbyN6UJyK2SFoObCV74mthRPSlcpcAN5G9v7I6fSBLVDenjvmnyJ76MjOzETTUp7bWSfoMcCvwp0owIp7aV4GI+DnZE14DrdpPmauAqwaJdwCnDBJ/AfjgfmtuZmaFGmoiuYCsBXHJgPgbDm11zMxsrBlqIplGlkTOIEsodwNfK6pSZmY2dgw
1kbSTTWq1JK1/OMU+VESlzMxs7BhqIjklDXVScaekrUVUyMzMxpahjrV1b3r8FgBJfwN0FFMlMzMbS4baIpkB/ELS79L6icDDkjYDERF/VUjtzMxs1BtqImk98C5mZlaLhpRIIuKxoitiZmZj01D7SMzMzAblRGJmZrk4kZiZWS5OJGZmlosTiZmZ5eJEYmZmuTiRmJlZLk4kZmaWS2GJRNJkSXdK2ippi6RFKX68pLWSHkk/j6sqs1hSp6SHJZ1dFZ8haXPatiRN44ukoyTdmuIbJU0p6nrMzGxwRbZIeoG/T6MGzwQWSpoGXA6sj4ipwPq0Tto2FziZbEiW6yTVpWNdD1xENo/7VF4esmU+8HRENAPXAFcXeD1mZjaIwhJJROyIiHvT8rPAg0ATMIdsLhPSz/PS8hzglojYExGPAp3AaZImAuMjYkNEBLBsQJnKsVYAZ1ZaK2ZmNjJGpI8k3XJ6C7ARmBARO9KmJ4AJabkJeLyq2PYUa0rLA+P9ykREL/AM0DDI+RdI6pDU0d3dfSguyczMksITiaRXAz8APhURu6q3pRZGFF2HiLghIloioqWxsbHo05mZ1ZRCE4mkI8mSyHci4ocpvDPdriL9fDLFu4DJVcUnpVhXWh4Y71dG0jjgWKDn0F+JmZntS5FPbQm4EXgwIr5StWkl0JaW24DbquJz05NYJ5F1qt+TboPtkjQzHXPegDKVY50P3JFaOWZmNkKGOrHVcJwOfAzYLOm+FPsc8EVguaT5wGPAhwAiYouk5cBWsie+FkZEXyp3CXATUA+sTh/IEtXNkjqBp8ie+jIzsxFUWCKJiJ8D+3qC6sx9lLkKuGqQeAdwyiDxF4AP5qimmZnl5DfbzcwsFycSMzPLxYnEzMxycSIxM7NcnEjMzCwXJxIzM8vFicTMzHJxIjEzs1ycSMzMLBcnEjMzy8WJxMzMcily0MbD0tKlS+ns7Cy7GqNC5fewaNGikmsyOjQ3N3PppZeWXQ2zEedEcpA6Ozu574EH6Tvm+LKrUroj/pyN2L/ptztLrkn56p5/quwqmJXGiWQY+o45nt1vPqfsatgoUv/QqrKrYFYa95GYmVkuRc6Q+C1JT0p6oCp2paQuSfelzzlV2xZL6pT0sKSzq+IzJG1O25akWRJJMynemuIbJU0p6lrMzGzfimyR3AS0DhK/JiKmp88qAEnTyGY3PDmVuU5SXdr/euAisql3p1Ydcz7wdEQ0A9cAVxd1IWZmtm+FJZKIuIts+tuhmAPcEhF7IuJRoBM4TdJEYHxEbEhzsS8Dzqsq056WVwBnVlorZmY2csroI/mkpPvTra/jUqwJeLxqn+0p1pSWB8b7lYmIXuAZoGGwE0paIKlDUkd3d/ehuxIzMxvxRHI98EZgOrAD+PJInDQiboiIlohoaWxsHIlTmpnVjBFNJBGxMyL6ImIv8A3gtLSpC5hcteukFOtKywPj/cpIGgccC/QUV3szMxvMiCaS1OdR8QGg8kTXSmBuehLrJLJO9XsiYgewS9LM1P8xD7itqkxbWj4fuCP1o5iZ2Qgq7IVESd8D3gWcIGk7cAXwLknTgQC2ARcDRMQWScuBrUAvsDAi+tKhLiF7AqweWJ0+ADcCN0vqJOvUn1vUtZiZ2b4Vlkgi4sJBwjfuZ/+rgKsGiXcApwwSfwH4YJ46mplZfn6z3czMcnEiMTOzXJxIzMwsFycSMzPLxYnEzMxycSIxM7NcnEjMzCwXJxIzM8vFicTMzHJxIjEzs1ycSMzMLBcnEjMzy8WJxMzMcnEiMTOzXJxIzMwsFycSMzPLpbBEIulbkp6U9EBV7HhJayU9kn4eV7VtsaROSQ9LOrsqPkPS5rRtSZpylzQt760pvlHSlKKuxczM9q2wGRLJpsf9KrCsKnY5sD4ivijp8rT+D5KmkU2VezLwOmCdpDel6XavBy4CNgKrgFay6XbnA09HRLOkucDVwAUFXg8AXV1d1D3/DPUPrSr6VDaG1D3fQ1d
Xb9nVMCtFYS2SiLiLbC71anOA9rTcDpxXFb8lIvZExKNAJ3CapInA+IjYEBFBlpTOG+RYK4AzK60VMzMbOUW2SAYzISJ2pOUngAlpuQnYULXf9hR7MS0PjFfKPA4QEb2SngEagD8MPKmkBcACgBNPPDHXBTQ1NfHEnnHsfvM5uY5jh5f6h1bR1DThwDuaHYZK62xPLYwYoXPdEBEtEdHS2Ng4Eqc0M6sZI51IdqbbVaSfT6Z4FzC5ar9JKdaVlgfG+5WRNA44FugprOZmZjaokU4kK4G2tNwG3FYVn5uexDoJmArck26D7ZI0M/V/zBtQpnKs84E7UivHzMxGUGF9JJK+B7wLOEHSduAK4IvAcknzgceADwFExBZJy4GtQC+wMD2xBXAJ2RNg9WRPa61O8RuBmyV1knXqzy3qWszMbN8KSyQRceE+Np25j/2vAq4aJN4BnDJI/AXgg3nqaGZm+fnNdjMzy8WJxMzMcnEiMTOzXJxIzMwsl5F+s/2wUPf8Ux5rCzjihV0A7D16fMk1KV/d80/x8kANZrXFieQgNTc3l12FUaOz81kAmt/g/0Bhgv82rGY5kRykSy+9tOwqjBqLFi0C4Nprry25JmZWJveRmJlZLk4kZmaWixOJmZnl4kRiZma5OJGYmVkuTiRmZpaLE4mZmeXiRGJmZrmUkkgkbZO0WdJ9kjpS7HhJayU9kn4eV7X/Ykmdkh6WdHZVfEY6TqekJWkWRTMzG0FltkjeHRHTI6IlrV8OrI+IqcD6tI6kaWSzH54MtALXSapLZa4HLiKbmndq2m5mZiNoNA2RModsal6AduBnwD+k+C0RsQd4NE2te5qkbcD4iNgAIGkZcB4vT8V72Fq6dCmdnZ1lV+OlOlSGSilLc3Ozh64xK1FZLZIAfippk6QFKTYhInak5Sd4eSjVJuDxqrLbU6wpLQ+Mv4KkBZI6JHV0d3cfqmuoefX19dTX15ddDTMrWVktkjMiokvSa4G1kh6q3hgRISkO1cki4gbgBoCWlpZDdtyy+Nu3mY0mpbRIIqIr/XwS+DfgNGCnpIkA6eeTafcuYHJV8Ukp1pWWB8bNzGwEjXgikfQqSa+pLANnAQ8AK4G2tFsbcFtaXgnMlXSUpJPIOtXvSbfBdkmamZ7WmldVxszMRkgZt7YmAP+WntQdB3w3ItZI+hWwXNJ84DHgQwARsUXScmAr0AssjIi+dKxLgJuAerJO9sO+o93MbLRRxJjvMjgoLS0t0dHRUXY1zMzGFEmbql7X6MdvtpuZWS5OJGZmlosTiZmZ5eJEYmZmudRcZ7ukbrKnwuzQOAH4Q9mVMBuE/zYPrddHRONgG2oukdihJaljX09ymJXJf5sjx7e2zMwsFycSMzPLxYnE8rqh7AqY7YP/NkeI+0jMzCwXt0jMzCwXJxIzM8vFicReQdJlkh6U9J2cx5ki6cOHql5mNjo5kdhgLgFmR8RHKgFJw5lyYArgRGKFUmbY/5dJqjuU9alFTiTWj6SvAW8AVkt6RtLNkv4DuDm1MO6QdL+k9ZJOTGVukrRE0i8k/VbS+elwXwTeIek+SZ8u6ZLsMJT+Fh+WtIxsYrz/I+lX6W/zC1X7/UjSJklbJC2oij8n6cuSfgO8vYRLOKz4qS17BUnbgBbgk8D7gTMiYrekHwMrIqJd0ieAcyPiPEk3Aa8CLgDeDKyMiGZJ7wI+ExHvK+Ey7DAmaQrwW+C/AOOB84GLAZHNqvqliLhL0vER8ZSkeuBXwDsjokdSABdExPJyruDw4haJHcjKiNidlt8OfDct3wycUbXfjyJib0RsJZsF06xoj0XEBrLpus8Cfg3cS/ZlZmra57LU6tgATK6K9wE/GNnqHr7KmGrXxpY/DXG/PVXLKqIiZgNU/jYF/N+I+Hr1xtQingW8PSKel/Qz4Oi0+YWqKbstJ7dI7GD8Apiblj8C3H2A/Z8FXlNojczgJ8AnJL0aQFKTpNcCxwJPpyTyZmBmmZU8nDmR2MG
4FPgfku4HPgYsOsD+9wN9kn7jznYrSkT8lOyW6y8lbQZWkH2BWQOMk/Qg2YMfG8qr5eHNne1mZpaLWyRmZpaLE4mZmeXiRGJmZrk4kZiZWS5OJGZmlosTidkwSXqdpBWH6FhTJD1wKI5VdcyPS3pd1fo3JU07lOcwA7/ZbjZsEfF7sjGeRquPkw1o+HuAiPifpdbGDltukVhNkvRRSfekkYm/LqkujQh7VXqBcoOkCWnfN6b1zZL+SdJzKf5SKyJ9+/+hpDWSHpH0papznSXpl5LulfT9yhvY+6nb0ZL+XzrfryW9O8XrJP2LpAfSKLeXpvg/ppFvH5B0QxpW/XyygTe/k66xXtLPJLWkMhem4z8g6eqqcw/6OzDbHycSqzmS/pJspOLTI2I62QB+HyEbwXhDRPw1cBdwUSpyLXBtRJwKbN/Poaen454KXCBpsqQTgM8DsyLirUAH8HcHqOJCINL5LgTaJR0NLCCb42V6RPwVUJl47KsR8baIOAWoB94XESvSuT4SEdOrBt4k3e66GnhPqvPbJJ2XNu/rd2C2T04kVovOBGYAv5J0X1p/A/Bn4Pa0zyay/7QhG/X4+2n5u+zb+oh4JiJeALYCrycb32ka8B/pXG0pvj9nAN8GiIiHgMeAN5ENQPj1iOhN255K+79b0sY0PMh7gJMPcPy3AT+LiO50rO8A/zVt29fvwGyf3EditUhAe0Qs7heUPhMvjxnUx8H/+6geAblSXsDaiLhwwLn+BqiMVvuPZOOSHbTUUrkOaImIxyVdycsj3A7Hizl/B1aD3CKxWrQeOD+NEIuk4yXtr5WwAfjvaXnufvbbV9nTJTWnc71K0psiYmO65TQ9IlYOKHM32a02JL0JOBF4GFgLXKw07bGk43k5afwh9b1Ud/7va/Tle4B3SjpB2TSzFwL/fpDXZfYSJxKrOWnyrc8DP00jGa8FJu6nyKeAv0v7NgPPHMS5usmenvpeKv9LsomX9uc64Ih0q+pW4OMRsQf4JvA74P40WdOHI+KPwDfIns76CdksgBU3AV+rdLZX1WkHcDlwJ/AbYFNE3DbUazIbyKP/mh2ApGOA3RERkuYCF0bEnLLrZTZa+P6n2YHNAL4qScAfgU+UWx2z0cUtEjMzy8V9JGZmlosTiZmZ5eJEYmZmuTiRmJlZLk4kZmaWy/8HfZs+WFeRrAsAAAAASUVORK5CYII=\n" }, "metadata": { "needs_background": "light" } } ] }, { "cell_type": "markdown", "source": "

Here we see that the distributions of price for the two engine-location categories, front and rear, are distinct enough to consider engine-location a potentially good predictor of price.
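
One informal way to put a number on whether two groups are distinct is to check whether their interquartile boxes overlap. The sketch below does this in plain Python; the price lists and the helper names `iqr_interval` and `boxes_overlap` are assumptions made for illustration, and this is a heuristic rather than a formal statistical test.

```python
from statistics import quantiles

def iqr_interval(values):
    # Return the (25%, 75%) interval, i.e. the edges of the box in a boxplot.
    q1, _, q3 = quantiles(values, n=4, method='inclusive')
    return q1, q3

def boxes_overlap(a, b):
    # Two intervals overlap when each one starts before the other ends.
    a_lo, a_hi = iqr_interval(a)
    b_lo, b_hi = iqr_interval(b)
    return a_lo <= b_hi and b_lo <= a_hi

# Made-up prices for illustration only.
front = [7000, 9000, 11000, 13000, 18000]
rear = [32000, 34000, 37000]
other = [8000, 10000, 12000, 16000]

print(boxes_overlap(front, rear))   # boxes far apart
print(boxes_overlap(front, other))  # boxes share a range
```

Groups whose boxes do not overlap are more promising as predictors than groups whose boxes largely coincide.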

\n", "metadata": {} }, { "cell_type": "markdown", "source": "Let's examine \"drive-wheels\" and \"price\".\n", "metadata": {} }, { "cell_type": "code", "source": "# drive-wheels\nsns.boxplot(x=\"drive-wheels\", y=\"price\", data=df)", "metadata": { "trusted": true }, "execution_count": 25, "outputs": [ { "execution_count": 25, "output_type": "execute_result", "data": { "text/plain": "" }, "metadata": {} }, { "output_type": "display_data", "data": { "text/plain": "
", "image/png": "\n" }, "metadata": { "needs_background": "light" } } ] }, { "cell_type": "markdown", "source": "

Here we see that the distribution of price between the different drive-wheels categories differs. As such, drive-wheels could potentially be a predictor of price.

\n", "metadata": {} }, { "cell_type": "markdown", "source": "

3. Descriptive Statistical Analysis

\n", "metadata": {} }, { "cell_type": "markdown", "source": "

Let's first take a look at the variables by utilizing a description method.

\n\n

The describe function automatically computes basic statistics for all continuous variables. Any NaN values are automatically skipped in these statistics.

\n\nThis will show:\n\n
    \n
  • the count of that variable
  • \n
  • the mean
  • \n
  • the standard deviation (std)
  • \n
  • the minimum value
  • \n
  • the quartiles (25%, 50% and 75%); the IQR (interquartile range) is the 75% value minus the 25% value
  • \n
  • the maximum value
  • \n
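
The statistics listed above can be reproduced by hand for a single column. The following is a minimal plain-Python sketch, assuming a small made-up list of values with one missing entry; the function name `describe_column` is hypothetical. It uses the sample standard deviation and linear interpolation for the quartiles, which matches the pandas defaults.

```python
import math
from statistics import mean, stdev, quantiles

def describe_column(values):
    # Skip NaN entries, as the describe method does.
    clean = [v for v in values if not math.isnan(v)]
    q1, q2, q3 = quantiles(clean, n=4, method='inclusive')
    return {
        'count': len(clean),
        'mean': mean(clean),
        'std': stdev(clean),  # sample standard deviation (divides by n - 1)
        'min': min(clean),
        '25%': q1,
        '50%': q2,
        '75%': q3,
        'max': max(clean),
    }

# Made-up values for illustration only; one entry is missing.
stroke = [2.0, 3.0, float('nan'), 3.5, 4.0, 2.5]
stats = describe_column(stroke)

# The interquartile range is derived from the quartiles.
iqr = stats['75%'] - stats['25%']
print(stats)
print(iqr)
```

Note that the count reflects only the non-missing entries, which is why a column with NaN values reports a smaller count than the others.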
      \n", "metadata": {} }, { "cell_type": "markdown", "source": "We can apply the method \"describe\" as follows:\n", "metadata": {} }, { "cell_type": "code", "source": "df.describe()", "metadata": { "trusted": true }, "execution_count": 26, "outputs": [ { "execution_count": 26, "output_type": "execute_result", "data": { "text/plain": " symboling normalized-losses wheel-base length width \\\ncount 201.000000 201.00000 201.000000 201.000000 201.000000 \nmean 0.840796 122.00000 98.797015 0.837102 0.915126 \nstd 1.254802 31.99625 6.066366 0.059213 0.029187 \nmin -2.000000 65.00000 86.600000 0.678039 0.837500 \n25% 0.000000 101.00000 94.500000 0.801538 0.890278 \n50% 1.000000 122.00000 97.000000 0.832292 0.909722 \n75% 2.000000 137.00000 102.400000 0.881788 0.925000 \nmax 3.000000 256.00000 120.900000 1.000000 1.000000 \n\n height curb-weight engine-size bore stroke \\\ncount 201.000000 201.000000 201.000000 201.000000 197.000000 \nmean 53.766667 2555.666667 126.875622 3.330692 3.256904 \nstd 2.447822 517.296727 41.546834 0.268072 0.319256 \nmin 47.800000 1488.000000 61.000000 2.540000 2.070000 \n25% 52.000000 2169.000000 98.000000 3.150000 3.110000 \n50% 54.100000 2414.000000 120.000000 3.310000 3.290000 \n75% 55.500000 2926.000000 141.000000 3.580000 3.410000 \nmax 59.800000 4066.000000 326.000000 3.940000 4.170000 \n\n compression-ratio horsepower peak-rpm city-mpg highway-mpg \\\ncount 201.000000 201.000000 201.000000 201.000000 201.000000 \nmean 10.164279 103.405534 5117.665368 25.179104 30.686567 \nstd 4.004965 37.365700 478.113805 6.423220 6.815150 \nmin 7.000000 48.000000 4150.000000 13.000000 16.000000 \n25% 8.600000 70.000000 4800.000000 19.000000 25.000000 \n50% 9.000000 95.000000 5125.369458 24.000000 30.000000 \n75% 9.400000 116.000000 5500.000000 30.000000 34.000000 \nmax 23.000000 262.000000 6600.000000 49.000000 54.000000 \n\n price city-L/100km diesel gas \ncount 201.000000 201.000000 201.000000 201.000000 \nmean 13207.129353 9.944145 0.099502 
0.900498 \nstd 7947.066342 2.534599 0.300083 0.300083 \nmin 5118.000000 4.795918 0.000000 0.000000 \n25% 7775.000000 7.833333 0.000000 1.000000 \n50% 10295.000000 9.791667 0.000000 1.000000 \n75% 16500.000000 12.368421 0.000000 1.000000 \nmax 45400.000000 18.076923 1.000000 1.000000 ", "text/html": "
      \n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
      symbolingnormalized-losseswheel-baselengthwidthheightcurb-weightengine-sizeborestrokecompression-ratiohorsepowerpeak-rpmcity-mpghighway-mpgpricecity-L/100kmdieselgas
      count201.000000201.00000201.000000201.000000201.000000201.000000201.000000201.000000201.000000197.000000201.000000201.000000201.000000201.000000201.000000201.000000201.000000201.000000201.000000
      mean0.840796122.0000098.7970150.8371020.91512653.7666672555.666667126.8756223.3306923.25690410.164279103.4055345117.66536825.17910430.68656713207.1293539.9441450.0995020.900498
      std1.25480231.996256.0663660.0592130.0291872.447822517.29672741.5468340.2680720.3192564.00496537.365700478.1138056.4232206.8151507947.0663422.5345990.3000830.300083
      min-2.00000065.0000086.6000000.6780390.83750047.8000001488.00000061.0000002.5400002.0700007.00000048.0000004150.00000013.00000016.0000005118.0000004.7959180.0000000.000000
      25%0.000000101.0000094.5000000.8015380.89027852.0000002169.00000098.0000003.1500003.1100008.60000070.0000004800.00000019.00000025.0000007775.0000007.8333330.0000001.000000
      50%1.000000122.0000097.0000000.8322920.90972254.1000002414.000000120.0000003.3100003.2900009.00000095.0000005125.36945824.00000030.00000010295.0000009.7916670.0000001.000000
      75%2.000000137.00000102.4000000.8817880.92500055.5000002926.000000141.0000003.5800003.4100009.400000116.0000005500.00000030.00000034.00000016500.00000012.3684210.0000001.000000
      max3.000000256.00000120.9000001.0000001.00000059.8000004066.000000326.0000003.9400004.17000023.000000262.0000006600.00000049.00000054.00000045400.00000018.0769231.0000001.000000
      \n
      " }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "The default setting of \"describe\" skips variables of type object. We can apply the method \"describe\" on the variables of type 'object' as follows:\n", "metadata": {} }, { "cell_type": "code", "source": "df.describe(include=['object'])", "metadata": { "scrolled": true, "trusted": true }, "execution_count": 27, "outputs": [ { "execution_count": 27, "output_type": "execute_result", "data": { "text/plain": " make aspiration num-of-doors body-style drive-wheels \\\ncount 201 201 201 201 201 \nunique 22 2 2 5 3 \ntop toyota std four sedan fwd \nfreq 32 165 115 94 118 \n\n engine-location engine-type num-of-cylinders fuel-system \\\ncount 201 201 201 201 \nunique 2 6 7 8 \ntop front ohc four mpfi \nfreq 198 145 157 92 \n\n horsepower-binned \ncount 200 \nunique 3 \ntop Low \nfreq 115 ", "text/html": "
      \n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
      makeaspirationnum-of-doorsbody-styledrive-wheelsengine-locationengine-typenum-of-cylindersfuel-systemhorsepower-binned
      count201201201201201201201201201200
      unique22225326783
      toptoyotastdfoursedanfwdfrontohcfourmpfiLow
      freq321651159411819814515792115
      \n
      " }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "

      Value Counts

      \n", "metadata": {} }, { "cell_type": "markdown", "source": "

      Value counts are a good way of understanding how many units of each characteristic/variable we have. We can apply the \"value_counts\" method to the column \"drive-wheels\". Here we call \"value_counts\" on a pandas series rather than a dataframe, so we select the column with one bracket, df['drive-wheels'], not two brackets, df[['drive-wheels']], which would return a dataframe.
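
The counting step can be sketched on made-up data, assuming only pandas. The names `toy`, `counts` and `counts_frame` are illustrative and not part of the lab. The sketch names the index and the column explicitly instead of renaming whatever default label `value_counts` produces, since that default has differed between pandas releases.

```python
import pandas as pd

# Toy stand-in for the lab's df['drive-wheels'] column (values are made up).
toy = pd.Series(['fwd', 'rwd', 'fwd', '4wd', 'fwd', 'rwd'], name='drive-wheels')

# value_counts() tallies each distinct value, most frequent first.
counts = toy.value_counts()

# Set the index name and the column name explicitly, so the result does not
# depend on how a given pandas version labels the output of value_counts().
counts_frame = counts.rename_axis('drive-wheels').to_frame('value_counts')
print(counts_frame)
```

Applied to the lab's data, the same chain would start from df['drive-wheels'] instead of `toy`.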

      \n", "metadata": {} }, { "cell_type": "code", "source": "df['drive-wheels'].value_counts()", "metadata": { "trusted": true }, "execution_count": 28, "outputs": [ { "execution_count": 28, "output_type": "execute_result", "data": { "text/plain": "fwd 118\nrwd 75\n4wd 8\nName: drive-wheels, dtype: int64" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "We can convert the series to a dataframe as follows:\n", "metadata": {} }, { "cell_type": "code", "source": "df['drive-wheels'].value_counts().to_frame()", "metadata": { "trusted": true }, "execution_count": 29, "outputs": [ { "execution_count": 29, "output_type": "execute_result", "data": { "text/plain": " drive-wheels\nfwd 118\nrwd 75\n4wd 8", "text/html": "
      \n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
      drive-wheels
      fwd118
      rwd75
      4wd8
      \n
      " }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "Let's repeat the above steps but save the results to the dataframe \"drive_wheels_counts\" and rename the column 'drive-wheels' to 'value_counts'.\n", "metadata": {} }, { "cell_type": "code", "source": "drive_wheels_counts = df['drive-wheels'].value_counts().to_frame()\ndrive_wheels_counts.rename(columns={'drive-wheels': 'value_counts'}, inplace=True)\ndrive_wheels_counts", "metadata": { "trusted": true }, "execution_count": 30, "outputs": [ { "execution_count": 30, "output_type": "execute_result", "data": { "text/plain": " value_counts\nfwd 118\nrwd 75\n4wd 8", "text/html": "
      \n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
      value_counts
      fwd118
      rwd75
      4wd8
      \n
      " }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "Now let's rename the index to 'drive-wheels':\n", "metadata": {} }, { "cell_type": "code", "source": "drive_wheels_counts.index.name = 'drive-wheels'\ndrive_wheels_counts", "metadata": { "trusted": true }, "execution_count": 31, "outputs": [ { "execution_count": 31, "output_type": "execute_result", "data": { "text/plain": " value_counts\ndrive-wheels \nfwd 118\nrwd 75\n4wd 8", "text/html": "
      \n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
      value_counts
      drive-wheels
      fwd118
      rwd75
      4wd8
      \n
      " }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "We can repeat the above process for the variable 'engine-location'.\n", "metadata": {} }, { "cell_type": "code", "source": "# engine-location as variable\nengine_loc_counts = df['engine-location'].value_counts().to_frame()\nengine_loc_counts.rename(columns={'engine-location': 'value_counts'}, inplace=True)\nengine_loc_counts.index.name = 'engine-location'\nengine_loc_counts.head(10)", "metadata": { "trusted": true }, "execution_count": 32, "outputs": [ { "execution_count": 32, "output_type": "execute_result", "data": { "text/plain": " value_counts\nengine-location \nfront 198\nrear 3", "text/html": "
      \n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
      value_counts
      engine-location
      front198
      rear3
      \n
      " }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "

      After examining the value counts of the engine location, we see that engine location would not be a good predictor variable for the price. This is because we only have three cars with a rear engine and 198 with an engine in the front, so this result is skewed. Thus, we are not able to draw any conclusions about the engine location.

      \n", "metadata": {} }, { "cell_type": "markdown", "source": "

      4. Basics of Grouping

      \n", "metadata": {} }, { "cell_type": "markdown", "source": "

      The \"groupby\" method groups data by different categories. The data is grouped based on one or several variables, and analysis is performed on the individual groups.
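
As a minimal sketch of this split, apply, combine idea, the toy frame below uses made-up rows that mimic three of the lab's columns; `toy` and `mean_price` are illustrative names. It selects the numeric column before averaging, which is an assumption about style rather than the lab's own call: it keeps text columns out of the mean regardless of how a given pandas version treats them.

```python
import pandas as pd

# Made-up rows that mimic three of the lab's columns.
toy = pd.DataFrame({
    'drive-wheels': ['fwd', 'fwd', 'rwd', 'rwd', '4wd'],
    'body-style': ['sedan', 'wagon', 'sedan', 'sedan', 'wagon'],
    'price': [9000.0, 10000.0, 20000.0, 22000.0, 9500.0],
})

# Split the rows by drive-wheels, then average only the price column.
# as_index=False keeps drive-wheels as a regular column in the result.
mean_price = toy.groupby('drive-wheels', as_index=False)['price'].mean()
print(mean_price)
```

The result has one row per group, with the group keys in sorted order.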

      \n\n

      For example, let's group by the variable \"drive-wheels\". We see that there are 3 different categories of drive wheels.

      \n", "metadata": {} }, { "cell_type": "code", "source": "df['drive-wheels'].unique()", "metadata": { "trusted": true }, "execution_count": 33, "outputs": [ { "execution_count": 33, "output_type": "execute_result", "data": { "text/plain": "array(['rwd', 'fwd', '4wd'], dtype=object)" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "

      If we want to know, on average, which type of drive wheel is most valuable, we can group by \"drive-wheels\" and then average the price within each group.

      \n\n

      We can select the columns 'drive-wheels', 'body-style' and 'price', then assign them to the variable \"df_group_one\".

      \n", "metadata": {} }, { "cell_type": "code", "source": "df_group_one = df[['drive-wheels','body-style','price']]", "metadata": { "trusted": true }, "execution_count": 34, "outputs": [] }, { "cell_type": "markdown", "source": "We can then calculate the average price for each of the different categories of data.\n", "metadata": {} }, { "cell_type": "code", "source": "# grouping results\ndf_group_one = df_group_one.groupby(['drive-wheels'],as_index=False).mean()\ndf_group_one", "metadata": { "trusted": true }, "execution_count": 35, "outputs": [ { "execution_count": 35, "output_type": "execute_result", "data": { "text/plain": " drive-wheels price\n0 4wd 10241.000000\n1 fwd 9244.779661\n2 rwd 19757.613333", "text/html": "
      \n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
      drive-wheelsprice
      04wd10241.000000
      1fwd9244.779661
      2rwd19757.613333
      \n
      " }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "

      From our data, it seems rear-wheel drive vehicles are, on average, the most expensive, while 4-wheel and front-wheel are approximately the same in price.

      \n\n

      You can also group by multiple variables. For example, let's group by both 'drive-wheels' and 'body-style'. This groups the dataframe by the unique combination of 'drive-wheels' and 'body-style'. We can store the results in the variable 'grouped_test1'.

      \n", "metadata": {} }, { "cell_type": "code", "source": "# grouping results\ndf_gptest = df[['drive-wheels','body-style','price']]\ngrouped_test1 = df_gptest.groupby(['drive-wheels','body-style'],as_index=False).mean()\ngrouped_test1", "metadata": { "trusted": true }, "execution_count": 36, "outputs": [ { "execution_count": 36, "output_type": "execute_result", "data": { "text/plain": " drive-wheels body-style price\n0 4wd hatchback 7603.000000\n1 4wd sedan 12647.333333\n2 4wd wagon 9095.750000\n3 fwd convertible 11595.000000\n4 fwd hardtop 8249.000000\n5 fwd hatchback 8396.387755\n6 fwd sedan 9811.800000\n7 fwd wagon 9997.333333\n8 rwd convertible 23949.600000\n9 rwd hardtop 24202.714286\n10 rwd hatchback 14337.777778\n11 rwd sedan 21711.833333\n12 rwd wagon 16994.222222", "text/html": "
      \n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
      drive-wheelsbody-styleprice
      04wdhatchback7603.000000
      14wdsedan12647.333333
      24wdwagon9095.750000
      3fwdconvertible11595.000000
      4fwdhardtop8249.000000
      5fwdhatchback8396.387755
      6fwdsedan9811.800000
      7fwdwagon9997.333333
      8rwdconvertible23949.600000
      9rwdhardtop24202.714286
      10rwdhatchback14337.777778
      11rwdsedan21711.833333
      12rwdwagon16994.222222
      \n
      " }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "

      This grouped data is much easier to visualize when it is made into a pivot table. A pivot table is like an Excel spreadsheet, with one variable along the columns and another along the rows. We can convert the dataframe of groups to a pivot table using the method \"pivot\".
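
The reshaping can be sketched on a few made-up group means. The names `long_form` and `wide` are illustrative, and passing `values='price'` is a choice made here to get plain column labels.

```python
import pandas as pd

# Made-up group means in the long layout that groupby produces:
# one row per (drive-wheels, body-style) combination.
long_form = pd.DataFrame({
    'drive-wheels': ['fwd', 'fwd', 'rwd'],
    'body-style': ['sedan', 'wagon', 'sedan'],
    'price': [9800.0, 10000.0, 21700.0],
})

# Rows come from drive-wheels, columns from body-style, cells from price.
wide = long_form.pivot(index='drive-wheels', columns='body-style', values='price')
print(wide)
```

The combination rwd/wagon does not occur in the toy data, so that cell is NaN in `wide`. If `values` is left out, pivot keeps the remaining column name as an extra level above the new columns.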

      \n\n

      In this case, we will leave the drive-wheels variable as the rows of the table, and pivot body-style to become the columns of the table:

      \n", "metadata": {} }, { "cell_type": "code", "source": "grouped_pivot = grouped_test1.pivot(index='drive-wheels',columns='body-style')\ngrouped_pivot", "metadata": { "trusted": true }, "execution_count": 37, "outputs": [ { "execution_count": 37, "output_type": "execute_result", "data": { "text/plain": " price \\\nbody-style convertible hardtop hatchback sedan \ndrive-wheels \n4wd NaN NaN 7603.000000 12647.333333 \nfwd 11595.0 8249.000000 8396.387755 9811.800000 \nrwd 23949.6 24202.714286 14337.777778 21711.833333 \n\n \nbody-style wagon \ndrive-wheels \n4wd 9095.750000 \nfwd 9997.333333 \nrwd 16994.222222 ", "text/html": "
      \n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
      price
      body-styleconvertiblehardtophatchbacksedanwagon
      drive-wheels
      4wdNaNNaN7603.00000012647.3333339095.750000
      fwd11595.08249.0000008396.3877559811.8000009997.333333
      rwd23949.624202.71428614337.77777821711.83333316994.222222
      \n
      " }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "

      Often, we won't have data for some of the pivot cells. We can fill these missing cells with the value 0, but any other value could potentially be used as well. It should be mentioned that missing data is quite a complex subject and is an entire course on its own.
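
The fill value matters for later summaries. The sketch below, on a made-up two-by-two pivot (`wide`, `filled` and the two mean variables are illustrative names), compares a row average with the gap left as NaN against the same average after filling with 0.

```python
import numpy as np
import pandas as pd

# Made-up pivot with one empty cell (no rwd wagon in the toy data).
wide = pd.DataFrame(
    {'sedan': [9800.0, 21700.0], 'wagon': [10000.0, np.nan]},
    index=['fwd', 'rwd'],
)

filled = wide.fillna(0)

# mean() skips NaN by default, so the two tables summarize differently.
mean_with_gap = wide.loc['rwd'].mean()     # averages the sedan price only
mean_with_zero = filled.loc['rwd'].mean()  # treats the gap as a price of 0
print(mean_with_gap, mean_with_zero)
```

Filling with 0 is convenient for plotting, but a 0 in the table then means no data, not a free car.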

      \n", "metadata": {} }, { "cell_type": "code", "source": "grouped_pivot = grouped_pivot.fillna(0) #fill missing values with 0\ngrouped_pivot", "metadata": { "scrolled": true, "trusted": true }, "execution_count": 38, "outputs": [ { "execution_count": 38, "output_type": "execute_result", "data": { "text/plain": " price \\\nbody-style convertible hardtop hatchback sedan \ndrive-wheels \n4wd 0.0 0.000000 7603.000000 12647.333333 \nfwd 11595.0 8249.000000 8396.387755 9811.800000 \nrwd 23949.6 24202.714286 14337.777778 21711.833333 \n\n \nbody-style wagon \ndrive-wheels \n4wd 9095.750000 \nfwd 9997.333333 \nrwd 16994.222222 ", "text/html": "
      \n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
      price
      body-styleconvertiblehardtophatchbacksedanwagon
      drive-wheels
      4wd0.00.0000007603.00000012647.3333339095.750000
      fwd11595.08249.0000008396.3877559811.8000009997.333333
      rwd23949.624202.71428614337.77777821711.83333316994.222222
      \n
      " }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "
      \n

      Question 4:

      \n\n

      Use the \"groupby\" function to find the average \"price\" of each car based on \"body-style\".

      \n
      \n", "metadata": {} }, { "cell_type": "code", "source": "# Write your code below and press Shift+Enter to execute \n# grouping results\ndf_group_one = df[['body-style','price']]\ndf_group_one = df_group_one.groupby(['body-style'],as_index=False).mean()\ndf_group_one\n", "metadata": { "trusted": true }, "execution_count": 39, "outputs": [ { "execution_count": 39, "output_type": "execute_result", "data": { "text/plain": " body-style price\n0 convertible 21890.500000\n1 hardtop 22208.500000\n2 hatchback 9957.441176\n3 sedan 14459.755319\n4 wagon 12371.960000", "text/html": "
      \n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
      body-styleprice
      0convertible21890.500000
      1hardtop22208.500000
      2hatchback9957.441176
      3sedan14459.755319
      4wagon12371.960000
      \n
      " }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "
      Click here for the solution\n\n```python\n# grouping results\ndf_gptest2 = df[['body-style','price']]\ngrouped_test_bodystyle = df_gptest2.groupby(['body-style'],as_index= False).mean()\ngrouped_test_bodystyle\n\n```\n\n
      \n", "metadata": {} }, { "cell_type": "markdown", "source": "If you did not import \"pyplot\", let's do it again.\n", "metadata": {} }, { "cell_type": "code", "source": "import matplotlib.pyplot as plt\n%matplotlib inline ", "metadata": { "trusted": true }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": "

      Variables: Drive Wheels and Body Style vs. Price

      \n", "metadata": {} }, { "cell_type": "markdown", "source": "Let's use a heat map to visualize the relationship between Body Style vs Price.\n", "metadata": {} }, { "cell_type": "code", "source": "#use the grouped results\nplt.pcolor(grouped_pivot, cmap='RdBu')\nplt.colorbar()\nplt.show()", "metadata": { "trusted": true }, "execution_count": 40, "outputs": [ { "output_type": "display_data", "data": { "text/plain": "", "image/png": "" }, "metadata": {} }, { "output_type": "display_data", "data": { "text/plain": "
      " }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "

      The heatmap encodes the target variable (price) as colour, with the variables 'drive-wheels' and 'body-style' on the vertical and horizontal axes, respectively. This allows us to visualize how the price is related to 'drive-wheels' and 'body-style'.

      \n\n

      The default labels convey no useful information to us. Let's change that:

      \n", "metadata": {} }, { "cell_type": "code", "source": "fig, ax = plt.subplots()\nim = ax.pcolor(grouped_pivot, cmap='RdBu')\n\n#label names\nrow_labels = grouped_pivot.columns.levels[1]\ncol_labels = grouped_pivot.index\n\n#move ticks and labels to the center\nax.set_xticks(np.arange(grouped_pivot.shape[1]) + 0.5, minor=False)\nax.set_yticks(np.arange(grouped_pivot.shape[0]) + 0.5, minor=False)\n\n#insert labels\nax.set_xticklabels(row_labels, minor=False)\nax.set_yticklabels(col_labels, minor=False)\n\n#rotate label if too long\nplt.xticks(rotation=90)\n\nfig.colorbar(im)\nplt.show()", "metadata": { "trusted": true }, "execution_count": 41, "outputs": [ { "output_type": "display_data", "data": { "text/plain": "", "image/png": "" }, "metadata": {} }, { "output_type": "display_data", "data": { "text/plain": "
      " }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "

      Visualization is very important in data science, and Python visualization packages provide great freedom. We will go more in-depth in a separate Python visualizations course.

      \n\n

      The main question we want to answer in this module is, \"What are the main characteristics which have the most impact on the car price?\".

      \n\n

      To get a better measure of the important characteristics, we look at the correlation of these variables with the car price. In other words: how is the car price dependent on this variable?

      \n", "metadata": {} }, { "cell_type": "markdown", "source": "

      5. Correlation and Causation

      \n", "metadata": {} }, { "cell_type": "markdown", "source": "

      Correlation: a measure of the extent of interdependence between variables.

      \n\n

      Causation: the relationship between cause and effect between two variables.

      \n\n

      It is important to know the difference between these two. Correlation does not imply causation. Determining correlation is much simpler than determining causation, as causation may require independent experimentation.

      \n", "metadata": {} }, { "cell_type": "markdown", "source": "

      Pearson Correlation

      \n

      The Pearson Correlation measures the linear dependence between two variables X and Y.

      \n

      The resulting coefficient is a value between -1 and 1 inclusive, where:

      \n
        \n
      • 1: Perfect positive linear correlation.
      • \n
      • 0: No linear correlation. The two variables may still be related in a non-linear way.
      • \n
      • -1: Perfect negative linear correlation.
      • \n
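
To make the coefficient concrete, here is a minimal sketch that computes it by hand with NumPy on made-up numbers and checks the result against `np.corrcoef`. The arrays `x` and `y` are illustrative and not taken from the car data.

```python
import numpy as np

# Made-up paired observations.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Deviations of each variable from its own mean.
dx = x - x.mean()
dy = y - y.mean()

# Pearson r: sum of products of deviations, scaled by the size of both spreads.
r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# NumPy's built-in correlation matrix should agree with the hand computation.
r_numpy = np.corrcoef(x, y)[0, 1]
print(r, r_numpy)
```

Because the numerator is divided by the spread of both variables, the result does not depend on the units of `x` or `y`.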
      \n", "metadata": {} }, { "cell_type": "markdown", "source": "

      Pearson Correlation is the default method of the function \"corr\". Like before, we can calculate the Pearson Correlation of the 'int64' or 'float64' variables.
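
How `corr` treats text columns has differed between pandas releases. One way to avoid relying on that behavior, sketched here on a made-up frame (`toy`, `numeric` and `corr_matrix` are illustrative names), is to select the numeric columns first:

```python
import pandas as pd

# Made-up rows with one text column and two numeric columns.
toy = pd.DataFrame({
    'make': ['audi', 'bmw', 'toyota', 'volvo'],
    'horsepower': [100, 180, 70, 115],
    'price': [14000.0, 30000.0, 8000.0, 17000.0],
})

# Keep only integer and float columns, then compute pairwise Pearson r.
numeric = toy.select_dtypes(include='number')
corr_matrix = numeric.corr()
print(corr_matrix)
```

The result is a square, symmetric table with 1.0 on the diagonal, since every variable is perfectly correlated with itself.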

      \n", "metadata": {} }, { "cell_type": "code", "source": "df.corr()", "metadata": { "trusted": true }, "execution_count": 42, "outputs": [ { "execution_count": 42, "output_type": "execute_result", "data": { "text/plain": " symboling normalized-losses wheel-base length \\\nsymboling 1.000000 0.466264 -0.535987 -0.365404 \nnormalized-losses 0.466264 1.000000 -0.056661 0.019424 \nwheel-base -0.535987 -0.056661 1.000000 0.876024 \nlength -0.365404 0.019424 0.876024 1.000000 \nwidth -0.242423 0.086802 0.814507 0.857170 \nheight -0.550160 -0.373737 0.590742 0.492063 \ncurb-weight -0.233118 0.099404 0.782097 0.880665 \nengine-size -0.110581 0.112360 0.572027 0.685025 \nbore -0.140019 -0.029862 0.493244 0.608971 \nstroke -0.008245 0.055563 0.158502 0.124139 \ncompression-ratio -0.182196 -0.114713 0.250313 0.159733 \nhorsepower 0.075819 0.217299 0.371147 0.579821 \npeak-rpm 0.279740 0.239543 -0.360305 -0.285970 \ncity-mpg -0.035527 -0.225016 -0.470606 -0.665192 \nhighway-mpg 0.036233 -0.181877 -0.543304 -0.698142 \nprice -0.082391 0.133999 0.584642 0.690628 \ncity-L/100km 0.066171 0.238567 0.476153 0.657373 \ndiesel -0.196735 -0.101546 0.307237 0.211187 \ngas 0.196735 0.101546 -0.307237 -0.211187 \n\n width height curb-weight engine-size bore \\\nsymboling -0.242423 -0.550160 -0.233118 -0.110581 -0.140019 \nnormalized-losses 0.086802 -0.373737 0.099404 0.112360 -0.029862 \nwheel-base 0.814507 0.590742 0.782097 0.572027 0.493244 \nlength 0.857170 0.492063 0.880665 0.685025 0.608971 \nwidth 1.000000 0.306002 0.866201 0.729436 0.544885 \nheight 0.306002 1.000000 0.307581 0.074694 0.180449 \ncurb-weight 0.866201 0.307581 1.000000 0.849072 0.644060 \nengine-size 0.729436 0.074694 0.849072 1.000000 0.572609 \nbore 0.544885 0.180449 0.644060 0.572609 1.000000 \nstroke 0.188829 -0.062704 0.167562 0.209523 -0.055390 \ncompression-ratio 0.189867 0.259737 0.156433 0.028889 0.001263 \nhorsepower 0.615077 -0.087027 0.757976 0.822676 0.566936 \npeak-rpm -0.245800 -0.309974 -0.279361 
-0.256733 -0.267392 \ncity-mpg -0.633531 -0.049800 -0.749543 -0.650546 -0.582027 \nhighway-mpg -0.680635 -0.104812 -0.794889 -0.679571 -0.591309 \nprice 0.751265 0.135486 0.834415 0.872335 0.543155 \ncity-L/100km 0.673363 0.003811 0.785353 0.745059 0.554610 \ndiesel 0.244356 0.281578 0.221046 0.070779 0.054458 \ngas -0.244356 -0.281578 -0.221046 -0.070779 -0.054458 \n\n stroke compression-ratio horsepower peak-rpm \\\nsymboling -0.008245 -0.182196 0.075819 0.279740 \nnormalized-losses 0.055563 -0.114713 0.217299 0.239543 \nwheel-base 0.158502 0.250313 0.371147 -0.360305 \nlength 0.124139 0.159733 0.579821 -0.285970 \nwidth 0.188829 0.189867 0.615077 -0.245800 \nheight -0.062704 0.259737 -0.087027 -0.309974 \ncurb-weight 0.167562 0.156433 0.757976 -0.279361 \nengine-size 0.209523 0.028889 0.822676 -0.256733 \nbore -0.055390 0.001263 0.566936 -0.267392 \nstroke 1.000000 0.187923 0.098462 -0.065713 \ncompression-ratio 0.187923 1.000000 -0.214514 -0.435780 \nhorsepower 0.098462 -0.214514 1.000000 0.107885 \npeak-rpm -0.065713 -0.435780 0.107885 1.000000 \ncity-mpg -0.034696 0.331425 -0.822214 -0.115413 \nhighway-mpg -0.035201 0.268465 -0.804575 -0.058598 \nprice 0.082310 0.071107 0.809575 -0.101616 \ncity-L/100km 0.037300 -0.299372 0.889488 0.115830 \ndiesel 0.241303 0.985231 -0.169053 -0.475812 \ngas -0.241303 -0.985231 0.169053 0.475812 \n\n city-mpg highway-mpg price city-L/100km diesel \\\nsymboling -0.035527 0.036233 -0.082391 0.066171 -0.196735 \nnormalized-losses -0.225016 -0.181877 0.133999 0.238567 -0.101546 \nwheel-base -0.470606 -0.543304 0.584642 0.476153 0.307237 \nlength -0.665192 -0.698142 0.690628 0.657373 0.211187 \nwidth -0.633531 -0.680635 0.751265 0.673363 0.244356 \nheight -0.049800 -0.104812 0.135486 0.003811 0.281578 \ncurb-weight -0.749543 -0.794889 0.834415 0.785353 0.221046 \nengine-size -0.650546 -0.679571 0.872335 0.745059 0.070779 \nbore -0.582027 -0.591309 0.543155 0.554610 0.054458 \nstroke -0.034696 -0.035201 0.082310 0.037300 0.241303 
\ncompression-ratio 0.331425 0.268465 0.071107 -0.299372 0.985231 \nhorsepower -0.822214 -0.804575 0.809575 0.889488 -0.169053 \npeak-rpm -0.115413 -0.058598 -0.101616 0.115830 -0.475812 \ncity-mpg 1.000000 0.972044 -0.686571 -0.949713 0.265676 \nhighway-mpg 0.972044 1.000000 -0.704692 -0.930028 0.198690 \nprice -0.686571 -0.704692 1.000000 0.789898 0.110326 \ncity-L/100km -0.949713 -0.930028 0.789898 1.000000 -0.241282 \ndiesel 0.265676 0.198690 0.110326 -0.241282 1.000000 \ngas -0.265676 -0.198690 -0.110326 0.241282 -1.000000 \n\n gas \nsymboling 0.196735 \nnormalized-losses 0.101546 \nwheel-base -0.307237 \nlength -0.211187 \nwidth -0.244356 \nheight -0.281578 \ncurb-weight -0.221046 \nengine-size -0.070779 \nbore -0.054458 \nstroke -0.241303 \ncompression-ratio -0.985231 \nhorsepower 0.169053 \npeak-rpm 0.475812 \ncity-mpg -0.265676 \nhighway-mpg -0.198690 \nprice -0.110326 \ncity-L/100km 0.241282 \ndiesel -1.000000 \ngas 1.000000 ", "text/html": "
      \n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
      symbolingnormalized-losseswheel-baselengthwidthheightcurb-weightengine-sizeborestrokecompression-ratiohorsepowerpeak-rpmcity-mpghighway-mpgpricecity-L/100kmdieselgas
      symboling1.0000000.466264-0.535987-0.365404-0.242423-0.550160-0.233118-0.110581-0.140019-0.008245-0.1821960.0758190.279740-0.0355270.036233-0.0823910.066171-0.1967350.196735
      normalized-losses0.4662641.000000-0.0566610.0194240.086802-0.3737370.0994040.112360-0.0298620.055563-0.1147130.2172990.239543-0.225016-0.1818770.1339990.238567-0.1015460.101546
      wheel-base-0.535987-0.0566611.0000000.8760240.8145070.5907420.7820970.5720270.4932440.1585020.2503130.371147-0.360305-0.470606-0.5433040.5846420.4761530.307237-0.307237
      length-0.3654040.0194240.8760241.0000000.8571700.4920630.8806650.6850250.6089710.1241390.1597330.579821-0.285970-0.665192-0.6981420.6906280.6573730.211187-0.211187
      width-0.2424230.0868020.8145070.8571701.0000000.3060020.8662010.7294360.5448850.1888290.1898670.615077-0.245800-0.633531-0.6806350.7512650.6733630.244356-0.244356
      height-0.550160-0.3737370.5907420.4920630.3060021.0000000.3075810.0746940.180449-0.0627040.259737-0.087027-0.309974-0.049800-0.1048120.1354860.0038110.281578-0.281578
      curb-weight-0.2331180.0994040.7820970.8806650.8662010.3075811.0000000.8490720.6440600.1675620.1564330.757976-0.279361-0.749543-0.7948890.8344150.7853530.221046-0.221046
      engine-size-0.1105810.1123600.5720270.6850250.7294360.0746940.8490721.0000000.5726090.2095230.0288890.822676-0.256733-0.650546-0.6795710.8723350.7450590.070779-0.070779
      bore-0.140019-0.0298620.4932440.6089710.5448850.1804490.6440600.5726091.000000-0.0553900.0012630.566936-0.267392-0.582027-0.5913090.5431550.5546100.054458-0.054458
      stroke-0.0082450.0555630.1585020.1241390.188829-0.0627040.1675620.209523-0.0553901.0000000.1879230.098462-0.065713-0.034696-0.0352010.0823100.0373000.241303-0.241303
      compression-ratio-0.182196-0.1147130.2503130.1597330.1898670.2597370.1564330.0288890.0012630.1879231.000000-0.214514-0.4357800.3314250.2684650.071107-0.2993720.985231-0.985231
      horsepower0.0758190.2172990.3711470.5798210.615077-0.0870270.7579760.8226760.5669360.098462-0.2145141.0000000.107885-0.822214-0.8045750.8095750.889488-0.1690530.169053
      peak-rpm0.2797400.239543-0.360305-0.285970-0.245800-0.309974-0.279361-0.256733-0.267392-0.065713-0.4357800.1078851.000000-0.115413-0.058598-0.1016160.115830-0.4758120.475812
      city-mpg-0.035527-0.225016-0.470606-0.665192-0.633531-0.049800-0.749543-0.650546-0.582027-0.0346960.331425-0.822214-0.1154131.0000000.972044-0.686571-0.9497130.265676-0.265676
      highway-mpg0.036233-0.181877-0.543304-0.698142-0.680635-0.104812-0.794889-0.679571-0.591309-0.0352010.268465-0.804575-0.0585980.9720441.000000-0.704692-0.9300280.198690-0.198690
      price-0.0823910.1339990.5846420.6906280.7512650.1354860.8344150.8723350.5431550.0823100.0711070.809575-0.101616-0.686571-0.7046921.0000000.7898980.110326-0.110326
      city-L/100km0.0661710.2385670.4761530.6573730.6733630.0038110.7853530.7450590.5546100.037300-0.2993720.8894880.115830-0.949713-0.9300280.7898981.000000-0.2412820.241282
      diesel-0.196735-0.1015460.3072370.2111870.2443560.2815780.2210460.0707790.0544580.2413030.985231-0.169053-0.4758120.2656760.1986900.110326-0.2412821.000000-1.000000
      gas0.1967350.101546-0.307237-0.211187-0.244356-0.281578-0.221046-0.070779-0.054458-0.241303-0.9852310.1690530.475812-0.265676-0.198690-0.1103260.241282-1.0000001.000000
      \n
      " }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "Sometimes we would like to know the significant of the correlation estimate.\n", "metadata": {} }, { "cell_type": "markdown", "source": "P-value\n\n

      What is this P-value? The P-value is the probability of observing a correlation at least as strong as the one in our sample if the two variables were in fact uncorrelated. Normally, we choose a significance level of 0.05, which means that we call a correlation statistically significant when its P-value is below 0.05.

      \n\nBy convention, when the\n\n
        \n
      • the p-value is $<$ 0.001: we say there is strong evidence that the correlation is significant.
      • \n
      • the p-value is $<$ 0.05: there is moderate evidence that the correlation is significant.
      • \n
      • the p-value is $<$ 0.1: there is weak evidence that the correlation is significant.
      • \n
      • the p-value is $>$ 0.1: there is no evidence that the correlation is significant.
      • \n
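
The convention above can be written as a small helper. This is only a sketch: `evidence_label` is a hypothetical function name, and treating a p-value of exactly 0.1 as no evidence is an assumption, since the list does not cover that boundary.

```python
def evidence_label(p_value):
    """Map a p-value to the evidence wording used in the list above."""
    if p_value < 0.001:
        return 'strong'
    if p_value < 0.05:
        return 'moderate'
    if p_value < 0.1:
        return 'weak'
    # Assumption: exactly 0.1 is grouped with the values above 0.1.
    return 'none'


for p in (0.0004, 0.03, 0.08, 0.4):
    print(p, evidence_label(p))
```

The thresholds are checked from smallest to largest, so each p-value receives the strongest label it qualifies for.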
      \n", "metadata": {} }, { "cell_type": "markdown", "source": "We can obtain this information using \"stats\" module in the \"scipy\" library.\n", "metadata": {} }, { "cell_type": "code", "source": "from scipy import stats", "metadata": { "trusted": true }, "execution_count": 43, "outputs": [] }, { "cell_type": "markdown", "source": "

      Wheel-Base vs. Price

      \n", "metadata": {} }, { "cell_type": "markdown", "source": "Let's calculate the Pearson Correlation Coefficient and P-value of 'wheel-base' and 'price'.\n", "metadata": {} }, { "cell_type": "code", "source": "pearson_coef, p_value = stats.pearsonr(df['wheel-base'], df['price'])\nprint(\"The Pearson Correlation Coefficient is\", pearson_coef, \" with a P-value of P =\", p_value) ", "metadata": { "trusted": true }, "execution_count": 44, "outputs": [ { "name": "stdout", "text": "The Pearson Correlation Coefficient is 0.5846418222655085 with a P-value of P = 8.076488270732243e-20\n", "output_type": "stream" } ] }, { "cell_type": "markdown", "source": "

      Conclusion:

      \n

      Since the p-value is $<$ 0.001, the correlation between wheel-base and price is statistically significant, although the linear relationship isn't extremely strong (~0.585).

      \n", "metadata": {} }, { "cell_type": "markdown", "source": "

      Horsepower vs. Price

      \n", "metadata": {} }, { "cell_type": "markdown", "source": "Let's calculate the Pearson Correlation Coefficient and P-value of 'horsepower' and 'price'.\n", "metadata": {} }, { "cell_type": "code", "source": "pearson_coef, p_value = stats.pearsonr(df['horsepower'], df['price'])\nprint(\"The Pearson Correlation Coefficient is\", pearson_coef, \" with a P-value of P = \", p_value) ", "metadata": { "trusted": true }, "execution_count": 45, "outputs": [ { "name": "stdout", "text": "The Pearson Correlation Coefficient is 0.8095745670036559 with a P-value of P = 6.369057428260101e-48\n", "output_type": "stream" } ] }, { "cell_type": "markdown", "source": "

      Conclusion:

      \n\n

      Since the p-value is $<$ 0.001, the correlation between horsepower and price is statistically significant, and the linear relationship is quite strong (~0.810, close to 1).

      \n", "metadata": {} }, { "cell_type": "markdown", "source": "

      Length vs. Price

      \n\nLet's calculate the Pearson Correlation Coefficient and P-value of 'length' and 'price'.\n", "metadata": {} }, { "cell_type": "code", "source": "pearson_coef, p_value = stats.pearsonr(df['length'], df['price'])\nprint(\"The Pearson Correlation Coefficient is\", pearson_coef, \" with a P-value of P = \", p_value) ", "metadata": { "trusted": true }, "execution_count": 46, "outputs": [ { "name": "stdout", "text": "The Pearson Correlation Coefficient is 0.6906283804483643 with a P-value of P = 8.01647746615853e-30\n", "output_type": "stream" } ] }, { "cell_type": "markdown", "source": "

      Conclusion:

      \n

      Since the p-value is $<$ 0.001, the correlation between length and price is statistically significant, and the linear relationship is moderately strong (~0.691).

      \n", "metadata": {} }, { "cell_type": "markdown", "source": "

      Width vs. Price

      \n", "metadata": {} }, { "cell_type": "markdown", "source": "Let's calculate the Pearson Correlation Coefficient and P-value of 'width' and 'price':\n", "metadata": {} }, { "cell_type": "code", "source": "pearson_coef, p_value = stats.pearsonr(df['width'], df['price'])\nprint(\"The Pearson Correlation Coefficient is\", pearson_coef, \" with a P-value of P =\", p_value ) ", "metadata": { "trusted": true }, "execution_count": 47, "outputs": [ { "name": "stdout", "text": "The Pearson Correlation Coefficient is 0.7512653440522666 with a P-value of P = 9.200335510483739e-38\n", "output_type": "stream" } ] }, { "cell_type": "markdown", "source": "#### Conclusion:\n\nSince the p-value is < 0.001, the correlation between width and price is statistically significant, and the linear relationship is quite strong (\\~0.751).\n", "metadata": {} }, { "cell_type": "markdown", "source": "### Curb-Weight vs. Price\n", "metadata": {} }, { "cell_type": "markdown", "source": "Let's calculate the Pearson Correlation Coefficient and P-value of 'curb-weight' and 'price':\n", "metadata": {} }, { "cell_type": "code", "source": "pearson_coef, p_value = stats.pearsonr(df['curb-weight'], df['price'])\nprint( \"The Pearson Correlation Coefficient is\", pearson_coef, \" with a P-value of P = \", p_value) ", "metadata": { "trusted": true }, "execution_count": 48, "outputs": [ { "name": "stdout", "text": "The Pearson Correlation Coefficient is 0.8344145257702845 with a P-value of P = 2.189577238893816e-53\n", "output_type": "stream" } ] }, { "cell_type": "markdown", "source": "

      Conclusion:

      \n

      Since the p-value is $<$ 0.001, the correlation between curb-weight and price is statistically significant, and the linear relationship is quite strong (~0.834).

      \n", "metadata": {} }, { "cell_type": "markdown", "source": "

      Engine-Size vs. Price

      \n\nLet's calculate the Pearson Correlation Coefficient and P-value of 'engine-size' and 'price':\n", "metadata": {} }, { "cell_type": "code", "source": "pearson_coef, p_value = stats.pearsonr(df['engine-size'], df['price'])\nprint(\"The Pearson Correlation Coefficient is\", pearson_coef, \" with a P-value of P =\", p_value) ", "metadata": { "trusted": true }, "execution_count": 49, "outputs": [ { "name": "stdout", "text": "The Pearson Correlation Coefficient is 0.8723351674455188 with a P-value of P = 9.265491622196808e-64\n", "output_type": "stream" } ] }, { "cell_type": "markdown", "source": "

      Conclusion:

      \n\n

      Since the p-value is $<$ 0.001, the correlation between engine-size and price is statistically significant, and the linear relationship is very strong (~0.872).

      \n", "metadata": {} }, { "cell_type": "markdown", "source": "

      Bore vs. Price

      \n", "metadata": {} }, { "cell_type": "markdown", "source": "Let's calculate the Pearson Correlation Coefficient and P-value of 'bore' and 'price':\n", "metadata": {} }, { "cell_type": "code", "source": "pearson_coef, p_value = stats.pearsonr(df['bore'], df['price'])\nprint(\"The Pearson Correlation Coefficient is\", pearson_coef, \" with a P-value of P = \", p_value ) ", "metadata": { "trusted": true }, "execution_count": 50, "outputs": [ { "name": "stdout", "text": "The Pearson Correlation Coefficient is 0.54315538326266 with a P-value of P = 8.049189483935489e-17\n", "output_type": "stream" } ] }, { "cell_type": "markdown", "source": "

      Conclusion:

      \n

      Since the p-value is $<$ 0.001, the correlation between bore and price is statistically significant, but the linear relationship is only moderate (~0.543).
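
The column-by-column pattern used so far can also be condensed into a loop. The sketch below runs on a small synthetic table (`toy`, with made-up values and only two feature columns) rather than on the lab's `df`; with the real data you would pass `df` and the list of numeric column names instead.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in for the car data; values are invented for illustration.
rng = np.random.default_rng(42)
n = 200
engine_size = rng.normal(130, 40, n)
toy = pd.DataFrame({
    'engine-size': engine_size,
    'city-mpg': 45 - 0.15 * engine_size + rng.normal(0, 3, n),
    'price': 100 * engine_size + rng.normal(0, 2000, n),
})

# Compute the Pearson coefficient and P-value of each feature against price.
rows = []
for col in ['engine-size', 'city-mpg']:
    r, p = stats.pearsonr(toy[col], toy['price'])
    rows.append({'feature': col, 'pearson_r': r, 'p_value': p})

# Sort by the absolute strength of the linear relationship.
summary = pd.DataFrame(rows).sort_values(
    'pearson_r', key=lambda s: s.abs(), ascending=False
)
print(summary)
```

Collecting the results in one table makes it easier to compare coefficients side by side than reading nine separate print statements.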

      \n", "metadata": {} }, { "cell_type": "markdown", "source": "We can repeat the process for 'city-mpg' and 'highway-mpg':\n", "metadata": {} }, { "cell_type": "markdown", "source": "

      City-mpg vs. Price

      \n", "metadata": {} }, { "cell_type": "code", "source": "pearson_coef, p_value = stats.pearsonr(df['city-mpg'], df['price'])\nprint(\"The Pearson Correlation Coefficient is\", pearson_coef, \" with a P-value of P = \", p_value) ", "metadata": { "trusted": true }, "execution_count": 51, "outputs": [ { "name": "stdout", "text": "The Pearson Correlation Coefficient is -0.6865710067844684 with a P-value of P = 2.3211320655672453e-29\n", "output_type": "stream" } ] }, { "cell_type": "markdown", "source": "

      Conclusion:

      \n

      Since the p-value is $<$ 0.001, the correlation between city-mpg and price is statistically significant, and the coefficient of about -0.687 shows that the relationship is negative and moderately strong.

      \n", "metadata": {} }, { "cell_type": "markdown", "source": "

      Highway-mpg vs. Price

      \n", "metadata": {} }, { "cell_type": "code", "source": "pearson_coef, p_value = stats.pearsonr(df['highway-mpg'], df['price'])\nprint( \"The Pearson Correlation Coefficient is\", pearson_coef, \" with a P-value of P = \", p_value ) ", "metadata": { "trusted": true }, "execution_count": 52, "outputs": [ { "name": "stdout", "text": "The Pearson Correlation Coefficient is -0.7046922650589534 with a P-value of P = 1.749547114447437e-31\n", "output_type": "stream" } ] }, { "cell_type": "markdown", "source": "#### Conclusion:\n\nSince the p-value is < 0.001, the correlation between highway-mpg and price is statistically significant, and the coefficient of about -0.705 shows that the relationship is negative and moderately strong.\n", "metadata": {} }, { "cell_type": "markdown", "source": "

      6. ANOVA

      \n", "metadata": {} }, { "cell_type": "markdown", "source": "

      ANOVA: Analysis of Variance

      \n

      The Analysis of Variance (ANOVA) is a statistical method used to test whether there are significant differences between the means of two or more groups. ANOVA returns two parameters:

      \n\n

      F-test score: ANOVA assumes the means of all groups are the same, calculates how much the actual means deviate from the assumption, and reports it as the F-test score. A larger score means there is a larger difference between the means.

      \n\n

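      P-value: The P-value is the probability of obtaining an F-test score at least this large if all group means were in fact equal. The smaller it is, the stronger the evidence that the means differ.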

      \n\n

      If our price variable is strongly correlated with the variable we are analyzing, we expect ANOVA to return a sizeable F-test score and a small p-value.
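
To make the F-test score less of a black box, the following sketch computes it by hand for three tiny made-up groups and compares the result with `stats.f_oneway`. The group values are arbitrary and chosen only so the arithmetic is easy to follow: the score is the variance between the group means divided by the variance within the groups.

```python
import numpy as np
from scipy import stats

# Three small illustrative groups (not from the car dataset).
groups = [
    np.array([1.0, 2.0, 3.0]),
    np.array([2.0, 3.0, 4.0]),
    np.array([5.0, 6.0, 7.0]),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)            # number of groups
n_total = all_values.size  # total number of observations

# Sum of squares between groups: how far each group mean is from the grand mean.
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Sum of squares within groups: spread of the values around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F is the ratio of the two mean squares.
f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
# P-value: upper tail of the F distribution with (k - 1, n_total - k) degrees of freedom.
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = stats.f_oneway(*groups)
print('manual:', f_manual, p_manual)
print('scipy: ', f_scipy, p_scipy)
```

For these groups the between-group sum of squares is 26 and the within-group sum of squares is 6, so the score is (26 / 2) / (6 / 6) = 13.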

      \n", "metadata": {} }, { "cell_type": "markdown", "source": "

      Drive Wheels

      \n", "metadata": {} }, { "cell_type": "markdown", "source": "

      Since ANOVA analyzes the difference between different groups of the same variable, the groupby function will come in handy. Because the ANOVA algorithm averages the data automatically, we do not need to take the average beforehand.

      \n\n

      To see if different types of 'drive-wheels' impact 'price', we group the data.
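
As a compact illustration of how grouping feeds into ANOVA, the sketch below builds a tiny made-up table (`toy`, with invented prices) and passes one price array per group to `stats.f_oneway` without hard-coding the group names. It is an assumption-laden toy example, not the lab's data.

```python
import pandas as pd
from scipy import stats

# Tiny invented table with the same two columns used in this section.
toy = pd.DataFrame({
    'drive-wheels': ['fwd', 'fwd', 'fwd', 'rwd', 'rwd', 'rwd', '4wd', '4wd', '4wd'],
    'price': [7000.0, 8000.0, 9000.0, 15000.0, 18000.0, 21000.0,
              8000.0, 9500.0, 11000.0],
})

# One array of prices per drive-wheels category.
price_by_group = [g['price'].to_numpy() for _, g in toy.groupby('drive-wheels')]

# f_oneway accepts any number of groups as separate arguments.
f_val, p_val = stats.f_oneway(*price_by_group)
print('ANOVA results: F =', f_val, ', P =', p_val)
```

Unpacking the list with `*` means the same code keeps working if a category is added or removed.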

      \n", "metadata": {} }, { "cell_type": "code", "source": "grouped_test2=df_gptest[['drive-wheels', 'price']].groupby(['drive-wheels'])\ngrouped_test2.head(2)", "metadata": { "trusted": true }, "execution_count": 53, "outputs": [ { "execution_count": 53, "output_type": "execute_result", "data": { "text/plain": " drive-wheels price\n0 rwd 13495.0\n1 rwd 16500.0\n3 fwd 13950.0\n4 4wd 17450.0\n5 fwd 15250.0\n136 4wd 7603.0", "text/html": "
      \n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
      drive-wheelsprice
      0rwd13495.0
      1rwd16500.0
      3fwd13950.0
      44wd17450.0
      5fwd15250.0
      1364wd7603.0
      \n
      " }, "metadata": {} } ] }, { "cell_type": "code", "source": "df_gptest", "metadata": { "trusted": true }, "execution_count": 54, "outputs": [ { "execution_count": 54, "output_type": "execute_result", "data": { "text/plain": " drive-wheels body-style price\n0 rwd convertible 13495.0\n1 rwd convertible 16500.0\n2 rwd hatchback 16500.0\n3 fwd sedan 13950.0\n4 4wd sedan 17450.0\n.. ... ... ...\n196 rwd sedan 16845.0\n197 rwd sedan 19045.0\n198 rwd sedan 21485.0\n199 rwd sedan 22470.0\n200 rwd sedan 22625.0\n\n[201 rows x 3 columns]", "text/html": "
      \n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
      drive-wheelsbody-styleprice
      0rwdconvertible13495.0
      1rwdconvertible16500.0
      2rwdhatchback16500.0
      3fwdsedan13950.0
      44wdsedan17450.0
      ............
      196rwdsedan16845.0
      197rwdsedan19045.0
      198rwdsedan21485.0
      199rwdsedan22470.0
      200rwdsedan22625.0
      \n

      201 rows × 3 columns

      \n
      " }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "We can obtain the values of the method group using the method \"get_group\".\n", "metadata": {} }, { "cell_type": "code", "source": "grouped_test2.get_group('4wd')['price']", "metadata": { "trusted": true }, "execution_count": 55, "outputs": [ { "execution_count": 55, "output_type": "execute_result", "data": { "text/plain": "4 17450.0\n136 7603.0\n140 9233.0\n141 11259.0\n144 8013.0\n145 11694.0\n150 7898.0\n151 8778.0\nName: price, dtype: float64" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "We can use the function 'f_oneway' in the module 'stats' to obtain the F-test score and P-value.\n", "metadata": {} }, { "cell_type": "code", "source": "# ANOVA\nf_val, p_val = stats.f_oneway(grouped_test2.get_group('fwd')['price'], grouped_test2.get_group('rwd')['price'], grouped_test2.get_group('4wd')['price']) \n \nprint( \"ANOVA results: F=\", f_val, \", P =\", p_val) ", "metadata": { "trusted": true }, "execution_count": 56, "outputs": [ { "name": "stdout", "text": "ANOVA results: F= 67.95406500780399 , P = 3.3945443577151245e-23\n", "output_type": "stream" } ] }, { "cell_type": "markdown", "source": "This is a great result with a large F-test score showing a strong correlation and a P-value of almost 0 implying almost certain statistical significance. 
But does this mean that every pair of the three tested groups differs this strongly?\n\nLet's examine them separately.\n", "metadata": {} }, { "cell_type": "markdown", "source": "#### fwd and rwd\n", "metadata": {} }, { "cell_type": "code", "source": "f_val, p_val = stats.f_oneway(grouped_test2.get_group('fwd')['price'], grouped_test2.get_group('rwd')['price']) \n \nprint( \"ANOVA results: F=\", f_val, \", P =\", p_val )", "metadata": { "trusted": true }, "execution_count": 57, "outputs": [ { "name": "stdout", "text": "ANOVA results: F= 130.5533160959111 , P = 2.2355306355677845e-23\n", "output_type": "stream" } ] }, { "cell_type": "markdown", "source": "Let's examine the other groups.\n", "metadata": {} }, { "cell_type": "markdown", "source": "#### 4wd and rwd\n", "metadata": {} }, { "cell_type": "code", "source": "f_val, p_val = stats.f_oneway(grouped_test2.get_group('4wd')['price'], grouped_test2.get_group('rwd')['price']) \n \nprint( \"ANOVA results: F=\", f_val, \", P =\", p_val) ", "metadata": { "scrolled": true, "trusted": true }, "execution_count": 58, "outputs": [ { "name": "stdout", "text": "ANOVA results: F= 8.580681368924756 , P = 0.004411492211225333\n", "output_type": "stream" } ] }, { "cell_type": "markdown", "source": "

      4wd and fwd

      \n", "metadata": {} }, { "cell_type": "code", "source": "f_val, p_val = stats.f_oneway(grouped_test2.get_group('4wd')['price'], grouped_test2.get_group('fwd')['price']) \n \nprint(\"ANOVA results: F=\", f_val, \", P =\", p_val) ", "metadata": { "trusted": true }, "execution_count": 59, "outputs": [ { "name": "stdout", "text": "ANOVA results: F= 0.665465750252303 , P = 0.41620116697845655\n", "output_type": "stream" } ] }, { "cell_type": "markdown", "source": "

      Conclusion: Important Variables

      \n", "metadata": {} }, { "cell_type": "markdown", "source": "

      We now have a better idea of what our data looks like and which variables are important to take into account when predicting the car price. We have narrowed it down to the following variables:

      \n\nContinuous numerical variables:\n\n
        \n
      • Length
      • \n
      • Width
      • \n
      • Curb-weight
      • \n
      • Engine-size
      • \n
      • Horsepower
      • \n
      • City-mpg
      • \n
      • Highway-mpg
      • \n
      • Wheel-base
      • \n
      • Bore
      • \n
      \n\nCategorical variables:\n\n
        \n
      • Drive-wheels
      • \n
      \n\n

      As we now move into building machine learning models to automate our analysis, feeding the model with variables that meaningfully affect our target variable will improve our model's prediction performance.

      \n", "metadata": {} }, { "cell_type": "markdown", "source": "### Thank you for completing this lab!\n\n## Author\n\nJoseph Santarcangelo\n\n### Other Contributors\n\nMahdi Noorian PhD\n\nBahare Talayian\n\nEric Xiao\n\nSteven Dong\n\nParizad\n\nHima Vasudevan\n\nFiorella Wenver\n\nYi Yao.\n\n## Change Log\n\n| Date (YYYY-MM-DD) | Version | Changed By | Change Description |\n| ----------------- | ------- | ---------- | ---------------------------------- |\n| 2020-10-30 | 2.1 | Lakshmi | changed URL of csv |\n| 2020-08-27 | 2.0 | Lavanya | Moved lab to course repo in GitLab |\n\n
      \n\n##

      © IBM Corporation 2020. All rights reserved.

      \n", "metadata": {} }, { "cell_type": "code", "source": "", "metadata": {}, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": "", "metadata": {}, "execution_count": null, "outputs": [] } ] }