{ "metadata": { "anaconda-cloud": {}, "kernelspec": { "name": "python", "display_name": "Pyolite", "language": "python" }, "language_info": { "codemirror_mode": { "name": "python", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8" }, "metadata": { "interpreter": { "hash": "ac2eaa0ea0ebeafcc7822e65e46aa9d4f966f30b695406963e145ea4a91cd4fc" } } }, "nbformat_minor": 4, "nbformat": 4, "cells": [ { "cell_type": "markdown", "source": "
\n\n# Model Evaluation and Refinement\n\nEstimated time needed: **30** minutes\n\n## Objectives\n\nAfter completing this lab you will be able to:\n\n* Evaluate and refine prediction models\n", "metadata": {} }, { "cell_type": "markdown", "source": "


\n\n", "metadata": {} }, { "cell_type": "markdown", "source": "

## Setup

\n", "metadata": {} }, { "cell_type": "markdown", "source": "You are running the lab in your browser, so we will install the libraries using `piplite`:\n", "metadata": {} }, { "cell_type": "code", "source": "# You are running the lab in your browser, so we install the libraries using piplite\nimport piplite\nawait piplite.install(['pandas'])\nawait piplite.install(['matplotlib'])\nawait piplite.install(['scipy'])\nawait piplite.install(['seaborn'])\nawait piplite.install(['ipywidgets'])\nawait piplite.install(['tqdm'])", "metadata": { "trusted": true }, "execution_count": 1, "outputs": [] }, { "cell_type": "markdown", "source": "If you run the lab locally using Anaconda, you can install the correct libraries and versions by uncommenting the following:\n", "metadata": {} }, { "cell_type": "code", "source": "# Install the specific versions of the libraries used in this lab\n#! mamba install pandas==1.3.3 -y\n#! mamba install numpy=1.21.2 -y\n#! mamba install scikit-learn=0.20.1 -y\n#! mamba install ipywidgets=7.4.2 -y\n#! mamba install tqdm", "metadata": { "trusted": true }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": "import pandas as pd\nimport numpy as np", "metadata": { "trusted": true }, "execution_count": 2, "outputs": [ { "name": "stderr", "text": "/lib/python3.9/site-packages/pandas/compat/__init__.py:124: UserWarning: Could not import the lzma module. Your installed Python is incomplete. 
Attempting to use lzma compression will result in a RuntimeError.\n warnings.warn(msg)\n", "output_type": "stream" } ] }, { "cell_type": "markdown", "source": "This function will download the dataset into your browser:\n", "metadata": {} }, { "cell_type": "code", "source": "# This function downloads the dataset into your browser\n\nfrom pyodide.http import pyfetch\n\nasync def download(url, filename):\n    response = await pyfetch(url)\n    if response.status == 200:\n        with open(filename, \"wb\") as f:\n            f.write(await response.bytes())", "metadata": { "trusted": true }, "execution_count": 3, "outputs": [] }, { "cell_type": "markdown", "source": "This dataset is hosted on IBM Cloud Object Storage.\n", "metadata": {} }, { "cell_type": "code", "source": "path = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-SkillsNetwork/labs/Data%20files/module_5_auto.csv'", "metadata": { "trusted": true }, "execution_count": 5, "outputs": [] }, { "cell_type": "markdown", "source": "You will need to download the dataset; if you are running locally, please comment out the following:\n", "metadata": {} }, { "cell_type": "code", "source": "# You will need to download the dataset; if you are running locally, please comment out the following\nawait download(path, \"auto.csv\")\npath=\"auto.csv\"", "metadata": { "trusted": true }, "execution_count": 6, "outputs": [] }, { "cell_type": "code", "source": "\ndf = pd.read_csv(path)", "metadata": { "trusted": true }, "execution_count": 7, "outputs": [] }, { "cell_type": "code", "source": "df.to_csv('module_5_auto.csv')", "metadata": { "trusted": true }, "execution_count": 8, "outputs": [] }, { "cell_type": "markdown", "source": "First, let's only use numeric data:\n", "metadata": {} }, { "cell_type": "code", "source": 
"df=df._get_numeric_data()\ndf.head()", "metadata": { "trusted": true }, "execution_count": 9, "outputs": [ { "execution_count": 9, "output_type": "execute_result", "data": { "text/plain": " Unnamed: 0 Unnamed: 0.1 symboling normalized-losses wheel-base \\\n0 0 0 3 122 88.6 \n1 1 1 3 122 88.6 \n2 2 2 1 122 94.5 \n3 3 3 2 164 99.8 \n4 4 4 2 164 99.4 \n\n length width height curb-weight engine-size ... stroke \\\n0 0.811148 0.890278 48.8 2548 130 ... 2.68 \n1 0.811148 0.890278 48.8 2548 130 ... 2.68 \n2 0.822681 0.909722 52.4 2823 152 ... 3.47 \n3 0.848630 0.919444 54.3 2337 109 ... 3.40 \n4 0.848630 0.922222 54.3 2824 136 ... 3.40 \n\n compression-ratio horsepower peak-rpm city-mpg highway-mpg price \\\n0 9.0 111.0 5000.0 21 27 13495.0 \n1 9.0 111.0 5000.0 21 27 16500.0 \n2 9.0 154.0 5000.0 19 26 16500.0 \n3 10.0 102.0 5500.0 24 30 13950.0 \n4 8.0 115.0 5500.0 18 22 17450.0 \n\n city-L/100km diesel gas \n0 11.190476 0 1 \n1 11.190476 0 1 \n2 12.368421 0 1 \n3 9.791667 0 1 \n4 13.055556 0 1 \n\n[5 rows x 21 columns]", "text/html": "
" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "Libraries for plotting:\n", "metadata": {} }, { "cell_type": "code", "source": "from ipywidgets import interact, interactive, fixed, interact_manual", "metadata": { "trusted": true }, "execution_count": 10, "outputs": [] }, { "cell_type": "markdown", "source": "

### Functions for Plotting

\n", "metadata": {} }, { "cell_type": "code", "source": "def DistributionPlot(RedFunction, BlueFunction, RedName, BlueName, Title):\n    width = 12\n    height = 10\n    plt.figure(figsize=(width, height))\n\n    ax1 = sns.distplot(RedFunction, hist=False, color=\"r\", label=RedName)\n    ax2 = sns.distplot(BlueFunction, hist=False, color=\"b\", label=BlueName, ax=ax1)\n\n    plt.title(Title)\n    plt.xlabel('Price (in dollars)')\n    plt.ylabel('Proportion of Cars')\n\n    plt.show()\n    plt.close()", "metadata": { "trusted": true }, "execution_count": 11, "outputs": [] }, { "cell_type": "code", "source": "def PollyPlot(xtrain, xtest, y_train, y_test, lr,poly_transform):\n    width = 12\n    height = 10\n    plt.figure(figsize=(width, height))\n\n    #training data\n    #testing data\n    # lr: linear regression object\n    #poly_transform: polynomial transformation object\n\n    xmax=max([xtrain.values.max(), xtest.values.max()])\n\n    xmin=min([xtrain.values.min(), xtest.values.min()])\n\n    x=np.arange(xmin, xmax, 0.1)\n\n    plt.plot(xtrain, y_train, 'ro', label='Training Data')\n    plt.plot(xtest, y_test, 'go', label='Test Data')\n    plt.plot(x, lr.predict(poly_transform.fit_transform(x.reshape(-1, 1))), label='Predicted Function')\n    plt.ylim([-10000, 60000])\n    plt.ylabel('Price')\n    plt.legend()", "metadata": { "trusted": true }, "execution_count": 12, "outputs": [] }, { "cell_type": "markdown", "source": "

## Part 1: Training and Testing

\n\n

An important step in testing your model is to split your data into training and testing data. We will place the target data, price, in a separate pandas Series, y_data:

\n", "metadata": {} }, { "cell_type": "code", "source": "y_data = df['price']", "metadata": { "trusted": true }, "execution_count": 13, "outputs": [] }, { "cell_type": "markdown", "source": "Drop price data in dataframe **x_data**:\n", "metadata": {} }, { "cell_type": "code", "source": "x_data=df.drop('price',axis=1)", "metadata": { "trusted": true }, "execution_count": 14, "outputs": [] }, { "cell_type": "markdown", "source": "Now, we randomly split our data into training and testing data using the function train_test_split.\n", "metadata": {} }, { "cell_type": "code", "source": "from sklearn.model_selection import train_test_split\n\n\nx_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.10, random_state=1)\n\n\nprint(\"number of test samples :\", x_test.shape[0])\nprint(\"number of training samples:\",x_train.shape[0])\n", "metadata": { "trusted": true }, "execution_count": 15, "outputs": [ { "name": "stdout", "text": "number of test samples : 21\nnumber of training samples: 180\n", "output_type": "stream" } ] }, { "cell_type": "markdown", "source": "The test_size parameter sets the proportion of data that is split into the testing set. In the above, the testing set is 10% of the total dataset.\n", "metadata": {} }, { "cell_type": "markdown", "source": "
\n

### Question #1:

\n\nUse the function \"train_test_split\" to split up the dataset such that 40% of the data samples will be utilized for testing. Set the parameter \"random_state\" equal to zero. The output of the function should be the following: \"x_train1\" , \"x_test1\", \"y_train1\" and \"y_test1\".\n\n
\n", "metadata": {} }, { "cell_type": "code", "source": "# Write your code below and press Shift+Enter to execute \nx_train1, x_test1, y_train1, y_test1 = train_test_split(x_data, y_data, test_size=0.40, random_state=0)\nprint(\"number of test samples :\", x_test1.shape[0])\nprint(\"number of training samples:\",x_train1.shape[0])", "metadata": { "trusted": true }, "execution_count": 17, "outputs": [ { "name": "stdout", "text": "number of test samples : 81\nnumber of training samples: 120\n", "output_type": "stream" } ] }, { "cell_type": "markdown", "source": "
Click here for the solution\n\n```python\nx_train1, x_test1, y_train1, y_test1 = train_test_split(x_data, y_data, test_size=0.4, random_state=0) \nprint(\"number of test samples :\", x_test1.shape[0])\nprint(\"number of training samples:\",x_train1.shape[0])\n```\n\n
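
The sample counts printed in this part follow from how a fractional test_size is turned into a whole number of rows. The sketch below assumes the rounding that scikit-learn applies when only a float test_size is given (the test count is rounded up and the remaining rows are used for training). The helper name `split_sizes` is hypothetical and is not part of scikit-learn.

```python
import math

def split_sizes(n_samples, test_size):
    # Assumed rounding rule: round the test count up,
    # then use every remaining row for training.
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test

# The dataset in this lab has 201 rows.
print(split_sizes(201, 0.10))
print(split_sizes(201, 0.40))
```

For 201 rows this gives 180 training and 21 test samples at test_size=0.10, and 120 training and 81 test samples at test_size=0.40, which agrees with the counts printed by the cells above.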
\n", "metadata": {} }, { "cell_type": "markdown", "source": "Let's import LinearRegression from the module linear_model.\n", "metadata": {} }, { "cell_type": "code", "source": "from sklearn.linear_model import LinearRegression", "metadata": { "trusted": true }, "execution_count": 18, "outputs": [] }, { "cell_type": "markdown", "source": "We create a Linear Regression object:\n", "metadata": {} }, { "cell_type": "code", "source": "lre=LinearRegression()", "metadata": { "trusted": true }, "execution_count": 19, "outputs": [] }, { "cell_type": "markdown", "source": "We fit the model using the feature \"horsepower\":\n", "metadata": {} }, { "cell_type": "code", "source": "lre.fit(x_train[['horsepower']], y_train)", "metadata": { "trusted": true }, "execution_count": 20, "outputs": [ { "execution_count": 20, "output_type": "execute_result", "data": { "text/plain": "LinearRegression()" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "Let's calculate the R^2 on the test data:\n", "metadata": {} }, { "cell_type": "code", "source": "lre.score(x_test[['horsepower']], y_test)", "metadata": { "trusted": true }, "execution_count": 21, "outputs": [ { "execution_count": 21, "output_type": "execute_result", "data": { "text/plain": "0.3635875575078824" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "We can see the R^2 is much smaller using the test data compared to the training data.\n", "metadata": {} }, { "cell_type": "code", "source": "lre.score(x_train[['horsepower']], y_train)", "metadata": { "trusted": true }, "execution_count": 22, "outputs": [ { "execution_count": 22, "output_type": "execute_result", "data": { "text/plain": "0.6619724197515103" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "
\n

### Question #2:

\n \nFind the R^2 on the test data using 40% of the dataset for testing.\n\n
\n", "metadata": {} }, { "cell_type": "code", "source": "# Write your code below and press Shift+Enter to execute \nx_train1, x_test1, y_train1, y_test1 = train_test_split(x_data, y_data, test_size=0.40, random_state=0)\nlre.fit(x_train1[['horsepower']], y_train1)\nlre.score(x_test1[['horsepower']], y_test1)", "metadata": { "trusted": true }, "execution_count": 24, "outputs": [ { "execution_count": 24, "output_type": "execute_result", "data": { "text/plain": "0.7139364665406973" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "
Click here for the solution\n\n```python\nx_train1, x_test1, y_train1, y_test1 = train_test_split(x_data, y_data, test_size=0.4, random_state=0)\nlre.fit(x_train1[['horsepower']],y_train1)\nlre.score(x_test1[['horsepower']],y_test1)\n\n```\n\n
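
The `score` method used above reports the coefficient of determination, R^2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the targets from their mean. The following is a minimal sketch of that formula; `r_squared` is a hypothetical helper and the numbers are made up for illustration, they are not taken from the car dataset.

```python
def r_squared(y_true, y_pred):
    # Mean of the observed targets.
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [10.0, 20.0, 30.0, 40.0]
close_pred = [12.0, 18.0, 31.0, 39.0]   # predictions near the targets
poor_pred = [40.0, 10.0, 45.0, 5.0]     # predictions far from the targets

print(r_squared(y_true, close_pred))
print(r_squared(y_true, poor_pred))
```

Predictions that are worse than always guessing the mean make SS_res larger than SS_tot, so R^2 drops below zero. This is why a test-set R^2 can be negative even though the training R^2 of a least-squares fit with an intercept cannot.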
\n", "metadata": {} }, { "cell_type": "markdown", "source": "**Sometimes you do not have sufficient testing data**; as a result, you may want to perform **cross-validation**. Let's go over several methods that you can use for cross-validation.\n", "metadata": {} }, { "cell_type": "markdown", "source": "

### Cross-Validation Score

\n", "metadata": {} }, { "cell_type": "markdown", "source": "Let's import cross_val_score from the module model_selection.\n", "metadata": {} }, { "cell_type": "code", "source": "from sklearn.model_selection import cross_val_score", "metadata": { "trusted": true }, "execution_count": 25, "outputs": [] }, { "cell_type": "markdown", "source": "We input the object, the feature (\"horsepower\"), and the target data (y_data). The parameter 'cv' determines the number of folds. In this case, it is 4.\n", "metadata": {} }, { "cell_type": "code", "source": "Rcross = cross_val_score(lre, x_data[['horsepower']], y_data, cv=4)", "metadata": { "trusted": true }, "execution_count": 26, "outputs": [] }, { "cell_type": "markdown", "source": "The default scoring is R^2. Each element in the array is the R^2 value for one fold:\n", "metadata": {} }, { "cell_type": "code", "source": "Rcross", "metadata": { "trusted": true }, "execution_count": 27, "outputs": [ { "execution_count": 27, "output_type": "execute_result", "data": { "text/plain": "array([0.7746232 , 0.51716687, 0.74785353, 0.04839605])" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "We can calculate the average and standard deviation of our estimate:\n", "metadata": {} }, { "cell_type": "code", "source": "print(\"The mean of the folds are\", Rcross.mean(), \"and the standard deviation is\" , Rcross.std())", "metadata": { "trusted": true }, "execution_count": 28, "outputs": [ { "name": "stdout", "text": "The mean of the folds are 0.5220099150421197 and the standard deviation is 0.29118394447560203\n", "output_type": "stream" } ] }, { "cell_type": "markdown", "source": "We can use the negative mean squared error as a score by setting the parameter 'scoring' to 'neg_mean_squared_error'.\n", "metadata": {} }, { "cell_type": "code", "source": "-1 * cross_val_score(lre,x_data[['horsepower']], y_data,cv=4,scoring='neg_mean_squared_error')", "metadata": { "trusted": true }, "execution_count": 29, "outputs": [ 
{ "execution_count": 29, "output_type": "execute_result", "data": { "text/plain": "array([20254142.84026702, 43745493.26505171, 12539630.34014929,\n 17561927.72247586])" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "
\n

### Question #3:

\n \nUsing the \"horsepower\" feature, calculate the R^2 for each of two folds, then find the average R^2 across the folds: \n\n
\n", "metadata": {} }, { "cell_type": "code", "source": "# Write your code below and press Shift+Enter to execute \nRcross1 = cross_val_score(lre, x_data[['horsepower']], y_data, cv=2)\nprint(Rcross1)\nprint(\"The mean of the folds are\", Rcross1.mean(), \"and the standard deviation is\" , Rcross1.std())", "metadata": { "trusted": true }, "execution_count": 30, "outputs": [ { "name": "stdout", "text": "[0.59015621 0.44319613]\nThe mean of the folds are 0.5166761697127429 and the standard deviation is 0.07348004195771385\n", "output_type": "stream" } ] }, { "cell_type": "markdown", "source": "
Click here for the solution\n\n```python\nRc=cross_val_score(lre,x_data[['horsepower']], y_data,cv=2)\nRc.mean()\n\n```\n\n
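
To make the fold mechanics concrete, here is a minimal sketch of how rows can be divided into folds. It assumes the behaviour of an unshuffled K-fold split, which is what an integer cv is expected to use for a regressor: folds are contiguous blocks of rows, and the first n_samples % n_splits folds receive one extra row. The helper `kfold_indices` is hypothetical and is not a scikit-learn function.

```python
def kfold_indices(n_samples, n_splits):
    # Base fold size, plus one extra row for the first `extra` folds.
    base, extra = divmod(n_samples, n_splits)
    folds = []
    start = 0
    for i in range(n_splits):
        size = base + (1 if i < extra else 0)
        # The current block of rows is held out for testing.
        test_idx = list(range(start, start + size))
        # Every other row is used for training.
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train_idx, test_idx))
        start += size
    return folds

# 201 rows and 4 folds, as in the lab.
for train_idx, test_idx in kfold_indices(201, 4):
    print(len(train_idx), len(test_idx), test_idx[0], test_idx[-1])
```

Under this assumption the 201 rows give test folds of 51, 50, 50 and 50 rows. Because each fold is a contiguous block, the score of a fold depends on how the rows are ordered in the file, which may be one reason the fold scores above differ so much.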
\n", "metadata": {} }, { "cell_type": "markdown", "source": "**You can also use the function 'cross_val_predict' to predict the output**. The function splits up the data into the specified number of folds, with one fold for testing and the other folds are used for training. First, import the function:\n", "metadata": {} }, { "cell_type": "code", "source": "from sklearn.model_selection import cross_val_predict", "metadata": { "trusted": true }, "execution_count": 31, "outputs": [] }, { "cell_type": "markdown", "source": "We input the object, the feature \"horsepower\", and the target data y_data. The parameter 'cv' determines the number of folds. In this case, it is 4. We can produce an output:\n", "metadata": {} }, { "cell_type": "code", "source": "yhat = cross_val_predict(lre,x_data[['horsepower']], y_data,cv=4)\nyhat[0:5]", "metadata": { "trusted": true }, "execution_count": 32, "outputs": [ { "execution_count": 32, "output_type": "execute_result", "data": { "text/plain": "array([14141.63807508, 14141.63807508, 20814.29423473, 12745.03562306,\n 14762.35027598])" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "

## Part 2: Overfitting, Underfitting and Model Selection

\n\n

Performance on the test data, sometimes referred to as \"out-of-sample data\", is a much better measure of how well your model performs in the real world than performance on the training data. One reason for this is overfitting.\n\nLet's go over some examples. These differences are more apparent in Multiple Linear Regression and Polynomial Regression, so we will explore overfitting in that context.

\n", "metadata": {} }, { "cell_type": "markdown", "source": "Let's create Multiple Linear Regression objects and train the model using 'horsepower', 'curb-weight', 'engine-size' and 'highway-mpg' as features.\n", "metadata": {} }, { "cell_type": "code", "source": "lr = LinearRegression()\nlr.fit(x_train[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg']], y_train)", "metadata": { "trusted": true }, "execution_count": 33, "outputs": [ { "execution_count": 33, "output_type": "execute_result", "data": { "text/plain": "LinearRegression()" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "Prediction using training data:\n", "metadata": {} }, { "cell_type": "code", "source": "yhat_train = lr.predict(x_train[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg']])\nyhat_train[0:5]", "metadata": { "trusted": true }, "execution_count": 34, "outputs": [ { "execution_count": 34, "output_type": "execute_result", "data": { "text/plain": "array([ 7426.6731551 , 28323.75090803, 14213.38819709, 4052.34146983,\n 34500.19124244])" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "Prediction using test data:\n", "metadata": {} }, { "cell_type": "code", "source": "yhat_test = lr.predict(x_test[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg']])\nyhat_test[0:5]", "metadata": { "trusted": true }, "execution_count": 35, "outputs": [ { "execution_count": 35, "output_type": "execute_result", "data": { "text/plain": "array([11349.35089149, 5884.11059106, 11208.6928275 , 6641.07786278,\n 15565.79920282])" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "Let's perform some model evaluation using our training and testing data separately. 
First, we import the seaborn and matplotlib library for plotting.\n", "metadata": {} }, { "cell_type": "code", "source": "import matplotlib.pyplot as plt\n%matplotlib inline\nimport seaborn as sns", "metadata": { "trusted": true }, "execution_count": 36, "outputs": [] }, { "cell_type": "markdown", "source": "Let's examine the distribution of the predicted values of the training data.\n", "metadata": {} }, { "cell_type": "code", "source": "Title = 'Distribution Plot of Predicted Value Using Training Data vs Training Data Distribution'\nDistributionPlot(y_train, yhat_train, \"Actual Values (Train)\", \"Predicted Values (Train)\", Title)", "metadata": { "trusted": true }, "execution_count": 37, "outputs": [ { "name": "stderr", "text": "/lib/python3.9/site-packages/seaborn/distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `kdeplot` (an axes-level function for kernel density plots).\n warnings.warn(msg, FutureWarning)\n/lib/python3.9/site-packages/seaborn/distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `kdeplot` (an axes-level function for kernel density plots).\n warnings.warn(msg, FutureWarning)\n", "output_type": "stream" }, { "output_type": "display_data", "data": { "text/plain": "", "image/png": "" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "Figure 1: Plot of predicted values using the training data compared to the actual values of the training data.\n", "metadata": {} }, { "cell_type": "markdown", "source": "So far, the model seems to be doing well in learning from the training dataset. But what happens when the model encounters new data from the testing dataset? 
When the model generates new values from the test data, we see the distribution of the predicted values is much different from the actual target values.\n", "metadata": {} }, { "cell_type": "code", "source": "Title='Distribution Plot of Predicted Value Using Test Data vs Data Distribution of Test Data'\nDistributionPlot(y_test,yhat_test,\"Actual Values (Test)\",\"Predicted Values (Test)\",Title)", "metadata": { "trusted": true }, "execution_count": 38, "outputs": [ { "name": "stderr", "text": "/lib/python3.9/site-packages/seaborn/distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `kdeplot` (an axes-level function for kernel density plots).\n warnings.warn(msg, FutureWarning)\n/lib/python3.9/site-packages/seaborn/distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `kdeplot` (an axes-level function for kernel density plots).\n warnings.warn(msg, FutureWarning)\n", "output_type": "stream" }, { "output_type": "display_data", "data": { "text/plain": "", "image/png": "" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "Figure 2: Plot of predicted value using the test data compared to the actual values of the test data.\n", "metadata": {} }, { "cell_type": "markdown", "source": "

Comparing Figure 1 and Figure 2, it is evident that the predicted distribution for the training data in Figure 1 fits the actual values much better than the predicted distribution for the test data in Figure 2. The difference in Figure 2 is most apparent in the price range of 5,000 to 15,000, where the shapes of the two distributions differ markedly. Let's see if polynomial regression also exhibits a drop in prediction accuracy when analysing the test dataset.

\n", "metadata": {} }, { "cell_type": "code", "source": "from sklearn.preprocessing import PolynomialFeatures", "metadata": { "trusted": true }, "execution_count": 39, "outputs": [] }, { "cell_type": "markdown", "source": "

### Overfitting

\n

Overfitting occurs when the model fits the noise, but not the underlying process. Therefore, when testing your model using the test set, your model does not perform as well since it is modelling noise, not the underlying process that generated the relationship. Let's create a degree 5 polynomial model.
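
The effect described above can be reproduced on a small synthetic problem. The sketch below is an illustration only: it uses made-up data (a straight line plus noise) and NumPy's `polyfit` instead of the car dataset and scikit-learn, and the helper names `make_data` and `mse` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Synthetic data: a straight line plus noise (not the car dataset).
    x = rng.uniform(-1.0, 1.0, size=n)
    y = 3.0 * x + rng.normal(scale=0.5, size=n)
    return x, y

def mse(coeffs, x, y):
    # Mean squared error of the fitted polynomial on (x, y).
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

x_tr, y_tr = make_data(12)    # small training set
x_te, y_te = make_data(200)   # larger held-out set

results = {}
for degree in (1, 8):
    coeffs = np.polyfit(x_tr, y_tr, degree)
    # Store (training error, test error) for this degree.
    results[degree] = (mse(coeffs, x_tr, y_tr), mse(coeffs, x_te, y_te))
    print(degree, results[degree])
```

The training error of the degree 8 fit cannot be larger than that of the degree 1 fit, because the straight line is a special case of the higher-order polynomial. The test error typically moves the other way, since the extra coefficients are spent on fitting the noise in the twelve training points.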

\n", "metadata": {} }, { "cell_type": "markdown", "source": "Let's use 55 percent of the data for training and the rest for testing:\n", "metadata": {} }, { "cell_type": "code", "source": "x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.45, random_state=0)", "metadata": { "trusted": true }, "execution_count": 40, "outputs": [] }, { "cell_type": "markdown", "source": "We will perform a degree 5 polynomial transformation on the feature 'horsepower'.\n", "metadata": {} }, { "cell_type": "code", "source": "pr = PolynomialFeatures(degree=5)\nx_train_pr = pr.fit_transform(x_train[['horsepower']])\nx_test_pr = pr.fit_transform(x_test[['horsepower']])\npr", "metadata": { "trusted": true }, "execution_count": 41, "outputs": [ { "execution_count": 41, "output_type": "execute_result", "data": { "text/plain": "PolynomialFeatures(degree=5)" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "Now, let's create a Linear Regression model \"poly\" and train it.\n", "metadata": {} }, { "cell_type": "code", "source": "poly = LinearRegression()\npoly.fit(x_train_pr, y_train)", "metadata": { "trusted": true }, "execution_count": 42, "outputs": [ { "execution_count": 42, "output_type": "execute_result", "data": { "text/plain": "LinearRegression()" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "We can see the output of our model using the method \"predict\" and assign the values to \"yhat\".\n", "metadata": {} }, { "cell_type": "code", "source": "yhat = poly.predict(x_test_pr)\nyhat[0:5]", "metadata": { "trusted": true }, "execution_count": 43, "outputs": [ { "execution_count": 43, "output_type": "execute_result", "data": { "text/plain": "array([ 6728.58641321, 7307.91998787, 12213.73753589, 18893.37919224,\n 19996.10612156])" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "Let's take the first four predicted values and compare them to the actual targets.\n", "metadata": {} }, { "cell_type": "code", "source": 
"print(\"Predicted values:\", yhat[0:4])\nprint(\"True values:\", y_test[0:4].values)", "metadata": { "trusted": true }, "execution_count": 44, "outputs": [ { "name": "stdout", "text": "Predicted values: [ 6728.58641321 7307.91998787 12213.73753589 18893.37919224]\nTrue values: [ 6295. 10698. 13860. 13499.]\n", "output_type": "stream" } ] }, { "cell_type": "markdown", "source": "We will use the function \"PollyPlot\" that we defined at the beginning of the lab to display the training data, testing data, and the predicted function.\n", "metadata": {} }, { "cell_type": "code", "source": "PollyPlot(x_train[['horsepower']], x_test[['horsepower']], y_train, y_test, poly,pr)", "metadata": { "trusted": true }, "execution_count": 45, "outputs": [ { "output_type": "display_data", "data": { "text/plain": "
", "image/png": "\n" }, "metadata": { "needs_background": "light" } } ] }, { "cell_type": "markdown", "source": "Figure 3: A polynomial regression model where red dots represent training data, green dots represent test data, and the blue line represents the model prediction.\n", "metadata": {} }, { "cell_type": "markdown", "source": "We see that the estimated function appears to track the data but around 200 horsepower, the function begins to diverge from the data points.\n", "metadata": {} }, { "cell_type": "markdown", "source": "R^2 of the training data:\n", "metadata": {} }, { "cell_type": "code", "source": "poly.score(x_train_pr, y_train)", "metadata": { "trusted": true }, "execution_count": 46, "outputs": [ { "execution_count": 46, "output_type": "execute_result", "data": { "text/plain": "0.5567716897754004" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "R^2 of the test data:\n", "metadata": {} }, { "cell_type": "code", "source": "poly.score(x_test_pr, y_test)", "metadata": { "trusted": true }, "execution_count": 47, "outputs": [ { "execution_count": 47, "output_type": "execute_result", "data": { "text/plain": "-29.87099623387278" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "We see the R^2 for the training data is 0.5567 while the R^2 on the test data was -29.87. The lower the R^2, the worse the model. 
A negative R^2 on the test data means the model predicts worse than simply using the mean of the target values; here it is a sign of overfitting.\n", "metadata": {} }, { "cell_type": "markdown", "source": "Let's see how the R^2 changes on the test data for different order polynomials and then plot the results:\n", "metadata": {} }, { "cell_type": "code", "source": "Rsqu_test = []\n\norder = [1, 2, 3, 4]\nfor n in order:\n    pr = PolynomialFeatures(degree=n)\n\n    x_train_pr = pr.fit_transform(x_train[['horsepower']])\n\n    x_test_pr = pr.fit_transform(x_test[['horsepower']])\n\n    lr.fit(x_train_pr, y_train)\n\n    Rsqu_test.append(lr.score(x_test_pr, y_test))\n\nplt.plot(order, Rsqu_test)\nplt.xlabel('order')\nplt.ylabel('R^2')\nplt.title('R^2 Using Test Data')\nplt.text(3, 0.75, 'Maximum R^2 ')", "metadata": { "trusted": true }, "execution_count": 48, "outputs": [ { "execution_count": 48, "output_type": "execute_result", "data": { "text/plain": "Text(3, 0.75, 'Maximum R^2 ')" }, "metadata": {} }, { "output_type": "display_data", "data": { "text/plain": "
", "image/png": "\n" }, "metadata": { "needs_background": "light" } } ] }, { "cell_type": "markdown", "source": "We see the R^2 gradually increases until an order three polynomial is used. Then, the R^2 dramatically decreases at an order four polynomial.\n", "metadata": {} }, { "cell_type": "markdown", "source": "The following function will be used in the next section. Please run the cell below.\n", "metadata": {} }, { "cell_type": "code", "source": "def f(order, test_data):\n x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=test_data, random_state=0)\n pr = PolynomialFeatures(degree=order)\n x_train_pr = pr.fit_transform(x_train[['horsepower']])\n x_test_pr = pr.fit_transform(x_test[['horsepower']])\n poly = LinearRegression()\n poly.fit(x_train_pr,y_train)\n PollyPlot(x_train[['horsepower']], x_test[['horsepower']], y_train,y_test, poly, pr)", "metadata": { "trusted": true }, "execution_count": 49, "outputs": [] }, { "cell_type": "markdown", "source": "The following interface allows you to experiment with different polynomial orders and different amounts of data.\n", "metadata": {} }, { "cell_type": "code", "source": "interact(f, order=(0, 6, 1), test_data=(0.05, 0.95, 0.05))", "metadata": { "trusted": true }, "execution_count": 50, "outputs": [ { "execution_count": 50, "output_type": "execute_result", "data": { "text/plain": "" }, "metadata": {} }, { "output_type": "display_data", "data": { "text/plain": "
", "image/png": "\n" }, "metadata": { "needs_background": "light" } } ] }, { "cell_type": "markdown", "source": "
\n

### Question #4a:

\n\nWe can perform polynomial transformations with more than one feature. Create a \"PolynomialFeatures\" object \"pr1\" of degree two.\n\n
\n", "metadata": {} }, { "cell_type": "code", "source": "# Write your code below and press Shift+Enter to execute \npr1 = PolynomialFeatures(degree=2)", "metadata": { "trusted": true }, "execution_count": 51, "outputs": [] }, { "cell_type": "markdown", "source": "
Click here for the solution\n\n```python\npr1=PolynomialFeatures(degree=2)\n\n```\n\n
\n", "metadata": {} }, { "cell_type": "markdown", "source": "
\n

### Question #4b:

\n\n \n Transform the training and testing samples for the features 'horsepower', 'curb-weight', 'engine-size' and 'highway-mpg'. Hint: use the method \"fit_transform\".\n
\n", "metadata": {} }, { "cell_type": "code", "source": "# Write your code below and press Shift+Enter to execute \nx_train_pr1 = pr1.fit_transform(x_train[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg']])\n# Reuse the transformer fitted on the training data for the test data\nx_test_pr1 = pr1.transform(x_test[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg']])", "metadata": { "trusted": true }, "execution_count": 52, "outputs": [] }, { "cell_type": "markdown", "source": "
Click here for the solution\n\n```python\nx_train_pr1=pr1.fit_transform(x_train[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg']])\n\n# reuse the transformer fitted on the training data\nx_test_pr1=pr1.transform(x_test[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg']])\n\n\n```\n\n
\n", "metadata": {} }, { "cell_type": "markdown", "source": "\n", "metadata": {} }, { "cell_type": "markdown", "source": "
\n

Question #4c):

\n \nHow many features (columns) does the transformed training data have? Hint: use the attribute \"shape\".\n\n
\n", "metadata": {} }, { "cell_type": "code", "source": "# Write your code below and press Shift+Enter to execute \nx_train_pr1.shape", "metadata": { "trusted": true }, "execution_count": 53, "outputs": [ { "execution_count": 53, "output_type": "execute_result", "data": { "text/plain": "(110, 15)" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "
Click here for the solution\n\n```python\nx_train_pr1.shape #there are now 15 features\n\n\n```\n\n
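As a rough check on where the 15 comes from: a degree-two expansion of four inputs contains one bias term, four linear terms and ten quadratic terms (four squares plus six pairwise products). The sketch below only counts these terms with the standard library; it does not call scikit-learn, and the enumeration order is an illustrative assumption rather than a statement about the column order PolynomialFeatures uses.

```python
from itertools import combinations_with_replacement
from math import comb

# The four columns selected in Question #4b.
features = ['horsepower', 'curb-weight', 'engine-size', 'highway-mpg']
degree = 2

# Every monomial of total degree 0, 1 or 2.
# The empty tuple stands for the constant (bias) term.
terms = [
    combo
    for d in range(degree + 1)
    for combo in combinations_with_replacement(features, d)
]

# Count by enumeration, then by the closed form C(n + d, d).
print(len(terms))
print(comb(len(features) + degree, degree))
```

Both counts should agree with the second entry of `x_train_pr1.shape`.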
\n", "metadata": {} }, { "cell_type": "markdown", "source": "
\n

Question #4d):

\n\n \nCreate a linear regression model \"poly1\" and train it on the polynomial features using the method \"fit\".\n
\n", "metadata": {} }, { "cell_type": "code", "source": "# Write your code below and press Shift+Enter to execute \n\npoly1 = LinearRegression()\npoly1.fit(x_train_pr1,y_train)", "metadata": { "trusted": true }, "execution_count": 54, "outputs": [ { "execution_count": 54, "output_type": "execute_result", "data": { "text/plain": "LinearRegression()" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "
Click here for the solution\n\n```python\npoly1=LinearRegression().fit(x_train_pr1,y_train)\n\n\n```\n\n
\n", "metadata": {} }, { "cell_type": "markdown", "source": "
\n

Question #4e):

\nUse the method \"predict\" to predict an output on the polynomial features, then use the function \"DistributionPlot\" to display the distribution of the predicted test output vs. the actual test data.\n
\n", "metadata": {} }, { "cell_type": "code", "source": "# Write your code below and press Shift+Enter to execute \nyhat_test1 = poly1.predict(x_test_pr1)\n# Define the title here instead of relying on a value left over from an earlier cell\nTitle = 'Distribution Plot of Predicted Value Using Test Data vs Data Distribution of Test Data'\nDistributionPlot(y_test, yhat_test1, \"Actual Values (Test)\", \"Predicted Values (Test)\", Title)", "metadata": { "trusted": true }, "execution_count": 56, "outputs": [ { "name": "stderr", "text": "/lib/python3.9/site-packages/seaborn/distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `kdeplot` (an axes-level function for kernel density plots).\n warnings.warn(msg, FutureWarning)\n/lib/python3.9/site-packages/seaborn/distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `kdeplot` (an axes-level function for kernel density plots).\n warnings.warn(msg, FutureWarning)\n", "output_type": "stream" }, { "output_type": "display_data", "data": { "text/plain": "", "image/png": "" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "
Click here for the solution\n\n```python\nyhat_test1=poly1.predict(x_test_pr1)\n\nTitle='Distribution Plot of Predicted Value Using Test Data vs Data Distribution of Test Data'\n\nDistributionPlot(y_test, yhat_test1, \"Actual Values (Test)\", \"Predicted Values (Test)\", Title)\n\n```\n\n
\n", "metadata": {} }, { "cell_type": "markdown", "source": "
\n

Question #4f):

\n\nUsing the distribution plot above, describe (in words) the two price ranges where the predicted prices deviate most from the actual prices.\n\n
\n", "metadata": {} }, { "cell_type": "markdown", "source": "The predicted prices are too high around $10,000 and too low in the 30,000 to 40,000 range.", "metadata": {} }, { "cell_type": "markdown", "source": "
Click here for the solution\n\n```python\n# The predicted price is higher than the actual price for cars in the $10,000 range; conversely, the predicted price is lower than the actual price in the $30,000 to $40,000 range. As such, the model is less accurate in these ranges.\n\n```\n\n
\n", "metadata": {} }, { "cell_type": "markdown", "source": "

Part 3: Ridge Regression

\n", "metadata": {} }, { "cell_type": "markdown", "source": "In this section, we will review Ridge Regression and see how the parameter alpha changes the model. Note that here our test data will be used as validation data.\n", "metadata": {} }, { "cell_type": "markdown", "source": "Let's perform a degree two polynomial transformation on our data.\n", "metadata": {} }, { "cell_type": "code", "source": "pr=PolynomialFeatures(degree=2)\nx_train_pr=pr.fit_transform(x_train[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg','normalized-losses','symboling']])\nx_test_pr=pr.transform(x_test[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg','normalized-losses','symboling']])", "metadata": { "trusted": true }, "execution_count": 57, "outputs": [] }, { "cell_type": "markdown", "source": "Let's import Ridge from the module linear_model.\n", "metadata": {} }, { "cell_type": "code", "source": "from sklearn.linear_model import Ridge", "metadata": { "trusted": true }, "execution_count": 58, "outputs": [] }, { "cell_type": "markdown", "source": "Let's create a Ridge regression object, setting the regularization parameter (alpha) to 1:\n", "metadata": {} }, { "cell_type": "code", "source": "RigeModel=Ridge(alpha=1)", "metadata": { "trusted": true }, "execution_count": 59, "outputs": [] }, { "cell_type": "markdown", "source": "Like regular regression, you can fit the model using the method fit.\n", "metadata": {} }, { "cell_type": "code", "source": "RigeModel.fit(x_train_pr, y_train)", "metadata": { "trusted": true }, "execution_count": 60, "outputs": [ { "execution_count": 60, "output_type": "execute_result", "data": { "text/plain": "Ridge(alpha=1)" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "Similarly, you can obtain a prediction:\n", "metadata": {} }, { "cell_type": "code", "source": "yhat = RigeModel.predict(x_test_pr)", "metadata": { "trusted": true }, "execution_count": 61, "outputs": [] }, { "cell_type": "markdown", "source": 
"Let's compare the first five predicted samples to our test set:\n", "metadata": {} }, { "cell_type": "code", "source": "print('predicted:', yhat[0:4])\nprint('test set :', y_test[0:4].values)", "metadata": { "trusted": true }, "execution_count": 62, "outputs": [ { "name": "stdout", "text": "predicted: [ 6570.82441941 9636.24891471 20949.92322738 19403.60313255]\ntest set : [ 6295. 10698. 13860. 13499.]\n", "output_type": "stream" } ] }, { "cell_type": "markdown", "source": "We select the value of alpha that minimizes the test error. To do so, we can use a for loop. We have also created a progress bar to see how many iterations we have completed so far.\n", "metadata": {} }, { "cell_type": "code", "source": "from tqdm import tqdm\n\nRsqu_test = []\nRsqu_train = []\ndummy1 = []\nAlpha = 10 * np.array(range(0,1000))\npbar = tqdm(Alpha)\n\nfor alpha in pbar:\n RigeModel = Ridge(alpha=alpha) \n RigeModel.fit(x_train_pr, y_train)\n test_score, train_score = RigeModel.score(x_test_pr, y_test), RigeModel.score(x_train_pr, y_train)\n \n pbar.set_postfix({\"Test Score\": test_score, \"Train Score\": train_score})\n\n Rsqu_test.append(test_score)\n Rsqu_train.append(train_score)", "metadata": { "trusted": true }, "execution_count": 63, "outputs": [ { "name": "stderr", "text": ":7: TqdmMonitorWarning: tqdm:disabling monitor support (monitor_interval = 0) due to:\ncan't start new thread\n pbar = tqdm(Alpha)\n100%|##########| 1000/1000 [00:03<00:00, 301.20it/s, Test Score=0.564, Train Score=0.859]\n", "output_type": "stream" } ] }, { "cell_type": "markdown", "source": "We can plot out the value of R^2 for different alphas:\n", "metadata": {} }, { "cell_type": "code", "source": "width = 12\nheight = 10\nplt.figure(figsize=(width, height))\n\nplt.plot(Alpha,Rsqu_test, label='validation data ')\nplt.plot(Alpha,Rsqu_train, 'r', label='training Data ')\nplt.xlabel('alpha')\nplt.ylabel('R^2')\nplt.legend()", "metadata": { "trusted": true }, "execution_count": 64, "outputs": [ { 
"execution_count": 64, "output_type": "execute_result", "data": { "text/plain": "" }, "metadata": {} }, { "output_type": "display_data", "data": { "text/plain": "
", "image/png": "iVBORw0KGgoAAAANSUhEUgAAAtcAAAJNCAYAAAD6c1l4AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAA4BElEQVR4nO3de7yddWHn++8v+5KdvXMlBJBrokUJBAXZg3pQC14oeqqoRy1Vp4MzSqVaO9OOR+p0aquv15x2ah111FrqOG0dLadFEdpjvbVYqqMtAUEIiKIgBCwE5BLIdWf/zh9rrZ2VsHNjPSs72Xm/X6/1Wuu5rf1bPgKfPPntZ5VaawAAgN7NmekBAADAbCGuAQCgIeIaAAAaIq4BAKAh4hoAABoirgEAoCGDMz2AJh1++OF1+fLlMz0MAABmseuuu+6BWuuy6bbNqrhevnx5Vq9ePdPDAABgFiul/HhX20wLAQCAhohrAABoiLgGAICGzKo51wAAM2nr1q1Zu3ZtNm3aNNNDoQEjIyM59thjMzQ0tNfHiGsAgIasXbs2CxYsyPLly1NKmenh0INaax588MGsXbs2K1as2OvjTAsBAGjIpk2bsnTpUmE9C5RSsnTp0n3+WwhxDQDQIGE9ezyZcymuAQAOYfPnz0+S3HvvvXnta1877T5nn332Hr9L5EMf+lA2bNgwtfzyl788Dz/8cGPj7OiMd1cefvjhfPzjH2/85+4tcQ0AQI4++uhcfvnlT/r4neP6i1/8YhYvXtzAyPaNuAYAoBGXXHJJPvaxj00t/87v/E4+8IEP5LHHHsuLX/ziPPvZz86pp56aK6+88gnH3nnnnVm1alWSZOPGjbnggguycuXKvPrVr87GjRun9rv44oszPj6eU045Je9973uTJB/5yEdy77335pxzzsk555yTpPXN2Q888ECS5IMf/GBWrVqVVatW5UMf+tDUz1u5cmXe+ta35pRTTsm55567w8/puOOOO/K85z0vp556an7rt35rav2uPtMll1ySH/7whznttNPyrne9a68+e6NqrbPmccYZZ1QAgJlyyy23zOjPv/766+sLX/jCqeWVK1fWu+66q27durU+8sgjtdZa161bV5/2tKfVycnJWmutY2NjtdZa77jjjnrKKafUWmv9wz/8w/rmN7+51lrrjTfeWAcGBuq1115ba631wQcfrLXWOjExUX/2Z3+23njjjbXWWk844YS6bt26qZ/dWV69enVdtWpVfeyxx+r69evrySefXK+//vp6xx131IGBgfqd73yn1lrr6173uvrpT3/6CZ/pFa94Rf2zP/uzWmutH/3oR6fGu6vP1P05drff3prunCZZXXfRo27FBwDQB7/712tyy72PNvqeJx+9MO99xSm73H766afn/vvvz7333pt169ZlyZIlOe6447J169a85z3vyTXXXJM5c+bknnvuyX333Zejjjpq2ve55ppr8s53vjNJ8sxnPjPPfOYzp7b95V/+ZS699NJMTEzkJz/5SW655ZYdtu/sG9/4Rl796ldnbGwsSfKa17wm//iP/5hXvvKVWbFiRU477bQkyRlnnJE777zzCcd/85vfzOc+97kkyb/+1/867373u5O0LhBP95l2tqv9dvXZeyWuAQBmkde97nW5/PLL8y//8i/5hV/4hSTJZz7zmaxbty7XXXddhoaGsnz58if1RTd33HFHPvCBD+Taa6/NkiVLcuGFF/b0hTlz586dej0wMDDttJBk+rt27O1nauqz7y1xDQDQB7u7wtxPv/ALv5C3vvWteeCBB/IP//APSZJHHnkkRxxxRIaGhnL11Vfnxz/+8W7f44UvfGE++9nP5kUvelFuvvnmfPe7302SPProoxkbG8uiRYty33335W//9m9z9tlnJ0kWLFiQ9evX5/DDD9/hvV7wghfkwgsvzCWXXJJaa6644op8+tOf3uvPc9ZZZ+Wyyy7Lm970pnzmM5+ZWr+
rz9QZx5726xdxDQAwi5xyyilZv359jjnmmDzlKU9JkrzxjW/MK17xipx66qkZHx/PSSedtNv3uPjii/PmN785K1euzMqVK3PGGWckSZ71rGfl9NNPz0knnZTjjjsuZ5111tQxF110Uc4777wcffTRufrqq6fWP/vZz86FF16YM888M0nylre8Jaeffvq0U0Cm8+EPfzhveMMb8vu///s5//zzp9bv6jMtXbo0Z511VlatWpWXvexlefe7371Pn71XpTUne3YYHx+ve7oHIwBAv9x6661ZuXLlTA+DBk13Tksp19Vax6fb3634AACgIeIaAAAaIq4BAKAh4hoAABoirgEAoCHiGgAAGiKue7VxY3LuucmXvjTTIwEADnEPP/xwPv7xjz+pY1/+8pfn4Ycf3u0+v/3bv52vfe1rT+r9d7Z8+fKceuqpOfXUU3PyySfnt37rt/b4zYm9fL79RVz36q67kq9+NXnVq5K/+7uZHg0AcAjbXXxOTEzs9tgvfvGLWbx48W73ed/73peXvOQlT3Z4T3D11Vfnpptuyj//8z/nRz/6UX75l395t/uL60NB509Yc+Ykr3xl8s1vzux4AIBD1iWXXJIf/vCHOe200/Kud70rX//61/OCF7wgr3zlK3PyyScnSV71qlfljDPOyCmnnJJLL7106tjly5fngQceyJ133pmVK1fmrW99a0455ZSce+652bhxY5LkwgsvzOWXXz61/3vf+948+9nPzqmnnprvfe97SZJ169blpS99aU455ZS85S1vyQknnJAHHnhgt+OeP39+PvGJT+QLX/hCfvrTn+axxx7Li1/84qn3vvLKK6f9fLvabyaJ61514vpjH0uOPTZ5+csT3xIJAMyA3/u938vTnva03HDDDfmDP/iDJMn111+fD3/4w/n+97+fJPnUpz6V6667LqtXr85HPvKRPPjgg094nx/84Ad5+9vfnjVr1mTx4sX53Oc+N+3PO/zww3P99dfn4osvzgc+8IEkye/+7u/mRS96UdasWZPXvva1ueuuu/Zq7AsXLsyKFSvygx/8ICMjI7niiity/fXX5+qrr85v/MZvpNb6hM+3q/1m0uCM/vTZoBPXy5e3poW84AXJS16S/MmfJK973YwODQCYQf/+3yc33NDse552WvKhD+3TIWeeeWZWrFgxtfyRj3wkV1xxRZLk7rvvzg9+8IMsXbp0h2NWrFiR0047LUlyxhln5M4775z2vV/zmtdM7fP5z38+SfKNb3xj6v3PO++8LFmyZK/H2gnjWmve85735JprrsmcOXNyzz335L777pt2/+n2O+qoo/b6ZzbNletetf+aJCMjrSvXX/968oxnJK9/ffKmNyV33z2jwwMADm1jY2NTr7/+9a/na1/7Wr71rW/lxhtvzOmnnz7tLxHOnTt36vXAwMAu52t39tvdPntr/fr1ufPOO/P0pz89n/nMZ7Ju3bpcd911ueGGG3LkkUdOO8693W9/cuW6V50TOG9e6/mEE5JvfCN5//uT3//95PLLk7e8JXnb25JVq2ZunADA/rWPV5ibsGDBgqxfv36X2x955JEsWbIko6Oj+d73vpdvf/vbjY/hrLPOyl/+5V/m3e9+d77yla/koYce2uMxjz32WH7lV34lr3rVq7JkyZI88sgjOeKIIzI0NJSrr746P/7xj5M88fPtar+Z5Mp1rzpxPTKyfd3QUPK+9yXf/37yhje0poicempy1lnJBz+Y/OAHMzNWAGBWW7p0ac4666ysWrUq73rXu56w/bzzzsvExERWrlyZSy65JM997nMbH8N73/vefOUrX8mqVavyV3/1VznqqKOyYMGCafc955xzsmrVqpx55pk5/vjj88d//MdJkje+8Y1ZvXp1Tj311Pz5n/95TjrppGk/3672m0llpid9N2l8fLyu3t+/TPinf5q8+c3JHXe05l1P54EHkj/7s9bjppta6044oRXbz39+csYZycknJ/Pn769RAwB9cOu
tt2blypUzPYwZtXnz5gwMDGRwcDDf+ta3cvHFF+eGpuee70fTndNSynW11vHp9jctpFfdc6535fDDk9/4jdbjjjuSv/mb5Jprkr//++Szn92+3wkntKaOPOMZrdfdj8WLk1L6+lEAAHp111135fWvf30mJyczPDycP/mTP5npIe1X4rpXO8+53pMVK5Jf/dXWo9bkzjuTG29Mbr45WbOm9fy1ryWbN+943Pz5yVFHJcuWJUcc0XruvD788GTRotZj4cIdXw86xQDA/nPiiSfmO9/5zkwPY8Yor15NN+d6b5XSiu0VK1rf8NhRa7JuXfLjH7ced93Vetx3X2v9HXck//RPrdfbtu3+Z4yObg/t0dEdH/Pm7X7dvHnJ8HDrMXfujs+7ez087Co7AHBIEte96kwLGR5u7j1LaV2RPuKI5F/9q13vNzmZPPxwa073o48mjzzSenS/7l7euDHZsCFZv74V6p3lzmPnq+W9GBraHt1DQ60r6P1+DAy0HnPmHLjPnUcp0z/vblv3sz+8ABywaq0p/j09KzyZ300U173atKl11Xom/iGaMyc57LDWownbtrU+Tye2N25MtmzZ/ti8ed9fb96cTEzs/WPTpn3bv/uxp6v4s83ehHhT+/Tj/Txa53Gmx9DLmPf0+mDb90AbTz/3pW9GRkby4IMPZunSpRHYB7daax588MGM7OPsBHHdq05czwYDA8nYWOtxMKq19di2rXVV/0B9rnX3z03tM9Pvtzf7zpYHHIyaiPY9bT8Y9p3ueXfb9vB87MKFWXvhhVl3zDG7/sNME9Hd63s0Gf4z+Xk6U1n7ZGRkJMcee+w+HSOue7Vp097/MiP91fkX5hy3b2cGPNkon+k/GDzZMXePfU+v7dv8vgfDGA+Gfad73t22vXgeSrLisssafc+D4r1many//uvJO9+ZA4m47tXGjbPnyjXw5HVfDQPgkOUSX69m07QQAAB6Iq57Ja4BAGgT170y5xoAgDZzrnv1+tcfereAAwBgWuK6V29960yPAACAA4RpIQAA0JC+xnUp5bxSym2llNtLKZdMs31RKeWvSyk3llLWlFLe3LXtzlLKTaWUG0opq/s5TgAAaELfpoWUUgaSfCzJS5OsTXJtKeWqWustXbu9PckttdZXlFKWJbmtlPKZWuuW9vZzaq0P9GuMAADQpH5euT4zye211h+1Y/myJOfvtE9NsqCUUpLMT/LTJBN9HBMAAPRNP+P6mCR3dy2vba/r9tEkK5Pcm+SmJL9Wa51sb6tJvlJKua6UclEfxwkAAI2Y6V9o/LkkNyQ5OslpST5aSlnY3vb8Wuuzk7wsydtLKS+c7g1KKReVUlaXUlavW7duPwwZAACm18+4vifJcV3Lx7bXdXtzks/XltuT3JHkpCSptd7Tfr4/yRVpTTN5glrrpbXW8Vrr+LJlyxr+CAAAsPf6GdfXJjmxlLKilDKc5IIkV+20z11JXpwkpZQjkzwjyY9KKWOllAXt9WNJzk1ycx/HCgAAPevb3UJqrROllHck+XKSgSSfqrWuKaW8rb39E0nen+RPSyk3JSlJ3l1rfaCU8tQkV7R+zzGDST5ba/1Sv8YKAABNKLXWmR5DY8bHx+vq1W6JDQBA/5RSrqu1jk+3baZ/oREAAGYNcQ0AAA0R1wAA0BBxDQAADRHXAADQEHENAAANEdcAANAQcQ0AAA0R1wAA0BBxDQAADRHXAADQEHENAAANEdcAANAQcQ0AAA0R1wAA0BBxDQAADRHXAADQEHENAAANEdcAANAQcQ0AAA0R1wAA0BBxDQAADRHXAADQEHENAAANEdcAANAQcQ0AAA0R1wAA0BBxDQAADRHXAADQEHENAAANEdcAANAQcQ0AAA0R1wAA0BBxDQAADRHXAADQEHENAAANEdcAANAQcQ0AAA0R1wAA0BBxDQAADRHXAADQEHENAAANEdcAANAQcQ0AAA0R1wAA0BBxDQAADRHXAADQEHENAAANEdcAANA
QcQ0AAA0R1wAA0BBxDQAADRHXAADQkL7GdSnlvFLKbaWU20spl0yzfVEp5a9LKTeWUtaUUt68t8cCAMCBpm9xXUoZSPKxJC9LcnKSXyylnLzTbm9Pckut9VlJzk7yh6WU4b08FgAADij9vHJ9ZpLba60/qrVuSXJZkvN32qcmWVBKKUnmJ/lpkom9PBYAAA4o/YzrY5Lc3bW8tr2u20eTrExyb5KbkvxarXVyL48FAIADykz/QuPPJbkhydFJTkvy0VLKwn15g1LKRaWU1aWU1evWrWt+hAAAsJf6Gdf3JDmua/nY9rpub07y+dpye5I7kpy0l8cmSWqtl9Zax2ut48uWLWts8AAAsK/6GdfXJjmxlLKilDKc5IIkV+20z11JXpwkpZQjkzwjyY/28lgAADigDPbrjWutE6WUdyT5cpKBJJ+qta4ppbytvf0TSd6f5E9LKTclKUneXWt9IEmmO7ZfYwUAgCaUWutMj6Ex4+PjdfXq1TM9DAAAZrFSynW11vHpts30LzQCAMCsIa4BAKAh4hoAABoirgEAoCHiGgAAGiKuAQCgIeIaAAAaIq4BAKAh4hoAABoirgEAoCHiGgAAGiKuAQCgIeIaAAAaIq4BAKAh4hoAABoirgEAoCHiGgAAGiKuAQCgIeIaAAAaIq4BAKAh4hoAABoirgEAoCHiGgAAGiKuAQCgIeIaAAAaIq4BAKAh4hoAABoirgEAoCHiGgAAGiKuAQCgIeIaAAAaIq4BAKAh4hoAABoirgEAoCHiGgAAGiKuAQCgIeIaAAAaIq4BAKAh4hoAABoirgEAoCHiGgAAGiKuAQCgIeIaAAAaIq4BAKAh4hoAABoirgEAoCHiGgAAGiKuAQCgIeIaAAAaIq4BAKAh4hoAABoirgEAoCHiGgAAGiKuAQCgIX2N61LKeaWU20opt5dSLplm+7tKKTe0HzeXUraVUg5rb7uzlHJTe9vqfo4TAACaMNivNy6lDCT5WJKXJlmb5NpSylW11ls6+9Ra/yDJH7T3f0WS/1Br/WnX25xTa32gX2MEAIAm9fPK9ZlJbq+1/qjWuiXJZUnO383+v5jkL/o4HgAA6Kt+xvUxSe7uWl7bXvcEpZTRJOcl+VzX6prkK6WU60opF/VtlAAA0JC+TQvZR69I8s2dpoQ8v9Z6TynliCRfLaV8r9Z6zc4HtsP7oiQ5/vjj989oAQBgGv28cn1PkuO6lo9tr5vOBdlpSkit9Z728/1JrkhrmskT1FovrbWO11rHly1b1vOgAQDgyepnXF+b5MRSyopSynBaAX3VzjuVUhYl+dkkV3atGyulLOi8TnJukpv7OFYAAOhZ36aF1FonSinvSPLlJANJPlVrXVNKeVt7+yfau746yVdqrY93HX5kkitKKZ0xfrbW+qV+jRUAAJpQaq0zPYbGjI+P19Wr3RIbAID+KaVcV2sdn26bb2gEAICGiGsAAGiIuAYAgIaIawAAaIi4BgCAhohrAABoiLgGAICGiGsAAGiIuAYAgIaIawAAaIi4BgCAhohrAABoiLgGAICGiGsAAGiIuAYAgIaIawAAaIi4BgCAhohrAABoiLgGAICGiGsAAGiIuAYAgIaIawAAaIi4BgCAhohrAABoiLgGAICGiGsAAGiIuAYAgIaIawAAaIi4BgCAhohrAABoiLgGAICGiGsAAGiIuAYAgIaIawAAaIi4BgCAhohrAABoiLgGAICGiGsAAGiIuAYAgIaIawAAaIi4BgCAhohrAABoiLgGAICGiGsAAGiIuAYAgIaIawAAaIi4BgCAhohrAABoiLgGAICGiGsAAGiIuAYAgIaIawAAaIi4BgCAhvQ1rksp55VSbiul3F5KuWSa7e8qpdzQftxcStlWSjlsb44FAIADTd/iupQykORjSV6W5OQkv1hKObl7n1rrH9RaT6u1npbkN5P8Q631p3tzLAAAHGj6eeX6zCS311p/VGvdkuSyJOfvZv9fTPIXT/JYAACYcf2
M62OS3N21vLa97glKKaNJzkvyuX09FgAADhQHyi80viLJN2utP93XA0spF5VSVpdSVq9bt64PQwMAgL3Tz7i+J8lxXcvHttdN54JsnxKyT8fWWi+ttY7XWseXLVvWw3ABAKA3/Yzra5OcWEpZUUoZTiugr9p5p1LKoiQ/m+TKfT0WAAAOJIP9euNa60Qp5R1JvpxkIMmnaq1rSilva2//RHvXVyf5Sq318T0d26+xAgBAE0qtdabH0Jjx8fG6evXqmR4GAACzWCnlulrr+HTbDpRfaAQAgIOeuAYAgIaIawAAaIi4BgCAhohrAABoiLgGAICGiGsAAGiIuAYAgIaIawAAaIi4BgCAhohrAABoiLgGAICGiGsAAGiIuAYAgIaIawAAaIi4BgCAhohrAABoiLgGAICGiGsAAGiIuAYAgIaIawAAaMhu47qUMlBK+eVSyvtLKWfttO23+js0AAA4uOzpyvUfJ/nZJA8m+Ugp5YNd217Tt1EBAMBBaE9xfWat9Q211g8leU6S+aWUz5dS5iYpfR8dAAAcRPYU18OdF7XWiVrrRUluSPL3Seb3cVwAAHDQ2VNcry6lnNe9otb6viT/M8nyfg0KAAAORruN61rrm2qtX5pm/SdrrUP9GxYAABx89upWfKWUgX4PBAAADnZ7jOtSyoIkV+6HsQAAwEFtT/e5fkqSryW5dP8MBwAADl6De9j+j0neVWu9an8MBgAADmZ7mhbyUJJj9sdAAADgYLenuD47yctKKW/fD2MBAICD2p5uxfd4klcmOX3/DAcAAA5ee5pznVrrtiRv2Q9jAQCAg9pe3ed6Z6WUOaWUNzY9GAAAOJjt6VZ8C0spv1lK+Wgp5dzS8qtJfpTk9ftniAAAcHDY07SQT6d1x5BvpTU15D1JSpJX1Vpv6O/QAADg4LKnuH5qrfXUJCmlfDLJT5IcX2vd1PeRAQDAQWZPc663dl60f7FxrbAGAIDp7enK9bNKKY+2X5ck89rLJUmttS7s6+gAAOAgstu4rrUO7K+BAADAwe5J3YoPAAB4InENAAANEdcAANAQcQ0AAA0R1wAA0BBxDQAADRHXAADQEHENAAANEdcAANAQcQ0AAA0R1wAA0BBxDQAADelrXJdSziul3FZKub2Ucsku9jm7lHJDKWVNKeUfutbfWUq5qb1tdT/HCQAATRjs1xuXUgaSfCzJS5OsTXJtKeWqWustXfssTvLxJOfVWu8qpRyx09ucU2t9oF9jBACAJvXzyvWZSW6vtf6o1rolyWVJzt9pnzck+Xyt9a4kqbXe38fxAABAX/Uzro9JcnfX8tr2um5PT7KklPL1Usp1pZRf6tpWk3ylvf6iPo4TAAAa0bdpIfvw889I8uIk85J8q5Ty7Vrr95M8v9Z6T3uqyFdLKd+rtV6z8xu0w/uiJDn++OP349ABAGBH/bxyfU+S47qWj22v67Y2yZdrrY+351Zfk+RZSVJrvaf9fH+SK9KaZvIEtdZLa63jtdbxZcuWNfwRAABg7/Uzrq9NcmIpZUUpZTjJBUmu2mmfK5M8v5QyWEoZTfKcJLeWUsZKKQuSpJQyluTcJDf3cawAANCzvk0LqbVOlFLekeTLSQaSfKrWuqaU8rb29k/UWm8tpXwpyXeTTCb5ZK315lLKU5NcUUrpjPGztdYv9WusAADQhFJrnekxNGZ8fLyuXu2W2AAA9E8p5bpa6/h023xDIwAANERcAwBAQ8Q1AAA0RFwDAEBDxDUAADREXAMAQEPENQAANERcAwBAQ8Q1AAA0RFwDAEBDxDUAADREXAMAQEPENQAANERcAwBAQ8Q1AAA0RFwDAEBDxDUAADREXAMAQEPENQAANERcAwBAQ8Q1AAA0RFwDAEBDxDUAADREXAMAQEPENQAANERcAwBAQ8Q1AAA0RFwDAEBDxDUAADREXAMAQEPENQAANERcAwBAQ8Q1AAA0RFwDAEBDxDUAADREXAMAQEPENQAANERcAwBAQ8Q1AAA
0RFwDAEBDxDUAADREXAMAQEPENQAANERcAwBAQ8Q1AAA0RFwDAEBDxDUAADREXAMAQEPENQAANERcAwBAQ8Q1AAA0RFwDAEBDxDUAADSkr3FdSjmvlHJbKeX2Usolu9jn7FLKDaWUNaWUf9iXYwEA4EAy2K83LqUMJPlYkpcmWZvk2lLKVbXWW7r2WZzk40nOq7XeVUo5Ym+PBQCAA00/r1yfmeT2WuuPaq1bklyW5Pyd9nlDks/XWu9Kklrr/ftwLAAAHFD6GdfHJLm7a3lte123pydZUkr5einlulLKL+3DsQAAcEDp27SQffj5ZyR5cZJ5Sb5VSvn2vrxBKeWiJBclyfHHH9/4AAEAYG/188r1PUmO61o+tr2u29okX661Pl5rfSDJNUmetZfHJklqrZfWWsdrrePLli1rbPAAALCv+hnX1yY5sZSyopQynOSCJFfttM+VSZ5fShkspYwmeU6SW/fyWAAAOKD0bVpIrXWilPKOJF9OMpDkU7XWNaWUt7W3f6LWemsp5UtJvptkMskna603J8l0x/ZrrAAA0IRSa53pMTRmfHy8rl69eqaHAQDALFZKua7WOj7dNt/QCAAADRHXAADQEHENAAANEdcAANAQcQ0AAA0R1wAA0BBxDQAADRHXAADQEHENAAANEdcAANAQcQ0AAA0R1wAA0BBxDQAADRHXAADQEHENAAANEdcAANAQcQ0AAA0R1wAA0BBxDQAADRHXAADQEHENAAANEdcAANAQcQ0AAA0R1wAA0BBxDQAADRHXAADQEHENAAANEdcAANAQcQ0AAA0R1wAA0BBxDQAADRHXAADQEHENAAANEdcAANAQcQ0AAA0R1wAA0BBxDQAADRHXAADQEHENAAANGZzpAQAAwM5qrdk8MZmNW7Zl08S2bNo6mU1bt2Xj1m3ZtHVbNm+dzAlLR/PUZfNneqg7ENcAAOxRrTVbtk1m09bJbJ6K3Fbwbupa3jyxrb1uciqEu/ebWp7Y1g7n1vvtfMzmick9junXX/r0vPPFJ+6HT7/3xDUAwEFsYlsrSDdu3ZZNW7peT63bNrVu45bt6zdumWxfEZ4ugLteT2y/Ylzrkxvj8MCczB2ak5GhgcwbGshI+/XI4EAWjgxm3oK5reWufeZ2lgcHMm94++vWfq3lYxbPa/Z/zAaIawCAPtg2WbtCdseru08M3a4g7orhTVP7Te4QzN37b92278U7MKdkZHBO5g0PZO7gjlE7OjyYw8bmZG53CLejtrX/nKnA3SGUh+Zk7lQID2Ska7+BOaUP/wsfmMQ1AHBIqbVuD9yuyN116E5OH77tucCdfXbevmUvpjXsrJRk3lS0tkK1s7xw3lCOXDi3tdyO4u7tI12v5w1vj+XufTrBPDRQUsqhE7z7k7gGAA4onbm9nVDdsKUVrRu2bMuGLRPTr986kU1TrzvrJ1phvGVih303bt32pMY1MjRn2pAdmzuYpfO3X8V9Yuhuv8r7hOXhHa/+zh2cI3oPcuIaANhn2ybrVOhu6IrW7VG7ff2mra11OwduZ9/u5Y3tON42uW9THYYHWlMcRoe3B+vo8EAWzRvKUxaOZHS4Fbyj7fWd19OH7hODeO7gnMw5hKY28OSJawCYpWqt2yN387Y8vmUiG7ZM5PHN268CP75lWzZsbj3vfIW3dQV44glXiTdu2ZYt2/ZtysOckowOD24P4Hbktub3zm2/Hthh+7zhwe3rh7q3bV/fieTBAV/dwYFBXAPADOtMg+gE8MYt23aI3s7V3cc3t5+3TGTDbgK5s7xhH+/u0Ane7iu/84YHcsSCkSesG20H7hPWt4N4ZCqeW+uHB0x34NAgrgFgH2ybrFNxOxXCT4je7VH8+ObtcbxzIHcfO7EP0yCGB+dkrCtkR+cOZmx4IItHhzM6PJCxua1tY+1tnSvEY137zhseyNjwYEbntp7nDQ2Y9gANENcAzGqTnRjesi2PbZ7I45sn2s/bul63YvjxHba39nls80Q
e37J9eV9+GW5gTmnF7lQEtyL38PnDOX7uaEbbvwzXCuLtUx1Ghwe7Anmwfdz2mB4yBQIOWOIagANKrXXqCu9UBG/ZXRRP5LHNrWkQ20N4e0hv2LL3Mdy5sjt/bitux4YH85RFIxmbO5ixuYOZ3w7e+e3lsbnb7xaxYyC3nt35AQ494hqAnnXuG7x+89Y8tmki6ze1Qrfz/Nimra3lzRN5bNPEDleQu68od0J6b+cJd8K2O3qXzZ+b5Uu7A7i1fWxu6ypwJ4rnT21rPY+aFgE0QFwDHMImJ2s2bN3WDt6tXTG8Ywiv78Rx1/YdljdP7NWt04YH52R+V9DOnzuQw8aGc9yS0daV4rnTRHE7mke7I7kdyofSt74BBwdxDXAQ6swjXt8Vv7sM385y9xXkTe31e3mVeN7QQOaPDGbB3MHMH2nF7vFjozutG9q+PHcwC0YG28ut9WNzW98oBzCb9TWuSynnJflwkoEkn6y1/t5O289OcmWSO9qrPl9rfV97251J1ifZlmSi1jrez7EC7C+11myemMyjm7bm0Y3bw/jRzvPG1vP6TVvzaNdz9/r1m/cuijtXiTtBvGBkMEcuHJlaN20Yd+27YO5Qxua6hzDA3upbXJdSBpJ8LMlLk6xNcm0p5apa6y077fqPtdaf38XbnFNrfaBfYwR4MrZum9weue3ofXSnGN4hkjc/MaK3btt9Gc8pyYKRoSyc1wrcBSODOe6w0SwYGczCkaEsHBnMgpHW+gUjQzsEced5bHjQHGKA/ayfV67PTHJ7rfVHSVJKuSzJ+Ul2jmuA/WrbZG2F8MaJPLJx69Tj0U1bd1zeuOMV5c4xe3MrtrHhgSyctz1+D58/nBWHj7XiuGv9wnYsd69fODKU0eEBd5kAOAj1M66PSXJ31/LaJM+ZZr/nlVJuTHJvkv9Ya13TXl+TfKWUUpP8ca310j6OFTjIbJmYfEIQP9p+bF+ePp7Xb5rY7XsPzilZNG8oC+e143feUJ6yaCQL5ravJI9sj+CpSJ7XuaLcuorsF+0ADk0z/QuN1yc5odb6WCnl5Um+kOTE9rbn11rvKaUckeSrpZTv1Vqv2fkNSikXJbkoSY4//vj9NGygCZu2bsvDG554tXjnIJ4umPd09XhkaE4WzRuaejxl0UhOOmpBFraXF3Ztay0PTr2eN+SqMQBPTj/j+p4kx3UtH9teN6XW+mjX6y+WUj5eSjm81vpArfWe9vr7SylXpDXN5Alx3b6ifWmSjI+P7/13x/bB9+9bn+t+/FCeeeyinHL0opkcCuw3nfsbP7xxSx56fGse3rglj2zYmoc3bs3DG7bm4Q1bWs8b289drzdPTO72vRfMHeyK4cGsOHxsewyPDGXR6FDXFeYdQ9ldKQCYCf2M62uTnFhKWZFWVF+Q5A3dO5RSjkpyX621llLOTDInyYOllLEkc2qt69uvz03yvj6OtWdfv+3+/Ns/vTad27y+ZOWR+e2fPznHLx2d2YHBXqq1ZuPWbXmoHcTdgfzQhi15ZGN3KO/4estuInl4cE6WjA5l8bzhLBodyglLR3Pa6OIsHt0ex4vnDe9w5XjRvKHMnzvoDhUAHHT6Fte11olSyjuSfDmtW/F9qta6ppTytvb2TyR5bZKLSykTSTYmuaAd2kcmuaL917KDST5ba/1Sv8baq1pr3nvVmjx12fz80Rufna/ccl/+6Os/zM996Jr8xrlPz5vPWmH+JfvVlonJPLxhS366YUt++viWqUDe8Ury1jzSWd9+vWXbriN5ZGhOFs8bbkXxvKGsOHwsS0ZbwdxZv3jeUBaPtl+3188bdgUZgENHqXv7HbMHgfHx8bp69er9/nNvvueR/Px//0b+62ufmdePt2bC3PvwxvznL9ycv/ve/XnWsYvye//XM7PyKQv3+9g4+G3dNjkVxj99fEseerwVzQ89viUPbdi6w3LrufUlIbsyb2hgKpAXjw5lyWgnmHcdyItHhzIyJJIBIElKKdft6jtYZvo
XGmeFr992f0pJXnTSEVPrjl48L5/8N+P56+/+JL971Zq84r9/Ixef/bS840U/Yy7oIWxi22Qe3tgO4se35KENrUCeNprbV513d2eLseGBLBkbzmFjw1k82rrV25Kx4Rw2Oty1fqj1LJIBoO/EdQO+f99jOWbxvBw+f+4O60speeWzjs4LfubwvP//uyX//e9vzxdv+kn+y6tPzXOeunSGRktTtk3WPLxhD4HcfVX58S15dDehPDo8kCWjw1ky1rqafMLS0dby6HAOGxvaIZo7V5uFMgAcWMR1A+588PGsOHxsl9uXjA3ng68/Leefdkze8/mb8guXfjvP/5nD8/ZzfibPfephbvl1AJicrHlk49YdQvjhDTsudyK6E8yPbNy6y6+fnjs4J0vHtofwsUtGc9jo0NTV5E40LxkbmloWygBw8BPXPaq15o51j+c1zz5mj/v+7NOX5au//sL8r2//OJdec0d+8U++nZOOWpA3POf4nP+sY7JodGg/jHj2q7Xm0U0TT7hqPF0s/7R9hfnhDVum7vSys+HBOV3TLIZy8tELp6ZhPCGY21eX/RIfAByaxHWPHnx8S9Zvnsjy3Vy57jY6PJiLXvi0/NLzludz16/NX/zzXfntK9fk/X9zS5771KU59+Qj86KVR+aYxfP6PPKDQ601j2/ZNhXEu5ty0YrlVihP7KKUhwZKe5pFK4ZPOmrh1DSMqfVjw1nSnqd82NiwLxQBAPaauO7R4+27Miyat29XnUeGBvLG55yQNz7nhNx8zyP56+/em6+uuS//+co1+c9Xrskxi+flXy1fkjNOWJJnHLUwzzhqwT7/jAPJ5GTN+k0TU18e8sjGzq3gOvdP7l7Xed2688Wubg83MKdkSftuF0vGhvPUw+fnjBPa85N3iuXD2lMw5s8dFMoAQN+I6x51LpD20murjlmUVccsym++bGVuv399vvGDB3LtnQ/lG7c/mC/ccO/Ufk9ZNJLlS8dy9OJ5OWbxSI5ePC9HLhxp3zJtOIvb31TX1D21JydrNk9MZvPEtmyemMzGLdvy2OaJPLZ5Io+3n9dvaj0/1nnuev3opq1T4fzopl3PT05av8zX/QUiyw8fzaJ5i3LY2NxWQO90B4zDRoezYGQwc9w/HAA4gIjrHk22i3FOQ1dDf+aIBfmZIxbkwrNWpNaanzyyKbf9y/rcdt/63PYv63P3Tzfkf//wgdz36KZdzhGeNzSQ4cE5mTs4p+t5IINzSiZrTa3bx915PVlbIb1pazumt07u9gtFdjanJPPnDmbBSOvq8PyRwSxp3xpu8byhLBodbn8T3/b7Ky8eHZr6amu3JwQAZgNx3aM6deW6+SuopZQcvXhejl48L+d03UM7aX2xyH2Pbsr96zfv8LXUD23Ymo1bJrJlYjKbJyZbz9sms3nrZLZNTmZOKSmlpJSkJO3l1vPcoTmZOziQkd08z587OBXPnecFc4cyMjTHdAsA4JAnrntUp65c79+fOzQwJ8cuGc2xS0b37w8GAGCX5sz0AA52nakZTU0LAQDg4CWuezQ5Q1euAQA48IjrHnXi2nxjAADEdY+qaSEAALSJ6x6ZFgIAQIe47lFt4EtkAACYHcR1j8y5BgCgQ1z3yK34AADoENc9mqkvkQEA4MAjrnvkyjUAAB3iukfb51zP8EAAAJhx4rpH22/Fp64BAA514rpHvkQGAIAOcd0j97kGAKBDXPfINzQCANAhrnvkS2QAAOgQ1z0y5xoAgA5x3SPTQgAA6BDXPfIlMgAAdIjrHvkSGQAAOsR1j6ovkQEAoE1c98gvNAIA0CGuezTpS2QAAGgT1z1ytxAAADrEdY98iQwAAB3iukfmXAMA0CGue2RaCAAAHeK6R75EBgCADnHdo+pLZAAAaBPXPTLnGgCADnHdI19/DgBAh7jukTnXAAB0iOseuXINAECHuO5RnboVn7oGADjUiesemRYCAECHuO6RL5EBAKBDXPeocyu+4so1AMAhT1z3yJV
rAAA6xHWPXLkGAKBDXPfIlWsAADrEdY/cLQQAgA5x3SNfIgMAQIe47pEvkQEAoKOvcV1KOa+Uclsp5fZSyiXTbD+7lPJIKeWG9uO39/bYA4VpIQAAdAz2641LKQNJPpbkpUnWJrm2lHJVrfWWnXb9x1rrzz/JY2dcnYrrmR0HAAAzr59Xrs9Mcnut9Ue11i1JLkty/n44dr/aPudaXQMAHOr6GdfHJLm7a3lte93OnldKubGU8rellFP28dgZV2t11RoAgCR9nBayl65PckKt9bFSysuTfCHJifvyBqWUi5JclCTHH3984wPck8nqqjUAAC39vHJ9T5LjupaPba+bUmt9tNb6WPv1F5MMlVIO35tju97j0lrreK11fNmyZU2Of69MunINAEBbP+P62iQnllJWlFKGk1yQ5KruHUopR5X2Zd9Sypnt8Ty4N8ceKFy5BgCgo2/TQmqtE6WUdyT5cpKBJJ+qta4ppbytvf0TSV6b5OJSykSSjUkuqK0bR097bL/G2gtzrgEA6OjrnOv2VI8v7rTuE12vP5rko3t77IGoxj2uAQBo8Q2NPZqcrOIaAIAk4rpnrTnXMz0KAAAOBOK6R627hahrAADEdc9qra5cAwCQRFz3bLL6hUYAAFrEdY98iQwAAB3iuke+RAYAgA5x3TNXrgEAaBHXPZqcNOcaAIAWcd0jt+IDAKBDXPfIl8gAANAhrntUXbkGAKBNXPdo0pfIAADQJq575EtkAADoENc9cuUaAIAOcd2jGleuAQBoEdc9qr7+HACANnHdI18iAwBAh7juUWvOtbgGAEBc96x1t5CZHgUAAAcCcd2j6m4hAAC0ieseTfqGRgAA2sR1j2pizjUAAEnEdc/MuQYAoENc96iaFgIAQJu47tGkL5EBAKBtcKYHcLB7yqJ5mdg2OdPDAADgACCue/SB1z1rpocAAMABwrQQAABoiLgGAICGiGsAAGiIuAYAgIaIawAAaIi4BgCAhohrAABoiLgGAICGiGsAAGiIuAYAgIaIawAAaIi4BgCAhohrAABoiLgGAICGiGsAAGiIuAYAgIaIawAAaIi4BgCAhohrAABoiLgGAICGiGsAAGiIuAYAgIaIawAAaEiptc70GBpTSlmX5Mcz8KMPT/LADPxc9i/n+dDgPB8anOfZzzk+NMzUeT6h1rpsug2zKq5nSillda11fKbHQX85z4cG5/nQ4DzPfs7xoeFAPM+mhQAAQEPENQAANERcN+PSmR4A+4XzfGhwng8NzvPs5xwfGg6482zONQAANMSVawAAaIi47kEp5bxSym2llNtLKZfM9HjYN6WU40opV5dSbimlrCml/Fp7/WGllK+WUn7Qfl7Sdcxvts/3baWUn+taf0Yp5ab2to+UUspMfCamV0oZKKV8p5TyN+1l53gWKqUsLqVcXkr5Xinl1lLK85zr2aWU8h/a/76+uZTyF6WUEed4diilfKqUcn8p5eaudY2d21LK3FLK/9te/0+llOX9+izi+kkqpQwk+ViSlyU5OckvllJOntlRsY8mkvxGrfXkJM9N8vb2Obwkyd/VWk9M8nft5bS3XZDklCTnJfl4+/8HSfJHSd6a5MT247z9+UHYo19LcmvXsnM8O304yZdqrScleVZa59y5niVKKcckeWeS8VrrqiQDaZ1D53h2+NM88Tw0eW7/XZKHaq0/k+S/Jfn9fn0Qcf3knZnk9lrrj2qtW5JcluT8GR4T+6DW+pNa6/Xt1+vT+g/xMWmdxz9r7/ZnSV7Vfn1+kstqrZtrrXckuT3JmaWUpyRZWGv9dm39EsOfdx3DDCulHJvk/0zyya7VzvEsU0pZlOSFSf5HktRat9RaH45zPdsMJplXShlMMprk3jjHs0Kt9ZokP91pdZPntvu9Lk/y4n79jYW4fvKOSXJ31/La9joOQu2/Hjo9yT8lObLW+pP2pn9JcmT79a7O+THt1zuv58DwoST/d5LJrnXO8eyzIsm6JP+zPQXok6W
UsTjXs0at9Z4kH0hyV5KfJHmk1vqVOMezWZPnduqYWutEkkeSLO3HoMU1h7xSyvwkn0vy72utj3Zva//J1y11DlKllJ9Pcn+t9bpd7eMczxqDSZ6d5I9qracneTztv0LucK4Pbu35tuen9Qepo5OMlVLe1L2Pczx7HUznVlw/efckOa5r+dj2Og4ipZShtML6M7XWz7dX39f+q6W0n+9vr9/VOb+n/Xrn9cy8s5K8spRyZ1pTt15USvlfcY5no7VJ1tZa/6m9fHlase1czx4vSXJHrXVdrXVrks8n+T/iHM9mTZ7bqWPa04oWJXmwH4MW10/etUlOLKWsKKUMpzWx/qoZHhP7oD3X6n8kubXW+sGuTVcl+Tft1/8myZVd6y9o/8bxirR+UeKf239l9Wgp5bnt9/ylrmOYQbXW36y1HltrXZ7WP6N/X2t9U5zjWafW+i9J7i6lPKO96sVJbolzPZvcleS5pZTR9rl5cVq/K+Mcz15Nntvu93ptWv896M+V8Fqrx5N8JHl5ku8n+WGS/zTT4/HY5/P3/LT+ium7SW5oP16e1hysv0vygyRfS3JY1zH/qX2+b0vysq7140lubm/7aNpf0ORx4DySnJ3kb9qvneNZ+EhyWpLV7X+mv5BkiXM9ux5JfjfJ99rn59NJ5jrHs+OR5C/Smku/Na2/ifp3TZ7bJCNJ/iqtX3785yRP7ddn8Q2NAADQENNCAACgIeIaAAAaIq4BAKAh4hoAABoirgEAoCHiGmCWK6XcWUo5vNd9ANgzcQ0AAA0R1wCzSCnlC6WU60opa0opF+20bXkp5XullM+UUm4tpVxeShnt2uVXSynXl1JuKqWc1D7mzFLKt0op3yml/O+ub0AEYBriGmB2+be11jPS+payd5ZSlu60/RlJPl5rXZnk0SS/0rXtgVrrs5P8UZL/2F73vSQvqLWenuS3k/yXvo4e4CAnrgFml3eWUm5M8u0kxyU5caftd9dav9l+/b+SPL9r2+fbz9clWd5+vSjJX5VSbk7y35Kc0o9BA8wW4hpgliilnJ3kJUmeV2t9VpLvJBnZabe6m+XN7edtSQbbr9+f5Opa66okr5jm/QDoIq4BZo9FSR6qtW5oz5l+7jT7HF9KeV779RuSfGMv3vOe9usLGxklwCwmrgFmjy8lGSyl3Jrk99KaGrKz25K8vb3PkrTmV+/Of03y/5RSvpPtV7MB2IVS685/QwjAbFRKWZ7kb9pTPADoA1euAQCgIa5cAwBAQ1y5BgCAhohrAABoiLgGAICGiGsAAGiIuAYAgIaIawAAaMj/D8pPylu8BCC0AAAAAElFTkSuQmCC\n" }, "metadata": { "needs_background": "light" } } ] }, { "cell_type": "markdown", "source": "**Figure 4**: The blue line represents the R^2 of the validation data, and the red line represents the R^2 of the training data. The x-axis represents the different values of Alpha.\n", "metadata": {} }, { "cell_type": "markdown", "source": "Here the model is built and tested on the same data, so the training and test data are the same.\n\nThe red line in Figure 4 represents the R^2 of the training data. As alpha increases the R^2 decreases. 
Therefore, as alpha increases, the model performs worse on the training data.\n\nThe blue line represents the R^2 on the validation data. As alpha increases, the validation R^2 increases and then levels off.\n", "metadata": {} }, { "cell_type": "markdown", "source": "
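The training-data trend described above can be checked by hand in the simplest possible setting. The sketch below is not the lab's model: it assumes a single feature, no intercept and made-up toy data, for which ridge regression has the closed-form coefficient w = sum(x*y) / (sum(x*x) + alpha). The helper names `ridge_coef` and `r_squared` are hypothetical and exist only for this illustration.

```python
# Toy data (made up for illustration): y is roughly 2 * x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def ridge_coef(x, y, alpha):
    # Closed-form ridge solution for one feature and no intercept.
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + alpha)


def r_squared(x, y, w):
    # R^2 = 1 - SS_res / SS_tot for the predictions w * x.
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - w * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


alphas = [0, 1, 10, 100]
coefs = [ridge_coef(x, y, a) for a in alphas]
scores = [r_squared(x, y, w) for w in coefs]

# Print alpha, the shrunken coefficient and the training R^2.
for a, w, s in zip(alphas, coefs, scores):
    print(a, round(w, 3), round(s, 3))
```

A larger alpha pulls the coefficient away from its least-squares value, so in this setting the training R^2 can only fall. Whether the validation R^2 rises, as in Figure 4, depends on how much the unregularized model overfits.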
\n

Question #5):

\n\nPerform Ridge regression with the parameter alpha set to 10. Train the model on the polynomial features of the training data, then calculate the R^2 on the polynomial features of the test data.\n\n
\n", "metadata": {} }, { "cell_type": "code", "source": "# Write your code below and press Shift+Enter to execute \n\nRigeModel1 = Ridge(alpha=10) \nRigeModel1.fit(x_train_pr, y_train)\nyhat1 = RigeModel1.predict(x_test_pr)\nRigeModel1.score(x_test_pr, y_test)", "metadata": { "trusted": true }, "execution_count": 67, "outputs": [ { "execution_count": 67, "output_type": "execute_result", "data": { "text/plain": "0.5418576440208844" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "
Click here for the solution\n\n```python\nRigeModel = Ridge(alpha=10) \nRigeModel.fit(x_train_pr, y_train)\nRigeModel.score(x_test_pr, y_test)\n\n```\n\n
\n", "metadata": {} }, { "cell_type": "markdown", "source": "

Part 4: Grid Search

\n", "metadata": {} }, { "cell_type": "markdown", "source": "The term alpha is a hyperparameter. Sklearn has the class GridSearchCV to make the process of finding the best hyperparameter simpler.\n", "metadata": {} }, { "cell_type": "markdown", "source": "Let's import GridSearchCV from the module model_selection.\n", "metadata": {} }, { "cell_type": "code", "source": "from sklearn.model_selection import GridSearchCV", "metadata": { "trusted": true }, "execution_count": 68, "outputs": [] }, { "cell_type": "markdown", "source": "We create a dictionary of parameter values:\n", "metadata": {} }, { "cell_type": "code", "source": "parameters1= [{'alpha': [0.001,0.1,1, 10, 100, 1000, 10000, 100000, 100000]}]\nparameters1", "metadata": { "trusted": true }, "execution_count": 69, "outputs": [ { "execution_count": 69, "output_type": "execute_result", "data": { "text/plain": "[{'alpha': [0.001, 0.1, 1, 10, 100, 1000, 10000, 100000, 100000]}]" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "Create a Ridge regression object:\n", "metadata": {} }, { "cell_type": "code", "source": "RR=Ridge()\nRR", "metadata": { "trusted": true }, "execution_count": 70, "outputs": [ { "execution_count": 70, "output_type": "execute_result", "data": { "text/plain": "Ridge()" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "Create a ridge grid search object:\n", "metadata": {} }, { "cell_type": "code", "source": "Grid1 = GridSearchCV(RR, parameters1,cv=4)", "metadata": { "trusted": true }, "execution_count": 71, "outputs": [] }, { "cell_type": "markdown", "source": "Fit the model:\n", "metadata": {} }, { "cell_type": "code", "source": "Grid1.fit(x_data[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg']], y_data)", "metadata": { "trusted": true }, "execution_count": 72, "outputs": [ { "execution_count": 72, "output_type": "execute_result", "data": { "text/plain": "GridSearchCV(cv=4, estimator=Ridge(),\n param_grid=[{'alpha': [0.001, 0.1, 1, 10, 100, 1000, 10000, 
100000,\n 100000]}])" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "The object uses 4-fold cross-validation (cv=4) to find the best parameter values. We can obtain the estimator with the best parameters and assign it to the variable BestRR as follows:\n", "metadata": {} }, { "cell_type": "code", "source": "BestRR=Grid1.best_estimator_\nBestRR", "metadata": { "trusted": true }, "execution_count": 73, "outputs": [ { "execution_count": 73, "output_type": "execute_result", "data": { "text/plain": "Ridge(alpha=10000)" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "We now score the best estimator on the test data. Because Grid1 was fit on all of x_data, which includes the test rows, this score is an optimistic estimate:\n", "metadata": {} }, { "cell_type": "code", "source": "BestRR.score(x_test[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg']], y_test)", "metadata": { "trusted": true }, "execution_count": 74, "outputs": [ { "execution_count": 74, "output_type": "execute_result", "data": { "text/plain": "0.8411649831036152" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": "### Thank you for completing this lab!\n\n## Author\n\nJoseph Santarcangelo\n\n### Other Contributors\n\nMahdi Noorian PhD\n\nBahare Talayian\n\nEric Xiao\n\nSteven Dong\n\nParizad\n\nHima Vasudevan\n\nFiorella Wenver\n\nYi Yao.\n\n## Change Log\n\n| Date (YYYY-MM-DD) | Version | Changed By | Change Description |\n| ----------------- | ------- | ---------- | ----------------------------------- |\n| 2020-10-30 | 2.3 | Lakshmi | Changed URL of csv |\n| 2020-10-05 | 2.2 | Lakshmi | Removed unused library imports |\n| 2020-09-14 | 2.1 | Lakshmi | Made changes in OverFitting section |\n| 2020-08-27 | 2.0 | Lavanya | Moved lab to course repo in GitLab |\n\n
\n\n##

© IBM Corporation 2020. All rights reserved.

\n", "metadata": {} }, { "cell_type": "code", "source": "", "metadata": {}, "execution_count": null, "outputs": [] } ] }