{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", " \n", " \"Skills\n", " \n", "

\n", "\n", "# Analyzing a real world data-set with SQL and Python\n", "\n", "Estimated time needed: **15** minutes\n", "\n", "## Objectives\n", "\n", "After completing this lab you will be able to:\n", "\n", "* Understand a dataset of selected socioeconomic indicators in Chicago\n", "* Learn how to store data in an Db2 database on IBM Cloud instance\n", "* Solve example problems to practice your SQL skills\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Selected Socioeconomic Indicators in Chicago\n", "\n", "The city of Chicago released a dataset of socioeconomic data to the Chicago City Portal.\n", "This dataset contains a selection of six socioeconomic indicators of public health significance and a “hardship index,” for each Chicago community area, for the years 2008 – 2012.\n", "\n", "Scores on the hardship index can range from 1 to 100, with a higher index number representing a greater level of hardship.\n", "\n", "A detailed description of the dataset can be found on [the city of Chicago's website](https://data.cityofchicago.org/Health-Human-Services/Census-Data-Selected-socioeconomic-indicators-in-C/kn9c-c2s2?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDB0201ENSkillsNetwork20127838-2022-01-01), but to summarize, the dataset has the following variables:\n", "\n", "* **Community Area Number** (`ca`): Used to uniquely identify each row of the dataset\n", "\n", "* **Community Area Name** (`community_area_name`): The name of the region in the city of Chicago\n", "\n", "* **Percent of Housing Crowded** (`percent_of_housing_crowded`): Percent of occupied housing units with more than one person per room\n", "\n", "* **Percent Households Below Poverty** (`percent_households_below_poverty`): Percent of households living below the federal poverty line\n", "\n", "* **Percent Aged 16+ Unemployed** (`percent_aged_16_unemployed`): Percent of persons over the age of 16 years that are unemployed\n", "\n", "* **Percent Aged 25+ without High School Diploma** (`percent_aged_25_without_high_school_diploma`): Percent of persons over the age of 25 years without a high school education\n", "\n", "* **Percent Aged Under** 18 or Over 64:Percent of population under 18 or over 64 years of age (`percent_aged_under_18_or_over_64`): (ie. dependents)\n", "\n", "* **Per Capita Income** (`per_capita_income_`): Community Area per capita income is estimated as the sum of tract-level aggragate incomes divided by the total population\n", "\n", "* **Hardship Index** (`hardship_index`): Score that incorporates each of the six selected socioeconomic indicators\n", "\n", "In this Lab, we'll take a look at the variables in the socioeconomic indicators dataset and do some basic analysis with Python.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Connect to the database\n", "\n", "Let us first load the SQL extension and establish a connection with the database\n", "\n", "The following required modules are pre-installed in the Skills Network Labs environment. However if you run this notebook commands in a different Jupyter environment (e.g. Watson Studio or Ananconda) you may need to install these libraries by removing the `#` sign before `!pip` in the code cell below.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# These libraries are pre-installed in SN Labs. If running in another environment please uncomment lines below to install them:\n", "# !pip install --force-reinstall ibm_db==3.1.0 ibm_db_sa==0.3.3\n", "# Ensure we don't load_ext with sqlalchemy>=1.4 (incompadible)\n", "# !pip uninstall sqlalchemy==1.4 -y && pip install sqlalchemy==1.3.24\n", "# !pip install ipython-sql" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%load_ext sql" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Connected: cnv39734@BLUDB'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Remember the connection string is of the format:\n", "# %sql ibm_db_sa://my-username:my-password@hostname:port/BLUDB?security=SSL\n", "# Enter the connection string for your Db2 on Cloud database instance below\n", "# i.e. copy after db2:// from the URI string in Service Credentials of your Db2 instance. Remove the double quotes at the end.\n", "%sql ibm_db_sa://" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Store the dataset in a Table\n", "\n", "##### In many cases the dataset to be analyzed is available as a .CSV (comma separated values) file, perhaps on the internet. To analyze the data using SQL, it first needs to be stored in the database.\n", "\n", "##### We will first read the dataset source .CSV from the internet into pandas dataframe\n", "\n", "##### Then we need to create a table in our Db2 database to store the dataset. The PERSIST command in SQL \"magic\" simplifies the process of table creation and writing the data from a `pandas` dataframe into the table\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * ibm_db_sa://cnv39734:***@1bbf73c5-d84a-4bb0-85b9-ab1a4348f4a4.c3n41cmd0nqnrk39u98g.databases.appdomain.cloud:32286/BLUDB\n" ] }, { "data": { "text/plain": [ "'Persisted chicago_socioeconomic_data'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas\n", "chicago_socioeconomic_data = pandas.read_csv('https://data.cityofchicago.org/resource/jcxq-k9xf.csv')\n", "%sql PERSIST chicago_socioeconomic_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### You can verify that the table creation was successful by making a basic query like:\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problems\n", "\n", "### Problem 1\n", "\n", "##### How many rows are in the dataset?\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "78" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "chicago_socioeconomic_data.shape[0]" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * ibm_db_sa://cnv39734:***@1bbf73c5-d84a-4bb0-85b9-ab1a4348f4a4.c3n41cmd0nqnrk39u98g.databases.appdomain.cloud:32286/BLUDB\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
1
78
" ], "text/plain": [ "[(78,)]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%sql SELECT COUNT(*) FROM chicago_socioeconomic_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Click here for the solution\n", "\n", "```python\n", "%sql SELECT COUNT(*) FROM chicago_socioeconomic_data;\n", "\n", "Correct answer: 78\n", "```\n", "\n", "
\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Problem 2\n", "\n", "##### How many community areas in Chicago have a hardship index greater than 50.0?\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * ibm_db_sa://cnv39734:***@1bbf73c5-d84a-4bb0-85b9-ab1a4348f4a4.c3n41cmd0nqnrk39u98g.databases.appdomain.cloud:32286/BLUDB\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
1
38
" ], "text/plain": [ "[(38,)]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%sql SELECT COUNT(*) FROM chicago_socioeconomic_data where hardship_index>50" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Click here for the solution\n", "\n", "```python\n", "%sql SELECT COUNT(*) FROM chicago_socioeconomic_data WHERE hardship_index > 50.0;\n", "\n", "Correct answer: 38\n", "```\n", "\n", "
\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Problem 3\n", "\n", "##### What is the maximum value of hardship index in this dataset?\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * ibm_db_sa://cnv39734:***@1bbf73c5-d84a-4bb0-85b9-ab1a4348f4a4.c3n41cmd0nqnrk39u98g.databases.appdomain.cloud:32286/BLUDB\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
1
98.0
" ], "text/plain": [ "[(98.0,)]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%sql SELECT max(hardship_index) FROM chicago_socioeconomic_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Click here for the solution\n", "\n", "```python\n", "%sql SELECT MAX(hardship_index) FROM chicago_socioeconomic_data;\n", "\n", "Correct answer: 98.0\n", "```\n", "\n", "
\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Problem 4\n", "\n", "##### Which community area which has the highest hardship index?\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * ibm_db_sa://cnv39734:***@1bbf73c5-d84a-4bb0-85b9-ab1a4348f4a4.c3n41cmd0nqnrk39u98g.databases.appdomain.cloud:32286/BLUDB\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
community_area_name
Riverdale
" ], "text/plain": [ "[('Riverdale',)]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%sql SELECT community_area_name from chicago_socioeconomic_data where hardship_index = (SELECT max(hardship_index) FROM chicago_socioeconomic_data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Click here for the solution\n", "\n", "```python\n", "#We can use the result of the last query to as an input to this query:\n", "%sql SELECT community_area_name FROM chicago_socioeconomic_data where hardship_index=98.0\n", "\n", "#or another option:\n", "%sql SELECT community_area_name FROM chicago_socioeconomic_data ORDER BY hardship_index DESC NULLS LAST FETCH FIRST ROW ONLY;\n", "\n", "#or you can use a sub-query to determine the max hardship index:\n", "%sql select community_area_name from chicago_socioeconomic_data where hardship_index = ( select max(hardship_index) from chicago_socioeconomic_data ) \n", "\n", "Correct answer: 'Riverdale'\n", " \n", "```\n", "\n", "
\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Problem 5\n", "\n", "##### Which Chicago community areas have per-capita incomes greater than $60,000?\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * ibm_db_sa://cnv39734:***@1bbf73c5-d84a-4bb0-85b9-ab1a4348f4a4.c3n41cmd0nqnrk39u98g.databases.appdomain.cloud:32286/BLUDB\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
community_area_name
Lake View
Lincoln Park
Near North Side
Loop
" ], "text/plain": [ "[('Lake View',), ('Lincoln Park',), ('Near North Side',), ('Loop',)]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%sql SELECT community_area_name from chicago_socioeconomic_data where per_capita_income_ > 60000" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Click here for the solution\n", "\n", "```python\n", "%sql SELECT community_area_name FROM chicago_socioeconomic_data WHERE per_capita_income_ > 60000;\n", "\n", "Correct answer:Lake View,Lincoln Park, Near North Side, Loop\n", " \n", "```\n", "\n", "
\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Problem 6\n", "\n", "##### Create a scatter plot using the variables `per_capita_income_` and `hardship_index`. Explain the correlation between the two variables.\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * ibm_db_sa://cnv39734:***@1bbf73c5-d84a-4bb0-85b9-ab1a4348f4a4.c3n41cmd0nqnrk39u98g.databases.appdomain.cloud:32286/BLUDB\n", "Done.\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAasAAAGoCAYAAAD4hcrDAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAAn6klEQVR4nO3df7icZXng8e9NcoQTBA5IoHCQgrssrYolmvVXWqugDbYqKWpBS5e17rLX1lbBLW3oL2t7WaK2WretbrnUSv2JVQxUrZQS6Q96FQ0EBESKrRQJCLFykJKIh3DvH/NOmJzMnDMzZ95535n5fq5rrpl5Z+adO+ckc+e5536eJzITSZLqbL+qA5AkaSkmK0lS7ZmsJEm1Z7KSJNWeyUqSVHsrqw5gAGxnlDROouoA6siRlSSp9kxWkqTaG4cyYC187Lq7Bnq+1z7n2IGeT5JGmSMrSVLtmawkSbVnspIk1Z7JSpJUeyYrSVLtmawkSbVnspIk1Z7JSpJUeyYrSVLtmawkSbVnspIk1Z7JSpJUe5E58ttB9fUHGPTCs3XnwrjSyHA/qzYcWUmSas9kJUmqPZOVJKn2TFaSpNozWUmSas9kJUmqPZOVJKn2TFaSpNozWUmSas9kJUmqPZOVJKn2TFaSpNozWUmSas9kJUmqPZOVJKn2TFaSpNozWUmSam9l1QFoOAa9M7I7D0saJkdWkqTaM1lJkmrPZCVJqj2TlSSp9mywUF9s2JA0TI6sJEm1Z7KSJNWeyUqSVHsmK0lS7dlgoVoYdMMG2LQhjRNHVpKk2jNZSZJqzzKgxpZzwaTx4chKklR7jqykipTRVDJIjiRVJ46sJEm1Z7KSJNWeZUBJQ1H3hpe6xzfpHFlJkmovMrPqGJYlIr4AHD7g0x4OfHvA5+xHXeIAY+nEWNozln11G8e3M/O0soMZNSOfrMoQEVszc61xPM5Y2jOW9oylvnGMKsuAkqTaM1lJkmrPZNXexVUHUKhLHGAsnRhLe8ayr7rEMZL8zkqSVHuOrCRJtWeykiTVnslKklR7JitJUu2ZrCRJtTfyyeq0005LwIsXL17G5dK1Mf38a2vkk9W3v12HJb8kafgm6fNv5JOVJGn8mawkSbVnspIk1V6pySoiPhgR90fELS3HDouIqyLijuL60JbHLoyIr0fE7RGxvszYJEmjo+yR1YeAhZuIbQSuzswTgKuL+0TEU4GzgKcVr3lvRKwoOT5J0ggoNVll5t8B31lw+HTgkuL2JcCGluOfyMxHMvMbwNeBZ5cZnyRpNFTxndWRmXkvQHF9RHF8Fvhmy/PuLo5JkiZcnRosos2xthPEIuLciNgaEVt37NhRcliSVB+T+vlXRbK6LyKOAiiu7y+O3w08ueV5xwD3tDtBZl6cmWszc+3q1atLDVaS6mRSP/+qSFZXAOcUt88BLm85flZE7B8RxwMnAF+qID5JUs2sLPPkEfFx4IXA4RFxN/AWYBPwyYh4PXAX8GqAzLw1Ij4JfBV4FHhDZu4edEybt23nnVfezj1zuzh6ZpoL1p/IhjV+NSZJdVZqssrM13R46NQOz38b8Lay4tm8bTsXXnYzu+YbOXD73C4uvOxmABOWJNVYqcmqbt555e17ElXTrvndvPPK2/ckK0deklQ/E5Ws7pnbtehxR16SRsl3Hv4+H7vurqrDWLbXPufYJZ9Tp9b10h09M73o8cVGXpKk6kxUsrpg/YlMT+29gtP01AouWH8isPTIS5JUjYkqAzZLeZ2+k5pZNcUDO+f3ed3MqqmhxilJ2ttEJStoJKxO3z9lhw2VOx2XJA3HRJUBl/Lgrn1HVYsdlyQNh8mqxVINGJKkapisWizVgCFJqsbEfWe1mE4NGADrNm1xorAkVcRktcDCBgwnCktS9SwDLsGJwpJUPZPVEjpNCN7uRGFJGhqT1RI6dQIGjRKhJKl8JqslXLD+RKLN8QRLgZI0JDZYdNC6VUinBSxcM1CShsNk1cbCDsBOnCwsScNhGbCNdh2ACzlZWJKGx2TVxlLlvRURe9rXbbKQpPKZrNroVN6bmZ5iemoFu4tl2JsThE1YklQuk1UbndYIjMAJwpJUAZNVGxvWzHLRGScxOzNNALMz01x0xknMtdmYEewKlKSy2Q3YQbtNGt955e1tV66wK1CSyuXIqgduISJJ1TBZLWLztu2s27SF4zd+jnWbtgC0LQ+6+roklcsyYAedtga56IyTuHbjKRVHJ0mTxZFVB24NIkn14ciqg04dfguPt64h6C7CklQOR1YddOrwaz3eLBVuLxa7dZKwJJXDZNVBN51/lgolaThMVh0snBh86Kop9l+5H+dfeiPrNm1h87btXZcKJUnLY7JaxIY1s1y78RTefebJfG/+MeZ2ze9V7ptZNdX2dU4SlqTBMll1oVO5LxMnCUvSEJisutCprPfgrnknCUvSENi63oWjZ6Y7rgnYbg1BSdJgObLqgmsCSlK1HFl1oTlyajf510nBklQ+k1WX2pX7Oq0f2Hy+JGkwLAMug5OCJWk4HFn1oVn6a9d0AU4KlqRBM1n1aGHprx0nBUvSYFkG7FG70l8ruwQlafAcWfVosRLfrN2AklQKk1WPZlZN8cDO+X2OH7pqyh2EJakklgF7lNnbcUnS8jmy6tGDu/YdVS123EnDkrR8jqx61M0Owk3uJCxJg2Gy6lEv6wQ6aViSBsMyYI8WWydwIXcSlqTBMFn1odttQRbbWkSS1L3KyoARcX5E3BoRt0TExyPigIg4LCKuiog7iutDq4pvKZu3bWfdpi0cv/FzrNu0pe33UG4tIkmDUUmyiohZ4I3A2sx8OrACOAvYCFydmScAVxf3a6fbxokNa2bdSViSBqDKMuBKYDoi5oFVwD3AhcALi8cvAa4BfrWK4BazWOPEwkTkTsKStHyVjKwyczvw+8BdwL3Ag5n518CRmXlv8Zx7gSPavT4izo2IrRGxdceOHcMKe0/pz9XWJVWl9fPvobnvVB3O0FRVBjwUOB04HjgaODAizu729Zl5cWauzcy1q1evLivMvbSW/jqxcUJS2Vo//w6aOazqcIamqgaLFwPfyMwdmTkPXAY8H7gvIo4CKK7vryi+fbjauiRVp6rvrO4CnhsRq4BdwKnAVuBh4BxgU3F9eUXx7WMYq627NJMktVdJssrM6yLiU8ANwKPANuBi4InAJyPi9TQS2quriK+dTnOmZmemB7La+sJNHZsdhoAJS9LEq6wbMDPfArxlweFHaIyyaueC9Sfus0PwIEt/vXQYStKoe+1zju3p+a5g0aVellnqh0szSVJnJqselDlnyqWZJKkzV12vCZdmkqTOHFkNQTddfmWXGSVplJmsStZLl59LM0lSe5YBS+YGjJK0fCarktnlJ0nLZ7Iq2SHTUz0dlyTty2RVsojejkuS9mWDRUmaHYAP7Jxv+/hch+OSpH2ZrEqwsAOwHSf7SlL3LAOWwO1EJGmwHFmVYBjbiUjSJHFkVYJOJb4AE5Uk9cFkVYIL1p9Iu2a/BCcDS1IfTFYl2LBmluzwmJOBJal3JquSzHYoBdoFKEm9M1mVxC0/JGlw7AYcsNbtQA6ZnuKAqf2Y2znvlh+StAwmqwFaOBl4btc801MrePeZJ5ukJGkZLAMOkNuBSFI5TFYD1KnTb/vcLjZv2z7kaCRpfJisBmixTr8LL7vZhCVJfTJZDVC7DsAmy4GS1D8bLAao2URx3qU3tn18uxOCJakvjqwGbMOa2Y4TggMsBUpSH0xWJXBtQEkaLJNVCVwbUJIGy2RVEtcGlKTBMVmVxLUBJWlw7AYsSbMz8K1/eSsP7JwHYP+V/t9Akvrhp2fJvjf/2J7bc7vmnRwsSX0wWZXItQIlaTBMViVabK3A4zd+jnWbtjjKkqQumKxKtFjnX9JIWpYFJWlpJqsSLbZWYJNlQUlamslqgDZv2866TVv2lPgALjrjJGZnptuuaNHkRGFJWpzJakCauwRvn9u1V4kP4NqNp/CNTT/lRGFJ6pPJakC66fx70Q+t3meE5URhSVqayWpAOpXymsc3b9vOp6/fvteagQG88lmzeyYQS5LaM1kNSKdSXvN4u5FXAl/82o6yQ5OkkWeyGpCl1gJcauQlSerMZDUgG9bM7tX5NzszzUVnnLSnxLfUyEuS1JkL2Q7QhjWdv3+6YP2JXHjZzXuVAm2ukKTumKyGYPO27Xu+s1oRwe5MZmemuWD9iTZXSFIXTFYla86/ao6odmfuGVGZqCSpO35nVTJXXpek5TNZlWyxldddwFaSumOyKtli3X6uuC5J3ek6WUXE6xfcXxERbxl8SONlsZXXLQdKUnd6GVmdGhGfj4ijIuLpwD8BB/X7xhExExGfioivRcRtEfG8iDgsIq6KiDuK60P7PX9dNOdfdXJPUQ5sXa3d0ZYk7a3rZJWZrwUuAW4GPg+cl5m/vIz3fg/whcz8IeBHgNuAjcDVmXkCcHVxf+RtWDPbccX1mVVTbVdrN2FJ0uN6KQOeALwJ+DRwJ/BzEbGqnzeNiIOBFwAfAMjM72fmHHA6jYRIcb2hn/PXUaflmDKxW1CSltBLGfAvgd/KzP8F/DhwB/DlPt/3KcAO4M8iYltEvD8iDgSOzMx7AYrrI9q9OCLOjYitEbF1x47RWAi203JMD+6ab/t81wyU1E7r599Dc9+pOpyh6WVS8LMz87sAmZnAH0TEFct432cCv5SZ10XEe+ih5JeZFwMXA6xduzaXeHpttFuO6Z1X3s72NonJNQMltdP6+feUH37GyHz+LVcvI6vpiPhARHwBICKeSqOU14+7gbsz87ri/qdoJK/7IuKo4vxHAff3ef6RsdRq7ZKk3pLVh4ArgaOK+/8MnNfPm2bmt4BvRkTzE/lU4KvAFcA5xbFzgMv7Of8o2bBmllc+a5YV0dhDeEWEGzJK0gK9JKvDM/OTwGMAmfkosHvxlyzql4CPRsRXgJOB3wM2AS+JiDuAlxT3x1pzB+Hd2RjN787k09dvtxtQklr08p3VwxHxJBob3BIRzwUe7PeNM/NGYG2bh07t95yjaLG1Ax1dSVJDL8nqzTTKdP8pIq4FVgOvKiWqCTKoHYSb25DcM7eLo91+RNKY6TpZZeYNEfHjwIlAALdnZvu+a3Xt6JnpZXcDLtyGpDmxGDBhSRoLS35nFRFnNC/AK2gkq/8CvLw4pmUYRDeg25BIGnfdjKxeXlwfATwf2FLcfxFwDXDZ4MMaT4uV6pZTwhtUKVGS6mrJZJWZrwOIiM8CT22uMFHMg/qTcsMbH0uV6pZTrhtEKVGS6qyX1vXjmomqcB+NcqC6UGapzonFksZdL92A10TElcDHabSvnwV8sZSoxlA3pbp+O/oGUUqUpDrrpRvwF4uGih8rDl2cmZ8pJ6zxs1SpbrkdfcstJUpSnfW0rX1mXpaZ5xcXE1UPlirV2dEnSZ31sp/VGcUOvg9GxHcj4qGI+G6ZwY2TTluENEdDdvRJUme9fGf1DuDlmXlbWcGMu8VKdXb0SVJnvZQB7zNRlceOPknqrJeR1daIuBTYDDzSPJiZTgoegKU6+lz7T9Ik6yVZHQzsBH6i5VjiChYD06lM6Np/kiZdL63rryszEHXmNiKSJt2SySoifiUz3xERf0Sxl1WrzHxjKZGNuV7KenYKSpp03Yysmk0VW8sMZJL0WtazU1DSpFuyGzAz/7K4vqTdpfm8YuSlLvQ6AdhOQUmTrpcGi6WsG+C5xlq3Zb3WUuHMqin2X7kfD+6atxtQ0sQZZLJSl7op6y0sFT6wc57pqRW8+8yTTVKSJk5PawNqMLop67lWoCQ9bpAjqxjgucZaN1t6dCoVbp/bxbpNW5wcLGmi9JysIuJgIDPzoQUPvWcwIU2Gpbb06FQqDNhz3MnBkiZFL6uur42Im4GvALdExE0R8azm45n5oRLim1jtSoXBvhPdLA1KmgS9fGf1QeAXMvO4zPxB4A3An5UTltptKbLPjOyCk4MljbteyoAPZebfN+9k5j9ExMJSoAZoYalw3aYtTg6WNJF6GVl9KSL+NCJeGBE/HhHvBa6JiGdGxDPLCnDSbN62nXWbtnD8xs+xbtMWNm/bvucxJwdLmlS9jKxOLq7fsuD482l8lXLKIAKaZEstw9RNF6EkjaNeVl1/UZmBqLvV1ZfqIpSkcdTNqutnZ+ZHIuLN7R7PzHcNPqzJtNzV1d2gUdK46mZkdWBxfVCZgWh5q6u7QaOkcbZkssrMPy2u31p+OJPtgvUn7pVwoPsGCjdolDTOuv7OKiJWA/8TOK71dZn584MPazL100DRLP21G5GBc7AkjYdeugEvB/4e+Btg9xLPVZ96aaBYWPprxzlYksZBL8lqVWb+ammRqGftSn+tnIMlaVz0Min4sxHxk6VFop4tVuKbnZnmojNO8vsqSWOhm9b1h2hM+g3g1yLiEWC+uJ+ZeXC5IaqTTt2DszPTXLvROdqSxseSI6vMPCgzDy6u98vM6Zb7JqoKufySpEnRyxYh6yLiwOL22RHxrog4trzQ1I0Dph7/Fc5MT1n6kzSWevnO6n3Azoj4EeBXgH8DPlxKVFpSsxPwgZ3ze4498uhjFUYkSeXpJVk9mpkJnA68JzPfg6taVGaxScCSNG562s8qIi4EzgZeEBErgKlywtJSlruOoCSNkl5GVmcCjwCvz8xvAbPAO0uJSks6ZLr9/xM6HZekUdbLFiHfAt7Vcv8u4M/LCEpLi+jtuCSNsl7mWbVl+3o5ltruY66lsaJVp+OSNMq6WXX9IICI+B3gWzQ6AAP4WWywKEU3230sZzsRSRo1vXxntT4z35uZD2XmdzPzfcArywpsknXT6eeEYEmTpJduwN0R8bPAJ2iUBV+Dq6+XoptOv8W2E3HHYEnjppdk9VrgPcUlgWuLYxqwbkt87bYTccdgSeOoqzJgMafqDZl5emYenpmrM3NDZt5ZbniTaTklPicLSxpHXSWrzNwNPGvQbx4RKyJiW0R8trh/WERcFRF3FNeHDvo9R8GGNbNcdMZJzM5ME/S23YeThSWNo17KgNsi4grgL4CHmwcz87JlvP+bgNuAZvv7RuDqzNwUERuL+xO54WMvOwa3sktQ0jjqpRvwMODfgVOAlxeXl/X7xhFxDPBTwPtbDp8OXFLcvgTY0O/5J5VdgpLGUS8rWLxuwO/9hzRWb2+dq3VkZt5bvN+9EXFEuxdGxLnAuQDHHusuJU3NLsBd87tZEcHuTGbtBpTGSuvn3+E/MDn/rrtOVhFxAPB64GnAAc3jmfnzvb5pRLwMuD8zr4+IF/b6+sy8GLgYYO3atR1X15gkC7sAd2fuGVGZqKTx0fr595QffsbEfP71Ugb8MPADwHrgb4FjgIf6fN91wCsi4k4a87ZOiYiPAPdFxFEAxfX9fZ5/4tgFKGmc9ZKs/nNm/ibwcGZeQuP7ppP6edPMvDAzj8nM44CzgC2ZeTZwBXBO8bRzgMv7Of8kGlQX4OZt21m3aQvHb/wc6zZtYfO27YMIT5KWpZdk1VwhdS4ing4cAhw34Hg2AS+JiDuAlxT31YVO3X69dAE2S4nb53aRPD6h2IQlqWq9JKuLi3lPv0FjBPRV4O3LDSAzr8nMlxW3/z0zT83ME4rr7yz3/JNiEF2AlhIl1VUv86w+TGPh2uN4vL38yEEHpP4dMLXfnmQzMz3Fb7/iaT01VzihWFJd9ZKsLgceBK6nsWOwamJhJyDAI48+1vN5nFAsqa56SVbHZOZppUWivi1WvutlZHXB+hP3SXpOKJZUB718Z/WPEdFX95/KNajy3XLWJJSkMnWzrf3NNLYEWQm8LiL+lUYZMIDMzGeUG6KWMsjyXb9rEkpSmbopA/a9/p+Gw/KdpHG3ZLLKzH8bRiDq38Jdg2dWTZEJ5196I++88naXXJI08nr5zko1tmHNLNduPIV3n3ky35t/jLld807slTQ2TFZjxom9ksZRL63rGgG9dAY2txS5Z24XR7uViKQac2Q1ZrpdI9B1ACWNEpPVmOl2jUDLhZJGiWXAMbOwM3Bhea9Z+ms3LwtcB1BSPZmsxlCnib3t1hBcyHUAJdWRZcAJ0q7018qJxJLqypHVBFmsxDdrN6CkGjNZTZBDpqeY2zW/z/GZ6Smu3XhKBRFJUncsA06QiN6OS1JdmKwmyNzOfUdVix2XpLowWU2QbicMS1LdmKwmSLsJwwA7v/+oK1dIqjWT1QRp7gQ8Mz211/EHds671JKkWjNZTZgNa2Y5cP99m0BdaklSndm6PsY6rarey8rsklQHJqsxtXBppeaq6tBoqGi3NqCNFpLqyjLgmFpsVfVuV2aXpLowWY2pTiW97XO7OP/SG9l/5X4cumqKoLHU0kVnnORSS5JqyzLgmOpU6gNIYG7XPNNTK3j3mSebpCTVniOrMdVpTlUrOwAljQpHVmNq4SaM2eF5dgBKGgUmqzHWugnjuk1b7ACUNLIsA04IOwAljTJHVhNiYVnw6Ao3W+w0WVmSOjFZTZDWsmBVFpusXHVskurLMqCGarHJypLUiSOrCTfskpzrEkrqhyOrCdYsyW0vWtubJbkytwpxA0hJ/TBZTbAqSnJ2JUrqh2XACdGu3FdFSa5OXYmSRofJagJ06sA7ZHqKuV3z+zy/7JJcHboSJY0Wy4AToFO5LwJLcpJGgslqAnQq683tnOeiM05idma6r61CNm/bzrpNWzh+4+dYt2lLqY0ZkiabZcAJsNjOwP2W5JzcK2mYHFlNgDI68JzcK2mYHFlNgDI68JzcK2mYTFYTYtAdeIuVFiVp0CwDqi9O7pU0TCYr9WXDmtm9OgkPXTXF/iv34/xLb7QzUNLAmazUtw1rZrl24ym8+8yT+d78Y8ztmh/aGoOSJovJSstmZ6CkslXSYBERTwb+HPgB4DHg4sx8T0QcBlwKHAfcCfxMZj5QRYzqXi+dge4SLKkfVY2sHgX+T2b+MPBc4A0R8VRgI3B1Zp4AXF3cV811u+1HFVuSSBoPlSSrzLw3M28obj8E3AbMAqcDlxRPuwTYUEV86k23nYGWCyU1fey6u/ZculH5d1YRcRywBrgOODIz74VGQgOO6PCacyNia0Rs3bFjx9BiVXutnYEAKyL2JKHWUZMTiaXla/38e2juO1WHMzSVJquIeCLwaeC8zPxut6/LzIszc21mrl29enV5AaprG9bM7hlh7c4E9i3zuUuwtHytn38HzRxWdThDU1myiogpGonqo5l5WXH4vog4qnj8KOD+quJT75Yq8zmRWFK/KklWERHAB4DbMvNdLQ9dAZxT3D4HuHzYsal/S5X5Fk4k7nVLEkmTq6q1AdcBPwfcHBE3Fsd+DdgEfDIiXg/cBby6mvDUj27WC3SXYEn9qCRZZeY/ANHh4VOHGYsG54L1J+61xxVY5pM0GK66roFZ7lYkThiW1InJSgPlzsOSylD5PCsJnDAsaXGOrFSqbkt7ThiWtBhHVipNL2sBOmFY0mJMVipNL6W9YUwY3rxtO+s2beH4jZ9zg0hpxFgGVGl6Ke0tt5NwKTZwSKPNZKXSdDNJuFWZE4YXG+WZrKT6swyo0gyitDeo0p0NHNJoc2Sl0gxikvCgSne9jvIk1YvJSqVaTmlvkKU7l4KSRpvJSrU1yNJd2Q0cksplslJtDbp054rv0uiywUK15WaNkppMVqqt5maNM9NTe44dMOVfWWkS+S9ftffIo4/tuf3AzvmOSzZJGl8mK9Waq7FLAhssVHOD6gh0Y0dptDmyUq0NYjX2XlZ/l1RPJivV2iA6Ai0lSqPPMqBqrd/JvK1lv+zwHNcFlEaHyUq11+tk3oVrCnbiuoDS6LAMqLHTruy3kJOLpdHiyEpjZ7HyXoDdgNIIMllp7HRaU3B2ZpprN55SQUSSlssyoMaOawpK48eRlUbSYpN83Q5EGj8mK42cbnYQdjsQabxYBtTIcZKvNHkcWWnkDHIH4VHlWoeaNI6sNHIGsV7gKHOtQ00ik5VGzqR3+1kG1SSyDKiRMyndfp1KfZZBNYlMVhpJ497tt1jHY6dJz5NSBtVksgwo1dBipb5JL4NqMpmspBparNS3Yc0sF51xErMz0wRw6Kop9l+5H+dfeiPrNm2x0UJjyWQl1dBSHY8b1sxy7cZTePeZJ/O9+ceY2zVvZ6DGmslKqqFuS312BmpS2GAh1VC3HY92BmpSmKykmuqm49HOQE0Ky4DSCLMzUJPCkZU0wgY9Qdo1B1VXJitpxA1qgnQ3W69IVbEMKAmws1D15shKElBeZ6GlRQ2CIytJQDlbr7idiQbFZCUJKKez0NKiBsUyoCSgnK1XJnHSsmXPcpisJO0x6K1XJm3Ssh2V5aldGTAiTouI2yPi6xGxsep4JPVv0iYtW/YsT61GVhGxAvgT4CXA3cCXI+KKzPxqtZFJ6sek7OrcNOyy52EHPoHXPufYUs5dN7VKVsCzga9n5r8CRMQngNMBk5U0osZ9V+dWk1b2HKa6lQFngW+23L+7OCZJtTdpZc9hqtvIKtocy32eFHEucC7AscdOxhBYUv0No+w5qZ9/kblPLqhMRDwP+O3MXF/cvxAgMy/q9Jq1a9fm1q1bhxShJJWu3X/a2xrTz7+2f/66lQG/DJwQEcdHxBOAs4ArKo5JklSxWpUBM/PRiPhF4EpgBfDBzLy14rAkSRWrVbICyMzPA5+vOg5JUn3UrQwoSdI+TFaSpNozWUmSas9kJUmqPZOVJKn2TFaSpNqr1QoW/YiIHcC/Dfi0hwPfHvA5+1GXOMBYOjGW9oxlX93G8e3MPK2bE0bEF7p97qgb+WRVhojYmplrjeNxxtKesbRnLPWNY1RZBpQk1Z7JSpJUeyar9i6uOoBCXeIAY+nEWNozln3VJY6R5HdWkqTac2QlSao9k5UkqfbGNllFxAcj4v6IuKXl2GERcVVE3FFcH9ry2IUR8fWIuD0i1rccf1ZE3Fw89n8jIorj+0fEpcXx6yLiuA5xPDkivhgRt0XErRHxpgpjOSAivhQRNxWxvLWqWFrOsyIitkXEZ6uMJSLuLM5xY0RsrTiWmYj4VER8rfh787yK/r6cWPw8mpfvRsR5FcVyfvF39paI+Hg0/i5X9ft5UxHHrRFxXnGssn9DEyMzx/ICvAB4JnBLy7F3ABuL2xuBtxe3nwrcBOwPHA/8C7CieOxLwPNobLX8V8BLi+O/APy/4vZZwKUd4jgKeGZx+yDgn4v3qyKWAJ5Y3J4CrgOeW0UsLTG9GfgY8NmqfkfF43cChy84VlUslwD/o7j9BGCmyt9R8bwVwLeAHxx2LMAs8A1gurj/SeC/V/EzAZ4O3AKsorEf4N8AJ1T9+5mES+UBlPqHg+PYO1ndDhxV3D4KuL24fSFwYcvzriz+Eh0FfK3l+GuAP219TnF7JY2Z6dFFTJcDL6k6luIf2w3Ac6qKBTgGuBo4hceTVVWx3Mm+yWrosQAH0/hgjqpjWfD+PwFcW0UsNJLVN4HDiud8toinit/Pq4H3t9z/TeBXqv79TMJlbMuAHRyZmfcCFNdHFMeb/xia7i6OzRa3Fx7f6zWZ+SjwIPCkxd68GM6voTGiqSSWaJTdbgTuB67KzMpiAf6Qxj/0x1qOVRVLAn8dEddHxLkVxvIUYAfwZ9Eoj74/Ig6sKJZWZwEfL24PNZbM3A78PnAXcC/wYGb+9bDjKNwCvCAinhQRq4CfBJ5cUSwTZdKSVSfR5lgucnyx17R/g4gnAp8GzsvM71YVS2buzsyTaYxqnh0RT68iloh4GXB/Zl6/yPsPJZbCusx8JvBS4A0R8YKKYllJo3z9vsxcAzxMo6xURSyNJ0c8AXgF8BeLxFFaLMX3P6fTKKMdDRwYEWcPOw6AzLwNeDtwFfAFGiW+R6uIZdJMWrK6LyKOAiiu7y+O303jf0dNxwD3FMePaXN8r9dExErgEOA77d40IqZoJKqPZuZlVcbSlJlzwDXAaRXFsg54RUTcCXwCOCUiPlJRLGTmPcX1/cBngGdXFMvdwN3FiBfgUzSSV5V/X14K3JCZ9xX3hx3Li4FvZOaOzJwHLgOeX0EcAGTmBzLzmZn5guI5d1QVyySZtGR1BXBOcfscGt8fNY+fVXThHE/jC9MvFcP5hyLiuUWnzn9b8JrmuV4FbMmiyNyqeN0HgNsy810Vx7I6ImaK29M0PgS+VkUsmXlhZh6TmcfRKDFtycyzK/q5HBgRBzVv0/g+5JaKfi7fAr4ZEScWh04FvlpFLC1ew+MlwIWvH0YsdwHPjYhVxetPBW6r6mcSEUcU18cCZxQ/myp/P5Oh6i/NyrrQ+At0LzBP438qr6dR972axv+ErgYOa3n+r9Po1LmdoiunOL6WxgfXvwB/zOOrfhxAoyzydRpdPU/pEMeP0hjCfwW4sbj8ZEWxPAPYVsRyC/BbxfGhx7IgrhfyeINFFT+Xp9Ao59wE3Ar8epU/F+BkYGvxe9oMHFphLKuAfwcOaTlWxe/orTT+Y3UL8GEa3XVV/Uz+nsZ/IG4CTq3Dv6FJuLjckiSp9iatDChJGkEmK0lS7ZmsJEm1Z7KSJNWeyUqSVHsmK0lS7ZmsNPEi4nci4sXF7fOKNd/6Oc/7I+Kpg41OEuA8K9VfRKzMxoKew3ivO4G1mfntYbyfpO44stJQRMRx0dhM8JKI+Eo0NhdcFY0N6P62WO38ypb11a6JiN+LiL8F3tThnEdGxGeisZnkTRHx/OL45uJ8t8bjK6gTEf8REX8QETdExNURsbo4/qGIeFVEvJHGQqlfjIgvFo+9LyK2RstmlYv8Ga+JiLUt7/W2Iq5/iogjl4j5zdHY0O+WeHxDv+bP7P3F8Y9GxIsj4tpobPL37OJ5B0Zjs9EvR2Ol9tP7/01JNVX1EhpeJuNCY2+xpLG6OcAHgQuAfwRWF8fOBD5Y3L4GeO8S57yUxir20Ngc8JDi9mHF9TSN5WyeVNxP4GeL278F/HFx+0PAq4rbd9Kyr1XLuVYUMT1jkXiuoTEqa77Xy4vb7wB+o1PMwLOAm4EDgSfSWPJpTfEzexQ4icZ/LK8vfm5BYxXyzcV5fg84u7g9Q2ODzwOr/p178TLIiyMrDdM3M/Pa4vZHgPU0dl69Khp7bP0Ge69EfekS5zsFeB/s2frkweL4GyPiJuCfaKxefUJx/LGWc36ExrqNS/mZiLiBxpqKT6Ox82s3vk9jk0BoJJnjFon5R4HPZObDmfkfNFYV/7Hi+d/IzJsz8zEaSezqzEwaya15zp8ANhY/w2torC13bJdxSiNhZdUBaKIs/IL0IeDWzHxeh+c/3OsbRMQLaawm/7zM3BkR19D48O4mnoXnOh74ZeC/ZuYDEfGhRc610HyRVAB2s/i/tXb7FzU90nL7sZb7j7WcM4BXZubtXcYmjRxHVhqmYyOimZheQ2Pks7p5LCKmIuJpPZzvauB/F69dEREH0yirPVAkqh8Cntvy/P1obLkA8FrgH9qc8yHgoOL2wTQS5oPFd04v7SG2XmL+O2BD8R3egcBP01jZu1tXAr9UbDVBRKwZQJxSrZisNEy3AedExFeAw4A/opE83l6U7W6ksalet94EvCgibqZRansajd1bVxbv8bs0EmLTw8DTIuJ6GuW432lzzouBv4qIL2bmTTTKf7fS+K7o2jbP79U+MWfmDTS+N/sScB3w/szc1sM5fxeYAr4SEbcU96WxYuu6hiIijqOxZ9XTK4zhPzLziVW9v6T+ObKSJNWeIyvVXkT8OvDqBYf/IjPfVlE8nwGOX3D4VzPzyiriaSci1gNvX3D4G5n501XEIy2XyUqSVHuWASVJtWeykiTVnslKklR7JitJUu39f1LazCw5zGnkAAAAAElFTkSuQmCC", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "import seaborn as sns\n", "\n", "income_vs_hardship = %sql SELECT per_capita_income_, hardship_index FROM chicago_socioeconomic_data;\n", "plot = sns.jointplot(x='per_capita_income_',y='hardship_index', data=income_vs_hardship.DataFrame())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Click here for the solution\n", "\n", "```python\n", "# if the import command gives ModuleNotFoundError: No module named 'seaborn'\n", "# then uncomment the following line i.e. delete the # to install the seaborn package \n", "# !pip install seaborn==0.9.0\n", "\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "import seaborn as sns\n", "\n", "income_vs_hardship = %sql SELECT per_capita_income_, hardship_index FROM chicago_socioeconomic_data;\n", "plot = sns.jointplot(x='per_capita_income_',y='hardship_index', data=income_vs_hardship.DataFrame())\n", "\n", "Correct answer:You can see that as Per Capita Income rises as the Hardship Index decreases. We see that the points on the scatter plot are somewhat closer to a straight line in the negative direction, so we have a negative correlation between the two variables. \n", " \n", "```\n", "\n", "
\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Conclusion\n", "\n", "##### Now that you know how to do basic exploratory data analysis using SQL and python visualization tools, you can further explore this dataset to see how the variable `per_capita_income_` is related to `percent_households_below_poverty` and `percent_aged_16_unemployed`. Try to create interesting visualizations!\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary\n", "\n", "##### In this lab you learned how to store a real world data set from the internet in a database (Db2 on IBM Cloud), gain insights into data using SQL queries. You also visualized a portion of the data in the database to see what story it tells.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Author\n", "\n", "Rav Ahuja\n", "\n", "## Change Log\n", "\n", "| Date (YYYY-MM-DD) | Version | Changed By | Change Description |\n", "| ----------------- | ------- | ------------- | ---------------------------------- |\n", "| 2021-11-17 | 2.3 | Lakshmi | Updated library |\n", "| 2021-07-09 | 2.2 | Malika | Updated connection string |\n", "| 2021-05-06 | 2.1 | Malika Singla | Added libraries |\n", "| 2020-08-28 | 2.0 | Lavanya | Moved lab to course repo in GitLab |\n", "\n", "
\n", "\n", "##

© IBM Corporation 2020. All rights reserved.

\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.8.9 64-bit", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.9" }, "vscode": { "interpreter": { "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" } }, "widgets": { "state": {}, "version": "1.1.2" } }, "nbformat": 4, "nbformat_minor": 4 }