intro-to-data-science/00_python_in_a_nutshell/05_content_functions.ipynb

559 lines
17 KiB
Text
Raw Permalink Normal View History

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note**: Click on \"*Kernel*\" > \"*Restart Kernel and Clear All Outputs*\" in [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) *before* reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it [in the cloud <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_mb.png\">](https://mybinder.org/v2/gh/webartifex/intro-to-data-science/main?urlpath=lab/tree/00_python_in_a_nutshell/05_content_functions.ipynb)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Chapter 0: Python in a Nutshell (Part 3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One big idea in software engineering is to **modularize** code. The purpose of that is manyfold. Two very important motivations are:\n",
"- make a code block **re-usable** and\n",
"- give it a meaningful name.\n",
"\n",
"The latter gets more important as the codebase in a project grows so big that we can only look at a tiny fraction of it at one point in time."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Defining Functions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One syntactical construct to achieve modularization is that of a **function definition**. Just like in math, we can \"define\" a function as a set of parametrized instructions that provide some **output** given some **input**.\n",
"\n",
"A function is defined with the `def` statement: After the `def` part comes the name of the function followed by the **parameter list** within parentheses. The first couple of lines in the function's body should be a so-called **docstring** that describes what the function does in plain English. Then, comes the code that is to be made repeatable. In the example below, we simply copy & pasted the code to calculate the sum of all even numbers in a `list` into the example function `add_evens()`. Note that we exchanged the variable name `total` with `result` here to illustrate a point further below. In order for the function to provide back the output to \"the outside world,\" we use the `return` statement (Hint: to see its effect simply re-run the couple of code cells below with and without the `return result` line)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"def add_evens(numbers):\n",
" \"\"\"Sum up all the even numbers in a list.\n",
"\n",
" Args:\n",
" numbers (list of int's): numbers to be summed up\n",
"\n",
" Returns:\n",
" total (int)\n",
" \"\"\"\n",
" result = 0\n",
"\n",
" for number in numbers:\n",
" if number % 2 == 0: # if the number is even\n",
" result = result + number\n",
"\n",
" return result"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After defining a function, we can **call** (i.e., \"execute\") it with the `()` operator.\n",
"\n",
"Let's execute the function with `numbers` as the input. We see the same `6` below the cell as we do above where we run the code without a function. Without the `return` statement in the function's body, we would not see any output here.\n",
"\n",
"To see what happens in detail, take a look at [PythonTutor <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://pythontutor.com/visualize.html#code=numbers%20%3D%20%5B1,%202,%203,%204%5D%0A%0Adef%20add_evens%28numbers%29%3A%0A%20%20%20%20%22%22%22Sum%20up%20all%20the%20even%20numbers%20in%20a%20list.%22%22%22%0A%20%20%20%20result%20%3D%200%0A%0A%20%20%20%20for%20number%20in%20numbers%3A%0A%20%20%20%20%20%20%20%20if%20number%20%25%202%20%3D%3D%200%3A%0A%20%20%20%20%20%20%20%20%20%20%20%20result%20%3D%20result%20%2B%20number%0A%0A%20%20%20%20return%20result%0A%0Atotal%20%3D%20add_evens%28numbers%29&cumulative=false&curstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) again. You should notice how there are two variables by the name `numbers` in memory. Python manages the memory with a concept called **namespaces** or **scopes**, which are just fancy terms for saying that Python can tell variables from different contexts apart."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"numbers = [1, 2, 3, 4]"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"6"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"add_evens(numbers)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To re-use the *same* instructions with *different* input, we call the function a second time and give it a brand-new `list` of numbers as its input."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"30"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"add_evens([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note how the variable `result` only exists \"inside\" the `add_evens()` function. Hence, we see the `NameError` here."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"ename": "NameError",
"evalue": "name 'result' is not defined",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[5], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mresult\u001b[49m\n",
"\u001b[0;31mNameError\u001b[0m: name 'result' is not defined"
]
}
],
"source": [
"result"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Built-in Functions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The concept of re-usable functions is so important in programming that Python comes with many [built-in functions <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html).\n",
"\n",
"Two popular examples are the [sum() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#sum) and [len() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#len) functions that calculate the sum or the number of elements in a `list`."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"10"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sum(numbers)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(numbers)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When working with numbers, the [round() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#round) function rounds `float`ing-point numbers (i.e., real numbers in the mathematical sense) into `int`egers. `float`s are numbers containing a `.` somewhere in its digits."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"7"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"round(7.1)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"8"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"round(7.9)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The [round() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#round) function takes a second input called `ndigits` that allows us to customize the rounding even further. Then, [round() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#round) returns a `float`ing-point number!"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"7.1"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"round(7.123, 1)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"7.12"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"round(7.123, 2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we saw before, the [print() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#print) function simply \"prints\" out its input to the screen. Below is the popular \"Hello World\" example that is shown in almost any introduction text on any programming language. The double quotes `\"` are a delimiter that specifies anything in between them as **textual data**. The docstring above in the tripple-double quotes notation is just a special case allowing the text to span several lines.\n",
"\n",
"The quotes themselves are *not* part of the value. So, they are *not* printed out."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Hello World\n"
]
}
],
"source": [
"print(\"Hello World\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Single quotes `'` are synonyms for double quotes `\"`."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Hello World\n"
]
}
],
"source": [
"print('Hello World')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The [print() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#print) function is often helpful to **debug** a code snippet (i.e., trying to figure out what it does, step by step)."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The square of 1 is 1\n",
"The square of 2 is 4\n",
"The square of 3 is 9\n",
"The square of 4 is 16\n"
]
}
],
"source": [
"for number in numbers:\n",
" square = number**2\n",
" print(\"The square of\", number, \"is\", square)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The Standard Library"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the Python community, we even say that \"Python comes with batteries included,\" meaning that a plain Python installation (like the one you are probably using to execute this notebook) offers all kinds of functionalities for a multitude of application domains. Thus, the name **general purpose** language.\n",
"\n",
"To \"enable\" most of these, however, we need to first **import** them from the so-called [standard library <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/index.html). Let's do a quick example here and look at the [random <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/random.html) module that provides functionalities to simulate and work with random numbers."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"import random"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
2024-07-15 11:26:43 +02:00
"To access a function inside the [random <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/random.html) module, for example, the [seed() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/random.html#random.seed) function, we use the `.` operator, formally called the attribute access operator. \n",
"\n",
"We use [random.seed() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/random.html#random.seed) to make the random numbers *replicable* on separate runs of this notebook."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
2024-07-15 11:26:43 +02:00
"outputs": [],
"source": [
"random.seed(42)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The [random() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/random.html#random.random) function simply returns a random decimal number between `0` and `1`."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
2024-07-15 11:26:43 +02:00
"0.6394267984578837"
]
},
2024-07-15 11:26:43 +02:00
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"random.random()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It could be used, for example, to model a fair coin toss by comparing the number it returns to `0.5` with the `<` operator: In 50% of the cases we see `True` and in the other 50% `False`."
]
},
{
"cell_type": "code",
2024-07-15 11:26:43 +02:00
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
2024-07-15 11:26:43 +02:00
"True"
]
},
2024-07-15 11:26:43 +02:00
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"random.random() < 0.5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A second example would be the [choice() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/random.html#random.choice) function, which draws a random element from a `list` with replacement. We could use it to model a fair die."
]
},
{
"cell_type": "code",
2024-07-15 11:26:43 +02:00
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"3"
]
},
2024-07-15 11:26:43 +02:00
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"random.choice([1, 2, 3, 4, 5, 6])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the \"*Extending Core Python with Third-party Packages*\" section in the next chapter, we see how Python can be extended even further by installing and importing **third-party packages**."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "intro-to-data-science",
"language": "python",
"name": "intro-to-data-science"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.4"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}