Split Chapter 0 notebooks into smaller chunks

- make many small notebooks out of one big one
- introduce folder structure for all chapters
- add a bit more content to Chapter 0
- streamline text in Chapter 0
- adjust table of contents
This commit is contained in:
Alexander Hess 2021-10-03 16:28:26 +02:00
parent 33f2e74c64
commit de43b2a679
Signed by: alexander
GPG key ID: 344EA5AB10D868E0
21 changed files with 2702 additions and 1974 deletions

File diff suppressed because it is too large Load diff

View file

@ -0,0 +1,571 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note**: Click on \"*Kernel*\" > \"*Restart Kernel and Clear All Outputs*\" in [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) *before* reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it [in the cloud <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_mb.png\">](https://mybinder.org/v2/gh/webartifex/intro-to-data-science/main?urlpath=lab/tree/00_python_in_a_nutshell/00_content_arithmetic.ipynb)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Chapter 0: Python in a Nutshell (Part 1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Python itself is a so-called **general purpose** programming language. That means it does *not* know about any **scientific algorithms** \"out of the box.\"\n",
"\n",
"The purpose of this notebook is to summarize anything that is worthwhile knowing about Python and programming on a \"high level\" and lay the foundation for working with so-called **third-party libraries**, some of which we see in subsequent chapters."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic Arithmetic"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Any computer can always be viewed as some sort of a \"fancy calculator\" and Python is no exception from that. The following code snippet, for example, does exactly what we expect it would, namely *addition*."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"3"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"1 + 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In terms of **syntax** (i.e., \"grammatical rules\"), digits are interpreted as plain numbers (i.e., a so-called **numerical literal**) and the `+` symbol consitutes a so-called **operator** that is built into Python.\n",
"\n",
"Other common operators are `-` for *subtraction*, `*` for *multiplication*, and `**` for *exponentiation*. In terms of arithmetic, Python allows the **chaining** of operations and adheres to conventions from math, namely the [PEMDAS rule <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_wiki.png\">](https://en.wikipedia.org/wiki/Order_of_operations#Mnemonics)."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"45"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"87 - 42"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"15"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"3 * 5"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"8"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"2 ** 3"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"16"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"2 * 2 ** 3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To change the **order of precedence**, parentheses may be used for grouping. Syntactically, they are so-called **delimiters** that mark the beginning and the end of a **(sub-)expression** (i.e., a group of symbols that are **evaluated** together)."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"64"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(2 * 2) ** 3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We must beware that some operators do *not* do what we expect. So, the following code snippet is *not* an example of exponentiation."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"2 ^ 3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Division* is also not as straighforward as we may think!\n",
"\n",
"While the `/` operator does *ordinary division*, we must note the subtlety of the `.0` in the result."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4.0"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"8 / 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Whereas both `4` and `4.0` have the *same* **semantic meaning** to us humans, they are two *different* \"things\" for a computer!\n",
"\n",
"Instead of using a single `/`, we may divide with a double `//` just as well."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"8 // 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However, then we must be certain that the result is not a number with decimals other than `.0`. As we can guess from the result below, the `//` operator does *integer division* (i.e., \"whole number\" division)."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"3"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"7 // 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"On the contrary, the `%` operator implements the so-called *modulo division* (i.e., \"rest\" division). Here, a result of `0` indicates that a number is divisible by another one whereas any result other than `0` shows the opposite."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"7 % 2"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"8 % 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What makes Python such an intuitive and thus beginner-friendly language, is the fact that it is a so-called **[interpreted language <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_wiki.png\">](https://en.wikipedia.org/wiki/Interpreter_%28computing%29)**. In layman's terms, this means that we can go back up and *re-execute* any of the code cells in *any order*: That allows us to built up code *incrementally*. So-called **[compiled languages <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_wiki.png\">](https://en.wikipedia.org/wiki/Compiler)**, on the other hand, would require us to run a program in its entirety even if only one small part has been changed.\n",
"\n",
"Instead of running individual code cells \"by hand\" and taking the result as it is, Python offers us the usage of **variables** to store \"values.\" A variable is created with the single `=` symbol, the so-called **assignment statement**."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"a = 1"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"b = 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After assignment, we can simply ask Python about the values of `a` and `b`."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"b"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Similarly, we can use a variable in place of, for example, a numerical literal within an expression."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"3"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a + b"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Also, we may combine several lines of code into a single code cell, adding as many empty lines as we wish to group the code. Then, all of the lines are executed from top to bottom in linear order whenever we execute the cell as a whole."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"3"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = 1\n",
"b = 2\n",
"\n",
"a + b"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Something that fools many beginners is the fact that the `=` statement is *not* to be confused with the concept of an *equation* from math! An `=` statement is *always* to be interpreted from right to left.\n",
"\n",
"The following code snippet, for example, takes the \"old\" value of `a`, adds the value of `b` to it, and then stores the resulting `3` as the \"new\" value of `a`. After all, a variable is called a variable as its value is indeed variable!"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"a = a + b"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"3"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In general, the result of some expression involving variables is often stored in yet another variable for further processing. This is how more realistic programs are built up."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"3"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = 1\n",
"b = 2\n",
"\n",
"c = a + b\n",
"\n",
"c"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.12"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View file

@ -0,0 +1,858 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note**: Click on \"*Kernel*\" > \"*Restart Kernel and Clear All Outputs*\" in [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) *before* reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it [in the cloud <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_mb.png\">](https://mybinder.org/v2/gh/webartifex/intro-to-data-science/main?urlpath=lab/tree/00_python_in_a_nutshell/02_content_logic.ipynb)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Chapter 0: Python in a Nutshell (Part 2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the previous section, we only looked at **scalars** (i.e., a variable referencing one number at a time). However, that is not the only kind of data a computer can hold in its memory. In the section below, we look at how computers process many numbers in a generic fashion."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Non-Scalar Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As most real-life projects involve *non-scalar* data, we take a pre-liminary look at how Python models `list`-like data next. Intuitively, a `list` can be thought of as a **container** holding many \"things.\"\n",
"\n",
"The syntax to create a `list` are brackets, `[` and `]`, another example of delimiters, listing the individual **elements** of the `list` in between them, separated by commas.\n",
"\n",
"For example, the next code snippet creates a `list` named `numbers` with the numbers `1`, `2`, `3`, and `4` in it."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[1, 2, 3, 4]"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numbers = [1, 2, 3, 4]\n",
"\n",
"numbers"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Whenever we use any kind of delimiter, we may break the lines in between them as we wish and add other so-called **whitespace** characters like spaces to format the way the code looks like. So, the following two code cells do *exactly* the same as the previous one, even the `,` after the `4` in the second cell is ignored."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[1, 2, 3, 4]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numbers = [\n",
" 1, 2, 3, 4\n",
"]\n",
"\n",
"numbers"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[1, 2, 3, 4]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numbers = [\n",
" 1,\n",
" 2,\n",
" 3,\n",
" 4,\n",
"]\n",
"\n",
"numbers"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A nice thing to know is that JupyterLab comes with **tab completion** built in. That means we do not have to type out the name `numbers` as a whole. Try it out by simply typing `num` and then hit the tab key on your keyboard. JupyterLab should complete the variable into `numbers`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"num"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A natural operation to do with `list`s is to **access** its elements. That is achieved with another operator that also uses a bracket notation. Each element is associated with an **index**, which is why we say that we \"index into a `list`.\" As with many other programming languages, Python is 0-based, which simply means that whenever we count something, we start to count at `0`.\n",
"\n",
"For example, to obtain the first element in `numbers`, we write the following."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numbers[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that the indexing operation implicitly assumes an **order** among the elements, which is quite intuitive as we specified the numbers in order above.\n",
"\n",
"Another implicit assumption behind `list`s is that the number of elements is *finite*. Because of that, we may use negative indices starting at `-1` to obtain an element in right-to-left order.\n",
"\n",
"So, to obtain the last element in `numbers`, we write the following."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numbers[-1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`list` objects are **mutable**: We may change *parts* of them *after* they are created. That behavior is *not* a given for many other **types** of objects.\n",
"\n",
"For example, to exchange the first and the last element in `numbers`, we assign new objects to an index."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"numbers[0] = 4"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"numbers[3] = 1"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[4, 2, 3, 1]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numbers"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To \"flip\" the value of two variables or indexes, we may also use the following notation."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"numbers[0], numbers[3] = numbers[3], numbers[0]"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[1, 2, 3, 4]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numbers"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Expressing Business Logic"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The main point of using `list`s in Python is to write code that does \"something\" for each element in the `list`, which may hold big amounts of data. Expressing the logic of a problem from the real world in code, the \"something\" part, is subsumed by the term [business logic <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_wiki.png\">](https://en.wikipedia.org/wiki/Business_logic), which has *nothing* to do with businesses that make money.\n",
"\n",
"There are two aspects to business logic:\n",
"1. Execute some lines of code many times, and\n",
"2. execute some lines of code only if a certain **condition** applies.\n",
"\n",
"Both of these aspects come in many variants and may be combined in basically any arbitrary fashion."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Iterative Execution"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Iteration** is the generic idea of executing code repeatedly. Most programming languages provide dedicated constructs to achieve that. In Python, the easiest such construct is the so-called `for`-loop."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### The `for` Loop"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A `for`-loop consists of two parts:\n",
"- a **header** line specifying what is looped over, and\n",
"- a **body** consisting of the **block of code** that is repeated for each element.\n",
"\n",
"In the example below, `for number in numbers:` constitutes the header. The expression after the `in` references the \"thing\" that is looped over (here: a `list` of `numbers`) and the name between `for` and `in` becomes a variable that is assigned a new value in each **iteration** over of the loop. A best practice is to use a meaingful name, which is why we choose the singular `number`. The `:` at the end is the charactistic symbol of a header line in general and requires the next line (and possibly many more lines) to be **indented**.\n",
"\n",
"The indented line constitues the `for`-loop's body. In the example, we simply take each of the numbers in `numbers`, one at a time, and add it to a `total` that is initialized at `0`. In other words, we calculate the sum of all the elements in `numbers`.\n",
"\n",
"Many beginners struggle with the term \"loop.\" To visualize the looping behavior of this code, we use the online tool [PythonTutor <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](http://pythontutor.com/visualize.html#code=numbers%20%3D%20%5B1,%202,%203,%204%5D%0A%0Atotal%20%3D%200%0A%0Afor%20number%20in%20numbers%3A%0A%20%20%20%20total%20%3D%20total%20%2B%20number%0A%0Atotal&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false). That tool is helpful for two reasons:\n",
"1. It allows us to execute code in \"slow motion\" (i.e., by clicking the \"next\" button on the left side, only the next atomic step of the code snippet is executed).\n",
"2. It shows what happens inside the computer's memory on the right-hand side (cf., the \"*Thinking like a Computer*\" section further below)."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"10"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"total = 0\n",
"\n",
"for number in numbers:\n",
" total = total + number\n",
"\n",
"total"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Python is pretty agnostic about how far the `for`-loop's body is indented. So, both of the next code cells are equivalent to the one above. Yet, a popular convention in the Python world is to always indent code with 4 spaces per indentation level."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"10"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"total = 0\n",
"\n",
"for number in numbers:\n",
" total = total + number\n",
"\n",
"total"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"10"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"total = 0\n",
"\n",
"for number in numbers:\n",
" total = total + number\n",
"\n",
"total"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Conditional Execution"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a variation, let's add up only the even numbers. To achieve that, we exploit the fact that even numbers are all numbers that are divisible by `2` and use the `%` operator from before and a new one, namely the `==` operator for *equality comparison*, to express that idea."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"7 % 2"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"8 % 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Whenever *arithmetic* operators like `%` are combined with *relational* operators like `==`, the arithmetic ones are evaluated first. So, in the two cells below, we first obtain the rest after dividing `7` and `8` by `2` and then compare that to `0`. The result is a so-called **boolean**, either `True` or `False`, which is a computer's way of saying \"yes\" or \"no.\""
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"7 % 2 == 0"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"8 % 2 == 0"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Other relational operators are `!=` to test inequality and `<`, `<=`, `>`, and `>=` to check wether the left or right side is smaller or larger."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### The `if` Statement"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We use such kind of expressions as the **condition** in an `if` statement that constitutes a second layer within our `for`-loop implementation. An `if` statement itself consists of yet another header line with a body. That body's code is only executed if the condition is `True`.\n",
"\n",
"As an example, the next code snippet loops over all the elements in `numbers` and, for each individual `number`, checks if it is even. Only if that is the case, the `number` is added to the `total`. Otherwise, nothing is done with the `number`. The example also shows how we can add so-called **comments** at the end of a line: Anything that comes after the `#` symbol is disregarded by Python. We use such comments to put little notes to ourselves within the code."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"6"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"total = 0\n",
"\n",
"for number in numbers:\n",
" if number % 2 == 0: # if the number is even\n",
" total = total + number\n",
"\n",
"total"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### The `else` Clause"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`if` statements may have more than one header line: For example, the code in the `else`-clause's body is only executed if the condition in the `if`-clause is `False`. In the code cell below, we calculate the sum of all even numbers and subtract the sum of all odd numbers. The result is `(2 + 4) - (1 + 3)`, or `-1 + 2 - 3 + 4` resembling the order of the numbers in the `for`-loop."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"total = 0\n",
"\n",
"for number in numbers:\n",
" if number % 2 == 0: # if the number is even\n",
" total = total + number\n",
" else: # if the number is odd\n",
" total = total - number\n",
"\n",
"total"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A **function** (cf., the \"*Built-in Functions*\" section further below) that comes in handy with `for`-loops is [print() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#print), which simply \"prints\" out (i.e., \"shows on the screen\") whatever **input** we give it.\n",
"\n",
"In the example next, we loop over the numbers from `1` to `10` and print out either half a `number` or three times a `number` plus 1 depending on the `number` being even or odd."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"4\n",
"1\n",
"10\n",
"2\n",
"16\n",
"3\n",
"22\n",
"4\n",
"28\n",
"5\n"
]
}
],
"source": [
"for number in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]:\n",
" if number % 2 == 0:\n",
" print(number // 2)\n",
" else:\n",
" print(3 * number + 1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To save ourselves writing out all the numbers, we may also use the [range() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#func-range) built-in, which, in the example, takes two inputs separated by comma: A `start` number that is included and a `stop` number that is *not* included."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"4\n",
"1\n",
"10\n",
"2\n",
"16\n",
"3\n",
"22\n",
"4\n",
"28\n",
"5\n"
]
}
],
"source": [
"for number in range(1, 11):\n",
" if number % 2 == 0:\n",
" print(number // 2)\n",
" else:\n",
" print(3 * number + 1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### The `elif` Clause"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we need to check for *several* **alternatives** (i.e., different conditions), we may add an arbitrary number of `elif`-clauses to an `if` statement.\n",
"\n",
"In the next example, we print out messages indicating the *largest* whole number by which a `number` may be divided.\n",
"\n",
"Note that [print() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#print) may take several inputs as well. The `\"...\"` notation is Python's way of modeling **textual data**."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1 is divisible by neither 2 nor 3\n",
"2 is divisible by 2\n",
"3 is divisible by 3\n",
"4 is divisible by 2\n",
"5 is divisible by neither 2 nor 3\n",
"6 is divisible by 2\n",
"7 is divisible by neither 2 nor 3\n",
"8 is divisible by 2\n",
"9 is divisible by 3\n",
"10 is divisible by 2\n"
]
}
],
"source": [
"for number in range(1, 11):\n",
" if number % 2 == 0:\n",
" print(number, \"is divisible by 2\")\n",
" elif number % 3 == 0:\n",
" print(number, \"is divisible by 3\")\n",
" else:\n",
" print(number, \"is divisible by neither 2 nor 3\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is noteworthy that only the *first* block of code whose condition is `True` is executed!\n",
"\n",
"So, we must be careful not to make any logical errors: In the example below, we *never* reach the alternative where the `number` is divisible by `4` because whenever a `number` is divisible by `4` it is also always divisible by `2` as well."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1 is divisible by neither 2, 3, nor 4\n",
"2 is divisible by 2\n",
"3 is divisible by 3\n",
"4 is divisible by 2\n",
"5 is divisible by neither 2, 3, nor 4\n",
"6 is divisible by 2\n",
"7 is divisible by neither 2, 3, nor 4\n",
"8 is divisible by 2\n",
"9 is divisible by 3\n",
"10 is divisible by 2\n"
]
}
],
"source": [
"for number in range(1, 11):\n",
" if number % 2 == 0:\n",
" print(number, \"is divisible by 2\")\n",
" elif number % 3 == 0:\n",
" print(number, \"is divisible by 3\")\n",
" elif number % 4 == 0:\n",
" print(number, \"is divisible by 4\")\n",
" else:\n",
" print(number, \"is divisible by neither 2, 3, nor 4\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"By re-arranging the order of the `if`- and `elif`- clauses, we obtain the correct output."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1 is divisible by neither 2, 3, nor 4\n",
"2 is divisible by 2\n",
"3 is divisible by 3\n",
"4 is divisible by 4\n",
"5 is divisible by neither 2, 3, nor 4\n",
"6 is divisible by 3\n",
"7 is divisible by neither 2, 3, nor 4\n",
"8 is divisible by 4\n",
"9 is divisible by 3\n",
"10 is divisible by 2\n"
]
}
],
"source": [
"for number in range(1, 11):\n",
" if number % 4 == 0:\n",
" print(number, \"is divisible by 4\")\n",
" elif number % 3 == 0:\n",
" print(number, \"is divisible by 3\")\n",
" elif number % 2 == 0:\n",
" print(number, \"is divisible by 2\")\n",
" else:\n",
" print(number, \"is divisible by neither 2, 3, nor 4\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.12"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View file

@ -0,0 +1,540 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note**: Click on \"*Kernel*\" > \"*Restart Kernel and Clear All Outputs*\" in [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) *before* reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it [in the cloud <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_mb.png\">](https://mybinder.org/v2/gh/webartifex/intro-to-data-science/main?urlpath=lab/tree/00_python_in_a_nutshell/05_content_functions.ipynb)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Chapter 0: Python in a Nutshell (Part 3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One big idea in software engineering is to **modularize** code. The purpose of that is manyfold. Two very important motivations are:\n",
"- make a code block **re-usable** and\n",
"- give it a meaningful name.\n",
"\n",
"The latter gets more important as the codebase in a project grows so big that we can only look at a tiny fraction of it at one point in time."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## User-defined Functions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One syntactical construct to achieve modularization is that of a **function definition**. Just like in math, we can \"define\" a function as a set of parametrized instructions that provide some **output** given some **input**.\n",
"\n",
"A function is defined with the `def` statement: After the `def` part comes the name of the function followed by the **parameter list** within parentheses. The first couple of lines in the function's body should be a so-called **docstring** that describes what the function does in plain English. Then, comes the code that is to be made repeatable. In the example below, we simply copy & pasted the code to calculate the sum of all even numbers in a `list` into the example function `add_evens()`. Note that we exchanged the variable name `total` with `result` here to illustrate a point further below. In order for the function to provide back the output to \"the outside world,\" we use the `return` statement (Hint: to see its effect simply re-run the couple of code cells below with and without the `return result` line)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"def add_evens(numbers):\n",
" \"\"\"Sum up all the even numbers in a list.\n",
"\n",
" Args:\n",
" numbers (list of int's): numbers to be summed up\n",
"\n",
" Returns:\n",
" total (int)\n",
" \"\"\"\n",
" result = 0\n",
"\n",
" for number in numbers:\n",
" if number % 2 == 0: # if the number is even\n",
" result = result + number\n",
"\n",
" return result"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After defining a function, we can **call** (i.e., \"execute\") it with the `()` operator.\n",
"\n",
"Let's execute the function with `numbers` as the input. We see the same `6` below the cell as we do above where we run the code without a function. Without the `return` statement in the function's body, we would not see any output here.\n",
"\n",
"To see what happens in detail, take a look at [PythonTutor <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://pythontutor.com/visualize.html#code=numbers%20%3D%20%5B1,%202,%203,%204%5D%0A%0Adef%20add_evens%28numbers%29%3A%0A%20%20%20%20%22%22%22Sum%20up%20all%20the%20even%20numbers%20in%20a%20list.%22%22%22%0A%20%20%20%20result%20%3D%200%0A%0A%20%20%20%20for%20number%20in%20numbers%3A%0A%20%20%20%20%20%20%20%20if%20number%20%25%202%20%3D%3D%200%3A%0A%20%20%20%20%20%20%20%20%20%20%20%20result%20%3D%20result%20%2B%20number%0A%0A%20%20%20%20return%20result%0A%0Atotal%20%3D%20add_evens%28numbers%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) again. You should notice how there are two variables by the name `numbers` in memory. Python manages the memory with a concept called **namespaces** or **scopes**, which are just fancy terms for saying that Python can tell variables from different contexts apart."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"numbers = [1, 2, 3, 4]"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"6"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"add_evens(numbers)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To re-use the *same* instructions with *different* input, we call the function a second time and give it a brand-new `list` of numbers as its input."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"30"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"add_evens([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note how the variable `result` only exists \"inside\" the `add_evens()` function. Hence, we see the `NameError` here."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"ename": "NameError",
"evalue": "name 'result' is not defined",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m/tmp/user/1000/ipykernel_305654/1049141082.py\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mresult\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mNameError\u001b[0m: name 'result' is not defined"
]
}
],
"source": [
"result"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Built-in Functions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The concept of re-usable functions is so important in programming that Python comes with many [built-in functions <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html).\n",
"\n",
"Two popular examples are the [sum() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#sum) and [len() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#len) functions that calculate the sum or the number of elements in a `list`."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"10"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sum(numbers)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(numbers)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When working with numbers, the [round() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#round) function rounds `float`ing-point numbers (i.e., real numbers in the mathematical sense) into `int`egers. `float`s are numbers containing a `.` somewhere in its digits."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"7"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"round(7.1)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"8"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"round(7.9)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The [round() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#round) function takes a second input called `ndigits` that allows us to customize the rounding even further. Then, [round() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#round) returns a `float`ing-point number!"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"7.1"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"round(7.123, 1)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"7.12"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"round(7.123, 2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we saw before, the [print() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#print) function simply \"prints\" out its input to the screen. Below is the popular \"Hello World\" example that is shown in almost any introduction text on any programming language. The double quotes `\"` are a delimiter that specifies anything in between them as **textual data**. The docstring above in the tripple-double quotes notation is just a special case allowing the text to span several lines.\n",
"\n",
"The quotes themselves are *not* part of the value. So, they are *not* printed out."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Hello World\n"
]
}
],
"source": [
"print(\"Hello World\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Single quotes `'` are synonyms for double quotes `\"`."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Hello World\n"
]
}
],
"source": [
"print('Hello World')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The [print() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#print) function is often helpful to **debug** a code snippet (i.e., trying to figure out what it does, step by step)."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The square of 1 is 1\n",
"The square of 2 is 4\n",
"The square of 3 is 9\n",
"The square of 4 is 16\n"
]
}
],
"source": [
"for number in numbers:\n",
" square = number ** 2\n",
" print(\"The square of\", number, \"is\", square)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Extending Core Python with the Standard Library"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the Python community, we even say that \"Python comes with batteries included,\" meaning that a plain Python installation (like the one you are probably using to execute this notebook) offers all kinds of functionalities for a multitude of application domains. Thus, the name **general purpose** language.\n",
"\n",
"To \"enable\" most of these, however, we need to first **import** them from the so-called [standard library <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/index.html). Let's do a quick example here and look at the [random <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/random.html) module that provides functionalities to simulate and work with random numbers."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"import random"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To access a function inside the [random <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/random.html) module, for example, the [random() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/random.html#random.random) function, we use the `.` operator, formally called the attribute access operator. The [random() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/random.html#random.random) function simply returns a random decimal number between `0` and `1`."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.44374384200665107"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"random.random()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It could be used, for example, to model a fair coin toss by comparing the number it returns to `0.5` with the `<` operator: In 50% of the cases we see `True` and in the other 50% `False`."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"random.random() < 0.5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A second example would be the [choice() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/random.html#random.choice) function, which draws a random element from a `list` with replacement. We could use it to model a fair die."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"3"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"random.choice([1, 2, 3, 4, 5, 6])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the \"*Extending Core Python with Third-party Packages*\" section in the next chapter, we see how Python can be extended even further by installing and importing **third-party packages**."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.12"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View file

@ -0,0 +1,708 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note**: Click on \"*Kernel*\" > \"*Restart Kernel and Clear All Outputs*\" in [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) *before* reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it [in the cloud <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_mb.png\">](https://mybinder.org/v2/gh/webartifex/intro-to-data-science/main?urlpath=lab/tree/00_python_in_a_nutshell/07_content_data_types.ipynb)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Chapter 0: Python in a Nutshell (Part 4)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"An important skill for any data scientist is to learn to \"think\" like a computer does. So far, we have seen that Python is a pretty \"intuitive\" language: Many concepts can already be understood after seeing them once or just a couple of times. Many of the aspects that make other languages harder to learn, are somehow \"magically\" automated by Python in the background, most notably the management of the memory.\n",
"\n",
"This section introduces a couple of more \"advanced\" concepts that presumably are *not* so intuitive to beginners."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## \"Simple\" Data Types"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"At first, let's review the concept of **object-orientation**, which is the paradigm by which Python manages the memory.\n",
"\n",
"Take the following three examples. Whereas `a` and `b` have the same **value** (i.e., **semantic meaning**) to us humans, we see in this section that there are a couple of caveats to look out for."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"a = 42\n",
"b = 42.0\n",
"c = 42.87"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"An important idea to understand is that each of the right-hand sides lead to a *new* **object** being created in the computer's memory *first*. An object can be thought of as a \"box\" in memory holding $1$s and $0$s (i.e., physical energy flows inside the computer).\n",
"\n",
"Objects can and do exist without being **referenced** by a variable. Also, an object may even have several variables referencing them, just as a human may have different names in different contexts (e.g., a formal name in the password, a name by which one is known to friends, and maybe a different name by which one is called by one's spouse).\n",
"\n",
"In the example, while both `a` and `b` have the *same* value, they are two *distinct* objects. The `is` operator checks if the objects referenced by two variables are indeed the *same* one, or, in other words, have the same **identity**."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a == b"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a is b"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Every object always has some **data type**, which determines how the object behaves and what we can do with it. The types of `a` and `b` are `int` and `float`, respectively."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"int"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(a)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"float"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(b)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"While it seems cumbersome to analyze numbers at this level of detail, the following code cell shows how `float`ing-point numbers, one gold standard of numbers in all of computer science and engineering, behave couter-intutive. Yet, *nothing* is wrong here."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"0.1 + 0.2 == 0.3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The data type of an object also determines which **methods** we can invoke on it. A method is just a function that is \"attached\" to an object and can be accessed with the `.` operator seen above. A method necessarily needs the objects it is attached to as in input, which is why it is attached to an object to begin with.\n",
"\n",
"For example, `float` objects come with an `.is_integer()` method that tells us if the number has non-`0` decimals."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"b.is_integer()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"c.is_integer()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`int` objects on the contrary have no notion of the concept of decimals, which is why they do *not* have an `.is_integer()` method. That is what the `AttributeError` tells us."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"ename": "AttributeError",
"evalue": "'int' object has no attribute 'is_integer'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m/tmp/user/1000/ipykernel_306555/2418692311.py\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0ma\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mis_integer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mAttributeError\u001b[0m: 'int' object has no attribute 'is_integer'"
]
}
],
"source": [
"a.is_integer()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What we could do here, is to take `a` and pass it to the [float() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#float) built-in, a so-called **constructor**, which takes the value of its input and creates a *new* object of the desired `float` type. Yet, we know the answer to `aa.is_integer()` already, even without executing the code cell as `a` has no non-`0` decimals to begin with."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"aa = float(a)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"aa.is_integer()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's create another example `d` to see further examples of methods."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"d = \"Python rocks\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The type of `d` is `str`, which is short for \"**string**\" and is defined in computer science as a sequence of characters."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"str"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(d)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`str` objects support various methods that \"make sense\" in the context of textual data, for example, the `.lower()` and `.upper()` methods."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'python rocks'"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d.lower()"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'PYTHON ROCKS'"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d.upper()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### \"Complex\" Data Types"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The examples in the previous section are considered \"simple\" as they only model *scalar* values (i.e., an individual object per example). However, we have already seen an example of a more \"complex\" object, namely the `list` called `numbers` from before."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"numbers = [1, 2, 3, 4]"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"list"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(numbers)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[1, 2, 3, 4]"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numbers"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`list` objects also come with specific methods on them, for example, the `.append()` method that adds another element at the end of a `list`."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"numbers.append(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note how the `.append()` method does not lead to any output below the code cell. That is an indication that `numbers` is \"changed in place.\" The formal term for this property is **mutability**. A good working definition is: Any object whose value can be changed *after* its creation, is a **mutable** objects. Objects *without* this property are called **immutable**.\n",
"\n",
"An example for the latter, is the `tuple` data type. `tuple`s are simply `list`s with the additional property that they cannot be changed. Everything is else is the same as for `list`s. `tuple`s are created with parentheses replacing the brackets."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"more_numbers = (7, 8, 9)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`more_numbers` does not know about the `.append()` method."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"ename": "AttributeError",
"evalue": "'tuple' object has no attribute 'append'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m/tmp/user/1000/ipykernel_306555/2667408552.py\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mmore_numbers\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m10\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mAttributeError\u001b[0m: 'tuple' object has no attribute 'append'"
]
}
],
"source": [
"more_numbers.append(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Whereas both `list` and `tuple` objects perserve the **order** of their elements, the `set` data type does not. Additionally, any object may only be an element of a `set` at most once. The syntax to create `set`s are curly braces, `{` and `}`. By giving up order, `set` objects offer significantly increased processing speed in various situations."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"other_numbers = {3, 2, 1, 3, 3, 2}"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{1, 2, 3}"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"other_numbers"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One last example of a \"complex\" data type is the `dict`ionary type, which models a mapping relationship among the objects it contains. The syntax to create `dict`s also involves curly braces with the additon of using a `:` to specify the mapping relationships.\n",
"\n",
"For example, to map `int`egers to `str`ings modeling the English words corresponding to the numbers, we could write the following. The objects to the left of the `:` take the role of the **keys** while the ones to the right take the role of the **values**."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"to_words = {\n",
" 0: \"zero\",\n",
" 1: \"one\",\n",
" 2: \"two\",\n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The main purpose of `dict`s is to **look up** the value mapped to by some key. We can use the indexing notion to achieve that."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'zero'"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"to_words[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Looking up the values can *not* be done as the `KeyError` below shows."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"ename": "KeyError",
"evalue": "'zero'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m/tmp/user/1000/ipykernel_306555/3320204082.py\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mto_words\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"zero\"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mKeyError\u001b[0m: 'zero'"
]
}
],
"source": [
"to_words[\"zero\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Instead, we would have to create a `dict` mapping the words to numbers, like the one below."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"to_numbers = {\n",
" \"zero\": 0,\n",
" \"one\": 1,\n",
" \"two\": 2,\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"to_numbers[\"zero\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`dict`s are among the most optimized data type in the Python world and a major building block in codebases solving real-life problems."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A big factor in getting good at any programming language is to learn what data types to use in which situations. There is no \"best\" data type; choosing among a couple of data types always comes down to trade-offs."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.12"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View file

@ -13,8 +13,20 @@
"source": [
"Python itself does not come with any scientific algorithms. However, over time, many third-party libraries emerged that are useful to build machine learning applications. In this context, \"third-party\" means that the libraries are *not* part of Python's standard library.\n",
"\n",
"Among the popular ones are [numpy](https://numpy.org/) (numerical computations, linear algebra), [pandas](https://pandas.pydata.org/) (data processing), [matplotlib](https://matplotlib.org/) (visualisations), and [scikit-learn](https://scikit-learn.org/stable/index.html) (machine learning algorithms).\n",
"\n",
"Among the popular ones are [numpy](https://numpy.org/) (numerical computations, linear algebra), [pandas](https://pandas.pydata.org/) (data processing), [matplotlib](https://matplotlib.org/) (visualisations), and [scikit-learn](https://scikit-learn.org/stable/index.html) (machine learning algorithms)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Extending Core Python with Third-party Packages"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before we can import these libraries, we must ensure that they installed on our computers. If you installed Python via the Anaconda Distribution that should already be the case. Otherwise, we can use Python's **package manager** `pip` to install them manually.\n",
"\n",
"`pip` is a so-called command-line interface (CLI), meaning it is a program that is run within a terminal window. JupyterLab allows us to run such a CLI tool from within a notebook by starting a code cell with a single `%` symbol. Here, this does not mean Python's modulo operator but is just an instruction to JupyterLab that the following code is *not* Python."
@ -627,7 +639,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@ -641,7 +653,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.9"
"version": "3.8.12"
},
"toc": {
"base_numbering": 1,

View file

@ -1150,7 +1150,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@ -1164,7 +1164,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.9"
"version": "3.8.12"
},
"toc": {
"base_numbering": 1,

View file

Before

Width:  |  Height:  |  Size: 33 KiB

After

Width:  |  Height:  |  Size: 33 KiB

View file

Before

Width:  |  Height:  |  Size: 55 KiB

After

Width:  |  Height:  |  Size: 55 KiB

View file

Before

Width:  |  Height:  |  Size: 344 KiB

After

Width:  |  Height:  |  Size: 344 KiB

View file

Before

Width:  |  Height:  |  Size: 32 KiB

After

Width:  |  Height:  |  Size: 32 KiB

View file

Before

Width:  |  Height:  |  Size: 21 KiB

After

Width:  |  Height:  |  Size: 21 KiB

View file

Before

Width:  |  Height:  |  Size: 497 KiB

After

Width:  |  Height:  |  Size: 497 KiB

View file

Before

Width:  |  Height:  |  Size: 76 KiB

After

Width:  |  Height:  |  Size: 76 KiB

View file

Before

Width:  |  Height:  |  Size: 113 KiB

After

Width:  |  Height:  |  Size: 113 KiB

View file

Before

Width:  |  Height:  |  Size: 76 KiB

After

Width:  |  Height:  |  Size: 76 KiB

View file

Before

Width:  |  Height:  |  Size: 96 KiB

After

Width:  |  Height:  |  Size: 96 KiB

View file

Before

Width:  |  Height:  |  Size: 73 KiB

After

Width:  |  Height:  |  Size: 73 KiB

View file

@ -9,9 +9,13 @@ To learn about Python and programming in detail,
### Table of Contents
- *Chapter 0*: [Python in a Nutshell](00_python_in_a_nutshell.ipynb)
- *Chapter 1*: [Python's Scientific Stack](01_scientific_stack.ipynb)
- *Chapter 2*: [A first Example: Classifying Flowers](02_a_first_example.ipynb)
- *Chapter 0*: **Python in a Nutshell**
- *Content*: [Basic Arithmetic](00_python_in_a_nutshell/00_content_arithmetic.ipynb)
- *Content*: [Business Logic](00_python_in_a_nutshell/02_content_logic.ipynb)
- *Content*: [Functions](00_python_in_a_nutshell/05_content_functions.ipynb)
- *Content*: [Data Types](00_python_in_a_nutshell/07_content_data_types.ipynb)
- *Chapter 1*: [Python's Scientific Stack](01_scientific_stack/00_content.ipynb)
- *Chapter 2*: [A first Example: Classifying Flowers](02_classification/00_content.ipynb)
- *Chapter 3*: [Case Study: House Prices in Ames, Iowa <img height="12" style="display: inline-block" src="static/link/to_gh.png">](https://github.com/webartifex/ames-housing)

BIN
static/link/to_hn.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 632 B

BIN
static/link/to_mb.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 1.2 KiB