{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note**: Click on \"*Kernel*\" > \"*Restart Kernel and Clear All Outputs*\" in [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) *before* reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it [in the cloud ](https://mybinder.org/v2/gh/webartifex/intro-to-data-science/main?urlpath=lab/tree/00_python_in_a_nutshell/02_content_logic.ipynb)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Chapter 0: Python in a Nutshell (Part 2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the previous section, we only looked at **scalars** (i.e., a variable referencing one number at a time). However, that is not the only kind of data a computer can hold in its memory. In the section below, we look at how computers process many numbers in a generic fashion."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Non-Scalar Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As most real-life projects involve *non-scalar* data, we take a pre-liminary look at how Python models `list`-like data next. Intuitively, a `list` can be thought of as a **container** holding many \"things.\"\n",
"\n",
"The syntax to create a `list` are brackets, `[` and `]`, another example of delimiters, listing the individual **elements** of the `list` in between them, separated by commas.\n",
"\n",
"For example, the next code snippet creates a `list` named `numbers` with the numbers `1`, `2`, `3`, `4`, and `5` in it."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[1, 2, 3, 4, 5]"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numbers = [1, 2, 3, 4, 5]\n",
"\n",
"numbers"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Whenever we use any kind of delimiter, we may break the lines in between them as we wish and add other so-called **whitespace** characters like spaces to format the way the code looks like. So, the following two code cells do *exactly* the same as the previous one, even the `,` after the `5` in the second cell is ignored."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[1, 2, 3, 4, 5]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numbers = [\n",
" 1, 2, 3, 4, 5\n",
"]\n",
"\n",
"numbers"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[1, 2, 3, 4, 5]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numbers = [\n",
" 1,\n",
" 2,\n",
" 3,\n",
" 4,\n",
" 5,\n",
"]\n",
"\n",
"numbers"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A nice thing to know is that JupyterLab comes with **tab completion** built in. That means we do not have to type out the name `numbers` as a whole. Try it out by simply typing `num` and then hit the tab key on your keyboard. JupyterLab should complete the variable into `numbers`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"num"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Indexing & Slicing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A natural operation to do with `list`s is to **access** its elements. That is achieved with another operator that also uses a bracket notation. Each element is associated with an **index**, which is why we say that we \"index into a `list`.\" As with many other programming languages, Python is 0-based, which simply means that whenever we count something, we start to count at `0`.\n",
"\n",
"For example, to obtain the first element in `numbers`, we write the following."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numbers[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that the indexing operation implicitly assumes an **order** among the elements, which is quite intuitive as we specified the numbers in order above.\n",
"\n",
"Another implicit assumption behind `list`s is that the number of elements is *finite*. Because of that, we may use negative indices starting at `-1` to obtain an element in right-to-left order.\n",
"\n",
"So, to obtain the last element in `numbers`, we write the following."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"5"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numbers[-1]"
]
},
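{
"cell_type": "markdown",
"metadata": {},
"source": [
"Positive and negative indices are two ways of addressing the same positions: In a `list` with five elements, the indices `0` and `-5` both refer to the first element, and `4` and `-1` both refer to the last one. An index beyond either end, for example, `numbers[5]`, raises an `IndexError`. The small sketch below re-creates `numbers` with the same content so that the cell also runs on its own."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"numbers = [1, 2, 3, 4, 5]  # same content as above, re-created so that this cell runs on its own\n",
"\n",
"# the first element from the left and from the right, then the last element from the left and from the right\n",
"numbers[0], numbers[-5], numbers[4], numbers[-1]"
]
},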
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`list` objects are **mutable**: We may change *parts* of them *after* they are created. That behavior is *not* a given for many other **types** of objects.\n",
"\n",
"For example, to exchange the first and the last element in `numbers`, we assign new objects to an index."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"numbers[0] = 5"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"numbers[4] = 1"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[5, 2, 3, 4, 1]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numbers"
]
},
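{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small sketch of an *immutable* type, the cell below tries the same kind of assignment on textual data, which Python models with the `\"...\"` notation (its type is called `str`). Python refuses to change the text in place and raises a `TypeError`. The `try`-`except` construct is not covered in this chapter; here, it only catches the error so that we can look at its message. The names `text` and `message` are arbitrary choices."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"text = \"hello\"\n",
"\n",
"try:\n",
"    text[0] = \"j\"  # try to replace the first character in place\n",
"except TypeError as error:\n",
"    message = str(error)  # keep the error message so that we can look at it\n",
"\n",
"message"
]
},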
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To \"flip\" the value of two variables or indexes, we may also use the following notation."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"numbers[0], numbers[4] = numbers[4], numbers[0]"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[1, 2, 3, 4, 5]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numbers"
]
},
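{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same notation also works for two plain variables, which the `list` example above does not show. In this sketch, the names `a` and `b` are arbitrary."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = 1\n",
"b = 2\n",
"\n",
"a, b = b, a  # the right-hand side is evaluated first, then both names are assigned at once\n",
"\n",
"a, b"
]
},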
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a generalization, we may also **slice** out some elements in the `list`. That is done with the `[...]` notation as well. Yet, instead of a single integer index, we now provide a *start* and a *stop* index separated by a `:`. While the element corresponding to the *start* index is included, this is not the case for *stop*.\n",
"\n",
"For example, to slice out the middle three elements, we write the following."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[2, 3, 4]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numbers[1:4]"
]
},
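{
"cell_type": "markdown",
"metadata": {},
"source": [
"A slice is a *new* `list`, so changing it leaves `numbers` untouched. Also, unlike an index that is too large, which raises an `IndexError`, a *stop* index beyond the end of a `list` is allowed: The slice simply ends at the last element. The sketch below shows both; the name `middle` is an illustrative choice, and `numbers` is re-created with the same content so that the cell runs on its own."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"numbers = [1, 2, 3, 4, 5]  # same content as above, re-created so that this cell runs on its own\n",
"\n",
"middle = numbers[1:4]  # a new list holding 2, 3, and 4\n",
"middle[0] = 99  # changing the slice does not affect numbers\n",
"\n",
"numbers, middle, numbers[3:10]  # the stop index 10 lies beyond the end of numbers"
]
},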
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We may combine positive and negative indexes.\n",
"\n",
"So, the following yields the same result."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[2, 3, 4]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numbers[1:-1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"While ommitting the *start* index makes a slice begin at the first element, ..."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[1, 2, 3, 4]"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numbers[:-1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"... leaving out the *stop* index makes a slice go to the last element."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[2, 3, 4, 5]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numbers[1:]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Providing a third integer as the *step* value after another `:` makes a slice skip some elements.\n",
"\n",
"For example, `[1:-1:2]` means \"go from the second element (including) to the last element (excluding) and take every second element\" ..."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[2, 4]"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numbers[1:-1:2]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"... while `[::2]` simply downsamples the `list` by taking every other element."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[1, 3, 5]"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numbers[::2]"
]
},
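{
"cell_type": "markdown",
"metadata": {},
"source": [
"The *step* value may also be negative, in which case the slice runs from right to left. So, `[::-1]` yields the elements in reverse order, and `[::-2]` takes every other element, starting with the last one. As before, `numbers` is re-created with the same content so that the cell runs on its own."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"numbers = [1, 2, 3, 4, 5]  # same content as above, re-created so that this cell runs on its own\n",
"\n",
"numbers[::-1], numbers[::-2]  # reversed, and every other element from right to left"
]
},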
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Expressing Business Logic"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The main point of using `list`s in Python is to write code that does \"something\" for each element in the `list`, which may hold big amounts of data. Expressing the logic of a problem from the real world in code, the \"something\" part, is subsumed by the term [business logic ](https://en.wikipedia.org/wiki/Business_logic), which has *nothing* to do with businesses that make money.\n",
"\n",
"There are two aspects to business logic:\n",
"1. Execute some lines of code many times, and\n",
"2. execute some lines of code only if a certain **condition** applies.\n",
"\n",
"Both of these aspects come in many variants and may be combined in basically any arbitrary fashion."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Iterative Execution"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Iteration** is the generic idea of executing code repeatedly. Most programming languages provide dedicated constructs to achieve that. In Python, the easiest such construct is the so-called `for`-loop."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### The `for` Loop"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A `for`-loop consists of two parts:\n",
"- a **header** line specifying what is looped over, and\n",
"- a **body** consisting of the **block of code** that is repeated for each element.\n",
"\n",
"In the example below, `for number in numbers:` constitutes the header. The expression after the `in` references the \"thing\" that is looped over (here: a `list` of `numbers`) and the name between `for` and `in` becomes a variable that is assigned a new value in each **iteration** over of the loop. A best practice is to use a meaingful name, which is why we choose the singular `number`. The `:` at the end is the charactistic symbol of a header line in general and requires the next line (and possibly many more lines) to be **indented**.\n",
"\n",
"The indented line constitues the `for`-loop's body. In the example, we simply take each of the numbers in `numbers`, one at a time, and add it to a `total` that is initialized at `0`. In other words, we calculate the sum of all the elements in `numbers`.\n",
"\n",
"Many beginners struggle with the term \"loop.\" To visualize the looping behavior of this code, we use the online tool [PythonTutor ](http://pythontutor.com/visualize.html#code=numbers%20%3D%20%5B1,%202,%203,%204%5D%0A%0Atotal%20%3D%200%0A%0Afor%20number%20in%20numbers%3A%0A%20%20%20%20total%20%3D%20total%20%2B%20number%0A%0Atotal&cumulative=false&curstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false). That tool is helpful for two reasons:\n",
"1. It allows us to execute code in \"slow motion\" (i.e., by clicking the \"next\" button on the left side, only the next atomic step of the code snippet is executed).\n",
"2. It shows what happens inside the computer's memory on the right-hand side."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"15"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"total = 0\n",
"\n",
"for number in numbers:\n",
" total = total + number\n",
"\n",
"total"
]
},
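{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the looping behavior explicit without PythonTutor, the cell below writes out by hand what the `for`-loop above does, one line per iteration. This unrolled sketch only works because we know that `numbers` has exactly five elements, whereas the `for`-loop works for a `list` of any length."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"numbers = [1, 2, 3, 4, 5]  # same content as above, re-created so that this cell runs on its own\n",
"\n",
"total = 0\n",
"total = total + numbers[0]  # 1st iteration: total is now 1\n",
"total = total + numbers[1]  # 2nd iteration: total is now 3\n",
"total = total + numbers[2]  # 3rd iteration: total is now 6\n",
"total = total + numbers[3]  # 4th iteration: total is now 10\n",
"total = total + numbers[4]  # 5th iteration: total is now 15\n",
"\n",
"total"
]
},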
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Python is pretty agnostic about how far the `for`-loop's body is indented. So, both of the next code cells are equivalent to the one above. Yet, a popular convention in the Python world is to always indent code with 4 spaces per indentation level."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"15"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"total = 0\n",
"\n",
"for number in numbers:\n",
" total = total + number\n",
"\n",
"total"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"15"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"total = 0\n",
"\n",
"for number in numbers:\n",
" total = total + number\n",
"\n",
"total"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Conditional Execution"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a variation, let's add up only the even numbers. To achieve that, we exploit the fact that even numbers are all numbers that are divisible by `2` and use the `%` operator from before and a new one, namely the `==` operator for *equality comparison*, to express that idea."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"7 % 2"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"8 % 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Whenever *arithmetic* operators like `%` are combined with *relational* operators like `==`, the arithmetic ones are evaluated first. So, in the two cells below, we first obtain the rest after dividing `7` and `8` by `2` and then compare that to `0`. The result is a so-called **boolean**, either `True` or `False`, which is a computer's way of saying \"yes\" or \"no.\""
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"7 % 2 == 0"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"8 % 2 == 0"
]
},
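{
"cell_type": "markdown",
"metadata": {},
"source": [
"Adding parentheses around the arithmetic part makes this evaluation order explicit and does not change the results. The names `seven_is_even` and `eight_is_even` in this sketch are arbitrary."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"seven_is_even = (7 % 2) == 0  # 7 % 2 is 1, and 1 == 0 is False\n",
"eight_is_even = (8 % 2) == 0  # 8 % 2 is 0, and 0 == 0 is True\n",
"\n",
"seven_is_even, eight_is_even"
]
},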
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Other relational operators are `!=` to test inequality and `<`, `<=`, `>`, and `>=` to check wether the left or right side is smaller or larger."
]
},
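{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sketch with two arbitrary numbers, `x` and `y`: Each comparison evaluates to a boolean, and `<=` as well as `>=` also evaluate to `True` if both sides are equal."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x = 3\n",
"y = 5\n",
"\n",
"# 3 != 5, 3 < 5, 3 > 5, 3 <= 3, and 3 >= 3\n",
"x != y, x < y, x > y, x <= 3, x >= 3"
]
},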
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### The `if` Statement"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We use such kind of expressions as the **condition** in an `if` statement that constitutes a second layer within our `for`-loop implementation. An `if` statement itself consists of yet another header line with a body. That body's code is only executed if the condition is `True`.\n",
"\n",
"As an example, the next code snippet loops over all the elements in `numbers` and, for each individual `number`, checks if it is even. Only if that is the case, the `number` is added to the `total`. Otherwise, nothing is done with the `number`. The example also shows how we can add so-called **comments** at the end of a line: Anything that comes after the `#` symbol is disregarded by Python. We use such comments to put little notes to ourselves within the code."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"6"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"total = 0\n",
"\n",
"for number in numbers:\n",
" if number % 2 == 0: # if the number is even\n",
" total = total + number\n",
"\n",
"total"
]
},
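{
"cell_type": "markdown",
"metadata": {},
"source": [
"Any expression that evaluates to a boolean may serve as the condition. As a variation, the sketch below adds up only the *odd* numbers by testing for inequality with `!=`. Together with the `6` from the even numbers, the result of `9` makes up the overall total of `15`. Again, `numbers` is re-created with the same content so that the cell runs on its own."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"numbers = [1, 2, 3, 4, 5]  # same content as above, re-created so that this cell runs on its own\n",
"\n",
"total = 0\n",
"\n",
"for number in numbers:\n",
"    if number % 2 != 0:  # if the number is odd\n",
"        total = total + number\n",
"\n",
"total"
]
},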
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### The `else` Clause"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`if` statements may have more than one header line: For example, the code in the `else`-clause's body is only executed if the condition in the `if`-clause is `False`. In the code cell below, we calculate the sum of all even numbers and subtract the sum of all odd numbers. The result is `(2 + 4) - (1 + 3 + 5)`, or `-1 + 2 - 3 + 4 - 5` resembling the order of the numbers in the `for`-loop."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"-3"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"total = 0\n",
"\n",
"for number in numbers:\n",
" if number % 2 == 0: # if the number is even\n",
" total = total + number\n",
" else: # if the number is odd\n",
" total = total - number\n",
"\n",
"total"
]
},
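{
"cell_type": "markdown",
"metadata": {},
"source": [
"We may let Python double-check the arithmetic from the text above: Both ways of writing the calculation evaluate to `-3`, the result of the loop. The names `grouped` and `in_order` in this sketch are arbitrary."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"grouped = (2 + 4) - (1 + 3 + 5)  # sum of the even numbers minus sum of the odd numbers\n",
"in_order = -1 + 2 - 3 + 4 - 5  # the same terms in the order of the for-loop\n",
"\n",
"grouped, in_order, grouped == in_order"
]
},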
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A **function** (cf., the \"*Built-in Functions*\" section further below) that comes in handy with `for`-loops is [print() ](https://docs.python.org/3/library/functions.html#print), which simply \"prints\" out (i.e., \"shows on the screen\") whatever **input** we give it.\n",
"\n",
"In the example next, we loop over the numbers from `1` to `10` and print out either half a `number` or three times a `number` plus 1 depending on the `number` being even or odd."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"4\n",
"1\n",
"10\n",
"2\n",
"16\n",
"3\n",
"22\n",
"4\n",
"28\n",
"5\n"
]
}
],
"source": [
"for number in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]:\n",
" if number % 2 == 0:\n",
" print(number // 2)\n",
" else:\n",
" print(3 * number + 1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To save ourselves writing out all the numbers, we may also use the [range() ](https://docs.python.org/3/library/functions.html#func-range) built-in, which, in the example, takes two inputs separated by comma: A `start` number that is included and a `stop` number that is *not* included."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"4\n",
"1\n",
"10\n",
"2\n",
"16\n",
"3\n",
"22\n",
"4\n",
"28\n",
"5\n"
]
}
],
"source": [
"for number in range(1, 11):\n",
" if number % 2 == 0:\n",
" print(number // 2)\n",
" else:\n",
" print(3 * number + 1)"
]
},
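{
"cell_type": "markdown",
"metadata": {},
"source": [
"[range()](https://docs.python.org/3/library/functions.html#func-range) also accepts one or three inputs: With a single input, counting starts at `0`, and an optional third input is a *step* size, similar to the *step* in slicing. To look at the numbers, the sketch below collects them into `list`s with the `list()` built-in, which we do not cover further here; the variable names are arbitrary."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"one_input = list(range(5))  # start defaults to 0; the stop number 5 is not included\n",
"two_inputs = list(range(1, 11))  # the numbers from the loop above\n",
"three_inputs = list(range(1, 11, 3))  # every third number, starting at 1\n",
"\n",
"one_input, two_inputs, three_inputs"
]
},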
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### The `elif` Clause"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we need to check for *several* **alternatives** (i.e., different conditions), we may add an arbitrary number of `elif`-clauses to an `if` statement.\n",
"\n",
"In the next example, we print out messages indicating the *largest* whole number by which a `number` may be divided.\n",
"\n",
"Note that [print() ](https://docs.python.org/3/library/functions.html#print) may take several inputs as well. The `\"...\"` notation is Python's way of modeling **textual data**."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1 is divisible by neither 2 nor 3\n",
"2 is divisible by 2\n",
"3 is divisible by 3\n",
"4 is divisible by 2\n",
"5 is divisible by neither 2 nor 3\n",
"6 is divisible by 2\n",
"7 is divisible by neither 2 nor 3\n",
"8 is divisible by 2\n",
"9 is divisible by 3\n",
"10 is divisible by 2\n"
]
}
],
"source": [
"for number in range(1, 11):\n",
" if number % 2 == 0:\n",
" print(number, \"is divisible by 2\")\n",
" elif number % 3 == 0:\n",
" print(number, \"is divisible by 3\")\n",
" else:\n",
" print(number, \"is divisible by neither 2 nor 3\")"
]
},
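{
"cell_type": "markdown",
"metadata": {},
"source": [
"When [print()](https://docs.python.org/3/library/functions.html#print) receives several inputs, it shows them on one line, separated by a single space. A minimal sketch with arbitrary inputs, mixing numbers and text:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(3, \"is divisible by\", 3)  # three inputs, shown as: 3 is divisible by 3\n",
"print(\"1 + 2 =\", 1 + 2)  # the arithmetic is evaluated before printing"
]
},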
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is noteworthy that only the *first* block of code whose condition is `True` is executed!\n",
"\n",
"So, we must be careful not to make any logical errors: In the example below, we *never* reach the alternative where the `number` is divisible by `4` because whenever a `number` is divisible by `4` it is also always divisible by `2` as well."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1 is divisible by neither 2, 3, nor 4\n",
"2 is divisible by 2\n",
"3 is divisible by 3\n",
"4 is divisible by 2\n",
"5 is divisible by neither 2, 3, nor 4\n",
"6 is divisible by 2\n",
"7 is divisible by neither 2, 3, nor 4\n",
"8 is divisible by 2\n",
"9 is divisible by 3\n",
"10 is divisible by 2\n"
]
}
],
"source": [
"for number in range(1, 11):\n",
" if number % 2 == 0:\n",
" print(number, \"is divisible by 2\")\n",
" elif number % 3 == 0:\n",
" print(number, \"is divisible by 3\")\n",
" elif number % 4 == 0:\n",
" print(number, \"is divisible by 4\")\n",
" else:\n",
" print(number, \"is divisible by neither 2, 3, nor 4\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"By re-arranging the order of the `if`- and `elif`- clauses, we obtain the correct output."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1 is divisible by neither 2, 3, nor 4\n",
"2 is divisible by 2\n",
"3 is divisible by 3\n",
"4 is divisible by 4\n",
"5 is divisible by neither 2, 3, nor 4\n",
"6 is divisible by 3\n",
"7 is divisible by neither 2, 3, nor 4\n",
"8 is divisible by 4\n",
"9 is divisible by 3\n",
"10 is divisible by 2\n"
]
}
],
"source": [
"for number in range(1, 11):\n",
" if number % 4 == 0:\n",
" print(number, \"is divisible by 4\")\n",
" elif number % 3 == 0:\n",
" print(number, \"is divisible by 3\")\n",
" elif number % 2 == 0:\n",
" print(number, \"is divisible by 2\")\n",
" else:\n",
" print(number, \"is divisible by neither 2, 3, nor 4\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "intro-to-data-science",
"language": "python",
"name": "intro-to-data-science"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.4"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}