diff --git a/00_start_up.ipynb b/00_start_up.ipynb index 9ace23c..97fe84b 100644 --- a/00_start_up.ipynb +++ b/00_start_up.ipynb @@ -792,13 +792,12 @@ "**Part 2: Managing Data and Memory**\n", "\n", "- How is data stored in memory?\n", - " 5. [Numbers](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/05_numbers.ipynb)\n", - " 6. Text\n", - " 7. Sequences\n", + " 5. [Bits & Numbers](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/05_numbers.ipynb)\n", + " 6. [Bytes & Text](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/06_text.ipynb)\n", + " 7. [Sequential Data](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/07_sequences.ipynb)\n", " 8. Mappings & Sets\n", - " 9. Arrays\n", "- How can we create custom data types?\n", - " 10. Object-Orientation" + " 9. Object-Orientation" ] }, { diff --git a/01_elements.ipynb b/01_elements.ipynb index c941e43..fed6458 100644 --- a/01_elements.ipynb +++ b/01_elements.ipynb @@ -35,7 +35,7 @@ } }, "source": [ - "## Example: Average of a Subset of Numbers" + "## Example: Averaging Even Numbers" ] }, { @@ -46,11 +46,11 @@ } }, "source": [ - "As our introductory example, we want to calculate the *average* of all *even* numbers from *one* through *ten*.\n", + "As our introductory example, we want to calculate the *average* of all *evens* in a **list** of numbers: `[7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]`.\n", "\n", - "While we could come up with an [analytical solution](https://math.stackexchange.com/questions/935405/what-s-the-difference-between-analytical-and-numerical-approaches-to-problems/935446#935446) (i.e., derive some equation with \"pen and paper\" from, e.g., one of [Faulhaber's formulas](https://en.wikipedia.org/wiki/Faulhaber%27s_formula)), we instead solve the task programmatically.\n", + "While we could try to develop an [analytical solution](https://math.stackexchange.com/questions/935405/what-s-the-difference-between-analytical-and-numerical-approaches-to-problems/935446#935446) (i.e., derive some equation with \"pen and paper\"), we solve the task programmatically instead.\n", "\n", - "We start by creating a **list** called `numbers` that holds all the individual numbers." + "We start by creating a list called `numbers` that holds all the individual numbers between **brackets** `[` and `]`." ] }, { @@ -64,7 +64,7 @@ }, "outputs": [], "source": [ - "numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]" + "numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]" ] }, { @@ -90,7 +90,7 @@ { "data": { "text/plain": [ - "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]" + "[7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]" ] }, "execution_count": 2, @@ -116,7 +116,7 @@ "\n", "The `if number % 2 == 0` may look disturbing at first sight. Both the `%` and `==` must have an unintuitive meaning here. Luckily, the **comment** in the same line after the `#` symbol has the answer: The program only does something if the current `number` is even.\n", "\n", - "In particular, it increases `count` by $1$ and adds the current `number` onto the [running](https://en.wikipedia.org/wiki/Running_total) `total`. Both `count` and `number` are initially set to $0$ and the single `=` symbol reads as \"... is *set* equal to ...\". It could not indicate a mathematical equation as, for example, `count` is generally not equal to `count + 1`.\n", + "In particular, it increases `count` by `1` and adds the current `number` onto the [running](https://en.wikipedia.org/wiki/Running_total) `total`. Both `count` and `number` are initially set to `0` and the single `=` symbol reads as \"... is *set* equal to ...\". It could not indicate a mathematical equation as, for example, `count` is generally not equal to `count + 1`.\n", "\n", "Lastly, the `average` is calculated as the ratio of the final **values** of `total` and `count`. Overall, we divide the sum of all even numbers by their count: This is nothing but the definition of an average.\n", "\n", @@ -167,7 +167,7 @@ { "data": { "text/plain": [ - "6.0" + "7.0" ] }, "execution_count": 4, @@ -427,7 +427,7 @@ } }, "source": [ - "When we compare the output of the `*` and `/` operators for multiplication and division, we note the subtle *difference* between the $42$ and the $42.0$: They are the *same* number represented as a *different* **data type**." + "When we compare the output of the `*` and `/` operators for multiplication and division, we note the subtle *difference* between the `42` and the `42.0`: They are the *same* number represented as a *different* **data type**." ] }, { @@ -615,9 +615,9 @@ } }, "source": [ - "The remainder is $0$ *only if* a number is *divisible* by another.\n", + "The remainder is `0` *only if* a number is *divisible* by another.\n", "\n", - "A popular convention in both, computer science and mathematics, is to abbreviate \"only if\" as **iff**, which is short for \"**[if and only if](https://en.wikipedia.org/wiki/If_and_only_if)**.\" The iff means that a remainder of $0$ implies that a number is divisible by another but also that a number divisible by another implies a remainder of $0$. The implication goes in *both* directions!" + "A popular convention in both, computer science and mathematics, is to abbreviate \"only if\" as **iff**, which is short for \"**[if and only if](https://en.wikipedia.org/wiki/If_and_only_if)**.\" The iff means that a remainder of `0` implies that a number is divisible by another but also that a number divisible by another implies a remainder of `0`. The implication goes in *both* directions!" ] }, { @@ -988,7 +988,7 @@ { "data": { "text/plain": [ - "139829040614096" + "140706351197104" ] }, "execution_count": 28, @@ -1012,7 +1012,7 @@ { "data": { "text/plain": [ - "139829040789352" + "140706351380256" ] }, "execution_count": 29, @@ -1036,7 +1036,7 @@ { "data": { "text/plain": [ - "139829040474736" + "140706351003760" ] }, "execution_count": 30, @@ -1080,7 +1080,7 @@ } }, "source": [ - "`a` and `d` indeed have the same value as is checked with the **equality operator** `==`. The resulting `True` (and the `False` further below) is yet another data type, a so-called **boolean**. We look into them closely in [Chapter 3](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/03_conditionals.ipynb)." + "`a` and `d` indeed have the same value as is checked with the **equality operator** `==`. The resulting `True` (and the `False` further below) is yet another data type, a so-called **boolean**. We look into them closely in [Chapter 3](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/03_conditionals.ipynb#Boolean-Expressions)." ] }, { @@ -1142,6 +1142,41 @@ "a is d" ] }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "If we want to check the opposite case, we use the negated version of the `is` operator." + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "a is not d" + ] + }, { "cell_type": "markdown", "metadata": { @@ -1166,7 +1201,7 @@ }, { "cell_type": "code", - "execution_count": 34, + "execution_count": 35, "metadata": { "slideshow": { "slide_type": "slide" @@ -1179,7 +1214,7 @@ "int" ] }, - "execution_count": 34, + "execution_count": 35, "metadata": {}, "output_type": "execute_result" } @@ -1190,7 +1225,7 @@ }, { "cell_type": "code", - "execution_count": 35, + "execution_count": 36, "metadata": { "slideshow": { "slide_type": "-" @@ -1203,7 +1238,7 @@ "float" ] }, - "execution_count": 35, + "execution_count": 36, "metadata": {}, "output_type": "execute_result" } @@ -1222,12 +1257,12 @@ "source": [ "Different types imply different behaviors for the objects. The `b` object, for example, can be \"asked\" if it could also be interpreted as an `int` with the [is_integer()](https://docs.python.org/3/library/stdtypes.html#float.is_integer) \"functionality\" that comes with every `float` object.\n", "\n", - "Formally, we call such type-specific functionalities **methods** (to differentiate them from functions) and we formally introduce them in Chapter 10. For now, it suffices to know that we access them using the **dot operator** `.`. Of course, `b` could be converted into an `int`, which the boolean value `True` tells us." + "Formally, we call such type-specific functionalities **methods** (to differentiate them from functions) and we formally introduce them in Chapter 9. For now, it suffices to know that we access them using the **dot operator** `.`. Of course, `b` could be converted into an `int`, which the boolean value `True` tells us." ] }, { "cell_type": "code", - "execution_count": 36, + "execution_count": 37, "metadata": { "slideshow": { "slide_type": "fragment" @@ -1240,7 +1275,7 @@ "True" ] }, - "execution_count": 36, + "execution_count": 37, "metadata": {}, "output_type": "execute_result" } @@ -1262,7 +1297,7 @@ }, { "cell_type": "code", - "execution_count": 37, + "execution_count": 38, "metadata": { "slideshow": { "slide_type": "-" @@ -1276,7 +1311,7 @@ "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)", - "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0ma\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mis_integer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0ma\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mis_integer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mAttributeError\u001b[0m: 'int' object has no attribute 'is_integer'" ] } @@ -1298,7 +1333,7 @@ }, { "cell_type": "code", - "execution_count": 38, + "execution_count": 39, "metadata": { "slideshow": { "slide_type": "slide" @@ -1311,7 +1346,7 @@ "str" ] }, - "execution_count": 38, + "execution_count": 39, "metadata": {}, "output_type": "execute_result" } @@ -1322,7 +1357,7 @@ }, { "cell_type": "code", - "execution_count": 39, + "execution_count": 40, "metadata": { "slideshow": { "slide_type": "fragment" @@ -1335,7 +1370,7 @@ "'python rocks'" ] }, - "execution_count": 39, + "execution_count": 40, "metadata": {}, "output_type": "execute_result" } @@ -1346,7 +1381,7 @@ }, { "cell_type": "code", - "execution_count": 40, + "execution_count": 41, "metadata": { "slideshow": { "slide_type": "-" @@ -1359,7 +1394,7 @@ "'PYTHON ROCKS'" ] }, - "execution_count": 40, + "execution_count": 41, "metadata": {}, "output_type": "execute_result" } @@ -1370,7 +1405,7 @@ }, { "cell_type": "code", - "execution_count": 41, + "execution_count": 42, "metadata": { "slideshow": { "slide_type": "skip" @@ -1383,7 +1418,7 @@ "'Python Rocks'" ] }, - "execution_count": 41, + "execution_count": 42, "metadata": {}, "output_type": "execute_result" } @@ -1418,7 +1453,7 @@ }, { "cell_type": "code", - "execution_count": 42, + "execution_count": 43, "metadata": { "slideshow": { "slide_type": "slide" @@ -1431,7 +1466,7 @@ "789" ] }, - "execution_count": 42, + "execution_count": 43, "metadata": {}, "output_type": "execute_result" } @@ -1442,7 +1477,7 @@ }, { "cell_type": "code", - "execution_count": 43, + "execution_count": 44, "metadata": { "slideshow": { "slide_type": "-" @@ -1455,7 +1490,7 @@ "42.0" ] }, - "execution_count": 43, + "execution_count": 44, "metadata": {}, "output_type": "execute_result" } @@ -1477,7 +1512,7 @@ }, { "cell_type": "code", - "execution_count": 44, + "execution_count": 45, "metadata": { "slideshow": { "slide_type": "-" @@ -1490,7 +1525,7 @@ "'Python rocks'" ] }, - "execution_count": 44, + "execution_count": 45, "metadata": {}, "output_type": "execute_result" } @@ -1549,7 +1584,7 @@ }, { "cell_type": "code", - "execution_count": 45, + "execution_count": 46, "metadata": { "slideshow": { "slide_type": "slide" @@ -1558,10 +1593,10 @@ "outputs": [ { "ename": "SyntaxError", - "evalue": "invalid syntax (, line 1)", + "evalue": "invalid syntax (, line 1)", "output_type": "error", "traceback": [ - "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m 3.99 $ + 10.40 $\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n" + "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m 3.99 $ + 10.40 $\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n" ] } ], @@ -1582,7 +1617,7 @@ }, { "cell_type": "code", - "execution_count": 46, + "execution_count": 47, "metadata": { "slideshow": { "slide_type": "slide" @@ -1591,10 +1626,10 @@ "outputs": [ { "ename": "SyntaxError", - "evalue": "invalid syntax (, line 1)", + "evalue": "invalid syntax (, line 1)", "output_type": "error", "traceback": [ - "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m for number in numbers\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n" + "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m for number in numbers\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n" ] } ], @@ -1616,7 +1651,7 @@ }, { "cell_type": "code", - "execution_count": 47, + "execution_count": 48, "metadata": { "slideshow": { "slide_type": "slide" @@ -1625,10 +1660,10 @@ "outputs": [ { "ename": "IndentationError", - "evalue": "expected an indented block (, line 2)", + "evalue": "expected an indented block (, line 2)", "output_type": "error", "traceback": [ - "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m2\u001b[0m\n\u001b[0;31m print(number)\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mIndentationError\u001b[0m\u001b[0;31m:\u001b[0m expected an indented block\n" + "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m2\u001b[0m\n\u001b[0;31m print(number)\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mIndentationError\u001b[0m\u001b[0;31m:\u001b[0m expected an indented block\n" ] } ], @@ -1660,12 +1695,12 @@ "\n", "However, there are also so-called **runtime errors**, often called **exceptions**, that occur whenever otherwise (i.e., syntactically) correct code does not run because of invalid input.\n", "\n", - "This example does not work because just like in the \"real\" world, Python does not know how to divide by $0$. The syntactically correct code leads to a `ZeroDivisionError`." + "This example does not work because just like in the \"real\" world, Python does not know how to divide by `0`. The syntactically correct code leads to a `ZeroDivisionError`." ] }, { "cell_type": "code", - "execution_count": 48, + "execution_count": 49, "metadata": { "slideshow": { "slide_type": "slide" @@ -1679,7 +1714,7 @@ "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mZeroDivisionError\u001b[0m Traceback (most recent call last)", - "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;36m1\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;36m1\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mZeroDivisionError\u001b[0m: division by zero" ] } @@ -1714,7 +1749,7 @@ }, { "cell_type": "code", - "execution_count": 49, + "execution_count": 50, "metadata": { "code_folding": [], "slideshow": { @@ -1736,7 +1771,7 @@ }, { "cell_type": "code", - "execution_count": 50, + "execution_count": 51, "metadata": { "slideshow": { "slide_type": "-" @@ -1746,10 +1781,10 @@ { "data": { "text/plain": [ - "3.0" + "3.5" ] }, - "execution_count": 50, + "execution_count": 51, "metadata": {}, "output_type": "execute_result" } @@ -1795,7 +1830,7 @@ }, { "cell_type": "code", - "execution_count": 51, + "execution_count": 52, "metadata": { "slideshow": { "slide_type": "skip" @@ -1817,10 +1852,10 @@ " " ], "text/plain": [ - "" + "" ] }, - "execution_count": 51, + "execution_count": 52, "metadata": {}, "output_type": "execute_result" } @@ -1838,12 +1873,12 @@ } }, "source": [ - "For example, while the above code to calculate the average of the even numbers from 1 through 10 is correct, a Pythonista would re-write it in a more \"Pythonic\" way and use the [sum()](https://docs.python.org/3/library/functions.html#sum) and [len()](https://docs.python.org/3/library/functions.html#len) (= \"length\") built-in functions (cf., [Chapter 2](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/02_functions.ipynb)) as well as a so-called **list comprehension** (cf., Chapter 7). Pythonic code runs faster in many cases and is less error-prone." + "For example, while the above code to calculate the average of the even numbers from `1` through `12` is correct, a Pythonista would re-write it in a more \"Pythonic\" way and use the [sum()](https://docs.python.org/3/library/functions.html#sum) and [len()](https://docs.python.org/3/library/functions.html#len) (= \"length\") [built-in functions](https://docs.python.org/3/library/functions.html) (cf., [Chapter 2](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/02_functions.ipynb#Built-in-Functions)) as well as a so-called **list comprehension** (cf., [Chapter 7](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/07_sequences.ipynb#List-Comprehensions)). Pythonic code runs faster in many cases and is less error-prone." ] }, { "cell_type": "code", - "execution_count": 52, + "execution_count": 53, "metadata": { "slideshow": { "slide_type": "slide" @@ -1851,12 +1886,12 @@ }, "outputs": [], "source": [ - "numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]" + "numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]" ] }, { "cell_type": "code", - "execution_count": 53, + "execution_count": 54, "metadata": { "slideshow": { "slide_type": "-" @@ -1869,7 +1904,7 @@ }, { "cell_type": "code", - "execution_count": 54, + "execution_count": 55, "metadata": { "slideshow": { "slide_type": "-" @@ -1879,10 +1914,10 @@ { "data": { "text/plain": [ - "[2, 4, 6, 8, 10]" + "[8, 12, 2, 6, 10, 4]" ] }, - "execution_count": 54, + "execution_count": 55, "metadata": {}, "output_type": "execute_result" } @@ -1893,7 +1928,7 @@ }, { "cell_type": "code", - "execution_count": 55, + "execution_count": 56, "metadata": { "slideshow": { "slide_type": "-" @@ -1906,7 +1941,7 @@ }, { "cell_type": "code", - "execution_count": 56, + "execution_count": 57, "metadata": { "slideshow": { "slide_type": "-" @@ -1916,10 +1951,10 @@ { "data": { "text/plain": [ - "6.0" + "7.0" ] }, - "execution_count": 56, + "execution_count": 57, "metadata": {}, "output_type": "execute_result" } @@ -1941,7 +1976,7 @@ }, { "cell_type": "code", - "execution_count": 57, + "execution_count": 58, "metadata": { "slideshow": { "slide_type": "slide" @@ -2040,7 +2075,7 @@ "\n", "At the same time, for a beginner's course, it is often easier to code linearly.\n", "\n", - "In real data science projects, one would probably employ a mixed approach and put re-usable code into so-called Python modules (i.e., *.py* files; cf., [Chapter 2](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/02_functions.ipynb)) and then use Jupyter notebooks to build up a linear report or storyline for a business argument to be made." + "In real data science projects, one would probably employ a mixed approach and put re-usable code into so-called Python modules (i.e., *.py* files; cf., [Chapter 2](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/02_functions.ipynb#Local-Modules-and-Packages)) and then use Jupyter notebooks to build up a linear report or storyline for a business argument to be made." ] }, { @@ -2069,7 +2104,7 @@ }, { "cell_type": "code", - "execution_count": 58, + "execution_count": 59, "metadata": { "slideshow": { "slide_type": "slide" @@ -2094,7 +2129,7 @@ }, { "cell_type": "code", - "execution_count": 59, + "execution_count": 60, "metadata": { "slideshow": { "slide_type": "fragment" @@ -2107,7 +2142,7 @@ "20.0" ] }, - "execution_count": 59, + "execution_count": 60, "metadata": {}, "output_type": "execute_result" } @@ -2129,7 +2164,7 @@ }, { "cell_type": "code", - "execution_count": 60, + "execution_count": 61, "metadata": { "slideshow": { "slide_type": "fragment" @@ -2142,7 +2177,7 @@ }, { "cell_type": "code", - "execution_count": 61, + "execution_count": 62, "metadata": { "slideshow": { "slide_type": "-" @@ -2155,7 +2190,7 @@ "20" ] }, - "execution_count": 61, + "execution_count": 62, "metadata": {}, "output_type": "execute_result" } @@ -2177,7 +2212,7 @@ }, { "cell_type": "code", - "execution_count": 62, + "execution_count": 63, "metadata": { "slideshow": { "slide_type": "slide" @@ -2190,7 +2225,7 @@ }, { "cell_type": "code", - "execution_count": 63, + "execution_count": 64, "metadata": { "slideshow": { "slide_type": "-" @@ -2203,7 +2238,7 @@ }, { "cell_type": "code", - "execution_count": 64, + "execution_count": 65, "metadata": { "slideshow": { "slide_type": "-" @@ -2216,7 +2251,7 @@ }, { "cell_type": "code", - "execution_count": 65, + "execution_count": 66, "metadata": { "slideshow": { "slide_type": "-" @@ -2229,7 +2264,7 @@ "42" ] }, - "execution_count": 65, + "execution_count": 66, "metadata": {}, "output_type": "execute_result" } @@ -2251,7 +2286,7 @@ }, { "cell_type": "code", - "execution_count": 66, + "execution_count": 67, "metadata": { "slideshow": { "slide_type": "slide" @@ -2264,7 +2299,7 @@ "789" ] }, - "execution_count": 66, + "execution_count": 67, "metadata": {}, "output_type": "execute_result" } @@ -2275,7 +2310,7 @@ }, { "cell_type": "code", - "execution_count": 67, + "execution_count": 68, "metadata": { "slideshow": { "slide_type": "-" @@ -2299,7 +2334,7 @@ }, { "cell_type": "code", - "execution_count": 68, + "execution_count": 69, "metadata": { "slideshow": { "slide_type": "-" @@ -2313,7 +2348,7 @@ "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", - "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mb\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mb\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mNameError\u001b[0m: name 'b' is not defined" ] } @@ -2330,12 +2365,12 @@ } }, "source": [ - "Some variables magically exist when we start a Python process or are added by Jupyter. We may safely ignore the former until Chapter 10 and the latter for good." + "Some variables magically exist when we start a Python process or are added by Jupyter. We may safely ignore the former until Chapter 9 and the latter for good." ] }, { "cell_type": "code", - "execution_count": 69, + "execution_count": 70, "metadata": { "slideshow": { "slide_type": "skip" @@ -2348,7 +2383,7 @@ "'__main__'" ] }, - "execution_count": 69, + "execution_count": 70, "metadata": {}, "output_type": "execute_result" } @@ -2370,7 +2405,7 @@ }, { "cell_type": "code", - "execution_count": 70, + "execution_count": 71, "metadata": { "slideshow": { "slide_type": "slide" @@ -2410,7 +2445,7 @@ " '_34',\n", " '_35',\n", " '_36',\n", - " '_38',\n", + " '_37',\n", " '_39',\n", " '_4',\n", " '_40',\n", @@ -2418,16 +2453,17 @@ " '_42',\n", " '_43',\n", " '_44',\n", + " '_45',\n", " '_5',\n", - " '_50',\n", " '_51',\n", - " '_54',\n", - " '_56',\n", - " '_59',\n", - " '_61',\n", - " '_65',\n", + " '_52',\n", + " '_55',\n", + " '_57',\n", + " '_60',\n", + " '_62',\n", " '_66',\n", - " '_69',\n", + " '_67',\n", + " '_70',\n", " '_9',\n", " '__',\n", " '___',\n", @@ -2508,6 +2544,7 @@ " '_i69',\n", " '_i7',\n", " '_i70',\n", + " '_i71',\n", " '_i8',\n", " '_i9',\n", " '_ih',\n", @@ -2529,7 +2566,7 @@ " 'total']" ] }, - "execution_count": 70, + "execution_count": 71, "metadata": {}, "output_type": "execute_result" } @@ -2564,7 +2601,7 @@ }, { "cell_type": "code", - "execution_count": 71, + "execution_count": 72, "metadata": { "slideshow": { "slide_type": "slide" @@ -2575,30 +2612,6 @@ "b = a" ] }, - { - "cell_type": "code", - "execution_count": 72, - "metadata": { - "slideshow": { - "slide_type": "-" - } - }, - "outputs": [ - { - "data": { - "text/plain": [ - "42" - ] - }, - "execution_count": 72, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "a" - ] - }, { "cell_type": "code", "execution_count": 73, @@ -2619,6 +2632,30 @@ "output_type": "execute_result" } ], + "source": [ + "a" + ] + }, + { + "cell_type": "code", + "execution_count": 74, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "42" + ] + }, + "execution_count": 74, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "b" ] @@ -2638,7 +2675,7 @@ }, { "cell_type": "code", - "execution_count": 74, + "execution_count": 75, "metadata": { "slideshow": { "slide_type": "fragment" @@ -2651,7 +2688,7 @@ }, { "cell_type": "code", - "execution_count": 75, + "execution_count": 76, "metadata": { "slideshow": { "slide_type": "-" @@ -2664,7 +2701,7 @@ "123" ] }, - "execution_count": 75, + "execution_count": 76, "metadata": {}, "output_type": "execute_result" } @@ -2686,7 +2723,7 @@ }, { "cell_type": "code", - "execution_count": 76, + "execution_count": 77, "metadata": { "slideshow": { "slide_type": "-" @@ -2699,7 +2736,7 @@ "42" ] }, - "execution_count": 76, + "execution_count": 77, "metadata": {}, "output_type": "execute_result" } @@ -2721,7 +2758,7 @@ }, { "cell_type": "code", - "execution_count": 77, + "execution_count": 78, "metadata": { "slideshow": { "slide_type": "slide" @@ -2734,7 +2771,7 @@ }, { "cell_type": "code", - "execution_count": 78, + "execution_count": 79, "metadata": { "slideshow": { "slide_type": "skip" @@ -2747,7 +2784,7 @@ "list" ] }, - "execution_count": 78, + "execution_count": 79, "metadata": {}, "output_type": "execute_result" } @@ -2758,7 +2795,7 @@ }, { "cell_type": "code", - "execution_count": 79, + "execution_count": 80, "metadata": { "slideshow": { "slide_type": "-" @@ -2779,14 +2816,14 @@ "source": [ "Let's change the first element of `x`.\n", "\n", - "Chapter 7 discusses lists in more depth. For now, let's view a `list` object as some sort of **container** that holds an arbitrary number of pointers to other objects and treat the brackets `[]` attached to it as just another operator, called the **indexing operator**. `x[0]` instructs Python to first follow the pointer from the global list of all names to the `x` object. Then, it follows the first pointer it finds there to the `1` object. The indexing operator must be an operator as we merely read the first element and do not change anything in memory.\n", + "[Chapter 7](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/07_sequences.ipynb#The-list-Type) discusses lists in more depth. For now, let's view a `list` object as some sort of **container** that holds an arbitrary number of pointers to other objects and treat the brackets `[]` attached to it as just another operator, called the **indexing operator**. `x[0]` instructs Python to first follow the pointer from the global list of all names to the `x` object. Then, it follows the first pointer it finds there to the `1` object. The indexing operator must be an operator as we merely read the first element and do not change anything in memory.\n", "\n", "Note how Python **begins counting at 0**. This is not the case for many other languages, for example, [MATLAB](https://en.wikipedia.org/wiki/MATLAB), [R](https://en.wikipedia.org/wiki/R_%28programming_language%29), or [Stata](https://en.wikipedia.org/wiki/Stata). To understand why this makes sense, see this short [note](https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html) by one of the all-time greats in computer science, the late [Edsger Dijkstra](https://en.wikipedia.org/wiki/Edsger_W._Dijkstra)." ] }, { "cell_type": "code", - "execution_count": 80, + "execution_count": 81, "metadata": { "slideshow": { "slide_type": "fragment" @@ -2799,7 +2836,7 @@ "1" ] }, - "execution_count": 80, + "execution_count": 81, "metadata": {}, "output_type": "execute_result" } @@ -2821,7 +2858,7 @@ }, { "cell_type": "code", - "execution_count": 81, + "execution_count": 82, "metadata": { "slideshow": { "slide_type": "slide" @@ -2834,7 +2871,7 @@ }, { "cell_type": "code", - "execution_count": 82, + "execution_count": 83, "metadata": { "slideshow": { "slide_type": "-" @@ -2847,7 +2884,7 @@ "[99, 2, 3]" ] }, - "execution_count": 82, + "execution_count": 83, "metadata": {}, "output_type": "execute_result" } @@ -2869,7 +2906,7 @@ }, { "cell_type": "code", - "execution_count": 83, + "execution_count": 84, "metadata": { "slideshow": { "slide_type": "fragment" @@ -2882,7 +2919,7 @@ "[99, 2, 3]" ] }, - "execution_count": 83, + "execution_count": 84, "metadata": {}, "output_type": "execute_result" } @@ -2962,7 +2999,7 @@ }, { "cell_type": "code", - "execution_count": 84, + "execution_count": 85, "metadata": { "slideshow": { "slide_type": "slide" @@ -2975,7 +3012,7 @@ }, { "cell_type": "code", - "execution_count": 85, + "execution_count": 86, "metadata": { "slideshow": { "slide_type": "-" @@ -2988,7 +3025,7 @@ }, { "cell_type": "code", - "execution_count": 86, + "execution_count": 87, "metadata": { "slideshow": { "slide_type": "-" @@ -3001,7 +3038,7 @@ }, { "cell_type": "code", - "execution_count": 87, + "execution_count": 88, "metadata": { "slideshow": { "slide_type": "-" @@ -3025,7 +3062,7 @@ }, { "cell_type": "code", - "execution_count": 88, + "execution_count": 89, "metadata": { "slideshow": { "slide_type": "skip" @@ -3038,7 +3075,7 @@ }, { "cell_type": "code", - "execution_count": 89, + "execution_count": 90, "metadata": { "slideshow": { "slide_type": "skip" @@ -3051,7 +3088,7 @@ }, { "cell_type": "code", - "execution_count": 90, + "execution_count": 91, "metadata": { "slideshow": { "slide_type": "skip" @@ -3064,7 +3101,7 @@ }, { "cell_type": "code", - "execution_count": 91, + "execution_count": 92, "metadata": { "slideshow": { "slide_type": "skip" @@ -3073,10 +3110,10 @@ "outputs": [ { "ename": "SyntaxError", - "evalue": "can't assign to operator (, line 1)", + "evalue": "can't assign to operator (, line 1)", "output_type": "error", "traceback": [ - "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m address@work = \"Burgplatz 2, Vallendar\"\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m can't assign to operator\n" + "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m address@work = \"Burgplatz 2, Vallendar\"\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m can't assign to operator\n" ] } ], @@ -3097,7 +3134,7 @@ }, { "cell_type": "code", - "execution_count": 92, + "execution_count": 93, "metadata": { "slideshow": { "slide_type": "skip" @@ -3121,7 +3158,7 @@ }, { "cell_type": "code", - "execution_count": 93, + "execution_count": 94, "metadata": { "slideshow": { "slide_type": "skip" @@ -3134,7 +3171,7 @@ "'__main__'" ] }, - "execution_count": 93, + "execution_count": 94, "metadata": {}, "output_type": "execute_result" } @@ -3167,7 +3204,7 @@ }, { "cell_type": "code", - "execution_count": 94, + "execution_count": 95, "metadata": { "slideshow": { "slide_type": "skip" @@ -3189,10 +3226,10 @@ " " ], "text/plain": [ - "" + "" ] }, - "execution_count": 94, + "execution_count": 95, "metadata": {}, "output_type": "execute_result" } @@ -3232,7 +3269,7 @@ }, { "cell_type": "code", - "execution_count": 95, + "execution_count": 96, "metadata": { "slideshow": { "slide_type": "slide" @@ -3245,7 +3282,7 @@ "123" ] }, - "execution_count": 95, + "execution_count": 96, "metadata": {}, "output_type": "execute_result" } @@ -3256,7 +3293,7 @@ }, { "cell_type": "code", - "execution_count": 96, + "execution_count": 97, "metadata": {}, "outputs": [ { @@ -3265,7 +3302,7 @@ "42" ] }, - "execution_count": 96, + "execution_count": 97, "metadata": {}, "output_type": "execute_result" } @@ -3283,7 +3320,7 @@ }, { "cell_type": "code", - "execution_count": 97, + "execution_count": 98, "metadata": { "slideshow": { "slide_type": "-" @@ -3296,7 +3333,7 @@ "165" ] }, - "execution_count": 97, + "execution_count": 98, "metadata": {}, "output_type": "execute_result" } @@ -3318,7 +3355,7 @@ }, { "cell_type": "code", - "execution_count": 98, + "execution_count": 99, "metadata": { "slideshow": { "slide_type": "-" @@ -3331,7 +3368,7 @@ "4492125" ] }, - "execution_count": 98, + "execution_count": 99, "metadata": {}, "output_type": "execute_result" } @@ -3353,7 +3390,7 @@ }, { "cell_type": "code", - "execution_count": 99, + "execution_count": 100, "metadata": { "slideshow": { "slide_type": "-" @@ -3366,7 +3403,7 @@ "3" ] }, - "execution_count": 99, + "execution_count": 100, "metadata": {}, "output_type": "execute_result" } @@ -3388,7 +3425,7 @@ }, { "cell_type": "code", - "execution_count": 100, + "execution_count": 101, "metadata": { "slideshow": { "slide_type": "-" @@ -3401,7 +3438,7 @@ "104" ] }, - "execution_count": 100, + "execution_count": 101, "metadata": {}, "output_type": "execute_result" } @@ -3434,7 +3471,7 @@ }, { "cell_type": "code", - "execution_count": 101, + "execution_count": 102, "metadata": { "slideshow": { "slide_type": "slide" @@ -3448,7 +3485,7 @@ }, { "cell_type": "code", - "execution_count": 102, + "execution_count": 103, "metadata": { "slideshow": { "slide_type": "fragment" @@ -3461,7 +3498,7 @@ "'Hi class'" ] }, - "execution_count": 102, + "execution_count": 103, "metadata": {}, "output_type": "execute_result" } @@ -3483,7 +3520,7 @@ }, { "cell_type": "code", - "execution_count": 103, + "execution_count": 104, "metadata": { "slideshow": { "slide_type": "fragment" @@ -3496,7 +3533,7 @@ "'Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi '" ] }, - "execution_count": 103, + "execution_count": 104, "metadata": {}, "output_type": "execute_result" } @@ -3531,7 +3568,7 @@ }, { "cell_type": "code", - "execution_count": 104, + "execution_count": 105, "metadata": { "slideshow": { "slide_type": "slide" @@ -3544,7 +3581,7 @@ }, { "cell_type": "code", - "execution_count": 105, + "execution_count": 106, "metadata": { "slideshow": { "slide_type": "-" @@ -3568,7 +3605,7 @@ }, { "cell_type": "code", - "execution_count": 106, + "execution_count": 107, "metadata": { "slideshow": { "slide_type": "skip" @@ -3579,12 +3616,12 @@ "name": "stdout", "output_type": "stream", "text": [ - "I change the display of the computer\n" + "I change the state of the computer's display\n" ] } ], "source": [ - "print(\"I change the display of the computer\")" + "print(\"I change the state of the computer's display\")" ] }, { @@ -3615,7 +3652,7 @@ }, { "cell_type": "code", - "execution_count": 107, + "execution_count": 108, "metadata": { "slideshow": { "slide_type": "slide" @@ -3643,7 +3680,7 @@ }, { "cell_type": "code", - "execution_count": 108, + "execution_count": 109, "metadata": { "slideshow": { "slide_type": "fragment" @@ -3656,7 +3693,7 @@ }, { "cell_type": "code", - "execution_count": 109, + "execution_count": 110, "metadata": { "slideshow": { "slide_type": "-" @@ -3686,17 +3723,8 @@ } }, "source": [ - "We end each chapter with a summary of the main points. The essence in this first chapter is that just like a sentence in a real language like English may be decomposed into its parts (e.g., subject, predicate, and objects), the same may be done with programming languages." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "skip" - } - }, - "source": [ + "We end each chapter with a summary of the main points (i.e., **TL;DR** = \"too long; didn't read\"). The essence in this first chapter is that just as a sentence in a real language like English may be decomposed into its parts (e.g., subject, predicate, and objects), the same may be done with programming languages.\n", + "\n", "- program\n", " - **sequence** of **instructions** that specify how to perform a computation (= a \"recipe\")\n", " - a \"black box\" that processes **inputs** and transforms them into meaningful **outputs** in a *deterministic* way\n", @@ -3764,7 +3792,7 @@ "\n", "- flow control (cf., [Chapter 3](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/03_conditionals.ipynb))\n", " - expression of **business logic** or an **algorithm**\n", - " - conditional execution of a small **branch** within a program (e.g., `if` statements)\n", + " - conditional execution of parts of a program (e.g., `if` statements)\n", " - repetitive execution of parts of a program (e.g., `for`-loops)" ] } diff --git a/02_functions.ipynb b/02_functions.ipynb index 941c71b..897f8a7 100644 --- a/02_functions.ipynb +++ b/02_functions.ipynb @@ -19,7 +19,7 @@ } }, "source": [ - "In [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements.ipynb), we typed the **business logic** of our little program to calculate the mean of a subset of a list of numbers right into the code cells. Then, we executed them one after another. We had no way of **re-using** the code except for either re-executing the cells or copying and pasting their contents into other cells. And whenever we find ourselves doing repetitive manual work, we can be sure that there must be a way of automating what we are doing.\n", + "In [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements.ipynb#Example:-Averaging-Even-Numbers), we typed the code to calculate the average of the even numbers in a list into several code cells. Then, we executed them one after another. We had no way of **re-using** the code except for either re-executing the cells or copying and pasting their contents into other cells. And, whenever we find ourselves doing repetitive manual work, we can be sure that there must be a way of automating what we are doing.\n", "\n", "At the same time, we executed built-in functions (e.g., [print()](https://docs.python.org/3/library/functions.html#print), [sum()](https://docs.python.org/3/library/functions.html#sum), [len()](https://docs.python.org/3/library/functions.html#len), [id()](https://docs.python.org/3/library/functions.html#id), or [type()](https://docs.python.org/3/library/functions.html#type)) that obviously must be re-using the same parts inside core Python every time we use them.\n", "\n", @@ -45,7 +45,7 @@ } }, "source": [ - "So-called **[user-defined functions](https://docs.python.org/3/reference/compound_stmts.html#function-definitions)** may be created with the `def` statement. To extend an already familiar example, we re-use the introductory example from [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements.ipynb) in its final Pythonic version and transform it into the function `average_evens()` below. \n", + "So-called **[user-defined functions](https://docs.python.org/3/reference/compound_stmts.html#function-definitions)** may be created with the `def` statement. To extend an already familiar example, we re-use the introductory example from [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements.ipynb#Best-Practices) in its final Pythonic version and transform it into the function `average_evens()` below. \n", "\n", "A function's **name** must be chosen according to the same naming rules as ordinary variables as Python manages function names like variables. In this book, we further adopt the convention of ending function names with parentheses `()` in text cells for faster comprehension when reading (i.e., `average_evens()` vs. `average_evens`). These are *not* part of the name but must always be written out in the `def` statement for syntactic reasons.\n", "\n", @@ -55,9 +55,9 @@ "\n", "Together, the name and the list of parameters are also referred to as the function's **[signature](https://en.wikipedia.org/wiki/Type_signature)** (i.e., `average_evens(numbers)` below).\n", "\n", - "A function may come with an *explicit* **[return value](https://docs.python.org/3/reference/simple_stmts.html#the-return-statement)** (i.e., \"result\" or \"output\") specified with the `return` statement: Functions that have one are considered **fruitful**; otherwise, they are **void**. Functions of the latter kind are still useful because of their **side effects** (e.g., the [print()](https://docs.python.org/3/library/functions.html#print) built-in). Strictly speaking, they also have an *implicit* return value of `None` that is different from the `False` we saw in Chapter 1.\n", + "A function may come with an *explicit* **return value** (i.e., \"result\" or \"output\") specified with the `return` statement (cf., [reference](https://docs.python.org/3/reference/simple_stmts.html#the-return-statement)): Functions that have one are considered **fruitful**; otherwise, they are **void**. Functions of the latter kind are still useful because of their **side effects** as, for example, the built-in [print()](https://docs.python.org/3/library/functions.html#print) function. Strictly speaking, they also have an *implicit* return value of `None`.\n", "\n", - "A function should define a **docstring** that describes what it does in a short subject line, what parameters it expects (i.e., their types), and what it returns (if anything). A docstring is a syntactically valid multi-line string (i.e., type `str`) defined within **triple-double quotes** `\"\"\"`. Strings are covered in depth in [Chapter 6](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/06_text.ipynb). Widely adopted standards as to how to format a docstring are [PEP 257](https://www.python.org/dev/peps/pep-0257/) and section 3.8 in [Google's Python Style Guide](https://github.com/google/styleguide/blob/gh-pages/pyguide.md)." + "A function should define a **docstring** that describes what it does in a short subject line, what parameters it expects (i.e., their types), and what it returns (if anything). A docstring is a syntactically valid multi-line string (i.e., type `str`) defined within **triple-double quotes** `\"\"\"`. Strings are covered in depth in [Chapter 6](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/06_text.ipynb#The-str-Type). Widely adopted standards as to how to format a docstring are [PEP 257](https://www.python.org/dev/peps/pep-0257/) and section 3.8 in [Google's Python Style Guide](https://github.com/google/styleguide/blob/gh-pages/pyguide.md)." ] }, { @@ -142,7 +142,7 @@ { "data": { "text/plain": [ - "140519730762208" + "139646582981088" ] }, "execution_count": 3, @@ -302,7 +302,7 @@ }, "outputs": [], "source": [ - "nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]" + "nums = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]" ] }, { @@ -317,7 +317,7 @@ { "data": { "text/plain": [ - "6.0" + "7.0" ] }, "execution_count": 9, @@ -337,7 +337,7 @@ } }, "source": [ - "The return value is commonly assigned to a new variable for later reference. Otherwise, we would lose access to it in memory right away." + "The return value is commonly assigned to a variable for later reference. Otherwise, we would lose access to it in memory right away." ] }, { @@ -365,7 +365,7 @@ { "data": { "text/plain": [ - "6.0" + "7.0" ] }, "execution_count": 11, @@ -493,7 +493,7 @@ } }, "source": [ - "[PythonTutor](http://pythontutor.com/visualize.html#code=nums%20%3D%20%5B1,%202,%203,%204,%205,%206,%207,%208,%209,%2010%5D%0A%0Adef%20average_evens%28numbers%29%3A%0A%20%20%20%20evens%20%3D%20%5Bn%20for%20n%20in%20numbers%20if%20n%20%25%202%20%3D%3D%200%5D%0A%20%20%20%20average%20%3D%20sum%28evens%29%20/%20len%28evens%29%0A%20%20%20%20return%20average%0A%0Arv%20%3D%20average_evens%28nums%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) visualizes what happens in memory: To be precise, in the exact moment when the function call is initiated and `nums` passed in as the `numbers` argument, there are *two* pointers to the *same* `list` object (cf., steps 4-5 in the visualization). We also see how Python creates a *new* **frame** that holds the function's local scope (i.e., \"internal names\") in addition to the **global** frame. Frames are nothing but [namespaces](https://en.wikipedia.org/wiki/Namespace) to *isolate* the names of different **scopes** from each other. The list comprehension `[n for n in numbers if n % 2 == 0]` constitutes yet another frame that is in scope as the `list` object assigned to `evens` is *being* created (cf., steps 6-18). When the function returns, only the global frame is left (cf., steps 21-22)." + "[PythonTutor](http://pythontutor.com/visualize.html#code=nums%20%3D%20%5B1,%202,%203,%204,%205,%206,%207,%208,%209,%2010,%2011,%2012%5D%0A%0Adef%20average_evens%28numbers%29%3A%0A%20%20%20%20evens%20%3D%20%5Bn%20for%20n%20in%20numbers%20if%20n%20%25%202%20%3D%3D%200%5D%0A%20%20%20%20average%20%3D%20sum%28evens%29%20/%20len%28evens%29%0A%20%20%20%20return%20average%0A%0Arv%20%3D%20average_evens%28nums%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) visualizes what happens in memory: To be precise, in the exact moment when the function call is initiated and `nums` passed in as the `numbers` argument, there are *two* pointers to the *same* `list` object (cf., steps 4-5 in the visualization). We also see how Python creates a *new* **frame** that holds the function's local scope (i.e., \"internal names\") in addition to the **global** frame. Frames are nothing but [namespaces](https://en.wikipedia.org/wiki/Namespace) to *isolate* the names of different **scopes** from each other. The list comprehension `[n for n in numbers if n % 2 == 0]` constitutes yet another frame that is in scope as the `list` object assigned to `evens` is *being* created (cf., steps 6-20). When the function returns, only the global frame is left (cf., last step)." ] }, { @@ -565,7 +565,7 @@ { "data": { "text/plain": [ - "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]" + "[7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]" ] }, "execution_count": 16, @@ -600,7 +600,7 @@ { "data": { "text/plain": [ - "6.0" + "7.0" ] }, "execution_count": 17, @@ -635,7 +635,7 @@ { "data": { "text/plain": [ - "6.0" + "7.0" ] }, "execution_count": 18, @@ -655,9 +655,9 @@ } }, "source": [ - "[PythonTutor](http://pythontutor.com/visualize.html#code=nums%20%3D%20%5B1,%202,%203,%204,%205,%206,%207,%208,%209,%2010%5D%0A%0Adef%20average_wrong%28numbers%29%3A%0A%20%20%20%20evens%20%3D%20%5Bn%20for%20n%20in%20nums%20if%20n%20%25%202%20%3D%3D%200%5D%0A%20%20%20%20average%20%3D%20sum%28evens%29%20/%20len%28evens%29%0A%20%20%20%20return%20average%0A%0Arv%20%3D%20average_wrong%28%5B123,%20456,%20789%5D%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) is again helpful at visualizing the error interactively: Creating the `list` object `evens` eventually points to takes *12* computational steps, namely one for setting up an empty `list` object, *ten* for filling it with elements derived from `nums` in the global scope, and one to make `evens` point at it (cf., steps 6-18).\n", + "[PythonTutor](http://pythontutor.com/visualize.html#code=nums%20%3D%20%5B1,%202,%203,%204,%205,%206,%207,%208,%209,%2010,%2011,%2012%5D%0A%0Adef%20average_wrong%28numbers%29%3A%0A%20%20%20%20evens%20%3D%20%5Bn%20for%20n%20in%20nums%20if%20n%20%25%202%20%3D%3D%200%5D%0A%20%20%20%20average%20%3D%20sum%28evens%29%20/%20len%28evens%29%0A%20%20%20%20return%20average%0A%0Arv%20%3D%20average_wrong%28%5B123,%20456,%20789%5D%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) is again helpful at visualizing the error interactively: Creating the `list` object `evens` eventually points to takes *16* computational steps, namely two for managing the list comprehension, one for setting up an empty `list` object, *twelve* for filling it with elements derived from `nums` in the global scope (i.e., that is the error), and one to make `evens` point at it (cf., steps 6-21).\n", "\n", - "The frames logic shown by PythonTutor is the mechanism by which Python not only manages the names inside *one* function call but also for *many* potentially simultaneous* calls, as revealed in [Chapter 4](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/04_iteration.ipynb). It is the reason why we may re-use the same names for the parameters and variables inside both `average_evens()` and `average_wrong()` without Python mixing them up. So, as we already read in the [Zen of Python](https://www.python.org/dev/peps/pep-0020/), \"namespaces are one honking great idea\" (cf., `import this`), and a frame is just a special kind of namespace." + "The frames logic shown by PythonTutor is the mechanism by which Python not only manages the names inside *one* function call but also for *many* potentially *simultaneous* calls, as revealed in [Chapter 4](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/04_iteration.ipynb#Trivial-Example:-Countdown). It is the reason why we may re-use the same names for the parameters and variables inside both `average_evens()` and `average_wrong()` without Python mixing them up. So, as we already read in the [Zen of Python](https://www.python.org/dev/peps/pep-0020/), \"namespaces are one honking great idea\" (cf., `import this`), and a frame is just a special kind of namespace." ] }, { @@ -732,7 +732,7 @@ { "data": { "text/plain": [ - "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]" + "[7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]" ] }, "execution_count": 20, @@ -802,7 +802,7 @@ { "data": { "text/plain": [ - "5.0" + "6.0" ] }, "execution_count": 22, @@ -837,7 +837,7 @@ { "data": { "text/plain": [ - "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]" + "[7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]" ] }, "execution_count": 23, @@ -859,7 +859,7 @@ "source": [ "The reason why everything works is that *every time* we (re-)assign an object to a variable *inside* a function with the `=` statement, this is done in the *local* scope by default. There are ways to change variables existing in an outer scope from within a function, but this is a rather advanced topic on its own.\n", "\n", - "[PythonTutor](http://pythontutor.com/visualize.html#code=nums%20%3D%20%5B1,%202,%203,%204,%205,%206,%207,%208,%209,%2010%5D%0A%0Adef%20average_odds%28numbers%29%3A%0A%20%20%20%20nums%20%3D%20%5Bint%28n%29%20for%20n%20in%20numbers%5D%0A%20%20%20%20odds%20%3D%20%5Bn%20for%20n%20in%20nums%20if%20n%20%25%202%20!%3D%200%5D%0A%20%20%20%20average%20%3D%20sum%28odds%29%20/%20len%28odds%29%0A%20%20%20%20return%20average%0A%0Arv%20%3D%20average_odds%28%5B1.0,%2010.0,%203.0,%2010.0,%205.0%5D%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) shows how *two* `nums` variables exist in *different* scopes pointing to *different* objects (cf., steps 14-25) when we execute `average_odds([1.0, 10.0, 3.0, 10.0, 5.0])`.\n", + "[PythonTutor](http://pythontutor.com/visualize.html#code=nums%20%3D%20%5B1,%202,%203,%204,%205,%206,%207,%208,%209,%2010,%2011,%2012%5D%0A%0Adef%20average_odds%28numbers%29%3A%0A%20%20%20%20nums%20%3D%20%5Bint%28n%29%20for%20n%20in%20numbers%5D%0A%20%20%20%20odds%20%3D%20%5Bn%20for%20n%20in%20nums%20if%20n%20%25%202%20!%3D%200%5D%0A%20%20%20%20average%20%3D%20sum%28odds%29%20/%20len%28odds%29%0A%20%20%20%20return%20average%0A%0Arv%20%3D%20average_odds%28%5B1.0,%2010.0,%203.0,%2010.0,%205.0%5D%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) shows how *two* `nums` variables exist in *different* scopes pointing to *different* objects (cf., steps 14-25) when we execute `average_odds([1.0, 10.0, 3.0, 10.0, 5.0])`.\n", "\n", "Variables whose names collide with the ones of variables in enclosing scopes - and the global scope is just the most enclosing scope - are said to **shadow** them.\n", "\n", @@ -902,7 +902,7 @@ { "data": { "text/plain": [ - "10" + "12" ] }, "execution_count": 24, @@ -926,7 +926,7 @@ { "data": { "text/plain": [ - "55" + "78" ] }, "execution_count": 25, @@ -946,7 +946,7 @@ } }, "source": [ - "We may cast objects as a different type. For example, to \"convert\" a float or a text into an integer, we use the [int()](https://docs.python.org/3/library/functions.html#int) built-in. It creates a *new* object of type `int` from the provided `avg` or `\"6\"` objects that continue to exist in memory unchanged." + "We may cast objects as a different type. For example, to \"convert\" a float or a text into an integer, we use the [int()](https://docs.python.org/3/library/functions.html#int) built-in. It creates a *new* object of type `int` from the provided `avg` or `\"7\"` objects that continue to exist in memory unchanged." ] }, { @@ -961,7 +961,7 @@ { "data": { "text/plain": [ - "6.0" + "7.0" ] }, "execution_count": 26, @@ -985,7 +985,7 @@ { "data": { "text/plain": [ - "6" + "7" ] }, "execution_count": 27, @@ -1009,7 +1009,7 @@ { "data": { "text/plain": [ - "6" + "7" ] }, "execution_count": 28, @@ -1018,7 +1018,7 @@ } ], "source": [ - "int(\"6\")" + "int(\"7\")" ] }, { @@ -1029,7 +1029,7 @@ } }, "source": [ - "Observe that casting as an integer is different from rounding with the [round()](https://docs.python.org/3/library/functions.html#round) built-in function." + "Casting as an `int` object is different from rounding with the built-in [round()](https://docs.python.org/3/library/functions.html#round) function!" ] }, { @@ -1102,18 +1102,18 @@ "outputs": [ { "ename": "ValueError", - "evalue": "invalid literal for int() with base 10: 'six'", + "evalue": "invalid literal for int() with base 10: 'seven'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", - "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"six\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", - "\u001b[0;31mValueError\u001b[0m: invalid literal for int() with base 10: 'six'" + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"seven\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mValueError\u001b[0m: invalid literal for int() with base 10: 'seven'" ] } ], "source": [ - "int(\"six\")" + "int(\"seven\")" ] }, { @@ -1170,7 +1170,7 @@ } }, "source": [ - "So far, we have only specified one parameter in each of our user-defined functions. In [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements.ipynb), however, we saw the built-in function [divmod()](https://docs.python.org/3/library/functions.html#divmod) take two arguments. And, the order of the numbers passed in mattered! Whenever we call a function and list its arguments in a comma separated manner, we say that we pass in the arguments by position or refer to them as **positional arguments**." + "So far, we have only specified one parameter in each of our user-defined functions. In [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements.ipynb#%28Arithmetic%29-Operators), however, we saw the built-in function [divmod()](https://docs.python.org/3/library/functions.html#divmod) take two arguments. And, the order of the numbers passed in mattered! Whenever we call a function and list its arguments in a comma separated manner, we say that we pass in the arguments by position or refer to them as **positional arguments**." ] }, { @@ -1281,7 +1281,7 @@ { "data": { "text/plain": [ - "12.0" + "14.0" ] }, "execution_count": 36, @@ -1318,7 +1318,7 @@ { "data": { "text/plain": [ - "12.0" + "14.0" ] }, "execution_count": 37, @@ -1342,7 +1342,7 @@ { "data": { "text/plain": [ - "12.0" + "14.0" ] }, "execution_count": 38, @@ -1366,7 +1366,7 @@ { "data": { "text/plain": [ - "12.0" + "14.0" ] }, "execution_count": 39, @@ -1483,7 +1483,7 @@ } }, "source": [ - "The outcome of `average_evens(nums)` is, of course, still `6.0`." + "The outcome of `average_evens(nums)` is, of course, still `7.0`." ] }, { @@ -1498,7 +1498,7 @@ { "data": { "text/plain": [ - "6.0" + "7.0" ] }, "execution_count": 43, @@ -1555,7 +1555,7 @@ } }, "source": [ - "Now we call the function either with or without the `scalar` argument.\n", + "Now, we call the function either with or without the `scalar` argument.\n", "\n", "If `scalar` is to be passed in, this may be done as either a positional or a keyword argument. Which of the two versions where `scalar` is `2` is more \"natural\" and faster to comprehend in a larger program?" ] @@ -1572,7 +1572,7 @@ { "data": { "text/plain": [ - "6.0" + "7.0" ] }, "execution_count": 45, @@ -1596,7 +1596,7 @@ { "data": { "text/plain": [ - "12.0" + "14.0" ] }, "execution_count": 46, @@ -1620,7 +1620,7 @@ { "data": { "text/plain": [ - "12.0" + "14.0" ] }, "execution_count": 47, @@ -1705,7 +1705,7 @@ { "data": { "text/plain": [ - "6.0" + "7.0" ] }, "execution_count": 49, @@ -1729,7 +1729,7 @@ { "data": { "text/plain": [ - "12.0" + "14.0" ] }, "execution_count": 50, @@ -1749,7 +1749,7 @@ } }, "source": [ - "If we pass in `scalar` as a positional argument instead, we obtain a `TypeError`." + "If we pass in `scalar` as a positional argument instead, we get a `TypeError`." ] }, { @@ -1841,11 +1841,11 @@ } }, "source": [ - "If you think this is a rather pointless thing to do, you are absolutely correct!\n", + "If you think this is rather pointless to do, you are absolutely correct!\n", "\n", "We created a `function` object, dit *not* call it, and Python immediately forgot about it. So what's the point?\n", "\n", - "To prove that a `lambda` expression creates a callable `function` object, we use the simple `=` statement to assign it to the variable `add_three`, which is really `add_three()` as per our convention from above." + "To show that a `lambda` expression creates a callable `function` object, we use the simple `=` statement to assign it to the variable `add_three`, which is really `add_three()` as per our convention from above." ] }, { @@ -1928,7 +1928,7 @@ } ], "source": [ - "(lambda x: x + 3)(39) # this looks weird but will become very useful" + "(lambda x: x + 3)(39)" ] }, { @@ -1939,9 +1939,9 @@ } }, "source": [ - "The main point of having functions without a name is to use them in a situation where we know ahead of time that we use the function *once* only.\n", + "The main point of having functions without a name is to use them in a situation where we know ahead of time that we use the function only *once*.\n", "\n", - "Popular applications of lambda expressions are with the **map-filter-reduce** paradigm in Chapter 7 or when we do \"number crunching\" with **arrays** and **data frames** in Chapter 9." + "Popular applications of lambda expressions occur in combination with the **map-filter-reduce** paradigm or when we do \"number crunching\" with **arrays** and **data frames**. We look at both in detail in [Chapter 7](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/07_sequences.ipynb)." ] }, { @@ -2021,7 +2021,7 @@ "source": [ "The [math](https://docs.python.org/3/library/math.html) module provides non-trivial mathematical functions like $sin(x)$ and constants like $\\pi$ or $\\text{e}$.\n", "\n", - "To make functions and variables defined \"somewhere else\" available in our current program, we must first **[import](https://docs.python.org/3/reference/simple_stmts.html#import)** them with the `import` statement. " + "To make functions and variables defined \"somewhere else\" available in our current program, we must first **import** them with the `import` statement (cf., [reference](https://docs.python.org/3/reference/simple_stmts.html#import)). " ] }, { @@ -2084,7 +2084,7 @@ { "data": { "text/plain": [ - "140519828111832" + "139646755983752" ] }, "execution_count": 58, @@ -2356,7 +2356,7 @@ "source": [ "Observe how the arguments passed to functions do not need to be just variables or simple literals. Instead, we may pass in any *expression* that evaluates to a *new* object of the type the function expects.\n", "\n", - "So just as a reminder from the expression vs. statement discussion in [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements.ipynb): An expression is *any* syntactically correct combination of variables and literals with operators. And the call operator `()` is yet another operator. So both of the next two code cells are just expressions! They have no permanent side effects in memory. We may execute them as often as we want *without* changing the state of the program (i.e., this Jupyter notebook).\n", + "So just as a reminder from the expression vs. statement discussion in [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements.ipynb#Expressions): An expression is *any* syntactically correct combination of variables and literals with operators. And the call operator `()` is yet another operator. So both of the next two code cells are just expressions! They have no permanent side effects in memory. We may execute them as often as we want *without* changing the state of the program (i.e., this Jupyter notebook).\n", "\n", "So, regarding the very next cell in particular: Although the `2 ** 2` creates a *new* object `4` in memory that is then immediately passed into the [math.sqrt()](https://docs.python.org/3/library/math.html#math.sqrt) function, once that function call returns, \"all is lost\" and the newly created `4` object is forgotten again, as well as the return value of [math.sqrt()](https://docs.python.org/3/library/math.html#math.sqrt)." ] @@ -2489,7 +2489,7 @@ } }, "source": [ - "Often, we need a random variable, for example, when we want to build a simulation program. The [random](https://docs.python.org/3/library/random.html) module in the [standard library](https://docs.python.org/3/library/index.html) often suffices for that." + "Often, we need a random variable, for example, when we want to build a simulation. The [random](https://docs.python.org/3/library/random.html) module in the [standard library](https://docs.python.org/3/library/index.html) often suffices for that." ] }, { @@ -2537,7 +2537,7 @@ } }, "source": [ - "Besides the usual dunder-style attributes, the built-in [dir()](https://docs.python.org/3/library/functions.html#dir) function lists some attributes in an upper case naming convention and many others starting with a *single* underscore `_`. To understand the former, we must wait until Chapter 10, while the latter is explained further below." + "Besides the usual dunder-style attributes, the built-in [dir()](https://docs.python.org/3/library/functions.html#dir) function lists some attributes in an upper case naming convention and many others starting with a *single* underscore `_`. To understand the former, we must wait until Chapter 9, while the latter is explained further below." ] }, { @@ -2697,7 +2697,7 @@ { "data": { "text/plain": [ - "0.5291407120147841" + "0.21932911318720072" ] }, "execution_count": 75, @@ -2732,7 +2732,7 @@ { "data": { "text/plain": [ - ">" + ">" ] }, "execution_count": 76, @@ -2781,7 +2781,7 @@ { "data": { "text/plain": [ - "4" + "7" ] }, "execution_count": 78, @@ -2801,7 +2801,7 @@ } }, "source": [ - "To reproduce the same random numbers in a simulation each time we run it, we set the **[random seed](https://en.wikipedia.org/wiki/Random_seed)**. It is good practice to do this at the beginning of a program or notebook. Then every time we re-start the program, we get the *same* random numbers again. This becomes very important, for example, when we employ machine learning algorithms that rely on randomization, like the infamous [Random Forest](https://en.wikipedia.org/wiki/Random_forest), and want to obtain **reproducable** results.\n", + "To reproduce the *same* random numbers in a simulation each time we run it, we set the **[random seed](https://en.wikipedia.org/wiki/Random_seed)**. It is good practice to do that at the beginning of a program or notebook. It becomes essential when we employ randomized machine learning algorithms, like the [Random Forest](https://en.wikipedia.org/wiki/Random_forest), and want to obtain **reproducible** results for publication in academic journals.\n", "\n", "The [random](https://docs.python.org/3/library/random.html) module provides the [random.seed()](https://docs.python.org/3/library/random.html#random.seed) function to do that." ] @@ -2927,9 +2927,9 @@ } }, "source": [ - "[numpy](http://www.numpy.org/) is the de-facto standard in the Python world for handling **array-like** data. That is a fancy word for data that can be put into a matrix or vector format. We look at it in depth in Chapter 9.\n", + "[numpy](http://www.numpy.org/) is the de-facto standard in the Python world for handling **array-like** data. That is a fancy word for data that can be put into a matrix or vector format. We look at it in depth in [Chapter 7](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/07_sequences.ipynb).\n", "\n", - "As [numpy](http://www.numpy.org/) is *not* in the [standard library](https://docs.python.org/3/library/index.html), it must be *manually* installed, for example, with the [pip](https://pip.pypa.io/en/stable/) tool. As mentioned in [Chapter 0](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/00_start_up.ipynb), to execute terminal commands from within a Jupyter notebook, we start a code cell with an exclamation mark.\n", + "As [numpy](http://www.numpy.org/) is *not* in the [standard library](https://docs.python.org/3/library/index.html), it must be *manually* installed, for example, with the [pip](https://pip.pypa.io/en/stable/) tool. As mentioned in [Chapter 0](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/00_start_up.ipynb#Markdown-vs.-Code-Cells), to execute terminal commands from within a Jupyter notebook, we start a code cell with an exclamation mark.\n", "\n", "If you are running this notebook with an installation of the [Anaconda Distribution](https://www.anaconda.com/distribution/), then [numpy](http://www.numpy.org/) is probably already installed. Running the cell below confirms that." ] @@ -3050,7 +3050,7 @@ { "data": { "text/plain": [ - "array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])" + "array([ 7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4])" ] }, "execution_count": 87, @@ -3111,7 +3111,7 @@ { "data": { "text/plain": [ - "array([ 2, 4, 6, 8, 10, 12, 14, 16, 18, 20])" + "array([14, 22, 16, 10, 6, 24, 4, 12, 18, 20, 2, 8])" ] }, "execution_count": 89, @@ -3146,7 +3146,7 @@ { "data": { "text/plain": [ - "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]" + "[7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4, 7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]" ] }, "execution_count": 90, @@ -3181,7 +3181,7 @@ { "data": { "text/plain": [ - "55" + "78" ] }, "execution_count": 91, @@ -3205,7 +3205,7 @@ { "data": { "text/plain": [ - "1" + "7" ] }, "execution_count": 92, @@ -3389,7 +3389,7 @@ { "data": { "text/plain": [ - "6.0" + "7.0" ] }, "execution_count": 97, @@ -3413,7 +3413,7 @@ { "data": { "text/plain": [ - "12.0" + "14.0" ] }, "execution_count": 98, @@ -3433,7 +3433,7 @@ } }, "source": [ - "Packages are a generalization of modules, and we look at one in detail in Chapter 10. You may, however, already look at a [sample package](https://github.com/webartifex/intro-to-python/tree/master/sample_package) in the repository, which is nothing but a folder with *.py* files in it.\n", + "Packages are a generalization of modules, and we look at one in detail in Chapter 9. You may, however, already look at a [sample package](https://github.com/webartifex/intro-to-python/tree/master/sample_package) in the repository, which is nothing but a folder with *.py* files in it.\n", "\n", "As a further reading on modules and packages, we refer to the [official tutorial](https://docs.python.org/3/tutorial/modules.html)." ] diff --git a/03_conditionals.ipynb b/03_conditionals.ipynb index d0ed61d..072448a 100644 --- a/03_conditionals.ipynb +++ b/03_conditionals.ipynb @@ -139,7 +139,7 @@ } }, "source": [ - "There are, however, cases where even well-behaved Python does not make us happy. [Chapter 5](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/05_numbers.ipynb) provides more insights into this \"bug.\"" + "There are, however, cases where even well-behaved Python does not make us happy. [Chapter 5](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/05_numbers.ipynb#Imprecision) provides more insights into this \"bug.\"" ] }, { @@ -189,7 +189,7 @@ { "data": { "text/plain": [ - "94906834637792" + "94711290794976" ] }, "execution_count": 5, @@ -213,7 +213,7 @@ { "data": { "text/plain": [ - "94906834637760" + "94711290794944" ] }, "execution_count": 6, @@ -281,7 +281,7 @@ } }, "source": [ - "Let's not confuse the boolean `False` with `None`, another built-in object! We saw the latter before in [Chapter 2](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/02_functions.ipynb) as the *implicit* return value of a function without a `return` statement.\n", + "Let's not confuse the boolean `False` with `None`, another built-in object! We saw the latter before in [Chapter 2](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/02_functions.ipynb#Function-Definition) as the *implicit* return value of a function without a `return` statement.\n", "\n", "We might think of `None` in a boolean context indicating a \"maybe\" or even an \"unknown\" answer; however, for Python, there are no \"maybe\" or \"unknown\" objects, as we see further below!\n", "\n", @@ -313,7 +313,7 @@ { "data": { "text/plain": [ - "94906834624752" + "94711290781936" ] }, "execution_count": 10, @@ -357,7 +357,7 @@ } }, "source": [ - "`True`, `False`, and `None` have the property that they each exist in memory only *once*. Objects designed this way are so-called **singletons**. This **[design pattern](https://en.wikipedia.org/wiki/Design_Patterns)** was originally developed to keep a program's memory usage at a minimum. It may only be employed in situations where we know that an object does *not* mutate its value (i.e., to re-use the bag analogy from [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements.ipynb), no flipping of $0$s and $1$s in the bag is allowed). In languages \"closer\" to the memory like C, we would have to code this singleton logic ourselves, but Python has this built in for *some* types.\n", + "`True`, `False`, and `None` have the property that they each exist in memory only *once*. Objects designed this way are so-called **singletons**. This **[design pattern](https://en.wikipedia.org/wiki/Design_Patterns)** was originally developed to keep a program's memory usage at a minimum. It may only be employed in situations where we know that an object does *not* mutate its value (i.e., to re-use the bag analogy from [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements.ipynb#Objects-vs.-Types-vs.-Values), no flipping of $0$s and $1$s in the bag is allowed). In languages \"closer\" to the memory like C, we would have to code this singleton logic ourselves, but Python has this built in for *some* types.\n", "\n", "We verify this with either the `is` operator or by comparing memory addresses." ] @@ -897,9 +897,11 @@ } }, "source": [ - "The operands of the logical operators do not need to be *boolean* expressions as defined above but may be *any* expression. If a sub-expression does *not* evaluate to an object of type `bool`, Python automatically casts it as such.\n", + "The operands of the logical operators do not need to be *boolean* expressions but may be *any* expression. If a sub-expression does *not* evaluate to an object of type `bool`, Python automatically casts it as such.\n", "\n", - "For example, any non-zero numeric object becomes `True`. While this behavior allows writing more concise and thus more \"beautiful\" code, it is also a common source of confusion. `(x - 9)` is cast as `True` and then the overall expression evaluates to `True` as well." + "For example, any non-zero numeric object becomes `True`. While this behavior allows writing more concise and thus more \"beautiful\" code, it is also a common source of confusion.\n", + "\n", + "So, `(x - 9)` is cast as `True` and then the overall expression evaluates to `True` as well." ] }, { @@ -1122,7 +1124,7 @@ } }, "source": [ - "Pythonistas often use the terms **truthy** or **falsy** to describe a non-boolean expression's behavior when used in place of a boolean one." + "Pythonistas use the terms **truthy** or **falsy** to describe a non-boolean expression's behavior when evaluated in a boolean context." ] }, { @@ -1144,14 +1146,18 @@ } }, "source": [ - "When evaluating boolean expressions with logical operators in it, Python follows the **[short-circuiting](https://en.wikipedia.org/wiki/Short-circuit_evaluation)** strategy: First, the inner-most sub-expressions are evaluated. Second, with identical **[operator precedence](https://docs.python.org/3/reference/expressions.html#operator-precedence)**, evaluation goes from left to right. Once it is clear what the overall truth value is, no more sub-expressions are evaluated, and the result is *immediately* returned.\n", + "When evaluating expressions involving the `and` and `or` operators, Python follows the **[short-circuiting](https://en.wikipedia.org/wiki/Short-circuit_evaluation)** strategy: Once it is clear what the overall truth value is, no more sub-expressions are evaluated, and the result is *immediately* returned.\n", "\n", - "In summary, data science practitioners must know *how* the following two generic expressions are evaluated:\n", + "Also, if such expressions are evaluated in a non-boolean context, the result is returned as is and *not* cast as a `bool` type.\n", "\n", - "- `x or y`: The `y` expression is evaluated *only if* `x` evaluates to `False`, in which case `y` is returned; otherwise, `x` is returned *without* even looking at `y`.\n", - "- `x and y`: The `y` expression is evaluated *only if* `x` evaluates to `True`. Then, if `y` also evaluates to `True`, it is returned; otherwise, `x` is returned.\n", + "The two rules can be summarized as:\n", "\n", - "Let's look at a couple of examples." + "- `x or y`: If `x` is truthy, it is returned *without* evaluating `y`. Otherwise, `y` is evaluated *and* returned.\n", + "- `x and y`: If `x` is falsy, it is returned *without* evaluating `y`. Otherwise, `y` is evaluated *and* returned.\n", + "\n", + "The rules may also be chained or combined.\n", + "\n", + "Let's look at a couple of examples below. To visualize which sub-expressions are evaluated, we define a helper function `expr()` that prints out the only argument it is passed before returning it." ] }, { @@ -1163,32 +1169,6 @@ } }, "outputs": [], - "source": [ - "x = 0\n", - "y = 1\n", - "z = 2" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "skip" - } - }, - "source": [ - "We define a helper function `expr()` that prints out the only argument it is passed before returning it. With `expr()`, we can see if a sub-expression is evaluated or not." - ] - }, - { - "cell_type": "code", - "execution_count": 37, - "metadata": { - "slideshow": { - "slide_type": "fragment" - } - }, - "outputs": [], "source": [ "def expr(arg):\n", " \"\"\"Print and return the only argument.\"\"\"\n", @@ -1204,12 +1184,12 @@ } }, "source": [ - "With the `or` operator, the first sub-expression that evaluates to `True` is returned." + "With the `or` operator, the first truthy sub-expression is returned." ] }, { "cell_type": "code", - "execution_count": 38, + "execution_count": 37, "metadata": { "slideshow": { "slide_type": "slide" @@ -1230,18 +1210,18 @@ "1" ] }, - "execution_count": 38, + "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "expr(x) or expr(y)" + "expr(0) or expr(1)" ] }, { "cell_type": "code", - "execution_count": 39, + "execution_count": 38, "metadata": { "slideshow": { "slide_type": "-" @@ -1261,18 +1241,18 @@ "1" ] }, - "execution_count": 39, + "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "expr(y) or expr(z)" + "expr(1) or expr(2) # 2 is not evaluated due to short-circuiting" ] }, { "cell_type": "code", - "execution_count": 40, + "execution_count": 39, "metadata": { "slideshow": { "slide_type": "-" @@ -1293,13 +1273,13 @@ "1" ] }, - "execution_count": 40, + "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "expr(x) or expr(y) or expr(z)" + "expr(0) or expr(1) or expr(2) # 2 is not evaluated due to short-circuiting" ] }, { @@ -1310,12 +1290,12 @@ } }, "source": [ - "If all sub-expressions evaluate to `False`, the last one is the result." + "If all sub-expressions are falsy, the last one is returned." ] }, { "cell_type": "code", - "execution_count": 41, + "execution_count": 40, "metadata": { "slideshow": { "slide_type": "fragment" @@ -1337,13 +1317,13 @@ "0" ] }, - "execution_count": 41, + "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "expr(False) or expr([]) or expr(x)" + "expr(False) or expr([]) or expr(0)" ] }, { @@ -1354,12 +1334,12 @@ } }, "source": [ - "With the `and` operator, the first sub-expression that evaluates to `False` is returned." + "With the `and` operator, the first falsy sub-expression is returned." ] }, { "cell_type": "code", - "execution_count": 42, + "execution_count": 41, "metadata": { "slideshow": { "slide_type": "slide" @@ -1373,6 +1353,38 @@ "Arg: 0\n" ] }, + { + "data": { + "text/plain": [ + "0" + ] + }, + "execution_count": 41, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "expr(0) and expr(1) # 1 is not evaluated due to short-circuiting" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Arg: 1\n", + "Arg: 0\n" + ] + }, { "data": { "text/plain": [ @@ -1385,7 +1397,7 @@ } ], "source": [ - "expr(x) and expr(y)" + "expr(1) and expr(0)" ] }, { @@ -1417,40 +1429,7 @@ } ], "source": [ - "expr(y) and expr(x)" - ] - }, - { - "cell_type": "code", - "execution_count": 44, - "metadata": { - "slideshow": { - "slide_type": "-" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Arg: 2\n", - "Arg: 1\n", - "Arg: 0\n" - ] - }, - { - "data": { - "text/plain": [ - "0" - ] - }, - "execution_count": 44, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "expr(z) and expr(y) and expr(x)" + "expr(1) and expr(0) and expr(2) # 2 is not evaluated due to short-circuiting" ] }, { @@ -1461,12 +1440,12 @@ } }, "source": [ - "If all sub-expressions evaluate to `True`, the last one is returned." + "If all sub-expressions are truthy, the last one is returned." ] }, { "cell_type": "code", - "execution_count": 45, + "execution_count": 44, "metadata": { "slideshow": { "slide_type": "fragment" @@ -1487,13 +1466,13 @@ "2" ] }, - "execution_count": 45, + "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "expr(y) and expr(z)" + "expr(1) and expr(2)" ] }, { @@ -1526,24 +1505,24 @@ } }, "source": [ - "To write useful programs, we need to control the flow of execution, for example, to react to user input. The logic by which a program does that is referred to as **business logic**.\n", + "To write useful programs, we need to control the flow of execution, for example, to react to user input. The logic by which a program follows the rules from the \"real world\" is referred to as **[business logic](https://en.wikipedia.org/wiki/Business_logic)**, even if \"real world\" refers to the domain of mathematics and not a business application.\n", "\n", - "One language construct to do so is the **[conditional statement](https://docs.python.org/3/reference/compound_stmts.html#the-if-statement)**, or `if` statement for short. It consists of:\n", + "One language feature to do so is the `if` statement (cf., [reference](https://docs.python.org/3/reference/compound_stmts.html#the-if-statement)). It consists of:\n", "\n", "- *one* mandatory `if`-clause,\n", - "- an *arbitrary* number of `elif`-clauses (i.e. \"else if\"), and\n", + "- an *arbitrary* number of `elif`-clauses (i.e., \"else if\"), and\n", "- an *optional* `else`-clause.\n", "\n", - "The `if`- and `elif`-clauses each specify one *boolean* expression, also called **condition**, while the `else`-clause serves as the \"catch everything else\" case.\n", + "The `if`- and `elif`-clauses each specify one *boolean* expression, also called **condition** in this context, while the `else`-clause serves as the \"catch everything else\" case.\n", "\n", - "In terms of syntax, the header lines end with a colon, and the code blocks are indented.\n", + "In contrast to our intuitive interpretation in natural languages, only the code in *one* of the alternatives, also called **branches**, is executed. To be precise, it is always the code in the first clause whose condition evaluates to `True`.\n", "\n", - "In contrast to our intuitive interpretation in natural languages, only the code in *one* of the alternatives, also called **branches**, is executed. To be precise, it is always the code in the first branch whose condition evaluates to `True`." + "In terms of syntax, the header lines end with a colon, and the code blocks are indented. Formally, any statement that is written across several lines is a **[compound statement](https://docs.python.org/3/reference/compound_stmts.html#compound-statements)**, the code blocks are called **suites** and belong to one header line, and the term **clause** refers to a header line and its suite as a whole. So far, we have seen three compound statements: `for`, `if`, and `def`. On the contrary, **[simple statements](https://docs.python.org/3/reference/simple_stmts.html#simple-statements)**, for example, `=`, `del`, or `return`, are written on *one* line." ] }, { "cell_type": "code", - "execution_count": 46, + "execution_count": 45, "metadata": { "code_folding": [], "slideshow": { @@ -1557,7 +1536,7 @@ }, { "cell_type": "code", - "execution_count": 47, + "execution_count": 46, "metadata": { "code_folding": [], "slideshow": { @@ -1599,7 +1578,7 @@ }, { "cell_type": "code", - "execution_count": 48, + "execution_count": 47, "metadata": { "slideshow": { "slide_type": "slide" @@ -1612,7 +1591,7 @@ }, { "cell_type": "code", - "execution_count": 49, + "execution_count": 48, "metadata": { "slideshow": { "slide_type": "-" @@ -1637,7 +1616,7 @@ }, { "cell_type": "code", - "execution_count": 50, + "execution_count": 49, "metadata": { "slideshow": { "slide_type": "skip" @@ -1674,7 +1653,7 @@ }, { "cell_type": "code", - "execution_count": 51, + "execution_count": 50, "metadata": { "slideshow": { "slide_type": "slide" @@ -1717,7 +1696,7 @@ }, { "cell_type": "code", - "execution_count": 52, + "execution_count": 51, "metadata": { "slideshow": { "slide_type": "slide" @@ -1728,7 +1707,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "z is odd\n" + "z is positive\n" ] } ], @@ -1766,7 +1745,7 @@ } }, "source": [ - "When all we do with an `if` statement is to assign an object to a variable according to a single true-or-false condition (i.e., a binary choice), there is a shortcut: We assign the variable the result of a so-called **conditional expression**, or `if` expression for short, instead.\n", + "When an `if` statement assigns an object to a variable according to a true-or-false condition (i.e., a binary choice), there is a shortcut: We assign the variable the result of a so-called **[conditional expression](https://docs.python.org/3/reference/expressions.html#conditional-expressions)**, or `if` expression for short, instead.\n", "\n", "Think of a situation where we evaluate a piece-wise functional relationship $y = f(x)$ at a given $x$, for example:" ] @@ -1790,7 +1769,7 @@ }, { "cell_type": "code", - "execution_count": 53, + "execution_count": 52, "metadata": { "slideshow": { "slide_type": "slide" @@ -1814,7 +1793,7 @@ }, { "cell_type": "code", - "execution_count": 54, + "execution_count": 53, "metadata": { "slideshow": { "slide_type": "-" @@ -1830,7 +1809,7 @@ }, { "cell_type": "code", - "execution_count": 55, + "execution_count": 54, "metadata": { "slideshow": { "slide_type": "-" @@ -1843,7 +1822,7 @@ "9" ] }, - "execution_count": 55, + "execution_count": 54, "metadata": {}, "output_type": "execute_result" } @@ -1860,12 +1839,12 @@ } }, "source": [ - "On the contrary, the `if` expression fits into one line. The main downside here is a potential loss in readability, in particular, if the functional relationship is not that simple." + "On the contrary, the `if` expression fits into one line. The main downside is a potential loss in readability, in particular, if the functional relationship is not that simple. Also, some practitioners do *not* like that the condition is in the middle of the expression." ] }, { "cell_type": "code", - "execution_count": 56, + "execution_count": 55, "metadata": { "slideshow": { "slide_type": "fragment" @@ -1878,7 +1857,7 @@ }, { "cell_type": "code", - "execution_count": 57, + "execution_count": 56, "metadata": { "slideshow": { "slide_type": "skip" @@ -1891,7 +1870,7 @@ "9" ] }, - "execution_count": 57, + "execution_count": 56, "metadata": {}, "output_type": "execute_result" } @@ -1908,12 +1887,12 @@ } }, "source": [ - "In this example, however, the most elegant solution would be to use the built-in [max()](https://docs.python.org/3/library/functions.html#max) function." + "In this example, however, the most elegant solution is to use the built-in [max()](https://docs.python.org/3/library/functions.html#max) function." ] }, { "cell_type": "code", - "execution_count": 58, + "execution_count": 57, "metadata": { "slideshow": { "slide_type": "fragment" @@ -1926,7 +1905,7 @@ }, { "cell_type": "code", - "execution_count": 59, + "execution_count": 58, "metadata": { "slideshow": { "slide_type": "skip" @@ -1939,7 +1918,7 @@ "9" ] }, - "execution_count": 59, + "execution_count": 58, "metadata": {}, "output_type": "execute_result" } @@ -1948,17 +1927,6 @@ "y" ] }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "skip" - } - }, - "source": [ - "Conditional expressions may not only be used in the way described in this section. We already saw them as part of a *list comprehension* in [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements.ipynb) and [Chapter 2](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/02_functions.ipynb) and revisit this in Chapter 7 in greater detail." - ] - }, { "cell_type": "markdown", "metadata": { @@ -1980,14 +1948,14 @@ "source": [ "In the previous two chapters, we encountered a couple of *runtime* errors. A natural urge we might have after reading about conditional statements is to write code that somehow reacts to the occurrence of such exceptions. All we need is a way to formulate a condition for that.\n", "\n", - "For sure, this is such a common thing to do that Python provides a language construct for it, namely the `try` [statement](https://docs.python.org/3/reference/compound_stmts.html#the-try-statement).\n", + "For sure, this is such a common thing to do that Python provides a language construct for it, namely the compound `try` statement (cf., [reference](https://docs.python.org/3/reference/compound_stmts.html#the-try-statement)).\n", "\n", - "In its simplest form, it comes with just two branches: `try` and `except`. The following tells Python to execute the code in the `try`-branch, and if *anything* goes wrong, continue in the `except`-branch instead of **raising** an error to us. Of course, if nothing goes wrong, the `except`-branch is *not* executed." + "In its simplest form, it comes with just two clauses: `try` and `except`. The following tells Python to execute the code in the `try`-clause, and if *anything* goes wrong, continue in the `except`-clause instead of **raising** an error to us. Of course, if nothing goes wrong, the `except`-clause is *not* executed." ] }, { "cell_type": "code", - "execution_count": 60, + "execution_count": 59, "metadata": { "slideshow": { "slide_type": "slide" @@ -2000,7 +1968,7 @@ }, { "cell_type": "code", - "execution_count": 61, + "execution_count": 60, "metadata": { "slideshow": { "slide_type": "-" @@ -2030,16 +1998,16 @@ } }, "source": [ - "However, it is good practice *not* to **handle** *any* possible exception but only the ones we may *expect* from the code in the `try`-branch. The reasoning why this is done is a bit involved. We only remark that the codebase becomes easier to understand as we communicate to any human reader what could go wrong during execution in an *explicit* way. Python comes with a lot of [built-in exceptions](https://docs.python.org/3/library/exceptions.html#concrete-exceptions) that we should familiarize ourselves with.\n", + "However, it is good practice *not* to **handle** *any* possible exception but only the ones we may *expect* from the code in the `try`-clause. The reasoning why this is done is a bit involved. We only remark that the codebase becomes easier to understand as we communicate to any human reader what could go wrong during execution in an *explicit* way. Python comes with a lot of [built-in exceptions](https://docs.python.org/3/library/exceptions.html#concrete-exceptions) that we should familiarize ourselves with.\n", "\n", - "Another good practice is to always keep the code in the `try`-branch short to not *accidentally* handle an exception we do *not* want to handle.\n", + "Another good practice is to always keep the code in the `try`-clause short to not *accidentally* handle an exception we do *not* want to handle.\n", "\n", "In the example, we are dividing numbers and may expect a `ZeroDivisionError`." ] }, { "cell_type": "code", - "execution_count": 62, + "execution_count": 61, "metadata": { "slideshow": { "slide_type": "fragment" @@ -2069,16 +2037,16 @@ } }, "source": [ - "Often, we may have to run some code *independent* of an exception occurring, for example, to close a connection to a database. To achieve that, we add a `finally`-branch to the `try` statement.\n", + "Often, we may have to run some code *independent* of an exception occurring, for example, to close a connection to a database. To achieve that, we add a `finally`-clause to the `try` statement.\n", "\n", - "Similarly, we may have to run some code *only if* no exception occurs, but we do not want to put it in the `try`-branch as per the good practice mentioned above. To achieve that, we add an `else`-branch to the `try` statement.\n", + "Similarly, we may have to run some code *only if* no exception occurs, but we do not want to put it in the `try`-clause as per the good practice mentioned above. To achieve that, we add an `else`-clause to the `try` statement.\n", "\n", "To showcase everything together, we look at one last example. To spice it up a bit, we randomize the input. So run the cell several times and see for yourself." ] }, { "cell_type": "code", - "execution_count": 63, + "execution_count": 62, "metadata": { "slideshow": { "slide_type": "slide" @@ -2089,7 +2057,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "Yes, division worked smoothly.\n", + "Oops. Division by 0. How does that work?\n", "I am always printed\n" ] } diff --git a/04_iteration.ipynb b/04_iteration.ipynb index b760a25..2246da5 100644 --- a/04_iteration.ipynb +++ b/04_iteration.ipynb @@ -8,7 +8,7 @@ } }, "source": [ - "# Chapter 4: Iteration" + "# Chapter 4: Recursion & Looping" ] }, { @@ -859,7 +859,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "39.9 µs ± 5.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n" + "36.5 µs ± 2.47 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n" ] } ], @@ -881,7 +881,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "1.61 ms ± 8.84 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n" + "1.58 ms ± 20.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n" ] } ], @@ -903,7 +903,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "199 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)\n" + "194 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)\n" ] } ], @@ -925,7 +925,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "2.21 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)\n" + "2.15 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)\n" ] } ], @@ -947,7 +947,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "5.8 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)\n" + "5.59 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)\n" ] } ], @@ -4107,7 +4107,7 @@ "source": [ "We use the built-in [isinstance()](https://docs.python.org/3/library/functions.html#isinstance) function to make sure `factorial()` is called with an `int` object as the argument. We further **validate the input** by verifying that the integer is non-negative.\n", "\n", - "Meanwhile, we also see how we manually raise exceptions with the `raise` [statement](https://docs.python.org/3/reference/simple_stmts.html#the-raise-statement), another way of controlling the flow of execution.\n", + "Meanwhile, we also see how we manually raise exceptions with the `raise` statement (cf., [reference](https://docs.python.org/3/reference/simple_stmts.html#the-raise-statement)), another way of controlling the flow of execution.\n", "\n", "The first two branches in the revised `factorial()` function act as **guardians** ensuring that the code does not produce *unexpected* runtime errors: Errors may be expected when mentioned in the docstring.\n", "\n", @@ -4289,7 +4289,7 @@ } }, "source": [ - "With everything *officially* introduced so far (i.e., without the introductory example in [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements.ipynb) that only served as an overview), Python would be what is called **[Turing complete](https://en.wikipedia.org/wiki/Turing_completeness)**. That means that anything that could be formulated as an algorithm could be expressed with all the language features we have seen. Note that, in particular, we have *not* yet formally *introduced* the `for` and `while` statements!" + "With everything *officially* introduced so far, Python would be what is called **[Turing complete](https://en.wikipedia.org/wiki/Turing_completeness)**. That means that anything that could be formulated as an algorithm could be expressed with all the language features we have seen. Note that, in particular, we have *not* yet formally *introduced* the `for` and `while` statements!" ] }, { @@ -4322,7 +4322,7 @@ } }, "source": [ - "Whereas functions combined with `if` statements suffice to model any repetitive logic, Python comes with a `while` [statement](https://docs.python.org/3/reference/compound_stmts.html#the-while-statement) that often makes it easier to implement iterative ideas.\n", + "Whereas functions combined with `if` statements suffice to model any repetitive logic, Python comes with a compound `while` statement (cf., [reference](https://docs.python.org/3/reference/compound_stmts.html#the-while-statement)) that often makes it easier to implement iterative ideas.\n", "\n", "It consists of a header line with a boolean expression followed by an indented code block. Before the first and after every execution of the code block, the boolean expression is evaluated, and if it is (still) equal to `True`, the code block runs (again). Eventually, some variable referenced in the boolean expression is changed in the code block such that the condition becomes `False`.\n", "\n", @@ -4792,9 +4792,9 @@ "source": [ "Recursion and the `while` statement are two sides of the same coin. Disregarding that in the case of recursion Python internally faces some additional burden for managing the stack of frames in memory, both approaches lead to the *same* computational steps in memory. More importantly, we can re-formulate a recursive implementation in an iterative way and vice versa despite one of the two ways often \"feeling\" a lot more natural given a particular problem.\n", "\n", - "So how does the `for` [statement](https://docs.python.org/3/reference/compound_stmts.html#the-for-statement) we saw in the very first example in this book fit into this picture? It is a *redundant* language construct to provide a *shorter* and more *convenient* syntax for common applications of the `while` statement. In programming, such additions to a language are called **syntactic sugar**. A cup of tea tastes better with sugar, but we may drink tea without sugar too.\n", + "So how does the compound `for` statement (cf., [reference](https://docs.python.org/3/reference/compound_stmts.html#the-for-statement)) in this book's very first example fit into this picture? It is a *redundant* language construct to provide a *shorter* and more *convenient* syntax for common applications of the `while` statement. In programming, such additions to a language are called **syntactic sugar**. A cup of tea tastes better with sugar, but we may drink tea without sugar too.\n", "\n", - "Consider the following `numbers` list. Without the `for` statement, we have to manage a temporary **index variable** `i` to loop over all the elements and also obtain the individual elements with the `[]` operator in each iteration of the loop." + "Consider `elements` below. Without the `for` statement, we have to manage a temporary **index variable**, `index`, to loop over all the elements and also obtain the individual elements with the `[]` operator in each iteration of the loop." ] }, { @@ -4807,7 +4807,7 @@ }, "outputs": [], "source": [ - "numbers = [0, 1, 2, 3, 4]" + "elements = [0, 1, 2, 3, 4]" ] }, { @@ -4828,12 +4828,12 @@ } ], "source": [ - "i = 0\n", - "while i < len(numbers):\n", - " number = numbers[i]\n", - " print(number, end=\" \")\n", - " i += 1\n", - "del i" + "index = 0\n", + "while index < len(elements):\n", + " element = elements[index]\n", + " print(element, end=\" \")\n", + " index += 1\n", + "del index" ] }, { @@ -4865,8 +4865,8 @@ } ], "source": [ - "for number in numbers:\n", - " print(number, end=\" \")" + "for element in elements:\n", + " print(element, end=\" \")" ] }, { @@ -4877,7 +4877,7 @@ } }, "source": [ - "For sequences of integers, the [range()](https://docs.python.org/3/library/functions.html#func-range) built-in makes the `for` statement even more convenient: It creates a list-like object of type `range` that generates integers \"on the fly,\" and we look closely at the underlying effects in memory in Chapter 7." + "For sequences of integers, the [range()](https://docs.python.org/3/library/functions.html#func-range) built-in makes the `for` statement even more convenient: It creates a `list`-like object of type `range` that generates integers \"on the fly,\" and we look closely at the underlying effects in memory in [Chapter 7](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/07_sequences.ipynb#Mapping)." ] }, { @@ -4898,8 +4898,8 @@ } ], "source": [ - "for number in range(5):\n", - " print(number, end=\" \")" + "for element in range(5):\n", + " print(element, end=\" \")" ] }, { @@ -4955,8 +4955,8 @@ } ], "source": [ - "for number in [1, 3, 5, 7, 9]:\n", - " print(number, end=\" \")" + "for element in [1, 3, 5, 7, 9]:\n", + " print(element, end=\" \")" ] }, { @@ -4977,8 +4977,8 @@ } ], "source": [ - "for number in range(1, 10, 2):\n", - " print(number, end=\" \")" + "for element in range(1, 10, 2):\n", + " print(element, end=\" \")" ] }, { @@ -5008,13 +5008,13 @@ "\n", "Now, just as we classify objects by their types, we also classify these **concrete data types** (e.g., `int`, `float`, `str`, or `list`) into **abstract concepts**.\n", "\n", - "We did this already in [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements.ipynb) when we described a `list` object as \"some sort of container that holds [...] pointers to other objects\". So, abstractly speaking, **containers** are any objects that are \"composed\" of other objects and also \"manage\" how these objects are organized. `list` objects, for example, have the property that they model an internal order associated with its elements. There exist, however, other container types, many of which do *not* come with an order. So, containers primarily \"contain\" other objects and have *nothing* to do with looping.\n", + "We did this already in [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements.ipynb#Who-am-I?-And-how-many?) when we described a `list` object as \"some sort of container that holds [...] pointers to other objects\". So, abstractly speaking, **containers** are any objects that are \"composed\" of other objects and also \"manage\" how these objects are organized. `list` objects, for example, have the property that they model an internal order associated with its elements. There exist, however, other container types, many of which do *not* come with an order. So, containers primarily \"contain\" other objects and have *nothing* to do with looping.\n", "\n", "On the contrary, the abstract concept of **iterables** is all about looping: Any object that we can loop over is, by definition, an iterable. So, `range` objects, for example, are iterables that do *not* contain other objects. Moreover, looping does *not* have to occur in a *predictable* order, although this is the case for both `list` and `range` objects.\n", "\n", - "Typically, containers are iterables, and iterables are containers. Yet, only because these two concepts coincide often, we must not think of them as the same. Chapter 10 finally gives an explanation as to how abstract concepts are implemented and play together.\n", + "Typically, containers are iterables, and iterables are containers. Yet, only because these two concepts coincide often, we must not think of them as the same. Chapter 9 finally gives an explanation as to how abstract concepts are implemented and play together.\n", "\n", - "So, `list` objects like `first_names` below are iterable containers. They implement even more abstract concepts, as Chapter 7 reveals." + "So, `list` objects like `first_names` below are iterable containers. They implement even more abstract concepts, as [Chapter 7](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/07_sequences.ipynb#Collections-vs.-Sequences) reveals." ] }, { @@ -5097,7 +5097,7 @@ } }, "source": [ - "The cell below shows the *exact* workings of the `in` operator: Although `3.0` is *not* contained in `numbers`, it evaluates equal to the `3` that is, which is why the following expression evaluates to `True`. So, while we could colloquially say that `numbers` \"contains\" `3.0`, it actually does not." + "The cell below shows the *exact* workings of the `in` operator: Although `3.0` is *not* contained in `elements`, it evaluates equal to the `3` that is, which is why the following expression evaluates to `True`. So, while we could colloquially say that `elements` \"contains\" `3.0`, it actually does not." ] }, { @@ -5121,7 +5121,7 @@ } ], "source": [ - "3.0 in numbers" + "3.0 in elements" ] }, { @@ -5165,7 +5165,7 @@ } }, "source": [ - "If we must have an index variable in the loop's body, we use the [enumerate()](https://docs.python.org/3/library/functions.html#enumerate) built-in that takes an *iterable* as its argument and then generates a \"stream\" of \"pairs\" consisting of an index variable and an object provided by the iterable. There is *no* need to ever revert to the `while` statement to loop over an iterable object." + "If we must have an index variable in the loop's body, we use the [enumerate()](https://docs.python.org/3/library/functions.html#enumerate) built-in that takes an *iterable* as its argument and then generates a \"stream\" of \"pairs\" of an index variable, `i` below, and an object provided by the iterable, `name`, separated by a `,`. There is *no* need to ever revert to the `while` statement with an explicitly managed index variable to loop over an iterable object." ] }, { @@ -5288,7 +5288,7 @@ } }, "source": [ - "In contrast to its recursive counterpart, the iterative `fibonacci()` function below is somewhat harder to read. For example, it is not so obvious as to how many iterations through the `for`-loop we need to make when implementing it. There is an increased risk of making an *off-by-one* error. Moreover, we need to track a `temp` variable along, at least until we have worked through Chapter 7. Do you understand what `temp` does?\n", + "In contrast to its recursive counterpart, the iterative `fibonacci()` function below is somewhat harder to read. For example, it is not so obvious as to how many iterations through the `for`-loop we need to make when implementing it. There is an increased risk of making an *off-by-one* error. Moreover, we need to track a `temp` variable along.\n", "\n", "However, one advantage of calculating Fibonacci numbers in a **forward** fashion with a `for` statement is that we could list the entire sequence in ascending order as we calculate the desired number. To show this, we added `print()` statements in `fibonacci()` below.\n", "\n", @@ -5550,7 +5550,7 @@ } }, "source": [ - "Let's say we have a `numbers` list and want to check if the square of at least one of its elements is above a `threshold` of `100`." + "Let's say we have a `samples` list and want to check if the square of at least one of its elements is above a `threshold` of `100`." ] }, { @@ -5563,7 +5563,7 @@ }, "outputs": [], "source": [ - "numbers = [3, 7, 2, 9, 11, 4, 7, 9, 4, 5]" + "samples = [3, 7, 2, 9, 11, 4, 7, 9, 4, 5]" ] }, { @@ -5587,7 +5587,7 @@ } }, "source": [ - "A first naive implementation could look like this: We loop over *every* element in `numbers` and set an **indicator variable** `is_above`, initialized as `False`, to `True` once we encounter an element satisfying the search condition." + "A first naive implementation could look like this: We loop over *every* element in `samples` and set an **indicator variable** `is_above`, initialized as `False`, to `True` once we encounter an element satisfying the search condition." ] }, { @@ -5603,22 +5603,22 @@ "name": "stdout", "output_type": "stream", "text": [ - "3 7 2 9 11 4 7 9 4 5 => at least one number's square is above 100\n" + "3 7 2 9 11 4 7 9 4 5 => at least one sample's square is above 100\n" ] } ], "source": [ "is_above = False\n", "\n", - "for number in numbers:\n", - " print(number, end=\" \") # added for didactical purposes\n", - " if number ** 2 > threshold:\n", + "for sample in samples:\n", + " print(sample, end=\" \") # added for didactical purposes\n", + " if sample ** 2 > threshold:\n", " is_above = True\n", "\n", "if is_above:\n", - " print(\"=> at least one number's square is above\", threshold)\n", + " print(\"=> at least one sample's square is above\", threshold)\n", "else:\n", - " print(\"=> no number's square is above\", threshold)" + " print(\"=> no sample's square is above\", threshold)" ] }, { @@ -5629,11 +5629,11 @@ } }, "source": [ - "This implementation is *inefficient* as even if the *first* element in `numbers` has a square greater than `100`, we loop until the last element: This could take a long time for a big list.\n", + "This implementation is *inefficient* as even if the *first* element in `samples` has a square greater than `100`, we loop until the last element: This could take a long time for a big list.\n", "\n", "Moreover, we must initialize `is_above` *before* the `for`-loop and write an `if`-`else`-logic *after* it to check for the result. The actual business logic is *not* clear right away.\n", "\n", - "Luckily, Python provides a `break` [statement](https://docs.python.org/3/reference/simple_stmts.html#the-break-statement) that lets us stop the `for`-loop in any iteration of the loop. Conceptually, it is yet another means of controlling the flow of execution." + "Luckily, Python provides the `break` statement (cf., [reference](https://docs.python.org/3/reference/simple_stmts.html#the-break-statement)) that lets us stop the `for`-loop in any iteration of the loop. Conceptually, it is yet another means of controlling the flow of execution." ] }, { @@ -5649,23 +5649,23 @@ "name": "stdout", "output_type": "stream", "text": [ - "3 7 2 9 11 => at least one number's square is above 100\n" + "3 7 2 9 11 => at least one sample's square is above 100\n" ] } ], "source": [ "is_above = False\n", "\n", - "for number in numbers:\n", - " print(number, end=\" \") # added for didactical purposes\n", - " if number ** 2 > threshold:\n", + "for sample in samples:\n", + " print(sample, end=\" \") # added for didactical purposes\n", + " if sample ** 2 > threshold:\n", " is_above = True\n", " break\n", "\n", "if is_above:\n", - " print(\"=> at least one number's square is above\", threshold)\n", + " print(\"=> at least one sample's square is above\", threshold)\n", "else:\n", - " print(\"=> no number's square is above\", threshold)" + " print(\"=> no sample's square is above\", threshold)" ] }, { @@ -5687,7 +5687,7 @@ } }, "source": [ - "### The `for`-`else` Clause" + "### The `for`-`else`-Clause" ] }, { @@ -5698,7 +5698,7 @@ } }, "source": [ - "To express the logic in a prettier way, we add an `else`-clause at the end of the `for`-loop. The `else`-branch is executed *iff* the `for`-loop is *not* stopped with a `break` statement **prematurely** (i.e., *before* reaching the *last* iteration in the loop). The word \"else\" implies a somewhat unintuitive meaning and had better been named a `then`-clause. In most scenarios, however, the `else`-clause logically goes together with some `if` statement in the `for`-loop's body.\n", + "To express the logic in a prettier way, we add an `else`-clause at the end of the `for`-loop (cf., [reference](https://docs.python.org/3/reference/compound_stmts.html#the-for-statement)). The `else`-clause is executed *iff* the `for`-loop is *not* stopped with a `break` statement **prematurely** (i.e., *before* reaching the *last* iteration in the loop). The word \"else\" implies a somewhat unintuitive meaning and had better been named a `then`-clause. In most scenarios, however, the `else`-clause logically goes together with some `if` statement in the `for`-loop's body.\n", "\n", "Overall, the code's expressive power increases. Not many programming languages support a `for`-`else`-branching, which turns out to be very useful in practice." ] @@ -5727,23 +5727,23 @@ "name": "stdout", "output_type": "stream", "text": [ - "3 7 2 9 11 => at least one number's square is above 100\n" + "3 7 2 9 11 => at least one sample's square is above 100\n" ] } ], "source": [ - "for number in numbers:\n", - " print(number, end=\" \") # added for didactical purposes\n", - " if number ** 2 > threshold:\n", + "for sample in samples:\n", + " print(sample, end=\" \") # added for didactical purposes\n", + " if sample ** 2 > threshold:\n", " is_above = True\n", " break\n", "else:\n", " is_above = False\n", "\n", "if is_above:\n", - " print(\"=> at least one number's square is above\", threshold)\n", + " print(\"=> at least one sample's square is above\", threshold)\n", "else:\n", - " print(\"=> no number's square is above\", threshold)" + " print(\"=> no sample's square is above\", threshold)" ] }, { @@ -5770,18 +5770,18 @@ "name": "stdout", "output_type": "stream", "text": [ - "3 7 2 9 11 => at least one number's square is above 100\n" + "3 7 2 9 11 => at least one sample's square is above 100\n" ] } ], "source": [ - "for number in numbers:\n", - " print(number, end=\" \") # added for didactical purposes\n", - " if number ** 2 > threshold:\n", - " print(\"=> at least one number's square is above\", threshold)\n", + "for sample in samples:\n", + " print(sample, end=\" \") # added for didactical purposes\n", + " if sample ** 2 > threshold:\n", + " print(\"=> at least one sample's square is above\", threshold)\n", " break\n", "else:\n", - " print(\"=> no number's square is above\", threshold)" + " print(\"=> no sample's square is above\", threshold)" ] }, { @@ -5792,7 +5792,7 @@ } }, "source": [ - "Of course, if we set the `threshold` an element's square has to pass higher, for example, to `200`, we have to loop through the entire `numbers` list. There is *no way* to optimize this **[linear search](https://en.wikipedia.org/wiki/Linear_search)** further, at least as long as we model `numbers` as a `list` object. More advanced data types, however, exist that mitigate that downside." + "Of course, if we set the `threshold` an element's square has to pass higher, for example, to `200`, we have to loop over all `samples`. There is *no way* to optimize this **[linear search](https://en.wikipedia.org/wiki/Linear_search)** further, at least as long as we model `samples` as a `list` object. More advanced data types, however, exist that mitigate that downside." ] }, { @@ -5808,20 +5808,20 @@ "name": "stdout", "output_type": "stream", "text": [ - "3 7 2 9 11 4 7 9 4 5 => no number's square is above 200\n" + "3 7 2 9 11 4 7 9 4 5 => no sample's square is above 200\n" ] } ], "source": [ "threshold = 200\n", "\n", - "for number in numbers:\n", - " print(number, end=\" \") # added for didactical purposes\n", - " if number ** 2 > threshold:\n", - " print(\"=> at least one number's square is above\", threshold)\n", + "for sample in samples:\n", + " print(sample, end=\" \") # added for didactical purposes\n", + " if sample ** 2 > threshold:\n", + " print(\"=> at least one sample's square is above\", threshold)\n", " break\n", "else:\n", - " print(\"=> no number's square is above\", threshold)" + " print(\"=> no sample's square is above\", threshold)" ] }, { @@ -5843,7 +5843,7 @@ } }, "source": [ - "Often, we process some iterable with numeric data, for example, a list of numbers as in the introductory example in [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements.ipynb) or, more realistically, data from a CSV file with many rows and columns.\n", + "Often, we process some iterable with numeric data, for example, a list of `numbers` as in this book's introductory example in [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements.ipynb#Example:-Averaging-Even-Numbers) or, more realistically, data from a CSV file with many rows and columns.\n", "\n", "Processing numeric data usually comes down to operations that may be grouped into one of the following three categories:\n", "\n", @@ -5851,7 +5851,7 @@ "- **filtering**: throw away individual samples (e.g., statistical outliers)\n", "- **reducing**: collect individual samples into summary statistics\n", "\n", - "We study this **map-filter-reduce** paradigm extensively in Chapter 7 after introducing more advanced data types that are needed to work with \"big\" data.\n", + "We study this **map-filter-reduce** paradigm extensively in [Chapter 7](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/07_sequences.ipynb#The-Map-Filter-Reduce-Paradigm) after introducing more advanced data types that are needed to work with \"big\" data.\n", "\n", "In the remainder of this section, we focus on *filtering out* some samples within a `for`-loop." ] @@ -5875,7 +5875,7 @@ } }, "source": [ - "Calculate the sum of all even numbers from $1$ through $12$ after squaring them and adding $1$ to the squares:\n", + "Calculate the sum of all even numbers in `[7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]` after squaring them and adding `1` to the squares:\n", "\n", "- **\"all\"** => loop over an iterable\n", "- **\"even\"** => *filter* out the odd numbers\n", @@ -5893,7 +5893,7 @@ }, "outputs": [], "source": [ - "numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]" + "numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]" ] }, { @@ -5909,7 +5909,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "2 > 5 4 > 17 6 > 37 8 > 65 10 > 101 12 > 145 " + "8 > 65 12 > 145 2 > 5 6 > 37 10 > 101 4 > 17 " ] }, { @@ -5926,11 +5926,11 @@ "source": [ "total = 0\n", "\n", - "for x in numbers:\n", - " if x % 2 == 0: # only keep even numbers\n", - " y = (x ** 2) + 1\n", - " print(x, y, sep=\" > \", end=\" \") # added for didactical purposes\n", - " total += y\n", + "for number in numbers:\n", + " if number % 2 == 0: # only keep even numbers\n", + " square = (number ** 2) + 1\n", + " print(number, square, sep=\" > \", end=\" \") # added for didactical purposes\n", + " total += square\n", "\n", "total" ] @@ -5969,7 +5969,7 @@ } }, "source": [ - "Calculate the sum of every third and even number from $1$ through $12$ after squaring them and adding $1$ to the squares:\n", + "Calculate the sum of every third and even number in `[7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]` after squaring them and adding `1` to the squares:\n", "\n", "- **\"every\"** => loop over an iterable\n", "- **\"third\"** => *filter* out all numbers except every third\n", @@ -5990,7 +5990,7 @@ { "data": { "text/plain": [ - "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]" + "[7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]" ] }, "execution_count": 78, @@ -6015,13 +6015,13 @@ "name": "stdout", "output_type": "stream", "text": [ - "6 > 37 12 > 145 " + "8 > 65 12 > 145 4 > 17 " ] }, { "data": { "text/plain": [ - "182" + "227" ] }, "execution_count": 79, @@ -6032,12 +6032,12 @@ "source": [ "total = 0\n", "\n", - "for i, x in enumerate(numbers, start=1):\n", + "for i, number in enumerate(numbers, start=1):\n", " if i % 3 == 0: # only keep every third number\n", - " if x % 2 == 0: # only keep even numbers\n", - " y = (x ** 2) + 1\n", - " print(x, y, sep=\" > \", end=\" \") # added for didactical purposes \n", - " total += y\n", + " if number % 2 == 0: # only keep even numbers\n", + " square = (number ** 2) + 1\n", + " print(number, square, sep=\" > \", end=\" \") # added for didactical purposes \n", + " total += square\n", "\n", "total" ] @@ -6052,7 +6052,7 @@ "source": [ "With already three levels of indentation, less horizontal space is available for the actual code block. Of course, one could flatten the two `if` statements with the logical `and` operator, as shown in [Chapter 3](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/03_conditionals.ipynb#The-if-Statement). Then, however, we trade off horizontal space against a more \"complex\" `if` logic, and this is *not* a real improvement.\n", "\n", - "A Pythonista would instead make use of the `continue` [statement](https://docs.python.org/3/reference/simple_stmts.html#the-continue-statement) that causes a loop to jump into the next iteration skipping the rest of the code block.\n", + "A Pythonista would instead make use of the `continue` statement (cf., [reference](https://docs.python.org/3/reference/simple_stmts.html#the-continue-statement)) that causes a loop to jump into the next iteration skipping the rest of the code block.\n", "\n", "The revised code fragment below occupies more vertical space and less horizontal space: A *good* trade-off." ] @@ -6081,13 +6081,13 @@ "name": "stdout", "output_type": "stream", "text": [ - "6 > 37 12 > 145 " + "8 > 65 12 > 145 4 > 17 " ] }, { "data": { "text/plain": [ - "182" + "227" ] }, "execution_count": 80, @@ -6098,15 +6098,15 @@ "source": [ "total = 0\n", "\n", - "for i, x in enumerate(numbers, start=1):\n", + "for i, number in enumerate(numbers, start=1):\n", " if i % 3 != 0: # only keep every third number\n", " continue\n", - " elif x % 2 != 0: # only keep even numbers\n", + " elif number % 2 != 0: # only keep even numbers\n", " continue\n", "\n", - " y = (x ** 2) + 1\n", - " print(x, y, sep=\" > \", end=\" \") # added for didactical purposes \n", - " total += y\n", + " square = (number ** 2) + 1\n", + " print(number, square, sep=\" > \", end=\" \") # added for didactical purposes \n", + " total += square\n", "\n", "total" ] diff --git a/04_iteration_review_and_exercises.ipynb b/04_iteration_review_and_exercises.ipynb index a953796..64d6903 100644 --- a/04_iteration_review_and_exercises.ipynb +++ b/04_iteration_review_and_exercises.ipynb @@ -5,7 +5,7 @@ "metadata": {}, "source": [ "\n", - "# Chapter 4: Iteration" + "# Chapter 4: Recursion & Looping" ] }, { diff --git a/05_numbers.ipynb b/05_numbers.ipynb index 73daebb..a671e68 100644 --- a/05_numbers.ipynb +++ b/05_numbers.ipynb @@ -8,7 +8,7 @@ } }, "source": [ - "# Chapter 5: Numbers" + "# Chapter 5: Bits & Numbers" ] }, { @@ -23,7 +23,7 @@ "\n", "We start with the \"simple\" ones: Numeric types in this chapter and textual data in [Chapter 6](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/06_text.ipynb). An important fact that holds for all objects of these types is that they are **immutable**. To re-use the bag analogy from [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements.ipynb#Objects-vs.-Types-vs.-Values), this means that the $0$s and $1$s making up an object's *value* cannot be changed once the bag is created in memory, implying that any operation with or method on the object creates a *new* object in a *different* memory location.\n", "\n", - "Chapters 7, 8, and 9 then cover the more \"complex\" data types, including, for example, the `list` type. Finally, Chapter 10 completes the picture by introducing language constructs to create custom types.\n", + "[Chapter 7](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/07_sequences.ipynb) and Chapter 8 then cover the more \"complex\" data types, including, for example, the `list` type. Finally, Chapter 9 completes the picture by introducing language constructs to create custom types.\n", "\n", "We have already seen many hints indicating that numbers are not as trivial to work with as it seems at first sight:\n", "\n", @@ -92,7 +92,7 @@ { "data": { "text/plain": [ - "139928698315984" + "140087541220560" ] }, "execution_count": 2, @@ -2115,7 +2115,7 @@ { "data": { "text/plain": [ - "139928698489496" + "140087541402072" ] }, "execution_count": 63, @@ -2242,7 +2242,7 @@ } ], "source": [ - "1. # on the contrary, 1 creates an int object" + "1. # 1 without a dot creates an int object" ] }, { @@ -5498,7 +5498,7 @@ { "data": { "text/plain": [ - "139928697740272" + "140087540555856" ] }, "execution_count": 176, @@ -6100,7 +6100,7 @@ } }, "source": [ - "## The Numeric Tower" + "## The Numerical Tower" ] }, { @@ -6149,7 +6149,7 @@ "\n", "For the other types, in particular, the `float` type, the implications of their imprecision are discussed in detail above.\n", "\n", - "The abstract concepts behind the four outer-most mathematical sets are part of Python since [PEP 3141](https://www.python.org/dev/peps/pep-3141/) in 2007. The [numbers](https://docs.python.org/3/library/numbers.html) module in the [standard library](https://docs.python.org/3/library/index.html) defines what Pythonistas call the **numeric tower**, a collection of five **[abstract data types](https://en.wikipedia.org/wiki/Abstract_data_type)**, or **abstract base classes** as they are called in Python jargon:\n", + "The abstract concepts behind the four outer-most mathematical sets are part of Python since [PEP 3141](https://www.python.org/dev/peps/pep-3141/) in 2007. The [numbers](https://docs.python.org/3/library/numbers.html) module in the [standard library](https://docs.python.org/3/library/index.html) defines what programmers call the **[numerical tower](https://en.wikipedia.org/wiki/Numerical_tower)**, a collection of five **[abstract data types](https://en.wikipedia.org/wiki/Abstract_data_type)**, or **abstract base classes** as they are called in Python jargon:\n", "\n", "- `numbers.Number`: \"any number\" (cf., [documentation](https://docs.python.org/3/library/numbers.html#numbers.Number))\n", "- `numbers.Complex`: \"all complex numbers\" (cf., [documentation](https://docs.python.org/3/library/numbers.html#numbers.Complex))\n", @@ -6712,7 +6712,7 @@ } }, "source": [ - "#### Example: [Factorial](https://en.wikipedia.org/wiki/Factorial) (revisited)" + "### Goose Typing" ] }, { @@ -6723,7 +6723,9 @@ } }, "source": [ - "Replacing *concrete* data types with *abstract* ones is particularly valuable in the context of input validation: The revised version of the `factorial()` function below allows its user to *take advantage of duck typing*. If a real but non-integer argument `n` is passed in, `factorial()` tries to cast `n` as an `int` object with the [trunc()](https://docs.python.org/3/library/math.html#math.trunc) function from the [math](https://docs.python.org/3/library/math.html) module in the [standard library](https://docs.python.org/3/library/index.html). [trunc()](https://docs.python.org/3/library/math.html#math.trunc) cuts off all decimals and any *concrete* numeric type implementing the *abstract* `numbers.Real` type supports it (cf., [documentation](https://docs.python.org/3/library/numbers.html#numbers.Real))." + "Replacing *concrete* data types with *abstract* ones is particularly valuable in the context of input validation: The revised version of the `factorial()` function below allows its user to take advantage of *duck typing*: If a real but non-integer argument `n` is passed in, `factorial()` tries to cast `n` as an `int` object with the [trunc()](https://docs.python.org/3/library/math.html#math.trunc) function from the [math](https://docs.python.org/3/library/math.html) module in the [standard library](https://docs.python.org/3/library/index.html). [trunc()](https://docs.python.org/3/library/math.html#math.trunc) cuts off all decimals and any *concrete* numeric type implementing the *abstract* `numbers.Real` type supports it (cf., [documentation](https://docs.python.org/3/library/numbers.html#numbers.Real)).\n", + "\n", + "Two popular and distinguished Pythonistas, [Luciano Ramalho](https://github.com/ramalho) and [Alex Martelli](https://en.wikipedia.org/wiki/Alex_Martelli), coin the term **goose typing** to specifically mean using the built-in [isinstance()](https://docs.python.org/3/library/functions.html#isinstance) function with an *abstract base class* (cf., Chapter 11 in this [book](https://www.amazon.com/Fluent-Python-Concise-Effective-Programming/dp/1491946008) or this [summary](https://dgkim5360.github.io/blog/python/2017/07/duck-typing-vs-goose-typing-pythonic-interfaces/) thereof)." ] }, { @@ -6763,6 +6765,17 @@ "math.trunc(1 / 10)" ] }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "#### Example: [Factorial](https://en.wikipedia.org/wiki/Factorial) (revisited)" + ] + }, { "cell_type": "code", "execution_count": 211, @@ -6893,7 +6906,7 @@ } }, "source": [ - "With the keyword-only argument `strict`, we can control whether or not a passed in `float` object may be rounded. By default, this is not allowed." + "With the keyword-only argument `strict`, we can control whether or not a passed in `float` object may be rounded. By default, this is not allowed and results in a `TypeError`." ] }, { @@ -7003,20 +7016,20 @@ }, "source": [ "There exist three numeric types in core Python:\n", - "- `int`: a perfect model for whole numbers (i.e., the set $\\mathbb{Z}$); inherently precise\n", + "- `int`: a near-perfect model for whole numbers (i.e., the set $\\mathbb{Z}$); inherently precise\n", "- `float`: the \"gold\" standard to approximate real numbers (i.e., the set $\\mathbb{R}$); inherently imprecise\n", "- `complex`: layer on top of the `float` type; therefore inherently imprecise\n", "\n", "Furthermore, the [standard library](https://docs.python.org/3/library/index.html) adds two more types that can be used as substitutes for `float` objects:\n", "- `Decimal`: similar to `float` but allows customizing the precision; still inherently imprecise\n", - "- `Fraction`: a perfect model for rational numbers (i.e., the set $\\mathbb{Q}$); built on top of the `int` type and therefore inherently precise\n", + "- `Fraction`: a near-perfect model for rational numbers (i.e., the set $\\mathbb{Q}$); built on top of the `int` type and therefore inherently precise\n", "\n", "The *important* takeaways for the data science practitioner are:\n", "\n", "1. **Do not mix** precise and imprecise data types, and\n", "2. actively expect `nan` results when working with `float` numbers as there are no **loud failures**.\n", "\n", - "The **numeric tower** is Python's way of implementing the various **abstract** ideas of what a number is in mathematics." + "The **numerical tower** is Python's way of implementing the various **abstract** ideas of what numbers are in mathematics." ] } ], diff --git a/05_numbers_review_and_exercises.ipynb b/05_numbers_review_and_exercises.ipynb index 4246965..90983d4 100644 --- a/05_numbers_review_and_exercises.ipynb +++ b/05_numbers_review_and_exercises.ipynb @@ -5,7 +5,7 @@ "metadata": {}, "source": [ "\n", - "# Chapter 5: Numbers" + "# Chapter 5: Bits & Numbers" ] }, { @@ -156,7 +156,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "**Q9**: How can **abstract data types**, for example, as defined in the **numeric tower**, be helpful in enabling **duck typing**?" + "**Q9**: How can **abstract data types**, for example, as defined in the **numerical tower**, be helpful in enabling **duck typing**?" ] }, { diff --git a/06_text.ipynb b/06_text.ipynb index 4cdd4ab..44eaa3c 100644 --- a/06_text.ipynb +++ b/06_text.ipynb @@ -8,7 +8,7 @@ } }, "source": [ - "# Chapter 6: Text" + "# Chapter 6: Bytes & Text" ] }, { @@ -80,7 +80,7 @@ { "data": { "text/plain": [ - "140148947850512" + "140488988107952" ] }, "execution_count": 2, @@ -124,7 +124,7 @@ } }, "source": [ - "A `str` object evaluates to itself in a literal notation with enclosing **single quotes** `'` by default. In [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements.ipynb), we already specified the double quotes `\"` convention we stick to in this book. Yet, single quotes `'` and double quotes `\"` are *perfect* substitutes for all `str` objects that do *not* contain any of the two symbols in it. We could have used the reverse convention, as well.\n", + "A `str` object evaluates to itself in a literal notation with enclosing **single quotes** `'` by default. In [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements.ipynb#Value), we already specified the double quotes `\"` convention we stick to in this book. Yet, single quotes `'` and double quotes `\"` are *perfect* substitutes for all `str` objects that do *not* contain any of the two symbols in it. We could use the reverse convention, as well.\n", "\n", "As [this discussion](https://stackoverflow.com/questions/56011/single-quotes-vs-double-quotes-in-python) shows, many programmers have *strong* opinions about that and make up *new* conventions for their projects. Consequently, the discussion was \"closed as not constructive\" by the moderators." ] @@ -161,7 +161,7 @@ } }, "source": [ - "As the single quote `'` is often used in the English language as a shortener, we could make an argument in favor of using the double quotes `\"`: There are possibly fewer situations like in the two code cells below, in which we must revert to using a `\\` to **escape** a single quote `'` in a text (cf., the section on special characters further below). However, double quotes `\"` are often used as well. So, this argument is somewhat not convincing.\n", + "As the single quote `'` is often used in the English language as a shortener, we could make an argument in favor of using the double quotes `\"`: There are possibly fewer situations like in the two code cells below, in which we must revert to using a `\\` to **escape** a single quote `'` in a text (cf., the [Special Characters](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/06_text.ipynb#Special-Characters) section further below). However, double quotes `\"` are often used as well. So, this argument is somewhat not convincing.\n", "\n", "Many proponents of the single quote `'` usage claim that double quotes `\"` make more **visual noise** on the screen. This argument is also not convincing. On the contrary, one could claim that *two* single quotes `''` look so similar to *one* double quote `\"` that it might not be apparent right away what we are looking at. By sticking to double quotes `\"`, we avoid such danger of confusion.\n", "\n", @@ -226,7 +226,7 @@ } }, "source": [ - "Alternatively, we can use the [str()](https://docs.python.org/3/library/stdtypes.html#str) built-in to **cast** non-`str` objects as a `str`." + "We can always use the [str()](https://docs.python.org/3/library/stdtypes.html#str) built-in to cast non-`str` objects as a `str`." ] }, { @@ -253,6 +253,471 @@ "str(123)" ] }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Another common situation where we obtain `str` objects is when reading the contents of a file with the [open()](https://docs.python.org/3/library/functions.html#open) built-in. In its simplest form, to open a [text file](https://en.wikipedia.org/wiki/Text_file) file in read-only mode, we pass in its path (i.e., \"filename\") as a `str` object." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "file = open(\"lorem_ipsum.txt\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "[open()](https://docs.python.org/3/library/functions.html#open) returns a **[proxy](https://en.wikipedia.org/wiki/Proxy_pattern)** object of type `TextIOWrapper` that allows us to interact with the file on disk." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "<_io.TextIOWrapper name='lorem_ipsum.txt' mode='r' encoding='UTF-8'>" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "file" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "_io.TextIOWrapper" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(file)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "While `file` provides, for example, the [read()](https://docs.python.org/3/library/io.html#io.TextIOBase.read), [readline()](https://docs.python.org/3/library/io.html#io.TextIOBase.readline), and [readlines()](https://docs.python.org/3/library/io.html#io.IOBase.readlines) methods to access its contents, it is also *iterable*, and we may loop over the individual lines with a `for` statement." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Lorem Ipsum is simply dummy text of the printing and typesetting industry.\n", + "\n", + "Lorem Ipsum has been the industry's standard dummy text ever since the 1500s\n", + "\n", + "when an unknown printer took a galley of type and scrambled it to make a type\n", + "\n", + "specimen book. It has survived not only five centuries but also the leap into\n", + "\n", + "electronic typesetting, remaining essentially unchanged. It was popularised in\n", + "\n", + "the 1960s with the release of Letraset sheets.\n", + "\n" + ] + } + ], + "source": [ + "for line in file:\n", + " print(line)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Once we looped over `file` the first time, it is **exhausted**: That means we do not see any output if we loop over it another time." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [], + "source": [ + "for line in file:\n", + " print(line)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "After the `for`-loop, `line` is still set to the last line in the file, and we verify that it is indeed a `str` object." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "'the 1960s with the release of Letraset sheets.\\n'" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "line" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "str" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(line)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "An important fact is that `file` is still associated with an *open* **[file descriptor](https://en.wikipedia.org/wiki/File_descriptor)**. Without going into any technical details, we note that an operating system can only handle a limited number of \"open files\" at the same time, and, therefore, we should always *close* the file once we are done processing it.\n", + "\n", + "`file` has a `closed` attribute on it that shows us if a file descriptor is open or closed, and with the [close()](https://docs.python.org/3/library/io.html#io.IOBase.close) method, we can \"manually\" close it." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "file.closed" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "file.close()" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "file.closed" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "The more Pythonic way is to open a file with the `with` statement (cf., [reference](https://docs.python.org/3/reference/compound_stmts.html#the-with-statement)): The indented code block is said to be executed in the **context** of the header line that acts as a **[context manager](https://docs.python.org/3/reference/datamodel.html?highlight=context%20manager#with-statement-context-managers)**. Such objects may have many different purposes. Here, the context manager created with `with open(...) as file:` mainly ensures that the file descriptor gets automatically closed after the last line in the code block is executed." + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Lorem Ipsum is simply dummy text of the printing and typesetting industry.\n", + "\n", + "Lorem Ipsum has been the industry's standard dummy text ever since the 1500s\n", + "\n", + "when an unknown printer took a galley of type and scrambled it to make a type\n", + "\n", + "specimen book. It has survived not only five centuries but also the leap into\n", + "\n", + "electronic typesetting, remaining essentially unchanged. It was popularised in\n", + "\n", + "the 1960s with the release of Letraset sheets.\n", + "\n" + ] + } + ], + "source": [ + "with open(\"lorem_ipsum.txt\") as file:\n", + " for line in file:\n", + " print(line)" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "file.closed" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "To use constructs familiar from [Chapter 3](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/03_conditionals.ipynb#The-try-Statement) to explain what `with open(...) as file:` does, below is a formulation with a `try` statement *equivalent* to the `with` statement above. The `finally`-branch is always executed, even if an exception is raised in the `for`-loop. So, `file` is sure to be closed too, with a somewhat less expressive formulation." + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Lorem Ipsum is simply dummy text of the printing and typesetting industry.\n", + "\n", + "Lorem Ipsum has been the industry's standard dummy text ever since the 1500s\n", + "\n", + "when an unknown printer took a galley of type and scrambled it to make a type\n", + "\n", + "specimen book. It has survived not only five centuries but also the leap into\n", + "\n", + "electronic typesetting, remaining essentially unchanged. It was popularised in\n", + "\n", + "the 1960s with the release of Letraset sheets.\n", + "\n" + ] + } + ], + "source": [ + "try:\n", + " file = open(\"lorem_ipsum.txt\")\n", + " for line in file:\n", + " print(line)\n", + "finally:\n", + " file.close()" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "file.closed" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "A subtlety to notice is that there is an empty line printed between each `line`. That is because each `line` ends with a `\"\\n\"` that results in a line break and that is explained further below. To print the text without empty lines in between, we pass a `end=\"\"` argument to the [print()](https://docs.python.org/3/library/functions.html#print) function." + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Lorem Ipsum is simply dummy text of the printing and typesetting industry.\n", + "Lorem Ipsum has been the industry's standard dummy text ever since the 1500s\n", + "when an unknown printer took a galley of type and scrambled it to make a type\n", + "specimen book. It has survived not only five centuries but also the leap into\n", + "electronic typesetting, remaining essentially unchanged. It was popularised in\n", + "the 1960s with the release of Letraset sheets.\n" + ] + } + ], + "source": [ + "with open(\"lorem_ipsum.txt\") as file:\n", + " for line in file:\n", + " print(line, end=\"\")" + ] + }, { "cell_type": "markdown", "metadata": { @@ -272,23 +737,23 @@ } }, "source": [ - "The idea of a **sequence** is yet another *abstract* concept.\n", + "A **sequence** is yet another *abstract* concept (cf., *containers* and *iterables* in [Chapter 4](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/04_iteration.ipynb#Containers-vs.-Iterables)).\n", "\n", - "It unifies *four* orthogonal *abstract* concepts into one: Any *concrete* data type, such as `str` in this chapter, is considered a sequence if it simultaneously\n", + "It unifies *four* [orthogonal](https://en.wikipedia.org/wiki/Orthogonality) (i.e., \"independent\") behaviors into one idea: Any data type, such as `str`, is considered a sequence if it simultaneously\n", "\n", - "1. behaves like a **container** and\n", - "2. an **iterable**, and \n", - "3. comes with a predictable **order** of its\n", - "4. **finite** number of elements.\n", + "1. **contains** other \"things,\"\n", + "2. is **iterable**, and \n", + "3. comes with a *predictable* **order** of its\n", + "4. **finite** number of \"things.\"\n", "\n", - "Chapter 7 discusses sequences in a generic sense and in greater detail. Here, we keep our focus on the `str` type that historically received its name as it models a \"**[string of characters](https://en.wikipedia.org/wiki/String_%28computer_science%29)**,\" and a \"string\" is more formally called a sequence in the computer science literature.\n", + "[Chapter 7](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/07_sequences.ipynb#Collections-vs.-Sequences) formalizes sequences in great detail. Here, we keep our focus on the `str` type that historically received its name as it models a \"**[string of characters](https://en.wikipedia.org/wiki/String_%28computer_science%29)**,\" and a \"string\" is more formally called a sequence in the computer science literature.\n", "\n", - "Behaving like a sequence, `str` objects may be treated like `list` objects in many cases. For example, the built-in [len()](https://docs.python.org/3/library/functions.html#len) function tells us how many elements (i.e., characters) make up `school`. [len()](https://docs.python.org/3/library/functions.html#len) would not work on an \"*infinite*\" object: As anything modeled in a program must fit into a computer's finite memory at runtime, there cannot exist objects containing a truly infinite number of elements; however, Chapter 7 introduces a *concrete* iterable data type that can be used to model an *infinite* series of elements and that, consequently, has no concept of \"length.\"" + "Behaving like a sequence, `str` objects may be treated like `list` objects in many cases. For example, the built-in [len()](https://docs.python.org/3/library/functions.html#len) function tells us how many elements (i.e., characters) make up `school`. [len()](https://docs.python.org/3/library/functions.html#len) would not work on an \"*infinite*\" object: As anything modeled in a program must fit into a computer's finite memory at runtime, there cannot exist objects containing a truly infinite number of elements; however, [Chapter 7](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/07_sequences.ipynb#Collections-vs.-Sequences#Mapping) introduces *concrete* iterable data types that can be used to model an *infinite* series of elements and that, consequently, have no concept of \"length.\"" ] }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 23, "metadata": { "slideshow": { "slide_type": "slide" @@ -301,7 +766,7 @@ "40" ] }, - "execution_count": 8, + "execution_count": 23, "metadata": {}, "output_type": "execute_result" } @@ -325,7 +790,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 24, "metadata": { "slideshow": { "slide_type": "fragment" @@ -353,12 +818,12 @@ } }, "source": [ - "Being a container, we can check if a given object is a member of a sequence with the `in` operator. In the context of strings, the `in` operator has *two* usages: First, it checks if a *single* character is contained in a `str` object. Second, it may also check if a shorter `str` object, then called a **substring**, is contained in a longer one." + "Being a container, we can check if a given object is a member of a sequence with the `in` operator. In the context of `str` objects, the `in` operator has *two* usages: First, it checks if a *single* character is contained in a `str` object. Second, it may also check if a shorter `str` object, then called a **substring**, is contained in a longer one." ] }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 25, "metadata": { "slideshow": { "slide_type": "fragment" @@ -371,7 +836,7 @@ "True" ] }, - "execution_count": 10, + "execution_count": 25, "metadata": {}, "output_type": "execute_result" } @@ -382,7 +847,7 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 26, "metadata": { "slideshow": { "slide_type": "-" @@ -395,7 +860,7 @@ "True" ] }, - "execution_count": 11, + "execution_count": 26, "metadata": {}, "output_type": "execute_result" } @@ -406,7 +871,7 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": 27, "metadata": { "slideshow": { "slide_type": "-" @@ -419,7 +884,7 @@ "False" ] }, - "execution_count": 12, + "execution_count": 27, "metadata": {}, "output_type": "execute_result" } @@ -452,7 +917,7 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 28, "metadata": { "slideshow": { "slide_type": "slide" @@ -465,7 +930,7 @@ "'W'" ] }, - "execution_count": 13, + "execution_count": 28, "metadata": {}, "output_type": "execute_result" } @@ -476,7 +941,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 29, "metadata": { "slideshow": { "slide_type": "-" @@ -489,7 +954,7 @@ "'H'" ] }, - "execution_count": 14, + "execution_count": 29, "metadata": {}, "output_type": "execute_result" } @@ -511,7 +976,7 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 30, "metadata": { "slideshow": { "slide_type": "slide" @@ -525,7 +990,7 @@ "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", - "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mschool\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1.0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mschool\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1.0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: string indices must be integers" ] } @@ -547,7 +1012,7 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": 31, "metadata": { "slideshow": { "slide_type": "slide" @@ -560,7 +1025,7 @@ "'t'" ] }, - "execution_count": 16, + "execution_count": 31, "metadata": {}, "output_type": "execute_result" } @@ -582,7 +1047,7 @@ }, { "cell_type": "code", - "execution_count": 17, + "execution_count": 32, "metadata": { "slideshow": { "slide_type": "-" @@ -596,7 +1061,7 @@ "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mIndexError\u001b[0m Traceback (most recent call last)", - "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mschool\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m40\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mschool\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m40\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mIndexError\u001b[0m: string index out of range" ] } @@ -618,7 +1083,7 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": 33, "metadata": { "slideshow": { "slide_type": "slide" @@ -631,7 +1096,7 @@ "'t'" ] }, - "execution_count": 18, + "execution_count": 33, "metadata": {}, "output_type": "execute_result" } @@ -653,7 +1118,7 @@ }, { "cell_type": "code", - "execution_count": 19, + "execution_count": 34, "metadata": { "slideshow": { "slide_type": "fragment" @@ -666,7 +1131,7 @@ "'O'" ] }, - "execution_count": 19, + "execution_count": 34, "metadata": {}, "output_type": "execute_result" } @@ -677,7 +1142,7 @@ }, { "cell_type": "code", - "execution_count": 20, + "execution_count": 35, "metadata": { "slideshow": { "slide_type": "-" @@ -690,7 +1155,7 @@ "'O'" ] }, - "execution_count": 20, + "execution_count": 35, "metadata": {}, "output_type": "execute_result" } @@ -725,7 +1190,7 @@ }, { "cell_type": "code", - "execution_count": 21, + "execution_count": 36, "metadata": { "slideshow": { "slide_type": "slide" @@ -738,7 +1203,7 @@ "'WHU'" ] }, - "execution_count": 21, + "execution_count": 36, "metadata": {}, "output_type": "execute_result" } @@ -760,7 +1225,7 @@ }, { "cell_type": "code", - "execution_count": 22, + "execution_count": 37, "metadata": { "slideshow": { "slide_type": "fragment" @@ -773,7 +1238,7 @@ "'WHU - Otto Beisheim School of Management'" ] }, - "execution_count": 22, + "execution_count": 37, "metadata": {}, "output_type": "execute_result" } @@ -790,12 +1255,12 @@ } }, "source": [ - "For convenience, the indexes do not need to lie in the range from 0 to the string's \"length\" when slicing. This is *not* the case for indexing as the `IndexError` above shows." + "For convenience, the indexes do not need to lie in the range from 0 to the `str` object's \"length\" when slicing. This is *not* the case for indexing as the `IndexError` above shows." ] }, { "cell_type": "code", - "execution_count": 23, + "execution_count": 38, "metadata": { "slideshow": { "slide_type": "fragment" @@ -808,7 +1273,7 @@ "'WHU - Otto Beisheim School of Management'" ] }, - "execution_count": 23, + "execution_count": 38, "metadata": {}, "output_type": "execute_result" } @@ -830,7 +1295,7 @@ }, { "cell_type": "code", - "execution_count": 24, + "execution_count": 39, "metadata": { "slideshow": { "slide_type": "fragment" @@ -843,7 +1308,7 @@ "'WHU - Otto Beisheim School of Management'" ] }, - "execution_count": 24, + "execution_count": 39, "metadata": {}, "output_type": "execute_result" } @@ -865,7 +1330,7 @@ }, { "cell_type": "code", - "execution_count": 25, + "execution_count": 40, "metadata": { "slideshow": { "slide_type": "skip" @@ -878,7 +1343,7 @@ "'WHU Otto Beisheim School'" ] }, - "execution_count": 25, + "execution_count": 40, "metadata": {}, "output_type": "execute_result" } @@ -900,7 +1365,7 @@ }, { "cell_type": "code", - "execution_count": 26, + "execution_count": 41, "metadata": { "slideshow": { "slide_type": "slide" @@ -913,7 +1378,7 @@ "'WU-Ot esemSho fMngmn'" ] }, - "execution_count": 26, + "execution_count": 41, "metadata": {}, "output_type": "execute_result" } @@ -935,7 +1400,7 @@ }, { "cell_type": "code", - "execution_count": 27, + "execution_count": 42, "metadata": { "slideshow": { "slide_type": "fragment" @@ -948,7 +1413,7 @@ "'tnemeganaM fo loohcS miehsieB ottO - UHW'" ] }, - "execution_count": 27, + "execution_count": 42, "metadata": {}, "output_type": "execute_result" } @@ -985,7 +1450,7 @@ }, { "cell_type": "code", - "execution_count": 28, + "execution_count": 43, "metadata": { "slideshow": { "slide_type": "slide" @@ -999,7 +1464,7 @@ "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", - "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mschool\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m\"E\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mschool\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m\"E\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: 'str' object does not support item assignment" ] } @@ -1021,7 +1486,7 @@ }, { "cell_type": "code", - "execution_count": 29, + "execution_count": 44, "metadata": { "slideshow": { "slide_type": "slide" @@ -1034,7 +1499,7 @@ }, { "cell_type": "code", - "execution_count": 30, + "execution_count": 45, "metadata": { "slideshow": { "slide_type": "fragment" @@ -1047,7 +1512,7 @@ "'EBS - Otto Beisheim School of Management'" ] }, - "execution_count": 30, + "execution_count": 45, "metadata": {}, "output_type": "execute_result" } @@ -1058,7 +1523,7 @@ }, { "cell_type": "code", - "execution_count": 31, + "execution_count": 46, "metadata": { "slideshow": { "slide_type": "-" @@ -1068,10 +1533,10 @@ { "data": { "text/plain": [ - "140148946876144" + "140488987187120" ] }, - "execution_count": 31, + "execution_count": 46, "metadata": {}, "output_type": "execute_result" } @@ -1082,7 +1547,7 @@ }, { "cell_type": "code", - "execution_count": 32, + "execution_count": 47, "metadata": { "slideshow": { "slide_type": "-" @@ -1092,10 +1557,10 @@ { "data": { "text/plain": [ - "140148947850512" + "140488988107952" ] }, - "execution_count": 32, + "execution_count": 47, "metadata": {}, "output_type": "execute_result" } @@ -1128,7 +1593,7 @@ }, { "cell_type": "code", - "execution_count": 33, + "execution_count": 48, "metadata": { "slideshow": { "slide_type": "slide" @@ -1141,7 +1606,7 @@ }, { "cell_type": "code", - "execution_count": 34, + "execution_count": 49, "metadata": { "slideshow": { "slide_type": "-" @@ -1154,7 +1619,7 @@ "'Hello WHU'" ] }, - "execution_count": 34, + "execution_count": 49, "metadata": {}, "output_type": "execute_result" } @@ -1165,7 +1630,7 @@ }, { "cell_type": "code", - "execution_count": 35, + "execution_count": 50, "metadata": { "slideshow": { "slide_type": "-" @@ -1178,7 +1643,7 @@ "'WHU WHU WHU WHU WHU WHU WHU WHU WHU WHU '" ] }, - "execution_count": 35, + "execution_count": 50, "metadata": {}, "output_type": "execute_result" } @@ -1213,7 +1678,7 @@ }, { "cell_type": "code", - "execution_count": 36, + "execution_count": 51, "metadata": { "slideshow": { "slide_type": "slide" @@ -1226,7 +1691,7 @@ "6" ] }, - "execution_count": 36, + "execution_count": 51, "metadata": {}, "output_type": "execute_result" } @@ -1237,7 +1702,7 @@ }, { "cell_type": "code", - "execution_count": 37, + "execution_count": 52, "metadata": { "slideshow": { "slide_type": "-" @@ -1250,7 +1715,7 @@ "-1" ] }, - "execution_count": 37, + "execution_count": 52, "metadata": {}, "output_type": "execute_result" } @@ -1261,7 +1726,7 @@ }, { "cell_type": "code", - "execution_count": 38, + "execution_count": 53, "metadata": { "slideshow": { "slide_type": "fragment" @@ -1274,7 +1739,7 @@ "11" ] }, - "execution_count": 38, + "execution_count": 53, "metadata": {}, "output_type": "execute_result" } @@ -1296,7 +1761,7 @@ }, { "cell_type": "code", - "execution_count": 39, + "execution_count": 54, "metadata": { "slideshow": { "slide_type": "slide" @@ -1309,7 +1774,7 @@ "12" ] }, - "execution_count": 39, + "execution_count": 54, "metadata": {}, "output_type": "execute_result" } @@ -1320,7 +1785,7 @@ }, { "cell_type": "code", - "execution_count": 40, + "execution_count": 55, "metadata": { "slideshow": { "slide_type": "fragment" @@ -1333,7 +1798,7 @@ "16" ] }, - "execution_count": 40, + "execution_count": 55, "metadata": {}, "output_type": "execute_result" } @@ -1344,7 +1809,7 @@ }, { "cell_type": "code", - "execution_count": 41, + "execution_count": 56, "metadata": { "slideshow": { "slide_type": "-" @@ -1357,7 +1822,7 @@ "-1" ] }, - "execution_count": 41, + "execution_count": 56, "metadata": {}, "output_type": "execute_result" } @@ -1379,7 +1844,7 @@ }, { "cell_type": "code", - "execution_count": 42, + "execution_count": 57, "metadata": { "slideshow": { "slide_type": "slide" @@ -1392,7 +1857,7 @@ "4" ] }, - "execution_count": 42, + "execution_count": 57, "metadata": {}, "output_type": "execute_result" } @@ -1414,7 +1879,7 @@ }, { "cell_type": "code", - "execution_count": 43, + "execution_count": 58, "metadata": { "slideshow": { "slide_type": "fragment" @@ -1427,7 +1892,7 @@ "5" ] }, - "execution_count": 43, + "execution_count": 58, "metadata": {}, "output_type": "execute_result" } @@ -1449,7 +1914,7 @@ }, { "cell_type": "code", - "execution_count": 44, + "execution_count": 59, "metadata": { "slideshow": { "slide_type": "skip" @@ -1462,7 +1927,7 @@ "5" ] }, - "execution_count": 44, + "execution_count": 59, "metadata": {}, "output_type": "execute_result" } @@ -1484,7 +1949,7 @@ }, { "cell_type": "code", - "execution_count": 45, + "execution_count": 60, "metadata": { "slideshow": { "slide_type": "slide" @@ -1497,7 +1962,7 @@ }, { "cell_type": "code", - "execution_count": 46, + "execution_count": 61, "metadata": { "slideshow": { "slide_type": "-" @@ -1507,10 +1972,10 @@ { "data": { "text/plain": [ - "140149036799680" + "140489068668384" ] }, - "execution_count": 46, + "execution_count": 61, "metadata": {}, "output_type": "execute_result" } @@ -1521,7 +1986,7 @@ }, { "cell_type": "code", - "execution_count": 47, + "execution_count": 62, "metadata": { "slideshow": { "slide_type": "fragment" @@ -1534,7 +1999,7 @@ }, { "cell_type": "code", - "execution_count": 48, + "execution_count": 63, "metadata": { "slideshow": { "slide_type": "-" @@ -1544,10 +2009,10 @@ { "data": { "text/plain": [ - "140148946626016" + "140488987340848" ] }, - "execution_count": 48, + "execution_count": 63, "metadata": {}, "output_type": "execute_result" } @@ -1569,7 +2034,7 @@ }, { "cell_type": "code", - "execution_count": 49, + "execution_count": 64, "metadata": { "slideshow": { "slide_type": "fragment" @@ -1582,7 +2047,7 @@ "False" ] }, - "execution_count": 49, + "execution_count": 64, "metadata": {}, "output_type": "execute_result" } @@ -1593,7 +2058,7 @@ }, { "cell_type": "code", - "execution_count": 50, + "execution_count": 65, "metadata": { "slideshow": { "slide_type": "-" @@ -1606,7 +2071,7 @@ "True" ] }, - "execution_count": 50, + "execution_count": 65, "metadata": {}, "output_type": "execute_result" } @@ -1628,7 +2093,7 @@ }, { "cell_type": "code", - "execution_count": 51, + "execution_count": 66, "metadata": { "slideshow": { "slide_type": "slide" @@ -1667,7 +2132,7 @@ }, { "cell_type": "code", - "execution_count": 52, + "execution_count": 67, "metadata": { "slideshow": { "slide_type": "slide" @@ -1680,7 +2145,7 @@ }, { "cell_type": "code", - "execution_count": 53, + "execution_count": 68, "metadata": { "slideshow": { "slide_type": "-" @@ -1693,7 +2158,7 @@ }, { "cell_type": "code", - "execution_count": 54, + "execution_count": 69, "metadata": { "slideshow": { "slide_type": "-" @@ -1706,7 +2171,7 @@ "'This will become a sentence'" ] }, - "execution_count": 54, + "execution_count": 69, "metadata": {}, "output_type": "execute_result" } @@ -1728,7 +2193,7 @@ }, { "cell_type": "code", - "execution_count": 55, + "execution_count": 70, "metadata": { "slideshow": { "slide_type": "fragment" @@ -1741,7 +2206,7 @@ "'a b c d e'" ] }, - "execution_count": 55, + "execution_count": 70, "metadata": {}, "output_type": "execute_result" } @@ -1763,7 +2228,7 @@ }, { "cell_type": "code", - "execution_count": 56, + "execution_count": 71, "metadata": { "slideshow": { "slide_type": "slide" @@ -1776,7 +2241,7 @@ "'This is a sentence'" ] }, - "execution_count": 56, + "execution_count": 71, "metadata": {}, "output_type": "execute_result" } @@ -1809,7 +2274,7 @@ }, { "cell_type": "code", - "execution_count": 57, + "execution_count": 72, "metadata": { "slideshow": { "slide_type": "slide" @@ -1824,7 +2289,7 @@ }, { "cell_type": "code", - "execution_count": 58, + "execution_count": 73, "metadata": { "slideshow": { "slide_type": "fragment" @@ -1837,7 +2302,7 @@ "True" ] }, - "execution_count": 58, + "execution_count": 73, "metadata": {}, "output_type": "execute_result" } @@ -1848,7 +2313,7 @@ }, { "cell_type": "code", - "execution_count": 59, + "execution_count": 74, "metadata": { "slideshow": { "slide_type": "-" @@ -1861,7 +2326,7 @@ "False" ] }, - "execution_count": 59, + "execution_count": 74, "metadata": {}, "output_type": "execute_result" } @@ -1878,12 +2343,12 @@ } }, "source": [ - "One way to fix this is to only compare lower-case strings." + "One way to fix this is to only compare lower-cased `str` objects." ] }, { "cell_type": "code", - "execution_count": 60, + "execution_count": 75, "metadata": { "slideshow": { "slide_type": "fragment" @@ -1896,7 +2361,7 @@ "True" ] }, - "execution_count": 60, + "execution_count": 75, "metadata": {}, "output_type": "execute_result" } @@ -1918,7 +2383,7 @@ }, { "cell_type": "code", - "execution_count": 61, + "execution_count": 76, "metadata": { "slideshow": { "slide_type": "slide" @@ -1985,9 +2450,9 @@ } }, "source": [ - "The previous code cell shows an example of a so-called **f-string**, as introduced by [PEP 498](https://www.python.org/dev/peps/pep-0498/) only in 2016.\n", + "The previous code cell shows an example of a so-called **f-string**, as introduced by [PEP 498](https://www.python.org/dev/peps/pep-0498/) only in 2016, that is passed as the argument to the [print()](https://docs.python.org/3/library/functions.html#print) function.\n", "\n", - "So far, we have used the built-in [print()](https://docs.python.org/3/library/functions.html#print) function only with \"plain\" `str` objects (e.g., `\"example\"`) or variables. Sometimes, it is more convenient to fill a value determined at runtime in a \"draft\" `str` object. This is called **string interpolation**. There are three ways to achieve that in Python, but only two are commonly used." + "The \"f\" stands for \"formatted\", and we can think of the `str` object as a text \"draft\" that is filled in with values determined at runtime. This concept is formally called **string interpolation**, and there are three ways to achieve that in Python." ] }, { @@ -2009,12 +2474,12 @@ } }, "source": [ - "f-strings are the new and most readable way. Prepend the literal notation with an `f`, and put variables/expressions within curly braces. The latter are then filled in when the [print()](https://docs.python.org/3/library/functions.html#print) function is executed." + "f-strings, formally called **[formatted string literals](https://docs.python.org/3/reference/lexical_analysis.html#formatted-string-literals)**, are the least recently added and most readable way: We prepend a `str` in literal notation with an `f`, and put variables, or more generally, expressions, within curly braces. These are then filled in when a `str` object is evaluated." ] }, { "cell_type": "code", - "execution_count": 62, + "execution_count": 77, "metadata": { "slideshow": { "slide_type": "slide" @@ -2028,7 +2493,7 @@ }, { "cell_type": "code", - "execution_count": 63, + "execution_count": 78, "metadata": { "slideshow": { "slide_type": "-" @@ -2036,15 +2501,18 @@ }, "outputs": [ { - "name": "stdout", - "output_type": "stream", - "text": [ - "Hello Alexander! Good morning.\n" - ] + "data": { + "text/plain": [ + "'Hello Alexander! Good morning.'" + ] + }, + "execution_count": 78, + "metadata": {}, + "output_type": "execute_result" } ], "source": [ - "print(f\"Hello {name}! Good {time_of_day}.\")" + "f\"Hello {name}! Good {time_of_day}.\"" ] }, { @@ -2055,12 +2523,12 @@ } }, "source": [ - "Separated by a colon `:`, various formatting options are available. In the beginning, only the ability to round is useful, and this can be achieved by adding `:.2f` to the variable name to cast the number as a float and round it to two digits." + "Separated by a colon `:`, various formatting options are available. In the beginning, the ability to round may be particularly useful: This can be achieved by adding `:.2f` to the variable name inside the curly braces, which casts the number as a `float` and rounds it to two digits. The `:.2f` is a so-called format specifier, and there exists a whole **[format specification mini-language](https://docs.python.org/3/library/string.html#formatspec)** to govern how specifiers work." ] }, { "cell_type": "code", - "execution_count": 64, + "execution_count": 79, "metadata": { "slideshow": { "slide_type": "slide" @@ -2073,7 +2541,7 @@ }, { "cell_type": "code", - "execution_count": 65, + "execution_count": 80, "metadata": { "slideshow": { "slide_type": "-" @@ -2081,20 +2549,23 @@ }, "outputs": [ { - "name": "stdout", - "output_type": "stream", - "text": [ - "Pi is 3.14\n" - ] + "data": { + "text/plain": [ + "'Pi is 3.14'" + ] + }, + "execution_count": 80, + "metadata": {}, + "output_type": "execute_result" } ], "source": [ - "print(f\"Pi is {pi:.2f}\")" + "f\"Pi is {pi:.2f}\"" ] }, { "cell_type": "code", - "execution_count": 66, + "execution_count": 81, "metadata": { "slideshow": { "slide_type": "-" @@ -2102,15 +2573,18 @@ }, "outputs": [ { - "name": "stdout", - "output_type": "stream", - "text": [ - "Pi is 3.142\n" - ] + "data": { + "text/plain": [ + "'Pi is 3.142'" + ] + }, + "execution_count": 81, + "metadata": {}, + "output_type": "execute_result" } ], "source": [ - "print(f\"Pi is {pi:.3f}\")" + "f\"Pi is {pi:.3f}\"" ] }, { @@ -2121,7 +2595,7 @@ } }, "source": [ - "### format() Method" + "### [format()](https://docs.python.org/3/library/stdtypes.html#str.format) Method" ] }, { @@ -2132,12 +2606,12 @@ } }, "source": [ - "`str` objects also provide a [format()](https://docs.python.org/3/library/stdtypes.html#str.format) method that accepts an arbitrary number of *positional* arguments that are inserted into the `str` object in the same order replacing curly brackets. See the official [documentation](https://docs.python.org/3/library/string.html#formatspec) for a full reference. This is the more traditional way of string interpolation and many code examples on the internet use it. f-strings are the officially recommended way going forward, but the usage of the [format()](https://docs.python.org/3/library/stdtypes.html#str.format) method is most likely not going down any time soon." + "`str` objects also provide a [format()](https://docs.python.org/3/library/stdtypes.html#str.format) method that accepts an arbitrary number of *positional* arguments that are inserted into the `str` object in the same order replacing empty curly brackets. String interpolation with the [format()](https://docs.python.org/3/library/stdtypes.html#str.format) method is a more traditional and probably the most common way one as of today. While f-strings are the recommended way going forward, usage of the [format()](https://docs.python.org/3/library/stdtypes.html#str.format) method is likely not declining any time soon." ] }, { "cell_type": "code", - "execution_count": 67, + "execution_count": 82, "metadata": { "slideshow": { "slide_type": "slide" @@ -2145,15 +2619,18 @@ }, "outputs": [ { - "name": "stdout", - "output_type": "stream", - "text": [ - "Hello Alexander! Good morning.\n" - ] + "data": { + "text/plain": [ + "'Hello Alexander! Good morning.'" + ] + }, + "execution_count": 82, + "metadata": {}, + "output_type": "execute_result" } ], "source": [ - "print(\"Hello {}! Good {}.\".format(name, time_of_day))" + "\"Hello {}! Good {}.\".format(name, time_of_day)" ] }, { @@ -2164,12 +2641,12 @@ } }, "source": [ - "Use index numbers if the order is different in the `str` object." + "We may use index numbers inside the curly braces if the order is different in the `str` object." ] }, { "cell_type": "code", - "execution_count": 68, + "execution_count": 83, "metadata": { "slideshow": { "slide_type": "fragment" @@ -2177,15 +2654,18 @@ }, "outputs": [ { - "name": "stdout", - "output_type": "stream", - "text": [ - "Good morning, Alexander\n" - ] + "data": { + "text/plain": [ + "'Good morning, Alexander'" + ] + }, + "execution_count": 83, + "metadata": {}, + "output_type": "execute_result" } ], "source": [ - "print(\"Good {1}, {0}\".format(name, time_of_day))" + "\"Good {1}, {0}\".format(name, time_of_day)" ] }, { @@ -2196,12 +2676,12 @@ } }, "source": [ - "The [format()](https://docs.python.org/3/library/stdtypes.html#str.format) method may alternatively be used with *keyword* arguments as well. Then, we must put the keyword names within the curly brackets." + "The [format()](https://docs.python.org/3/library/stdtypes.html#str.format) method may alternatively be used with *keyword* arguments as well. Then, we must put the keywords' names within the curly brackets." ] }, { "cell_type": "code", - "execution_count": 69, + "execution_count": 84, "metadata": { "slideshow": { "slide_type": "fragment" @@ -2209,15 +2689,18 @@ }, "outputs": [ { - "name": "stdout", - "output_type": "stream", - "text": [ - "Hello Alexander! Good morning.\n" - ] + "data": { + "text/plain": [ + "'Hello Alexander! Good morning.'" + ] + }, + "execution_count": 84, + "metadata": {}, + "output_type": "execute_result" } ], "source": [ - "print(\"Hello {name}! Good {time}.\".format(name=name, time=time_of_day))" + "\"Hello {name}! Good {time}.\".format(name=name, time=time_of_day)" ] }, { @@ -2228,12 +2711,60 @@ } }, "source": [ - "Numbers are treated as in the f-strings case." + "Format specifiers work as in the f-string case." ] }, { "cell_type": "code", - "execution_count": 70, + "execution_count": 85, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "'Pi is 3.14'" + ] + }, + "execution_count": 85, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "\"Pi is {:.2f}\".format(pi)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "### `%` Operator" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "The `%` operator that we saw in the context of modulo division in [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements.ipynb#%28Arithmetic%29-Operators) is overloaded with string interpolation when its first operand is a `str` object. The second operand consists of all expressions to be filled in. Format specifiers work with a `%` instead of curly braces and according to a different set of rules referred to as **[printf-style string formatting](https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting)**. So, `{:.2f}` becomes `%.2f`.\n", + "\n", + "This way of string interpolation is the oldest and originates from the [C language](https://en.wikipedia.org/wiki/C_%28programming_language%29). It is still widely spread, but we should use one of the other two ways instead. We show it here mainly for completeness sake." + ] + }, + { + "cell_type": "code", + "execution_count": 86, "metadata": { "slideshow": { "slide_type": "skip" @@ -2241,15 +2772,53 @@ }, "outputs": [ { - "name": "stdout", - "output_type": "stream", - "text": [ - "Pi is 3.14\n" - ] + "data": { + "text/plain": [ + "'Pi is 3.14'" + ] + }, + "execution_count": 86, + "metadata": {}, + "output_type": "execute_result" } ], "source": [ - "print(\"Pi is {:.2f}\".format(pi))" + "\"Pi is %.2f\" % pi" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "To insert more than one expression, we must list them in order and between parenthesis `(` and `)`. As [Chapter 7](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/07_sequences.ipynb#The-tuple-Type) reveals, this literal syntax creates an object of type `tuple`. Also, to format an expression as text, we use the format specifier `%s`." + ] + }, + { + "cell_type": "code", + "execution_count": 87, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "'Hello Alexander! Good morning.'" + ] + }, + "execution_count": 87, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "\"Hello %s! Good %s.\" % (name, time_of_day)" ] }, { @@ -2271,12 +2840,14 @@ } }, "source": [ - "Some symbols have a special meaning within `str` objects. Popular examples are the newline `\\n` and tab `\\t` \"characters.\" The backslash symbol `\\` is also referred to as an **escape character** in this context, indicating that the following character has a meaning other than its literal meaning." + "Some symbols have a special meaning within `str` objects. Popular examples are the newline `\\n` and tab `\\t` \"characters.\" The backslash symbol `\\` is also referred to as an **escape character** in this context, indicating that the following character has a meaning other than its literal meaning.\n", + "\n", + "The built-in [print()](https://docs.python.org/3/library/functions.html#print) function then \"prints\" out these special characters accordingly." ] }, { "cell_type": "code", - "execution_count": 71, + "execution_count": 88, "metadata": { "slideshow": { "slide_type": "slide" @@ -2299,7 +2870,7 @@ }, { "cell_type": "code", - "execution_count": 72, + "execution_count": 89, "metadata": { "slideshow": { "slide_type": "fragment" @@ -2331,7 +2902,7 @@ }, { "cell_type": "code", - "execution_count": 73, + "execution_count": 90, "metadata": { "slideshow": { "slide_type": "fragment" @@ -2350,6 +2921,41 @@ "print(\"\\U0001f604\")" ] }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Outside the [print()](https://docs.python.org/3/library/functions.html#print) function, the special characters are not treated any different from non-special ones." + ] + }, + { + "cell_type": "code", + "execution_count": 91, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "'This is a sentence\\nthat is printed\\non three lines.'" + ] + }, + "execution_count": 91, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "\"This is a sentence\\nthat is printed\\non three lines.\"" + ] + }, { "cell_type": "markdown", "metadata": { @@ -2376,7 +2982,7 @@ }, { "cell_type": "code", - "execution_count": 74, + "execution_count": 92, "metadata": { "slideshow": { "slide_type": "slide" @@ -2404,12 +3010,12 @@ } }, "source": [ - "Some strings even produce a `SyntaxError` because the `\\U` *cannot* be interpreted as a unicode code point." + "Some `str` objects even produce a `SyntaxError` because the `\\U` *cannot* be interpreted as a unicode code point." ] }, { "cell_type": "code", - "execution_count": 75, + "execution_count": 93, "metadata": { "slideshow": { "slide_type": "fragment" @@ -2418,10 +3024,10 @@ "outputs": [ { "ename": "SyntaxError", - "evalue": "(unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \\UXXXXXXXX escape (, line 1)", + "evalue": "(unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \\UXXXXXXXX escape (, line 1)", "output_type": "error", "traceback": [ - "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m print(\"C:\\Users\\Administrator\\Desktop\\Project_Folder\")\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \\UXXXXXXXX escape\n" + "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m print(\"C:\\Users\\Administrator\\Desktop\\Project_Folder\")\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \\UXXXXXXXX escape\n" ] } ], @@ -2442,7 +3048,7 @@ }, { "cell_type": "code", - "execution_count": 76, + "execution_count": 94, "metadata": { "slideshow": { "slide_type": "slide" @@ -2463,7 +3069,7 @@ }, { "cell_type": "code", - "execution_count": 77, + "execution_count": 95, "metadata": { "slideshow": { "slide_type": "-" @@ -2490,12 +3096,12 @@ } }, "source": [ - "However, this is tedious to remember and type. Luckily, Python allows treating any string literal as a \"raw\" string, and this is indicated in the string literal by a `r` prefix." + "However, this is tedious to remember and type. Luckily, Python allows treating any string literal as \"raw,\" and this is indicated in the string literal by the prefix `r`." ] }, { "cell_type": "code", - "execution_count": 78, + "execution_count": 96, "metadata": { "slideshow": { "slide_type": "slide" @@ -2516,7 +3122,7 @@ }, { "cell_type": "code", - "execution_count": 79, + "execution_count": 97, "metadata": { "slideshow": { "slide_type": "-" @@ -2559,7 +3165,7 @@ }, { "cell_type": "code", - "execution_count": 80, + "execution_count": 98, "metadata": { "slideshow": { "slide_type": "slide" @@ -2568,10 +3174,10 @@ "outputs": [ { "ename": "SyntaxError", - "evalue": "EOL while scanning string literal (, line 1)", + "evalue": "EOL while scanning string literal (, line 1)", "output_type": "error", "traceback": [ - "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m \"\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m EOL while scanning string literal\n" + "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m \"\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m EOL while scanning string literal\n" ] } ], @@ -2594,7 +3200,7 @@ }, { "cell_type": "code", - "execution_count": 81, + "execution_count": 99, "metadata": { "slideshow": { "slide_type": "fragment" @@ -2616,12 +3222,12 @@ } }, "source": [ - "Linebreaks are kept and implicitly converted into `\\n` characters." + "Line breaks are kept and implicitly converted into `\\n` characters." ] }, { "cell_type": "code", - "execution_count": 82, + "execution_count": 100, "metadata": { "slideshow": { "slide_type": "fragment" @@ -2634,7 +3240,7 @@ "'\\nI am a multi-line string\\nconsisting of 4 lines.\\n'" ] }, - "execution_count": 82, + "execution_count": 100, "metadata": {}, "output_type": "execute_result" } @@ -2656,7 +3262,7 @@ }, { "cell_type": "code", - "execution_count": 83, + "execution_count": 101, "metadata": { "slideshow": { "slide_type": "fragment" @@ -2686,12 +3292,12 @@ } }, "source": [ - "Using the [split()](https://docs.python.org/3/library/stdtypes.html#str.split) method with the optional *sep* argument, we confirm that `multi_line` consists of *four* lines with the first and last linebreaks being the first and last characters in the `str` object." + "Using the [split()](https://docs.python.org/3/library/stdtypes.html#str.split) method with the optional *sep* argument, we confirm that `multi_line` consists of *four* lines with the first and last line breaks being the first and last characters in the `str` object." ] }, { "cell_type": "code", - "execution_count": 84, + "execution_count": 102, "metadata": { "slideshow": { "slide_type": "slide" @@ -2714,6 +3320,82 @@ " print(i, line)" ] }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "The next code cell puts several constructs from this chapter together to create a multi-line `str` object `content`: The `with` statement provides a context that ensures `file` is not left open. Then, the [readlines()](https://docs.python.org/3/library/io.html#io.IOBase.readlines) method returns the contents of `file` as a `list` object holding as many `str` objects as there are lines in the file on disk. Lastly, we concatenate these together with the [join()](https://docs.python.org/3/library/stdtypes.html#str.join) method to obtain `content`. We do so on an empty `str` object `\"\"` as each line already ends with a `\"\\n\"`." + ] + }, + { + "cell_type": "code", + "execution_count": 103, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "with open(\"lorem_ipsum.txt\") as file:\n", + " content = \"\".join(file.readlines())" + ] + }, + { + "cell_type": "code", + "execution_count": 104, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "\"Lorem Ipsum is simply dummy text of the printing and typesetting industry.\\nLorem Ipsum has been the industry's standard dummy text ever since the 1500s\\nwhen an unknown printer took a galley of type and scrambled it to make a type\\nspecimen book. It has survived not only five centuries but also the leap into\\nelectronic typesetting, remaining essentially unchanged. It was popularised in\\nthe 1960s with the release of Letraset sheets.\\n\"" + ] + }, + "execution_count": 104, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "content" + ] + }, + { + "cell_type": "code", + "execution_count": 105, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Lorem Ipsum is simply dummy text of the printing and typesetting industry.\n", + "Lorem Ipsum has been the industry's standard dummy text ever since the 1500s\n", + "when an unknown printer took a galley of type and scrambled it to make a type\n", + "specimen book. It has survived not only five centuries but also the leap into\n", + "electronic typesetting, remaining essentially unchanged. It was popularised in\n", + "the 1960s with the release of Letraset sheets.\n", + "\n" + ] + } + ], + "source": [ + "print(content)" + ] + }, { "cell_type": "markdown", "metadata": { @@ -2735,7 +3417,7 @@ "source": [ "Textual data is modeled with the **immutable** `str` type.\n", "\n", - "The `str` type supports *four* orthogonal **abstract concepts** that together make up the idea of a **sequence**: Every `str` object is an iterable container of a finite number of ordered characters." + "The `str` type supports *four* orthogonal **abstract concepts** that together constitute the idea of a **sequence**: Every `str` object is an iterable container of a finite number of ordered characters." ] } ], diff --git a/06_text_review_and_exercises.ipynb b/06_text_review_and_exercises.ipynb index 304a91d..40ac8c4 100644 --- a/06_text_review_and_exercises.ipynb +++ b/06_text_review_and_exercises.ipynb @@ -5,7 +5,7 @@ "metadata": {}, "source": [ "\n", - "# Chapter 6: Text" + "# Chapter 6: Bytes & Text" ] }, { @@ -54,7 +54,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "**Q2**: What is a direct consequence of the `str` type's property of being **ordered**? What operations could we *not* do with it if it were *unordered*?" + "**Q2**: What is a direct consequence of the `str` type's property of being an **ordered** sequence? What operations could we *not* do with it if it were *unordered*?" ] }, { @@ -87,7 +87,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "**Q4**: Describe in your own words what we mean with **string interpolation**!" + "**Q4**: Describe in your own words what we mean by **string interpolation**!" ] }, { @@ -115,7 +115,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "**Q5**: **Triple-double** quotes `\"\"\"` and **triple-single** quotes `'''` create a *new* object of type `text` that model so-called **multi-line strings**." + "**Q5**: **Triple-double** quotes `\"\"\"` and **triple-single** quotes `'''` create a *new* object of type `text` that models so-called **multi-line strings**." ] }, { @@ -143,7 +143,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "**Q7**: Indexing into a `str` object with a *negative* index **fails silently**: It does *not* raise an error but also does *not* do anything useful." + "**Q7**: Indexing into a `str` object with a *negative* index **fails silently**: It does *neither* raise an error *nor* do anything useful." ] }, { diff --git a/07_sequences.ipynb b/07_sequences.ipynb new file mode 100644 index 0000000..11add0a --- /dev/null +++ b/07_sequences.ipynb @@ -0,0 +1,10742 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Chapter 7: Sequential Data" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "We studied numbers (cf., [Chapter 5](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/05_numbers.ipynb)) and textual data (cf., [Chapter 6](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/06_text.ipynb)) first, mainly because objects of the presented data types are \"simple,\" for two reasons: First, they are *immutable*, and, as we saw in the \"*Who am I? And how many?*\" section in [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements.ipynb#Who-am-I?-And-how-many?), mutable objects can quickly become hard to reason about. Second, they are \"flat\" in the sense that they are *not* composed of other objects.\n", + "\n", + "The `str` type is a bit of a corner case in this regard. While one could argue that a longer `str` object, for example, `\"text\"`, is composed of individual characters, this is *not* the case in memory as the literal `\"text\"` only creates *one* object (i.e., one \"bag\" of $0$s and $1$s modeling all characters).\n", + "\n", + "This chapter and Chapter 8 introduce various \"complex\" data types. While some are mutable and others are not, they all share that they are primarily used to \"manage,\" or structure, the memory in a program. Unsurprisingly, computer scientists refer to the ideas and theories behind these data types as **[data structures](https://en.wikipedia.org/wiki/Data_structure)**.\n", + "\n", + "In this chapter, we focus on data types that model all kinds of sequential data. Examples of such data are [spreadsheets](https://en.wikipedia.org/wiki/Spreadsheet) or [matrices](https://en.wikipedia.org/wiki/Matrix_%28mathematics%29)/[vectors](https://en.wikipedia.org/wiki/Vector_%28mathematics_and_physics%29). Such formats share the property that they are composed of smaller units that come in a sequence of, for example, rows/columns/cells or elements/entries." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Collections vs. Sequences" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "[Chapter 6](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/06_text.ipynb#A-\"String\"-of-Characters) already describes the *sequence* properties of `str` objects. Here, we take a step back and study these properties on their own before looking at bigger ideas.\n", + "\n", + "The [collections.abc](https://docs.python.org/3/library/collections.abc.html) module in the [standard library](https://docs.python.org/3/library/index.html) defines a variety of **abstract base classes** (ABCs). We saw ABCs already in [Chapter 5](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/05_numbers.ipynb#The-Numerical-Tower), where we use the ones from the [numbers](https://docs.python.org/3/library/numbers.html) module in the [standard library](https://docs.python.org/3/library/index.html) to classify Python's numeric data types according to mathematical ideas. Now, we take the ABCs from the [collections.abc](https://docs.python.org/3/library/collections.abc.html) module to classify the data types in this chapter according to their behavior in various contexts.\n", + "\n", + "As an illustration, consider `numbers` and `word` below, two objects of *different* types." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]\n", + "word = \"random\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "They have in common that we may loop over them with the `for` statement. So, in the context of iteration, both exhibit the *same* behavior." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "7 11 8 5 3 12 2 6 9 10 1 4 " + ] + } + ], + "source": [ + "for number in numbers:\n", + " print(number, end=\" \")" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "r a n d o m " + ] + } + ], + "source": [ + "for character in word:\n", + " print(character, end=\" \")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "In [Chapter 4](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/04_iteration.ipynb#Containers-vs.-Iterables), we referred to such types as *iterables*. That is *not* a proper [English](https://dictionary.cambridge.org/spellcheck/english-german/?q=iterable) word, even if it may sound like one at first sight. Yet, it is an official term in the Python world formalized with the `Iterable` ABC in the [collections.abc](https://docs.python.org/3/library/collections.abc.html) module.\n", + "\n", + "For the data science practitioner, it is worthwhile to know such terms as, for example, the documentation on the [built-ins](https://docs.python.org/3/library/functions.html) uses them extensively: In simple words, any built-in that takes an argument called \"*iterable*\" may be called with *any* object that supports being looped over. Already familiar [built-ins](https://docs.python.org/3/library/functions.html) include, among others, [enumerate()](https://docs.python.org/3/library/functions.html#enumerate), [sum()](https://docs.python.org/3/library/functions.html#sum), or [zip()](https://docs.python.org/3/library/functions.html#zip). So, they do *not* require the argument to be of a certain concrete data type (e.g., `list`); instead, any *iterable* type works." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "import collections.abc as abc" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "collections.abc.Iterable" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "abc.Iterable" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "As in the context of *goose typing* in [Chapter 5](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/05_numbers.ipynb#Goose-Typing), we can use ABCs with the built-in [isinstance()](https://docs.python.org/3/library/functions.html#isinstance) function to check if an object supports a behavior.\n", + "\n", + "So, let's \"ask\" Python if it can loop over `numbers` or `word`." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "isinstance(numbers, abc.Iterable)" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "isinstance(word, abc.Iterable)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Contrary to `list` or `str` objects, numeric objects are *not* iterable." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "isinstance(999, abc.Iterable)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Instead of asking, we could try to loop over `999`, but this results in a `TypeError`." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "ename": "TypeError", + "evalue": "'int' object is not iterable", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;32mfor\u001b[0m \u001b[0mdigit\u001b[0m \u001b[0;32min\u001b[0m \u001b[0;36m999\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdigit\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mTypeError\u001b[0m: 'int' object is not iterable" + ] + } + ], + "source": [ + "for digit in 999:\n", + " print(digit)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Most of the data types in this and the next chapter exhibit three [orthogonal](https://en.wikipedia.org/wiki/Orthogonality) (i.e., \"independent\") behaviors, formalized by ABCs in the [collections.abc](https://docs.python.org/3/library/collections.abc.html) module as:\n", + "- `Iterable`: An object supports being looped over.\n", + "- `Container`: An object \"contains\" references to other objects; a \"whole\" is composed of many \"parts.\"\n", + "- `Sized`: The number of references to other objects, the \"parts,\" is *finite*.\n", + "\n", + "The characteristical operation supported by `Container` types is the `in` operator for membership testing." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "0 in numbers" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "\"r\" in word" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Alternatively, we could also check if `numbers` and `word` are `Container` types with the [isinstance()](https://docs.python.org/3/library/functions.html#isinstance) function." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "isinstance(numbers, abc.Container)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "isinstance(word, abc.Container)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Numeric objects do *not* \"contain\" references to other objects, and that is why they are considered \"flat\" data types. The `in` operator raises a `TypeError`. Conceptually speaking, Python views numeric types as \"wholes\" without any \"parts.\"" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "isinstance(999, abc.Container)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "ename": "TypeError", + "evalue": "argument of type 'int' is not iterable", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;36m9\u001b[0m \u001b[0;32min\u001b[0m \u001b[0;36m999\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m: argument of type 'int' is not iterable" + ] + } + ], + "source": [ + "9 in 999" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Analogously, being `Sized` types, we can pass `numbers` and `word` as the argument to the built-in [len()](https://docs.python.org/3/library/functions.html#len) function and obtain \"meaningful\" results. The exact meaning depends on the *concrete* data type: For `numbers`, [len()](https://docs.python.org/3/library/functions.html#len) tells us how many elements are in the `list` object; for `word`, it tells us how many [Unicode characters](https://en.wikipedia.org/wiki/Unicode) make up the `str` object. But, *abstractly* speaking, both data types exhibit the *same* behavior of *finiteness*." + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "12" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(numbers)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "6" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(word)" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "isinstance(numbers, abc.Sized)" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "isinstance(word, abc.Sized)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "On the contrary, even though `999` consists of three digits for humans, numeric objects in Python have no concept of a \"size\" or \"length,\" and the [len()](https://docs.python.org/3/library/functions.html#len) function raises a `TypeError`." + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "isinstance(999, abc.Sized)" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "ename": "TypeError", + "evalue": "object of type 'int' has no len()", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m999\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m: object of type 'int' has no len()" + ] + } + ], + "source": [ + "len(999)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "These three behaviors are so essential that whenever they coincide for a data type, it is called a **collection**, formalized with the `Collection` ABC. That is where the [collections.abc](https://docs.python.org/3/library/collections.abc.html) module got its name from: It summarizes all ABCs related to collections; in particular, it defines a hierarchy of specialized kinds of collections.\n", + "\n", + "So, both `numbers` and `word` are collections." + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "isinstance(numbers, abc.Collection)" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "isinstance(word, abc.Collection)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "They share one more common behavior: When looping over them, we can *predict* the *order* of the elements or characters. The ABC in the [collections.abc](https://docs.python.org/3/library/collections.abc.html) module corresponding to this behavior is `Reversible`. While sounding unintuitive at first, it is evident that if something is reversible, it must have a forward order, to begin with.\n", + "\n", + "We add the [reversed()](https://docs.python.org/3/library/functions.html#reversed) built-in to the `for`-loop from above to iterate over the elements or characters in reverse order." + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "4 1 10 9 6 2 12 3 5 8 11 7 " + ] + } + ], + "source": [ + "for number in reversed(numbers):\n", + " print(number, end=\" \")" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "m o d n a r " + ] + } + ], + "source": [ + "for character in reversed(word):\n", + " print(character, end=\" \")" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "isinstance(numbers, abc.Reversible)" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "isinstance(word, abc.Reversible)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Collections that exhibit this fourth behavior are referred to as **sequences**, formalized with the `Sequence` ABC in the [collections.abc](https://docs.python.org/3/library/collections.abc.html) module." + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "isinstance(numbers, abc.Sequence)" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "isinstance(word, abc.Sequence)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Most of the data types introduced in the remainder of this chapter are sequences. Nevertheless, we also look at some data types that are neither collections nor sequences but still useful to model sequential data in practice.\n", + "\n", + "In Python-related documentations, the terms collection and sequence are heavily used, and the data science practitioner should always think of them in terms of the three or four behaviors they exhibit.\n", + "\n", + "Data types that are collections but not sequences are covered in Chapter 8." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## The `list` Type" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "As already seen multiple times, to create a `list` object, we use the *literal notation* and list all elements within brackets `[` and `]`." + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "empty = []" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [], + "source": [ + "simple = [40, 50]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "The elements do *not* need to be of the *same* type, and `list` objects may also be **nested**." + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "nested = [empty, 10, 20.0, \"Thirty\", simple]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "[PythonTutor](http://www.pythontutor.com/visualize.html#code=empty%20%3D%20%5B%5D%0Asimple%20%3D%20%5B40,%2050%5D%0Anested%20%3D%20%5Bempty,%2010,%2020.0,%20%22Thirty%22,%20simple%5D&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) shows how `nested` holds references to the `empty` and `simple` objects. Technically, it holds three more references pointing to the `10`, `20.0`, and `\"Thirty\"` objects as well. However, to simplify the visualization, these three objects are shown right inside the `nested` object as they are immutable and of \"flat\" data types. In general, the $0$s and $1$s inside a `list` object in memory always constitute pointers to other objects only." + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[[], 10, 20.0, 'Thirty', [40, 50]]" + ] + }, + "execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nested" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Let's not forget that `nested` is an object on its own with an *identity* and *data type*." + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "140322034424136" + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "id(nested)" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "list" + ] + }, + "execution_count": 35, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(nested)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Alternatively, we may use the [list()](https://docs.python.org/3/library/functions.html#func-list) built-in to create a `list` object out of an iterable we pass to it as the argument.\n", + "\n", + "For example, we can wrap the [range()](https://docs.python.org/3/library/functions.html#func-range) built-in with [list()](https://docs.python.org/3/library/functions.html#func-list): As described in [Chapter 4](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/04_iteration.ipynb#Containers-vs.-Iterables), `range` objects, like `range(1, 13)` below, are iterable and generate `int` objects \"on the fly\" (i.e., one by one). The [list()](https://docs.python.org/3/library/functions.html#func-list) around it acts like a `for`-loop and **materializes** twelve `int` objects in memory that then become the elements of the newly created `list` object. [PythonTutor](http://www.pythontutor.com/visualize.html#code=r%20%3D%20range%281,%2013%29%0Al%20%3D%20list%28range%281,%2013%29%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) shows this difference visually." + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]" + ] + }, + "execution_count": 36, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "list(range(1, 13))" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 37, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "isinstance(range(1, 13), abc.Iterable)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Beware of passing a `range` object over a \"big\" horizon as the argument to [list()](https://docs.python.org/3/library/functions.html#func-list) as that may lead to a `MemoryError` and the computer crashing." + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "ename": "MemoryError", + "evalue": "", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mMemoryError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mlist\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrange\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m999_999_999_999\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mMemoryError\u001b[0m: " + ] + } + ], + "source": [ + "list(range(999_999_999_999))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "As another example, we may also create a `list` object from a `str` object as the latter is iterable, as well. Then, the individual characters become the elements of the new `list` object!" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "['W', 'H', 'U']" + ] + }, + "execution_count": 39, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "list(\"WHU\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "### Sequence Behaviors" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "`list` objects are *sequences*. To reiterate that concept from above *without* the formal ABCs from the [collections.abc](https://docs.python.org/3/library/collections.abc.html) module, we briefly summarize the *four* behaviors of a sequence and provide some more `list`-specific details below:" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "- **Container**:\n", + " - holds references to other objects in memory (with their own *identity* and *type*)\n", + " - implements membership testing via the `in` operator\n", + "- **Iterable**:\n", + " - supports being looped over\n", + " - works with the `for` or `while` statements\n", + "- **Reversible**:\n", + " - the elements come in a *predictable* order that we may traverse in a forward or backward fashion\n", + " - works with the [reversed()](https://docs.python.org/3/library/functions.html#reversed) built-in\n", + "- **Sized**:\n", + " - the number of elements is finite *and* known in advance\n", + " - works with the built-in [len()](https://docs.python.org/3/library/functions.html#len) function" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "The \"length\" of `nested` is *five* because `simple` counts as only *one* element. In other words, `nested` holds five references to other objects." + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "5" + ] + }, + "execution_count": 40, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(nested)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "With a `for`-loop, we can traverse all elements in a *predictable* order, forward or backward. As `list` objects only hold references to other objects, these have a *indentity* and may be of *different* types; however, this is rarely, if ever, useful in practice." + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[] \t140322025703432 \t\n", + "10 \t94360180081984 \t\n", + "20.0 \t140322034534104 \t\n", + "Thirty \t140322025251816 \t\n", + "[40, 50] \t140322034424072 \t\n" + ] + } + ], + "source": [ + "for element in nested:\n", + " print(element, id(element), type(element), sep=\" \\t\")" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[40, 50] Thirty 20.0 10 [] " + ] + } + ], + "source": [ + "for element in reversed(nested):\n", + " print(element, end=\" \")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "The `in` operator checks if a given object is \"contained\" in a `list` object. It uses the `==` operator behind the scenes (i.e., *not* the `is` operator) conducting a so-called **[linear search](https://en.wikipedia.org/wiki/Linear_search)**: So, Python implicitly loops over *all* elements and only stops prematurely if an element evaluates equal to the given object. A linear search may, therefore, be relatively *slow* for big `list` objects." + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 43, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "10 in nested" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "`20` compares equal to the `20.0` in `nested`." + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 44, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "20 in nested" + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 45, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "30 in nested" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "### Indexing" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Because of the *predictable* order and the *finiteness*, each element in a sequence can be labeled with a unique *index* (i.e., an `int` object in the range $0 \\leq \\text{index} < \\lvert \\text{sequence} \\rvert$).\n", + "\n", + "Brackets, `[` and `]`, are the literal syntax for accessing individual elements of any sequence type. In this book, we also call them the *indexing operator*." + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "10" + ] + }, + "execution_count": 46, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nested[1]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "The last index is one less than `len(nested)` above, and Python raises an `IndexError` if we look up an index that is not in the implied range." + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "ename": "IndexError", + "evalue": "list index out of range", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mIndexError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mnested\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m5\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mIndexError\u001b[0m: list index out of range" + ] + } + ], + "source": [ + "nested[5]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Negative indices are used to count in reverse order from the end of a sequence, and brackets may be chained to access nested objects. So, to access the `50` inside `simple` via the `nested` object, we write `nested[-1][1]`." + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "50" + ] + }, + "execution_count": 48, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nested[-1][1]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "### Slicing" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Slicing `list` objects works analogously to slicing `str` objects: We use the literal syntax with either one or two colons `:` inside the brackets `[]` to separate the *start*, *stop*, and *step* values. Slicing creates a *new* `list` object with the elements chosen from the original one.\n", + "\n", + "For example, to obtain the three elements in the \"middle\" of `nested`, we slice from `1` (including) to `4` (excluding)." + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[10, 20.0, 'Thirty']" + ] + }, + "execution_count": 49, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nested[1:4]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "To obtain \"every other\" element, we slice from beginning to end, defaulting to `0` and `len(nested)`, in steps of `2`." + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[[], 20.0, [40, 50]]" + ] + }, + "execution_count": 50, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nested[::2]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "The literal notation with the colons `:` is *syntactic sugar*, and Python provides the [slice()](https://docs.python.org/3/library/functions.html#slice) built-in to slice with `slice` objects. [slice()](https://docs.python.org/3/library/functions.html#slice) takes *start*, *stop*, and *step* arguments in the same way as the already familiar [range()](https://docs.python.org/3/library/functions.html#func-range) built-in." + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [], + "source": [ + "middle = slice(1, 4)" + ] + }, + { + "cell_type": "code", + "execution_count": 52, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "slice" + ] + }, + "execution_count": 52, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(middle)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "In most cases, the literal notation is more convenient to use; however, with `slice` objects, we may give names to slices and re-use them across several sequences." + ] + }, + { + "cell_type": "code", + "execution_count": 53, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[10, 20.0, 'Thirty']" + ] + }, + "execution_count": 53, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nested[middle]" + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[11, 8, 5]" + ] + }, + "execution_count": 54, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "numbers[middle]" + ] + }, + { + "cell_type": "code", + "execution_count": 55, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "'and'" + ] + }, + "execution_count": 55, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "word[middle]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "`slice` objects come with three read-only attributes `start`, `stop`, and `step` on them." + ] + }, + { + "cell_type": "code", + "execution_count": 56, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "1" + ] + }, + "execution_count": 56, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "middle.start" + ] + }, + { + "cell_type": "code", + "execution_count": 57, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "4" + ] + }, + "execution_count": 57, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "middle.stop" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "If not passed to [slice()](https://docs.python.org/3/library/functions.html#slice), these attributes default to `None`. That is why the cell below has no output." + ] + }, + { + "cell_type": "code", + "execution_count": 58, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [], + "source": [ + "middle.step" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "A good trick to know is taking a \"full\" slice: This copies *all* elements of a `list` object into a *new* `list` object." + ] + }, + { + "cell_type": "code", + "execution_count": 59, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "nested_copy = nested[:]" + ] + }, + { + "cell_type": "code", + "execution_count": 60, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[[], 10, 20.0, 'Thirty', [40, 50]]" + ] + }, + "execution_count": 60, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nested_copy" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "At first glance, `nested` and `nested_copy` seem to cause no pain. For `list` objects, the comparison operator `==` goes over the elements in both operands in a pairwise fashion and checks if they all evaluate equal.\n", + "\n", + "We confirm that `nested` and `nested_copy` compare equal as expected but also that they are *different* objects." + ] + }, + { + "cell_type": "code", + "execution_count": 61, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 61, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nested == nested_copy" + ] + }, + { + "cell_type": "code", + "execution_count": 62, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 62, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nested is nested_copy" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "However, as [PythonTutor](http://pythontutor.com/visualize.html#code=nested%20%3D%20%5B%5B%5D,%2010,%2020.0,%20%22Thirty%22,%20%5B40,%2050%5D%5D%0Anested_copy%20%3D%20nested%5B%3A%5D&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) reveals, only the *pointers* to the elements are copied! That concept is called a **[shallow copy](https://en.wikipedia.org/wiki/Object_copying#Shallow_copy)**.\n", + "\n", + "We could also see this with the [id()](https://docs.python.org/3/library/functions.html#id) function: The respective first elements in both `nested` and `nested_copy` are the *same* `empty` object." + ] + }, + { + "cell_type": "code", + "execution_count": 63, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 63, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nested[0] is nested_copy[0]" + ] + }, + { + "cell_type": "code", + "execution_count": 64, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[]" + ] + }, + "execution_count": 64, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nested[0]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Knowing this becomes critical if the elements in a `list` object are mutable objects. Then, because of the original `list` object and its copy both pointing at the *same* objects in memory, if some of them are mutated, these changes are visible to both! We already saw a similar kind of confusion in [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements.ipynb#Who-am-I?-And-how-many?) in a \"simpler\" setting and look into this in detail in the next section.\n", + "\n", + "Instead of a shallow copy, we could also create a so-called **[deep copy](https://en.wikipedia.org/wiki/Object_copying#Deep_copy)** of `nested`: That concept recursively follows every pointer in a possible nested data structure and creates copies of *every* involved object.\n", + "\n", + "To explicitly create shallow or deep copies, the [copy](https://docs.python.org/3/library/copy.html) module in the [standard library](https://docs.python.org/3/library/index.html) provides two functions, [copy()](https://docs.python.org/3/library/copy.html#copy.copy) and [deepcopy()](https://docs.python.org/3/library/copy.html#copy.deepcopy). We must always remember that slicing creates *shallow* copies only." + ] + }, + { + "cell_type": "code", + "execution_count": 65, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [], + "source": [ + "import copy" + ] + }, + { + "cell_type": "code", + "execution_count": 66, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [], + "source": [ + "nested_deep_copy = copy.deepcopy(nested)" + ] + }, + { + "cell_type": "code", + "execution_count": 67, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 67, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nested == nested_deep_copy" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Now, the first elements of `nested` and `nested_deep_copy` are *different* objects, and [PythonTutor](http://pythontutor.com/visualize.html#code=import%20copy%0Anested%20%3D%20%5B%5B%5D,%2010,%2020.0,%20%22Thirty%22,%20%5B40,%2050%5D%5D%0Anested_deep_copy%20%3D%20copy.deepcopy%28nested%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) shows that there are *six* `list` objects in memory." + ] + }, + { + "cell_type": "code", + "execution_count": 68, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 68, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nested[0] is nested_deep_copy[0]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "As this [StackOverflow question](https://stackoverflow.com/questions/184710/what-is-the-difference-between-a-deep-copy-and-a-shallow-copy) shows, understanding shallow and deep copies is a common source of confusion independent of the programming language." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "### Mutability" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "In contrast to `str` objects, `list` objects are *mutable*: We may assign new elements to indices or slices and also remove elements. That changes *parts* of a `list` object in memory." + ] + }, + { + "cell_type": "code", + "execution_count": 69, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "nested[0] = 0" + ] + }, + { + "cell_type": "code", + "execution_count": 70, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[0, 10, 20.0, 'Thirty', [40, 50]]" + ] + }, + "execution_count": 70, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nested" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "When we re-assign a slice, we can even change the size of the `list` object." + ] + }, + { + "cell_type": "code", + "execution_count": 71, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "nested[:4] = [100, 100, 100] # assign three elements where there were four before" + ] + }, + { + "cell_type": "code", + "execution_count": 72, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[100, 100, 100, [40, 50]]" + ] + }, + "execution_count": 72, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nested" + ] + }, + { + "cell_type": "code", + "execution_count": 73, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "4" + ] + }, + "execution_count": 73, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(nested)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "The `list` object's identity does *not* change. That is the whole point behind mutable objects." + ] + }, + { + "cell_type": "code", + "execution_count": 74, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "140322034424136" + ] + }, + "execution_count": 74, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "id(nested)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "`nested_copy` is still unchanged!" + ] + }, + { + "cell_type": "code", + "execution_count": 75, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[[], 10, 20.0, 'Thirty', [40, 50]]" + ] + }, + "execution_count": 75, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nested_copy" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Let's change the nested `[40, 50]` via `nested_copy` into `[1, 2, 3]` by replacing all its elements." + ] + }, + { + "cell_type": "code", + "execution_count": 76, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [], + "source": [ + "nested_copy[-1][:] = [1, 2, 3]" + ] + }, + { + "cell_type": "code", + "execution_count": 77, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[[], 10, 20.0, 'Thirty', [1, 2, 3]]" + ] + }, + "execution_count": 77, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nested_copy" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "This has a surprising side effect on `nested`!" + ] + }, + { + "cell_type": "code", + "execution_count": 78, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[100, 100, 100, [1, 2, 3]]" + ] + }, + "execution_count": 78, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nested" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "That is because `nested_copy` is a shallow copy of `nested`. [PythonTutor](http://pythontutor.com/visualize.html#code=nested%20%3D%20%5B%5B%5D,%2010,%2020.0,%20%22Thirty%22,%20%5B40,%2050%5D%5D%0Anested_copy%20%3D%20nested%5B%3A%5D%0Anested%5B%3A4%5D%20%3D%20%5B100,%20100,%20100%5D%0Anested_copy%5B-1%5D%5B%3A%5D%20%3D%20%5B1,%202,%203%5D&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) shows how both point to the *same* nested `list` object.\n", + "\n", + "Lastly, we use the `del` statement to remove an element." + ] + }, + { + "cell_type": "code", + "execution_count": 79, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "del nested[-1]" + ] + }, + { + "cell_type": "code", + "execution_count": 80, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[100, 100, 100]" + ] + }, + "execution_count": 80, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nested" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "The `del` statement, of course, also works for slices." + ] + }, + { + "cell_type": "code", + "execution_count": 81, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "del nested[:2]" + ] + }, + { + "cell_type": "code", + "execution_count": 82, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[100]" + ] + }, + "execution_count": 82, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nested" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "### List Operations" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "As with `str` objects, the `+` and `*` operators are overloaded for concatenation and always return *new* `list` objects." + ] + }, + { + "cell_type": "code", + "execution_count": 83, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "first = [10, 20, 30]\n", + "second = [40, 50, 60]" + ] + }, + { + "cell_type": "code", + "execution_count": 84, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[10, 20, 30, 40, 50, 60]" + ] + }, + "execution_count": 84, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "first + second" + ] + }, + { + "cell_type": "code", + "execution_count": 85, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[10, 20, 30, 10, 20, 30]" + ] + }, + "execution_count": 85, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "2 * first" + ] + }, + { + "cell_type": "code", + "execution_count": 86, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[40, 50, 60, 40, 50, 60, 40, 50, 60]" + ] + }, + "execution_count": 86, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "second * 3" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Besides being an operator, the `*` symbol has a second syntactical use, as explained in [PEP 3132](https://www.python.org/dev/peps/pep-3132/) and [PEP 448](https://www.python.org/dev/peps/pep-0448/): It implements what is called **iterable unpacking**. It is *not* an operator syntactically but a notation that Python processes as a literal.\n", + "\n", + "In the example, Python interprets the expression as if the elements of the iterable `second` were placed between `30` and `70` one by one. So, we do not obtain a nested but a *flat* list." + ] + }, + { + "cell_type": "code", + "execution_count": 87, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[30, 40, 50, 60, 70]" + ] + }, + "execution_count": 87, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "[30, *second, 70]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "### List Methods" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "The `list` type is an essential data structure in any real-world Python application, and many typical `list` related algorithms from computer science theory are already built into it at the C level (cf., the [documentation](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists) for a full overview). So, understanding and applying the built-in methods of the `list` type not only speeds up the development process but also makes programs significantly faster.\n", + "\n", + "In contrast to the `str` type's methods, the `list` type's methods *always* mutate (i.e., \"change\") an object *in place*. They do *not* create a *new* `list` object and return `None` to indicate that. So, we must *never* assign the return value of `list` methods to the variable holding the list!\n", + "\n", + "Let's look at the following `names` example." + ] + }, + { + "cell_type": "code", + "execution_count": 88, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "names = [\"Carl\", \"Berthold\", \"Achim\", \"Xavier\", \"Peter\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "To add an object to the end of `names`, we use the append() method. The code cell shows no output indicating that `None` must have been returned." + ] + }, + { + "cell_type": "code", + "execution_count": 89, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "names.append(\"Eckardt\")" + ] + }, + { + "cell_type": "code", + "execution_count": 90, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "['Carl', 'Berthold', 'Achim', 'Xavier', 'Peter', 'Eckardt']" + ] + }, + "execution_count": 90, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "names" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "With the extend() method, we may also append multiple elements provided by an iterable at once. Here, the iterable is a `list` object itself holding two `str` objects." + ] + }, + { + "cell_type": "code", + "execution_count": 91, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "names.extend([\"Karl\", \"Oliver\"])" + ] + }, + { + "cell_type": "code", + "execution_count": 92, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "['Carl', 'Berthold', 'Achim', 'Xavier', 'Peter', 'Eckardt', 'Karl', 'Oliver']" + ] + }, + "execution_count": 92, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "names" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "`list` objects may be sorted *in place* with the [sort()](https://docs.python.org/3/library/stdtypes.html#list.sort) method. That is different from the built-in [sorted()](https://docs.python.org/3/library/functions.html#sorted) function that takes any *finite* and *iterable* object and returns a *new* `list` object with the iterable's elements sorted." + ] + }, + { + "cell_type": "code", + "execution_count": 93, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "['Achim', 'Berthold', 'Carl', 'Eckardt', 'Karl', 'Oliver', 'Peter', 'Xavier']" + ] + }, + "execution_count": 93, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sorted(names)" + ] + }, + { + "cell_type": "code", + "execution_count": 94, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "['Carl', 'Berthold', 'Achim', 'Xavier', 'Peter', 'Eckardt', 'Karl', 'Oliver']" + ] + }, + "execution_count": 94, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "names" + ] + }, + { + "cell_type": "code", + "execution_count": 95, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "names.sort()" + ] + }, + { + "cell_type": "code", + "execution_count": 96, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "['Achim', 'Berthold', 'Carl', 'Eckardt', 'Karl', 'Oliver', 'Peter', 'Xavier']" + ] + }, + "execution_count": 96, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "names" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "To sort in reverse order, we pass a keyword-only `reverse=True` argument to either the [sort()](https://docs.python.org/3/library/stdtypes.html#list.sort) method or the [sorted()](https://docs.python.org/3/library/functions.html#sorted) function. In the latter case, we could also use the [reversed()](https://docs.python.org/3/library/functions.html#reversed) built-in instead; however, that *neither* returns a new `list` object *nor* changes the existing one in place. We revisit it at the end of this chapter." + ] + }, + { + "cell_type": "code", + "execution_count": 97, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "names.sort(reverse=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 98, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "['Xavier', 'Peter', 'Oliver', 'Karl', 'Eckardt', 'Carl', 'Berthold', 'Achim']" + ] + }, + "execution_count": 98, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "names" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Both, the [sort()](https://docs.python.org/3/library/stdtypes.html#list.sort) method and the [sorted()](https://docs.python.org/3/library/functions.html#sorted) function, also accept a keyword-only `key` argument that must be a reference to a `function` object accepting one positional argument. Then, the elements in the `list` object are passed to that on a one-by-one basis, and the return values are used as the **sort keys**.\n", + "\n", + "For example, to sort `names` not by alphabet but by the names' lengths, we pass in a reference to the built-in [len()](https://docs.python.org/3/library/functions.html#len) function as `key=len`. Note that there are *no* parentheses after `len`!" + ] + }, + { + "cell_type": "code", + "execution_count": 99, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "names.sort(key=len)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "If two names have the same length, their relative order is kept as is. A [sorting algorithm](https://en.wikipedia.org/wiki/Sorting_algorithm) is called **[stable](https://en.wikipedia.org/wiki/Sorting_algorithm#Stability)** if it has that property. That is why `\"Karl\"` comes before `\"Carl\" ` below.\n", + "\n", + "Sorting is an important topic in programming and we refer to the official [HOWTO](https://docs.python.org/3/howto/sorting.html) for a more comprehensive introduction." + ] + }, + { + "cell_type": "code", + "execution_count": 100, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "['Karl', 'Carl', 'Peter', 'Achim', 'Xavier', 'Oliver', 'Eckardt', 'Berthold']" + ] + }, + "execution_count": 100, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "names" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "The pop() method removes the last element from a `list` object *and* returns it." + ] + }, + { + "cell_type": "code", + "execution_count": 101, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "'Berthold'" + ] + }, + "execution_count": 101, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "names.pop()" + ] + }, + { + "cell_type": "code", + "execution_count": 102, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "['Karl', 'Carl', 'Peter', 'Achim', 'Xavier', 'Oliver', 'Eckardt']" + ] + }, + "execution_count": 102, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "names" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "It takes an optional index argument and removes that instead." + ] + }, + { + "cell_type": "code", + "execution_count": 103, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "'Carl'" + ] + }, + "execution_count": 103, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "names.pop(1)" + ] + }, + { + "cell_type": "code", + "execution_count": 104, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "['Karl', 'Peter', 'Achim', 'Xavier', 'Oliver', 'Eckardt']" + ] + }, + "execution_count": 104, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "names" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Instead of removing an element by its index, we can remove it by its value with the remove() method. Behind the scenes, Python then compares the object passed as its argument, `\"Peter\"` in the example, sequentially to each element with the `==` operator and removes the first one that evaluates equal." + ] + }, + { + "cell_type": "code", + "execution_count": 105, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "names.remove(\"Peter\")" + ] + }, + { + "cell_type": "code", + "execution_count": 106, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "['Karl', 'Achim', 'Xavier', 'Oliver', 'Eckardt']" + ] + }, + "execution_count": 106, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "names" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "remove() raises a `ValueError` if the value is not found." + ] + }, + { + "cell_type": "code", + "execution_count": 107, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "ename": "ValueError", + "evalue": "list.remove(x): x not in list", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mnames\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mremove\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Peter\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mValueError\u001b[0m: list.remove(x): x not in list" + ] + } + ], + "source": [ + "names.remove(\"Peter\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "`list` objects implement an index() method that returns the position of the first occurrence of an element. It fails loudly with a `ValueError` if the element cannot be found by value." + ] + }, + { + "cell_type": "code", + "execution_count": 108, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "['Karl', 'Achim', 'Xavier', 'Oliver', 'Eckardt']" + ] + }, + "execution_count": 108, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "names" + ] + }, + { + "cell_type": "code", + "execution_count": 109, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "3" + ] + }, + "execution_count": 109, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "names.index(\"Oliver\")" + ] + }, + { + "cell_type": "code", + "execution_count": 110, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "ename": "ValueError", + "evalue": "'Carl' is not in list", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mnames\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mindex\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Carl\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mValueError\u001b[0m: 'Carl' is not in list" + ] + } + ], + "source": [ + "names.index(\"Carl\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "The count() method returns the number of occurrences of a value." + ] + }, + { + "cell_type": "code", + "execution_count": 111, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "1" + ] + }, + "execution_count": 111, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "names.count(\"Xavier\")" + ] + }, + { + "cell_type": "code", + "execution_count": 112, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0" + ] + }, + "execution_count": 112, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "names.count(\"Yves\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "### List Comparison" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "The relational operators also work with `list` objects; yet another example of operator overloading.\n", + "\n", + "Comparison is made in a pairwise fashion until the first pair of elements does not evaluate equal or one of the `list` objects ends. The exact comparison rules depend on the elements and not the `list` object. We say that comparison is **[delegated](https://en.wikipedia.org/wiki/Delegation_(object-oriented_programming))** to the objects to be compared. Usually, all elements are of the *same* type. Then, the comparison is straightforward and conceptually the same as for string comparison in [Chapter 6](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/06_text.ipynb#String-Comparison)." + ] + }, + { + "cell_type": "code", + "execution_count": 113, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "['Karl', 'Achim', 'Xavier', 'Oliver', 'Eckardt']" + ] + }, + "execution_count": 113, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "names" + ] + }, + { + "cell_type": "code", + "execution_count": 114, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 114, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "names < [\"Karl\", \"Achim\", \"Oliver\", \"Xavier\", \"Eckardt\"]" + ] + }, + { + "cell_type": "code", + "execution_count": 115, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 115, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "names < [\"Karl\", \"Xavier\", \"Achim\", \"Oliver\", \"Eckardt\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "The shorter `list` object is considered \"smaller,\" and vice versa." + ] + }, + { + "cell_type": "code", + "execution_count": 116, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 116, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "names < [\"Karl\", \"Achim\", \"Xavier\", \"Oliver\"]" + ] + }, + { + "cell_type": "code", + "execution_count": 117, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 117, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "names < [\"Karl\", \"Achim\", \"Xavier\", \"Oliver\", \"Eckardt\", \"Peter\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "### Modifiers vs. Pure Functions" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "As `list` objects are mutable, the caller of a function can see the changes made to a `list` object passed to the function as an argument. That is often a surprising *side effect* and should be avoided.\n", + "\n", + "As an example, consider the `add_xyz()` function." + ] + }, + { + "cell_type": "code", + "execution_count": 118, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "letters = [\"a\", \"b\", \"c\"]" + ] + }, + { + "cell_type": "code", + "execution_count": 119, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [], + "source": [ + "def add_xyz(arg):\n", + " \"\"\"Append letters to a list.\"\"\"\n", + " arg.extend([\"x\", \"y\", \"z\"])\n", + " return arg" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "While this function is being executed, two variables, namely `letters` in the global scope and `arg` inside the function's local scope, point to the *same* `list` object in memory. Furthermore, the passed in `arg` is also the return value.\n", + "\n", + "So, after the function call, `letters_with_xyz` and `letters` are **aliases** as well, pointing to the *same* object." + ] + }, + { + "cell_type": "code", + "execution_count": 120, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "letters_with_xyz = add_xyz(letters)" + ] + }, + { + "cell_type": "code", + "execution_count": 121, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "['a', 'b', 'c', 'x', 'y', 'z']" + ] + }, + "execution_count": 121, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "letters_with_xyz" + ] + }, + { + "cell_type": "code", + "execution_count": 122, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "['a', 'b', 'c', 'x', 'y', 'z']" + ] + }, + "execution_count": 122, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "letters" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "A better practice is to first create a copy of `arg` within the function that is then modified and returned. If we are sure that `arg` contains immutable elements only, we get away with a shallow copy. The downside of this approach is the higher amount of memory necessary.\n", + "\n", + "The revised `add_xyz()` function below is more natural to reason about as it does *not* modify the passed in `arg` internally. This approach is following the **[functional programming](https://en.wikipedia.org/wiki/Functional_programming)** paradigm that is going through a \"renaissance\" currently. Two essential characteristics of functional programming are that a function *never* changes its inputs and *always* returns the same output given the same inputs.\n", + "\n", + "For a beginner, it is probably better to stick to this idea and not change any arguments as the original `add_xyz()` above. However, functions that modify and return the argument passed in are an important aspect of object-oriented programming, as explained in Chapter 9." + ] + }, + { + "cell_type": "code", + "execution_count": 123, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "letters = [\"a\", \"b\", \"c\"]" + ] + }, + { + "cell_type": "code", + "execution_count": 124, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [], + "source": [ + "def add_xyz(arg):\n", + " \"\"\"Create a new list from an existing one.\"\"\"\n", + " new_arg = arg[:] # a shallow copy is good enough here\n", + " new_arg.extend([\"x\", \"y\", \"z\"])\n", + " return new_arg" + ] + }, + { + "cell_type": "code", + "execution_count": 125, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "letters_with_xyz = add_xyz(letters)" + ] + }, + { + "cell_type": "code", + "execution_count": 126, + "metadata": { + "scrolled": true, + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "['a', 'b', 'c', 'x', 'y', 'z']" + ] + }, + "execution_count": 126, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "letters_with_xyz" + ] + }, + { + "cell_type": "code", + "execution_count": 127, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "['a', 'b', 'c']" + ] + }, + "execution_count": 127, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "letters" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "If we want to modify the argument passed in, it is best to return `None` and not `arg`, as does the final version of `add_xyz()` below. Then, the user of our function cannot accidentally create two aliases to the same object. That is also why the list methods above all return `None`." + ] + }, + { + "cell_type": "code", + "execution_count": 128, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "letters = [\"a\", \"b\", \"c\"]" + ] + }, + { + "cell_type": "code", + "execution_count": 129, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [], + "source": [ + "def add_xyz(arg):\n", + " \"\"\"Append letters to a list.\"\"\"\n", + " arg.extend([\"x\", \"y\", \"z\"])\n", + " return # = None" + ] + }, + { + "cell_type": "code", + "execution_count": 130, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "add_xyz(letters)" + ] + }, + { + "cell_type": "code", + "execution_count": 131, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "['a', 'b', 'c', 'x', 'y', 'z']" + ] + }, + "execution_count": 131, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "letters" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "If we call `add_xyz()` with `letters` as the argument again, we end up with an even longer `list` object." + ] + }, + { + "cell_type": "code", + "execution_count": 132, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [], + "source": [ + "add_xyz(letters)" + ] + }, + { + "cell_type": "code", + "execution_count": 133, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "['a', 'b', 'c', 'x', 'y', 'z', 'x', 'y', 'z']" + ] + }, + "execution_count": 133, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "letters" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Functions that only work on the argument passed in are called **modifiers**. Their primary purpose is to change the **state** of the argument. On the contrary, functions that have *no* side effects on the arguments are said to be **pure**." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## The `tuple` Type" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "To create a `tuple` object, we can use the same literal notation as for `list` objects *without* the brackets and list all elements." + ] + }, + { + "cell_type": "code", + "execution_count": 134, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "numbers = 7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4" + ] + }, + { + "cell_type": "code", + "execution_count": 135, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4)" + ] + }, + "execution_count": 135, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "numbers" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "However, to be clearer, many Pythonistas write out the optional parentheses `(` and `)`." + ] + }, + { + "cell_type": "code", + "execution_count": 136, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "numbers = (7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4)" + ] + }, + { + "cell_type": "code", + "execution_count": 137, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4)" + ] + }, + "execution_count": 137, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "numbers" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "As before, `numbers` is an object on its own." + ] + }, + { + "cell_type": "code", + "execution_count": 138, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "140322025210648" + ] + }, + "execution_count": 138, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "id(numbers)" + ] + }, + { + "cell_type": "code", + "execution_count": 139, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "tuple" + ] + }, + "execution_count": 139, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(numbers)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "While we could use empty parentheses `()` to create an empty `tuple` object ..." + ] + }, + { + "cell_type": "code", + "execution_count": 140, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "empty_tuple = ()" + ] + }, + { + "cell_type": "code", + "execution_count": 141, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "()" + ] + }, + "execution_count": 141, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "empty_tuple" + ] + }, + { + "cell_type": "code", + "execution_count": 142, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "tuple" + ] + }, + "execution_count": 142, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(empty_tuple)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "... we must use a *trailing comma* to create a `tuple` object holding one element. If we forget the comma, the parentheses are interpreted as the grouping operator and effectively useless!" + ] + }, + { + "cell_type": "code", + "execution_count": 143, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "one_tuple = (1,) # we could ommit the parentheses but not the comma" + ] + }, + { + "cell_type": "code", + "execution_count": 144, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(1,)" + ] + }, + "execution_count": 144, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "one_tuple" + ] + }, + { + "cell_type": "code", + "execution_count": 145, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "tuple" + ] + }, + "execution_count": 145, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(one_tuple)" + ] + }, + { + "cell_type": "code", + "execution_count": 146, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [], + "source": [ + "no_tuple = (1)" + ] + }, + { + "cell_type": "code", + "execution_count": 147, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "1" + ] + }, + "execution_count": 147, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "no_tuple" + ] + }, + { + "cell_type": "code", + "execution_count": 148, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "int" + ] + }, + "execution_count": 148, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(no_tuple)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Alternatively, we may use the [tuple()](https://docs.python.org/3/library/functions.html#func-tuple) built-in that takes any iterable as its argument and creates a new `tuple` from its elements." + ] + }, + { + "cell_type": "code", + "execution_count": 149, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(1,)" + ] + }, + "execution_count": 149, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tuple([1])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "### Tuples are like \"Immutable Lists\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Most operations involving `tuple` objects work in the same way as with `list` objects. The main difference is that `tuple` objects are *immutable*. So, if our program does not depend on mutability, we may and should use `tuple` and not `list` objects to model sequential data. That way, we avoid the pitfalls seen above.\n", + "\n", + "`tuple` objects are *sequences* exhibiting the familiar *four* behaviors." + ] + }, + { + "cell_type": "code", + "execution_count": 150, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 150, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "isinstance(numbers, abc.Sequence)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + " So, `numbers` holds a *finite* number of elements ..." + ] + }, + { + "cell_type": "code", + "execution_count": 151, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "12" + ] + }, + "execution_count": 151, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(numbers)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "... that we can obtain individually by looping over it in a predictable *forward* or *reverse* order." + ] + }, + { + "cell_type": "code", + "execution_count": 152, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "7 11 8 5 3 12 2 6 9 10 1 4 " + ] + } + ], + "source": [ + "for number in numbers:\n", + " print(number, end=\" \")" + ] + }, + { + "cell_type": "code", + "execution_count": 153, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "4 1 10 9 6 2 12 3 5 8 11 7 " + ] + } + ], + "source": [ + "for number in reversed(numbers):\n", + " print(number, end=\" \")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "To check if a given object is *contained* in `numbers`, we use the `in` operator and conduct a linear search." + ] + }, + { + "cell_type": "code", + "execution_count": 154, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 154, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "0 in numbers" + ] + }, + { + "cell_type": "code", + "execution_count": 155, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 155, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "1 in numbers" + ] + }, + { + "cell_type": "code", + "execution_count": 156, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 156, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "1.0 in numbers # in relies on == behind the scenes" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "We may index and slice with the `[]` operator. The latter returns *new* `tuple` objects." + ] + }, + { + "cell_type": "code", + "execution_count": 157, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "7" + ] + }, + "execution_count": 157, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "numbers[0]" + ] + }, + { + "cell_type": "code", + "execution_count": 158, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "4" + ] + }, + "execution_count": 158, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "numbers[-1]" + ] + }, + { + "cell_type": "code", + "execution_count": 159, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(2, 6, 9, 10, 1, 4)" + ] + }, + "execution_count": 159, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "numbers[6:]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Index assignment does *not* work as tuples are *immutable* and results in a `TypeError`." + ] + }, + { + "cell_type": "code", + "execution_count": 160, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "ename": "TypeError", + "evalue": "'tuple' object does not support item assignment", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mnumbers\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m99\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m: 'tuple' object does not support item assignment" + ] + } + ], + "source": [ + "numbers[-1] = 99" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "If we need to \"modify\" the `tuple` object, we must create a *new* `tuple` object, for example, like so: We take a slice of the elements we want to keep and use the overloaded `+` operator to concatenate the slice with another `tuple` object." + ] + }, + { + "cell_type": "code", + "execution_count": 161, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "new_numbers = numbers[:-1] + (99,)" + ] + }, + { + "cell_type": "code", + "execution_count": 162, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 99)" + ] + }, + "execution_count": 162, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "new_numbers" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "The `*` operator works as well." + ] + }, + { + "cell_type": "code", + "execution_count": 163, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4, 7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4)" + ] + }, + "execution_count": 163, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "2 * numbers" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Being immutable, `tuple` objects only provide the count() and index() methods." + ] + }, + { + "cell_type": "code", + "execution_count": 164, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0" + ] + }, + "execution_count": 164, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "numbers.count(0)" + ] + }, + { + "cell_type": "code", + "execution_count": 165, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "10" + ] + }, + "execution_count": 165, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "numbers.index(1)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "The relational operators compare the elements of two `tuple` objects in a pairwise fashion as above." + ] + }, + { + "cell_type": "code", + "execution_count": 166, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 166, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "numbers < new_numbers" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "While `tuple` objects are immutable, this only relates to the references they hold. If a `tuple` object contains mutable objects, the entire nested structure is *not* immutable as a whole.\n", + "\n", + "Consider the following stylized example `not_immutable`: It contains *three* elements, `1`, `[2, ..., 11]`, and `12`, and the elements of the nested `list` object may be changed. While it is not practical to mix data types in a `tuple` object that is used as an \"immutable list,\" we want to make the point that the mere usage of the `tuple` type does *not* guarantee a nested object to be immutable." + ] + }, + { + "cell_type": "code", + "execution_count": 167, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [], + "source": [ + "not_immutable = (1, [2, 3, 4, 5, 6, 7, 8, 9, 10, 11], 12)" + ] + }, + { + "cell_type": "code", + "execution_count": 168, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [], + "source": [ + "not_immutable[1][:] = [99, 99, 99]" + ] + }, + { + "cell_type": "code", + "execution_count": 169, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(1, [99, 99, 99], 12)" + ] + }, + "execution_count": 169, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "not_immutable" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "### Packing & Unpacking" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "In the \"*List Operations*\" section above, the `*` symbol **unpacks** the elements of a `list` object into another one. This idea of *iterable unpacking* is built into Python at various places, even *without* the `*` symbol.\n", + "\n", + "For example, we may write variables on the left-hand side of a `=` statement in a `tuple` style. Then, any *finite* iterable on the right-hand side is unpacked. So, `numbers` is unpacked into *twelve* variables below." + ] + }, + { + "cell_type": "code", + "execution_count": 170, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "n1, n2, n3, n4, n5, n6, n7, n8, n9, n10, n11, n12 = numbers" + ] + }, + { + "cell_type": "code", + "execution_count": 171, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "7" + ] + }, + "execution_count": 171, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "n1" + ] + }, + { + "cell_type": "code", + "execution_count": 172, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "11" + ] + }, + "execution_count": 172, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "n2" + ] + }, + { + "cell_type": "code", + "execution_count": 173, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "8" + ] + }, + "execution_count": 173, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "n3" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Having to type twelve variables on the left is already tedious. Furthermore, if the iterable on the right yields a number of elements *different* from the number of variables, we get a `ValueError`." + ] + }, + { + "cell_type": "code", + "execution_count": 174, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "ename": "ValueError", + "evalue": "too many values to unpack (expected 11)", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mn1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mn2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mn3\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mn4\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mn5\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mn6\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mn7\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mn8\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mn9\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mn10\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mn11\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnumbers\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mValueError\u001b[0m: too many values to unpack (expected 11)" + ] + } + ], + "source": [ + "n1, n2, n3, n4, n5, n6, n7, n8, n9, n10, n11 = numbers" + ] + }, + { + "cell_type": "code", + "execution_count": 175, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "ename": "ValueError", + "evalue": "not enough values to unpack (expected 13, got 12)", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mn1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mn2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mn3\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mn4\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mn5\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mn6\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mn7\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mn8\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mn9\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mn10\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mn11\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mn12\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mn13\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnumbers\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mValueError\u001b[0m: not enough values to unpack (expected 13, got 12)" + ] + } + ], + "source": [ + "n1, n2, n3, n4, n5, n6, n7, n8, n9, n10, n11, n12, n13 = numbers" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "So, to make iterable unpacking useful, we prepend the `*` symbol to *one* of the variables on the left: That variable then becomes a `list` object holding the elements not captured by the other variables. We say that the excess elements from the iterable are **packed** into this variable.\n", + "\n", + "For example, let's get the `first` and `last` element of `numbers` and collect the rest in `middle`." + ] + }, + { + "cell_type": "code", + "execution_count": 176, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "first, *middle, last = numbers" + ] + }, + { + "cell_type": "code", + "execution_count": 177, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "7" + ] + }, + "execution_count": 177, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "first" + ] + }, + { + "cell_type": "code", + "execution_count": 178, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[11, 8, 5, 3, 12, 2, 6, 9, 10, 1]" + ] + }, + "execution_count": 178, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "middle" + ] + }, + { + "cell_type": "code", + "execution_count": 179, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "4" + ] + }, + "execution_count": 179, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "last" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "If we do not need the `middle` elements, we go with the underscore `_` convention and \"throw\" them away." + ] + }, + { + "cell_type": "code", + "execution_count": 180, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [], + "source": [ + "first, *_, last = numbers" + ] + }, + { + "cell_type": "code", + "execution_count": 181, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "7" + ] + }, + "execution_count": 181, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "first" + ] + }, + { + "cell_type": "code", + "execution_count": 182, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "4" + ] + }, + "execution_count": 182, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "last" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "We already used unpacking before this section without knowing it. Whenever we write a `for`-loop over the [zip()](https://docs.python.org/3/library/functions.html#zip) built-in, that generates a new `tuple` object in each iteration, which we unpack by listing several loop variables.\n", + "\n", + "So, the `name, position` acts like a left-hand side of an `=` statement and unpacks the `tuple` objects generated from \"zipping\" the `names` list and the `positions` tuple together." + ] + }, + { + "cell_type": "code", + "execution_count": 183, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "positions = (\"goalkeeper\", \"defender\", \"midfielder\", \"striker\", \"coach\")" + ] + }, + { + "cell_type": "code", + "execution_count": 184, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Karl is a goalkeeper\n", + "Achim is a defender\n", + "Xavier is a midfielder\n", + "Oliver is a striker\n", + "Eckardt is a coach\n" + ] + } + ], + "source": [ + "for name, position in zip(names, positions):\n", + " print(name, \"is a\", position)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Without unpacking, [zip()](https://docs.python.org/3/library/functions.html#zip) generates a series of `tuple` objects." + ] + }, + { + "cell_type": "code", + "execution_count": 185, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " ('Karl', 'goalkeeper')\n", + " ('Achim', 'defender')\n", + " ('Xavier', 'midfielder')\n", + " ('Oliver', 'striker')\n", + " ('Eckardt', 'coach')\n" + ] + } + ], + "source": [ + "for pair in zip(names, positions):\n", + " print(type(pair), pair, sep=\" \")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Unpacking also works for nested objects. Below, we wrap [zip()](https://docs.python.org/3/library/functions.html#zip) with the [enumerate()](https://docs.python.org/3/library/functions.html#enumerate) built-in to have an index variable `i` inside the `for`-loop. In each iteration, a `tuple` object consisting of `i` and another `tuple` object is created. The inner one then holds the `name` and `position`." + ] + }, + { + "cell_type": "code", + "execution_count": 186, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0 -> Karl is a goalkeeper\n", + "1 -> Achim is a defender\n", + "2 -> Xavier is a midfielder\n", + "3 -> Oliver is a striker\n", + "4 -> Eckardt is a coach\n" + ] + } + ], + "source": [ + "for i, (name, position) in enumerate(zip(names, positions)):\n", + " print(i, \"->\", name, \"is a\", position)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "#### Swapping Variables" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "A popular use case of unpacking is **swapping** two variables.\n", + "\n", + "Consider `a` and `b` below." + ] + }, + { + "cell_type": "code", + "execution_count": 187, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "a = 0\n", + "b = 1" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Without unpacking, we must use a temporary variable `temp` to swap `a` and `b`." + ] + }, + { + "cell_type": "code", + "execution_count": 188, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "temp = a\n", + "a = b\n", + "b = temp" + ] + }, + { + "cell_type": "code", + "execution_count": 189, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(1, 0)" + ] + }, + "execution_count": 189, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "a, b" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "With unpacking, the solution is more elegant, and also a bit faster as well. *All* expressions on the right-hand side are evaluated *before* any assignment takes place." + ] + }, + { + "cell_type": "code", + "execution_count": 190, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "a = 0\n", + "b = 1" + ] + }, + { + "cell_type": "code", + "execution_count": 191, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [], + "source": [ + "a, b = b, a" + ] + }, + { + "cell_type": "code", + "execution_count": 192, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(1, 0)" + ] + }, + "execution_count": 192, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "a, b" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "#### Example: [Fibonacci Numbers](https://en.wikipedia.org/wiki/Fibonacci_number) (revisited)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Unpacking allows us to rewrite the iterative `fibonacci()` function from [Chapter 4](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/04_iteration.ipynb#\"Hard-at-first-Glance\"-Example:-Fibonacci-Numbers-%28revisited%29) in a concise way, now also supporting *goose typing* with the [numbers](https://docs.python.org/3/library/numbers.html) module from the [standard library](https://docs.python.org/3/library/index.html)." + ] + }, + { + "cell_type": "code", + "execution_count": 193, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [], + "source": [ + "import numbers" + ] + }, + { + "cell_type": "code", + "execution_count": 194, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [], + "source": [ + "def fibonacci(i):\n", + " \"\"\"Calculate the ith Fibonacci number.\n", + "\n", + " Args:\n", + " i (int): index of the Fibonacci number to calculate\n", + "\n", + " Returns:\n", + " ith_fibonacci (int)\n", + "\n", + " Raises:\n", + " TypeError: if i is not an integer\n", + " ValueError: if i is not positive\n", + " \"\"\"\n", + " if not isinstance(i, numbers.Integral):\n", + " raise TypeError(\"i must be an integer\")\n", + " elif i < 0:\n", + " raise ValueError(\"i must be non-negative\")\n", + "\n", + " a, b = 0, 1\n", + "\n", + " for _ in range(i - 1):\n", + " a, b = b, a + b\n", + "\n", + " return b" + ] + }, + { + "cell_type": "code", + "execution_count": 195, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "144" + ] + }, + "execution_count": 195, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fibonacci(12)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "#### Function Definitions & Calls" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "The concepts of packing and unpacking are also helpful when writing and using functions.\n", + "\n", + "For example, let's look at the `product()` function below. Its implementation suggests that `args` must be a sequence type. Otherwise, it would not make sense to index into it with `[0]` or take a slice with `[1:]`. In line with the function's name, the `for`-loop multiplies all elements of the `args` sequence. So, what does the `*` do in the header line, and what is the exact data type of `args`?\n", + "\n", + "The `*` is again *not* an operator in this context but a special syntax that makes Python *pack* all *positional* arguments passed to `product()` into a single `tuple` object called `args`." + ] + }, + { + "cell_type": "code", + "execution_count": 196, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "def product(*args):\n", + " \"\"\"Multiply all arguments.\"\"\"\n", + " result = args[0]\n", + "\n", + " for arg in args[1:]:\n", + " result *= arg\n", + "\n", + " return result" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "So, we can pass an *arbitrary* (i.e., also none) number of *positional* arguments to `product()`.\n", + "\n", + "The product of just one number is the number itself." + ] + }, + { + "cell_type": "code", + "execution_count": 197, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "42" + ] + }, + "execution_count": 197, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "product(42)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Passing in several numbers works as expected." + ] + }, + { + "cell_type": "code", + "execution_count": 198, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "100" + ] + }, + "execution_count": 198, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "product(2, 5, 10)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "However, this implementation of `product()` needs *at least* one argument passed in due to the expression `args[0]` used internally. Otherwise, we see a *runtime* error, namely an `IndexError`. We emphasize that this error is *not* caused in the header line." + ] + }, + { + "cell_type": "code", + "execution_count": 199, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "ename": "IndexError", + "evalue": "tuple index out of range", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mIndexError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mproduct\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m\u001b[0m in \u001b[0;36mproduct\u001b[0;34m(*args)\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mproduct\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;34m\"\"\"Multiply all arguments.\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mresult\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0margs\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 4\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0marg\u001b[0m \u001b[0;32min\u001b[0m \u001b[0margs\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mIndexError\u001b[0m: tuple index out of range" + ] + } + ], + "source": [ + "product()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Another downside of this implementation is that we can easily generate *semantic* errors: For example, if we pass in an iterable object like the `one_hundred` list, *no* exception is raised. However, the return value is also not a numeric object as we expect. The reason for this is that during the function call, `args` becomes a `tuple` object holding *one* element, which is `one_hundred`, a `list` object. So, we created a nested structure by accident." + ] + }, + { + "cell_type": "code", + "execution_count": 200, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "one_hundred = [2, 5, 10]" + ] + }, + { + "cell_type": "code", + "execution_count": 201, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[2, 5, 10]" + ] + }, + "execution_count": 201, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "product(one_hundred)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "This error does not occur if we unpack `one_hundred` upon passing it as the argument." + ] + }, + { + "cell_type": "code", + "execution_count": 202, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "100" + ] + }, + "execution_count": 202, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "product(*one_hundred)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "That is the equivalent of writing out the following tedious expression. Yet, that does *not* scale for iterables with many elements in them." + ] + }, + { + "cell_type": "code", + "execution_count": 203, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "100" + ] + }, + "execution_count": 203, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "product(one_hundred[0], one_hundred[1], one_hundred[2])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "While we needed to unpack `one_hundred` above to avoid the semantic error, unpacking an argument in a function call may also be a convenience in general.\n", + "\n", + "For example, to print the elements of `one_hundred` in one line, we need to use a `for` statement, until now. With unpacking, we get away *without* a loop." + ] + }, + { + "cell_type": "code", + "execution_count": 204, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[2, 5, 10]\n" + ] + } + ], + "source": [ + "print(one_hundred) # prints the tuple; we do not want that" + ] + }, + { + "cell_type": "code", + "execution_count": 205, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2 5 10 " + ] + } + ], + "source": [ + "for number in one_hundred:\n", + " print(number, end=\" \")" + ] + }, + { + "cell_type": "code", + "execution_count": 206, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2 5 10\n" + ] + } + ], + "source": [ + "print(*one_hundred)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## The `namedtuple` Type" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Above, we proposed the idea that `tuple` objects are like \"immutable lists.\" Often, however, we use `tuple` objects to represent a **record** of related **fields**. Then, each element has a *semantic* meaning (i.e., a descriptive name).\n", + "\n", + "As an example, think of a spreadsheet with information on students in a course. Each row represents a record and holds all the data associated with an individual student. The columns (e.g., matriculation number, first name, last name) are the fields that may come as *different* data types (e.g., `int` for the matriculation number, `str` for the names).\n", + "\n", + "A simple way of modeling a single student is as a `tuple` object, for example, `(123456, \"John\", \"Doe\")`. A disadvantage of this approach is that we must remember the order and meaning of the elements/fields in the `tuple` object.\n", + "\n", + "An example from a different domain is the representation of $(x, y)$-points in the $x$-$y$-plane. Again, we could use a `tuple` object like `current_position` below to model the point $(4, 2)$." + ] + }, + { + "cell_type": "code", + "execution_count": 207, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "current_position = (4, 2)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "We implicitly assume that the first element represents the $x$ and the second the $y$ coordinate. While that follows intuitively from convention in math, we should at least add comments somewhere in the code to document this assumption.\n", + "\n", + "A better way is to create a *custom* data type. While that is covered in depth in Chapter 9, the [collections](https://docs.python.org/3/library/collections.html) module in the [standard library](https://docs.python.org/3/library/index.html) provides a [namedtuple()](https://docs.python.org/3/library/collections.html#collections.namedtuple) **factory function** that creates \"simple\" custom data types on top of the standard `tuple` type." + ] + }, + { + "cell_type": "code", + "execution_count": 208, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "from collections import namedtuple" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "[namedtuple()](https://docs.python.org/3/library/collections.html#collections.namedtuple) takes two arguments. The first argument is the name of the data type. That could be different from the variable `Point` we use to refer to the new type, but in most cases it is best to keep them in sync. The second argument is a sequence with the field names as `str` objects. The names' order corresponds to the one assumed in `current_position`." + ] + }, + { + "cell_type": "code", + "execution_count": 209, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [], + "source": [ + "Point = namedtuple(\"Point\", [\"x\", \"y\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "The `Point` object is a so-called **class**. That is what it means if an object is of type `type`. It can be used as a **factory** to create *new* `tuple`-like objects of type `Point`." + ] + }, + { + "cell_type": "code", + "execution_count": 210, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "140321614839000" + ] + }, + "execution_count": 210, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "id(Point) # classes are objects as well" + ] + }, + { + "cell_type": "code", + "execution_count": 211, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "type" + ] + }, + "execution_count": 211, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(Point)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "To create a `Point` object, we use the same *literal syntax* as for `current_position` above and prepend it with `Point`." + ] + }, + { + "cell_type": "code", + "execution_count": 212, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "current_position = Point(4, 2)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Now, `current_position` has a somewhat nicer representation. In particular, the coordinates are named `x` and `y`." + ] + }, + { + "cell_type": "code", + "execution_count": 213, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "Point(x=4, y=2)" + ] + }, + "execution_count": 213, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "current_position" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "It is *not* a `tuple` any more but an object of type `Point`." + ] + }, + { + "cell_type": "code", + "execution_count": 214, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "140322025282656" + ] + }, + "execution_count": 214, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "id(current_position)" + ] + }, + { + "cell_type": "code", + "execution_count": 215, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "__main__.Point" + ] + }, + "execution_count": 215, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(current_position)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "We use the dot operator `.` to access the defined attributes." + ] + }, + { + "cell_type": "code", + "execution_count": 216, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "4" + ] + }, + "execution_count": 216, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "current_position.x" + ] + }, + { + "cell_type": "code", + "execution_count": 217, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "2" + ] + }, + "execution_count": 217, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "current_position.y" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "As before, we get an `AttributeError` if we try to access an undefined attribute." + ] + }, + { + "cell_type": "code", + "execution_count": 218, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "ename": "AttributeError", + "evalue": "'Point' object has no attribute 'z'", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mcurrent_position\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mz\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mAttributeError\u001b[0m: 'Point' object has no attribute 'z'" + ] + } + ], + "source": [ + "current_position.z" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "`current_position` continues to work like a `tuple` object! That is why we can use `namedtuple` as a replacement for `tuple`. The underlying implementations exhibit the *same* computational efficiencies and memory usages.\n", + "\n", + "For example, we can index into or loop over `current_position` as it is still a sequence." + ] + }, + { + "cell_type": "code", + "execution_count": 219, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 219, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "isinstance(current_position, abc.Sequence)" + ] + }, + { + "cell_type": "code", + "execution_count": 220, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "4" + ] + }, + "execution_count": 220, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "current_position[0]" + ] + }, + { + "cell_type": "code", + "execution_count": 221, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "2" + ] + }, + "execution_count": 221, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "current_position[1]" + ] + }, + { + "cell_type": "code", + "execution_count": 222, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "4\n", + "2\n" + ] + } + ], + "source": [ + "for number in current_position:\n", + " print(number)" + ] + }, + { + "cell_type": "code", + "execution_count": 223, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2\n", + "4\n" + ] + } + ], + "source": [ + "for number in reversed(current_position):\n", + " print(number)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## The Map-Filter-Reduce Paradigm" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Whenever we process sequential data, most tasks can be classified into one of the three categories **map**, **filter**, or **reduce**. This paradigm has caught attention in recent years as it enables **[parallel computing](https://en.wikipedia.org/wiki/Parallel_computing)**, and this gets important when dealing with big amounts of data.\n", + "\n", + "Let's look at a simple example." + ] + }, + { + "cell_type": "code", + "execution_count": 224, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "### Mapping" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "**Mapping** refers to the idea of applying a transformation to every element in a sequence.\n", + "\n", + "For example, let's square each element in `numbers` and add `1` to it. In essence, we apply the transformation $y := x^2 + 1$ expressed as the `transform()` function below." + ] + }, + { + "cell_type": "code", + "execution_count": 225, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "def transform(element):\n", + " \"\"\"Map elements to their squares plus 1.\"\"\"\n", + " return (element ** 2) + 1" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "With the syntax we know so far, we revert to a `for`-loop that iteratively appends the transformed elements to the initially empty `transformed_numbers` list." + ] + }, + { + "cell_type": "code", + "execution_count": 226, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "transformed_numbers = []\n", + "\n", + "for old in numbers:\n", + " new = transform(old)\n", + " transformed_numbers.append(new)" + ] + }, + { + "cell_type": "code", + "execution_count": 227, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[50, 122, 65, 26, 10, 145, 5, 37, 82, 101, 2, 17]" + ] + }, + "execution_count": 227, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "transformed_numbers" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "As this kind of data processing is so common, Python provides the [map()](https://docs.python.org/3/library/functions.html#map) built-in. In its simplest usage form, it takes two arguments: A transformation function that takes one positional argument and an iterable.\n", + "\n", + "We call [map()](https://docs.python.org/3/library/functions.html#map) with the `transform()` function and the `numbers` list as the arguments and store the result in the variable `transformer` to inspect it." + ] + }, + { + "cell_type": "code", + "execution_count": 228, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "transformer = map(transform, numbers)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "We might expect to get back a materialized sequence (i.e., all elements exist in memory), and a `list` object would feel the most natural because of the type of the `numbers` argument. However, `transformer` is an object of type `map`." + ] + }, + { + "cell_type": "code", + "execution_count": 229, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 229, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "transformer" + ] + }, + { + "cell_type": "code", + "execution_count": 230, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "map" + ] + }, + "execution_count": 230, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(transformer)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Like `range` objects, `map` objects generate a series of objects \"on the fly\" (i.e., one by one), and we use the built-in [next()](https://docs.python.org/3/library/functions.html#next) function to obtain the next object in line. So, we should think of a `map` object as a \"rule\" stored in memory that only knows how to calculate the next object of possibly *infinitely* many.\n", + "\n", + "It is essential to understand that by creating a `map` object with the [map()](https://docs.python.org/3/library/functions.html#map) built-in, nothing happens in memory except the creation of the `map` object. In particular, no second `list` object derived from `numbers` is created. Also, we may view `range` objects as a special case of `map` objects: They are constrained to generating `int` objects only, and the *iterable* argument is replaced with *start*, *stop*, and *step* arguments." + ] + }, + { + "cell_type": "code", + "execution_count": 231, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "50" + ] + }, + "execution_count": 231, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "next(transformer)" + ] + }, + { + "cell_type": "code", + "execution_count": 232, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "122" + ] + }, + "execution_count": 232, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "next(transformer)" + ] + }, + { + "cell_type": "code", + "execution_count": 233, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "65" + ] + }, + "execution_count": 233, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "next(transformer)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "If we are sure that a `map` object generates a *finite* number of elements, we may materialize them into a `list` object with the [list()](https://docs.python.org/3/library/functions.html#func-list) built-in. In the example, this is the case as `transformer` is derived from a *finite* `list` object.\n", + "\n", + "In summary, instead of creating an empty list first and appending it in a `for`-loop as above, we write the following one-liner and obtain an equal `transformed_numbers` list." + ] + }, + { + "cell_type": "code", + "execution_count": 234, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "transformed_numbers = list(map(transform, numbers))" + ] + }, + { + "cell_type": "code", + "execution_count": 235, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[50, 122, 65, 26, 10, 145, 5, 37, 82, 101, 2, 17]" + ] + }, + "execution_count": 235, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "transformed_numbers" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "### Filtering" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "**Filtering** refers to the idea of creating a subset of a sequence with a **boolean filter** function that indicates if an element should be kept or not.\n", + "\n", + "In the example, let's only keep the even elements in `numbers`. The `is_even()` function implements that as a filter." + ] + }, + { + "cell_type": "code", + "execution_count": 236, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "def is_even(element):\n", + " \"\"\"Filter out odd numbers.\"\"\"\n", + " if element % 2 == 0:\n", + " return True\n", + " return False" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "As before, we must revert to a `for`-loop that appends the elements to be kept iteratively to an initially empty `even_numbers` list." + ] + }, + { + "cell_type": "code", + "execution_count": 237, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "even_numbers = []\n", + "\n", + "for number in transformed_numbers:\n", + " if is_even(number):\n", + " even_numbers.append(number)" + ] + }, + { + "cell_type": "code", + "execution_count": 238, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[50, 122, 26, 10, 82, 2]" + ] + }, + "execution_count": 238, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "even_numbers" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "As filtering is also a common task, we use the [filter()](https://docs.python.org/3/library/functions.html#filter) built-in that returns an object of type `filter` stored in the `evens` variable." + ] + }, + { + "cell_type": "code", + "execution_count": 239, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "evens = filter(is_even, transformed_numbers)" + ] + }, + { + "cell_type": "code", + "execution_count": 240, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 240, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "evens" + ] + }, + { + "cell_type": "code", + "execution_count": 241, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "filter" + ] + }, + "execution_count": 241, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(evens)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "`evens` works like `transformer` above, and we use the built-in [next()](https://docs.python.org/3/library/functions.html#next) function to obtain the even numbers one by one. So, the \"next\" element in line is simply the next even `int` object the `filter` object encounters." + ] + }, + { + "cell_type": "code", + "execution_count": 242, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[50, 122, 65, 26, 10, 145, 5, 37, 82, 101, 2, 17]" + ] + }, + "execution_count": 242, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "transformed_numbers # for quick reference" + ] + }, + { + "cell_type": "code", + "execution_count": 243, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "50" + ] + }, + "execution_count": 243, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "next(evens)" + ] + }, + { + "cell_type": "code", + "execution_count": 244, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "122" + ] + }, + "execution_count": 244, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "next(evens)" + ] + }, + { + "cell_type": "code", + "execution_count": 245, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "26" + ] + }, + "execution_count": 245, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "next(evens)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "As above, we must explicitly create a materialized `list` object with the [list()](https://docs.python.org/3/library/functions.html#func-list) built-in." + ] + }, + { + "cell_type": "code", + "execution_count": 246, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[50, 122, 26, 10, 82, 2]" + ] + }, + "execution_count": 246, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "list(filter(is_even, transformed_numbers))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "We may also chain mappings and filters based on the original `numbers` list." + ] + }, + { + "cell_type": "code", + "execution_count": 247, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[50, 122, 26, 10, 82, 2]" + ] + }, + "execution_count": 247, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "list(filter(is_even, map(transform, numbers)))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Using the [map()](https://docs.python.org/3/library/functions.html#map) and [filter()](https://docs.python.org/3/library/functions.html#filter) built-ins, we can quickly switch the order: Filter first and then transform the remaining elements. This variant equals the \"*A simple Filter*\" example from [Chapter 4](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/04_iteration.ipynb#Example:-A-simple-Filter). On the contrary, code with `for`-loops and `if` statements is more tedious to adapt. Additionally, `map` and `filter` objects are optimized at the C level and, therefore, a lot faster as well." + ] + }, + { + "cell_type": "code", + "execution_count": 248, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[65, 145, 5, 37, 101, 17]" + ] + }, + "execution_count": 248, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "list(map(transform, filter(is_even, numbers)))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "### Reducing" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Lastly, **reducing** sequential data means to summarize the elements into a single statistic.\n", + "\n", + "A simple example is the built-in [sum()](https://docs.python.org/3/library/functions.html#sum) function." + ] + }, + { + "cell_type": "code", + "execution_count": 249, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "370" + ] + }, + "execution_count": 249, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sum(map(transform, filter(is_even, numbers)))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Other straightforward examples are the built-in [min()](https://docs.python.org/3/library/functions.html#min) or [max()](https://docs.python.org/3/library/functions.html#max) functions." + ] + }, + { + "cell_type": "code", + "execution_count": 250, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "5" + ] + }, + "execution_count": 250, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "min(map(transform, filter(is_even, numbers)))" + ] + }, + { + "cell_type": "code", + "execution_count": 251, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "145" + ] + }, + "execution_count": 251, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "max(map(transform, filter(is_even, numbers)))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "[sum()](https://docs.python.org/3/library/functions.html#sum), [min()](https://docs.python.org/3/library/functions.html#min), and [max()](https://docs.python.org/3/library/functions.html#max) can be regarded as special cases.\n", + "\n", + "The generic way of reducing a sequence is to apply a function of *two* arguments on a rolling horizon: Its first argument is the reduction of the elements processed so far, and the second the next element to be reduced.\n", + "\n", + "For illustration, let's replicate [sum()](https://docs.python.org/3/library/functions.html#sum) as such a function, called `add()`. Its implementation only adds two numbers." + ] + }, + { + "cell_type": "code", + "execution_count": 252, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "def add(sum_so_far, next_number):\n", + " \"\"\"Reduce a sequence by addition.\"\"\"\n", + " return sum_so_far + next_number" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Further, we create a *new* `map` object derived from `numbers` ..." + ] + }, + { + "cell_type": "code", + "execution_count": 253, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "evens_transformed = map(transform, filter(is_even, numbers))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "... and loop over all *but* the first element it generates. That we capture separately as the initial `result` with the [next()](https://docs.python.org/3/library/functions.html#next) function. So, `map` objects must be *iterable* as we may loop over them.\n", + "\n", + "We know that `evens_transformed` generates *six* elements. That is why we see *five* growing `result` values resembling a [cumulative sum](http://mathworld.wolfram.com/CumulativeSum.html)." + ] + }, + { + "cell_type": "code", + "execution_count": 254, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "65 210 215 252 353 " + ] + } + ], + "source": [ + "result = next(evens_transformed) # first element is the initial value\n", + "\n", + "for number in evens_transformed: # iterate over the remaining elements\n", + " print(result, end=\" \") # line added for didactical purposes\n", + " result = add(result, number)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "The final `result` is the same `370` as above." + ] + }, + { + "cell_type": "code", + "execution_count": 255, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "370" + ] + }, + "execution_count": 255, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "result" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "The [reduce()](https://docs.python.org/3/library/functools.html#functools.reduce) function in the [functools](https://docs.python.org/3/library/functools.html) module in the [standard library](https://docs.python.org/3/library/index.html) provides more convenience replacing the `for`-loop. It takes two arguments in the same way as the [map()](https://docs.python.org/3/library/functions.html#map) and [filter()](https://docs.python.org/3/library/functions.html#filter) built-ins.\n", + "\n", + "[reduce()](https://docs.python.org/3/library/functools.html#functools.reduce) is **[eager](https://en.wikipedia.org/wiki/Eager_evaluation)** meaning that all computations implied by the contained `map` and `filter` \"rules\" are executed right away, and the code cell returns `370`. On the contrary, [map()](https://docs.python.org/3/library/functions.html#map) and [filter()](https://docs.python.org/3/library/functions.html#filter) create **[lazy](https://en.wikipedia.org/wiki/Lazy_evaluation)** `map` and `filter` objects, and we have to use the [next()](https://docs.python.org/3/library/functions.html#next) function to obtain the elements." + ] + }, + { + "cell_type": "code", + "execution_count": 256, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "from functools import reduce" + ] + }, + { + "cell_type": "code", + "execution_count": 257, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "370" + ] + }, + "execution_count": 257, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "reduce(add, map(transform, filter(is_even, numbers)))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "### Lambda Expressions" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "[map()](https://docs.python.org/3/library/functions.html#map), [filter()](https://docs.python.org/3/library/functions.html#filter), and [reduce()](https://docs.python.org/3/library/functools.html#functools.reduce) take a `function` object as their first argument, and we defined `transform()`, `is_even()`, and `add()` to be used precisely for that.\n", + "\n", + "Often, such functions are used *only once* in a program. However, the primary purpose of functions is to *re-use* them. In such cases, it makes more sense to define them \"anonymously\" right at the position where the first argument goes.\n", + "\n", + "As mentioned in [Chapter 2](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/02_functions.ipynb#Anonymous-Functions), Python provides `lambda` expressions to create `function` objects *without* a variable pointing to them.\n", + "\n", + "So, the above `add()` function could be rewritten as a `lambda` expression like so ..." + ] + }, + { + "cell_type": "code", + "execution_count": 258, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(sum_so_far, next_number)>" + ] + }, + "execution_count": 258, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "lambda sum_so_far, next_number: sum_so_far + next_number" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "... or even shorter." + ] + }, + { + "cell_type": "code", + "execution_count": 259, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(x, y)>" + ] + }, + "execution_count": 259, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "lambda x, y: x + y" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "With the new concepts in this section, we can rewrite the entire example in just a few lines of code *without* any `for`, `if`, and `def` statements. The resulting code is concise, easy to read, quick to modify, and even faster in execution. Most importantly, it is optimized to handle big amounts of data as *no* temporary `list` objects are materialized in memory." + ] + }, + { + "cell_type": "code", + "execution_count": 260, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "370" + ] + }, + "execution_count": 260, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]\n", + "evens = filter(lambda x: x % 2 == 0, numbers)\n", + "transformed = map(lambda x: (x ** 2) + 1, evens)\n", + "sum(transformed)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "If `numbers` comes as a sorted sequence of whole numbers, we may use the [range()](https://docs.python.org/3/library/functions.html#func-range) built-in and get away *without any* `list` object in memory at all!" + ] + }, + { + "cell_type": "code", + "execution_count": 261, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "370" + ] + }, + "execution_count": 261, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "numbers = range(1, 13)\n", + "evens = filter(lambda x: x % 2 == 0, numbers)\n", + "transformed = map(lambda x: (x ** 2) + 1, evens)\n", + "sum(transformed)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "To additionally save the temporary variables, `numbers`, `evens`, and `transformed`, we write the entire computation as *one* expression." + ] + }, + { + "cell_type": "code", + "execution_count": 262, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "370" + ] + }, + "execution_count": 262, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sum(map(lambda x: (x ** 2) + 1, filter(lambda x: x % 2 == 0, range(1, 13))))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "PythonTutor visualizes the differences in the number of computational steps and memory usage:\n", + "- [Version 1](http://pythontutor.com/visualize.html#code=def%20is_even%28element%29%3A%0A%20%20%20%20if%20element%20%25%202%20%3D%3D%200%3A%0A%20%20%20%20%20%20%20%20return%20True%0A%20%20%20%20return%20False%0A%0Adef%20transform%28element%29%3A%0A%20%20%20%20return%20%28element%20**%202%29%20%2B%201%0A%0Anumbers%20%3D%20list%28range%281,%2013%29%29%0A%0Aevens%20%3D%20%5B%5D%0Afor%20number%20in%20numbers%3A%0A%20%20%20%20if%20is_even%28number%29%3A%0A%20%20%20%20%20%20%20%20evens.append%28number%29%0A%0Atransformed%20%3D%20%5B%5D%0Afor%20number%20in%20evens%3A%0A%20%20%20%20transformed.append%28transform%28number%29%29%0A%0Aresult%20%3D%20sum%28transformed%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false): With `for`-loops, `if` statements, and named functions -> **116** steps\n", + "- [Version 2](http://pythontutor.com/visualize.html#code=numbers%20%3D%20range%281,%2013%29%0Aevens%20%3D%20filter%28lambda%20x%3A%20x%20%25%202%20%3D%3D%200,%20numbers%29%0Atransformed%20%3D%20map%28lambda%20x%3A%20%28x%20**%202%29%20%2B%201,%20evens%29%0Aresult%20%3D%20sum%28transformed%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false): With named `map` and `filter` objects -> **58** steps\n", + "- [Version 3](http://pythontutor.com/visualize.html#code=result%20%3D%20sum%28map%28lambda%20x%3A%20%28x%20**%202%29%20%2B%201,%20filter%28lambda%20x%3A%20x%20%25%202%20%3D%3D%200,%20range%281,%2013%29%29%29%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false): Everything in *one* expression -> **55** steps\n", + "\n", + "Versions 2 and 3 are the same, except for the three additional steps required to create the temporary variables. The *major* downside of Version 1 is that, in the worst case, it may need *three times* the memory as compared to the other two versions!\n", + "\n", + "An experienced Pythonista would probably go with Version 2 in a production system to keep the code readable and maintainable." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "### List Comprehensions" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "For [map()](https://docs.python.org/3/library/functions.html#map) and [filter()](https://docs.python.org/3/library/functions.html#filter), Python provides a nice syntax appealing to people who like mathematics.\n", + "\n", + "Consider again the \"*A simple Filter*\" example from [Chapter 4](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/04_iteration.ipynb#Example:-A-simple-Filter), written with combined `for` and `if` statements. So, the mapping and filtering steps happen simultaneously." + ] + }, + { + "cell_type": "code", + "execution_count": 263, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]" + ] + }, + { + "cell_type": "code", + "execution_count": 264, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [], + "source": [ + "evens_transformed = []\n", + "\n", + "for number in numbers:\n", + " if number % 2 == 0:\n", + " evens_transformed.append((number ** 2) + 1)" + ] + }, + { + "cell_type": "code", + "execution_count": 265, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[65, 145, 5, 37, 101, 17]" + ] + }, + "execution_count": 265, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "evens_transformed" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "**List comprehensions**, or **list-comps** for short, are *expressions* to derive *new* `list` objects out of *existing* ones (cf., [reference](https://docs.python.org/3/reference/expressions.html#displays-for-lists-sets-and-dictionaries)). A single *expression* like below can replace the compound `for` *statement* from above." + ] + }, + { + "cell_type": "code", + "execution_count": 266, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[65, 145, 5, 37, 101, 17]" + ] + }, + "execution_count": 266, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "[(n ** 2) + 1 for n in numbers if n % 2 == 0]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "A list comprehension may be used in place of any `list` object.\n", + "\n", + "For example, let's add up all the elements with [sum()](https://docs.python.org/3/library/functions.html#sum). The code below *materializes* all elements in memory *before* summing them up. So, this code might cause a `MemoryError` when executed with a bigger `numbers` list. [PythonTutor](http://pythontutor.com/visualize.html#code=numbers%20%3D%20range%281,%2013%29%0Aresult%20%3D%20sum%28%5B%28n%20**%202%29%20%2B%201%20for%20n%20in%20numbers%20if%20n%20%25%202%20%3D%3D%200%5D%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) shows how a `list` object exists in memory at step 17 and then \"gets lost\" right after. " + ] + }, + { + "cell_type": "code", + "execution_count": 267, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "370" + ] + }, + "execution_count": 267, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sum([(n ** 2) + 1 for n in numbers if n % 2 == 0])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "#### Example: Nested Lists" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "List comprehensions may come with several `for`'s and `if`'s.\n", + "\n", + "The cell below creates a `list` object that contains other `list` objects with numbers in them. The starting number in each inner `list` object is offset by `1`." + ] + }, + { + "cell_type": "code", + "execution_count": 268, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [], + "source": [ + "nested_numbers = [list(range(x, y + 1)) for x, y in zip([1, 2, 3], [7, 8, 9])]" + ] + }, + { + "cell_type": "code", + "execution_count": 269, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[[1, 2, 3, 4, 5, 6, 7], [2, 3, 4, 5, 6, 7, 8], [3, 4, 5, 6, 7, 8, 9]]" + ] + }, + "execution_count": 269, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nested_numbers" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "To do something meaningful with the numbers, we have to get rid of the inner layer of `list` objects and flatten the data.\n", + "\n", + "Without list comprehensions, we would probably write two nested `for`-loops." + ] + }, + { + "cell_type": "code", + "execution_count": 270, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [], + "source": [ + "flat_numbers = []\n", + "\n", + "for inner_numbers in nested_numbers:\n", + " for number in inner_numbers:\n", + " flat_numbers.append(number)" + ] + }, + { + "cell_type": "code", + "execution_count": 271, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[1, 2, 3, 4, 5, 6, 7, 2, 3, 4, 5, 6, 7, 8, 3, 4, 5, 6, 7, 8, 9]" + ] + }, + "execution_count": 271, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "flat_numbers" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "That translates into a list comprehension like below. The order of the `for`'s may be confusing at first but is the *same* as writing out the nested `for`-loops." + ] + }, + { + "cell_type": "code", + "execution_count": 272, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[1, 2, 3, 4, 5, 6, 7, 2, 3, 4, 5, 6, 7, 8, 3, 4, 5, 6, 7, 8, 9]" + ] + }, + "execution_count": 272, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "[number for inner_numbers in nested_numbers for number in inner_numbers]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Now, we may use the `list` object resulting from the list comprehension in any context we want.\n", + "\n", + "As an example, we add up the flattened numbers with [sum()](https://docs.python.org/3/library/functions.html#sum). The same caveat holds as before: The `list` object passed into [sum()](https://docs.python.org/3/library/functions.html#sum) is *materialized* before the sum is calculated!" + ] + }, + { + "cell_type": "code", + "execution_count": 273, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "105" + ] + }, + "execution_count": 273, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sum([number for inner_numbers in nested_numbers for number in inner_numbers])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "In this particular example, however, we can exploit the fact that any sum of numbers can be expressed as the sum of sums of mutually exclusive and collectively exhaustive subsets of these numbers and get away with just *one* `for` in the list comprehension." + ] + }, + { + "cell_type": "code", + "execution_count": 274, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "105" + ] + }, + "execution_count": 274, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sum([sum(inner_numbers) for inner_numbers in nested_numbers])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "#### Example: Cartesian Products" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "A popular use case of nested list comprehensions is applying a transformation to each $2$-tuple of the [Cartesian product](https://en.wikipedia.org/wiki/Cartesian_product) of two iterables.\n", + "\n", + "For example, let's add `1` to each quotient obtained by taking the numerator from `[10, 20, 30]` and the denominator from `[40, 50, 60]`, and then find the product of all quotients. The table below visualizes the calculations: The result is the product of *nine* entries.\n", + "\n", + "||**10**|**20**|**30**|\n", + "|-|-|-|-|\n", + "|**40**|1.25|1.50|1.75|\n", + "|**50**|1.20|1.40|1.60|\n", + "|**60**|1.17|1.33|1.50|\n", + "\n", + "To express that in Python, we start by creating two `list` objects, `first` and `second`." + ] + }, + { + "cell_type": "code", + "execution_count": 275, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [], + "source": [ + "first = [10, 20, 30]\n", + "second = [40, 50, 60]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "For a Cartesian product, we loop over *all* possible $2$-tuples where one element is drawn from `first` and the other from `second`. That is equivalent to two nested `for`-loops." + ] + }, + { + "cell_type": "code", + "execution_count": 276, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[1.25, 1.2, 1.1666666666666667, 1.5, 1.4, 1.3333333333333333, 1.75, 1.6, 1.5]" + ] + }, + "execution_count": 276, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "cartesian_product = []\n", + "\n", + "for numerator in first:\n", + " for denominator in second:\n", + " quotient = numerator / denominator\n", + " cartesian_product.append(quotient + 1)\n", + "\n", + "cartesian_product" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "We translate the two `for`-loops into one list comprehensions with two `for`'s in it and use `x` and `y` as shorter variable names." + ] + }, + { + "cell_type": "code", + "execution_count": 277, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[1.25, 1.2, 1.1666666666666667, 1.5, 1.4, 1.3333333333333333, 1.75, 1.6, 1.5]" + ] + }, + "execution_count": 277, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "[(x / y) + 1 for x in first for y in second]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "The order of the `for`'s is *important*: The list comprehension above divides numbers from `first` by numbers from `second`, whereas the list comprehension below does the opposite." + ] + }, + { + "cell_type": "code", + "execution_count": 278, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[5.0, 3.0, 2.333333333333333, 6.0, 3.5, 2.666666666666667, 7.0, 4.0, 3.0]" + ] + }, + "execution_count": 278, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "[(x / y) + 1 for x in second for y in first]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "To find the overall product, we *unpack* the first list comprehension right into the `product()` function from the \"*Packing & Unpacking*\" section above." + ] + }, + { + "cell_type": "code", + "execution_count": 279, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "20.58" + ] + }, + "execution_count": 279, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "product(*[(x / y) + 1 for x in first for y in second])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Alternatively, we use a `lambda` expression with the [reduce()](https://docs.python.org/3/library/functools.html#functools.reduce) function from the [functools](https://docs.python.org/3/library/functools.html) module." + ] + }, + { + "cell_type": "code", + "execution_count": 280, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "20.58" + ] + }, + "execution_count": 280, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "reduce(lambda x, y: x * y, [(x / y) + 1 for x in first for y in second])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "While this example is stylized, Cartesian products are hidden in many applications, and it shows how the various language features introduced in this chapter can be seamlessly combined to process sequential data." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "### Generator Expressions" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Pythonistas would forgo materialized `list` objects, and, thus, also list comprehensions, all together, and use a more memory-efficient approach with **[generator expressions](https://docs.python.org/3/reference/expressions.html#generator-expressions)**, or **genexps** for short. Syntactically, they work like list comprehensions except that parentheses replace the brackets.\n", + "\n", + "Let's go back to the original example in this section and find the transformation $y := x^2 + 1$ of all even elements in `numbers`." + ] + }, + { + "cell_type": "code", + "execution_count": 281, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "To filter and transform `numbers`, we wrote this list comprehension above ..." + ] + }, + { + "cell_type": "code", + "execution_count": 282, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[65, 145, 5, 37, 101, 17]" + ] + }, + "execution_count": 282, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "[(n ** 2) + 1 for n in numbers if n % 2 == 0]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "... that now becomes a generator expression." + ] + }, + { + "cell_type": "code", + "execution_count": 283, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + " at 0x7f9f447618b8>" + ] + }, + "execution_count": 283, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "((n ** 2) + 1 for n in numbers if n % 2 == 0)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "We can think of it as yet another \"rule\" in memory that knows how to generate the individual objects in a series one by one. Whereas a list comprehension materializes its elements in memory *when* it is evaluated, the opposite holds for generator expressions, and *no* object is created in memory except the \"rule\" itself. Because of this behavior, we describe generator expressions as *lazy* and list comprehensions as *eager*.\n", + "\n", + "So, to materialize all elements specified by a generator expression, we revert to the [list()](https://docs.python.org/3/library/functions.html#func-list) built-in." + ] + }, + { + "cell_type": "code", + "execution_count": 284, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[65, 145, 5, 37, 101, 17]" + ] + }, + "execution_count": 284, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "list(((n ** 2) + 1 for n in numbers if n % 2 == 0))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Whenever a generator expression is the only argument to a function, we may leave out the parentheses." + ] + }, + { + "cell_type": "code", + "execution_count": 285, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[65, 145, 5, 37, 101, 17]" + ] + }, + "execution_count": 285, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "list((n ** 2) + 1 for n in numbers if n % 2 == 0)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "A common use case is to reduce the elements into a single object instead, for example, by adding them up with [sum()](https://docs.python.org/3/library/functions.html#sum). [PythonTutor](http://pythontutor.com/visualize.html#code=numbers%20%3D%20range%281,%2013%29%0Asum_with_list%20%3D%20sum%28%5B%28n%20**%202%29%20%2B%201%20for%20n%20in%20numbers%20if%20n%20%25%202%20%3D%3D%200%5D%29%0Asum_with_gen%20%3D%20sum%28%28n%20**%202%29%20%2B%201%20for%20n%20in%20numbers%20if%20n%20%25%202%20%3D%3D%200%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) shows how the code cell below does *not* create a temporary `list` object in memory, whereas a list comprehension would (cf., step 17)." + ] + }, + { + "cell_type": "code", + "execution_count": 286, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "370" + ] + }, + "execution_count": 286, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sum((n ** 2) + 1 for n in numbers if n % 2 == 0)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Let's assign the object returned from a generator expression to a variable and inspect it." + ] + }, + { + "cell_type": "code", + "execution_count": 287, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "gen = ((n ** 2) + 1 for n in numbers if n % 2 == 0)" + ] + }, + { + "cell_type": "code", + "execution_count": 288, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + " at 0x7f9f44761d68>" + ] + }, + "execution_count": 288, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "gen" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Unsurprisingly, generator expressions create objects of type `generator`." + ] + }, + { + "cell_type": "code", + "execution_count": 289, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "generator" + ] + }, + "execution_count": 289, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(gen)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "With the [next()](https://docs.python.org/3/library/functions.html#next) function, we can retrieve the generated elements one by one." + ] + }, + { + "cell_type": "code", + "execution_count": 290, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "65" + ] + }, + "execution_count": 290, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "next(gen)" + ] + }, + { + "cell_type": "code", + "execution_count": 291, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "145" + ] + }, + "execution_count": 291, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "next(gen)" + ] + }, + { + "cell_type": "code", + "execution_count": 292, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "5" + ] + }, + "execution_count": 292, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "next(gen)" + ] + }, + { + "cell_type": "code", + "execution_count": 293, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "37" + ] + }, + "execution_count": 293, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "next(gen)" + ] + }, + { + "cell_type": "code", + "execution_count": 294, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "101" + ] + }, + "execution_count": 294, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "next(gen)" + ] + }, + { + "cell_type": "code", + "execution_count": 295, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "17" + ] + }, + "execution_count": 295, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "next(gen)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Once a `generator` object runs out of elements, it raises a `StopIteration` exception. We say that the `generator` object is **exhausted**, and to loop over its elements again, we must create a *new* one." + ] + }, + { + "cell_type": "code", + "execution_count": 296, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "ename": "StopIteration", + "evalue": "", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mStopIteration\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mnext\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgen\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mStopIteration\u001b[0m: " + ] + } + ], + "source": [ + "next(gen)" + ] + }, + { + "cell_type": "code", + "execution_count": 297, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "ename": "StopIteration", + "evalue": "", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mStopIteration\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mnext\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgen\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mStopIteration\u001b[0m: " + ] + } + ], + "source": [ + "next(gen)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Calling the [next()](https://docs.python.org/3/library/functions.html#next) function repeatedly with the *same* `generator` object as the argument is essentially what a `for`-loop automates for us. So, `generator` objects are *iterable*." + ] + }, + { + "cell_type": "code", + "execution_count": 298, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "65 145 5 37 101 17 " + ] + } + ], + "source": [ + "for number in ((n ** 2) + 1 for n in numbers if n % 2 == 0):\n", + " print(number, end=\" \")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "#### Example: Nested Lists (revisited)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "If we are only interested in a *reduction* of `nested_numbers` into a single statistic, as the overall sum in the \"*Nested Lists*\" example, we should replace lists or list comprehensions with generator expressions wherever possible.\n", + "\n", + "The result is the *same*, but no intermediate lists are materialized! That makes our code scale to larger amounts of data and uses the available hardware more efficiently.\n", + "\n", + "Let's adapt the example but keep `nested_numbers` unchanged for now." + ] + }, + { + "cell_type": "code", + "execution_count": 299, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [], + "source": [ + "nested_numbers = [list(range(x, y + 1)) for x, y in zip([1, 2, 3], [7, 8, 9])]" + ] + }, + { + "cell_type": "code", + "execution_count": 300, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[[1, 2, 3, 4, 5, 6, 7], [2, 3, 4, 5, 6, 7, 8], [3, 4, 5, 6, 7, 8, 9]]" + ] + }, + "execution_count": 300, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nested_numbers" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "We leave out the brackets and keep everything else as-is: The argument to [sum()](https://docs.python.org/3/library/functions.html#sum), a list comprehension in the initial implementation above, becomes a generator expression." + ] + }, + { + "cell_type": "code", + "execution_count": 301, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "105" + ] + }, + "execution_count": 301, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sum(number for inner_numbers in nested_numbers for number in inner_numbers)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "That also holds for the alternative formulation as a sum of sums." + ] + }, + { + "cell_type": "code", + "execution_count": 302, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "105" + ] + }, + "execution_count": 302, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sum(sum(inner_numbers) for inner_numbers in nested_numbers)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Because `nested_numbers` has an internal structure, we can make it **memoryless** by expressing it as a generator expression derived from `range` objects. [PythonTutor](http://pythontutor.com/visualize.html#code=nested_numbers%20%3D%20%28%28range%28x,%20y%20%2B%201%29%29%20for%20x,%20y%20in%20zip%28range%281,%204%29,%20range%287,%2010%29%29%29%0Aresult%20%3D%20sum%28number%20for%20inner_numbers%20in%20nested_numbers%20for%20number%20in%20inner_numbers%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) confirms that no `list` object materializes at any point in time." + ] + }, + { + "cell_type": "code", + "execution_count": 303, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [], + "source": [ + "nested_numbers = ((range(x, y + 1)) for x, y in zip(range(1, 4), range(7, 10)))" + ] + }, + { + "cell_type": "code", + "execution_count": 304, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + " at 0x7f9f44761e58>" + ] + }, + "execution_count": 304, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nested_numbers" + ] + }, + { + "cell_type": "code", + "execution_count": 305, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "105" + ] + }, + "execution_count": 305, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sum(number for inner_numbers in nested_numbers for number in inner_numbers)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "We must be careful when assigning a `generator` object to a variable: If we use `nested_numbers` again, for example, in the alternative formulation below, [sum()](https://docs.python.org/3/library/functions.html#sum) returns `0` because `nested_numbers` is exhausted after executing the previous code cell. [PythonTutor](http://pythontutor.com/visualize.html#code=nested_numbers%20%3D%20%28%28range%28x,%20y%20%2B%201%29%29%20for%20x,%20y%20in%20zip%28range%281,%204%29,%20range%287,%2010%29%29%29%0Aresult%20%3D%20sum%28number%20for%20inner_numbers%20in%20nested_numbers%20for%20number%20in%20inner_numbers%29%0Ano_result%20%3D%20sum%28sum%28inner_numbers%29%20for%20inner_numbers%20in%20nested_numbers%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) also shows that." + ] + }, + { + "cell_type": "code", + "execution_count": 306, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0" + ] + }, + "execution_count": 306, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sum(sum(inner_numbers) for inner_numbers in nested_numbers)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "#### Example: Cartesian Products (revisited)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Let's also rewrite the \"*Cartesian Products*\" example from above with generator expressions.\n", + "\n", + "As a first optimization, we replace the materialized `list` objects, `first` and `second`, with memoryless `range` objects." + ] + }, + { + "cell_type": "code", + "execution_count": 307, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [], + "source": [ + "first = range(10, 31, 10) # = [10, 20, 30]\n", + "second = range(40, 61, 10) # = [40, 50, 60]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Now, the first of the two alternatives may be more appealing to many readers. In general, many practitioners seem to dislike `lambda` expressions.\n", + "\n", + "The code cell below *unpacks* the elements produced by `((x / y) + 1 for x in first for y in second)` into the `product()` function from the \"*Packing & Unpacking*\" section above. However, inside `product()`, the elements are *packed* into `args`, a materialized `tuple` object. So, all the memory efficiency gained with the generator expression is voided! [PythonTutor](http://pythontutor.com/visualize.html#code=def%20product%28*args%29%3A%0A%20%20%20%20result%20%3D%20args%5B0%5D%0A%20%20%20%20for%20arg%20in%20args%5B1%3A%5D%3A%0A%20%20%20%20%20%20%20%20result%20*%3D%20arg%0A%20%20%20%20return%20result%0A%0Afirst%20%3D%20range%2810,%2031,%2010%29%0Asecond%20%3D%20range%2840,%2061,%2010%29%0A%0Aresult%20%3D%20product%28*%28%28x%20/%20y%29%20%2B%201%20for%20x%20in%20first%20for%20y%20in%20second%29%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) shows how a `tuple` object exists in steps 38-58." + ] + }, + { + "cell_type": "code", + "execution_count": 308, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "20.58" + ] + }, + "execution_count": 308, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "product(*((x / y) + 1 for x in first for y in second))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "On the contrary, the solution with the [reduce()](https://docs.python.org/3/library/functools.html#functools.reduce) function from the [functools](https://docs.python.org/3/library/functools.html) module and the `lambda` expression works *without* all elements materialized at the same time, and [PythonTutor](http://pythontutor.com/visualize.html#code=from%20functools%20import%20reduce%0A%0Afirst%20%3D%20range%2810,%2031,%2010%29%0Asecond%20%3D%20range%2840,%2061,%2010%29%0A%0Aresult%20%3D%20reduce%28%0A%20%20%20%20lambda%20x,%20y%3A%20x%20*%20y,%0A%20%20%20%20%28%28x%20/%20y%29%20%2B%201%20for%20x%20in%20first%20for%20y%20in%20second%29%0A%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) confirms that. So, only the second alternative is truly memory-efficient." + ] + }, + { + "cell_type": "code", + "execution_count": 309, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "20.58" + ] + }, + "execution_count": 309, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "reduce(lambda x, y: x * y, ((x / y) + 1 for x in first for y in second))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "In summary, we learn from this example that unpacking generator expressions may be a *bad* idea." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "### Tuple Comprehensions" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "There is no syntax to derive *new* `tuple` objects out of existing ones. However, we can mimic such a construct by combining the [tuple()](https://docs.python.org/3/library/functions.html#func-tuple) built-in with a generator expression.\n", + "\n", + "So, to convert the list comprehension `[(n ** 2) + 1 for n in numbers if n % 2 == 0]` from above into a \"tuple comprehension,\" we write the following." + ] + }, + { + "cell_type": "code", + "execution_count": 310, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(65, 145, 5, 37, 101, 17)" + ] + }, + "execution_count": 310, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tuple((n ** 2) + 1 for n in numbers if n % 2 == 0)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "### Boolean Reducers" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Besides [min()](https://docs.python.org/3/library/functions.html#min), [max()](https://docs.python.org/3/library/functions.html#max), and [sum()](https://docs.python.org/3/library/functions.html#sum), Python provides two boolean reduce functions: [all()](https://docs.python.org/3/library/functions.html#all) and [any()](https://docs.python.org/3/library/functions.html#any).\n", + "\n", + "Let's look at straightforward examples involving `numbers` again." + ] + }, + { + "cell_type": "code", + "execution_count": 311, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "[all()](https://docs.python.org/3/library/functions.html#all) takes an *iterable* argument and returns `True` if *all* elements are *truthy*.\n", + "\n", + "For example, let's check if the square of each element in `numbers` is below `100` or `150`, respectively. We express the computation with a generator expression passed as the only argument to [all()](https://docs.python.org/3/library/functions.html#all)." + ] + }, + { + "cell_type": "code", + "execution_count": 312, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 312, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "all(x ** 2 < 100 for x in numbers)" + ] + }, + { + "cell_type": "code", + "execution_count": 313, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 313, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "all(x ** 2 < 150 for x in numbers)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "[all()](https://docs.python.org/3/library/functions.html#all) can be viewed as syntactic sugar replacing a `for`-loop: Internally, [all()](https://docs.python.org/3/library/functions.html#all) implements the *short-circuiting* strategy from [Chapter 3](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/03_conditionals.ipynb#Short-Circuiting), and we mimic that by testing for the *opposite* condition in the `if` statement and leaving the `for`-loop early with the `break` statement. In the worst case, if `threshold` were, for example, `150`, we would loop over *all* elements in the *iterable*, which must be *finite* for the code to work. So, [all()](https://docs.python.org/3/library/functions.html#all) is a *linear search* in disguise." + ] + }, + { + "cell_type": "code", + "execution_count": 314, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 314, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "threshold = 100\n", + "\n", + "for number in numbers:\n", + " if number ** 2 >= threshold: # = the opposite of what we are checking for\n", + " all_below_threshold = False\n", + " break\n", + "else:\n", + " all_below_threshold = True\n", + "\n", + "all_below_threshold" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "The documentation of [all()](https://docs.python.org/3/library/functions.html#all) shows in another way what it does with code: By placing a `return` statement inside the `for`-loop of a function's body, iteration is stopped prematurely once an element does *not* meet the condition. That is the familiar *early exit* pattern at work." + ] + }, + { + "cell_type": "code", + "execution_count": 315, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [], + "source": [ + "def all_alt(iterable):\n", + " \"\"\"Alternative implementation of the built-in all() function.\"\"\"\n", + " for element in iterable:\n", + " if not element: # = the opposite of what we are checking for\n", + " return False\n", + " return True" + ] + }, + { + "cell_type": "code", + "execution_count": 316, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 316, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "all_alt(x ** 2 < 100 for x in numbers)" + ] + }, + { + "cell_type": "code", + "execution_count": 317, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 317, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "all_alt(x ** 2 < 150 for x in numbers)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Similarly, [any()](https://docs.python.org/3/library/functions.html#any) checks if *at least* one element in the *iterable* argument is *truthy*.\n", + "\n", + "To continue the example, let's check if the square of *any* element in `numbers` is above `100` or `150`, respectively." + ] + }, + { + "cell_type": "code", + "execution_count": 318, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 318, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "any(x ** 2 > 100 for x in numbers)" + ] + }, + { + "cell_type": "code", + "execution_count": 319, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 319, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "any(x ** 2 > 150 for x in numbers)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Expressed with a `for`-loop, the implementation below reveals that [any()](https://docs.python.org/3/library/functions.html#any) follows the *short-circuiting* strategy as well. Here, we do *not* need to check for the opposite condition." + ] + }, + { + "cell_type": "code", + "execution_count": 320, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 320, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "threshold = 100\n", + "\n", + "for number in numbers:\n", + " if number ** 2 > threshold:\n", + " any_above_threshold = True\n", + " break\n", + "else:\n", + " any_above_threshold = False\n", + "\n", + "any_above_threshold" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "The alternative formulation in the documentation of [any()](https://docs.python.org/3/library/functions.html#any) is straightforward." + ] + }, + { + "cell_type": "code", + "execution_count": 321, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [], + "source": [ + "def any_alt(iterable):\n", + " \"\"\"Alternative implementation of the built-in any() function.\"\"\"\n", + " for element in iterable:\n", + " if element:\n", + " return True\n", + " return False" + ] + }, + { + "cell_type": "code", + "execution_count": 322, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 322, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "any_alt(x ** 2 > 100 for x in numbers)" + ] + }, + { + "cell_type": "code", + "execution_count": 323, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 323, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "any_alt(x ** 2 > 150 for x in numbers)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "### Example: Averaging Even Numbers (revisited)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "With the new concepts in this chapter, let's rewrite the book's introductory \"*Averaging Even Numbers*\" example in [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements.ipynb#Example:-Averaging-Even-Numbers) such that it efficiently handles a large sequence of numbers.\n", + "\n", + "We assume the `average_evens()` function below is called with a *finite* and *iterable* object, which generates a \"stream\" of numeric objects that can be cast as `int` objects because the idea of even and odd numbers only makes sense in the context of whole numbers.\n", + "\n", + "The generator expression `(int(n) for n in numbers)` implements the type casting, and when it is evaluated, *nothing* happens except that a `generator` object is stored in `integers`. Then, with the [reduce()](https://docs.python.org/3/library/functools.html#functools.reduce) function from the [functools](https://docs.python.org/3/library/functools.html) module, we *simultaneously* add up *and* count the even numbers produced by the inner generator expression `((n, 1) for n in integers if n % 2 == 0)`. That results in a `generator` object producing `tuple` objects consisting of the next *even* number in line and `1`. Two such `tuple` objects are then iteratively passed to the `lambda` expression as the `x` and `y` arguments. `x` represents the total and the count of the even numbers processed so far, while `y`'s first element, `y[0]`, is the next even number to be added to the running total. The result of the [reduce()](https://docs.python.org/3/library/functools.html#functools.reduce) function is again a `tuple` object, namely the final `total` and `count`. Lastly, we calculate the simple average.\n", + "\n", + "In summary, the implementation of `average_evens()` does *not* keep materialized `list` objects internally like its predecessors from [Chapter 2](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/02_functions.ipynb), but processes the elements of the `numbers` argument on a one-by-one basis." + ] + }, + { + "cell_type": "code", + "execution_count": 324, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [], + "source": [ + "def average_evens(numbers):\n", + " \"\"\"Calculate the average of all even integers.\n", + "\n", + " Args:\n", + " numbers (iterable): a finite stream of numbers;\n", + " may be integers or floats; floats are truncated\n", + "\n", + " Returns:\n", + " float: average\n", + " \"\"\"\n", + " integers = (int(n) for n in numbers)\n", + " total, count = reduce(\n", + " lambda x, y: (x[0] + y[0], x[1] + y[1]),\n", + " ((n, 1) for n in integers if n % 2 == 0)\n", + " )\n", + " return total / count" + ] + }, + { + "cell_type": "code", + "execution_count": 325, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "7.0" + ] + }, + "execution_count": 325, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "average_evens([7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "An argument generating `float` objects works as well." + ] + }, + { + "cell_type": "code", + "execution_count": 326, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "7.0" + ] + }, + "execution_count": 326, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "average_evens([7., 11., 8., 5., 3., 12., 2., 6., 9., 10., 1., 4.])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "To show that `average_evens()` can process a **stream** of data, we simulate `10_000_000` randomly drawn integers between `0` and `100` with the [randint()](https://docs.python.org/3/library/random.html#random.randint) function from the [random](https://docs.python.org/3/library/random.html) module. We use a generator expression derived from a `range` object as the `numbers` argument. So, at *no* point in time is there a materialized `list` or `tuple` object in memory. The result approaching `50` tells us that [randint()](https://docs.python.org/3/library/random.html#random.randint) must be based on a uniform distribution." + ] + }, + { + "cell_type": "code", + "execution_count": 327, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [], + "source": [ + "import random" + ] + }, + { + "cell_type": "code", + "execution_count": 328, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [], + "source": [ + "random.seed(42)" + ] + }, + { + "cell_type": "code", + "execution_count": 329, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "49.994081434519636" + ] + }, + "execution_count": 329, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "average_evens(random.randint(0, 100) for _ in range(10_000_000))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "To show that `average_evens()` filters out odd numbers, we simulate another stream of `10_000_000` randomly drawn odd integers between `1` and `99`. As no function in the [random](https://docs.python.org/3/library/random.html) module does that \"out of the box,\" we must be creative: Doubling a number drawn from `random.randint(0, 49)` results in an even number between `0` and `98`, and adding `1` makes it odd. Then, `average_evens()` raises a `TypeError`, essentially because `(int(n) for n in numbers)` does not generate any element." + ] + }, + { + "cell_type": "code", + "execution_count": 330, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "ename": "TypeError", + "evalue": "reduce() of empty sequence with no initial value", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0maverage_evens\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m2\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0mrandom\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrandint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m49\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0;36m1\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0m_\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mrange\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m10_000_000\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;32m\u001b[0m in \u001b[0;36maverage_evens\u001b[0;34m(numbers)\u001b[0m\n\u001b[1;32m 12\u001b[0m total, count = reduce(\n\u001b[1;32m 13\u001b[0m \u001b[0;32mlambda\u001b[0m \u001b[0mx\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0my\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mx\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0my\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 14\u001b[0;31m \u001b[0;34m(\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mn\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mn\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mintegers\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mn\u001b[0m \u001b[0;34m%\u001b[0m \u001b[0;36m2\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 15\u001b[0m )\n\u001b[1;32m 16\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mtotal\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0mcount\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mTypeError\u001b[0m: reduce() of empty sequence with no initial value" + ] + } + ], + "source": [ + "average_evens(2 * random.randint(0, 49) + 1 for _ in range(10_000_000))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Iterators vs. Iterables" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "In the \"*Collections vs. Sequences*\" section above, we studied the three and four *behaviors* of collections and sequences. The latter two are *abstract* ideas, and we mainly use them to classify *concrete* data types.\n", + "\n", + "Similarly, we have introduced data types in this chapter that all share the \"behavior\" of modeling some \"rule\" in memory to generate objects \"on the fly:\" They are the `map`, `filter`, and `generator` types. Their main commonality is supporting the built-in [next()](https://docs.python.org/3/library/functions.html#next) function. In computer science terminology, such data types are called **[iterators](https://en.wikipedia.org/wiki/Iterator)**, and the [collections.abc](https://docs.python.org/3/library/collections.abc.html) module formalizes them with the `Iterator` ABC.\n", + "\n", + "So, an example of an iterator is `evens_transformed` below, an object of type `map`." + ] + }, + { + "cell_type": "code", + "execution_count": 331, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]" + ] + }, + { + "cell_type": "code", + "execution_count": 332, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [], + "source": [ + "evens_transformed = map(lambda x: (x ** 2) + 1, filter(lambda x: x % 2 == 0, numbers))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Let's first confirm that `evens_transformed` is indeed an `Iterator`, \"abstractly speaking.\"" + ] + }, + { + "cell_type": "code", + "execution_count": 333, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 333, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "isinstance(evens_transformed, abc.Iterator)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "In Python, iterators are *always* also iterables. The reverse does *not* hold! To be precise, iterators are *specializations* of iterables. That is what the \"Inherits from\" columns means in the [collections.abc](https://docs.python.org/3/library/collections.abc.html) module's documentation." + ] + }, + { + "cell_type": "code", + "execution_count": 334, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 334, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "isinstance(evens_transformed, abc.Iterable)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Furthermore, we revise our definition of *iterables* from above: Just as we defined an *iterator* to be an object that supports the [next()](https://docs.python.org/3/library/functions.html#next) function, we define an *iterable* to be an object that supports the built-in [iter()](https://docs.python.org/3/library/functions.html#iter) function.\n", + "\n", + "The confused reader may now be wondering how the two concepts relate to each other.\n", + "\n", + "In short, the [iter()](https://docs.python.org/3/library/functions.html#iter) function is the general way to create an *iterator* object out of a given *iterable* object. In real-world code, we hardly ever see [iter()](https://docs.python.org/3/library/functions.html#iter) as Python calls it for us in the background. Then, the *iterator* object manages the iteration over the *iterable* object.\n", + "\n", + "For illustration, let's do that ourselves and create *two* iterators out of the iterable `numbers` and see what we can do with them." + ] + }, + { + "cell_type": "code", + "execution_count": 335, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "iterator1 = iter(numbers)" + ] + }, + { + "cell_type": "code", + "execution_count": 336, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [], + "source": [ + "iterator2 = iter(numbers)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "`iterator1` and `iterator2` are of type `list_iterator`." + ] + }, + { + "cell_type": "code", + "execution_count": 337, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "list_iterator" + ] + }, + "execution_count": 337, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(iterator1)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "*Iterators* are useful for only *one* operation: Get the next object from the associated *iterable*.\n", + "\n", + "By calling [next()](https://docs.python.org/3/library/functions.html#next) three times with `iterator1` as the argument, we obtain the first three elements of `numbers`." + ] + }, + { + "cell_type": "code", + "execution_count": 338, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(7, 11, 8)" + ] + }, + "execution_count": 338, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "next(iterator1), next(iterator1), next(iterator1)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "`iterator1` and `iterator2` keep their *states* separate. So, we could \"manually\" loop over an *iterable* in parallel." + ] + }, + { + "cell_type": "code", + "execution_count": 339, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(5, 7)" + ] + }, + "execution_count": 339, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "next(iterator1), next(iterator2)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "We can also play a \"trick\" and exchange some elements in `numbers`. `iterator1` and `iterator2` do *not* see these changes and present us with the new elements. So, *iterators* not only have state on their own but also keep this separate from the underlying *iterable*." + ] + }, + { + "cell_type": "code", + "execution_count": 340, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "numbers[1], numbers[4] = 99, 99" + ] + }, + { + "cell_type": "code", + "execution_count": 341, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(99, 99)" + ] + }, + "execution_count": 341, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "next(iterator1), next(iterator2)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Let's re-assign the elements in `numbers` so that they are in order. Now, the numbers returned from [next()](https://docs.python.org/3/library/functions.html#next) also tell us how often [next()](https://docs.python.org/3/library/functions.html#next) was called with `iterator1` or `iterator2`. We conclude that `list_iterator` objects must be keeping track of the *last* index obtained from the underlying *iterable*." + ] + }, + { + "cell_type": "code", + "execution_count": 342, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "numbers[:] = list(range(1, 13))" + ] + }, + { + "cell_type": "code", + "execution_count": 343, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]" + ] + }, + "execution_count": 343, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "numbers" + ] + }, + { + "cell_type": "code", + "execution_count": 344, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(6, 3)" + ] + }, + "execution_count": 344, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "next(iterator1), next(iterator2)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "With the concepts introduced in this section, we can now understand the first sentence in the documentation on the [zip()](https://docs.python.org/3/library/functions.html#zip) built-in better: \"Make an *iterator* that aggregates elements from each of the *iterables*.\"\n", + "\n", + "Because *iterators* are always also *iterables*, we pass `iterator1` and `iterator2` as arguments to [zip()](https://docs.python.org/3/library/functions.html#zip).\n", + "\n", + "The returned `zipper` object is of type `zip` and, \"abstractly speaking,\" an `Iterator` as well." + ] + }, + { + "cell_type": "code", + "execution_count": 345, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "zipper = zip(iterator1, iterator2)" + ] + }, + { + "cell_type": "code", + "execution_count": 346, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 346, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "zipper" + ] + }, + { + "cell_type": "code", + "execution_count": 347, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "zip" + ] + }, + "execution_count": 347, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(zipper)" + ] + }, + { + "cell_type": "code", + "execution_count": 348, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 348, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "isinstance(zipper, abc.Iterator)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "So far, we have always used [zip()](https://docs.python.org/3/library/functions.html#zip) with a `for` statement and looped over it. That was our earlier definition of an *iterable*. Our revised definition in this section states that an *iterable* is an object that supports the [iter()](https://docs.python.org/3/library/functions.html#iter) function. So, let's see what happens if we pass `zipper` to [iter()](https://docs.python.org/3/library/functions.html#iter)." + ] + }, + { + "cell_type": "code", + "execution_count": 349, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "zipper_iterator = iter(zipper)" + ] + }, + { + "cell_type": "code", + "execution_count": 350, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 350, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "zipper_iterator" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "`zipper_iterator` points to the *same* object as `zipper`! That holds for *iterators* in general: Any *iterator* created from an existing *iterator* with [iter()](https://docs.python.org/3/library/functions.html#iter) is the *iterator* itself." + ] + }, + { + "cell_type": "code", + "execution_count": 351, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 351, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "zipper is zipper_iterator" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "The Python core developers made that design decision so that *iterators* may also be looped over.\n", + "\n", + "The `for`-loop below prints out *six* more `tuple` objects derived from the re-ordered `numbers` because the `iterator1` object hidden inside `zipper` already returned the first *six* elements. So, the respective first elements of the `tuple` objects printed range from `7` to `12`. Similarly, as `iterator2` already provided *three* elements from `numbers`, we see the respective second elements in the range from `4` to `9`." + ] + }, + { + "cell_type": "code", + "execution_count": 352, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "7 > 4 8 > 5 9 > 6 10 > 7 11 > 8 12 > 9 " + ] + } + ], + "source": [ + "for x, y in zipper:\n", + " print(x, \">\", y, end=\" \")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "`zipper` is now *exhausted*. So, the `for`-loop below does *not* make any iteration at all." + ] + }, + { + "cell_type": "code", + "execution_count": 353, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [], + "source": [ + "for x, y in zipper:\n", + " print(x, \">\", y, end=\" \")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "We verify that `iterator1` is exhausted by passing it to [next()](https://docs.python.org/3/library/functions.html#next) again, which raises a `StopIteration` exception." + ] + }, + { + "cell_type": "code", + "execution_count": 354, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "ename": "StopIteration", + "evalue": "", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mStopIteration\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mnext\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0miterator1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mStopIteration\u001b[0m: " + ] + } + ], + "source": [ + "next(iterator1)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "On the contrary, `iterator2` is *not* yet exhausted." + ] + }, + { + "cell_type": "code", + "execution_count": 355, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "10" + ] + }, + "execution_count": 355, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "next(iterator2)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "### The `for` Statement (revisited)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "In [Chapter 4](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/04_iteration.ipynb#The-for-Statement), we argue that the `for` statement is syntactic sugar, replacing the `while` statement in many scenarios. In particular, a `for`-loop saves us two tasks: Managing an index variable *and* obtaining the individual elements by indexing. In this sub-section, we look at a more realistic picture, using the new terminology as well.\n", + "\n", + "Let's print out the elements of a `list` object as the *iterable* to be looped over." + ] + }, + { + "cell_type": "code", + "execution_count": 356, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "iterable = [0, 1, 2, 3, 4]" + ] + }, + { + "cell_type": "code", + "execution_count": 357, + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0 1 2 3 4 " + ] + } + ], + "source": [ + "for element in iterable:\n", + " print(element, end=\" \")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Our previous and equivalent formulation with a `while` statement is like so." + ] + }, + { + "cell_type": "code", + "execution_count": 358, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0 1 2 3 4 " + ] + } + ], + "source": [ + "index = 0\n", + "while index < len(iterable):\n", + " element = iterable[index]\n", + " print(element, end=\" \")\n", + " index += 1\n", + "del index" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "What happens behind the scenes in the Python interpreter is shown below.\n", + "\n", + "First, Python calls [iter()](https://docs.python.org/3/library/functions.html#iter) with the `iterable` to be looped over, and obtains an `iterator`. That contains the entire logic of how the `iterable` is looped over. In particular, the `iterator` may or may not pick the `iterable`'s elements in a predictable order. That is up to the \"rule\" it models.\n", + "\n", + "Second, Python enters an *indefinite* `while`-loop. It tries to obtain the next element with [next()](https://docs.python.org/3/library/functions.html#next). If that succeeds, the `for`-loop's code block is executed. Below, that code is placed within the `else`-clause that runs only if *no* exception is raised in the `try`-clause. Then, Python jumps into the next iteration and tries to obtain the next element from the `iterator`, and so on. Once the `iterator` is exhausted, it raises a `StopIteration` exception, and Python leaves the `while`-loop with the `break` statement." + ] + }, + { + "cell_type": "code", + "execution_count": 359, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0 1 2 3 4 " + ] + } + ], + "source": [ + "iterator = iter(iterable)\n", + "\n", + "while True:\n", + " try:\n", + " element = next(iterator)\n", + " except StopIteration:\n", + " break\n", + " else:\n", + " print(element, end=\" \")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Understanding *iterators* and *iterables* is helpful for any data science practitioner that deals with large amounts of data. Even without that, these two terms occur everywhere in Python-related texts and documentation." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "## TL;DR" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "**Sequences** are an abstract idea that summarizes *four* behaviors an object may or may not exhibit: We describe them as **finite** and **ordered** **containers** that we may **loop over**. Examples are objects of type `list`, `tuple`, but also `str`. Objects that exhibit all behaviors *except* being ordered are referred to as **collections**. The objects inside a sequence are labeled with a unique *index*, an `int` object in the range $0 \\leq \\text{index} < \\lvert \\text{sequence} \\rvert$, and called its **elements**.\n", + "\n", + "`list` objects are **mutable**. That means we can change the references to other objects it contains, and, in particular, re-assign them. On the contrary, `tuple` objects are like **immutable** lists: We can use them in place of any `list` object as long as we do *not* mutate it.\n", + "\n", + "Often, the work we do with sequences follows the **map-filter-reduce paradigm**: We apply the same transformation to all elements, filter some of them out, and calculate summary statistics from the remaining ones.\n", + "\n", + "An essential idea in this chapter is that, in many situations, we need *not* work with all the data **materialized** in memory. Instead, **iterators** allow us to process sequential data on a one-by-one basis." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.3" + }, + "livereveal": { + "auto_select": "code", + "auto_select_fragment": true, + "scroll": true, + "theme": "serif" + }, + "toc": { + "base_numbering": 1, + "nav_menu": {}, + "number_sections": false, + "sideBar": true, + "skip_h1_title": true, + "title_cell": "Table of Contents", + "title_sidebar": "Contents", + "toc_cell": false, + "toc_position": { + "height": "calc(100% - 180px)", + "left": "10px", + "top": "150px", + "width": "384px" + }, + "toc_section_display": false, + "toc_window_display": false + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/07_sequences_review_and_exercises.ipynb b/07_sequences_review_and_exercises.ipynb new file mode 100644 index 0000000..17e1ffc --- /dev/null +++ b/07_sequences_review_and_exercises.ipynb @@ -0,0 +1,389 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "# Chapter 7: Sequential Data" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Content Review" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Read [Chapter 7](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/07_sequences.ipynb) of the book. Then work through the ten review questions." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Essay Questions " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Answer the following questions briefly with *at most* 300 characters per question!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Q1**: We have seen **containers** and **iterables** before in [Chapter 4](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/04_iteration.ipynb#Containers-vs.-Iterables). How do they relate to **sequences**? " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + " " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Q2**: Explain the difference between a **mutable** and an **immutable** object in Python with an example!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + " " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Q3**: What is the difference between a **shallow** and a **deep** copy of an object? How can one of them become a problem?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + " " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Q4.1**: `tuple` objects have *two* primary usages. First, they can be used in place of `list` objects where **mutability** is *not* required. Second, we use them to model **records**.\n", + "\n", + "Describe why `tuple` objects are a suitable replacement for `list` objects in general!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + " " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Q4.2**: What do we mean by a **record**? How are `tuple` objects suitable to model records? How can we integrate a **semantic** meaning when working with records?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + " " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Q5**: With the [map()](https://docs.python.org/3/library/functions.html#map) and [filter()](https://docs.python.org/3/library/functions.html#filter) built-ins and the [reduce()](https://docs.python.org/3/library/functools.html#functools.reduce) function from the [functools](https://docs.python.org/3/library/functools.html) module in the [standard library](https://docs.python.org/3/library/index.html), we can replace many tedious `for`-loops and `if` statements. What are some advantages of doing so?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + " " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### True / False Questions" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Motivate your answer with *one short* sentence!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Q6**: `sequence` objects are *not* part of core Python but may be imported from the [standard library](https://docs.python.org/3/library/index.html)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + " " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Q7**: Passing **mutable** objects as arguments to functions is not problematic because functions operate in a **local** scope without affecting the **global** scope." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + " " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Q8**: The **map-filter-reduce** paradigm is an excellent mental concept to organize one's code with. Then, there is a good chance that a program can be **parallelized** if the data input grows." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + " " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Q9**: `lambda` expressions are useful in the context of the **map-filter-reduce** paradigm, where we often do *not* re-use a `function` object more than once." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + " " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Coding Exercises" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Working with Lists" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Q10.1**: Write a function `nested_sum()` that takes a `list` object as its argument, which contains other `list` objects with numbers, and adds up the numbers! Use `nested_numbers` below to test your function!\n", + "\n", + "Hint: You need at least one `for`-loop." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "nested_numbers = [[1, 2, 3], [4], [5], [6, 7], [8], [9]]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def nested_sum():\n", + " ..." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "nested_sum(nested_numbers)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Q10.2**: Provide a one-line expression to obtain the *same* result as `nested_sum()`!\n", + "\n", + "Hints: Use a *list comprehension*. You may want to use the built-in [sum()](https://docs.python.org/3/library/functions.html#sum) function several times." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Q10.3**: Generalize `nested_sum()` into a function `mixed_sum()` that can process a \"mixed\" `list` object, which contains numbers and other `list` objects with numbers! Use `mixed_numbers` below for testing!\n", + "\n", + "Hints: Use the built-in [isinstance()](https://docs.python.org/3/library/functions.html#isinstance) function to check how an element is to be processed. Get extra credit for adhering to *goose typing*, as explained in [Chapter 5](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/05_numbers.ipynb#Goose-Typing)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "mixed_numbers = [[1, 2, 3], 4, 5, [6, 7], 8, [9]]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def mixed_sum():\n", + " ..." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "mixed_sum(nums)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Q10.4.1**: Write a function `cum_sum()` that takes a `list` object with numbers as its argument and returns a *new* `list` object with the **cumulative sums** of these numbers! So, `sum_up` below, `[1, 2, 3, 4, 5]`, should return `[1, 3, 6, 10, 15]`.\n", + "\n", + "Hint: The idea behind is similar to the [cumulative distribution function](https://en.wikipedia.org/wiki/Cumulative_distribution_function) from statistics." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sum_up = [1, 2, 3, 4, 5]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def cum_sum():\n", + " ..." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "cum_sum(sum_up)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Q10.4.2**: We should always make sure that our functions also work in corner cases. What happens if your implementation of `cum_sum()` is called with an empty list `[]`? Make sure it handles that case *without* crashing! What would be a good return value in this corner case? Describe everything in the docstring.\n", + "\n", + "Hint: It is possible to write this without any extra input validation." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "cum_sum([])" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.3" + }, + "toc": { + "base_numbering": 1, + "nav_menu": {}, + "number_sections": false, + "sideBar": true, + "skip_h1_title": true, + "title_cell": "Table of Contents", + "title_sidebar": "Contents", + "toc_cell": false, + "toc_position": {}, + "toc_section_display": false, + "toc_window_display": false + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/README.md b/README.md index daa2e93..d2f8d4b 100644 --- a/README.md +++ b/README.md @@ -19,8 +19,9 @@ As such they can be viewed in a plain web browser: - [02 - Functions & Modularization](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/02_functions.ipynb) - [03 - Conditionals & Exceptions](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/03_conditionals.ipynb) - [04 - Recursion & Looping](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/04_iteration.ipynb) -- [05 - Numbers](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/05_numbers.ipynb) -- [06 - Text](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/06_text.ipynb) +- [05 - Bits & Numbers](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/05_numbers.ipynb) +- [06 - Bytes & Text](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/06_text.ipynb) +- [07 - Sequential Data](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/07_sequences.ipynb) However, it is recommended that students **install Python and Jupyter locally** and run the code in the notebooks on their own. @@ -125,8 +126,8 @@ are popular tools to automate the described management of virtual environments. After activation for the first time, you must install the project's **dependencies** (= the third-party packages needed to run the code), most notably [Jupyter](https://pypi.org/project/jupyter/) in this project (the -"python -m" is often left out; if you have poetry installed, you may just -type `poetry install` instead). +"python -m" is often left out [but should not be](https://snarky.ca/why-you-should-use-python-m-pip/); +if you have poetry installed, you may just type `poetry install` instead). - `python -m pip install -r requirements.txt` diff --git a/lorem_ipsum.txt b/lorem_ipsum.txt new file mode 100644 index 0000000..c0e21ab --- /dev/null +++ b/lorem_ipsum.txt @@ -0,0 +1,6 @@ +Lorem Ipsum is simply dummy text of the printing and typesetting industry. +Lorem Ipsum has been the industry's standard dummy text ever since the 1500s +when an unknown printer took a galley of type and scrambled it to make a type +specimen book. It has survived not only five centuries but also the leap into +electronic typesetting, remaining essentially unchanged. It was popularised in +the 1960s with the release of Letraset sheets.