intro-to-python/08_mfr/01_content.ipynb
Alexander Hess 94e5112f10
Release 0.1.0
After refurbishing the project we prepare a new relaease.
There are no changes with respect to the contents as compared to v0.0.0
that are noteworthy release notes.
2024-04-08 22:13:31 +02:00

2384 lines
68 KiB
Text

{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"**Note**: Click on \"*Kernel*\" > \"*Restart Kernel and Clear All Outputs*\" in [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) *before* reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it [in the cloud <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_mb.png\">](https://mybinder.org/v2/gh/webartifex/intro-to-python/main?urlpath=lab/tree/08_mfr/01_content.ipynb)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Chapter 8: Map, Filter, & Reduce (continued)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"After introducing the Map-Filter-Reduce paradigm in the [first part <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_nb.png\">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/main/08_mfr/00_content.ipynb) of this chapter, we first see how `list` comprehensions can replace the [map() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#map) and [filter() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#filter) built-ins in many cases. Then, we learn how `generator` expressions are like `list` comprehensions *without* using the memory. We end this part with a short discussion of the built-in [all() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#all) and [any() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#any) functions."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## `list` Comprehensions"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Consider again the original \"*A simple Filter*\" example from [Chapter 4 <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_nb.png\">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/main/04_iteration/03_content.ipynb#Example:-A-simple-Filter), re-written such that both the mapping and the filtering are done in *one* `for`-loop instead of the *two* above."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"evens_transformed = []\n",
"\n",
"for number in numbers:\n",
" if number % 2 == 0:\n",
" evens_transformed.append((number ** 2) + 1)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"370"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sum(evens_transformed)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"**`list` comprehensions** are *expressions* to derive *new* `list` objects out of *existing* ones (cf., [reference <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/reference/expressions.html#displays-for-lists-sets-and-dictionaries)). Practically, this means that we place the `for` and `if` inside brackets `[` and `]`.\n",
"\n",
"So, the example can be written in a single *expression* like below replacing the compound `for` *statement* above."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"[65, 145, 5, 37, 101, 17]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"[(n ** 2) + 1 for n in numbers if n % 2 == 0]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"A list comprehension may always be used in a place where otherwise a `list` object would work.\n",
"\n",
"For example, let's rewrite the \"*A simple Filter*\" example from [Chapter 4 <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_nb.png\">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/main/04_iteration/03_content.ipynb#Example:-A-simple-Filter) in just one line. As a caveat, the code below *materializes* all elements in memory *before* summing them up, and may, therefore, cause a `MemoryError` when executed with a bigger `numbers` list. We see with [PythonTutor <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](http://pythontutor.com/visualize.html#code=numbers%20%3D%20range%281,%2013%29%0Aresult%20%3D%20sum%28%5B%28n%20**%202%29%20%2B%201%20for%20n%20in%20numbers%20if%20n%20%25%202%20%3D%3D%200%5D%29&cumulative=false&curstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) how a `list` object exists in memory at step 17 and then \"gets lost\" right after. As the next section shows, this downside may be mitigated."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"370"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sum([(n ** 2) + 1 for n in numbers if n % 2 == 0])"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"### Example: Nested Lists"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"List comprehensions may come with several `for`'s and `if`'s.\n",
"\n",
"The cell below creates a `list` object that contains three other `list` objects with a series of numbers in them. The first and last numbers in each inner `list` object are offset by `1`."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"nested_numbers = [\n",
" [1, 2, 3, 4, 5, 6, 7],\n",
" [2, 3, 4, 5, 6, 7, 8],\n",
" [3, 4, 5, 6, 7, 8, 9],\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"[[1, 2, 3, 4, 5, 6, 7], [2, 3, 4, 5, 6, 7, 8], [3, 4, 5, 6, 7, 8, 9]]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nested_numbers"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"To do something meaningful with the numbers, we have to remove the inner layer of `list` objects and **flatten** (i.e., \"un-nest\") the data.\n",
"\n",
"Without list comprehensions, we achieve that with two nested `for`-loops."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"flat_numbers = []\n",
"\n",
"for inner_numbers in nested_numbers:\n",
" for number in inner_numbers:\n",
" flat_numbers.append(number)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"[1, 2, 3, 4, 5, 6, 7, 2, 3, 4, 5, 6, 7, 8, 3, 4, 5, 6, 7, 8, 9]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"flat_numbers"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"That translates into a list comprehension like below. The order of the `for`'s may be confusing at first but is the *same* as writing out the nested `for`-loops as above."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"[1, 2, 3, 4, 5, 6, 7, 2, 3, 4, 5, 6, 7, 8, 3, 4, 5, 6, 7, 8, 9]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"[number for inner_numbers in nested_numbers for number in inner_numbers]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Now, we may use the `list` object resulting from the list comprehension in any way we want.\n",
"\n",
"For example, to add up the flattened numbers with [sum() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#sum). The same caveat holds as before: The `list` object passed into [sum() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#sum) is *materialized* with *all* its elements *before* the sum is calculated!"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"105"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sum([number for inner_numbers in nested_numbers for number in inner_numbers])"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"In this particular example, however, we can exploit the fact that any sum of numbers can be expressed as the sum of sums of mutually exclusive and collectively exhaustive subsets of these numbers and get away with just *one* `for` in the list comprehension."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"105"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sum([sum(inner_numbers) for inner_numbers in nested_numbers])"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Example: Working with Cartesian Products"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"A popular use case of nested list comprehensions is applying a transformation to each $n$-tuple in the [Cartesian product <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_wiki.png\">](https://en.wikipedia.org/wiki/Cartesian_product) created from elements of $n$ iterables. In the generic illustration below, a transformation $f(x, y)$ is applied to each $2$-tuple in the Cartesian product $X \\times Y$ where $x$ is an element in $X = \\{x_1, x_2, x_3\\}$ and $y$ is an element in $Y = \\{y_1, y_2, y_3\\}$."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"|$Y$ \\ $X$| $x_1$ | $x_2$ | $x_3$ |\n",
"|---------|--------------|--------------|--------------|\n",
"| $y_1$ |f($x_1$,$y_1$)|f($x_2$,$y_1$)|f($x_3$,$y_1$)|\n",
"| $y_2$ |f($x_1$,$y_2$)|f($x_2$,$y_2$)|f($x_3$,$y_2$)|\n",
"| $y_3$ |f($x_1$,$y_3$)|f($x_2$,$y_3$)|f($x_3$,$y_3$)|"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"For example, let's add each quotient obtained by taking the numerator $x$ from `[10, 20, 30]` and the denominator $y$ from `[40, 50, 60]` to `1`, and then find the overall product. This transformation can be described mathematically as the function $z = f(x, y)$ below."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"$$z = f(x, y) = \\prod{ (1 + \\frac{x}{y} )}$$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Further, the table below visualizes the calculations: The result is the product of *nine* entries."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"|`y` \\ `x`|**10**|**20**|**30**|\n",
"|------|------|------|------|\n",
"|**40**| 1.25 | 1.50 | 1.75 |\n",
"|**50**| 1.20 | 1.40 | 1.60 |\n",
"|**60**| 1.17 | 1.33 | 1.50 |"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"To express that in Python, we start by creating two `list` objects, `first` and `second`."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"first = [10, 20, 30]\n",
"second = [40, 50, 60]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"For a Cartesian product, we must loop over *all* possible $2$-tuples where one element is drawn from `first` and the other from `second`. We achieve that with two nested `for`-loops, in which we calculate each `quotient` and append it to an initially empty `cartesian_product` list."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"[1.25, 1.2, 1.1666666666666667, 1.5, 1.4, 1.3333333333333333, 1.75, 1.6, 1.5]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cartesian_product = []\n",
"\n",
"for numerator in first:\n",
" for denominator in second:\n",
" quotient = 1 + (numerator / denominator)\n",
" cartesian_product.append(quotient)\n",
"\n",
"cartesian_product"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Next, we convert the two explicit `for`-loops into one list comprehensions and use `x` and `y` as shorter variable names."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"[1.25, 1.2, 1.1666666666666667, 1.5, 1.4, 1.3333333333333333, 1.75, 1.6, 1.5]"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"[1 + (x / y) for x in first for y in second]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"The order of the `for`'s is *important*: The list comprehension above divides numbers from `first` by numbers from `second`, whereas the list comprehension below does the opposite."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"[5.0, 3.0, 2.333333333333333, 6.0, 3.5, 2.666666666666667, 7.0, 4.0, 3.0]"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"[1 + (x / y) for x in second for y in first]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"To find the overall product, we *unpack* the list comprehension into the `product()` function from the \"*Function Definitions & Calls*\" sub-section in [Chapter 7 <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_nb.png\">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/main/07_sequences/03_content.ipynb#Function-Definitions-&-Calls)."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"def product(*args):\n",
" \"\"\"Multiply all arguments.\"\"\"\n",
" result = args[0]\n",
"\n",
" for arg in args[1:]:\n",
" result *= arg\n",
"\n",
" return result"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"20.58"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"product(*[1 + (x / y) for x in first for y in second])"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Alternatively, we use a `lambda` expression with the [reduce() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functools.html#functools.reduce) function from the [functools <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functools.html) module."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"from functools import reduce"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"20.58"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"reduce(lambda x, y: x * y, [1 + (x / y) for x in first for y in second])"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"While this example is stylized, Cartesian products are hidden in many applications, and it shows how the various language features introduced in this chapter can be seamlessly combined to process sequential data."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## `generator` Expressions"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Because of the high memory consumption, Pythonistas avoid materialized `list` objects, and, thus, also `list` comprehensions, whenever possible. Instead, they prefer to work with **[`generator` expressions <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/reference/expressions.html#generator-expressions)**. Syntactically, they work like list comprehensions except that parentheses, `(` and `)`, replace brackets, `[` and `]`.\n",
"\n",
"Let's go back to the original \"*A simple Filter*\" example from [Chapter 4 <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_nb.png\">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/main/04_iteration/03_content.ipynb#Example:-A-simple-Filter) one more time, apply the transformation $y := x^2 + 1$ to all even `numbers`, and sum them up."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"To filter and transform `numbers`, we wrote a list comprehension above ..."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"[65, 145, 5, 37, 101, 17]"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"[(n ** 2) + 1 for n in numbers if n % 2 == 0]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"... that now becomes a `generator` expression."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"<generator object <genexpr> at 0x7fcf703d84a0>"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"((n ** 2) + 1 for n in numbers if n % 2 == 0)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"A `generator` expression evaluates to yet another \"rule\"-like object in memory that knows how to generate the individual objects in a series one by one. Whereas a `list` comprehension materializes its elements in memory *when* it is evaluated, the opposite holds true for `generator` expressions: *No* object is created in memory except the \"rule\" itself. Because of this behavior, we describe `generator` expressions as *lazy* and `list` comprehensions as *eager*.\n",
"\n",
"To materialize the elements specified by a `generator` expression, we use the [list() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#func-list) constructor as seen above."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"[65, 145, 5, 37, 101, 17]"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"list(((n ** 2) + 1 for n in numbers if n % 2 == 0))"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Whenever a `generator` expression is the only argument in a function call, we may merge the double parentheses, `((` and `))`, into just `(` and `)`."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"[65, 145, 5, 37, 101, 17]"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"list((n ** 2) + 1 for n in numbers if n % 2 == 0)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"A common use case is to reduce the elements into a single object instead, for example, by adding them up with [sum() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#sum) as in the original \"*A simple Filter*\" example. [PythonTutor <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](http://pythontutor.com/visualize.html#code=numbers%20%3D%20range%281,%2013%29%0Asum_with_list%20%3D%20sum%28%5B%28n%20**%202%29%20%2B%201%20for%20n%20in%20numbers%20if%20n%20%25%202%20%3D%3D%200%5D%29%0Asum_with_gen%20%3D%20sum%28%28n%20**%202%29%20%2B%201%20for%20n%20in%20numbers%20if%20n%20%25%202%20%3D%3D%200%29&cumulative=false&curstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) shows how the code cell below does *not* create a temporary `list` object in memory whereas a `list` comprehension would do so (cf., step 17)."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"370"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sum((n ** 2) + 1 for n in numbers if n % 2 == 0)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Let's assign the object to which the `generator` expression below evaluates to to a variable `gen` and inspect it."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"gen = ((n ** 2) + 1 for n in numbers if n % 2 == 0)"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"<generator object <genexpr> at 0x7fcf703d8c10>"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"gen"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Unsurprisingly, `generator` expressions create objects of type `generator`."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"generator"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(gen)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"`generator` objects work just like the `map` and `filter` objects in the [first part <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_nb.png\">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/main/08_mfr/00_content.ipynb) of this chapter. So, with the [next() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#next) function, we can generate elements one by one."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"65"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"next(gen)"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"145"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"next(gen)"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"5"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"next(gen)"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"37"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"next(gen)"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"101"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"next(gen)"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"17"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"next(gen)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Once a `generator` object runs out of elements, it raises a `StopIteration` exception, and we say that the `generator` object is **exhausted**. To loop over its elements again, we must create a *new* one."
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"ename": "StopIteration",
"evalue": "",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mStopIteration\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[36], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[38;5;28;43mnext\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mgen\u001b[49m\u001b[43m)\u001b[49m\n",
"\u001b[0;31mStopIteration\u001b[0m: "
]
}
],
"source": [
"next(gen)"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"ename": "StopIteration",
"evalue": "",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mStopIteration\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[37], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[38;5;28;43mnext\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mgen\u001b[49m\u001b[43m)\u001b[49m\n",
"\u001b[0;31mStopIteration\u001b[0m: "
]
}
],
"source": [
"next(gen)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Calling the [next() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#next) function repeatedly with the *same* `generator` object as the argument is essentially what a `for`-loop automates for us. So, `generator` objects are *iterable*. We look into this in detail further below in the \"*The `for` Statement (revisited)*\" section."
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"65 145 5 37 101 17 "
]
}
],
"source": [
"for number in ((n ** 2) + 1 for n in numbers if n % 2 == 0):\n",
" print(number, end=\" \")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"### Example: Nested Lists (continued)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"If we are only interested in a *reduction* of `nested_numbers` into a single statistic, as the overall sum in the \"*Nested Lists*\" example above, we should replace `list` objects or `list` comprehensions with `generator` expressions wherever possible. The result is the *same*, but no intermediate `list` objects are materialized! That makes our code scale to large amounts of data.\n",
"\n",
"Let's adapt the example `nested_numbers` from above."
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"[[1, 2, 3, 4, 5, 6, 7], [2, 3, 4, 5, 6, 7, 8], [3, 4, 5, 6, 7, 8, 9]]"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nested_numbers"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Compared to the implementation with the list comprehension above, we simply remove the brackets, `[` and `]`: The argument to [sum() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#sum) becomes a generator expression."
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"105"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sum(number for inner_numbers in nested_numbers for number in inner_numbers)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"That also holds for the alternative formulation as a sum of sums."
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"105"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sum(sum(inner_numbers) for inner_numbers in nested_numbers)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Because `nested_numbers` has an internal structure (i.e., the inner `list` objects are series of consecutive `int` objects), we can even provide an effectively **memoryless** implementation by expressing it as a `generator` expression derived from `range` objects. [PythonTutor <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](http://pythontutor.com/visualize.html#code=nested_numbers%20%3D%20%28%28range%28x,%20y%20%2B%201%29%29%20for%20x,%20y%20in%20zip%28range%281,%204%29,%20range%287,%2010%29%29%29%0Aresult%20%3D%20sum%28number%20for%20inner_numbers%20in%20nested_numbers%20for%20number%20in%20inner_numbers%29&cumulative=false&curstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) confirms that no `list` objects materialize at any point in time."
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"nested_numbers = ((range(x, y + 1)) for x, y in zip(range(1, 4), range(7, 10)))"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"<generator object <genexpr> at 0x7fcf70347970>"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nested_numbers"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"105"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sum(number for inner_numbers in nested_numbers for number in inner_numbers)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"We must be careful when assigning a `generator` object to a variable: If we use `nested_numbers` again, for example, in the alternative formulation below, [sum() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#sum) returns `0` because `nested_numbers` is exhausted after executing the previous code cell. [PythonTutor <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](http://pythontutor.com/visualize.html#code=nested_numbers%20%3D%20%28%28range%28x,%20y%20%2B%201%29%29%20for%20x,%20y%20in%20zip%28range%281,%204%29,%20range%287,%2010%29%29%29%0Aresult%20%3D%20sum%28number%20for%20inner_numbers%20in%20nested_numbers%20for%20number%20in%20inner_numbers%29%0Ano_result%20%3D%20sum%28sum%28inner_numbers%29%20for%20inner_numbers%20in%20nested_numbers%29&cumulative=false&curstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) also shows that."
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sum(sum(inner_numbers) for inner_numbers in nested_numbers)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Example: Working with Cartesian Products (continued)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Let's also rewrite the \"*Working with Cartesian Products*\" example from above with generator expressions.\n",
"\n",
"As a first optimization, we replace the materialized `list` objects, `first` and `second`, with memoryless `range` objects."
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"first = range(10, 31, 10) # \"==\" [10, 20, 30]\n",
"second = range(40, 61, 10) # \"==\" [40, 50, 60]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Now, the first of the two alternative solutions may be more appealing to many readers. In general, many practitioners seem to dislike `lambda` expressions.\n",
"\n",
"In the first solution, we *unpack* the elements produced by `(1 + (x / y) for x in first for y in second)` into the `product()` function from the \"*Function Definitions & Calls*\" sub-section in [Chapter 7 <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_nb.png\">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/main/07_sequences/03_content.ipynb#Function-Definitions-&-Calls). However, inside `product()`, the elements are *packed* into `args`, a *materialized* `tuple` object! So, all the memory efficiency gained by using a generator expression is lost! [PythonTutor <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](http://pythontutor.com/visualize.html#code=def%20product%28*args%29%3A%0A%20%20%20%20result%20%3D%20args%5B0%5D%0A%20%20%20%20for%20arg%20in%20args%5B1%3A%5D%3A%0A%20%20%20%20%20%20%20%20result%20*%3D%20arg%0A%20%20%20%20return%20result%0A%0Afirst%20%3D%20range%2810,%2031,%2010%29%0Asecond%20%3D%20range%2840,%2061,%2010%29%0A%0Aresult%20%3D%20product%28*%281%20%2B%20%28x%20/%20y%29%20for%20x%20in%20first%20for%20y%20in%20second%29%29&cumulative=false&curstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) shows how a `tuple` object exists in steps 38-58."
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"20.58"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"product(*(1 + (x / y) for x in first for y in second)) # not memory efficient!"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"On the contrary, the second solution with the [reduce() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functools.html#functools.reduce) function from the [functools <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functools.html) module and the `lambda` expression works *without* the elements materialized at the same time, and [PythonTutor <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](http://pythontutor.com/visualize.html#code=from%20functools%20import%20reduce%0A%0Afirst%20%3D%20range%2810,%2031,%2010%29%0Asecond%20%3D%20range%2840,%2061,%2010%29%0A%0Aresult%20%3D%20reduce%28%0A%20%20%20%20lambda%20x,%20y%3A%20x%20*%20y,%0A%20%20%20%20%281%20%2B%20%28x%20/%20y%29%20for%20x%20in%20first%20for%20y%20in%20second%29%0A%29&cumulative=false&curstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) confirms that. So, only the second alternative is truly memory-efficient!"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"20.58"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"reduce(lambda x, y: x * y, (1 + (x / y) for x in first for y in second))"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"In summary, we learn from this example that unpacking `generator` objects *may* be a *bad* idea."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Example: Averaging all even Numbers in a List (revisited)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"With the new concepts in this chapter, let's rewrite the book's introductory \"*Averaging all even Numbers in a List*\" example from [Chapter 1 <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_nb.png\">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/main/01_elements/00_content.ipynb#Example:-Averaging-all-even-Numbers-in-a-List) such that it efficiently handles a large sequence of numbers. We continue from its latest implementation, the `average_evens()` function in the \"*Keyword-only Arguments*\" section in [Chapter 2 <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_nb.png\">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/main/02_functions/00_content.ipynb#Keyword-only-Arguments).\n",
"\n",
"We assume that `average_evens()` is called with a *finite* and *iterable* object that generates a **stream** of numeric objects that can be cast as `int` objects. After all, the idea of even and odd numbers makes sense only in the context of whole numbers.\n",
"\n",
"The `generator` expression `(round(n) for n in numbers)` implements the type casting, and, when it is evaluated during a function call, *nothing* happens except that a `generator` object is assigned to `integers`. Then, with the [reduce() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functools.html#functools.reduce) function from the [functools <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functools.html) module, we *simultaneously* add up *and* count the even numbers with the `generator` object to which the inner `generator` expression `((n, 1) for n in integers if n % 2 == 0)` evaluates to. That `generator` object takes the `integers` generator as its source and produces `tuple` objects consisting of the next *even* number in line and `1`. Two such `tuple` objects are then iteratively passed to the `function` object to which the `lambda` expression evaluates to. `x` represents the total and the count of the even numbers processed so far, while `y`'s first element, `y[0]`, is the next *even* number to be added to the running total. The result of calling [reduce() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functools.html#functools.reduce) is again a `tuple` object, namely the final `total` and `count`. Lastly, we calculate the simple average and scale it.\n",
"\n",
"In summary, this implementation of `average_evens()` does *not* keep materialized `list` objects internally like its predecessors in [Chapter 2 <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_nb.png\">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/main/02_functions/00_content.ipynb) but processes the elements of the `numbers` argument on a one-by-one basis."
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"def average_evens(numbers, *, scalar=1):\n",
" \"\"\"Calculate the average of all even numbers.\n",
"\n",
" Args:\n",
" numbers (iterable): a finite stream of the numbers to be averaged;\n",
" if non-whole numbers are provided, they are rounded\n",
" scalar (float, optional): multiplies the average; defaults to 1\n",
"\n",
" Returns:\n",
" float: (scaled) average\n",
" \"\"\"\n",
" integers = (round(n) for n in numbers)\n",
" total, count = reduce(\n",
" lambda x, y: (x[0] + y[0], x[1] + y[1]),\n",
" ((n, 1) for n in integers if n % 2 == 0)\n",
" )\n",
" return scalar * total / count"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"7.0"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"average_evens([7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4])"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"We may provide an optional `scalar` argument as before."
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"14.0"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"average_evens([7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4], scalar=2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"An argument with `float` objects works just as well."
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"7.0"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"average_evens([7., 11., 8., 5., 3., 12., 2., 6., 9., 10., 1., 4.])"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"To show that `average_evens()` can process a large stream of data, we simulate `10_000_000` randomly drawn integers between `0` and `100` with the [randint() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/random.html#random.randint) function from the [random <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/random.html) module. We use a `generator` expression derived from a `range` object as the `numbers` argument. So, at *no* point in time is there a materialized `list` object in memory. The result approaching `50` confirms that [randint() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/random.html#random.randint) must be based on a uniform distribution."
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"import random"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"random.seed(42)"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"49.994081434519636"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"average_evens(random.randint(0, 100) for _ in range(10_000_000))"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"To show that `average_evens()` filters out odd numbers, we simulate another stream of `10_000_000` randomly drawn odd integers between `1` and `99`. As no function in the [random <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/random.html) module does that \"out of the box,\" we must be creative: Doubling a number drawn from `random.randint(0, 49)` results in an even number between `0` and `98`, and adding `1` makes it odd. Then, `average_evens()` raises a `TypeError`, essentially because `(int(n) for n in numbers)` does not generate any element."
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"ename": "TypeError",
"evalue": "reduce() of empty iterable with no initial value",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[56], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43maverage_evens\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m2\u001b[39;49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43m \u001b[49m\u001b[43mrandom\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mrandint\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m0\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m49\u001b[39;49m\u001b[43m)\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m+\u001b[39;49m\u001b[43m \u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m \u001b[49m\u001b[38;5;28;43;01mfor\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[43m_\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;129;43;01min\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[38;5;28;43mrange\u001b[39;49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m10_000_000\u001b[39;49m\u001b[43m)\u001b[49m\u001b[43m)\u001b[49m\n",
"Cell \u001b[0;32mIn[49], line 13\u001b[0m, in \u001b[0;36maverage_evens\u001b[0;34m(numbers, scalar)\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[38;5;250m\u001b[39m\u001b[38;5;124;03m\"\"\"Calculate the average of all even numbers.\u001b[39;00m\n\u001b[1;32m 3\u001b[0m \n\u001b[1;32m 4\u001b[0m \u001b[38;5;124;03mArgs:\u001b[39;00m\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 10\u001b[0m \u001b[38;5;124;03m float: (scaled) average\u001b[39;00m\n\u001b[1;32m 11\u001b[0m \u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[1;32m 12\u001b[0m integers \u001b[38;5;241m=\u001b[39m (\u001b[38;5;28mround\u001b[39m(n) \u001b[38;5;28;01mfor\u001b[39;00m n \u001b[38;5;129;01min\u001b[39;00m numbers)\n\u001b[0;32m---> 13\u001b[0m total, count \u001b[38;5;241m=\u001b[39m \u001b[43mreduce\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 14\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;28;43;01mlambda\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[43mx\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43my\u001b[49m\u001b[43m:\u001b[49m\u001b[43m \u001b[49m\u001b[43m(\u001b[49m\u001b[43mx\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m0\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m+\u001b[39;49m\u001b[43m \u001b[49m\u001b[43my\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m0\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mx\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m+\u001b[39;49m\u001b[43m \u001b[49m\u001b[43my\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 15\u001b[0m \u001b[43m \u001b[49m\u001b[43m(\u001b[49m\u001b[43m(\u001b[49m\u001b[43mn\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m)\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43;01mfor\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[43mn\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;129;43;01min\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[43mintegers\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43;01mif\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[43mn\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m%\u001b[39;49m\u001b[43m \u001b[49m\u001b[38;5;241;43m2\u001b[39;49m\u001b[43m \u001b[49m\u001b[38;5;241;43m==\u001b[39;49m\u001b[43m \u001b[49m\u001b[38;5;241;43m0\u001b[39;49m\u001b[43m)\u001b[49m\n\u001b[1;32m 16\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 17\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m scalar \u001b[38;5;241m*\u001b[39m total \u001b[38;5;241m/\u001b[39m count\n",
"\u001b[0;31mTypeError\u001b[0m: reduce() of empty iterable with no initial value"
]
}
],
"source": [
"average_evens(2 * random.randint(0, 49) + 1 for _ in range(10_000_000))"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## `tuple` Comprehensions"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"There is no syntax to derive *new* `tuple` objects out of existing ones. However, we can mimic such a construct by combining the built-in [tuple() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#func-tuple) constructor with a `generator` expression.\n",
"\n",
"So, to convert the `list` comprehension `[(n ** 2) + 1 for n in numbers if n % 2 == 0]` from above into a \"`tuple` comprehension,\" we write the following."
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"(65, 145, 5, 37, 101, 17)"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tuple((n ** 2) + 1 for n in numbers if n % 2 == 0)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Boolean Reducers"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Besides [min() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#min), [max() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#max), and [sum() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#sum), Python provides two boolean reduce functions: [all() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#all) and [any() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#any).\n",
"\n",
"Let's look at straightforward examples involving `numbers` again."
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"[all() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#all) takes an `iterable` argument and returns `True` if *all* elements are *truthy*.\n",
"\n",
"For example, let's check if the square of each element in `numbers` is below `100` or `150`, respectively. We express the computation with a `generator` expression passed as the only argument to [all() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#all)."
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"all(x ** 2 < 100 for x in numbers)"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"all(x ** 2 < 150 for x in numbers)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"[all() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#all) can be viewed as syntactic sugar replacing a `for`-loop: Internally, [all() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#all) implements the *short-circuiting* strategy explained in [Chapter 3 <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_nb.png\">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/main/03_conditionals/00_content.ipynb#Short-Circuiting), and we mimic that by testing for the *opposite* condition in the `if` statement and stopping the `for`-loop early with the `break` statement. In the worst case, if `threshold` were, for example, `150`, we would loop over *all* elements in the `iterable` argument, which must be *finite* for the code to work. So, [all() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#all) is a *linear search* in disguise."
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"threshold = 100\n",
"\n",
"for number in numbers:\n",
" if number ** 2 >= threshold: # the opposite of what we are checking for\n",
" all_below_threshold = False\n",
" break\n",
"else:\n",
" all_below_threshold = True\n",
"\n",
"all_below_threshold"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"The documentation of [all() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#all) shows in another way what it does with code: By placing a `return` statement in a `for`-loop's body inside a function, iteration is stopped prematurely once an element does *not* meet the condition. That is the familiar *early exit* pattern at work."
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"def all_alt(iterable):\n",
" \"\"\"Alternative implementation of the built-in all() function.\"\"\"\n",
" for element in iterable:\n",
" if not element: # the opposite of what we are checking for\n",
" return False\n",
" return True"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"all_alt(x ** 2 < 100 for x in numbers)"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"all_alt(x ** 2 < 150 for x in numbers)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Similarly, [any() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#any) checks if *at least* one element in the `iterable` argument is *truthy*.\n",
"\n",
"To continue the example, let's check if the square of *any* element in `numbers` is above `100` or `150`, respectively."
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 65,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"any(x ** 2 > 100 for x in numbers)"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 66,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"any(x ** 2 > 150 for x in numbers)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"The code cell below shows how [any() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#any) works internally: It also follows the *short-circuiting* strategy. Here, we do *not* need to check for the opposite condition."
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 67,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"threshold = 100\n",
"\n",
"for number in numbers:\n",
" if number ** 2 > threshold:\n",
" any_above_threshold = True\n",
" break\n",
"else:\n",
" any_above_threshold = False\n",
"\n",
"any_above_threshold"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"The alternative formulation in the documentation of [any() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#any) is straightforward and also uses the early exit pattern."
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"def any_alt(iterable):\n",
" \"\"\"Alternative implementation of the built-in any() function.\"\"\"\n",
" for element in iterable:\n",
" if element:\n",
" return True\n",
" return False"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 69,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"any_alt(x ** 2 > 100 for x in numbers)"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 70,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"any_alt(x ** 2 > 150 for x in numbers)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.2"
},
"livereveal": {
"auto_select": "code",
"auto_select_fragment": true,
"scroll": true,
"theme": "serif"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": true,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {
"height": "calc(100% - 180px)",
"left": "10px",
"top": "150px",
"width": "384px"
},
"toc_section_display": false,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}