"We studied numbers (cf., [Chapter 5](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/05_numbers_00_lecture.ipynb)) and textual data (cf., [Chapter 6](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/06_text_00_lecture.ipynb)) first, mainly because objects of the presented data types are \"simple,\" for two reasons: First, they are *immutable*, and, as we saw in the \"*Who am I? And how many?*\" section in [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements_00_lecture.ipynb#Who-am-I?-And-how-many?), mutable objects can quickly become hard to reason about. Second, they are \"flat\" in the sense that they are *not* composed of other objects.\n",
"The `str` type is a bit of a corner case in this regard. While one could argue that a longer `str` object, for example, `\"text\"`, is composed of individual characters, this is *not* the case in memory as the literal `\"text\"` only creates *one* object (i.e., one \"bag\" of $0$s and $1$s modeling all characters).\n",
"This chapter, [Chapter 8](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/08_mappings_00_lecture.ipynb), and [Chapter 9](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/09_arrays_00_lecture.ipynb) introduce various \"complex\" data types. While some are mutable and others are not, they all share that they are primarily used to \"manage,\" or structure, the memory in a program. Unsurprisingly, computer scientists refer to the ideas and theories behind these data types as **[data structures](https://en.wikipedia.org/wiki/Data_structure)**.\n",
"In this chapter, we focus on data types that model all kinds of sequential data. Examples of such data are [spreadsheets](https://en.wikipedia.org/wiki/Spreadsheet) or [matrices](https://en.wikipedia.org/wiki/Matrix_%28mathematics%29)/[vectors](https://en.wikipedia.org/wiki/Vector_%28mathematics_and_physics%29). Such formats share the property that they are composed of smaller units that come in a sequence of, for example, rows/columns/cells or elements/entries."
"[Chapter 6](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/06_text_00_lecture.ipynb#A-\"String\"-of-Characters) already describes the *sequence* properties of `str` objects. Here, we take a step back and study these properties on their own before looking at bigger ideas.\n",
"The [collections.abc](https://docs.python.org/3/library/collections.abc.html) module in the [standard library](https://docs.python.org/3/library/index.html) defines a variety of **abstract base classes** (ABCs). We saw ABCs already in [Chapter 5](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/05_numbers_00_lecture.ipynb#The-Numerical-Tower), where we use the ones from the [numbers](https://docs.python.org/3/library/numbers.html) module in the [standard library](https://docs.python.org/3/library/index.html) to classify Python's numeric data types according to mathematical ideas. Now, we take the ABCs from the [collections.abc](https://docs.python.org/3/library/collections.abc.html) module to classify the data types in this chapter and [Chapter 8](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/08_mappings_00_lecture.ipynb) according to their behavior in various contexts.\n",
"Among others, one commonality between the two is that we may loop over them with the `for` statement. So, in the context of iteration, both exhibit the *same* behavior."
"In [Chapter 4](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/04_iteration_00_lecture.ipynb#Containers-vs.-Iterables), we referred to such types as *iterables*. That is *not* a proper [English](https://dictionary.cambridge.org/spellcheck/english-german/?q=iterable) word, even if it may sound like one at first sight. Yet, it is an official term in the Python world formalized with the `Iterable` ABC in the [collections.abc](https://docs.python.org/3/library/collections.abc.html) module.\n",
"For the data science practitioner, it is worthwhile to know such terms as, for example, the documentation on the [built-ins](https://docs.python.org/3/library/functions.html) uses them extensively: In simple words, any built-in that takes an argument called \"*iterable*\" may be called with *any* object that supports being looped over. Already familiar [built-ins](https://docs.python.org/3/library/functions.html) include [enumerate()](https://docs.python.org/3/library/functions.html#enumerate), [sum()](https://docs.python.org/3/library/functions.html#sum), or [zip()](https://docs.python.org/3/library/functions.html#zip). So, they do *not* require the argument to be of a certain data type (e.g., `list`); instead, any *iterable* type works."
"As in the context of *goose typing* in [Chapter 5](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/05_numbers_00_lecture.ipynb#Goose-Typing), we can use ABCs with the built-in [isinstance()](https://docs.python.org/3/library/functions.html#isinstance) function to check if an object supports a behavior.\n",
"\u001b[0;32m<ipython-input-9-8249bf25b5b7>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;32mfor\u001b[0m \u001b[0mdigit\u001b[0m \u001b[0;32min\u001b[0m \u001b[0;36m999\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdigit\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mTypeError\u001b[0m: 'int' object is not iterable"
]
}
],
"source": [
"for digit in 999:\n",
" print(digit)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Most of the data types in this and the next chapter exhibit three [orthogonal](https://en.wikipedia.org/wiki/Orthogonality) (i.e., \"independent\") behaviors, formalized by ABCs in the [collections.abc](https://docs.python.org/3/library/collections.abc.html) module as:\n",
"Alternatively, we could also check if `numbers` and `text` are `Container` types with [isinstance()](https://docs.python.org/3/library/functions.html#isinstance)."
"Numeric objects do *not* \"contain\" references to other objects, and that is why they are considered \"flat\" data types. The `in` operator raises a `TypeError`. Conceptually speaking, Python views numeric types as \"wholes\" without any \"parts.\""
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"isinstance(999, abc.Container)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"ename": "TypeError",
"evalue": "argument of type 'int' is not iterable",
"Analogously, being `Sized` types, we can pass `numbers` and `text` as the argument to the built-in [len()](https://docs.python.org/3/library/functions.html#len) function and obtain \"meaningful\" results. The exact meaning depends on the data type: For `numbers`, [len()](https://docs.python.org/3/library/functions.html#len) tells us how many elements are in the `list` object; for `text`, it tells us how many [Unicode characters](https://en.wikipedia.org/wiki/Unicode) make up the `str` object. *Abstractly* speaking, both data types exhibit the *same* behavior of *finiteness*."
"On the contrary, even though `999` consists of three digits for humans, numeric objects in Python have no concept of a \"size\" or \"length,\" and the [len()](https://docs.python.org/3/library/functions.html#len) function raises a `TypeError`."
"\u001b[0;32m<ipython-input-21-ae09c1aa5305>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m999\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m: object of type 'int' has no len()"
]
}
],
"source": [
"len(999)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"These three behaviors are so essential that whenever they coincide for a data type, it is called a **collection**, formalized with the `Collection` ABC. That is where the [collections.abc](https://docs.python.org/3/library/collections.abc.html) module got its name from: It summarizes all ABCs related to collections; in particular, it defines a hierarchy of specialized kinds of collections.\n",
"Without going into too much detail, one way to read the summary table at the beginning of the [collections.abc](https://docs.python.org/3/library/collections.abc.html) module's documention is as follows: The first column, titled \"ABC\", lists all collection-related ABCs in Python. The second column, titled \"Inherits from,\" indicates if the idea behind the ABC is *original* (e.g., the first row with the `Container` ABC has an empty \"Inherits from\" column) or a *combination* (e.g., the row with the `Collection` ABC has `Sized`, `Iterable`, and `Container` in the \"Inherits from\" column). The third and fourth columns list the methods that come with a data type following an ABC. We keep ignoring the methods named in the dunder style for now.\n",
"\n",
"So, let's confirm that both `numbers` and `text` are collections."
"They share one more common behavior: When looping over them, we can *predict* the *order* of the elements or characters. The ABC in the [collections.abc](https://docs.python.org/3/library/collections.abc.html) module corresponding to this behavior is `Reversible`. While sounding unintuitive at first, it is evident that if something is reversible, it must have a forward order, to begin with.\n",
"The [reversed()](https://docs.python.org/3/library/functions.html#reversed) built-in allows us to loop over the elements or characters in reverse order."
"Collections that exhibit this fourth behavior are referred to as **sequences**, formalized with the `Sequence` ABC in the [collections.abc](https://docs.python.org/3/library/collections.abc.html) module."
"Most of the data types introduced in the remainder of this chapter are sequences. Nevertheless, we also look at some data types that are neither collections nor sequences but still useful to model sequential data in practice.\n",
"\n",
"In Python-related documentations, the terms collection and sequence are heavily used, and the data science practitioner should always think of them in terms of the three or four behaviors they exhibit.\n",
"\n",
"Data types that are collections but not sequences are covered in Chapter 8."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The `list` Type"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"As already seen multiple times, to create a `list` object, we use the *literal notation* and list all elements within brackets `[` and `]`."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"empty = []"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"simple = [40, 50]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"The elements do *not* need to be of the *same* type, and `list` objects may also be **nested**."
"[PythonTutor](http://www.pythontutor.com/visualize.html#code=empty%20%3D%20%5B%5D%0Asimple%20%3D%20%5B40,%2050%5D%0Anested%20%3D%20%5Bempty,%2010,%2020.0,%20%22Thirty%22,%20simple%5D&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) shows how `nested` holds references to the `empty` and `simple` objects. Technically, it holds three more references to the `10`, `20.0`, and `\"Thirty\"` objects as well. However, to simplify the visualization, these three objects are shown right inside the `nested` object. That may be done because they are immutable and \"flat\" data types. In general, the $0$s and $1$s inside a `list` object in memory always constitute references to other objects only."
"Alternatively, we use the [list()](https://docs.python.org/3/library/functions.html#func-list) built-in to create a `list` object out of an iterable we pass to it as the argument.\n",
"For example, we can wrap the [range()](https://docs.python.org/3/library/functions.html#func-range) built-in with [list()](https://docs.python.org/3/library/functions.html#func-list): As described in [Chapter 4](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/04_iteration_00_lecture.ipynb#Containers-vs.-Iterables), `range` objects, like `range(1, 13)` below, are iterable and generate `int` objects \"on the fly\" (i.e., one by one). The [list()](https://docs.python.org/3/library/functions.html#func-list) around it acts like a `for`-loop and **materializes** twelve `int` objects in memory that then become the elements of the newly created `list` object. [PythonTutor](http://www.pythontutor.com/visualize.html#code=r%20%3D%20range%281,%2013%29%0Al%20%3D%20list%28range%281,%2013%29%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) shows this difference visually."
"Beware of passing a `range` object over a \"big\" horizon as the argument to [list()](https://docs.python.org/3/library/functions.html#func-list) as that may lead to a `MemoryError` and the computer crashing."
"\u001b[0;32m<ipython-input-38-dbb397cbb3b9>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mlist\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrange\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m999_999_999_999\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"As another example, we create a `list` object from a `str` object, which is iterable, as well. Then, the individual characters become the elements of the new `list` object!"
"`list` objects are *sequences*. To reiterate that, we briefly summarize the *four* behaviors of a sequence and provide some more `list`-specific details below:"
"The \"length\" of `nested` is *five* because `empty` and `simple` count as *one* element each. In other words, `nested` holds five references to other objects, two of which are `list` objects."
"With a `for`-loop, we can iterate over all elements in a *predictable* order, forward or backward. As `list` objects hold *references* to other *objects*, these have an *indentity* and may even be of *different* types; however, the latter observation is rarely, if ever, useful in practice."
"The `in` operator checks if a given object is \"contained\" in a `list` object. It uses the `==` operator behind the scenes (i.e., *not* the `is` operator) conducting a **[linear search](https://en.wikipedia.org/wiki/Linear_search)**: So, Python implicitly loops over *all* elements and only stops prematurely if an element evaluates equal to the searched object. A linear search may, therefore, be relatively *slow* for big `list` objects."
"Because of the *predictable* order and the *finiteness*, each element in a sequence can be labeled with a unique *index*, an `int` object in the range $0 \\leq \\text{index} < \\lvert \\text{sequence} \\rvert$.\n",
"Brackets, `[` and `]`, are the literal syntax for accessing individual elements of any sequence type. In this book, we also call them the *indexing operator* in this context."
"\u001b[0;32m<ipython-input-47-cb216f4a067e>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mnested\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m5\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mIndexError\u001b[0m: list index out of range"
]
}
],
"source": [
"nested[5]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Negative indices are used to count in reverse order from the end of a sequence, and brackets may be chained to access nested objects. So, to access the `50` inside `simple` via the `nested` object, we write `nested[-1][1]`."
"Slicing `list` objects works analogously to slicing `str` objects: We use the literal syntax with either one or two colons `:` inside the brackets `[]` to separate the *start*, *stop*, and *step* values. Slicing creates a *new* `list` object with the elements chosen from the original one.\n",
"\n",
"For example, to obtain the three elements in the \"middle\" of `nested`, we slice from `1` (including) to `4` (excluding)."
"The literal notation with the colons `:` is *syntactic sugar*. It saves us from using the [slice()](https://docs.python.org/3/library/functions.html#slice) built-in to create `slice` objects. [slice()](https://docs.python.org/3/library/functions.html#slice) takes *start*, *stop*, and *step* arguments in the same way as the familiar [range()](https://docs.python.org/3/library/functions.html#func-range), and the `slice` objects it creates are used just as *indexes* above.\n",
"\n",
"In most cases, the literal notation is more convenient to use; however, with `slice` objects, we can give names to slices and reuse them across several sequences."
"If not passed to [slice()](https://docs.python.org/3/library/functions.html#slice), these attributes default to `None`. That is why the cell below has no output."
"At first glance, `nested` and `nested_copy` seem to cause no pain. For `list` objects, the comparison operator `==` goes over the elements in both operands in a pairwise fashion and checks if they all evaluate equal (cf., the \"*List Comparison*\" section below for more details).\n",
"However, as [PythonTutor](http://pythontutor.com/visualize.html#code=nested%20%3D%20%5B%5B%5D,%2010,%2020.0,%20%22Thirty%22,%20%5B40,%2050%5D%5D%0Anested_copy%20%3D%20nested%5B%3A%5D&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) reveals, only the *references* to the elements are copied, and not the objects in `nested` themselves! Because of that, `nested_copy` is a so-called **[shallow copy](https://en.wikipedia.org/wiki/Object_copying#Shallow_copy)** of `nested`.\n",
"We could also see this with the [id()](https://docs.python.org/3/library/functions.html#id) function: The respective first elements in both `nested` and `nested_copy` are the *same* object, namely `empty`. So, we have three ways of accessing the *same* address in memory. Also, we say that `nested` and `nested_copy` partially share the *same* state."
"Knowing this becomes critical if the elements in a `list` object are mutable objects (i.e., we can change them *in place*), and this is the case with `nested` and `nested_copy`, as we see in the next section on \"*Mutability*\".\n",
"\n",
"As both the original `nested` object and its copy reference the *same* `list` objects in memory, any changes made to them are visible to both! Because of that, working with shallow copies can easily become confusing.\n",
"Instead of a shallow copy, we could also create a so-called **[deep copy](https://en.wikipedia.org/wiki/Object_copying#Deep_copy)** of `nested`: Then, the copying process recursively follows every reference in a nested data structure and creates copies of *every* object found.\n",
"To explicitly create shallow or deep copies, the [copy](https://docs.python.org/3/library/copy.html) module in the [standard library](https://docs.python.org/3/library/index.html) provides two functions, [copy()](https://docs.python.org/3/library/copy.html#copy.copy) and [deepcopy()](https://docs.python.org/3/library/copy.html#copy.deepcopy). We must always remember that slicing creates *shallow* copies only."
"Now, the first elements of `nested` and `nested_deep_copy` are *different* objects, and [PythonTutor](http://pythontutor.com/visualize.html#code=import%20copy%0Anested%20%3D%20%5B%5B%5D,%2010,%2020.0,%20%22Thirty%22,%20%5B40,%2050%5D%5D%0Anested_deep_copy%20%3D%20copy.deepcopy%28nested%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) shows that there are *six* `list` objects in memory."
"As this [StackOverflow question](https://stackoverflow.com/questions/184710/what-is-the-difference-between-a-deep-copy-and-a-shallow-copy) shows, understanding shallow and deep copies is a common source of confusion independent of the programming language."
"In contrast to `str` objects, `list` objects are *mutable*: We may assign new elements to indices or slices and also remove elements. That changes the *references* in a `list` object. In general, if an object is *mutable*, we say that it may be changed *in place*."
"That is precisely the confusion we talked about above when we said that `nested_copy` is a *shallow* copy of `nested`. [PythonTutor](http://pythontutor.com/visualize.html#code=nested%20%3D%20%5B%5B%5D,%2010,%2020.0,%20%22Thirty%22,%20%5B40,%2050%5D%5D%0Anested_copy%20%3D%20nested%5B%3A%5D%0Anested%5B%3A4%5D%20%3D%20%5B100,%20100,%20100%5D%0Anested_copy%5B-1%5D%5B%3A%5D%20%3D%20%5B1,%202,%203%5D&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) shows how both reference the *same* nested `list` object that is changed *in place* from `[40, 50]` into `[1, 2, 3]`.\n",
"Mutability for sequences is formalized by the `MutableSequence` ABC in the [collections.abc](https://docs.python.org/3/library/collections.abc.html) module.\n",
"\n",
"So, we can als \"ask\" Python if `nested` is mutable."
"The `list` type is an essential data structure in any real-world Python application, and many typical `list`-related algorithms from computer science theory are already built into it at the C level (cf., the [documentation](https://docs.python.org/3/library/stdtypes.html#mutable-sequence-types) or the [tutorial](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists) for a full overview; unfortunately, not all methods have direct links). So, understanding and applying the built-in methods of the `list` type not only speeds up the development process but also makes programs significantly faster.\n",
"In contrast to the `str` type's methods in [Chapter 6](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/06_text_00_lecture.ipynb#String-Methods) (e.g., [upper()](https://docs.python.org/3/library/stdtypes.html#str.upper) or [lower()](https://docs.python.org/3/library/stdtypes.html#str.lower)), the `list` type's methods that mutate an object do so *in place*. That means they *never* create *new* `list` objects and return `None` to indicate that. So, we must *never* assign the return value of `list` methods to the variable holding the list!\n",
"With the `extend()` method, we may also append multiple elements provided by an iterable. Here, the iterable is a `list` object itself holding two `str` objects."
"Similar to `append()`, we may add a new element at an arbitrary position with the `insert()` method. `insert()` takes two arguments, an *index* and the element to be inserted."
"`list` objects may be sorted *in place* with the [sort()](https://docs.python.org/3/library/stdtypes.html#list.sort) method. That is different from the built-in [sorted()](https://docs.python.org/3/library/functions.html#sorted) function that takes any *finite* and *iterable* object and returns a *new* `list` object with the iterable's elements sorted!"
"To sort in reverse order, we pass a keyword-only `reverse=True` argument to either the [sort()](https://docs.python.org/3/library/stdtypes.html#list.sort) method or the [sorted()](https://docs.python.org/3/library/functions.html#sorted) function."
"The [sort()](https://docs.python.org/3/library/stdtypes.html#list.sort) method and the [sorted()](https://docs.python.org/3/library/functions.html#sorted) function sort the elements in `names` in alphabetical order, forward or backward. However, that does *not* hold in general.\n",
"We mention above that `list` objects may contain objects of *any* type and even of *mixed* types. Because of that, the sorting is **[delegated](https://en.wikipedia.org/wiki/Delegation_(object-oriented_programming))** to the elements in a `list` object. In a way, Python \"asks\" the elements in a `list` object to sort themselves. As `names` contains only `str` objects, they are sorted according the the comparison rules explained in [Chapter 6](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/06_text_00_lecture.ipynb#String-Comparison).\n",
"To customize the sorting, we pass a keyword-only `key` argument to [sort()](https://docs.python.org/3/library/stdtypes.html#list.sort) or [sorted()](https://docs.python.org/3/library/functions.html#sorted), which must be a `function` object accepting *one* positional argument. Then, the elements in the `list` object are passed to that one by one, and the return values are used as the **sort keys**. The `key` argument is also a popular use case for `lambda` expressions.\n",
"For example, to sort `names` not by alphabet but by the names' lengths, we pass in a reference to the built-in [len()](https://docs.python.org/3/library/functions.html#len) function as `key=len`. Note that there are *no* parentheses after `len`!"
"If two names have the same length, their relative order is kept as is. That is why `\"Karl\"` comes before `\"Carl\" ` below. A [sorting algorithm](https://en.wikipedia.org/wiki/Sorting_algorithm) with that property is called **[stable](https://en.wikipedia.org/wiki/Sorting_algorithm#Stability)**.\n",
"Sorting is an important topic in programming, and we refer to the official [HOWTO](https://docs.python.org/3/howto/sorting.html) for a more comprehensive introduction."
"`sort(reverse=True)` is different from the `reverse()` method. Whereas the former applies some sorting rule in reverse order, the latter simply reverses the elements in a `list` object."
"The `pop()` method removes the *last* element from a `list` object *and* returns it. Below we **capture** the `removed` element to show that the return value is not `None` as with all the methods introduced so far."
"Instead of removing an element by its index, we can also remove it by its value with the `remove()` method. Behind the scenes, Python then compares the object to be removed, `\"Peter\"` in the example, sequentially to each element with the `==` operator and removes the *first* one that evaluates equal to it. `remove()` does *not* return the removed element."
"\u001b[0;32m<ipython-input-114-6bb75d50cf06>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mnames\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mremove\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Peter\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"`list` objects implement an `index()` method that returns the position of the first element that compares equal to its argument. It fails *loudly* with a `ValueError` if no element compares equal."
"\u001b[0;32m<ipython-input-117-8c741a6680be>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mnames\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mindex\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Karl\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mValueError\u001b[0m: 'Karl' is not in list"
"`copy()` creates a *shallow* copy. So, `names.copy()` below does the same as taking a full slice with `names[:]`, and the caveats from above apply, too."
"Many methods introduced in this section are mentioned in the [collections.abc](https://docs.python.org/3/library/collections.abc.html) module's documentation as well: While the `index()` and `count()` methods come with any data type that is a `Sequence`, the `append()`, `extend()`, `insert()`, `reverse()`, `pop()`, and `remove()` methods are part of any `MutableSequence` type. The `sort()`, `copy()`, and `clear()` methods are `list`-specific.\n",
"\n",
"So, being a sequence does not only imply the four *behaviors* specified above, but also means that a data type comes with certain standardized methods."
"As with `str` objects, the `+` and `*` operators are overloaded for concatenation and always return a *new* `list` object. The references in this newly created `list` object reference the *same* objects as the two original `list` objects. So, the same caveat as with *shallow* copies from above applies!"
"Besides being an operator, the `*` symbol has a second syntactical use, as explained in [PEP 3132](https://www.python.org/dev/peps/pep-3132/) and [PEP 448](https://www.python.org/dev/peps/pep-0448/): It implements what is called **iterable unpacking**. It is *not* an operator syntactically but a notation that Python reads as a literal.\n",
"\n",
"In the example, Python interprets the expression as if the elements of the iterable `names` were placed between `\"Achim\"` and `\"Xavier\"` one by one. So, we do not obtain a nested but a *flat* list."
"The relational operators also work with `list` objects; yet another example of operator overloading.\n",
"\n",
"Comparison is made in a pairwise fashion until the first pair of elements does not evaluate equal or one of the `list` objects ends. The exact comparison rules depend on the elements and not the `list` objects. As with [sort()](https://docs.python.org/3/library/stdtypes.html#list.sort) or [sorted()](https://docs.python.org/3/library/functions.html#sorted) above, comparison is *delegated* to the objects to be compared, and Python \"asks\" the elements in the two `list` objects to compare themselves. Usually, all elements are of the *same* type, and comparison is straightforward."
"If two `list` objects have a different number of elements and all overlapping elements compare equal, the shorter `list` object is considered \"smaller.\" That rule is a common cause for *semantic* errors in a program."
"As `list` objects are mutable, the caller of a function can see the changes made to a `list` object passed to the function as an argument. That is often a surprising *side effect* and should be avoided.\n",
"\n",
"As an example, consider the `add_xyz()` function."
"While this function is being executed, two variables, namely `letters` in the global scope and `arg` inside the function's local scope, reference the *same* `list` object in memory. Furthermore, the passed in `arg` is also the return value.\n",
"\n",
"So, after the function call, `letters_with_xyz` and `letters` are **aliases** as well, referencing the *same* object."
"A better practice is to first create a copy of `arg` within the function that is then modified and returned. If we are sure that `arg` contains immutable elements only, we get away with a shallow copy. The downside of this approach is the higher amount of memory necessary.\n",
"\n",
"The revised `add_xyz()` function below is more natural to reason about as it does *not* modify the passed in `arg` internally. This approach is following the **[functional programming](https://en.wikipedia.org/wiki/Functional_programming)** paradigm that is going through a \"renaissance\" currently. Two essential characteristics of functional programming are that a function *never* changes its inputs and *always* returns the same output given the same inputs.\n",
"For a beginner, it is probably better to stick to this idea and not change any arguments as the original `add_xyz()` above. However, functions that modify and return the argument passed in are an important aspect of object-oriented programming, as explained in Chapter 10."
"If we want to modify the argument passed in, it is best to return `None` and not `arg`, as does the final version of `add_xyz()` below. Then, the user of our function cannot accidentally create two aliases to the same object. That is also why the list methods above all return `None`."
"Functions that only work on the argument passed in are called **modifiers**. Their primary purpose is to change the **state** of the argument. On the contrary, functions that have *no* side effects on the arguments are said to be **pure**."
"... we must use a *trailing comma* to create a `tuple` object holding one element. If we forget the comma, the parentheses are interpreted as the grouping operator and effectively useless!"
"Alternatively, we may use the [tuple()](https://docs.python.org/3/library/functions.html#func-tuple) built-in that takes any iterable as its argument and creates a new `tuple` from its elements."
"Most operations involving `tuple` objects work in the same way as with `list` objects. The main difference is that `tuple` objects are *immutable*. So, if our program does not depend on mutability, we may and should use `tuple` and not `list` objects to model sequential data. That way, we avoid the pitfalls seen above.\n",
"\n",
"`tuple` objects are *sequences* exhibiting the familiar *four* behaviors."
"We can verify the immutability with the `MutableSequence` ABC from the [collections.abc](https://docs.python.org/3/library/collections.abc.html) module: [isinstance()](https://docs.python.org/3/library/functions.html#isinstance) returns `False`. So, a data type that is a `Sequence` may be mutable or not. If it is a `MutableSequence`, it is mutable. If it is *not* a `MutableSequence`, it is *immutable*. There is *no* `ImmutableSequence` ABC."
"The `+` and `*` operators work with `tuple` objects as well. However, we should *not* do that as the whole point of immutability is to *not* mutate an object.\n",
"\n",
"So, instead of writing something like below, we should use a `list` object and call its `append()` method."
"Being immutable, `tuple` objects only provide the `count()` and `index()` methods of `Sequence` types. The `append()`, `extend()`, `insert()`, `reverse()`, `pop()`, and `remove()` methods of `MutableSequence` types are *not* available. The same holds for the `list`-specific methods `sort()`, `copy()`, and `clear()`."
"While `tuple` objects are immutable, this only relates to the references they hold. If a `tuple` object contains mutable objects, the entire nested structure is *not* immutable as a whole.\n",
"\n",
"Consider the following stylized example `not_immutable`: It contains *three* elements, `1`, `[2, ..., 11]`, and `12`, and the elements of the nested `list` object may be changed. While it is not practical to mix data types in a `tuple` object that is used as an \"immutable list,\" we want to make the point that the mere usage of the `tuple` type does *not* guarantee a nested object to be immutable as a whole."
"In the \"*List Operations*\" section above, the `*` symbol **unpacks** the elements of a `list` object into another one. This idea of *iterable unpacking* is built into Python at various places, even *without* the `*` symbol.\n",
"For example, we may write variables on the left-hand side of a `=` statement in a `tuple` style. Then, any *finite* iterable on the right-hand side is unpacked. So, `numbers` is unpacked into *twelve* variables below."
"Having to type twelve variables on the left is already tedious. Furthermore, if the iterable on the right yields a number of elements *different* from the number of variables, we get a `ValueError`."
"So, to make iterable unpacking useful, we prepend the `*` symbol to *one* of the variables on the left: That variable then becomes a `list` object holding the elements not captured by the other variables. We say that the excess elements from the iterable are **packed** into this variable.\n",
"\n",
"For example, let's get the `first` and `last` element of `numbers` and collect the rest in `middle`."
]
},
{
"cell_type": "code",
"execution_count": 202,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"first, *middle, last = numbers"
]
},
{
"cell_type": "code",
"execution_count": 203,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"7"
]
},
"execution_count": 203,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"first"
]
},
{
"cell_type": "code",
"execution_count": 204,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"[11, 8, 5, 3, 12, 2, 6, 9, 10, 1]"
]
},
"execution_count": 204,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"middle"
]
},
{
"cell_type": "code",
"execution_count": 205,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"4"
]
},
"execution_count": 205,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"last"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"If we do not need the `middle` elements, we go with the underscore `_` convention and \"throw\" them away."
]
},
{
"cell_type": "code",
"execution_count": 206,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"first, *_, last = numbers"
]
},
{
"cell_type": "code",
"execution_count": 207,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"7"
]
},
"execution_count": 207,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"first"
]
},
{
"cell_type": "code",
"execution_count": 208,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"4"
]
},
"execution_count": 208,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"last"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"We already used unpacking before this section without knowing it. Whenever we write a `for`-loop over the [zip()](https://docs.python.org/3/library/functions.html#zip) built-in, that generates a new `tuple` object in each iteration, which we unpack by listing several loop variables.\n",
"\n",
"So, the `name, position` acts like a left-hand side of an `=` statement and unpacks the `tuple` objects generated from \"zipping\" the `names` list and the `positions` tuple together."
"Without unpacking, [zip()](https://docs.python.org/3/library/functions.html#zip) generates a series of `tuple` objects."
]
},
{
"cell_type": "code",
"execution_count": 211,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'tuple'> ('Berthold', 'goalkeeper')\n",
"<class 'tuple'> ('Oliver', 'defender')\n",
"<class 'tuple'> ('Carl', 'midfielder')\n"
]
}
],
"source": [
"for pair in zip(names, positions):\n",
" print(type(pair), pair, sep=\" \")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Unpacking also works for nested objects. Below, we wrap [zip()](https://docs.python.org/3/library/functions.html#zip) with the [enumerate()](https://docs.python.org/3/library/functions.html#enumerate) built-in to have an index variable `i` inside the `for`-loop. In each iteration, a `tuple` object consisting of `i` and another `tuple` object is created. The inner one then holds the `name` and `position`."
]
},
{
"cell_type": "code",
"execution_count": 212,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0 -> Berthold is a goalkeeper\n",
"1 -> Oliver is a defender\n",
"2 -> Carl is a midfielder\n"
]
}
],
"source": [
"for i, (name, position) in enumerate(zip(names, positions)):\n",
" print(i, \"->\", name, \"is a\", position)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#### Swapping Variables"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"A popular use case of unpacking is **swapping** two variables.\n",
"\n",
"Consider `a` and `b` below."
]
},
{
"cell_type": "code",
"execution_count": 213,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"a = 0\n",
"b = 1"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Without unpacking, we must use a temporary variable `temp` to swap `a` and `b`."
]
},
{
"cell_type": "code",
"execution_count": 214,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"temp = a\n",
"a = b\n",
"b = temp"
]
},
{
"cell_type": "code",
"execution_count": 215,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"(1, 0)"
]
},
"execution_count": 215,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a, b"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"With unpacking, the solution is more elegant, and also a bit faster as well. *All* expressions on the right-hand side are evaluated *before* any assignment takes place."
"Unpacking allows us to rewrite the iterative `fibonacci()` function from [Chapter 4](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/04_iteration_00_lecture.ipynb#\"Hard-at-first-Glance\"-Example:-Fibonacci-Numbers-%28revisited%29) in a concise way, now also supporting *goose typing* with the [numbers](https://docs.python.org/3/library/numbers.html) module from the [standard library](https://docs.python.org/3/library/index.html)."
"\u001b[0;32m<ipython-input-223-c98882eeae5f>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mfibonacci\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m12.3\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m<ipython-input-220-b3cde1c6219f>\u001b[0m in \u001b[0;36mfibonacci\u001b[0;34m(i)\u001b[0m\n\u001b[1;32m 15\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mi\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnumbers\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mReal\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 16\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mi\u001b[0m \u001b[0;34m!=\u001b[0m \u001b[0mint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mi\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 17\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mTypeError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"i is not an integer-like value; it has decimals\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 18\u001b[0m \u001b[0mi\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mi\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 19\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mTypeError\u001b[0m: i is not an integer-like value; it has decimals"
]
}
],
"source": [
"fibonacci(12.3)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#### Function Definitions & Calls"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"The concepts of packing and unpacking are also helpful when writing and using functions.\n",
"\n",
"For example, let's look at the `product()` function below. Its implementation suggests that `args` must be a sequence type. Otherwise, it would not make sense to index into it with `[0]` or take a slice with `[1:]`. In line with the function's name, the `for`-loop multiplies all elements of the `args` sequence. So, what does the `*` do in the header line, and what is the exact data type of `args`?\n",
"\n",
"The `*` is again *not* an operator in this context but a special syntax that makes Python *pack* all *positional* arguments passed to `product()` into a single `tuple` object called `args`."
]
},
{
"cell_type": "code",
"execution_count": 224,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"def product(*args):\n",
" \"\"\"Multiply all arguments.\"\"\"\n",
" result = args[0]\n",
"\n",
" for arg in args[1:]:\n",
" result *= arg\n",
"\n",
" return result"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"So, we can pass an *arbitrary* (i.e., also none) number of *positional* arguments to `product()`.\n",
"\n",
"The product of just one number is the number itself."
]
},
{
"cell_type": "code",
"execution_count": 225,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"42"
]
},
"execution_count": 225,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"product(42)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Passing in several numbers works as expected."
]
},
{
"cell_type": "code",
"execution_count": 226,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"100"
]
},
"execution_count": 226,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"product(2, 5, 10)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"However, this implementation of `product()` needs *at least* one argument passed in due to the expression `args[0]` used internally. Otherwise, we see a *runtime* error, namely an `IndexError`. We emphasize that this error is *not* caused in the header line."
"\u001b[0;32m<ipython-input-227-640e0c632b8d>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mproduct\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mIndexError\u001b[0m: tuple index out of range"
]
}
],
"source": [
"product()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Another downside of this implementation is that we can easily generate *semantic* errors: For example, if we pass in an iterable object like the `one_hundred` list, *no* exception is raised. However, the return value is also not a numeric object as we expect. The reason for this is that during the function call, `args` becomes a `tuple` object holding *one* element, which is `one_hundred`, a `list` object. So, we created a nested structure by accident."
]
},
{
"cell_type": "code",
"execution_count": 228,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"one_hundred = [2, 5, 10]"
]
},
{
"cell_type": "code",
"execution_count": 229,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"[2, 5, 10]"
]
},
"execution_count": 229,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"product(one_hundred)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"This error does not occur if we unpack `one_hundred` upon passing it as the argument."
]
},
{
"cell_type": "code",
"execution_count": 230,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"100"
]
},
"execution_count": 230,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"product(*one_hundred)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"That is the equivalent of writing out the following tedious expression. Yet, that does *not* scale for iterables with many elements in them."
"In the \"*Packing & Unpacking with Functions*\" [exercise](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/07_sequences_02_exercises.ipynb#Packing-&-Unpacking-with-Functions) at the end of this chapter, we look at `product()` in more detail.\n",
"While we needed to unpack `one_hundred` above to avoid the semantic error, unpacking an argument in a function call may also be a convenience in general.\n",
"\n",
"For example, to print the elements of `one_hundred` in one line, we need to use a `for` statement, until now. With unpacking, we get away *without* a loop."
]
},
{
"cell_type": "code",
"execution_count": 232,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2, 5, 10]\n"
]
}
],
"source": [
"print(one_hundred) # prints the tuple; we do not want that"
]
},
{
"cell_type": "code",
"execution_count": 233,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2 5 10 "
]
}
],
"source": [
"for number in one_hundred:\n",
" print(number, end=\" \")"
]
},
{
"cell_type": "code",
"execution_count": 234,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2 5 10\n"
]
}
],
"source": [
"print(*one_hundred) # replaces the for-loop"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### The `namedtuple` Type"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Above, we proposed the idea that `tuple` objects are like \"immutable lists.\" Often, however, we use `tuple` objects to represent a **record** of related **fields**. Then, each element has a *semantic* meaning (i.e., a descriptive name).\n",
"\n",
"As an example, think of a spreadsheet with information on students in a course. Each row represents a record and holds all the data associated with an individual student. The columns (e.g., matriculation number, first name, last name) are the fields that may come as *different* data types (e.g., `int` for the matriculation number, `str` for the names).\n",
"\n",
"A simple way of modeling a single student is as a `tuple` object, for example, `(123456, \"John\", \"Doe\")`. A disadvantage of this approach is that we must remember the order and meaning of the elements/fields in the `tuple` object.\n",
"\n",
"An example from a different domain is the representation of $(x, y)$-points in the $x$-$y$-plane. Again, we could use a `tuple` object like `current_position` below to model the point $(4, 2)$."
]
},
{
"cell_type": "code",
"execution_count": 235,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"current_position = (4, 2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"We implicitly assume that the first element represents the $x$ and the second the $y$ coordinate. While that follows intuitively from convention in math, we should at least add comments somewhere in the code to document this assumption.\n",
"A better way is to create a *custom* data type. While that is covered in depth in Chapter 10, the [collections](https://docs.python.org/3/library/collections.html) module in the [standard library](https://docs.python.org/3/library/index.html) provides a [namedtuple()](https://docs.python.org/3/library/collections.html#collections.namedtuple) **factory function** that creates \"simple\" custom data types on top of the standard `tuple` type."
"[namedtuple()](https://docs.python.org/3/library/collections.html#collections.namedtuple) takes two arguments. The first argument is the name of the data type. That could be different from the variable `Point` we use to refer to the new type, but in most cases it is best to keep them in sync. The second argument is a sequence with the field names as `str` objects. The names' order corresponds to the one assumed in `current_position`."
]
},
{
"cell_type": "code",
"execution_count": 237,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"Point = namedtuple(\"Point\", [\"x\", \"y\"])"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"The `Point` object is a so-called **class**. That is what it means if an object is of type `type`. It can be used as a **factory** to create *new* `tuple`-like objects of type `Point`."
"\u001b[0;32m<ipython-input-247-b30d0b930bac>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mcurrent_position\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mz\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mAttributeError\u001b[0m: 'Point' object has no attribute 'z'"
]
}
],
"source": [
"current_position.z"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"`current_position` continues to work like a `tuple` object! That is why we can use `namedtuple` as a replacement for `tuple`. The underlying implementations exhibit the *same* computational efficiencies and memory usages.\n",
"\n",
"For example, we can index into or loop over `current_position` as it is still a sequence."
"Whenever we process sequential data, most tasks can be classified into one of the three categories **map**, **filter**, or **reduce**. This paradigm has caught attention in recent years as it enables **[parallel computing](https://en.wikipedia.org/wiki/Parallel_computing)**, and this gets important when dealing with big amounts of data.\n",
"For example, let's square each element in `numbers` and add `1` to it. In essence, we apply the transformation $y := x^2 + 1$ expressed as the `transform()` function below."
"With the syntax we know so far, we revert to a `for`-loop that iteratively appends the transformed elements to the initially empty `transformed_numbers` list."
"As this kind of data processing is so common, Python provides the [map()](https://docs.python.org/3/library/functions.html#map) built-in. In its simplest usage form, it takes two arguments: A transformation function that takes one positional argument and an iterable.\n",
"\n",
"We call [map()](https://docs.python.org/3/library/functions.html#map) with the `transform()` function and the `numbers` list as the arguments and store the result in the variable `transformer` to inspect it."
"We might expect to get back a materialized sequence (i.e., all elements exist in memory), and a `list` object would feel the most natural because of the type of the `numbers` argument. However, `transformer` is an object of type `map`."
"Like `range` objects, `map` objects generate a series of objects \"on the fly\" (i.e., one by one), and we use the built-in [next()](https://docs.python.org/3/library/functions.html#next) function to obtain the next object in line. So, we should think of a `map` object as a \"rule\" stored in memory that only knows how to calculate the next object of possibly *infinitely* many.\n",
"\n",
"It is essential to understand that by creating a `map` object with the [map()](https://docs.python.org/3/library/functions.html#map) built-in, nothing happens in memory except the creation of the `map` object. In particular, no second `list` object derived from `numbers` is created. Also, we may view `range` objects as a special case of `map` objects: They are constrained to generating `int` objects only, and the *iterable* argument is replaced with *start*, *stop*, and *step* arguments."
"If we are sure that a `map` object generates a *finite* number of elements, we may materialize them into a `list` object with the [list()](https://docs.python.org/3/library/functions.html#func-list) built-in. In the example, this is the case as `transformer` is derived from a *finite* `list` object.\n",
"\n",
"In summary, instead of creating an empty list first and appending it in a `for`-loop as above, we write the following one-liner and obtain an equal `transformed_numbers` list."
"**Filtering** refers to the idea of creating a subset of a sequence with a **boolean filter** function that indicates if an element should be kept or not.\n",
"As filtering is also a common task, we use the [filter()](https://docs.python.org/3/library/functions.html#filter) built-in that returns an object of type `filter` stored in the `evens` variable."
"`evens` works like `transformer` above, and we use the built-in [next()](https://docs.python.org/3/library/functions.html#next) function to obtain the even numbers one by one. So, the \"next\" element in line is simply the next even `int` object the `filter` object encounters."
"As above, we must explicitly create a materialized `list` object with the [list()](https://docs.python.org/3/library/functions.html#func-list) built-in."
"Using the [map()](https://docs.python.org/3/library/functions.html#map) and [filter()](https://docs.python.org/3/library/functions.html#filter) built-ins, we can quickly switch the order: Filter first and then transform the remaining elements. This variant equals the \"*A simple Filter*\" example from [Chapter 4](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/04_iteration_00_lecture.ipynb#Example:-A-simple-Filter). On the contrary, code with `for`-loops and `if` statements is more tedious to adapt. Additionally, `map` and `filter` objects are optimized at the C level and, therefore, a lot faster as well."
"Other straightforward examples are the built-in [min()](https://docs.python.org/3/library/functions.html#min) or [max()](https://docs.python.org/3/library/functions.html#max) functions."
"[sum()](https://docs.python.org/3/library/functions.html#sum), [min()](https://docs.python.org/3/library/functions.html#min), and [max()](https://docs.python.org/3/library/functions.html#max) can be regarded as special cases.\n",
"\n",
"The generic way of reducing a sequence is to apply a function of *two* arguments on a rolling horizon: Its first argument is the reduction of the elements processed so far, and the second the next element to be reduced.\n",
"\n",
"For illustration, let's replicate [sum()](https://docs.python.org/3/library/functions.html#sum) as such a function, called `add()`. Its implementation only adds two numbers."
"... and loop over all *but* the first element it generates. That we capture separately as the initial `result` with the [next()](https://docs.python.org/3/library/functions.html#next) function. So, `map` objects must be *iterable* as we may loop over them.\n",
"\n",
"We know that `evens_transformed` generates *six* elements. That is why we see *five* growing `result` values resembling a [cumulative sum](http://mathworld.wolfram.com/CumulativeSum.html)."
"The [reduce()](https://docs.python.org/3/library/functools.html#functools.reduce) function in the [functools](https://docs.python.org/3/library/functools.html) module in the [standard library](https://docs.python.org/3/library/index.html) provides more convenience replacing the `for`-loop. It takes two arguments in the same way as the [map()](https://docs.python.org/3/library/functions.html#map) and [filter()](https://docs.python.org/3/library/functions.html#filter) built-ins.\n",
"\n",
"[reduce()](https://docs.python.org/3/library/functools.html#functools.reduce) is **[eager](https://en.wikipedia.org/wiki/Eager_evaluation)** meaning that all computations implied by the contained `map` and `filter` \"rules\" are executed right away, and the code cell returns `370`. On the contrary, [map()](https://docs.python.org/3/library/functions.html#map) and [filter()](https://docs.python.org/3/library/functions.html#filter) create **[lazy](https://en.wikipedia.org/wiki/Lazy_evaluation)** `map` and `filter` objects, and we have to use the [next()](https://docs.python.org/3/library/functions.html#next) function to obtain the elements."
"[map()](https://docs.python.org/3/library/functions.html#map), [filter()](https://docs.python.org/3/library/functions.html#filter), and [reduce()](https://docs.python.org/3/library/functools.html#functools.reduce) take a `function` object as their first argument, and we defined `transform()`, `is_even()`, and `add()` to be used precisely for that.\n",
"\n",
"Often, such functions are used *only once* in a program. However, the primary purpose of functions is to *reuse* them. In such cases, it makes more sense to define them \"anonymously\" right at the position where the first argument goes.\n",
"As mentioned in [Chapter 2](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/02_functions_00_lecture.ipynb#Anonymous-Functions), Python provides `lambda` expressions to create `function` objects *without* a name referencing them.\n",
"With the new concepts in this section, we can rewrite the entire example in just a few lines of code *without* any `for`, `if`, and `def` statements. The resulting code is concise, easy to read, quick to modify, and even faster in execution. Most importantly, it is optimized to handle big amounts of data as *no* temporary `list` objects are materialized in memory."
"If `numbers` comes as a sorted sequence of whole numbers, we may use the [range()](https://docs.python.org/3/library/functions.html#func-range) built-in and get away *without any* `list` object in memory at all!"
"PythonTutor visualizes the differences in the number of computational steps and memory usage:\n",
"- [Version 1](http://pythontutor.com/visualize.html#code=def%20is_even%28element%29%3A%0A%20%20%20%20if%20element%20%25%202%20%3D%3D%200%3A%0A%20%20%20%20%20%20%20%20return%20True%0A%20%20%20%20return%20False%0A%0Adef%20transform%28element%29%3A%0A%20%20%20%20return%20%28element%20**%202%29%20%2B%201%0A%0Anumbers%20%3D%20list%28range%281,%2013%29%29%0A%0Aevens%20%3D%20%5B%5D%0Afor%20number%20in%20numbers%3A%0A%20%20%20%20if%20is_even%28number%29%3A%0A%20%20%20%20%20%20%20%20evens.append%28number%29%0A%0Atransformed%20%3D%20%5B%5D%0Afor%20number%20in%20evens%3A%0A%20%20%20%20transformed.append%28transform%28number%29%29%0A%0Aresult%20%3D%20sum%28transformed%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false): With `for`-loops, `if` statements, and named functions -> **116** steps\n",
"- [Version 2](http://pythontutor.com/visualize.html#code=numbers%20%3D%20range%281,%2013%29%0Aevens%20%3D%20filter%28lambda%20x%3A%20x%20%25%202%20%3D%3D%200,%20numbers%29%0Atransformed%20%3D%20map%28lambda%20x%3A%20%28x%20**%202%29%20%2B%201,%20evens%29%0Aresult%20%3D%20sum%28transformed%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false): With named `map` and `filter` objects -> **58** steps\n",
"- [Version 3](http://pythontutor.com/visualize.html#code=result%20%3D%20sum%28map%28lambda%20x%3A%20%28x%20**%202%29%20%2B%201,%20filter%28lambda%20x%3A%20x%20%25%202%20%3D%3D%200,%20range%281,%2013%29%29%29%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false): Everything in *one* expression -> **55** steps\n",
"\n",
"Versions 2 and 3 are the same, except for the three additional steps required to create the temporary variables. The *major* downside of Version 1 is that, in the worst case, it may need *three times* the memory as compared to the other two versions!\n",
"\n",
"An experienced Pythonista would probably go with Version 2 in a production system to keep the code readable and maintainable."
"For [map()](https://docs.python.org/3/library/functions.html#map) and [filter()](https://docs.python.org/3/library/functions.html#filter), Python provides a nice syntax appealing to people who like mathematics.\n",
"Consider again the \"*A simple Filter*\" example from [Chapter 4](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/04_iteration_00_lecture.ipynb#Example:-A-simple-Filter), written with combined `for` and `if` statements. So, the mapping and filtering steps happen simultaneously."
"**List comprehensions**, or **listcomps** for short, are *expressions* to derive *new* `list` objects out of *existing* ones (cf., [reference](https://docs.python.org/3/reference/expressions.html#displays-for-lists-sets-and-dictionaries)). A single *expression* like below can replace the compound `for` *statement* from above."
"A list comprehension may be used in place of any `list` object.\n",
"\n",
"For example, let's add up all the elements with [sum()](https://docs.python.org/3/library/functions.html#sum). The code below *materializes* all elements in memory *before* summing them up. So, this code might cause a `MemoryError` when executed with a bigger `numbers` list. [PythonTutor](http://pythontutor.com/visualize.html#code=numbers%20%3D%20range%281,%2013%29%0Aresult%20%3D%20sum%28%5B%28n%20**%202%29%20%2B%201%20for%20n%20in%20numbers%20if%20n%20%25%202%20%3D%3D%200%5D%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) shows how a `list` object exists in memory at step 17 and then \"gets lost\" right after. "
"List comprehensions may come with several `for`'s and `if`'s.\n",
"\n",
"The cell below creates a `list` object that contains other `list` objects with numbers in them. The starting number in each inner `list` object is offset by `1`."
"That translates into a list comprehension like below. The order of the `for`'s may be confusing at first but is the *same* as writing out the nested `for`-loops."
"As an example, we add up the flattened numbers with [sum()](https://docs.python.org/3/library/functions.html#sum). The same caveat holds as before: The `list` object passed into [sum()](https://docs.python.org/3/library/functions.html#sum) is *materialized* before the sum is calculated!"
"In this particular example, however, we can exploit the fact that any sum of numbers can be expressed as the sum of sums of mutually exclusive and collectively exhaustive subsets of these numbers and get away with just *one* `for` in the list comprehension."
"A popular use case of nested list comprehensions is applying a transformation to each $2$-tuple of the [Cartesian product](https://en.wikipedia.org/wiki/Cartesian_product) of two iterables.\n",
"For example, let's add `1` to each quotient obtained by taking the numerator from `[10, 20, 30]` and the denominator from `[40, 50, 60]`, and then find the product of all quotients. The table below visualizes the calculations: The result is the product of *nine* entries.\n",
"For a Cartesian product, we loop over *all* possible $2$-tuples where one element is drawn from `first` and the other from `second`. That is equivalent to two nested `for`-loops."
"The order of the `for`'s is *important*: The list comprehension above divides numbers from `first` by numbers from `second`, whereas the list comprehension below does the opposite."
"To find the overall product, we *unpack* the first list comprehension right into the `product()` function from the \"*Packing & Unpacking*\" section above."
"product(*[(x / y) + 1 for x in first for y in second])"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Alternatively, we use a `lambda` expression with the [reduce()](https://docs.python.org/3/library/functools.html#functools.reduce) function from the [functools](https://docs.python.org/3/library/functools.html) module."
"reduce(lambda x, y: x * y, [(x / y) + 1 for x in first for y in second])"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"While this example is stylized, Cartesian products are hidden in many applications, and it shows how the various language features introduced in this chapter can be seamlessly combined to process sequential data."
"Pythonistas would forgo materialized `list` objects, and, thus, also list comprehensions, all together, and use a more memory-efficient approach with **[generator expressions](https://docs.python.org/3/reference/expressions.html#generator-expressions)**, or **genexps** for short. Syntactically, they work like list comprehensions except that parentheses replace the brackets.\n",
"We can think of it as yet another \"rule\" in memory that knows how to generate the individual objects in a series one by one. Whereas a list comprehension materializes its elements in memory *when* it is evaluated, the opposite holds for generator expressions, and *no* object is created in memory except the \"rule\" itself. Because of this behavior, we describe generator expressions as *lazy* and list comprehensions as *eager*.\n",
"\n",
"So, to materialize all elements specified by a generator expression, we revert to the [list()](https://docs.python.org/3/library/functions.html#func-list) built-in."
"A common use case is to reduce the elements into a single object instead, for example, by adding them up with [sum()](https://docs.python.org/3/library/functions.html#sum). [PythonTutor](http://pythontutor.com/visualize.html#code=numbers%20%3D%20range%281,%2013%29%0Asum_with_list%20%3D%20sum%28%5B%28n%20**%202%29%20%2B%201%20for%20n%20in%20numbers%20if%20n%20%25%202%20%3D%3D%200%5D%29%0Asum_with_gen%20%3D%20sum%28%28n%20**%202%29%20%2B%201%20for%20n%20in%20numbers%20if%20n%20%25%202%20%3D%3D%200%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) shows how the code cell below does *not* create a temporary `list` object in memory, whereas a list comprehension would (cf., step 17)."
"Once a `generator` object runs out of elements, it raises a `StopIteration` exception. We say that the `generator` object is **exhausted**, and to loop over its elements again, we must create a *new* one."
"\u001b[0;32m<ipython-input-325-6e72e47198db>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mnext\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgen\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m<ipython-input-326-6e72e47198db>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mnext\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgen\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"Calling the [next()](https://docs.python.org/3/library/functions.html#next) function repeatedly with the *same* `generator` object as the argument is essentially what a `for`-loop automates for us. So, `generator` objects are *iterable*."
"If we are only interested in a *reduction* of `nested_numbers` into a single statistic, as the overall sum in the \"*Nested Lists*\" example, we should replace lists or list comprehensions with generator expressions wherever possible.\n",
"The result is the *same*, but no intermediate lists are materialized! That makes our code scale to larger amounts of data and uses the available hardware more efficiently.\n",
"We leave out the brackets and keep everything else as-is: The argument to [sum()](https://docs.python.org/3/library/functions.html#sum), a list comprehension in the initial implementation above, becomes a generator expression."
"Because `nested_numbers` has an internal structure, we can make it **memoryless** by expressing it as a generator expression derived from `range` objects. [PythonTutor](http://pythontutor.com/visualize.html#code=nested_numbers%20%3D%20%28%28range%28x,%20y%20%2B%201%29%29%20for%20x,%20y%20in%20zip%28range%281,%204%29,%20range%287,%2010%29%29%29%0Aresult%20%3D%20sum%28number%20for%20inner_numbers%20in%20nested_numbers%20for%20number%20in%20inner_numbers%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) confirms that no `list` object materializes at any point in time."
"We must be careful when assigning a `generator` object to a variable: If we use `nested_numbers` again, for example, in the alternative formulation below, [sum()](https://docs.python.org/3/library/functions.html#sum) returns `0` because `nested_numbers` is exhausted after executing the previous code cell. [PythonTutor](http://pythontutor.com/visualize.html#code=nested_numbers%20%3D%20%28%28range%28x,%20y%20%2B%201%29%29%20for%20x,%20y%20in%20zip%28range%281,%204%29,%20range%287,%2010%29%29%29%0Aresult%20%3D%20sum%28number%20for%20inner_numbers%20in%20nested_numbers%20for%20number%20in%20inner_numbers%29%0Ano_result%20%3D%20sum%28sum%28inner_numbers%29%20for%20inner_numbers%20in%20nested_numbers%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) also shows that."
"Now, the first of the two alternatives may be more appealing to many readers. In general, many practitioners seem to dislike `lambda` expressions.\n",
"\n",
"The code cell below *unpacks* the elements produced by `((x / y) + 1 for x in first for y in second)` into the `product()` function from the \"*Packing & Unpacking*\" section above. However, inside `product()`, the elements are *packed* into `args`, a materialized `tuple` object. So, all the memory efficiency gained with the generator expression is voided! [PythonTutor](http://pythontutor.com/visualize.html#code=def%20product%28*args%29%3A%0A%20%20%20%20result%20%3D%20args%5B0%5D%0A%20%20%20%20for%20arg%20in%20args%5B1%3A%5D%3A%0A%20%20%20%20%20%20%20%20result%20*%3D%20arg%0A%20%20%20%20return%20result%0A%0Afirst%20%3D%20range%2810,%2031,%2010%29%0Asecond%20%3D%20range%2840,%2061,%2010%29%0A%0Aresult%20%3D%20product%28*%28%28x%20/%20y%29%20%2B%201%20for%20x%20in%20first%20for%20y%20in%20second%29%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) shows how a `tuple` object exists in steps 38-58."
"On the contrary, the solution with the [reduce()](https://docs.python.org/3/library/functools.html#functools.reduce) function from the [functools](https://docs.python.org/3/library/functools.html) module and the `lambda` expression works *without* all elements materialized at the same time, and [PythonTutor](http://pythontutor.com/visualize.html#code=from%20functools%20import%20reduce%0A%0Afirst%20%3D%20range%2810,%2031,%2010%29%0Asecond%20%3D%20range%2840,%2061,%2010%29%0A%0Aresult%20%3D%20reduce%28%0A%20%20%20%20lambda%20x,%20y%3A%20x%20*%20y,%0A%20%20%20%20%28%28x%20/%20y%29%20%2B%201%20for%20x%20in%20first%20for%20y%20in%20second%29%0A%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) confirms that. So, only the second alternative is truly memory-efficient."
"In summary, we learn from this example that unpacking generator expressions *may* be a *bad* idea."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"### Tuple Comprehensions"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"There is no syntax to derive *new* `tuple` objects out of existing ones. However, we can mimic such a construct by combining the [tuple()](https://docs.python.org/3/library/functions.html#func-tuple) built-in with a generator expression.\n",
"So, to convert the list comprehension `[(n ** 2) + 1 for n in numbers if n % 2 == 0]` from above into a \"tuple comprehension,\" we write the following."
"Besides [min()](https://docs.python.org/3/library/functions.html#min), [max()](https://docs.python.org/3/library/functions.html#max), and [sum()](https://docs.python.org/3/library/functions.html#sum), Python provides two boolean reduce functions: [all()](https://docs.python.org/3/library/functions.html#all) and [any()](https://docs.python.org/3/library/functions.html#any).\n",
"\n",
"Let's look at straightforward examples involving `numbers` again."
"[all()](https://docs.python.org/3/library/functions.html#all) takes an *iterable* argument and returns `True` if *all* elements are *truthy*.\n",
"\n",
"For example, let's check if the square of each element in `numbers` is below `100` or `150`, respectively. We express the computation with a generator expression passed as the only argument to [all()](https://docs.python.org/3/library/functions.html#all)."
"[all()](https://docs.python.org/3/library/functions.html#all) can be viewed as syntactic sugar replacing a `for`-loop: Internally, [all()](https://docs.python.org/3/library/functions.html#all) implements the *short-circuiting* strategy from [Chapter 3](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/03_conditionals_00_lecture.ipynb#Short-Circuiting), and we mimic that by testing for the *opposite* condition in the `if` statement and leaving the `for`-loop early with the `break` statement. In the worst case, if `threshold` were, for example, `150`, we would loop over *all* elements in the *iterable*, which must be *finite* for the code to work. So, [all()](https://docs.python.org/3/library/functions.html#all) is a *linear search* in disguise."
"The documentation of [all()](https://docs.python.org/3/library/functions.html#all) shows in another way what it does with code: By placing a `return` statement inside the `for`-loop of a function's body, iteration is stopped prematurely once an element does *not* meet the condition. That is the familiar *early exit* pattern at work."
"Expressed with a `for`-loop, the implementation below reveals that [any()](https://docs.python.org/3/library/functions.html#any) follows the *short-circuiting* strategy as well. Here, we do *not* need to check for the opposite condition."
"With the new concepts in this chapter, let's rewrite the book's introductory \"*Averaging Even Numbers*\" example in [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements_00_lecture.ipynb#Example:-Averaging-Even-Numbers) such that it efficiently handles a large sequence of numbers.\n",
"We assume the `average_evens()` function below is called with a *finite* and *iterable* object, which generates a \"stream\" of numeric objects that can be cast as `int` objects because the idea of even and odd numbers only makes sense in the context of whole numbers.\n",
"\n",
"The generator expression `(int(n) for n in numbers)` implements the type casting, and when it is evaluated, *nothing* happens except that a `generator` object is stored in `integers`. Then, with the [reduce()](https://docs.python.org/3/library/functools.html#functools.reduce) function from the [functools](https://docs.python.org/3/library/functools.html) module, we *simultaneously* add up *and* count the even numbers produced by the inner generator expression `((n, 1) for n in integers if n % 2 == 0)`. That results in a `generator` object producing `tuple` objects consisting of the next *even* number in line and `1`. Two such `tuple` objects are then iteratively passed to the `lambda` expression as the `x` and `y` arguments. `x` represents the total and the count of the even numbers processed so far, while `y`'s first element, `y[0]`, is the next even number to be added to the running total. The result of the [reduce()](https://docs.python.org/3/library/functools.html#functools.reduce) function is again a `tuple` object, namely the final `total` and `count`. Lastly, we calculate the simple average.\n",
"In summary, the implementation of `average_evens()` does *not* keep materialized `list` objects internally like its predecessors from [Chapter 2](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/02_functions_00_lecture.ipynb), but processes the elements of the `numbers` argument on a one-by-one basis."
"To show that `average_evens()` can process a **stream** of data, we simulate `10_000_000` randomly drawn integers between `0` and `100` with the [randint()](https://docs.python.org/3/library/random.html#random.randint) function from the [random](https://docs.python.org/3/library/random.html) module. We use a generator expression derived from a `range` object as the `numbers` argument. So, at *no* point in time is there a materialized `list` or `tuple` object in memory. The result approaching `50` tells us that [randint()](https://docs.python.org/3/library/random.html#random.randint) must be based on a uniform distribution."
"To show that `average_evens()` filters out odd numbers, we simulate another stream of `10_000_000` randomly drawn odd integers between `1` and `99`. As no function in the [random](https://docs.python.org/3/library/random.html) module does that \"out of the box,\" we must be creative: Doubling a number drawn from `random.randint(0, 49)` results in an even number between `0` and `98`, and adding `1` makes it odd. Then, `average_evens()` raises a `TypeError`, essentially because `(int(n) for n in numbers)` does not generate any element."
"In the \"*Collections vs. Sequences*\" section above, we studied the three and four *behaviors* of collections and sequences. The latter two are *abstract* ideas, and we mainly use them to classify *concrete* data types.\n",
"Similarly, we have introduced data types in this chapter that all share the \"behavior\" of modeling some \"rule\" in memory to generate objects \"on the fly:\" They are the `map`, `filter`, and `generator` types. Their main commonality is supporting the built-in [next()](https://docs.python.org/3/library/functions.html#next) function. In computer science terminology, such data types are called **[iterators](https://en.wikipedia.org/wiki/Iterator)**, and the [collections.abc](https://docs.python.org/3/library/collections.abc.html) module formalizes them with the `Iterator` ABC.\n",
"\n",
"So, an example of an iterator is `evens_transformed` below, an object of type `generator`."
"In Python, iterators are *always* also iterables. The reverse does *not* hold! To be precise, iterators are *specializations* of iterables. That is what the \"Inherits from\" columns means in the [collections.abc](https://docs.python.org/3/library/collections.abc.html) module's documentation."
"Furthermore, we revise our definition of *iterables* from above: Just as we defined an *iterator* to be an object that supports the [next()](https://docs.python.org/3/library/functions.html#next) function, we define an *iterable* to be an object that supports the built-in [iter()](https://docs.python.org/3/library/functions.html#iter) function.\n",
"The confused reader may now be wondering how the two concepts relate to each other.\n",
"\n",
"In short, the [iter()](https://docs.python.org/3/library/functions.html#iter) function is the general way to create an *iterator* object out of a given *iterable* object. In real-world code, we hardly ever see [iter()](https://docs.python.org/3/library/functions.html#iter) as Python calls it for us in the background. Then, the *iterator* object manages the iteration over the *iterable* object.\n",
"\n",
"For illustration, let's do that ourselves and create *two* iterators out of the iterable `numbers` and see what we can do with them."
"By calling [next()](https://docs.python.org/3/library/functions.html#next) three times with `iterator1` as the argument, we obtain the first three elements of `numbers`."
"We can also play a \"trick\" and exchange some elements in `numbers`. `iterator1` and `iterator2` do *not* see these changes and present us with the new elements. So, *iterators* not only have state on their own but also keep this separate from the underlying *iterable*."
"Let's re-assign the elements in `numbers` so that they are in order. Now, the numbers returned from [next()](https://docs.python.org/3/library/functions.html#next) also tell us how often [next()](https://docs.python.org/3/library/functions.html#next) was called with `iterator1` or `iterator2`. We conclude that `list_iterator` objects must be keeping track of the *last* index obtained from the underlying *iterable*."
"With the concepts introduced in this section, we can now understand the first sentence in the documentation on the [zip()](https://docs.python.org/3/library/functions.html#zip) built-in better: \"Make an *iterator* that aggregates elements from each of the *iterables*.\"\n",
"Because *iterators* are always also *iterables*, we pass `iterator1` and `iterator2` as arguments to [zip()](https://docs.python.org/3/library/functions.html#zip).\n",
"\n",
"The returned `zipper` object is of type `zip` and, \"abstractly speaking,\" an `Iterator` as well."
"So far, we have always used [zip()](https://docs.python.org/3/library/functions.html#zip) with a `for` statement and looped over it. That was our earlier definition of an *iterable*. Our revised definition in this section states that an *iterable* is an object that supports the [iter()](https://docs.python.org/3/library/functions.html#iter) function. So, let's see what happens if we pass `zipper` to [iter()](https://docs.python.org/3/library/functions.html#iter)."
"`zipper_iterator` references the *same* object as `zipper`! That holds for *iterators* in general: Any *iterator* created from an existing *iterator* with [iter()](https://docs.python.org/3/library/functions.html#iter) is the *iterator* itself."
"The `for`-loop below prints out *six* more `tuple` objects derived from the now ordered `numbers` because the `iterator1` object hidden inside `zipper` already returned the first *six* elements. So, the respective first elements of the `tuple` objects printed range from `7` to `12`. Similarly, as `iterator2` already provided *three* elements from `numbers`, we see the respective second elements in the range from `4` to `9`."
"We verify that `iterator1` is exhausted by passing it to [next()](https://docs.python.org/3/library/functions.html#next) again, which raises a `StopIteration` exception."
"\u001b[0;32m<ipython-input-383-0680f5ec8ce3>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mnext\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0miterator1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"Understanding *iterators* and *iterables* is helpful for any data science practitioner that deals with large amounts of data. Even without that, these two terms occur everywhere in Python-related texts and documentation."
"In [Chapter 4](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/04_iteration_00_lecture.ipynb#The-for-Statement), we argue that the `for` statement is syntactic sugar, replacing the `while` statement in many scenarios. In particular, a `for`-loop saves us two tasks: Managing an index variable *and* obtaining the individual elements by indexing. In this sub-section, we look at a more realistic picture, using the new terminology as well.\n",
"What happens behind the scenes in the Python interpreter is shown below.\n",
"\n",
"First, Python calls [iter()](https://docs.python.org/3/library/functions.html#iter) with the `iterable` to be looped over, and obtains an `iterator`. That contains the entire logic of how the `iterable` is looped over. In particular, the `iterator` may or may not pick the `iterable`'s elements in a predictable order. That is up to the \"rule\" it models.\n",
"\n",
"Second, Python enters an *indefinite* `while`-loop. It tries to obtain the next element with [next()](https://docs.python.org/3/library/functions.html#next). If that succeeds, the `for`-loop's code block is executed. Below, that code is placed within the `else`-clause that runs only if *no* exception is raised in the `try`-clause. Then, Python jumps into the next iteration and tries to obtain the next element from the `iterator`, and so on. Once the `iterator` is exhausted, it raises a `StopIteration` exception, and Python leaves the `while`-loop with the `break` statement."
"Now that we know the concept of an *iterator*, let's compare some of the built-ins introduced in this chapter in detail and make sure we understand what is going on in memory. This sub-section is thus a great summary of this chapter as well.\n",
"We use two simple examples, `numbers` and `memoryless`, to guide us through the discussion. `numbers` creates *thirteen* objects in memory and `memoryless` only *one* (cf., [PythonTutor](http://www.pythontutor.com/visualize.html#code=numbers%20%3D%20%5B7,%2011,%208,%205,%203,%2012,%202,%206,%209,%2010,%201,%204%5D%0Amemoryless%20%3D%20range%281,%2013%29&cumulative=false&curInstr=2&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false))."
"The [sorted()](https://docs.python.org/3/library/functions.html#sorted) function takes a *finite* and *iterable* object as its argument and *materializes* its elements into a *new* `list` object that is returned.\n",
"The argument may already be materialized, as is the case with `numbers`, but may also be an *iterable* without any objects in it, such as `memoryless`. In both cases, we end up with materialized `list` objects with the elements sorted in *forward* order (cf., [PythonTutor](http://www.pythontutor.com/visualize.html#code=numbers%20%3D%20%5B7,%2011,%208,%205,%203,%2012,%202,%206,%209,%2010,%201,%204%5D%0Amemoryless%20%3D%20range%281,%2013%29%0Aresult1%20%3D%20sorted%28numbers%29%0Aresult2%20%3D%20sorted%28memoryless%29&cumulative=false&curInstr=4&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false))."
"By adding a keyword-only argument `reverse=True`, the materialized `list` objects are sorted in *reverse* order (cf., [PythonTutor](http://www.pythontutor.com/visualize.html#code=numbers%20%3D%20%5B7,%2011,%208,%205,%203,%2012,%202,%206,%209,%2010,%201,%204%5D%0Amemoryless%20%3D%20range%281,%2013%29%0Aresult1%20%3D%20sorted%28numbers,%20reverse%3DTrue%29%0Aresult2%20%3D%20sorted%28memoryless,%20reverse%3DTrue%29&cumulative=false&curInstr=4&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false))."
"The [reversed()](https://docs.python.org/3/library/functions.html#reversed) built-in takes a *sequence* object as its argument and returns an *iterator*. The argument must be *finite* and *reversible* (i.e., *iterable* in *reverse* order) as otherwise [reversed()](https://docs.python.org/3/library/functions.html#reversed) could neither determine the last element that becomes the first nor loop in a *predictable* backward fashion. [PythonTutor](http://www.pythontutor.com/visualize.html#code=numbers%20%3D%20%5B7,%2011,%208,%205,%203,%2012,%202,%206,%209,%2010,%201,%204%5D%0Amemoryless%20%3D%20range%281,%2013%29%0Aiterator1%20%3D%20reversed%28numbers%29%0Aiterator2%20%3D%20reversed%28memoryless%29&cumulative=false&curInstr=4&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) confirms that [reversed()](https://docs.python.org/3/library/functions.html#reversed) does *not* materialize any elements but only returns an *iterator*.\n",
"**Side Note**: Even though `range` objects, like `memoryless` here, do *not* \"contain\" references to other objects, they count as *sequence* types, and as such, they are also *container* types. The `in` operator works with `range` objects because we can always cast the object to be checked as an `int` and check if that lies within the `range` object's *start* and *stop* values, taking a potential *step* value into account (cf., this [blog post](https://treyhunner.com/2018/02/python-range-is-not-an-iterator/) for more details on the [range()](https://docs.python.org/3/library/functions.html#func-range) built-in)."
"To materialize the elements, we can pass the returned *iterators* to, for example, the [list()](https://docs.python.org/3/library/functions.html#func-list) or [tuple()](https://docs.python.org/3/library/functions.html#func-tuple) built-ins. That creates *new* `list` and `tuple` objects (cf., [PythonTutor](http://www.pythontutor.com/visualize.html#code=numbers%20%3D%20%5B7,%2011,%208,%205,%203,%2012,%202,%206,%209,%2010,%201,%204%5D%0Amemoryless%20%3D%20range%281,%2013%29%0Aresult1%20%3D%20list%28reversed%28numbers%29%29%0Aresult2%20%3D%20tuple%28reversed%28memoryless%29%29&cumulative=false&curInstr=4&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false)).\n",
"\n",
"To reiterate some more new terminology from this chapter, we describe [reversed()](https://docs.python.org/3/library/functions.html#reversed) as *lazy*, whereas [list()](https://docs.python.org/3/library/functions.html#func-list) and [tuple()](https://docs.python.org/3/library/functions.html#func-tuple) are *eager*. The former has no significant side effect in memory, while the latter may require a lot of memory."
"Of course, we can also loop over the returned *iterators* instead.\n",
"\n",
"That works because *iterators* are always *iterables*; in particular, as the previous \"*The for Statement (revisited)*\" sub-section explains, the `for`-loops below call `iter(reversed(numbers))` and `iter(reversed(memoryless))` behind the scenes. However, the *iterators* returned by [iter()](https://docs.python.org/3/library/functions.html#iter) are the *same* as the `reversed(numbers)` and `reversed(memoryless)` iterators passed in! In summary, the `for`-loops below involve many subtleties that together make Python the expressive language it is."
"As with [sorted()](https://docs.python.org/3/library/functions.html#sorted), the [reversed()](https://docs.python.org/3/library/functions.html#reversed) built-in does *not* mutate its argument."
"To point out the potentially obvious, we compare the results of *sorting* `numbers` in *reverse* order with *reversing* it: These are *different* concepts!"
"Whereas both [sorted()](https://docs.python.org/3/library/functions.html#sorted) and [reversed()](https://docs.python.org/3/library/functions.html#reversed) do *not* mutate their arguments, the *mutable* `list` type comes with two methods, [sort()](https://docs.python.org/3/library/stdtypes.html#list.sort) and `reverse()`, that implement the same logic but mutate an object, like `numbers` below, *in place*. To indicate that all changes occur *in place*, the [sort()](https://docs.python.org/3/library/stdtypes.html#list.sort) and `reverse()` methods always return `None`, which is not shown."
"The `reverse()` method is *eager*, as opposed to the *lazy* [reversed()](https://docs.python.org/3/library/functions.html#reversed) built-in. That means the mutations causes by the `reverse()` method are written into memory right away."
"**Sequences** are an *abstract* concept that summarizes *four* behaviors an object may or may not exhibit. Sequences are\n",
"- **finite** and\n",
"- **ordered**\n",
"- **containers** that we may\n",
"- **loop over**.\n",
"\n",
"Examples are the `list`, `tuple`, but also the `str` types.\n",
"\n",
"Objects that exhibit all behaviors *except* being ordered are referred to as **collections**.\n",
"\n",
"The objects inside a sequence are called its **elements** and may be labeled with a unique **index**, an `int` object in the range $0 \\leq \\text{index} < \\lvert \\text{sequence} \\rvert$.\n",
"\n",
"`list` objects are **mutable**. That means we can change the references to the other objects it contains, and, in particular, re-assign them.\n",
"\n",
"On the contrary, `tuple` objects are like **immutable** lists: We can use them in place of any `list` object as long as we do *not* need to mutate it. Often, `tuple` objects are also used to model **records** of related **fields**.\n",
"The tasks we do with sequential data follow the **map-filter-reduce paradigm**: We apply the same transformation to all elements, filter some of them out, and calculate summary statistics from the remaining ones.\n",
"An essential idea in this chapter is that, in many situations, we need *not* have all the data **materialized** in memory. Instead, **iterators** allow us to process sequential data on a one-by-one basis.\n",