"While controlling the flow of execution with an `if` statement is a must-have building block in any programming language, it alone does not allow us to run a block of code repetitively, and we need to be able to do precisely that to write useful software.\n",
"The `for` statement shown in some examples before might be the missing piece in the puzzle. However, we can live without it and postpone its official introduction until the second half of this chapter.\n",
"Instead, we dive into the big idea of **iteration** by studying the concept of **recursion** first. This order is opposite to many other introductory books that only treat the latter as a nice-to-have artifact, if at all. Yet, understanding recursion sharpens one's mind and contributes to seeing problems from a different angle."
"Recursive functions contain some form of a conditional check (e.g., `if` statement) to identify a **base case** that ends the recursion *when* reached. Otherwise, the function would keep calling itself forever.\n",
"The meaning of the word *recursive* is similar to *circular*. However, a truly circular definition is not very helpful, and we think of a recursive function as kind of a \"circular function with a way out at the end.\""
"A rather trivial toy example illustrates the important aspects concretely: If called with any positive integer as its `n` argument, `countdown()` just prints that number and calls itself with the *new* `n` being the *old* `n` minus `1`. This continues until `countdown()` is called with `n=0`. Then, the flow of execution hits the base case, and the function calls stop."
"As trivial as this seems, a lot of complexity is hidden in this implementation. In particular, the order in which objects are created and de-referenced in memory might not be apparent right away as [PythonTutor](http://pythontutor.com/visualize.html#code=def%20countdown%28n%29%3A%0A%20%20%20%20if%20n%20%3D%3D%200%3A%0A%20%20%20%20%20%20%20%20print%28%22Happy%20new%20Year!%22%29%0A%20%20%20%20else%3A%0A%20%20%20%20%20%20%20%20print%28n%29%0A%20%20%20%20%20%20%20%20countdown%28n%20-%201%29%0A%0Acountdown%283%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) shows: Each time `countdown()` is called, Python creates a *new* frame in the part of the memory where it manages all the names. This way, Python *isolates* all the different `n` variables from each other. As new frames are created until we reach the base case, after which the frames are destroyed in the *reversed* order, this is called a **[stack](https://en.wikipedia.org/wiki/Stack_(abstract_data_type))** of frames in computer science terminology. In simple words, a stack is a last-in-first-out (LIFO) task queue. Each frame has a single parent frame, namely the one whose recursive function call created it."
"Recursion plays a vital role in mathematics as well, and we likely know about it from some introductory course, for example, in [combinatorics](https://en.wikipedia.org/wiki/Combinatorics)."
"Whenever we find a recursive way of formulating an idea, we can immediately translate it into Python in a *naive* way (i.e., we create a *correct* program that may *not* be an *efficient* implementation yet).\n",
"Below is a first version of `factorial()`: The `return` statement does not have to be a function's last code line, and we may indeed have several `return` statements as well."
"When we read such code, it is often easier not to follow every function call (i.e., `factorial(n - 1)` here) in one's mind but assume we receive a return value as specified in the documentation. Some call this approach a **[leap of faith](http://greenteapress.com/thinkpython2/html/thinkpython2007.html#sec75)**. We practice this already whenever we call built-in functions (e.g., [print()](https://docs.python.org/3/library/functions.html#print) or [len()](https://docs.python.org/3/library/functions.html#len)) where we would have to read C code in many cases.\n",
"To visualize *all* the computational steps of the exemplary `factorial(3)`, we use [PythonTutor](http://pythontutor.com/visualize.html#code=def%20factorial%28n%29%3A%0A%20%20%20%20if%20n%20%3D%3D%200%3A%0A%20%20%20%20%20%20%20%20return%201%0A%20%20%20%20else%3A%0A%20%20%20%20%20%20%20%20recurse%20%3D%20factorial%28n%20-%201%29%0A%20%20%20%20%20%20%20%20result%20%3D%20n%20*%20recurse%0A%20%20%20%20%20%20%20%20return%20result%0A%0Asolution%20%3D%20factorial%283%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false): The recursion again creates a stack of frames in memory. In contrast to the previous trivial example, each frame leaves a return value in memory after it is destroyed. This return value is then assigned to the `recurse` variable within the parent frame and used to compute `result`."
"A Pythonista would formulate `factorial()` in a more concise way using the so-called **early exit** pattern: No `else`-clause is needed as reaching a `return` statement ends a function call *immediately*. Furthermore, we do not need the temporary variables `recurse` and `result`.\n",
"As [PythonTutor](http://pythontutor.com/visualize.html#code=def%20factorial%28n%29%3A%0A%20%20%20%20if%20n%20%3D%3D%200%3A%0A%20%20%20%20%20%20%20%20return%201%0A%20%20%20%20return%20n%20*%20factorial%28n%20-%201%29%0A%0Asolution%20%3D%20factorial%283%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) shows, this implementation is more efficient as it only requires 18 computational steps instead of 24 to calculate `factorial(3)`, an improvement of 25 percent! "
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"def factorial(n):\n",
" \"\"\"Calculate the factorial of a number.\n",
"\n",
" Args:\n",
" n (int): number to calculate the factorial for\n",
"Note that the [math](https://docs.python.org/3/library/math.html) module in the [standard library](https://docs.python.org/3/library/index.html) provides a [factorial()](https://docs.python.org/3/library/math.html#math.factorial) function as well, and we should, therefore, *never* implement it ourselves in a real codebase."
"As famous philosopher Euclid already shows in his \"Elements\" (ca. 300 BC), the greatest common divisor of two integers, i.e., the largest number that divides both integers without a remainder, can be efficiently computed with the following code. This example illustrates that a recursive solution to a problem is not always easy to understand."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"def gcd(a, b):\n",
" \"\"\"Calculate the greatest common divisor of two numbers.\n",
"Euclid's algorithm is stunningly fast, even for large numbers. Its speed comes from the use of the modulo operator `%`. However, this is *not* true for recursion in general, which may result in slow programs if not applied correctly."
"The [math](https://docs.python.org/3/library/math.html) module in the [standard library](https://docs.python.org/3/library/index.html) provides a [gcd()](https://docs.python.org/3/library/math.html#math.gcd) function as well, and, therefore, we should again *never* implement it on our own."
"Help on built-in function gcd in module math:\n",
"\n",
"gcd(x, y, /)\n",
" greatest common divisor of x and y\n",
"\n"
]
}
],
"source": [
"help(math.gcd)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"4"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"math.gcd(12, 4)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"9"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"math.gcd(112233445566778899, 987654321)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"math.gcd(2, 7919)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#### \"Easy at first Glance\" Example: [Fibonacci Numbers](https://en.wikipedia.org/wiki/Fibonacci_number)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"The Fibonacci numbers are an infinite sequence of non-negative integers that are calculated such that every number is the sum of its two predecessors where the first two numbers of the sequence are defined to be $0$ and $1$. For example, the first 13 numbers are:"
"Let's write a function `fibonacci()` that calculates the $i$th Fibonacci number where $0$ is the $0$th number. Looking at the numbers in a **backward** fashion (i.e., from right to left), we realize that the return value for `fibonacci(i)` can be reduced to the sum of the return values for `fibonacci(i - 1)` and `fibonacci(i - 2)` disregarding the *two* base cases."
"This implementation is *highly* **inefficient** as small Fibonacci numbers already take a very long time to compute. The reason for this is **exponential growth** in the number of function calls. As [PythonTutor](http://pythontutor.com/visualize.html#code=def%20fibonacci%28i%29%3A%0A%20%20%20%20if%20i%20%3D%3D%200%3A%0A%20%20%20%20%20%20%20%20return%200%0A%20%20%20%20elif%20i%20%3D%3D%201%3A%0A%20%20%20%20%20%20%20%20return%201%0A%20%20%20%20return%20fibonacci%28i%20-%201%29%20%2B%20fibonacci%28i%20-%202%29%0A%0Arv%20%3D%20fibonacci%285%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) shows, `fibonacci()` is called again and again with the same `i` arguments.\n",
"To understand this in detail, we would have to study algorithms and data structures (e.g., with [this book](https://www.amazon.de/Introduction-Algorithms-Press-Thomas-Cormen/dp/0262033844/ref=sr_1_1?__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&crid=1JNE8U0VZGU0O&qid=1569837169&s=gateway&sprefix=algorithms+an%2Caps%2C180&sr=8-1)), a discipline within computer science, and dive into the analysis of **[time complexity of algorithms](https://en.wikipedia.org/wiki/Time_complexity)**.\n",
"Luckily, in the Fibonacci case, the inefficiency can be resolved with a **caching** (i.e., \"re-use\") strategy from the field of **[dynamic programming](https://en.wikipedia.org/wiki/Dynamic_programming)**, namely **[memoization](https://en.wikipedia.org/wiki/Memoization)**. We do so in Chapter 8, after introducing the [dictionaries](https://docs.python.org/3/library/stdtypes.html#dict) data type.\n",
"Let's measure the average run times for `fibonacci()` and varying `i` arguments with the `%%timeit` [cell magic](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit) that comes with Jupyter."
"If a recursion does not reach its base case, it theoretically runs forever. Luckily, Python detects that and saves the computer from crashing by raising a `RecursionError`.\n",
"\u001b[0;32m<ipython-input-29-62f686a9f8a1>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mrun_forever\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m<ipython-input-28-80575699c62c>\u001b[0m in \u001b[0;36mrun_forever\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mrun_forever\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;34m\"\"\"Also a pointless function should have a docstring.\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mrun_forever\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"... last 1 frames repeated, from the frame below ...\n",
"\u001b[0;32m<ipython-input-28-80575699c62c>\u001b[0m in \u001b[0;36mrun_forever\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mrun_forever\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;34m\"\"\"Also a pointless function should have a docstring.\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mrun_forever\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mRecursionError\u001b[0m: maximum recursion depth exceeded"
"However, even the trivial `countdown()` function from above is not immune to infinite recursion. Let's call it with `3.1` instead of `3`. What goes wrong here?"
"\u001b[0;32m<ipython-input-30-43e3ad331761>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mcountdown\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m3.1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m<ipython-input-31-188b816be8b6>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mfactorial\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m3.1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"The infinite recursions could easily be avoided by replacing `n == 0` with `n <= 0` in both functions and thereby **generalizing** them. But even then, calling either `countdown()` or `factorial()` with a non-integer number is *semantically* wrong, and, therefore, we better leave the base cases unchanged.\n",
"Errors as above are a symptom of missing **type checking**: By design, Python allows us to pass in not only integers but objects of any type as arguments to the `countdown()` and `factorial()` functions. As long as the arguments \"behave\" like integers, we do not encounter any *runtime* errors. This is the case here as the two example functions only use the `-` and `*` operators internally, and, in the context of arithmetic, a `float` object behaves like an `int` object. So, the functions keep calling themselves until Python decides with a built-in heuristic that the recursion is likely not going to end and aborts the computations with a `RecursionError`. Strictly speaking, a `RecursionError` is, of course, a *runtime* error as well.\n",
"The missing type checking is *100% intentional* and considered a **[feature of rather than a bug](https://www.urbandictionary.com/define.php?term=It%27s%20not%20a%20bug%2C%20it%27s%20a%20feature)** in Python!\n",
"Pythonistas often use the colloquial term **[duck typing](https://en.wikipedia.org/wiki/Duck_typing)** when referring to the same idea, and the saying goes, \"If it walks like a duck and it quacks like a duck, it must be a duck.\" For example, we could call `factorial()` with the `float` object `3.0`, and the recursion works out fine. So, as long as the `3.0` \"walks\" like a `3` and \"quacks\" like a `3`, it \"must be\" a `3`.\n",
"We see similar behavior when we mix objects of types `int` and `float` with arithmetic operators. For example, `1 + 2.0` works because Python implicitly views the `1` as a `1.0` at runtime and then knows how to do floating-point arithmetic: Here, the `int` \"walks\" and \"quacks\" like a `float`. Strictly speaking, this is yet another example of operator overloading, whereas duck typing refers to the same behavior when passing arguments to function calls.\n",
"The important lesson is that we must expect our functions to be called with objects of *any* type at runtime, as opposed to the one type we had in mind when we defined the function.\n",
"Duck typing is possible because Python is a dynamically typed language. On the contrary, in statically typed languages like C, we *must* declare (i.e., \"specify\") the data type of every parameter in a function definition. Then, a `RecursionError` as for `countdown(3.1)` or `factorial(3.1)` above could not occur. For example, if we declared the `countdown()` and `factorial()` functions to only accept `int` objects, calling the functions with a `float` argument would immediately fail *syntactically*. As a downside, we would then lose the ability to call `factorial()` with `3.0`, which is *semantically* correct nevertheless.\n",
"So, there is no black or white answer as to which of the two language designs is better. Yet, most professional programmers have strong opinions concerning duck typing, reaching from \"love\" to \"hate.\" This is another example of how programming is a subjective art rather than \"objective\" science. Python's design is probably more appealing to beginners who intuitively regard `3` and `3.0` as interchangeable."
"We use the built-in [isinstance()](https://docs.python.org/3/library/functions.html#isinstance) function to make sure `factorial()` is called with an `int` object as the argument. We further **validate the input** by verifying that the integer is non-negative.\n",
"Meanwhile, we also see how we manually raise exceptions with the `raise` [statement](https://docs.python.org/3/reference/simple_stmts.html#the-raise-statement), another way of controlling the flow of execution.\n",
"The first two branches in the revised `factorial()` function act as **guardians** ensuring that the code does not produce *unexpected* runtime errors: Errors may be expected when mentioned in the docstring.\n",
"Forcing `n` to be an `int` is a very puritan way of handling the issues discussed above. A more relaxed approach could be to also accept a `float` and use its [is_integer()](https://docs.python.org/3/library/stdtypes.html#float.is_integer) method to check if `n` could be cast as an `int`. After all, being too puritan, we cannot take advantage of duck typing.\n",
"So, in essence, we are doing *two* things here: Besides checking the type, we also enforce **domain-specific** (i.e., mathematical here) rules concerning the non-negativity of `n`."
"\u001b[0;32m<ipython-input-35-188b816be8b6>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mfactorial\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m3.1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m<ipython-input-32-54b57bc2107d>\u001b[0m in \u001b[0;36mfactorial\u001b[0;34m(n)\u001b[0m\n\u001b[1;32m 13\u001b[0m \"\"\"\n\u001b[1;32m 14\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mn\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mint\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 15\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mTypeError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Factorial is only defined for integers\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 16\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mn\u001b[0m \u001b[0;34m<\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 17\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Factorial is not defined for negative integers\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mTypeError\u001b[0m: Factorial is only defined for integers"
]
}
],
"source": [
"factorial(3.1)"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"ename": "ValueError",
"evalue": "Factorial is not defined for negative integers",
"\u001b[0;32m<ipython-input-36-b06b2cada645>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mfactorial\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m42\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m<ipython-input-32-54b57bc2107d>\u001b[0m in \u001b[0;36mfactorial\u001b[0;34m(n)\u001b[0m\n\u001b[1;32m 15\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mTypeError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Factorial is only defined for integers\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 16\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mn\u001b[0m \u001b[0;34m<\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 17\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Factorial is not defined for negative integers\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 18\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mn\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 19\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mValueError\u001b[0m: Factorial is not defined for negative integers"
"With everything *officially* introduced so far (i.e., without the introductory example in [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements.ipynb) that only served as an overview), Python would be what is called **[Turing complete](https://en.wikipedia.org/wiki/Turing_completeness)**. That means that anything that could be formulated as an algorithm could be expressed with all the language features we have seen. Note that, in particular, we have *not* yet formally *introduced* the `for` and `while` statements!"
"Whereas functions combined with `if` statements suffice to model any repetitive logic, Python comes with a `while` [statement](https://docs.python.org/3/reference/compound_stmts.html#the-while-statement) that often makes it easier to implement iterative ideas.\n",
"It consists of a header line with a boolean expression followed by an indented code block. Before the first and after every execution of the code block, the boolean expression is evaluated, and if it is (still) equal to `True`, the code block runs (again). Eventually, some variable referenced in the boolean expression is changed in the code block such that the condition becomes `False`.\n",
"If the condition is `False` before the first iteration, the entire code block is *never* executed. As the flow of control keeps \"**looping**\" (i.e., more formally, **iterating**) back to the beginning of the code block, this concept is also called a `while`-loop and each pass through the loop an **iteration**."
"Let's rewrite the `countdown()` example in an iterative style. We also build in **input validation** by allowing the function only to be called with strictly positive integers. As any positive integer hits $0$ at some point when iteratively decremented by $1$, `countdown()` is guaranteed to **terminate**. Also, the base case is now handled at the end of the function, which commonly happens with iterative solutions to problems."
"As [PythonTutor](http://pythontutor.com/visualize.html#code=def%20countdown%28n%29%3A%0A%20%20%20%20if%20not%20isinstance%28n,%20int%29%3A%0A%20%20%20%20%20%20%20%20raise%20TypeError%28%22...%22%29%0A%20%20%20%20elif%20n%20%3C%3D%200%3A%0A%20%20%20%20%20%20%20%20raise%20ValueError%28%22...%22%29%0A%0A%20%20%20%20while%20n%20!%3D%200%3A%0A%20%20%20%20%20%20%20%20print%28n%29%0A%20%20%20%20%20%20%20%20n%20-%3D%201%0A%0A%20%20%20%20print%28%22Happy%20new%20Year!%22%29%0A%0Acountdown%283%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) shows, there is a subtle but essential difference in the way a `while` statement is treated in memory: In short, `while` statements can *not* run into a `RecursionError` as only *one* frame is needed to manage the names. After all, there is only *one* function call to be made. For typical day-to-day applications, this difference is, however, not so important *unless* a problem instance becomes so big that a large (i.e., $> 3.000$) number of recursive calls must be made."
"The iterative implementation of `gcd()` below accepts any two strictly positive integers. As in any iteration through the loop, the smaller number is subtracted from the larger one, the two decremented values of `a` and `b` eventually become equal. Thus, this algorithm is also guaranteed to terminate. If one of the two numbers were negative or $0$ in the first place, `gcd()` would run forever, and not even Python could detect this. Try this out by removing the input validation and running the function with negative arguments!"
"We also see that this implementation is a lot *less* efficient than its recursive counterpart which solves `gcd()` for the same two numbers $112233445566778899$ and $987654321$ within microseconds."
"As with recursion, we must ensure that the iteration ends. For the above `countdown()` and `gcd()` examples, we could \"prove\" (i.e., at least argue in favor) that some pre-defined **termination criterion** is reached eventually. However, this cannot be done in all cases, as the following example shows."
"The function below implements this game. Does it always reach $1$? No one has proven it so far! We include some input validation as before because `collatz()` would for sure not terminate if we called it with a negative number. Further, the Collatz sequence also works for real numbers, but then we would have to study fractals (cf., [this](https://en.wikipedia.org/wiki/Collatz_conjecture#Iterating_on_real_or_complex_numbers)). So we restrict our example to integers only."
"Recursion and the `while` statement are two sides of the same coin. Disregarding that in the case of recursion Python internally faces some additional burden for managing the stack of frames in memory, both approaches lead to the *same* computational steps in memory. More importantly, we can re-formulate a recursive implementation in an iterative way and vice versa despite one of the two ways often \"feeling\" a lot more natural given a particular problem.\n",
"So how does the `for` [statement](https://docs.python.org/3/reference/compound_stmts.html#the-for-statement) we saw in the very first example in this book fit into this picture? It is a *redundant* language construct to provide a *shorter* and more *convenient* syntax for common applications of the `while` statement. In programming, such additions to a language are called **syntactic sugar**. A cup of tea tastes better with sugar, but we may drink tea without sugar too.\n",
"Consider the following `numbers` list. Without the `for` statement, we have to manage a temporary **index variable** `i` to loop over all the elements and also obtain the individual elements with the `[]` operator in each iteration of the loop."
"The `for` statement, on the contrary, makes the actual business logic more apparent by stripping all the **[boilerplate code](https://en.wikipedia.org/wiki/Boilerplate_code)** away."
"For sequences of integers, the [range()](https://docs.python.org/3/library/functions.html#func-range) built-in makes the `for` statement even more convenient: It creates a list-like object of type `range` that generates integers \"on the fly,\" and we look closely at the underlying effects in memory in Chapter 7."
"[range()](https://docs.python.org/3/library/functions.html#func-range) takes optional `start` and `step` arguments that we use to customize the sequence of integers even more."
"The essential difference between the above `list` objects, `[0, 1, 2, 3, 4]` and `[1, 3, 5, 7, 9]`, and the `range` objects, `range(5)` and `range(1, 10, 2)`, is that in the former case *six* objects are created in memory *before* the `for` statement starts running, *one* `list` holding pointers to *five* `int` objects, whereas in the latter case only *one* `range` object is created that **generates** `int` objects one at a time *while* the `for`-loop runs.\n",
"However, we can loop over both of them. So a natural question to ask is why Python treats objects of *different* types in the *same* way when used with a `for` statement.\n",
"So far, the overarching storyline in this book goes like this: In Python, *everything* is an object. Besides its *identity* and *value*, every object is characterized by belonging to *one data type* that determines how the object behaves and what we may do with it.\n",
"Now, just as we classify objects by their types, we also classify these **concrete data types** (e.g., `int`, `float`, `str`, or `list`) into **abstract concepts**.\n",
"We did this already in [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements.ipynb) when we described a `list` object as \"some sort of container that holds [...] pointers to other objects\". So, abstractly speaking, **containers** are any objects that are \"composed\" of other objects and also \"manage\" how these objects are organized. `list` objects, for example, have the property that they model an internal order associated with its elements. There exist, however, other container types, many of which do *not* come with an order. So, containers primarily \"contain\" other objects and have *nothing* to do with looping.\n",
"On the contrary, the abstract concept of **iterables** is all about looping: Any object that we can loop over is, by definition, an iterable. So, `range` objects, for example, are iterables that do *not* contain other objects. Moreover, looping does *not* have to occur in a *predictable* order, although this is the case for both `list` and `range` objects.\n",
"Typically, containers are iterables, and iterables are containers. Yet, only because these two concepts coincide often, we must not think of them as the same. Chapter 10 finally gives an explanation as to how abstract concepts are implemented and play together.\n",
"The characteristic operator associated with a container type is the `in` operator: It checks if a given object evaluates equal to at least one of the objects in the container. Colloquially, it checks if an object is \"contained\" in the container. Formally, this operation is called **membership testing**."
"The cell below shows the *exact* workings of the `in` operator: Although `3.0` is *not* contained in `numbers`, it evaluates equal to the `3` that is, which is why the following expression evaluates to `True`. So, while we could colloquially say that `numbers` \"contains\" `3.0`, it actually does not."
"If we must have an index variable in the loop's body, we use the [enumerate()](https://docs.python.org/3/library/functions.html#enumerate) built-in that takes an *iterable* as its argument and then generates a \"stream\" of \"pairs\" consisting of an index variable and an object provided by the iterable. There is *no* need to ever revert to the `while` statement to loop over an iterable object."
"for i, name in enumerate(first_names, start=1):\n",
" print(i, name, sep=\" > \", end=\" \")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"The [zip()](https://docs.python.org/3/library/functions.html#zip) built-in allows us to combine the elements of two or more iterables in a *pairwise* fashion: It conceptually works like a zipper for a jacket."
"#### \"Hard at first Glance\" Example: [Fibonacci Numbers](https://en.wikipedia.org/wiki/Fibonacci_number) (revisited)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"In contrast to its recursive counterpart, the iterative `fibonacci()` function below is somewhat harder to read. For example, it is not so obvious as to how many iterations through the `for`-loop we need to make when implementing it. There is an increased risk of making an *off-by-one* error. Moreover, we need to track a `temp` variable along, at least until we have worked through Chapter 7. Do you understand what `temp` does?\n",
"However, one advantage of calculating Fibonacci numbers in a **forward** fashion with a `for` statement is that we could list the entire sequence in ascending order as we calculate the desired number. To show this, we added `print()` statements in `fibonacci()` below.\n",
"We do *not* need to store the index variable in the `for`-loop's header line: That is what the underscore variable `_` indicates; we \"throw it away.\""
"The iterative `factorial()` implementation is comparable to its recursive counterpart when it comes to readability. One advantage of calculating the factorial in a forward fashion is that we could track the intermediate `product` as it grows."
"The remainder of this chapter introduces more syntactic sugar: None of the language constructs introduced below are needed but contribute to making Python the expressive and easy to read language that it is."
"A first naive implementation could look like this: We loop over *every* element in `numbers` and set an **indicator variable** `is_above`, initialized as `False`, to `True` once we encounter an element satisfying the search condition."
"This implementation is *inefficient* as even if the *first* element in `numbers` has a square greater than `100`, we loop until the last element: This could take a long time for a big list.\n",
"Moreover, we must initialize `is_above` *before* the `for`-loop and write an `if`-`else`-logic *after* it to check for the result. The actual business logic is *not* clear right away.\n",
"Luckily, Python provides a `break` [statement](https://docs.python.org/3/reference/simple_stmts.html#the-break-statement) that lets us stop the `for`-loop in any iteration of the loop. Conceptually, it is yet another means of controlling the flow of execution."
"This is a computational improvement. However, the code still consists of *three* logical sections: Some initialization *before* the `for`-loop, the loop itself, and some finalizing logic. We prefer to convey the program's idea in *one* compound statement instead."
"To express the logic in a prettier way, we add an `else`-clause at the end of the `for`-loop. The `else`-branch is executed *iff* the `for`-loop is *not* stopped with a `break` statement **prematurely** (i.e., *before* reaching the *last* iteration in the loop). The word \"else\" implies a somewhat unintuitive meaning and had better been named a `then`-clause. In most scenarios, however, the `else`-clause logically goes together with some `if` statement in the `for`-loop's body.\n",
"Overall, the code's expressive power increases. Not many programming languages support a `for`-`else`-branching, which turns out to be very useful in practice."
"Of course, if we set the `threshold` an element's square has to pass higher, for example, to `200`, we have to loop through the entire `numbers` list. There is *no way* to optimize this **[linear search](https://en.wikipedia.org/wiki/Linear_search)** further, at least as long as we model `numbers` as a `list` object. More advanced data types, however, exist that mitigate that downside."
"Often, we process some iterable with numeric data, for example, a list of numbers as in the introductory example in [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements.ipynb) or, more realistically, data from a CSV file with many rows and columns.\n",
"We study this **map-filter-reduce** paradigm extensively in Chapter 7 after introducing more advanced data types that are needed to work with \"big\" data.\n",
"In general, code gets harder to comprehend the more **horizontal space** it occupies. It is commonly considered good practice to grow a program **vertically** rather than horizontally. Code compliant with [PEP 8](https://www.python.org/dev/peps/pep-0008/#maximum-line-length) requires us to use *at most* 79 characters in a line!\n",
"With already three levels of indentation, less horizontal space is available for the actual code block. Of course, one could flatten the two `if` statements with the logical `and` operator, as shown in [Chapter 3](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/03_conditionals.ipynb#The-if-Statement). Then, however, we trade off horizontal space against a more \"complex\" `if` logic, and this is *not* a real improvement.\n",
"A Pythonista would instead make use of the `continue` [statement](https://docs.python.org/3/reference/simple_stmts.html#the-continue-statement) that causes a loop to jump into the next iteration skipping the rest of the code block.\n",
"This is yet another illustration of why programming is an art. The two preceding code cells do the *same* with *identical* time complexity. However, the latter is arguably easier to read for a human, even more so when the business logic grows beyond two nested filters.\n",
"Sometimes we find ourselves in situations where we *cannot* know ahead of time how often or until which point in time a code block is to be executed."
"Python provides the built-in [input()](https://docs.python.org/3/library/functions.html#input) function that prints a message to the user, called the **prompt**, and reads in what was typed in response as a `str` object. We use it to process a user's \"unreliable\" input to our program (i.e., a user might type in some invalid response). Further, we use the [random()](https://docs.python.org/3/library/random.html#random.random) function in the [random](https://docs.python.org/3/library/random.html) module to model the coin toss.\n",
"A popular pattern to approach such **indefinite loops** is to go with a `while True` statement, which on its own would cause Python to enter into an infinite loop. Then, once a particular event occurs, we `break` out of the loop.\n",
"1. If a user enters *anything* other than `\"heads\"` or `\"tails\"`, for example, `\"Heads\"` or `\"Tails\"`, the program keeps running *without* the user knowing about the mistake!\n",
"2. The code *intermingles* the coin tossing with the processing of the user's input: Mixing *unrelated* business logic in the *same* code block makes a program harder to read and, more importantly, maintain in the long run."
"`get_guess()` not only reads in the user's input but also implements a simple input validation pattern in that the [strip()](https://docs.python.org/3/library/stdtypes.html?highlight=__contains__#str.strip) and [lower()](https://docs.python.org/3/library/stdtypes.html?highlight=__contains__#str.lower) methods remove preceding and trailing whitespace and lower case the input ensuring that the user may spell the input in any possible way (e.g., all upper or lower case). Also, `get_guess()` checks if the user entered one of the two valid options. If so, it returns either `\"heads\"` or `\"tails\"`; if not, it returns `None`."
"Second, we rewrite the `if`-`else`-logic to handle the case where `get_guess()` returns `None` explicitly: Whenever the user enters something invalid, a warning is shown, and another try is granted. We use the `is` operator and not the `==` operator as `None` is a singleton object.\n",
"Now, the program's business logic is expressed in a clearer way. More importantly, we can now change it more easily. For example, we could make the `toss_coin()` function base the tossing on a probability distribution other than the uniform (i.e., replace the [random.random()](https://docs.python.org/3/library/random.html#random.random) function with another one). In general, modular architecture leads to improved software maintenance."
"First, we combine functions that call themselves with conditional statements. This concept is known as **recursion** and suffices to control the flow of execution in *every* way we desire. For a beginner, this approach of **backward** reasoning might not be intuitive, but it turns out to be a handy tool to have in one's toolbox.\n",
"Second, the `while` and `for` statements are alternative and potentially more intuitive ways to express iteration as they correspond to a **forward** reasoning. The `for` statement is **syntactic sugar** that allows rewriting common occurrences of the `while` statement concisely. Python provides the `break` and `continue` statements as well as an optional `else`-clause that make working with the `for` and `while` statements even more convenient.\n",
"**Iterables** are any **concrete data types** that support being looped over, for example, with the `for` statement. The idea behind iterables is an **abstract concept** that may or may not be implemented by any given concrete data type. For example, both `list` and `range` objects can be looped over. The `list` type is also a **container** as any given `list` objects \"contains\" pointers to other objects in memory. On the contrary, the `range` type does not point to any other objects but instead creates *new* `int` objects \"on the fly\" (i.e., when being looped over)."