"In [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements_00_lecture.ipynb#Example:-Averaging-Even-Numbers), we simply typed the code to calculate the average of the even numbers in a list of whole numbers into several code cells. Then, we executed them one after another. We had no way of *reusing* the code except for either executing cells multiple times. And, whenever we find ourselves doing repetitive manual work, we can be sure that there must be a way of automating what we are doing.\n",
"This chapter shows how Python offers language constructs that let us **define** functions ourselves that we may then **call** just like the built-in ones. Also, we look at how we can extend our Python installation with functionalities written by other people."
"Python comes with plenty of useful functions built in, some of which we have already seen before (e.g., [print()](https://docs.python.org/3/library/functions.html#print), [sum()](https://docs.python.org/3/library/functions.html#sum), [len()](https://docs.python.org/3/library/functions.html#len), or [id()](https://docs.python.org/3/library/functions.html#id)). The [documentation](https://docs.python.org/3/library/functions.html) has the full list. Just as core Python itself, they are mostly implemented in C and thus very fast.\n",
"\n",
"Below, [sum()](https://docs.python.org/3/library/functions.html#sum) adds up all the elements in the `numbers` list while [len()](https://docs.python.org/3/library/functions.html#len) counts the number of elements in it."
"`sum` and `len` are *no* [keywords](https://docs.python.org/3/reference/lexical_analysis.html#keywords) like `for` or `if` but variables that reference *objects* in memory. Often, we hear people say that \"everything is an object in Python\" (e.g., this [question](https://stackoverflow.com/questions/40478536/in-python-what-does-it-mean-by-everything-is-an-object)). While this phrase may sound abstract in the beginning, it simply means that the entire memory is organized with \"bags\" of $0$s and $1$s, and there are even bags for the built-in functions. That is *not* true for many other languages (e.g., C or Java) and often a source of confusion for people coming to Python from another language.\n",
"\n",
"The built-in [id()](https://docs.python.org/3/library/functions.html#id) function tells us where in memory a particular built-in function is stored."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"140413081843264"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"id(sum)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"140413081842224"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"id(len)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"[type()](https://docs.python.org/3/library/functions.html#type) reveals that built-in functions like [sum()](https://docs.python.org/3/library/functions.html#sum) or [len()](https://docs.python.org/3/library/functions.html#len) are objects of type `builtin_function_or_method`."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"builtin_function_or_method"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(sum)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"builtin_function_or_method"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(len)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Python's object-oriented nature allows us to have functions work with themselves. While seemingly not useful from a beginner's point of view, that enables a lot of powerful programming styles later on."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"140413081841824"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"id(id)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"builtin_function_or_method"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(id)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"To execute a function, we **call** it with the **call operator** `()` as shown many times in [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements_00_lecture.ipynb) and above.\n",
"\n",
"If we are unsure whether a variable references a function or not, we can verify that with the built-in [callable()](https://docs.python.org/3/library/functions.html#callable) function.\n",
"\n",
"Abstractly speaking, *any* object that can be called with the call operator `()` is a so-called **callable**. And, objects of type `builtin_function_or_method` are just one kind of examples thereof. We will see another one already in the next sub-section."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"callable(sum)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"callable(len)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"`list` objects, for example, are *not* callable."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"callable(numbers)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Constructors"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"The list of [built-in functions](https://docs.python.org/3/library/functions.html) in the documentation should really be named a list of built-in *callables*.\n",
"\n",
"Besides the built-in functions, the list also features **constructors** for the built-in types. They may be used to **[cast](https://en.wikipedia.org/wiki/Type_conversion)** (i.e., \"convert\") any object as an object of a given type.\n",
"\n",
"For example, to \"convert\" a `float` or a `str` into an `int` object, we use the [int()](https://docs.python.org/3/library/functions.html#int) built-in. Below, *new* `int` objects are created from the `7.0` and `\"7\"` objects that are *newly* created themselves before being processed by [int()](https://docs.python.org/3/library/functions.html#int) right away *without* ever being referenced by a variable."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"7"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"int(7.0)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"7"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"int(\"7\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Casting an object as an `int` is different from rounding with the built-in [round()](https://docs.python.org/3/library/functions.html#round) function!"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"7"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"int(7.99)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"8"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"round(7.99)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Notice the subtle difference compared to the behavior of the `//` operator in [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements_00_lecture.ipynb##%28Arithmetic#%29-Operators) that \"rounds\" towards minus infinity: [int()](https://docs.python.org/3/library/functions.html#int) always \"rounds\" towards `0`."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"-7"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"int(-7.99)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Not all conversions are valid and *runtime* errors may occur as the `ValueError` shows."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"ename": "ValueError",
"evalue": "invalid literal for int() with base 10: 'seven'",
"\u001b[0;32m<ipython-input-18-af421b358f21>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"seven\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mValueError\u001b[0m: invalid literal for int() with base 10: 'seven'"
]
}
],
"source": [
"int(\"seven\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"We may also cast in the other direction with the [float()](https://docs.python.org/3/library/functions.html#float) or [str()](https://docs.python.org/3/library/functions.html#func-str) built-ins."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"7.0"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"float(7)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'7'"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"str(7)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Constructors are full-fledged objects as well."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"94916229764288"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"id(int)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"94916229768192"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"id(float)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"They are of type `type`, which is different from `builtin_function_or_method` above."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"type"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(int)"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"type"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(float)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"As already noted, constructors are *callables*. In that regard, they behave the same as built-in functions. We may call them with the call operator `()`."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"callable(int)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"callable(float)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"The attentive student may already have discovered that we refer to `builtin_function_or_method` objects as \"built-in functions\" and `type` objects as just \"built-ins.\" For a beginner, that difference is not so important. But, the ambitious student should already be aware that such subtleties exist.\n",
"We may create so-called *user-defined* **functions** with the `def` statement (cf., [reference](https://docs.python.org/3/reference/compound_stmts.html#function-definitions)). To extend an already familiar example, we reuse the introductory example from [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements_00_lecture.ipynb#Best-Practices) in its final Pythonic version and transform it into the function `average_evens()` below. We replace the variable name `numbers` with `integers` for didactical purposes in the first couple of examples.\n",
"A function's **name** must be chosen according to the same naming rules as ordinary variables since Python manages function names like variables. In this book, we further adopt the convention of ending function names with parentheses `()` in text cells for faster comprehension when reading (i.e., `average_evens()` vs. `average_evens`). These are *not* part of the name but must always be written out in the `def` statement for syntactic reasons.\n",
"Functions may define an arbitrary number of **parameters** as inputs that can then be referenced within the indented **code block**: They are listed within the parentheses in the `def` statement (i.e., `integers` below). \n",
"Together, the name and the list of parameters are also referred to as the function's **[signature](https://en.wikipedia.org/wiki/Type_signature)** (i.e., `average_evens(integers)` below).\n",
"A function may specify an *explicit* **return value** (i.e., \"result\" or \"output\") with the `return` statement (cf., [reference](https://docs.python.org/3/reference/simple_stmts.html#the-return-statement)): Functions that have one are considered **fruitful**; otherwise, they are **void**. Functions of the latter kind are still useful because of their **side effects**. For example, the built-in [print()](https://docs.python.org/3/library/functions.html#print) function changes what we see on the screen. Strictly speaking, [print()](https://docs.python.org/3/library/functions.html#print) and other void functions also have an *implicit* return value, namely the `None` object.\n",
"A function should define a **docstring** that describes what it does in a short subject line, what parameters it expects (i.e., their types), and what it returns, if anything. A docstring is a syntactically valid multi-line string (i.e., type `str`) defined within **triple-double quotes** `\"\"\"`. Strings are covered in depth in [Chapter 6](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/06_text_00_lecture.ipynb#The-str-Type). Widely adopted standards for docstrings are [PEP 257](https://www.python.org/dev/peps/pep-0257/) and section 3.8 of [Google's Python Style Guide](https://github.com/google/styleguide/blob/gh-pages/pyguide.md)."
"This works as functions are full-fledged *objects*. So, `average_evens` is just a name referencing an object in memory with an **identity**, a **type**, namely `function`, and a **value**. In that regard, `average_evens` is *no* different from the variable `numbers` or the built-ins' names."
"Its value may seem awkward at first: It consists of a location showing where the function is defined (i.e., `__main__` here, which is Python's way of saying \"in this notebook\") and the signature wrapped inside angle brackets `<` and `>`.\n",
" \n",
"The angle brackets are a convention to indicate that the value may *not* be used as a *literal* (i.e., typed back into another code cell). Chapter 10 introduces the concept of a **text representation** of an object, which is related to the *semantic* meaning of an object's value as discussed in [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements_00_lecture.ipynb#Value-/-\"Meaning\"), and the angle brackets convention is one such way to represent an object as text. When executed, the angle brackets cause a `SyntaxError` because Python expects the `<` operator to come with an operand on both sides (cf., [Chapter 3](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/03_conditionals_00_lecture.ipynb#Relational-Operators))."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"ename": "SyntaxError",
"evalue": "invalid syntax (<ipython-input-31-7f49dff38622>, line 1)",
"output_type": "error",
"traceback": [
"\u001b[0;36m File \u001b[0;32m\"<ipython-input-31-7f49dff38622>\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m <function __main__.average_evens(numbers)>\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n"
]
}
],
"source": [
"<function __main__.average_evens(numbers)>"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"`average_evens` is, of course, callable. So, the `function` type is the third kind of callable in this chapter."
"The built-in [help()](https://docs.python.org/3/library/functions.html#help) function shows a function's docstring.\n",
"\n",
"Whenever we use code to analyze or obtain information on an object, we say that we **[introspect](https://en.wikipedia.org/wiki/Type_introspection)** it."
"Once defined, we may call a function with the call operator `()` as often as we wish. The formal parameters are then filled in by **passing** *expressions* (e.g., literals or variables) as **arguments** to the function within the parentheses."
"The parameters listed in a function's definition (i.e., `integers` in the example) and variables created *inside* it during execution (i.e., `evens` and `average`) are **local** to that function. That means they only reference an object in memory *while* the function is being executed and are dereferenced immediately when the function call returns. We say they go out of **scope**. That is why we see the `NameError`s below."
"\u001b[0;32m<ipython-input-41-756dfa02d4a8>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mintegers\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mNameError\u001b[0m: name 'integers' is not defined"
"\u001b[0;32m<ipython-input-42-df246468d241>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mevens\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m<ipython-input-43-c3fe9b4213f6>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0maverage\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"[PythonTutor](http://pythontutor.com/visualize.html#code=numbers%20%3D%20%5B7,%2011,%208,%205,%203,%2012,%202,%206,%209,%2010,%201,%204%5D%0A%0Adef%20average_evens%28integers%29%3A%0A%20%20%20%20evens%20%3D%20%5Bn%20for%20n%20in%20integers%20if%20n%20%25%202%20%3D%3D%200%5D%0A%20%20%20%20average%20%3D%20sum%28evens%29%20/%20len%28evens%29%0A%20%20%20%20return%20average%0A%0Aresult%20%3D%20average_evens%28numbers%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) visualizes what happens in memory: To be precise, in the exact moment when the function call is initiated and `numbers` passed in as the `integers` argument, there are *two* references to the *same* `list` object (cf., steps 4-5 in the visualization). We also see how Python creates a *new* **frame** that holds the function's local scope (i.e., \"internal names\") in addition to the **global** frame. Frames are nothing but [namespaces](https://en.wikipedia.org/wiki/Namespace) to *isolate* the names of different **scopes** from each other. The list comprehension `[n for n in integers if n % 2 == 0]` constitutes yet another frame that is in scope as the `list` object assigned to `evens` is *being* created (cf., steps 6-20). When the function returns, only the global frame is left (cf., last step)."
"On the contrary, while a function is *being* executed, it may reference the variables of **enclosing scopes** (i.e., \"outside\" of it). This is a common source of *semantic* errors. Consider the following stylized and incorrect example `average_wrong()`. The error is hard to spot with eyes: The function never references the `integers` parameter but the `numbers` variable in the **global scope** instead."
"[PythonTutor](http://pythontutor.com/visualize.html#code=numbers%20%3D%20%5B7,%2011,%208,%205,%203,%2012,%202,%206,%209,%2010,%201,%204%5D%0A%0Adef%20average_wrong%28integers%29%3A%0A%20%20%20%20evens%20%3D%20%5Bn%20for%20n%20in%20numbers%20if%20n%20%25%202%20%3D%3D%200%5D%0A%20%20%20%20average%20%3D%20sum%28evens%29%20/%20len%28evens%29%0A%20%20%20%20return%20average%0A%0Aresult%20%3D%20average_wrong%28%5B123,%20456,%20789%5D%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) is again helpful at visualizing the error interactively: Creating the `list` object `evens` eventually references takes *16* computational steps, namely two for managing the list comprehension, one for setting up an empty `list` object, *twelve* for filling it with elements derived from `numbers` in the global scope (i.e., that is the error), and one to make `evens` reference it (cf., steps 6-21).\n",
"The frames logic shown by PythonTutor is the mechanism with which Python not only manages the names inside *one* function call but also for *many* potentially *simultaneous* calls, as revealed in [Chapter 4](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/04_iteration_00_lecture.ipynb#Trivial-Example:-Countdown). It is the reason why we may reuse the same names for the parameters and variables inside both `average_evens()` and `average_wrong()` without Python mixing them up. So, as we already read in the [Zen of Python](https://www.python.org/dev/peps/pep-0020/), \"namespaces are one honking great idea\" (cf., `import this`), and a frame is just a special kind of namespace."
"Code gets even more confusing when variables by the *same* name from *different* scopes collide. In particular, what should we expect to happen if a function \"changes\" a globally defined variable in its body?\n",
"`average_evens()` below works like `average_evens()` above except that it rounds the numbers in `integers` with the built-in [round()](https://docs.python.org/3/library/functions.html#round) function before filtering and averaging them. [round()](https://docs.python.org/3/library/functions.html#round) returns `int` objects independent of its argument being an `int` or a `float` object. On the first line in its body, `average_evens()` introduces a *local* variable `numbers` whose name collides with the one defined in the global scope."
"As a good practice, let's first \"verify\" that `average_evens()` is \"correct\" by calling it with inputs for which we can calculate the answer in our heads. Treating a function as a \"black box\" (i.e., input-output specification) when testing is also called [unit testing](https://en.wikipedia.org/wiki/Unit_testing) and plays an important role in modern software engineering."
"Such tests are often and conveniently expressed with the `assert` statement (cf., [reference](https://docs.python.org/3/reference/simple_stmts.html#the-assert-statement)): If the expression following `assert` evaluates to `True`, nothing happens."
"In summary, Python is smart enough to keep all the involved `numbers` variables apart. So, the global `numbers` variable is still referencing the *same* `list` object as before."
"The reason why everything works is that *every* time we (re-)assign an object to a variable *inside* a function's body with the `=` statement, this is done in the *local* scope by default. There are ways to change variables existing in an outer scope from within a function, but this is a rather advanced topic.\n",
"\n",
"[PythonTutor](http://pythontutor.com/visualize.html#code=numbers%20%3D%20%5B7,%2011,%208,%205,%203,%2012,%202,%206,%209,%2010,%201,%204%5D%0A%0Adef%20average_evens%28integers%29%3A%0A%20%20%20%20numbers%20%3D%20%5Bround%28n%29%20for%20n%20in%20integers%5D%0A%20%20%20%20evens%20%3D%20%5Bn%20for%20n%20in%20numbers%20if%20n%20%25%202%20%3D%3D%200%5D%0A%20%20%20%20average%20%3D%20sum%28evens%29%20/%20len%28evens%29%0A%20%20%20%20return%20average%0A%0Aresult%20%3D%20average_evens%28%5B40.0,%2041.1,%2042.2,%2043.3,%2044.4%5D%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) shows how *two* `numbers` variables exist in *different* scopes referencing *different* objects (cf., steps 14-25) when we execute `average_evens([40.0, 41.1, 42.2, 43.3, 44.4])`.\n",
"\n",
"Variables whose names collide with the ones of variables in enclosing scopes - and the global scope is just the most enclosing scope - are said to **shadow** them.\n",
"\n",
"While this is not a problem for Python, it may lead to less readable code for humans and should be avoided if possible. But, as the software engineering wisdom goes, \"[naming things](https://skeptics.stackexchange.com/questions/19836/has-phil-karlton-ever-said-there-are-only-two-hard-things-in-computer-science)\" is often considered a hard problem as well, and we have to be prepared to encounter shadowing variables.\n",
"\n",
"Shadowing also occurs if a parameter in the function definition goes by the same name as a variable in an outer scope. Below, `average_evens()` is identical to the first version in this chapter except that the parameter `integers` is now called `numbers` as well."
"[PythonTutor](http://pythontutor.com/visualize.html#code=numbers%20%3D%20%5B7,%2011,%208,%205,%203,%2012,%202,%206,%209,%2010,%201,%204%5D%0A%0Adef%20average_evens%28numbers%29%3A%0A%20%20%20%20evens%20%3D%20%5Bn%20for%20n%20in%20numbers%20if%20n%20%25%202%20%3D%3D%200%5D%0A%20%20%20%20average%20%3D%20sum%28evens%29%20/%20len%28evens%29%0A%20%20%20%20return%20average%0A%0Aresult%20%3D%20average_evens%28numbers%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) reveals that in this example there are *two* `numbers` variables in *different* scope referencing the *same* `list` object in memory (cf., steps 4-23)."
"So far, we have specified only one parameter in each of our user-defined functions. In [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements_00_lecture.ipynb#%28Arithmetic%29-Operators), however, we saw the built-in [divmod()](https://docs.python.org/3/library/functions.html#divmod) function take two arguments. And, the order in which they are passed in matters! Whenever we call a function and list its arguments in a comma separated manner, we say that we pass in the arguments *by position* or refer to them as **positional arguments**."
"For many functions, there is a natural order to the arguments: For example, for any kind of division passing the dividend first and the divisor second seems intuitive. But what if that is not the case in another setting? For example, let's create a close relative of the above `average_evens()` function that also scales the resulting average by a factor. What is more natural? Passing in `numbers` first? Or `scalar`? There is no obvious way and we continue with the first alternative for no concrete reason."
"Now, this function call is a bit harder to understand as we always need to remember what the `2` means. This becomes even harder with more parameters.\n",
"Unfortunately, there are ways to screw this up with a `SyntaxError`: If positional and keyword arguments are mixed, the keyword arguments *must* come last."
"\u001b[0;32m<ipython-input-66-d910518345ec>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mscaled_average_evens\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnumbers\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"Defining `average_evens()` and `scaled_average_evens()` as above leads to a repetition of most of their code. That is *not* good as such a redundancy makes a code base hard to maintain in the long run: Whenever we change the logic in one function, we must *not* forget to do so for the other function as well. And, most likely, we forget about such issues in larger projects.\n",
"\n",
"Below, three of four lines in the functions' bodies are identical!"
"For example, as not scaling an average is just a special case of scaling it with `1`, we could redefine the two functions like below: In this version, the function resembling the *special* case, `average_evens()`, **forwards** the call to the more *general* function, `scaled_average_evens()`, passing a `scalar` argument of `1`. As the name `scaled_average_evens` within the body of `average_evens()` is looked up each time the function is *being* executed, we may define `average_evens()` before `scaled_average_evens()`."
"*Assuming* that scaling the average occurs rarely, it may be a good idea to handle both cases in *one* function definition by providing a **default argument** of `1` for the `scalar` parameter."
"If `scalar` is *not* passed in, it automatically takes the value `1`.\n",
"\n",
"If `scalar` is passed in, this may be done as either a positional or a keyword argument. Which of the two calls where `scalar` is `2` is faster to understand in a larger program?"
"Because we *assumed* that scaling occurs rarely, we would prefer that our new version of `average_evens()` be called with a *keyword argument* whenever `scalar` is passed in. Then, a function call is never ambiguous when reading the source code.\n",
"Python offers a **keyword-only** syntax when defining a function that *forces* a caller to pass the `scalar` argument *by name* if it is passed in at all: To do so, we place an asterisk `*` before the arguments that may only be passed in by name. Note that the keyword-only syntax also works *without* a default argument."
"We can thus think of it as doing *two* things **atomically** (i.e., either both of them happen or none). First, a `function` object is created that contains the concrete $0$s and $1$s that resemble the instructions we put into the function's body. In the context of a function, these $0$s and $1$s are also called **[byte code](https://en.wikipedia.org/wiki/Bytecode)**. Then, a name referencing the new `function` object is created.\n",
"Only this second aspect makes `def` a statement: Merely creating a new object in memory without making it accessible for later reference does *not* constitute a side effect because the state the program is *not* changed. After all, if we cannot reference an object, how do we know it exists in the first place?\n",
"Python provides a `lambda` expression syntax that allows us to *only* create a `function` object in memory *without* making a name reference it (cf., [reference](https://docs.python.org/3/reference/expressions.html#lambda)). It starts with the keyword `lambda` followed by an optional listing of comma separated parameters, a mandatory colon, and *one* expression that serves as the return value of the resulting `function` object. Because it does *not* create a name referencing the object, we effectively create \"anonymous\" functions with it.\n",
"To inspect the object created by a `lambda` expression, we use the simple `=` statement and assign it to the variable `add_three`, which is really `add_three()` as per our convention from above."
"[type()](https://docs.python.org/3/library/functions.html#type) and [callable()](https://docs.python.org/3/library/functions.html#callable) confirm that `add_three` is indeed a callable `function` object."
"Alternatively, we could call an `function` object created with a `lambda` expression right away (i.e., without assigning it to a variable), which looks quite weird for now as we need *two* pairs of parentheses: The first one serves as a delimiter whereas the second represents the call operator."
"The main point of having functions without a reference to them is to use them in a situation where we know ahead of time that we use the function only *once*.\n",
"Popular applications of lambda expressions occur in combination with the **map-filter-reduce** paradigm (cf., [Chapter 7](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/07_sequences_00_lecture.ipynb#Lambda-Expressions)) or when we do \"number crunching\" with **arrays** and **data frames** (cf., Chapter 9)."
"So far, we have only used what we refer to as **core** Python in this book. By this, we mean all the syntactical rules as specified in the [language reference](https://docs.python.org/3/reference/) and a minimal set of about 50 built-in [functions](https://docs.python.org/3/library/functions.html). With this, we could already implement any algorithm or business logic we can think of!\n",
"However, after our first couple of programs, we would already start seeing recurring patterns in the code we write. In other words, we would constantly be \"reinventing the wheel\" in each new project.\n",
"Would it not be smarter to pull out the reusable components from our programs and put them into some project independent **library** of generically useful functionalities? Then we would only need a way of including these **utilities** in our projects.\n",
"As all programmers across all languages face this very same issue, most programming languages come with a so-called **[standard library](https://en.wikipedia.org/wiki/Standard_library)** that provides utilities to accomplish everyday tasks without much code. Examples are making an HTTP request to some website, open and read popular file types (e.g., CSV or Excel files), do something on a computer's file system, and many more."
"Python also comes with a [standard library](https://docs.python.org/3/library/index.html) that is structured into coherent modules and packages for given topics: A **module** is just a plain text file with the file extension *.py* that contains Python code while a **package** is a folder that groups several related modules.\n",
"The code in the [standard library](https://docs.python.org/3/library/index.html) is contributed and maintained by many volunteers around the world. In contrast to so-called \"third-party\" packages (cf., the next section below), the Python core development team closely monitors and tests the code in the [standard library](https://docs.python.org/3/library/index.html). Consequently, we can be reasonably sure that anything provided by it works correctly independent of our computer's operating system and will most likely also be there in the next Python versions. Parts in the [standard library](https://docs.python.org/3/library/index.html) that are computationally expensive are often rewritten in C and, therefore, much faster than anything we could write in Python ourselves. So, whenever we can solve a problem with the help of the [standard library](https://docs.python.org/3/library/index.html), it is almost always the best way to do so as well.\n",
"The [standard library](https://docs.python.org/3/library/index.html) has grown very big over the years, and we refer to the website [PYMOTW](https://pymotw.com/3/index.html) (i.e., \"Python Module of the Week\") that features well written introductory tutorials and how-to guides to most parts of the library. The same author also published a [book](https://www.amazon.com/Python-Standard-Library-Example-Developers/dp/0134291050/ref=as_li_ss_tl?ie=UTF8&qid=1493563121&sr=8-1&keywords=python+3+standard+library+by+example) that many Pythonistas keep on their shelf for reference. Knowing what is in the [standard library](https://docs.python.org/3/library/index.html) is quite valuable for solving real-world tasks quickly.\n",
"Throughout this book, we look at many modules and packages from the [standard library](https://docs.python.org/3/library/index.html) in more depth, starting with the [math](https://docs.python.org/3/library/math.html) and [random](https://docs.python.org/3/library/random.html) modules in this chapter."
"The [math](https://docs.python.org/3/library/math.html) module provides non-trivial mathematical functions like $sin(x)$ and constants like $\\pi$ or $\\text{e}$.\n",
"To make functions and variables defined \"somewhere else\" available in our current program, we must first **import** them with the `import` statement (cf., [reference](https://docs.python.org/3/reference/simple_stmts.html#import)). "
"This creates the variable `math` that references a **[module object](https://docs.python.org/3/glossary.html#term-module)** (i.e., type `module`) in memory."
"`module` objects serve as namespaces to organize the names inside a module. In this context, a namespace is nothing but a prefix that avoids collision with the variables already defined at the location where we import the module into.\n",
"\n",
"Let's see what we can do with the `math` module.\n",
"The [dir()](https://docs.python.org/3/library/functions.html#dir) built-in function may also be used with an argument passed in. Ignoring the dunder-style names, `math` offers quite a lot of names. As we cannot know at this point if a listed name refers to a function or an ordinary variable, we use the more generic term **attribute** to mean either one of them."
"Common mathematical constants and functions are now available via the dot operator `.` on the `math` object. This operator is sometimes also called the **attribute access operator**, in line with the just introduced term."
"Observe how the arguments passed to functions do not need to be just variables or simple literals. Instead, we may pass in any *expression* that evaluates to a *new* object of the type the function expects.\n",
"So just as a reminder from the expression vs. statement discussion in [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements_00_lecture.ipynb#Expressions): An expression is *any* syntactically correct combination of variables and literals with operators. And the call operator `()` is yet another operator. So both of the next two code cells are just expressions! They have no permanent side effects in memory. We may execute them as often as we want *without* changing the state of the program (i.e., this Jupyter notebook).\n",
"So, regarding the very next cell in particular: Although the `2 ** 2` creates a *new* object `4` in memory that is then immediately passed into the [math.sqrt()](https://docs.python.org/3/library/math.html#math.sqrt) function, once that function call returns, \"all is lost\" and the newly created `4` object is forgotten again, as well as the return value of [math.sqrt()](https://docs.python.org/3/library/math.html#math.sqrt)."
"Often, we need a random variable, for example, when we want to build a simulation. The [random](https://docs.python.org/3/library/random.html) module in the [standard library](https://docs.python.org/3/library/index.html) often suffices for that."
"Besides the usual dunder-style attributes, the built-in [dir()](https://docs.python.org/3/library/functions.html#dir) function lists some attributes in an upper case naming convention and many others starting with a *single* underscore `_`. To understand the former, we must wait until Chapter 10, while the latter is explained further below."
"The [random.random()](https://docs.python.org/3/library/random.html#random.random) function generates a uniformly distributed `float` number between $0$ (including) and $1$ (excluding)."
"While we could build some conditional logic with an `if` statement to map the number generated by [random.random()](https://docs.python.org/3/library/random.html#random.random) to a finite set of elements manually, the [random.choice()](https://docs.python.org/3/library/random.html#random.choice) function provides a lot more **convenience** for us. We call it with, for example, the `numbers` list and it draws one element out of it with equal chance."
"To reproduce the *same* random numbers in a simulation each time we run it, we set the **[random seed](https://en.wikipedia.org/wiki/Random_seed)**. It is good practice to do that at the beginning of a program or notebook. It becomes essential when we employ randomized machine learning algorithms, like the [Random Forest](https://en.wikipedia.org/wiki/Random_forest), and want to obtain **reproducible** results for publication in academic journals.\n",
"The [random](https://docs.python.org/3/library/random.html) module provides the [random.seed()](https://docs.python.org/3/library/random.html#random.seed) function to do that."
"As the Python community is based around open source, many developers publish their code, for example, on the Python Package Index [PyPI](https://pypi.org) from where anyone may download and install it for free using command-line based tools like [pip](https://pip.pypa.io/en/stable/) or [conda](https://conda.io/en/latest/). This way, we can always customize our Python installation even more. Managing many such packages is quite a deep topic on its own, sometimes fearfully called **[dependency hell](https://en.wikipedia.org/wiki/Dependency_hell)**.\n",
"The difference between the [standard library](https://docs.python.org/3/library/index.html) and such **third-party** packages is that in the first case, the code goes through a much more formalized review process and is officially endorsed by the Python core developers. Yet, many third-party projects also offer the highest quality standards and are also relied on by many businesses and researchers.\n",
"Throughout this book, we will look at many third-party libraries, mostly from Python's [scientific stack](https://scipy.org/about.html), a tightly coupled set of third-party libraries for storing **big data** efficiently (e.g., [numpy](http://www.numpy.org/)), \"wrangling\" (e.g., [pandas](https://pandas.pydata.org/)) and visualizing them (e.g., [matplotlib](https://matplotlib.org/) or [seaborn](https://seaborn.pydata.org/)), fitting classical statistical models (e.g., [statsmodels](http://www.statsmodels.org/)), training machine learning models (e.g., [sklearn](http://scikit-learn.org/)), and much more.\n",
"[numpy](http://www.numpy.org/) is the de-facto standard in the Python world for handling **array-like** data. That is a fancy word for data that can be put into a matrix or vector format. We look at it in depth in [Chapter 9](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/09_arrays_00_lecture.ipynb).\n",
"As [numpy](http://www.numpy.org/) is *not* in the [standard library](https://docs.python.org/3/library/index.html), it must be *manually* installed, for example, with the [pip](https://pip.pypa.io/en/stable/) tool. As mentioned in [Chapter 0](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/00_intro_00_lecture.ipynb#Markdown-vs.-Code-Cells), to execute terminal commands from within a Jupyter notebook, we start a code cell with an exclamation mark.\n",
"If you are running this notebook with an installation of the [Anaconda Distribution](https://www.anaconda.com/distribution/), then [numpy](http://www.numpy.org/) is probably already installed. Running the cell below confirms that."
"[numpy](http://www.numpy.org/) is conventionally imported with the shorter **idiomatic** name `np`. The `as` in the import statement changes the resulting variable name. It is a shortcut for the three lines `import numpy`, `np = numpy`, and `del numpy`."
"[numpy](http://www.numpy.org/) somehow magically adds new behavior to Python's built-in arithmetic operators. For example, we may now [scalar-multiply](https://en.wikipedia.org/wiki/Scalar_multiplication) `vec`.\n",
"[numpy](http://www.numpy.org/)'s functions are implemented in highly optimized C code and, therefore, are fast, especially when dealing with bigger amounts of data."
"This scalar multiplication would \"fail\" if we used a plain `list` object like `numbers` instead of an `numpy.ndarray` object like `vec`. The two types exhibit different **behavior** when used with the same operator, another example of **operator overloading**."
"[numpy](http://www.numpy.org/)'s `numpy.ndarray` objects integrate nicely with Python's built-in functions (e.g., [sum()](https://docs.python.org/3/library/functions.html#sum)) or functions from the [standard library](https://docs.python.org/3/library/index.html) (e.g., [random.choice()](https://docs.python.org/3/library/random.html#random.choice))."
"For sure, we can create local modules and packages. In the repository's main directory, there is a [*sample_module.py*](https://github.com/webartifex/intro-to-python/blob/master/sample_module.py) file that contains, among others, a function equivalent to the final version of `average_evens()`. To be realistic, this sample module is structured in a modular manner with several functions building on each other. It is best to skim over it *now* before reading on.\n",
"To make code we put into a *.py* file available in our program, we import it as a module just as we did above with modules in the [standard library](https://docs.python.org/3/library/index.html) or third-party packages.\n",
"The *name* to be imported is the file's name except for the *.py* part. For this to work, the file's name *must* adhere to the *same* rules as hold for [variable names](https://docs.python.org/3/reference/lexical_analysis.html#identifiers) in general.\n",
"What happens during an import is as follows. When Python sees the `import sample_module` part, it first creates a *new* object of type `module` in memory. This is effectively an *empty* namespace. Then, it executes the imported file's code from top to bottom. Whatever variables are still defined at the end of this, are put into the module's namespace. Only if the file's code does *not* raise an error, will Python make a variable in our current location (i.e., `mod` here) reference the created `module` object. Otherwise, it is discarded. In essence, it is as if we copied and pasted the file's code in place of the import statement. If we import an already imported module again, Python is smart enough to avoid doing all this work all over and does nothing."
"Disregarding the dunder-style attributes, `mod` defines the five attributes `_default_scalar`, `_scaled_average`, `average`, `average_evens`, and `average_odds`, which are exactly the ones we would expect from reading the [*sample_module.py*](https://github.com/webartifex/intro-to-python/blob/master/sample_module.py) file.\n",
"A convention when working with imported code is to *disregard* any attributes starting with an underscore `_`. These are considered **private** and constitute **implementation details** the author of the imported code might change in a future version of his software. We *must* not rely on them in any way.\n",
"In contrast, the three remaining **public** attributes are the functions `average()`, `average_evens()`, and `average_odds()` that we may use after the import."
"We use the imported `mod.average_evens()` just like `average_evens()` defined above. The advantage we get from **modularization** with *.py* files is that we can now easily reuse functions across different Jupyter notebooks without redefining them again and again. Also, we can \"source out\" code that distracts from the storyline told in a notebook."
"Packages are a generalization of modules, and we look at one in detail in Chapter 10. You may, however, already look at a [sample package](https://github.com/webartifex/intro-to-python/tree/master/sample_package) in the repository, which is nothing but a folder with *.py* files in it.\n",
"- make programs easier to comprehend and debug for humans as they give names to the smaller parts of a larger program (i.e., they **modularize** a code base), and\n",
"Functions are a special kind of **callables**. Any object that may be **called** with the call operator `()` is a callable. Built-in functions and **constructors** are other kinds of callables.\n",