"In Chapter 1 we typed the **business logic** of our little program to calculate the mean of a subset of a list of numbers right into the code cells. Then, we executed them one after another. We had no way of **re-using** the code except for either re-executing the cells or copying and pasting their contents into other cells. And whenever we find ourselves doing repetitive manual work, we can be sure that there must be a way of automating what we are doing.\n",
"\n",
"At the same time, we executed built-in functions (e.g., [print()](https://docs.python.org/3/library/functions.html#print), [sum()](https://docs.python.org/3/library/functions.html#sum), [len()](https://docs.python.org/3/library/functions.html#len), [id()](https://docs.python.org/3/library/functions.html#id), or [type()](https://docs.python.org/3/library/functions.html#type)) that obviously must be re-using the same parts inside core Python every time we use them.\n",
"\n",
"This chapter shows how Python offers language constructs that let us **define** our own functions that we can then **call** just like the built-in ones."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Function Definition"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"So-called **[user-defined functions](https://docs.python.org/3/reference/compound_stmts.html#function-definitions)** can be created with the `def` statement. To extend an already familiar example, we re-use the introductory example from Chapter 1 in its final Pythonic version and transform it into the function `average_evens()` below. \n",
"\n",
"A function's **name** must be chosen according to the same naming rules as for ordinary variables. In fact, Python manages function names just like variables. In this book, we further adopt the convention of ending function names with parentheses \"`()`\" in text cells for faster comprehension when reading (i.e., `average_evens()` vs. `average_evens`). These are not actually part of the name but must always be written out in the `def` statement for syntactic reasons.\n",
"\n",
"Functions may define an arbitrary number of **parameters** as inputs that can then be referenced within the indented **code block**: They are simply listed within the parentheses in the `def` statement (i.e., `numbers` below). \n",
"\n",
"The code block is often also called a function's **body** while the first line with the `def` in it is the **header** and must end with a colon.\n",
"\n",
"Together, the name and the list of parameters are also referred to as the function's **[signature](https://en.wikipedia.org/wiki/Type_signature)** (i.e., `average_evens(numbers)` below).\n",
"\n",
"A function may come with an *explicit* **[return value](https://docs.python.org/3/reference/simple_stmts.html#the-return-statement)** (i.e., \"result\" or \"output\") specified with the `return` statement: Functions that have one are considered **fruitful**; otherwise, they are **void**. Functions of the latter kind are still useful because of their **side effects** (e.g., the [print()](https://docs.python.org/3/library/functions.html#print) built-in). Strictly speaking, they also have an *implicit* return value of `None` that is different from the `False` we saw in Chapter 1.\n",
"\n",
"To maintain good coding practices, a function should define a **docstring** that describes what it does in a short subject line, what parameters it expects (i.e., their types), and what it returns (if anything). A docstring is a syntactically valid multi-line string (i.e., type `str`) defined with **triple-double quotes** (strings are covered in depth in Chapter 6). Good standards as to how to format a docstring are [PEP 257](https://www.python.org/dev/peps/pep-0257/) and section 3.8 in [Google's Python Style Guide](https://github.com/google/styleguide/blob/gh-pages/pyguide.md)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"def average_evens(numbers):\n",
" \"\"\"Calculate the average of all even numbers in a list.\n",
"\n",
" Args:\n",
" numbers (list): a list of numbers; may be integers or floats\n",
"\n",
" Returns:\n",
" float: average\n",
" \"\"\"\n",
" evens = [n for n in numbers if n % 2 == 0]\n",
" average = sum(evens) / len(evens)\n",
" return average"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Once defined, a function can be referenced just like any other variable by its name (i.e., *without* the parenthesis). Its value might seem awkward at first: It consists of the location where we defined the function (i.e., `__main__`, which is Python's way of saying \"in this notebook\") and the signature."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"<function __main__.average_evens(numbers)>"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"average_evens"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"A function is an **object** on its own with an **identity** (i.e., memory location) and a **type**, namely `function`."
"The built-in [help()](https://docs.python.org/3/library/functions.html#help) function shows a function's docstring.\n",
"\n",
"Whenever we use code to analyze or obtain information on an object, we say that we **[introspect](https://en.wikipedia.org/wiki/Type_introspection)** it."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Help on function average_evens in module __main__:\n",
"\n",
"average_evens(numbers)\n",
" Calculate the average of all even numbers in a list.\n",
" \n",
" Args:\n",
" numbers (list): a list of numbers; may be integers or floats\n",
" \n",
" Returns:\n",
" float: average\n",
"\n"
]
}
],
"source": [
"help(average_evens)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"In a Jupyter notebook, we can just as well add a question mark to a function's name to achieve the same. Then, a small tab opens in our browser."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"average_evens?"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Two questions marks even show a function's source code."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"average_evens??"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Function Calls"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Once defined we can **call** (i.e., \"execute\") a function with the **call operator** `()`. The formal parameters are filled in by passing variables or expressions as **arguments** to the function within the parentheses."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"6.0"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"average_evens(nums)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"The return value is usually assigned to a new variable for later reference. Otherwise we would loose access to it in memory right away."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"avg = average_evens(nums)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"6.0"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"avg"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Scoping Rules"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Local Scope disappears"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Notice how the parameters listed in a function's definition (i.e., `numbers`) and variables created inside it during execution (i.e., `evens` and `average`) are **local** to that function. That means they are only mapped to an object in memory while the function is being executed and de-referenced immediately when the function returns. We say they **go out of scope** once the function terminates."
"\u001b[0;32m<ipython-input-12-6a54518a0c2c>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mnumbers\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mNameError\u001b[0m: name 'numbers' is not defined"
"\u001b[0;32m<ipython-input-13-df246468d241>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mevens\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mNameError\u001b[0m: name 'evens' is not defined"
"\u001b[0;32m<ipython-input-14-c3fe9b4213f6>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0maverage\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mNameError\u001b[0m: name 'average' is not defined"
]
}
],
"source": [
"average"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Global Scope is everywhere"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"On the contrary, while a function is being executed, it can \"see\" the variables of the **enclosing scope** (i.e., \"outside\" of it). This is a common source of *semantic* errors. Consider the following stylized (and incorrect) example `average_wrong()`. The error is hard to spot with eyes: The function never references the `numbers` parameter but the `nums` variable in the **global scope** instead."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"def average_wrong(numbers):\n",
" \"\"\"Calculate the average of all even numbers in a list.\n",
"\n",
" Args:\n",
" numbers (list): a list of numbers; may be integers or floats\n",
"\n",
" Returns:\n",
" float: average\n",
" \"\"\"\n",
" evens = [n for n in nums if n % 2 == 0] # should reference numbers, not nums\n",
" average = sum(evens) / len(evens)\n",
" return average"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nums"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"6.0"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"average_wrong(nums) # the result is correct by accident!"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"6.0"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"average_wrong([123, 456, 789])"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Also, observe how both `average_evens()` and `average_wrong()` use the same names for their respective parameters and variables internally. For sure, Python is smart enough to not mix them up. This is because each function call creates a temporary **[namespace](https://en.wikipedia.org/wiki/Namespace)** that *isolates* the local scope's names for usage only from within the function. As we saw in the [Zen of Python](https://www.python.org/dev/peps/pep-0020/), \"namespaces are one honking great idea\" (cf., `import this`)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Shadowing"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Code gets even more confusing when variables by the same name from different scopes collide. In particular, what should we expect to happen if a function changes a globally defined variable internally?\n",
"`average_odds()` below works like `average_evens()` above except that it **[casts](https://en.wikipedia.org/wiki/Type_conversion)** (i.e., \"converts\") the elements of `numbers` as objects of type `int` with the [int()](https://docs.python.org/3/library/functions.html#int) built-in first before filtering and averaging them. In doing so, it introduces an *internal* variable `nums` whose name collides with the one in the global scope. The **inequality operator** `!=` is just the **reversed** version of `==`."
"`nums` in the global scope is of course the same list from above."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nums"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"As good practice, let's first use inputs for which we can calculate the answer in our heads to verify that `average_odds()` is correct."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"3.0"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"average_odds([1, 100, 3, 100, 5]) # verify the function's correctness with predictable inputs"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"To make the confusion even bigger, let's also pass the global `nums` as an argument to `average_odds()`."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"5.0"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"average_odds(nums)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Python, however, is again smart enough to keep the two `nums` variables apart. So the global `nums` is still pointing to the very same list object as before."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nums"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"The reason why everything works just fine is that *every time* we (re-)assign an object to a variable inside a function with the `=` statement, this is done in the local scope by default. There are ways to change variables existing in an outer scope from within a function but we save that for a later chapter.\n",
"\n",
"Variables whose names collide with the ones of variables in enclosing scopes - and the global scope is just the most enclosing scope - are said to **shadow** them.\n",
"\n",
"While this is not a problem for Python as we have observed, it may lead to less readable code for us humans and should be avoided if possible. But, as we have also heard, \"[naming things](https://skeptics.stackexchange.com/questions/19836/has-phil-karlton-ever-said-there-are-only-two-hard-things-in-computer-science)\" is often considered hard as well and we have to be prepared to encounter shadowing variables."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Built-in Functions"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Python comes with plenty of useful functions built in, some of which we have already seen before. The [documentation](https://docs.python.org/3/library/functions.html) has the full list. Just as core Python itself, they are implemented in C and thus very fast.\n",
"\n",
"[len()](https://docs.python.org/3/library/functions.html#len) counts the number of elements in a container object while [sum()](https://docs.python.org/3/library/functions.html#sum) adds up all the elements."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"10"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(nums)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"55"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sum(nums)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"We can cast certain objects as a different type. For example, to \"convert\" a float or a text into an integer, we use the [int()](https://docs.python.org/3/library/functions.html#int) built-in. This actually creates a *new* object of type `int` from the provided `avg` or `\"6\"` objects who continue to exist in memory unchanged."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"6.0"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"avg"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"6"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"int(avg)"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"6"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"int(\"6\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Observe that casting as an integer is different from rounding with the [round()](https://docs.python.org/3/library/functions.html#round) built-in function."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"7"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"int(7.99)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"8"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"round(7.99)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Not all conversions are valid and *runtime* errors can occur as the `ValueError` shows."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"ename": "ValueError",
"evalue": "invalid literal for int() with base 10: 'six'",
"\u001b[0;32m<ipython-input-31-28b6aa9c0167>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"six\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mValueError\u001b[0m: invalid literal for int() with base 10: 'six'"
]
}
],
"source": [
"int(\"six\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"We can also go in the other direction with the [float()](https://docs.python.org/3/library/functions.html#float) built-in function."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"42.0"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"float(42)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Positional vs. Keyword Arguments"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"So far we have only specified one parameter in each of our user-defined functions. In Chapter 1, however, we saw the built-in function [divmod()](https://docs.python.org/3/library/functions.html#divmod) taking two arguments. Obviously, the order of the numbers passed in mattered. Whenever we call a function and list its arguments in a comma seperated manner, we say that we pass in the arguments by position or refer to them as **positional arguments**."
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"(4, 2)"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"divmod(42, 10)"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"(0, 10)"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"divmod(10, 42)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"For many functions there is a natural order to the arguments. But what if this is not the case? For example, let's create a close relative of the above `average_evens()` function that also scales the resulting average by a factor. What is more natural? Passing in `numbers` first? Or `scalar`? There is no obvious way and we continue with the first alternative for no real reason."
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"def scaled_average_evens(numbers, scalar):\n",
" \"\"\"Calculate a scaled average of all even numbers in a list.\n",
"\n",
" Args:\n",
" numbers (list): a list of numbers; may be integers or floats\n",
" scalar (float): the scalar that multiplies the average\n",
" of the even numbers\n",
"\n",
" Returns:\n",
" float: scaled average\n",
" \"\"\"\n",
" evens = [n for n in numbers if n % 2 == 0]\n",
" average = sum(evens) / len(evens)\n",
" return scalar * average"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"As with [divmod()](https://docs.python.org/3/library/functions.html#divmod), we can pass in the arguments by position."
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"12.0"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"scaled_average_evens(nums, 2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"However, now the function call is a bit harder to comprehend as we need to always remember what the `2` means. This becomes even harder the more parameters we specify.\n",
"\n",
"Luckily, we can also reference the formal parameter names as **keyword arguments**. We can even combine positional and keyword arguments in the same function call. Each of the following does the exact same thing."
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"12.0"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"scaled_average_evens(nums, scalar=2)"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"12.0"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"scaled_average_evens(numbers=nums, scalar=2)"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"12.0"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"scaled_average_evens(scalar=2, numbers=nums)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Unfortunately, there are ways to screw this up with a `SyntaxError`: If positional and keyword arguments are mixed, the keyword arguments *must* come last."
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"ename": "SyntaxError",
"evalue": "positional argument follows keyword argument (<ipython-input-40-a6b957a168d8>, line 1)",
"Defining both `average_evens()` and `scaled_average_evens()` is also kind of repetitive as most of their code is the same. Such a redundancy will make a code base harder to maintain in the long run as whenever we change the logic in one function we must *not* forget to do so for the other function as well.\n",
"\n",
"A better way is to design related functions in a **modular** fashion such that they re-use each other's logic.\n",
"\n",
"For example, as not scaling an average is just a special case of scaling it with `1`, we could re-define the two functions like below: In this setting, the function resembling the *special* case (i.e., `average_evens()`) simply **forwards** the call to the more *general* function (i.e., `scaled_average_evens()`) using a `scalar=1` argument."
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"def scaled_average_evens(numbers, scalar):\n",
" \"\"\"Calculate a scaled average of all even numbers in a list.\n",
"\n",
" ...\n",
" \"\"\"\n",
" evens = [n for n in numbers if n % 2 == 0]\n",
" average = sum(evens) / len(evens)\n",
" return scalar * average"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"def average_evens(numbers):\n",
" \"\"\"Calculate the average of all even numbers in a list.\n",
"\n",
" ...\n",
" \"\"\"\n",
" return scaled_average_evens(numbers, scalar=1) # refactored to use the logic in scaled_average_evens()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"The outcome of `average_evens(nums)` is of course still `6.0`."
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"6.0"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"average_evens(nums)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"If we *assume* that scaling the average occurs rarely, we could also handle both cases in *one* function definition by providing a **default value** for the `scalar` parameter."
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"def average_evens(numbers, scalar=1):\n",
" \"\"\"Calculate the average of all even numbers in a list.\n",
"\n",
" Args:\n",
" numbers (list): list of numbers; may be integers or floats\n",
" scalar (float, optional): the scalar that multiplies the\n",
" average of the even numbers\n",
"\n",
" Returns:\n",
" float: (scaled) average\n",
" \"\"\"\n",
" evens = [n for n in numbers if n % 2 == 0]\n",
" average = sum(evens) / len(evens)\n",
" return scalar * average"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Now we can call the function either with or without the `scalar` argument.\n",
"\n",
"If `scalar` is passed in, this can be done as either a positional or a keyword argument. Which of the two versions where `scalar` is `2` is easier to comprehend in a large program?"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"6.0"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"average_evens(nums)"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"12.0"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"average_evens(nums, 2)"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"12.0"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"average_evens(nums, scalar=2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Keyword-only Arguments"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Since we *assumed* that scaling will occur *rarely*, we'd prefer that our new version of `average_evens()` be called with a *keyword argument* whenever `scalar` is passed in explicitly. Then, the second argument is never ambiguous as we could always read its name.\n",
"\n",
"Luckily, Python offers a **keyword-only** syntax, where all we need to do is to place the arguments for which we require *explicit* keyword use after an asterix `*`."
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"def average_evens(numbers, *, scalar=1):\n",
" \"\"\"Calculate the average of all even numbers in a list.\n",
"\n",
" Args:\n",
" numbers (list): list of numbers; may be integers or floats\n",
" scalar (float, optional): the scalar that multiplies the\n",
" average of the even numbers\n",
"\n",
" Returns:\n",
" float: (scaled) average\n",
" \"\"\"\n",
" evens = [n for n in numbers if n % 2 == 0]\n",
" average = sum(evens) / len(evens)\n",
" return scalar * average"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"If we now call the function with a `scalar` argument passed in, we *must* use keyword notation."
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"6.0"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"average_evens(nums)"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"12.0"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"average_evens(nums, scalar=2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"If we pass in `scalar` as a positional argument instead, we obtain a `TypeError`."
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"ename": "TypeError",
"evalue": "average_evens() takes 1 positional argument but 2 were given",
"\u001b[0;32m<ipython-input-51-b36617fef43e>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0maverage_evens\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnums\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m2\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m: average_evens() takes 1 positional argument but 2 were given"
]
}
],
"source": [
"average_evens(nums, 2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Anonymous Functions"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"The `def` statement is a statement because of its side effect of creating a *new* name that points to a *new* `function` object in memory.\n",
"\n",
"We can thus think of it as doing *two* things at once (i.e., either both of them happen or none). First, a `function` object is created that contains the concrete $0$s and $1$s that resemble the instructions we put into the function's body. In the context of a function, these $0$s and $1$s are also called **[byte code](https://en.wikipedia.org/wiki/Bytecode)**. Then, a name is created pointing at the new `function` object.\n",
"\n",
"Only this second aspect makes `def` a statement: Merely creating a new object in memory without making it accessible for later reference does *not* constitute a side effect. This is because the state the program is *not* changed. After all, if we cannot reference an object, how do we know that it is actually existing?\n",
"\n",
"Python provides a so-called **[lambda expression](https://docs.python.org/3/reference/expressions.html#lambda)** syntax that allows us to *only* create a `function` object in memory *without* making a name point to it.\n",
"\n",
"It starts with the keyword `lambda` followed by an optional comma seperated enumeration of parameters, a mandatory colon, and *one* expression that also is the resulting `function` object's return value.\n",
"\n",
"Because it does not create a name pointing to the object, we effectively create \"anonymous\" functions with it. In the example, we create a `function` object that adds `3` to the only argument passed in as the parameter `x` and returns that sum."
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"<function __main__.<lambda>(x)>"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"lambda x: x + 3"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"If you think this is a rather pointless thing to do, you are absolutely correct!\n",
"\n",
"We created a `function` object, dit *not* call it, and Python immediately forgot about it. So what's the point?\n",
"\n",
"Just to prove that the `lambda` expression really creates a callable `function` object, we use the simple `=` statement to assign it to the variable `add_three`, which is really `add_three()` as per our convention from above."
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"add_three = lambda x: x + 3 # we could and should use def instead"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Now we can call `add_three()` as if we defined it with the `def` statement to begin with."
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"13"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"add_three(10)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Alternatively, we could call any anonymous `function` object created with an `lambda` expression right away (i.e. without assigning it to a variable), which looks really weird for now as we need *two* pairs of parentheses: The first is just a delimiter whereas the second the call operator."
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"42"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(lambda x: x + 3)(39) # this looks weird but will become very useful"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"The main point of having functions without a name is to use them in a situation where we know ahead of time that we will use the function only once.\n",
"\n",
"Very popular contexts where we will apply lambda expressions are with the **map-filter-reduce** paradigm in Chapter 7 or when we do \"number crunching\" with **arrays** and **data frames** in Chapter 9."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Extending Core Python"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"So far, we have only used what we refer to as **core** Python in this book. By this, we mean all the syntactical rules as specified in the [language reference](https://docs.python.org/3/reference/) and a minimal set of about 50 built-in [functions](https://docs.python.org/3/library/functions.html). With this we could already implement any algorithm or business logic we can think of!\n",
"\n",
"However, after our first couple of programs, we would already start seeing recurring patterns in the code we write. In other words, we would constantly be \"re-inventing the wheel\" in each new project.\n",
"\n",
"Would it not be smarter to pull out the re-usable components from our programs and put them into some project independent **library** of generically useful functionalities? Then we would only need a way of including these **utilities** in our projects.\n",
"\n",
"As all programmers across all languages face this very same issue, most programming languages come with a so-called **[standard library](https://en.wikipedia.org/wiki/Standard_library)** that provides utilities to accomplish common tasks without a lot of code. Examples are making an HTTP request to some website, open and read popular file types (e.g., CSV or Excel files), do something on a computer's file system, and many more."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Standard Library"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Python also comes with its own [standard library](https://docs.python.org/3/library/index.html) that is structured into coherent modules and packages for given topics: A **module** is just a plain text file with the file extension *.py* that contains Python code while a **package** is a folder that groups several related modules.\n",
"\n",
"The code in the [standard library](https://docs.python.org/3/library/index.html) is contributed and maintained by many volunteers around the world. In contrast to so-called \"third-party\" packages (cf., the next section below), the Python core development team closely monitors and tests the code in the [standard library](https://docs.python.org/3/library/index.html). Consequently, we can be reasonably sure that anything provided by it works correctly independent of our computer's operating system and will most likely also be there in the next Python versions. Parts in the [standard library](https://docs.python.org/3/library/index.html) that are computationally expensive are often re-written in C and therefore much faster than anything we could code in Python ourselves. So, whenever we can solve a problem with the help of the [standard library](https://docs.python.org/3/library/index.html), it is almost always the best way to do so as well.\n",
"\n",
"The [standard library](https://docs.python.org/3/library/index.html) has grown very big over the years and we refer to the website [PYMOTW](https://pymotw.com/3/index.html) (i.e., \"Python Module of the Week\") that features well written introductory tutorials and how-to guides to most parts of the library. The same author also published a [book](https://www.amazon.com/Python-Standard-Library-Example-Developers/dp/0134291050/ref=as_li_ss_tl?ie=UTF8&qid=1493563121&sr=8-1&keywords=python+3+standard+library+by+example) that many Pythonistas keep on their shelf for reference. Knowing what is in the [standard library](https://docs.python.org/3/library/index.html) is quite valuable for solving real world tasks quickly.\n",
"\n",
"Throughout this book we will look at many modules and packages from the [standard library](https://docs.python.org/3/library/index.html) in more depth, starting with the [math](https://docs.python.org/3/library/math.html) and [random](https://docs.python.org/3/library/random.html) modules in this chapter."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#### The [math](https://docs.python.org/3/library/math.html) Module"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"The [math](https://docs.python.org/3/library/math.html) module provides non-trivial mathematical functions like $sin(x)$ and constants like $\\pi$ or $\\text{e}$.\n",
"\n",
"To make functions and variables defined \"somewhere else\" available in our current program, we must first **[import](https://docs.python.org/3/reference/simple_stmts.html#import)** them with the `import` statement. "
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"import math"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"This creates the variable `math` that points to a **[module object](https://docs.python.org/3/glossary.html#term-module)** (i.e., type `module`) in memory."
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"<module 'math' from '/home/webartifex/.pyenv/versions/anaconda3-2019.07/lib/python3.7/lib-dynload/math.cpython-37m-x86_64-linux-gnu.so'>"
"`module` objects serve as namespaces to organize the names inside a module. In this context, a namespace is nothing but a prefix that avoids collision with the variables already defined at the location where we import the module into.\n",
"\n",
"Let's see what we can do with the `math` module.\n",
"\n",
"The [dir()](https://docs.python.org/3/library/functions.html#dir) built-in function can also be used with an argument passed in. Ignoring the dunder-style names, `math` offers quite a lot of ... names. As we cannot know at this point in time if a listed name refers to a function or an ordinary variable, we use the more generic term **attribute** to mean either one of them."
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"['__doc__',\n",
" '__file__',\n",
" '__loader__',\n",
" '__name__',\n",
" '__package__',\n",
" '__spec__',\n",
" 'acos',\n",
" 'acosh',\n",
" 'asin',\n",
" 'asinh',\n",
" 'atan',\n",
" 'atan2',\n",
" 'atanh',\n",
" 'ceil',\n",
" 'copysign',\n",
" 'cos',\n",
" 'cosh',\n",
" 'degrees',\n",
" 'e',\n",
" 'erf',\n",
" 'erfc',\n",
" 'exp',\n",
" 'expm1',\n",
" 'fabs',\n",
" 'factorial',\n",
" 'floor',\n",
" 'fmod',\n",
" 'frexp',\n",
" 'fsum',\n",
" 'gamma',\n",
" 'gcd',\n",
" 'hypot',\n",
" 'inf',\n",
" 'isclose',\n",
" 'isfinite',\n",
" 'isinf',\n",
" 'isnan',\n",
" 'ldexp',\n",
" 'lgamma',\n",
" 'log',\n",
" 'log10',\n",
" 'log1p',\n",
" 'log2',\n",
" 'modf',\n",
" 'nan',\n",
" 'pi',\n",
" 'pow',\n",
" 'radians',\n",
" 'remainder',\n",
" 'sin',\n",
" 'sinh',\n",
" 'sqrt',\n",
" 'tan',\n",
" 'tanh',\n",
" 'tau',\n",
" 'trunc']"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dir(math)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Common mathematical constants and functions are now available via the dot operator `.` on the `math` object. This operator is sometimes also called the **attribute access operator**, in line with the just introduced term."
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"3.141592653589793"
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"math.pi"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"2.718281828459045"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"math.e"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"<function math.sqrt(x, /)>"
]
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"math.sqrt"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Help on built-in function sqrt in module math:\n",
"\n",
"sqrt(x, /)\n",
" Return the square root of x.\n",
"\n"
]
}
],
"source": [
"help(math.sqrt)"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"1.4142135623730951"
]
},
"execution_count": 65,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"math.sqrt(2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Observe how the arguments passed to functions do not need to be just variables or simple literals. Instead, we can pass in any *expression* that evaluates to a *new* object of the type the function expects.\n",
"\n",
"So just as a reminder from the expression vs. statement discussion in Chapter 1: An expression is *any* syntactically correct combination of variables and literals with operators. And the call operator `()` is just ... well another operator. So both of the next two code cells are just expressions! They have no permanent side effect in memory. We can execute them as often as we want *without* changing the state of the program (i.e., this Jupyter notebook).\n",
"\n",
"So, regarding the very next cell in particular: Although the `2 ** 2` creates a *new* object `4` in memory that is then immediately passed into the [math.sqrt()](https://docs.python.org/3/library/math.html#math.sqrt) function, once that function call returns, \"all is lost\" and the newly created `4` object is forgotten again, as well as the return value of [math.sqrt()](https://docs.python.org/3/library/math.html#math.sqrt)."
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"2.0"
]
},
"execution_count": 66,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"math.sqrt(2 ** 2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Even the **composition** of several function calls only constitutes another expression."
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"10.0"
]
},
"execution_count": 67,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"math.sqrt(average_evens([99, 100, 101]))"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"If we only need one particular function from a module, we can also use the alternative `from ... import ...` syntax.\n",
"\n",
"This does *not* create a module object but only makes a variable in our current location point to an object defined inside a module directly."
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"from math import sqrt"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"4.0"
]
},
"execution_count": 69,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sqrt(16)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#### The [random](https://docs.python.org/3/library/random.html) Module"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Often times, we need a random variable, for example, when we want to build a simulation program. The [random](https://docs.python.org/3/library/random.html) module in the [standard library](https://docs.python.org/3/library/index.html) often suffices for that."
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"import random"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"<module 'random' from '/home/webartifex/.pyenv/versions/anaconda3-2019.07/lib/python3.7/random.py'>"
]
},
"execution_count": 71,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"random"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Besides the usual dunder-style attributes, the [dir()](https://docs.python.org/3/library/functions.html#dir) built-in function lists some attributes in an upper case naming convention and many others starting with a single underscore \"\\_\". To understand the former, we have to wait until Chapter 10 while the latter are explained further below."
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"['BPF',\n",
" 'LOG4',\n",
" 'NV_MAGICCONST',\n",
" 'RECIP_BPF',\n",
" 'Random',\n",
" 'SG_MAGICCONST',\n",
" 'SystemRandom',\n",
" 'TWOPI',\n",
" '_BuiltinMethodType',\n",
" '_MethodType',\n",
" '_Sequence',\n",
" '_Set',\n",
" '__all__',\n",
" '__builtins__',\n",
" '__cached__',\n",
" '__doc__',\n",
" '__file__',\n",
" '__loader__',\n",
" '__name__',\n",
" '__package__',\n",
" '__spec__',\n",
" '_acos',\n",
" '_bisect',\n",
" '_ceil',\n",
" '_cos',\n",
" '_e',\n",
" '_exp',\n",
" '_inst',\n",
" '_itertools',\n",
" '_log',\n",
" '_os',\n",
" '_pi',\n",
" '_random',\n",
" '_sha512',\n",
" '_sin',\n",
" '_sqrt',\n",
" '_test',\n",
" '_test_generator',\n",
" '_urandom',\n",
" '_warn',\n",
" 'betavariate',\n",
" 'choice',\n",
" 'choices',\n",
" 'expovariate',\n",
" 'gammavariate',\n",
" 'gauss',\n",
" 'getrandbits',\n",
" 'getstate',\n",
" 'lognormvariate',\n",
" 'normalvariate',\n",
" 'paretovariate',\n",
" 'randint',\n",
" 'random',\n",
" 'randrange',\n",
" 'sample',\n",
" 'seed',\n",
" 'setstate',\n",
" 'shuffle',\n",
" 'triangular',\n",
" 'uniform',\n",
" 'vonmisesvariate',\n",
" 'weibullvariate']"
]
},
"execution_count": 72,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dir(random)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"The [random.random()](https://docs.python.org/3/library/random.html#random.random) function generates a uniformly distributed `float` number between $0$ (including) and $1$ (excluding)."
"While we could build some conditional logic with an `if` statement to map the number generated by [random.random()](https://docs.python.org/3/library/random.html#random.random) to a finite set of elements manually, the [random.choice()](https://docs.python.org/3/library/random.html#random.choice) function provides a lot more **convenience** for us. We simply call it with, for example, the `nums` list and it draws one element out of it with equal chance."
" Choose a random element from a non-empty sequence.\n",
"\n"
]
}
],
"source": [
"help(random.choice)"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"8"
]
},
"execution_count": 78,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"random.choice(nums)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"In order to re-produce the same random numbers in a simulation each time we run it, we can set the **[random seed](https://en.wikipedia.org/wiki/Random_seed)**. It is good practice to do this at the beginning of a program or notebook. Then every time we re-start the program, we will get the exact same random numbers again. This becomes very important, for example, when we employ certain machine learning algorithms that rely on randomization, like the infamous [Random Forest](https://en.wikipedia.org/wiki/Random_forest), and want to obtain **re-producable** results.\n",
"\n",
"The [random](https://docs.python.org/3/library/random.html) module provides the [random.seed()](https://docs.python.org/3/library/random.html#random.seed) function to do that."
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"random.seed(42)"
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"0.6394267984578837"
]
},
"execution_count": 80,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"random.random()"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"random.seed(42)"
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"0.6394267984578837"
]
},
"execution_count": 82,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"random.random()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Third-party Packages"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"As the Python community is based around open source, many developers publish their code, for example, on the Python Package Index [PyPI](https://pypi.org) from where anyone can download and install it for free using command line based tools like [pip](https://pip.pypa.io/en/stable/) or [conda](https://conda.io/en/latest/). This way, we can always customize our Python installation even more. Managing many such packages is actually quite a deep topic on its own, sometimes fearfully called **[dependency hell](https://en.wikipedia.org/wiki/Dependency_hell)**.\n",
"\n",
"The difference between the [standard library](https://docs.python.org/3/library/index.html) and such **third-party** packages is that in the first case the code goes through a much more formalized review process and is officially endorsed by the Python core developers. Yet, many third-party projects also offer the highest quality standards and a lot of such software is actually also relied on by many businesses and researchers.\n",
"\n",
"Throughout this book, we will look at many third-party libraries, mostly from Python's [scientific stack](https://scipy.org/about.html), a tightly coupled set of third-party libraries for storing **big data** efficiently ([numpy](http://www.numpy.org/)), \"wrangling\" ([pandas](https://pandas.pydata.org/)) and visualizing them ([matplotlib](https://matplotlib.org/) and [seaborn](https://seaborn.pydata.org/)), fitting classical statistical models ([statsmodels](http://www.statsmodels.org/)), training machine learning models ([sklearn](http://scikit-learn.org/)), and much more.\n",
"\n",
"Below, we briefly show how to install third-party libraries."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#### The [numpy](http://www.numpy.org/) Library"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"[numpy](http://www.numpy.org/) is the de-facto standard in the Python world for handling **array-like** data. That is a fancy word for data that can be put into a matrix or vector format. We will look at it in depth in Chapter 9.\n",
"\n",
"As [numpy](http://www.numpy.org/) is *not* in the [standard library](https://docs.python.org/3/library/index.html), it must be *manually* installed, for example, with the [pip](https://pip.pypa.io/en/stable/) tool. As mentioned in Chapter 0, to execute terminal commands from within a Jupyter notebook, we just need to start a code cell with an exclamation mark.\n",
"\n",
"If you are running this notebook with an installation of the [Anaconda Distribution](https://www.anaconda.com/distribution/), then [numpy](http://www.numpy.org/) is probably already installed. Running the cell below, will just confirm that."
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: numpy in /home/webartifex/.pyenv/versions/anaconda3-2019.07/lib/python3.7/site-packages (1.16.4)\r\n"
]
}
],
"source": [
"!pip install numpy"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"[numpy](http://www.numpy.org/) is conventionally imported with the shorter **idiomatic** name `np`. The `as` in the import statement just changes the resulting variable name. It is a shortcut for the three lines `import numpy`, `np = numpy`, and `del numpy`."
]
},
{
"cell_type": "code",
"execution_count": 84,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"`np` can be used in the same way as `math` or `random` above."
]
},
{
"cell_type": "code",
"execution_count": 85,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"<module 'numpy' from '/home/webartifex/.pyenv/versions/anaconda3-2019.07/lib/python3.7/site-packages/numpy/__init__.py'>"
]
},
"execution_count": 85,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Let's convert the above `nums` list into a vector-like object of type `numpy.ndarray`."
]
},
{
"cell_type": "code",
"execution_count": 86,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"vec = np.array(nums)"
]
},
{
"cell_type": "code",
"execution_count": 87,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])"
]
},
"execution_count": 87,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"vec"
]
},
{
"cell_type": "code",
"execution_count": 88,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"numpy.ndarray"
]
},
"execution_count": 88,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(vec)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"[numpy](http://www.numpy.org/) somehow magically adds new behavior to Python's built-in arithmetic operators. For example, we can now [scalar-multiply](https://en.wikipedia.org/wiki/Scalar_multiplication) `vec`.\n",
"\n",
"[numpy](http://www.numpy.org/)'s functions are implemented in highly optimized C code and therefore fast, especially when it comes to big data."
]
},
{
"cell_type": "code",
"execution_count": 89,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 2, 4, 6, 8, 10, 12, 14, 16, 18, 20])"
]
},
"execution_count": 89,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"2 * vec"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"This scalar multiplication would \"fail\" if we used a plain `list` object like `nums` instead of an `numpy.ndarray` object like `vec`. The two types exhibit different **behavior** when used with the same operator, another example of **operator overloading**."
"[numpy](http://www.numpy.org/)'s `numpy.ndarray` objects integrate nicely with Python's built-in functions (e.g., [sum()](https://docs.python.org/3/library/functions.html#sum)) or functions from the [standard library](https://docs.python.org/3/library/index.html) (e.g., [random.choice()](https://docs.python.org/3/library/random.html#random.choice))."
]
},
{
"cell_type": "code",
"execution_count": 91,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"55"
]
},
"execution_count": 91,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sum(vec)"
]
},
{
"cell_type": "code",
"execution_count": 92,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 92,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"random.choice(vec)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Local Modules and Packages"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"For sure, we can create our own modules and packages. In the repository's main directory, there is a [*sample_module.py*](https://github.com/webartifex/intro-to-python/blob/master/sample_module.py) file that contains, among others, a function equivalent to the final version of `average_evens()`. To be realistic, this sample module is structured in a modular manner with several functions building on each other. It is best to skim over it *now* before reading on.\n",
"\n",
"To make code we put into a *.py* file available in our program, we import it as a module just as we did above with modules in the [standard library](https://docs.python.org/3/library/index.html) or third-party packages.\n",
"\n",
"The *name* to be imported is the file's name except for the *.py* part. In order for this to work, the file's name *must* adhere to the *same* rules as hold for [variable names](https://docs.python.org/3/reference/lexical_analysis.html#identifiers) in general.\n",
"\n",
"What happens during an import is conceptually as follows. When Python sees the `import sample_module` part, it first creates a *new* object of type `module` in memory. This is effectively an *empty* namespace. Then, it executes the imported file's code from top to bottom. Whatever variables are still defined at the end of this, are put into the module's namespace. Only if the file's code does *not* raise an error, will Python make a variable in our current location (i.e., `mod` here) point to the created `module` object. Otherwise, it is discarded. In essence, it is as if we copied and pasted the file's code in place of the import statement. If we import an already imported module again, Python is smart enough to avoid doing all this work all over and does nothing."
]
},
{
"cell_type": "code",
"execution_count": 93,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"import sample_module as mod"
]
},
{
"cell_type": "code",
"execution_count": 94,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"<module 'sample_module' from '/home/webartifex/repos/intro-to-python/sample_module.py'>"
]
},
"execution_count": 94,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mod"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Disregarding the dunder-style attributes, `mod` defines the five attributes `_default_scalar`, `_scaled_average`, `average`, `average_evens`, and `average_odds`, which are exactly the ones we would expect from reading the [*sample_module.py*](https://github.com/webartifex/intro-to-python/blob/master/sample_module.py) file.\n",
"\n",
"An important convention when working with imported code is to *disregard* any attributes starting with an underscore \"\\_\". These are considered **private** and constitute **implementation details** the author of the imported code might change in a future version of his software. We *must* not rely on them in any way.\n",
"\n",
"In contrast, the three remaining **public** attributes are the functions `average()`, `average_evens()`, and `average_odds()` that we may use after the import."
]
},
{
"cell_type": "code",
"execution_count": 95,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"['__builtins__',\n",
" '__cached__',\n",
" '__doc__',\n",
" '__file__',\n",
" '__loader__',\n",
" '__name__',\n",
" '__package__',\n",
" '__spec__',\n",
" '_default_scalar',\n",
" '_scaled_average',\n",
" 'average',\n",
" 'average_evens',\n",
" 'average_odds']"
]
},
"execution_count": 95,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dir(mod)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"We can use the imported `mod.average_evens()` just like `average_evens()` defined above. The advantage we get from **modularization** with *.py* files is that we can now easily re-use functions across different Jupyter notebooks without re-defining them again and again. Also, we can \"source out\" code that distracts from the storyline told in a notebook."
]
},
{
"cell_type": "code",
"execution_count": 96,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Help on function average_evens in module sample_module:\n",
"\n",
"average_evens(numbers, *, scalar=1)\n",
" Calculate the average of all even numbers in a list.\n",
" \n",
" Args:\n",
" numbers (list): list of numbers; may be integers or floats\n",
" scalar (float, optional): the scalar that multiplies the\n",
" average of the even numbers\n",
" \n",
" Returns:\n",
" float: (scaled) average\n",
"\n"
]
}
],
"source": [
"help(mod.average_evens)"
]
},
{
"cell_type": "code",
"execution_count": 97,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"6.0"
]
},
"execution_count": 97,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mod.average_evens(nums)"
]
},
{
"cell_type": "code",
"execution_count": 98,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"12.0"
]
},
"execution_count": 98,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mod.average_evens(nums, scalar=2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Packages are a generalization of modules and we will look at one in Chapter 10 in detail. You can, however, already look at a [sample package](https://github.com/webartifex/intro-to-python/tree/master/sample_package) in the repository, which is nothing but a folder with *.py* files in it.\n",
"\n",
"As a further references on modules, we refer to the [official tutorial](https://docs.python.org/3/tutorial/modules.html)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## TL;DR"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"A **function** is a **named sequence** of statements that perform a computation.\n",
"\n",
"Functions provide benefits as they:\n",
"\n",
"- make programs easier to comprehend and debug for humans as they give names to the smaller parts of a larger program (i.e., they **modularize** a code base), and\n",
"- eliminate redundancies by allowing **re-use of code**.\n",
"\n",
"Functions are **defined** once with the `def` statement. Then, they can be **called** many times with the call operator `()`.\n",
"\n",
"They may process **parameterized** inputs, **passed** in as **arguments**, and output a **return value**.\n",
"\n",
"Arguments can be passed in by **position** or **keyword**. Some functions may even require **keyword-only** arguments.\n",