"**Note**: Click on \"*Kernel*\" > \"*Restart Kernel and Clear All Outputs*\" in [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) *before* reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it [in the cloud <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_mb.png\">](https://mybinder.org/v2/gh/webartifex/intro-to-python/main?urlpath=lab/tree/02_functions/02_content.ipynb)."
"So far, we have only used what we refer to as **core** Python in this book. By this, we mean all the syntactical rules as specified in the [language reference <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/reference/) and a minimal set of about 50 built-in [functions <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html). With this, we could already implement any algorithm or business logic we can think of!\n",
"However, after our first couple of programs, we would already start seeing recurring patterns in the code we write. In other words, we would constantly be \"reinventing the wheel\" in each new project.\n",
"Would it not be smarter to pull out the reusable components from our programs and put them into some project independent **library** of generically useful functionalities? Then we would only need a way of including these **utilities** in our projects.\n",
"\n",
"As all programmers across all languages face this very same issue, most programming languages come with a so-called **[standard library <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_wiki.png\">](https://en.wikipedia.org/wiki/Standard_library)** that provides utilities to accomplish everyday tasks without much code. Examples are making an HTTP request to some website, open and read popular file types (e.g., CSV or Excel files), do something on a computer's file system, and many more."
"Python also comes with a [standard library <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/index.html) that is structured into coherent modules and packages for given topics: A **module** is just a plain text file with the file extension *.py* that contains Python code while a **package** is a folder that groups several related modules.\n",
"\n",
"The code in the [standard library <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/index.html) is contributed and maintained by many volunteers around the world. In contrast to so-called \"third-party\" packages (cf., the next section below), the Python core development team closely monitors and tests the code in the [standard library <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/index.html). Consequently, we can be reasonably sure that anything provided by it works correctly independent of our computer's operating system and will most likely also be there in the next Python versions. Parts in the [standard library <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/index.html) that are computationally expensive are often rewritten in C and, therefore, much faster than anything we could write in Python ourselves. So, whenever we can solve a problem with the help of the [standard library <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/index.html), it is almost always the best way to do so as well.\n",
"\n",
"The [standard library <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/index.html) has grown very big over the years, and we refer to the website [PYMOTW](https://pymotw.com/3/index.html) (i.e., \"Python Module of the Week\") that features well written introductory tutorials and how-to guides to most parts of the library. The same author also published a [book](https://www.amazon.com/Python-Standard-Library-Example-Developers/dp/0134291050/ref=as_li_ss_tl?ie=UTF8&qid=1493563121&sr=8-1&keywords=python+3+standard+library+by+example) that many Pythonistas keep on their shelf for reference. Knowing what is in the [standard library <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/index.html) is quite valuable for solving real-world tasks quickly.\n",
"\n",
"Throughout this book, we look at many modules and packages from the [standard library <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/index.html) in more depth, starting with the [math <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/math.html) and [random <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/random.html) modules in this chapter."
"The [math <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/math.html) module provides non-trivial mathematical functions like $sin(x)$ and constants like $\\pi$ or $\\text{e}$.\n",
"\n",
"To make functions and variables defined \"somewhere else\" available in our current program, we must first **import** them with the `import` statement (cf., [reference <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/reference/simple_stmts.html#import)). "
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"import math"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"This creates the variable `math` that references a **[module object <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/glossary.html#term-module)** (i.e., type `module`) in memory."
"`module` objects serve as namespaces to organize the names inside a module. In this context, a namespace is nothing but a prefix that avoids collision with the variables already defined at the location where we import the module into.\n",
"\n",
"Let's see what we can do with the `math` module.\n",
"\n",
"The [dir() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#dir) built-in function may also be used with an argument passed in. Ignoring the dunder-style names, `math` offers quite a lot of names. As we cannot know at this point if a listed name refers to a function or an ordinary variable, we use the more generic term **attribute** to mean either one of them."
"Common mathematical constants and functions are now available via the dot operator `.` on the `math` object. This operator is sometimes also called the **attribute access operator**, in line with the just introduced term."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"3.141592653589793"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"math.pi"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"2.718281828459045"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"math.e"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"<function math.sqrt(x, /)>"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"math.sqrt"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Help on built-in function sqrt in module math:\n",
"\n",
"sqrt(x, /)\n",
" Return the square root of x.\n",
"\n"
]
}
],
"source": [
"help(math.sqrt)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"1.4142135623730951"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"math.sqrt(2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Observe how the arguments passed to functions do not need to be just variables or simple literals. Instead, we may pass in any *expression* that evaluates to a *new* object of the type the function expects.\n",
"So just as a reminder from the expression vs. statement discussion in [Chapter 1 <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_nb.png\">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/main/01_elements/03_content.ipynb#Expressions): An expression is *any* syntactically correct combination of variables and literals with operators. And the call operator `()` is yet another operator. So both of the next two code cells are just expressions! They have no permanent side effects in memory. We may execute them as often as we want *without* changing the state of the program (i.e., this Jupyter notebook).\n",
"So, regarding the very next cell in particular: Although the `2 ** 2` creates a *new* object `4` in memory that is then immediately passed into the [math.sqrt() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/math.html#math.sqrt) function, once that function call returns, \"all is lost\" and the newly created `4` object is forgotten again, as well as the return value of [math.sqrt() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/math.html#math.sqrt)."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"2.0"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"math.sqrt(2 ** 2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Even the **composition** of several function calls only constitutes another expression."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"10.0"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"math.sqrt(sum([99, 100, 101]) / 3)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"If we only need one particular function from a module, we may also use the alternative `from ... import ...` syntax.\n",
"\n",
"This does *not* create a module object but only makes a variable in our current location reference an object defined inside a module directly."
"Often, we need a random variable, for example, when we want to build a simulation. The [random <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/random.html) module in the [standard library <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/index.html) often suffices for that."
"Besides the usual dunder-style attributes, the built-in [dir() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#dir) function lists some attributes in an upper case naming convention and many others starting with a *single* underscore `_`. To understand the former, we must wait until [Chapter 11 <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_nb.png\">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/main/11_classes/00_content.ipynb), while the latter is explained further below."
"The [random.random() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/random.html#random.random) function generates a uniformly distributed `float` number between $0$ (including) and $1$ (excluding)."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"<function Random.random()>"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"random.random"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Help on built-in function random:\n",
"\n",
"random() method of random.Random instance\n",
" random() -> x in the interval [0, 1).\n",
"\n"
]
}
],
"source": [
"help(random.random)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"0.782609162553633"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"random.random()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"While we could build some conditional logic with an `if` statement to map the number generated by [random.random() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/random.html#random.random) to a finite set of elements manually, the [random.choice() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/random.html#random.choice) function provides a lot more **convenience** for us. We call it with, for example, the `numbers` list and it draws one element out of it with equal chance."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"<bound method Random.choice of <random.Random object at 0x561db35d9a60>>"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"random.choice"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Help on method choice in module random:\n",
"\n",
"choice(seq) method of random.Random instance\n",
" Choose a random element from a non-empty sequence.\n",
"To reproduce the *same* random numbers in a simulation each time we run it, we set the **[random seed <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_wiki.png\">](https://en.wikipedia.org/wiki/Random_seed)**. It is good practice to do that at the beginning of a program or notebook. It becomes essential when we employ randomized machine learning algorithms, like the [Random Forest <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_wiki.png\">](https://en.wikipedia.org/wiki/Random_forest), and want to obtain **reproducible** results for publication in academic journals.\n",
"\n",
"The [random <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/random.html) module provides the [random.seed() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/random.html#random.seed) function to do that."
"As the Python community is based around open source, many developers publish their code, for example, on the Python Package Index [PyPI](https://pypi.org) from where anyone may download and install it for free using command-line based tools like [pip](https://pip.pypa.io/en/stable/) or [conda](https://conda.io/en/latest/). This way, we can always customize our Python installation even more. Managing many such packages is quite a deep topic on its own, sometimes fearfully called **[dependency hell <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_wiki.png\">](https://en.wikipedia.org/wiki/Dependency_hell)**.\n",
"\n",
"The difference between the [standard library <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/index.html) and such **third-party** packages is that in the first case, the code goes through a much more formalized review process and is officially endorsed by the Python core developers. Yet, many third-party projects also offer the highest quality standards and are also relied on by many businesses and researchers.\n",
"\n",
"Throughout this book, we will look at many third-party libraries, mostly from Python's [scientific stack](https://scipy.org/about.html), a tightly coupled set of third-party libraries for storing **big data** efficiently (e.g., [numpy](http://www.numpy.org/)), \"wrangling\" (e.g., [pandas](https://pandas.pydata.org/)) and visualizing them (e.g., [matplotlib](https://matplotlib.org/) or [seaborn](https://seaborn.pydata.org/)), fitting classical statistical models (e.g., [statsmodels](http://www.statsmodels.org/)), training machine learning models (e.g., [sklearn](http://scikit-learn.org/)), and much more.\n",
"\n",
"Below, we briefly show how to install third-party libraries."
"[numpy](http://www.numpy.org/) is the de-facto standard in the Python world for handling **array-like** data. That is a fancy word for data that can be put into a matrix or vector format.\n",
"As [numpy](http://www.numpy.org/) is *not* in the [standard library <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/index.html), it must be *manually* installed, for example, with the [pip](https://pip.pypa.io/en/stable/) tool. As mentioned in [Chapter 0 <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_nb.png\">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/main/00_intro/00_content.ipynb#Markdown-Cells-vs.-Code-Cells), to execute terminal commands from within a Jupyter notebook, we start a code cell with an exclamation mark.\n",
"If you are running this notebook with an installation of the [Anaconda Distribution](https://www.anaconda.com/distribution/), then [numpy](http://www.numpy.org/) is probably already installed. Running the cell below confirms that."
"[numpy](http://www.numpy.org/) is conventionally imported with the shorter **idiomatic** name `np`. The `as` in the import statement changes the resulting variable name. It is a shortcut for the three lines `import numpy`, `np = numpy`, and `del numpy`."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"`np` is used in the same way as `math` or `random` above."
"Let's convert the above `numbers` list into a vector-like object of type `numpy.ndarray`."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"vec = np.array(numbers)"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4])"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"vec"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"numpy.ndarray"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(vec)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"[numpy](http://www.numpy.org/) somehow magically adds new behavior to Python's built-in arithmetic operators. For example, we may now [scalar-multiply <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_wiki.png\">](https://en.wikipedia.org/wiki/Scalar_multiplication) `vec`. Also, [numpy](http://www.numpy.org/)'s functions are implemented in highly optimized C code and, therefore, are fast, especially when dealing with bigger amounts of data."
"This scalar multiplication would \"fail\" if we used a plain `list` object like `numbers` instead of an `numpy.ndarray` object like `vec`. The two types exhibit different **behavior** when used with the same operator, another example of **operator overloading**."
"For sure, we can create local modules and packages. In the Chapter 2 directory, there is a [*sample_module.py* <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_gh.png\">](https://github.com/webartifex/intro-to-python/blob/main/02_functions/sample_module.py) file that contains, among others, a function equivalent to the final version of `average_evens()`. To be realistic, this sample module is structured in a modular manner with several functions building on each other. It is best to skim over it *now* before reading on.\n",
"To make code we put into a *.py* file available in our program, we import it as a module just as we did above with modules in the [standard library <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/index.html) or third-party packages.\n",
"\n",
"The `pwd` utility tells us in which directory Python is currently in. We refer to that as the **working directory**, and `pwd` reads \"print working directory.\" JupyterLab automatically sets this to the directory in which the notebook is in."
"The *name* to be imported is the file's name except for the *.py* part. For this to work, the file's name *must* adhere to the *same* rules as hold for [variable names <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/reference/lexical_analysis.html#identifiers) in general.\n",
"\n",
"What happens during an import is as follows. When Python sees the `import sample_module` part, it first creates a *new* object of type `module` in memory. This is effectively an *empty* namespace. Then, it executes the imported file's code from top to bottom. Whatever variables are still defined at the end of this, are put into the module's namespace. Only if the file's code does *not* raise an error, will Python make a variable in our current location (i.e., `mod` here) reference the created `module` object. Otherwise, it is discarded. In essence, it is as if we copied and pasted the file's code in place of the import statement. If we import an already imported module again, Python is smart enough to avoid doing all this work all over and does nothing."
"Disregarding the dunder-style attributes, `mod` defines the attributes `_round_all`, `_scaled_average`, `average`, `average_evens`, and `average_odds`, which are exactly the ones we would expect from reading the [*sample_module.py* <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_gh.png\">](https://github.com/webartifex/intro-to-python/blob/main/02_functions/sample_module.py) file.\n",
"A convention when working with imported code is to *disregard* any attributes starting with a single underscore `_`. These are considered **private** and constitute **implementation details** the author of the imported code might change in a future version of his software. We *must not* rely on them in any way.\n",
"\n",
"In contrast, the three remaining **public** attributes are the functions `average()`, `average_evens()`, and `average_odds()` that we may use after the import."
"We use the imported `mod.average_evens()` just like `average_evens()` defined in the [first part <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_nb.png\">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/main/02_functions/00_content.ipynb) of this chapter. The advantage we get from **modularization** with *.py* files is that we can now easily reuse functions across different Jupyter notebooks without redefining them again and again. Also, we can \"source out\" code that distracts from the storyline told in a notebook."
"Packages are a generalization of modules, and we look at one in detail in [Chapter 11 <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_nb.png\">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/main/11_classes/04_content.ipynb#Packages-vs.-Modules).\n",
"As a further reading on modules and packages, we refer to the [official tutorial <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/tutorial/modules.html)."