{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Chapter 0: Python in a Nutshell" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Python itself is a so-called **general purpose** programming language. That means it does *not* know about any **scientific algorithms** \"out of the box.\"\n", "\n", "The purpose of this notebook is to summarize what is worth knowing about Python and programming on a \"high level\" and to lay the foundation for working with so-called **third-party libraries**, some of which we see in subsequent chapters." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using Python as a Calculator" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Any computer can be viewed as some sort of a \"fancy calculator,\" and Python is no exception. The following code snippet, for example, does exactly what we expect it would, namely *addition*." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "1 + 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In terms of **syntax** (i.e., \"grammatical rules\"), digits are interpreted as plain numbers (i.e., a so-called **numerical literal**) and the `+` symbol constitutes a so-called **operator** that is built into Python.\n", "\n", "Other common operators are `-` for *subtraction*, `*` for *multiplication*, and `**` for *exponentiation*. In terms of arithmetic, Python allows the **chaining** of operations and adheres to conventions from math, namely the [PEMDAS rule](https://en.wikipedia.org/wiki/Order_of_operations#Mnemonics)."
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "45" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "87 - 42" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "15" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "3 * 5" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "8" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "2 ** 3" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "16" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "2 * 2 ** 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To change the **order of precedence**, parentheses may be used for grouping. Syntactically, they are so-called **delimiters** that mark the beginning and the end of a **(sub-)expression** (i.e., a group of symbols that are **evaluated** together)." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "64" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(2 * 2) ** 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We must beware that some operators do *not* do what we expect. For instance, `^` is Python's *bitwise XOR* operator, so the following code snippet is *not* an example of exponentiation."
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "2 ^ 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Division* is also not as straightforward as we may think!\n", "\n", "While the `/` operator does *ordinary division*, we must note the subtlety of the `.0` in the result." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4.0" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "8 / 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Whereas both `4` and `4.0` have the *same* **semantic meaning** to us humans, they are two *different* \"things\" for a computer!\n", "\n", "Instead of using a single `/`, we may divide with a double `//` just as well." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "8 // 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, then we must be certain that the result is not a number with decimals other than `.0`. As we can guess from the result below, the `//` operator does *integer division* (i.e., \"whole number\" division)." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "7 // 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In contrast, the `%` operator implements the so-called *modulo division* (i.e., \"remainder\" division). Here, a result of `0` indicates that a number is divisible by another one, whereas any result other than `0` shows the opposite."
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "7 % 2" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "8 % 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What makes Python such an intuitive and thus beginner-friendly language is the fact that it is a so-called **[interpreted language](https://en.wikipedia.org/wiki/Interpreter_%28computing%29)**. In layman's terms, this means that we can go back up and *re-execute* any of the code cells in *any order*: That allows us to build up code *incrementally*. So-called **[compiled languages](https://en.wikipedia.org/wiki/Compiler)**, on the other hand, would require us to run a program in its entirety even if only one small part has been changed.\n", "\n", "Instead of running individual code cells \"by hand\" and taking the result as it is, Python lets us use **variables** to store \"values.\" A variable is created with the single `=` symbol, the so-called **assignment statement**." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "a = 1" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "b = 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After assignment, we can simply ask Python about the values of `a` and `b`."
] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similarly, we can use a variable in place of, for example, a numerical literal within an expression." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a + b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Also, we may combine several lines of code into a single code cell, adding as many empty lines as we wish to group the code. Then, all of the lines are executed from top to bottom in linear order whenever we execute the cell as a whole." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = 1\n", "b = 2\n", "\n", "a + b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Something that fools many beginners is the fact that the `=` statement is *not* to be confused with the concept of an *equation* from math! An `=` statement is *always* to be interpreted from right to left.\n", "\n", "The following code snippet, for example, takes the \"old\" value of `a`, adds the value of `b` to it, and then stores the resulting `3` as the \"new\" value of `a`. After all, a variable is called a variable as its value is indeed variable!" 
] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "a = a + b" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In general, the result of some expression involving variables is often stored in yet another variable for further processing. This is how more realistic programs are built up." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = 1\n", "b = 2\n", "\n", "c = a + b\n", "\n", "c" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As most real-life projects involve *non-scalar* data, we take a preliminary look at how Python models `list`-like data next. Intuitively, a `list` can be thought of as a **container** holding many \"things.\"\n", "\n", "The syntax to create a `list` uses brackets, `[` and `]`, another example of delimiters, with the individual **elements** of the `list` listed in between them, separated by commas.\n", "\n", "For example, the next code snippet creates a `list` named `numbers` with the numbers `1`, `2`, `3`, and `4` in it." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 2, 3, 4]" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "numbers = [a, b, c, 4]\n", "\n", "numbers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Whenever we use any kind of delimiter, we may break the lines in between them as we wish and add other so-called **whitespace** characters like spaces to format how the code looks. 
So, the following two code cells do *exactly* the same as the previous one; even the `,` after the `4` in the second cell is ignored." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 2, 3, 4]" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "numbers = [\n", "    a, b, c, 4\n", "]\n", "\n", "numbers" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 2, 3, 4]" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "numbers = [\n", "    a,\n", "    b,\n", "    c,\n", "    4,\n", "]\n", "\n", "numbers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A nice thing to know is that JupyterLab comes with **tab completion** built in. That means we do not have to type out the name `numbers` as a whole. Try it out by simply typing `num` and then hitting the tab key on your keyboard. JupyterLab should complete the variable into `numbers`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "num" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A natural operation to do with `list`s is to **access** their elements. That is achieved with another operator that also uses a bracket notation. Each element is associated with an **index**, which is why we say that we \"index into a `list`.\" As with many other programming languages, Python is 0-based, which simply means that whenever we count something, we start to count at `0`.\n", "\n", "For example, to obtain the first element in `numbers`, we write the following."
] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "numbers[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the indexing operation implicitly assumes an **order** among the elements, which is quite intuitive as we specified the numbers in order above.\n", "\n", "Another implicit assumption behind `list`s is that the number of elements is *finite*. Because of that, we may use negative indices starting at `-1` to obtain an element in right-to-left order.\n", "\n", "So, to obtain the last element in `numbers`, we write the following." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "numbers[-1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Expressing Logic" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The main point of using `list`s in Python is to write code that does something repeatedly, once for each element in the `list`.\n", "\n", "The syntactical construct to achieve that is the `for`-loop, which consists of two parts:\n", "- a **header** line specifying what is looped over, and\n", "- a **body** consisting of the block of code that is repeated for each element.\n", "\n", "In the example below, `for number in numbers:` constitutes the header. The expression after the `in` references the \"thing\" that is looped over (here: a `list` of `numbers`) and the name between `for` and `in` becomes a variable that is assigned a new value in each **iteration** of the loop. A best practice is to use a meaningful name, which is why we choose the singular `number`. 
The `:` at the end is the characteristic symbol of a header line in general and requires the next line (and possibly many more lines) to be **indented**.\n", "\n", "The indented line constitutes the `for`-loop's body. In the example, we simply take each of the numbers in `numbers`, one at a time, and add it to a `total` that is initialized at `0`. In other words, we calculate the sum of all the elements in `numbers`.\n", "\n", "Many beginners struggle with the term \"loop.\" To visualize the looping behavior of this code, we use the online tool [PythonTutor](http://pythontutor.com/visualize.html#code=numbers%20%3D%20%5B1,%202,%203,%204%5D%0A%0Atotal%20%3D%200%0A%0Afor%20number%20in%20numbers%3A%0A%20%20%20%20total%20%3D%20total%20%2B%20number%0A%0Atotal&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false). That tool is helpful for two reasons:\n", "1. It allows us to execute code in \"slow motion\" (i.e., by clicking the \"next\" button on the left side, only the next atomic step of the code snippet is executed).\n", "2. It shows what happens inside the computer's memory on the right-hand side (cf., the \"*Thinking like a Computer*\" section further below)." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "total = 0\n", "\n", "for number in numbers:\n", "    total = total + number\n", "\n", "total" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Python is pretty agnostic about how far the `for`-loop's body is indented. So, both of the next code cells are equivalent to the one above. Yet, a popular convention in the Python world is to always indent code with 4 spaces per indentation level."
] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "total = 0\n", "\n", "for number in numbers:\n", "  total = total + number\n", "\n", "total" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "total = 0\n", "\n", "for number in numbers:\n", "        total = total + number\n", "\n", "total" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Conditional Execution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a variation, let's add up only the even numbers. To achieve that, we exploit the fact that even numbers are all numbers that are divisible by `2` and use the `%` operator from above and a new one, namely the `==` operator for *equality comparison*, to express that idea." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "7 % 2" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "8 % 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Whenever *arithmetic* operators like `%` are combined in an expression with *relational* operators like `==`, the arithmetic is done first and the comparison last. So, the next two cells first obtain the remainder after dividing `7` and `8` by `2` and then compare that to `0`. 
The result is a so-called **boolean**, either `True` or `False`, which is a computer's way of saying \"yes\" or \"no.\"" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "7 % 2 == 0" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "8 % 2 == 0" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We use such expressions as the **condition** in an `if` statement that constitutes a second layer within our `for`-loop implementation. An `if` statement itself consists of yet another header line with a body. That body's code is only executed if the condition is `True`.\n", "\n", "As an example, the next code snippet loops over all the elements in `numbers` and, for each individual `number`, checks if it is even. Only if that is the case is the `number` added to the `total`. Otherwise, nothing is done with the `number`. The example also shows how we can add so-called **comments** at the end of a line: Anything that comes after the `#` symbol is disregarded by Python. We use such comments to put little notes to ourselves within the code." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "6" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "total = 0\n", "\n", "for number in numbers:\n", "    if number % 2 == 0:  # if the number is even\n", "        total = total + number\n", "\n", "total" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`if` statements may have more than one header line: For example, the code in the `else`-clause's body is only executed if the condition in the `if`-clause is `False`. 
In the code cell below, we calculate the sum of all even numbers and subtract the sum of all odd numbers. The result is `(2 + 4) - (1 + 3)` or `-1 + 2 - 3 + 4`, which mirrors the order of the numbers in the `for`-loop." ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "total = 0\n", "\n", "for number in numbers:\n", "    if number % 2 == 0:  # if the number is even\n", "        total = total + number\n", "    else:  # if the number is odd\n", "        total = total - number\n", "\n", "total" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Modularizing Code" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One big idea in software engineering is to **modularize** code. The purposes of that are manifold. Two very important motivations are to\n", "- make a code segment **re-usable**, and to\n", "- give a meaningful name to that code segment.\n", "\n", "The latter gets more important as the codebase in a project grows so big that we can only look at a tiny fraction of it at one point in time.\n", "\n", "The syntactical construct that enables us to achieve that is that of a **function definition**. Just like in math, we can \"define\" a function to be some set of parametrized instructions that provide some (deterministic) **output** given some *concrete* **input**.\n", "\n", "A function is defined with the `def` statement: After the `def` part comes the name of the function followed by the **parameter list** within parentheses. The first couple of lines in the function's body should be a so-called **docstring** that describes what the function does in plain English. Then comes the code that is to be made repeatable. In the example below, we simply copied and pasted the code to calculate the sum of all even numbers in a `list` into the example function `sum_evens()`. 
Note that we exchanged the variable name `total` with `result` here to illustrate a point further below. In order for the function to pass the output back to \"the outside world,\" we use the `return` statement (Hint: to see its effect, simply re-run the couple of code cells below with and without the `return result` line)." ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "def sum_evens(numbers):\n", "    \"\"\"Sum up all the even numbers in a list.\n", "\n", "    Args:\n", "        numbers (list of ints): numbers to be summed up\n", "\n", "    Returns:\n", "        result (int): sum of the even numbers\n", "    \"\"\"\n", "    result = 0\n", "\n", "    for number in numbers:\n", "        if number % 2 == 0:  # if the number is even\n", "            result = result + number\n", "\n", "    return result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After defining a function, we can **call** (i.e., \"execute\") it with the `()` operator. So, just as with the `[]` above, the `()` may have a different meaning in a given context.\n", "\n", "Let's execute the function with `numbers` as the input. We see the same `6` below the cell as we do above where we run the code without a function. Without the `return` statement in the function's body, we would not see any output here.\n", "\n", "To see what happens in detail, take a look at [PythonTutor](http://pythontutor.com/visualize.html#code=numbers%20%3D%20%5B1,%202,%203,%204%5D%0A%0Adef%20sum_evens%28numbers%29%3A%0A%20%20%20%20%22%22%22Sum%20up%20all%20the%20even%20numbers%20in%20a%20list.%22%22%22%0A%20%20%20%20result%20%3D%200%0A%0A%20%20%20%20for%20number%20in%20numbers%3A%0A%20%20%20%20%20%20%20%20if%20number%20%25%202%20%3D%3D%200%3A%0A%20%20%20%20%20%20%20%20%20%20%20%20result%20%3D%20result%20%2B%20number%0A%0A%20%20%20%20return%20result%0A%0Atotal%20%3D%20sum_evens%28numbers%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) again. 
You should notice how there are two variables by the name `numbers` in memory. Python manages the memory with a concept called **namespaces** or **scopes**, which are just fancy terms for saying that Python can tell variables from different contexts apart." ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "6" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sum_evens(numbers)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To re-use the *same* instructions with *different* input, we call the function a second time and give it a brand-new `list` of numbers as its input." ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "30" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sum_evens([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note how the variable `result` only exists \"inside\" the `sum_evens()` function. Hence, we see the `NameError` here." ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "ename": "NameError", "evalue": "name 'result' is not defined", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mresult\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mNameError\u001b[0m: name 'result' is not defined" ] } ], "source": [ "result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The concept of re-usable functions is so important in programming that Python comes with many [built-in functions ](https://docs.python.org/3/library/functions.html). 
Two popular examples are the [sum()](https://docs.python.org/3/library/functions.html#sum) and [len()](https://docs.python.org/3/library/functions.html#len) functions that calculate the sum and the number of elements in a `list` input, respectively." ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sum(numbers)" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(numbers)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another function that comes in handy at times is the [print()](https://docs.python.org/3/library/functions.html#print) function that simply \"prints\" out its input to the screen. Below is the popular \"Hello World\" example that is shown in almost any introductory text on any programming language. The double quotes `\"` are yet another delimiter that specifies anything in between them as textual data (cf., the docstring above is just a special case thereof)." ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello World\n" ] } ], "source": [ "print(\"Hello World\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Single quotes `'` are basically just synonyms for double quotes `\"`." ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello World\n" ] } ], "source": [ "print('Hello World')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [print()](https://docs.python.org/3/library/functions.html#print) function is often helpful to **debug** a code snippet (i.e., to figure out what it does, step by step)."
] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The square of 1 is 1\n", "The square of 2 is 4\n", "The square of 3 is 9\n", "The square of 4 is 16\n" ] } ], "source": [ "for number in numbers:\n", "    square = number ** 2\n", "    print(\"The square of\", number, \"is\", square)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Extending Core Python" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the Python community, we even say that \"Python comes with batteries included,\" meaning that a plain Python installation (like the one you are probably using to execute this notebook) offers all kinds of functionality for a multitude of application domains. Hence the name **general purpose** language.\n", "\n", "To \"enable\" most of these, however, we need to first **import** them from the so-called [standard library](https://docs.python.org/3/library/index.html). Let's do a quick example here and look at the [random](https://docs.python.org/3/library/random.html) module that provides functionality to simulate and work with random numbers." ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [], "source": [ "import random" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To access a function inside the [random](https://docs.python.org/3/library/random.html) module, for example, the [random()](https://docs.python.org/3/library/random.html#random.random) function, we use the `.` operator, formally called the attribute access operator. The [random()](https://docs.python.org/3/library/random.html#random.random) function simply returns a random decimal number between `0` (inclusive) and `1` (exclusive)."
] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.38523914298287465" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "random.random()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It could be used, for example, to model a fair coin toss by comparing the number it returns to `0.5` with the `<` operator: On average, we see `True` in 50% of the cases and `False` in the other 50%." ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "random.random() < 0.5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A second example would be the [choice()](https://docs.python.org/3/library/random.html#random.choice) function, which draws a random element from a `list` with replacement. We could use it to model a fair die." ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "random.choice([1, 2, 3, 4, 5, 6])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the next chapter, we see how we can extend Python even further by installing and importing **third-party packages**." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Thinking like a Computer" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An important skill for any data scientist is to learn to \"think\" like a computer does. So far, we have seen that Python is a pretty \"intuitive\" language: Many concepts can already be understood after seeing them once or just a couple of times. 
Many of the aspects that make other languages harder to learn are somehow \"magically\" automated by Python in the background, most notably the management of the memory.\n", "\n", "This section introduces a couple of more \"advanced\" concepts that presumably are *not* so intuitive to beginners." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### \"Simple\" Data Types" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, let's review the concept of **object-orientation**, which is the paradigm by which Python manages the memory.\n", "\n", "Take the following three examples. Whereas `a` and `b` have the same **value** (i.e., **semantic meaning**) to us humans, we see in this section that there are a couple of caveats to look out for." ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [], "source": [ "a = 42\n", "b = 42.0\n", "c = 42.87" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An important idea to understand is that each of the right-hand sides leads to a *new* **object** being created in the computer's memory *first*. An object can be thought of as a \"box\" in memory holding $1$s and $0$s (i.e., physical energy flows inside the computer).\n", "\n", "Objects can and do exist without being **referenced** by a variable. Also, an object may even have several variables referencing it, just as a human may have different names in different contexts (e.g., a formal name in the passport, a name by which one is known to friends, and maybe a different name by which one is called by one's spouse).\n", "\n", "In the example, while both `a` and `b` have the *same* value, they are two *distinct* objects. The `is` operator checks if the objects referenced by two variables are indeed the *same* one, or, in other words, have the same **identity**."
] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a == b" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a is b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Every object always has some **data type**, which determines how the object behaves and what we can do with it. The types of `a` and `b` are `int` and `float`, respectively." ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "int" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(a)" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "float" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While it seems cumbersome to analyze numbers at this level of detail, the following code cell shows how `float`ing-point numbers, one gold standard of numbers in all of computer science and engineering, behave counter-intuitively. Yet, *nothing* is wrong here: Neither `0.1` nor `0.2` can be represented exactly in binary, so their sum carries a tiny rounding error." ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "0.1 + 0.2 == 0.3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data type of an object also determines which **methods** we can invoke on it. A method is just a function that is \"attached\" to an object and can be accessed with the `.` operator seen above. 
A method necessarily needs the object it is attached to as an input, which is why it is attached to an object to begin with.\n", "\n", "For example, `float` objects come with an `.is_integer()` method that tells us whether the number has *no* non-`0` decimals." ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b.is_integer()" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c.is_integer()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`int` objects, in contrast, have no notion of decimals, which is why, in the Python 3.8 used here, they do *not* have an `.is_integer()` method (Python 3.12 added one for compatibility with `float`). That is what the `AttributeError` tells us." ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "ename": "AttributeError", "evalue": "'int' object has no attribute 'is_integer'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0ma\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mis_integer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mAttributeError\u001b[0m: 'int' object has no attribute 'is_integer'" ] } ], "source": [ "a.is_integer()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What we could do here is take `a` and pass it to the [float() ](https://docs.python.org/3/library/functions.html#float) built-in, a so-called **constructor**, which takes the value of its input and creates a *new* object of 
the desired `float` type. Yet, we know the answer to `aa.is_integer()` already, even without executing the code cell as `a` has no non-`0` decimals to begin with." ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [], "source": [ "aa = float(a)" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "aa.is_integer()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create another example `d` to see further examples of methods." ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [], "source": [ "d = \"Python rocks\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The type of `d` is `str`, which is short for \"**string**\" and is defined in computer science as a sequence of characters." ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "str" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`str` objects support various methods that \"make sense\" in the context of textual data, for example, the `.lower()` and `.upper()` methods." 
] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'python rocks'" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d.lower()" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'PYTHON ROCKS'" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d.upper()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### \"Complex\" Data Types" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The examples in the previous section are considered \"simple\" as they only model *scalar* values (i.e., an individual object per example). However, we have already seen an example of a more \"complex\" object, namely the `list` called `numbers` above." ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "list" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(numbers)" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 2, 3, 4]" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "numbers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`list` objects also come with specific methods on them, for example, the `.append()` method that adds another element at the end of a `list`." ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [], "source": [ "numbers.append(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note how the `.append()` method does not lead to any output below the code cell. That is an indication that `numbers` is \"changed in place.\" The formal term for this property is **mutability**. 
A good working definition is: Any object whose value can be changed *after* its creation is a **mutable** object. Objects *without* this property are called **immutable**.\n", "\n", "An example of the latter is the `tuple` data type. `tuple`s behave like `list`s with the additional property that they cannot be changed. Everything else is the same as for `list`s. `tuple`s are created with parentheses replacing the brackets." ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [], "source": [ "more_numbers = (7, 8, 9)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`more_numbers` does not know about the `.append()` method." ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "ename": "AttributeError", "evalue": "'tuple' object has no attribute 'append'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mmore_numbers\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m10\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mAttributeError\u001b[0m: 'tuple' object has no attribute 'append'" ] } ], "source": [ "more_numbers.append(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Whereas both `list` and `tuple` objects preserve the **order** of their elements, the `set` data type does not. Additionally, any object may only be an element of a `set` at most once. `set`s are created with curly braces, `{` and `}`. By giving up order, `set` objects offer significantly faster processing in some situations, most notably when checking whether an element is contained."
] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [], "source": [ "other_numbers = {3, 3, 3, 2, 2, 1}" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{1, 2, 3}" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "other_numbers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One last example of a \"complex\" data type is the `dict`ionary type, which models a mapping relationship among the objects it contains. The syntax to create `dict`s also involves curly braces with the addition of using a `:` to specify the mapping relationships.\n", "\n", "For example, to map `int`egers to `str`ings modeling the English words corresponding to the numbers, we could write the following. The objects to the left of the `:` take the role of the **keys** while the ones to the right take the role of the **values**." ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [], "source": [ "to_words = {\n", "    0: \"zero\",\n", "    1: \"one\",\n", "    2: \"two\",\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The main purpose of `dict`s is to look up the value mapped to by some key. We can use the indexing notation to achieve that." ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'zero'" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "to_words[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`dict`s are among the most optimized data types in the Python world and a major building block in codebases solving real-life problems." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A big factor in getting good at any programming language is to learn what data types to use in which situations. 
There is no \"best\" data type; choosing among a couple of data types always comes down to trade-offs." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.9" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }