"Do you remember how you first learned to speak in your mother tongue? Probably not. No one's memory goes back that far. Your earliest memory as a child should probably be around the age of three or four years old when you could already say simple things and interact with your environment. Although you did not know any grammar rules yet, other people just understood what you said. At least most of the time.\n",
"It is intuitively best to take the very mindset of a small child when learning a new language. And a programming language is no different from that. This first chapter introduces simplistic examples and we accept them as they are *without* knowing any of the \"grammar\" rules yet. Then, we analyze them in parts and slowly build up our understanding.\n",
"Consequently, if parts of this chapter do not make sense right away, let's not worry too much. Besides introducing the basic elements, it also serves as an outlook for what is to come. So, many terms and concepts used here are deconstructed in great detail in the following chapters."
"As our introductory example, we want to calculate the *average* of all *evens* in a **list** of whole numbers: `[7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]`.\n",
"While we are used to finding an [analytical solution](https://math.stackexchange.com/questions/935405/what-s-the-difference-between-analytical-and-numerical-approaches-to-problems/935446#935446) in math (i.e., derive some equation with \"pen and paper\"), we solve this task *programmatically* instead.\n",
"The `if number % 2 == 0` may look confusing at first sight. Both `%` and `==` must have an unintuitive meaning here. Luckily, the **comment** in the same line after the `#` symbol has the answer: The program does something only for an even `number`.\n",
"In particular, it increases `count` by `1` and adds the current `number` onto the [running](https://en.wikipedia.org/wiki/Running_total) `total`. Both `count` and `number` are **initialized** to `0` and the single `=` symbol reads as \"... is *set* equal to ...\". It cannot indicate a mathematical equation as, for example, `count` is generally *not* equal to `count + 1`.\n",
"Lastly, the `average` is calculated as the ratio of the final **values** of `total` and `count`. Overall, we divide the sum of all even numbers by their count: This is nothing but the definition of an average.\n",
"The lines of code \"within\" the `for` and `if` **statements** are **indented** and aligned with multiples of *four spaces*: This shows immediately how the lines relate to each other."
"Only two of the previous four code cells generate an **output** while two remained \"silent\" (i.e., nothing appears below the cell after running it).\n",
"By default, Jupyter notebooks only show the value of the **expression** in the last line of a code cell. And, this output may also be suppressed by ending the last line with a semicolon `;`."
"To see any output other than that, we use the built-in [print()](https://docs.python.org/3/library/functions.html#print) **function**. Here, the parentheses `()` indicate that we **call** (i.e., \"execute\") code written somewhere else."
"Outside Jupyter notebooks, the semicolon `;` is used as a **separator** between statements that must otherwise be on a line on their own. However, it is *not* considered good practice to use it as it makes code less readable."
"Python comes with many built-in **[operators](https://docs.python.org/3/reference/lexical_analysis.html#operators)**: They are **tokens** (i.e., \"symbols\") that have a special meaning to the Python interpreter.\n",
"The arithmetic operators either \"operate\" with the number immediately following them, so-called **unary** operators (e.g., negation), or \"process\" the two numbers \"around\" them, so-called **binary** operators (e.g., addition).\n",
"By definition, operators on their own have *no* permanent **side effects** in the computer's memory. Although the code cells in this section do indeed create *new* numbers in memory (e.g., `77 + 13` creates `90`), they are immediately \"forgotten\" as they are not stored in a **variable** like `numbers` or `average` above. We develop this thought further at the end of this chapter when we compare **expressions** with **statements**.\n",
"Let's see some examples of operators. We start with the binary `+` and the `-` operators for addition and subtraction. Binary operators mimic what mathematicians call [infix notation](https://en.wikipedia.org/wiki/Infix_notation) and have the expected meaning."
"When we compare the output of the `*` and `/` operators for multiplication and division, we note the subtle *difference* between the `42` and the `42.0`: They are the *same* number represented as a *different* **data type**."
"The so-called **floor division operator** `//` always \"rounds\" to an integer and is thus also called **integer division operator**. It is an example of an arithmetic operator we commonly do not know from high school mathematics."
"Even though it appears that the `//` operator **truncates** (i.e., \"cuts off\") the decimals so as to effectively \"round down\" (i.e., the `42.5` became `42` in the previous code cell), this is *not* the case: The result is always \"rounded\" towards minus infinity!"
"A popular convention in both computer science and mathematics is to abbreviate \"only if\" as \"iff\", which is short for \"**[if and only if](https://en.wikipedia.org/wiki/If_and_only_if)**.\" The iff means that a remainder of `0` implies that a number is divisible by another but also that a number's being divisible by another implies a remainder of `0`. The implication goes in *both* directions!\n",
"The built-in [divmod()](https://docs.python.org/3/library/functions.html#divmod) function combines the integer and modulo divisions into one operation. However, grammatically this is *not* an operator but a function. Also, [divmod()](https://docs.python.org/3/library/functions.html#divmod) returns a \"pair\" of integers and not just one."
"Raising a number to a power is performed with the **exponentiation operator** `**`. It is different from the `^` operator other programming languages may use and that also exists in Python with a *different* meaning."
"The standard [order of precedence](https://docs.python.org/3/reference/expressions.html#operator-precedence) from mathematics applies (i.e., [PEMDAS](http://mathworld.wolfram.com/PEMDAS.html) rule) when several operators are combined."
"Some programmers also use \"style\" conventions. For example, we might play with the **whitespace**, which is an umbrella term that refers to any non-printable sign like spaces, tabs, or the like. However, this is *not* a good practice and parentheses convey a much clearer picture."
"There exist many non-mathematical operators that are introduced throughout this book, together with the concepts they implement. They often come in a form different from the unary and binary ones mentioned above."
"An **object** may be viewed as a \"bag\" of $0$s and $1$s in a given memory location. The $0$s and $1$s in a bag make up the object's **value**. There exist different **types** of bags, and each type comes with its own rules how the $0$s and $1$s are interpreted and may be worked with.\n",
"These addresses are *not* meaningful for anything other than checking if two variables reference the *same* object. Let's create a new variable `d` and also set it to `789`."
"`a` and `d` indeed have the *same value* as is checked with the **equality operator** `==`: We say `a` and `d` \"evaluate equal.\" The resulting `True` - and the `False` further below - is yet another data type, a so-called **boolean**. We look into them in [Chapter 3](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/03_conditionals_00_lecture.ipynb#Boolean-Expressions)."
"On the contrary, `a` and `d` are *different objects* as the **identity operator** `is` shows: They are stored at *different* addresses in the memory."
"The [type()](https://docs.python.org/3/library/functions.html#type) built-in shows an object's type. For example, `a` is an integer (i.e., `int`) while `b` is a so-called [floating-point number](https://en.wikipedia.org/wiki/Floating-point_arithmetic) (i.e., `float`)."
"Different types imply different behaviors for the objects. The `b` object, for example, may be \"asked\" if it is a whole number with the [is_integer()](https://docs.python.org/3/library/stdtypes.html#float.is_integer) \"functionality\" that comes with every `float` object.\n",
"Formally, we call such type-specific functionalities **methods** (i.e., as opposed to functions) and we formally introduce them in Chapter 10. For now, it suffices to know that we access them with the **dot operator** `.` on the object. Of course, `b` is a whole number, which the boolean object `True` tells us."
"For an `int` object, this [is_integer()](https://docs.python.org/3/library/stdtypes.html#float.is_integer) check does *not* make sense as we already know it is an `int`: We see the `AttributeError` below as `a` does not even know what `is_integer()` means."
"\u001b[0;32m<ipython-input-38-7db0a38aefcc>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0ma\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mis_integer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"The `c` object is a so-called **string** type (i.e., `str`), which is Python's way of representing \"text.\" Strings also come with peculiar behaviors, for example, to convert a text to lower, upper, or title case."
"Almost trivially, every object also has a value to which it **evaluates** when referenced. We think of the value as the **conceptual idea** of what the $0$s and $1$s in the bag mean to *humans*. In other words, an object's value regards its *semantic* meaning.\n",
"For built-in data types, Python prints out an object's value as a so-called **[literal](https://docs.python.org/3/reference/lexical_analysis.html#literals)**: This means that we may copy and paste the value back into a code cell and create a *new* object with the *same* value."
"In this book, we follow the convention of creating strings with **double quotes** `\"` instead of the **single quotes** `'` to which Python defaults in its *literal* notation for `str` objects. Both types of quotes may be used interchangeably. So, the `\"Python rocks\"` from above and `'Python rocks'` create two objects that evaluate equal (i.e., `\"Python rocks\" == 'Python rocks'`)."
"Just like the language of mathematics is good at expressing relationships among numbers and symbols, any programming language is just a formal language that is good at expressing computations.\n",
"\n",
"Formal languages come with their own \"grammatical rules\" called **syntax**."
"If we do not follow the rules, the code cannot be **parsed** correctly, i.e., the program does not even start to run but **raises** a **syntax error** indicated as `SyntaxError` in the output. Computers are very dumb in the sense that the slightest syntax error leads to the machine not understanding our code.\n",
"If we were to write an accounting program that adds up currencies, we would, for example, have to model dollar prices as `float` objects as the dollar symbol cannot be understood by Python."
"\u001b[0;36m File \u001b[0;32m\"<ipython-input-47-499e4d0d0cbb>\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m for number in numbers\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n"
"Furthermore, it relies on whitespace (i.e., indentation), unlike many other programming languages. The `IndentationError` below is just a particular type of a `SyntaxError`."
"However, there are also so-called **runtime errors** that occur whenever otherwise (i.e., syntactically) correct code does not run because of invalid input. Runtime errors are also often referred to as **exceptions**.\n",
"This example does not work because just like in the \"real\" world, Python does not know how to divide by `0`. The syntactically correct code leads to a `ZeroDivisionError`."
"So-called **semantic errors**, on the contrary, are hard to spot as they do *not* crash the program. The only way to find such errors is to run a program with test input for which we can predict the output. However, testing software is a whole discipline on its own and often very hard to do in practice.\n",
"Thus, adhering to just syntax rules is *never* enough. Over time, **best practices** and **style guides** were created to make it less likely for a developer to mess up a program and also to allow \"onboarding\" him as a contributor to an established code base, often called **legacy code**, faster. These rules are *not* enforced by Python itself: Badly styled code still runs. At the very least, Python programs should be styled according to [PEP 8](https://www.python.org/dev/peps/pep-0008/) and documented \"inline\" (i.e., in the code itself) according to [PEP 257](https://www.python.org/dev/peps/pep-0257/).\n",
"An easier to read version of PEP 8 is [here](https://pep8.org/). The video below features a well known **[Pythonista](https://en.wiktionary.org/wiki/Pythonista)** talking about the importance of code style."
"For example, while the above code to calculate the average of the even numbers in `[7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]` is correct, a Pythonista would rewrite it in a more \"Pythonic\" way and use the built-in [sum()](https://docs.python.org/3/library/functions.html#sum) and [len()](https://docs.python.org/3/library/functions.html#len) functions (cf., [Chapter 2](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/02_functions_00_lecture.ipynb#Built-in-Functions)) as well as a so-called **list comprehension** (cf., [Chapter 7](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/07_sequences_00_lecture.ipynb#List-Comprehensions)). Pythonic code runs faster in many cases and is less error-prone."
"To get a rough overview of the mindset of a typical Python programmer, look at these rules, also known as the **Zen of Python**, that are deemed so important that they are included in every Python installation."
"That means, for example, that a variable defined towards the bottom could accidentally be referenced at the top of the notebook. This happens quickly when we iteratively built a program and go back and forth between cells.\n",
"As a good practice, it is recommended to click on \"Kernel\" > \"Restart Kernel and Run All Cells\" in the navigation bar once a notebook is finished. That restarts the Python process forgetting all **state** (i.e., all variables) and ensures that the notebook runs top to bottom without any errors the next time it is opened."
"While this book is built with Jupyter notebooks, it is crucial to understand that \"real\" programs are almost never \"linear\" (i.e., top to bottom) sequences of instructions but instead may take many different **flows of execution**.\n",
"In real data science projects, one would probably employ a mixed approach and put reusable code into so-called Python modules (i.e., *.py* files; cf., [Chapter 2](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/02_functions_00_lecture.ipynb#Local-Modules-and-Packages)) and then use Jupyter notebooks to build up a linear report or storyline for an analysis."
"**Variables** are created with the **[assignment statement](https://docs.python.org/3/reference/simple_stmts.html#assignment-statements)** `=`, which is *not* an operator because of its *side effect* of making a **[name](https://docs.python.org/3/reference/lexical_analysis.html#identifiers)** reference an object in memory.\n",
"We read the terms **variable**, **name**, and **identifier** used interchangebly in many Python-related texts. In this book, we adopt the following convention: First, we treat *name* and *identifier* as perfect synonyms but only use the term *name* in the text for clarity. Second, whereas *name* only refers to a string of letters, numbers, and some other symbols, a *variable* means the combination of a *name* and a *reference* to an object in memory."
"When used as a *literal*, a variable evaluates to the value of the object it references. Colloquially, we could say that `a` evaluates to `20.0`, but this would not be an accurate description of what is going on in memory. We see some more colloquialisms in this section but should always relate this to what Python actually does in memory."
"A variable may be **re-assigned** as often as we wish. Thereby, we could also assign an object of a *different* type. Because this is allowed, Python is said to be a **dynamically typed** language. On the contrary, a **statically typed** language like C also allows re-assignment but only with objects of the *same* type. This subtle distinction is one reason why Python is slower at execution than C: As it runs a program, it needs to figure out an object's type each time it is referenced."
"If we want to re-assign a variable while referencing its \"old\" (i.e., current) object, we may also **update** it using a so-called **[augmented assignment statement](https://docs.python.org/3/reference/simple_stmts.html#augmented-assignment-statements)** (i.e., *not* operator), as introduced with [PEP 203](https://www.python.org/dev/peps/pep-0203/): The currently mapped object is implicitly inserted as the first operand on the right-hand side."
"Variables are **[dereferenced](https://docs.python.org/3/reference/simple_stmts.html#the-del-statement)** (i.e., \"deleted\") with the `del` statement. This does *not* delete the object a variable references but merely removes the variable's name from the \"global list of all names.\""
"If we refer to an unknown name, a *runtime* error occurs, namely a `NameError`. The `Name` in `NameError` gives a hint why we choose the term *name* over *identifier* above: Python uses it more often in its error messages."
"\u001b[0;32m<ipython-input-69-89e6c98d9288>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mb\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"Some variables magically exist when a Python process is started or are added by Jupyter. We may safely ignore the former until Chapter 10 and the latter for good."
"It is *crucial* to understand that *several* variables may reference the *same* object in memory. Not having this in mind may lead to many hard to track down bugs.\n",
"However, if a variable references an object of a more \"complex\" type (e.g., `list`), predicting the outcome of a code snippet may be unintuitive for a beginner."
"[Chapter 7](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/07_sequences_00_lecture.ipynb#The-list-Type) discusses lists in more depth. For now, let's view a `list` object as some sort of **container** that holds an arbitrary number of references to other objects and treat the brackets `[]` attached to it as yet another operator, namely the **indexing operator**. So, `x[0]` instructs Python to first follow the reference from the global list of all names to the `x` object. Then, it follows the first reference it finds there to the `1` object we put in the list. The indexing operator must be an operator as we merely *read* the first element and do not change anything in memory permanently.\n",
"Python **begins counting at 0**. This is not the case for many other languages, for example, [MATLAB](https://en.wikipedia.org/wiki/MATLAB), [R](https://en.wikipedia.org/wiki/R_%28programming_language%29), or [Stata](https://en.wikipedia.org/wiki/Stata). To understand why this makes sense, see this short [note](https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html) by one of the all-time greats in computer science, the late [Edsger Dijkstra](https://en.wikipedia.org/wiki/Edsger_W._Dijkstra)."
"To change the first entry in the list, we use the assignment statement `=` again. Here, this does *not* create a *new* variable, nor overwrite an existing one, but only changes the object referenced as the first element in `x`. As we only change parts of the `x` object, we say that we **mutate** its **state**. To use the bag analogy from above, we keep the same bag but \"flip\" some of the $0$s into $1$s and some of the $1$s into $0$s."
"The difference in behavior illustrated in this sub-section has to do with the fact that `int` and `float` objects are **immutable** types while `list` objects are **mutable**.\n",
"In the first case, an object cannot be changed \"in place\" once it is created in memory. When we assigned `123` to the already existing `a`, we did *not* change the $0$s and $1$s in the object `a` referenced before the assignment but created a new `int` object and made `a` reference it while the `b` variable is *not* affected.\n",
"In general, the assignment statement creates a new name and makes it reference whatever object is on the right-hand side *iff* the left-hand side is a *pure* name (i.e., it contains no operators like the indexing operator in the example). Otherwise, it *mutates* an already existing object. And, we must always expect that the latter may have more than one variable referencing it.\n",
"Visualizing what is going on in memory with a tool like [PythonTutor](http://pythontutor.com/visualize.html#code=x%20%3D%20%5B1,%202,%203%5D%0Ay%20%3D%20x%0Ax%5B0%5D%20%3D%2099%0Aprint%28y%5B0%5D%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) may be helpful for a beginner."
"[Phil Karlton](https://skeptics.stackexchange.com/questions/19836/has-phil-karlton-ever-said-there-are-only-two-hard-things-in-computer-science) famously noted during his time at [Netscape](https://en.wikipedia.org/wiki/Netscape):\n",
"Variable names may contain upper and lower case letters, numbers, and underscores (i.e., `_`) and be as long as we want them to be. However, they must not begin with a number. Also, they must not be any of Python's built-in **[keywords](https://docs.python.org/3/reference/lexical_analysis.html#keywords)** like `for` or `if`.\n",
"Variable names should be chosen such that they do not need any more documentation and are self-explanatory. A widespread convention is to use so-called **[snake\\_case](https://en.wikipedia.org/wiki/Snake_case)**: Keep everything lowercase and use underscores to separate words.\n",
"Variables with leading and trailing double underscores, referred to as **dunder** in Python jargon, are used for built-in functionalities and to implement object-oriented features as we see in Chapter 10. We must *not* use this style for variables!"
"An **[expression](https://docs.python.org/3/reference/expressions.html)** is any syntactically correct *combination* of *variables* and *literals* with *operators*.\n",
"What we say about *individual* operators above, namely that they have *no* permanent side effects in memory, should be put here, to begin with: The absence of any permanent side effects is the characteristic property of expressions, and all the code cells in the \"*(Arithmetic) Operators*\" section above contain only expressions!\n",
"The definition of an expression is **recursive**. So, the sub-expression `a + b` is combined with the literal `3` by the operator `**` to form the full expression `(a + b) ** 3`."
"Here, the variable `y` is combined with the literal `2` by the indexing operator `[]`. The resulting expression evaluates to the third element in the `y` list."
"When not used as a delimiter, parentheses also constitute an operator, namely the **call operator** `()`. We saw this syntax above when we called built-in functions and methods."
"A **[statement](https://docs.python.org/3/reference/simple_stmts.html)** is anything that *changes* the *state of a program* or has another permanent *side effect*. Statements, unlike expressions, do not evaluate to a value; instead, they create or change values.\n",
"The built-in [print()](https://docs.python.org/3/library/functions.html#print) function is sometimes regarded as a \"statement\" as well. It used to be an actual statement in Python 2 and has all the necessary properties. It is a bit of a corner case but we can think of it as changing the state of the screen."
"As a good practice, comments should *not* describe *what* happens. This should be evident by reading the code. Otherwise, it is most likely badly written code. Rather, comments should describe *why* something happens.\n",
"We end each chapter with a summary of the main points (i.e., **TL;DR** = \"too long; didn't read\"). The essence in this first chapter is that just as a sentence in a real language like English may be decomposed into its parts (e.g., subject, predicate, and objects), the same may be done with programming languages.\n",
" - e.g., `1`, `1.0`, and `\"one\"` are three different objects of distinct types that are also literals (i.e., by the way we type them into the command line Python knows what the value and type are)\n",
" - include [built-in functions](https://docs.python.org/3/library/functions.html) like [print()](https://docs.python.org/3/library/functions.html#print), [sum()](https://docs.python.org/3/library/functions.html#sum), or [len()](https://docs.python.org/3/library/functions.html#len)\n",
"- flow control (cf., [Chapter 3](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/03_conditionals_00_lecture.ipynb) and [Chapter 4](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/04_iteration_00_lecture.ipynb))\n",
"This PyCon 2015 talk by [Ned Batchelder](https://nedbatchelder.com/), a well-known Pythonista and the organizer of the [Python User Group](https://www.meetup.com/bostonpython/) in Boston, summarizes all situations where some sort of assignment is done in Python. The content is intermediate, and, thus, it might be worthwhile to come back to this talk at a later point in time. However, the contents should be known by everyone claiming to be proficient in Python."