"After learning about the basic building blocks of expressing and structuring the business logic in programs, we focus our attention on the **data types** Python offers us, both built-in and available via the [standard library](https://docs.python.org/3/library/index.html) or third-party packages.\n",
"We start with the \"simple\" ones: Numeric types in this chapter and textual data in Chapter 6. An important fact that holds for all objects of these types is that they are **immutable**. To re-use the bag analogy from [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements.ipynb#Objects-vs.-Types-vs.-Values), this means that the $0$s and $1$s making up an object's *value* cannot be changed once the bag is created in memory, implying that any operation with or method on the object creates a *new* object in a *different* memory location.\n",
"Chapters 7, 8, and 9 then cover the more \"complex\" data types, including, for example, the `list` type. Finally, Chapter 10 completes the picture by introducing language constructs to create custom types.\n",
"- [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements.ipynb#%28Data%29-Type-%2F-%22Behavior%22) reveals that numbers may come in *different* data types (i.e., `int` vs. `float` so far),\n",
"- [Chapter 3](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/03_conditionals.ipynb#Boolean-Expressions) raises questions regarding the **limited precision** of `float` numbers (e.g., `42 == 42.000000000000001` evaluates to `True`), and\n",
"- [Chapter 4](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/04_iteration.ipynb#Infinite-Recursion) shows that sometimes a `float` \"walks\" and \"quacks\" like an `int`, whereas the reverse is true in other cases.\n",
"This chapter introduces all the [built-in numeric types](https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-complex): `int`, `float`, and `complex`. To mitigate the limited precision of floating-point numbers, we also look at two replacements for the `float` type in the [standard library](https://docs.python.org/3/library/index.html), namely the `Decimal` type in the [decimals](https://docs.python.org/3/library/decimal.html#decimal.Decimal) and the `Fraction` type in the [fractions](https://docs.python.org/3/library/fractions.html#fractions.Fraction) module."
"The simplest numeric type is the `int` type: It behaves like an [integer in ordinary math](https://en.wikipedia.org/wiki/Integer) (i.e., the set $\\mathbb{Z}$) and supports operators in the way we saw in the section on arithmetic operators in [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements.ipynb#%28Arithmetic%29-Operators)."
"Whereas mathematicians argue what the term $0^0$ means (cf., this [article](https://en.wikipedia.org/wiki/Zero_to_the_power_of_zero)), programmers are pragmatic about this and define $0^0 = 1$."
"As computers can only store $0$s and $1$s, `int` objects are nothing but that in memory as well. Consequently, computer scientists and engineers developed conventions as to how $0$s and $1$s are \"translated\" into integers, and one such convention is the **[binary representation](https://en.wikipedia.org/wiki/Binary_number)** of **non-negative integers**. Consider the integers from $0$ through $255$ that are encoded into $0$s and $1$s with the help of this table:"
"A number consists of exactly eight $0$s and $1$s that are read from right to left and referred to as the **bits** of the number. Each bit represents a distinct multiple of $2$, the **digit**. For sure, we start counting at $0$ again.\n",
"To encode the integer $3$, for example, we need to find a combination of $0$s and $1$s such that the sum of digits marked with a $1$ is equal to the number we want to encode. In the example, we set all bits to $0$ except for the first ($i=0$) and second ($i=1$) as $2^0 + 2^1 = 1 + 2 = 3$. So the binary representation of $3$ is $00~00~00~11$. To borrow some terminology from linear algebra, the $3$ is a linear combination of the digits where the coefficients are either $0$ or $1$: $3 = 0*128 + 0*64 + 0*32 + 0*16 + 0*8 + 0*4 + 1*2 + 1*1$. It is *guaranteed* that there is exactly *one* such combination for each number between $0$ and $255$.\n",
"As each bit in the binary representation is one of two values, we say that this representation has a base of $2$. Often, the base is indicated with a subscript to avoid confusion. For example, we write $3_{10} = 00000011_2$ or $3_{10} = 11_2$ for short omitting leading $0$s. A subscript of $10$ implies a decimal number as we know it from grade school.\n",
"We use the built-in [bin()](https://docs.python.org/3/library/functions.html#bin) function to obtain an `int` object's binary representation: It returns a `str` object starting with `\"0b\"` indicating the binary format and as many $0$s and $1$s as are necessary to encode the integer omitting leading $0$s."
"We may pass a `str` object formatted this way as the argument to the [int()](https://docs.python.org/3/library/functions.html#int) built-in, together with `base=2`, to (re-)create an `int` object, for example, with the value of `3`."
"Moreover, we may also use the contents of the returned `str` object as a **literal** instead: Just like we type, for example, `3` without quotes (i.e., \"literally\") into a code cell to create the `int` object `3`, we may type `0b11` to obtain an `int` object with the same value."
"Another example is the integer `177` that is the sum of $128 + 32 + 16 + 1$: Thus, its binary representation is the sequence of bits $10~11~00~01$, or to use our new notation, $177_{10} = 10110001_2$."
"Analogous to typing `177` into a code cell, we may write `0b10110001`, or `0b_1011_0001` to make use of the underscores, and create an `int` object with the value `177`."
"Groups of eight bits are also called a **byte**. As a byte can only represent non-negative integers up to $255$, the table above is extended conceptually with greater digits to the left to model integers beyond $255$. The memory management needed to implement this is built into Python, and we do not need to worry about it.\n",
"Now, an integer is a linear combination of the digits where the coefficients are one of *ten* values, and the base is now $10$. For example, the number $123$ can be expressed as $0*1000 + 1*100 + 2*10 + 3*1$. So, the binary representation follows the same logic as the decimal system taught in grade school. The decimal system is intuitive to us humans, mostly as we learn to count with our *ten* fingers. The $0$s and $1$s in a computer's memory are therefore no rocket science; they only feel unintuitive for a beginner."
"While in the binary and decimal systems there are two and ten distinct coefficients per digit, another convenient representation, the **hexadecimal representation**, uses a base of $16$. It is convenient as one digit stores the same amount of information as *four* bits and the binary representation quickly becomes unreadable for larger numbers. The letters \"a\" through \"f\" are used as digits \"10\" through \"15\".\n",
"\n",
"The following table summarizes the relationship between the three systems:"
"The built-in [hex()](https://docs.python.org/3/library/functions.html#hex) function creates a `str` object starting with `\"0x\"` representing an `int` object's hexadecimal representation. The length depends on how many groups of four bits are implied by the corresponding binary representation.\n",
"The binary representation of `177`, `\"0b10110001\"`, can be viewed as *two* groups of four bits, $1011$ and $0001$, that are encoded as $\\text{b}$ and $1$ in hexadecimal (cf., table above)."
"To (re-)create an `int` object with the value `177`, we call the [int()](https://docs.python.org/3/library/functions.html#int) built-in with a properly formatted `str` object and `base=16` as arguments."
"Hexadecimals between $00_{16}$ and $\\text{ff}_{16}$ (i.e., $0_{10}$ and $255_{10}$) are commonly used to describe colors, for example, in web development but also graphics editors. See this [online tool](https://www.w3schools.com/colors/colors_hexadecimal.asp) for some more background."
"For completeness sake, we mention that there is also the [oct()](https://docs.python.org/3/library/functions.html#oct) built-in to obtain an integer's **octal representation**. The logic is the same as for the hexadecimal representation, and we use *eight* instead of *sixteen* digits. That is the equivalent of viewing the binary representations in groups of three bits. As of today, octal representations have become less important, and the data science practitioner may probably live without them quite well."
"While there are conventions that model negative integers with $0$s and $1$s in memory (cf., [Two's Complement](https://en.wikipedia.org/wiki/Two%27s_complement)), Python manages that for us, and we do not look into the theory here for brevity. We have learned all we need to know about how integers are modeled in a computer.\n",
"The binary and hexadecimal representations of negative integers are identical to their positive counterparts except that they start with a minus sign `-`."
"Their hexadecimal representations occupy *four* bits in memory while only *one* bit is needed. This is because \"1\" and \"0\" are just two of the sixteen possible digits."
"As a reminder, the `None` literal is a type on its own, namely the `NoneType`, and different from `False`. It *cannot* be cast as an integer as the `TypeError` indicates."
"\u001b[0;32m<ipython-input-41-af2123a46eb2>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"Now that we know how integers are represented with $0$s and $1$s, we look at ways of working with the individual bits, in particular with the so-called **[bitwise operators](https://wiki.python.org/moin/BitwiseOperators)**: As the name suggests, the operators perform some operation on a bit by bit basis. They only work with and always return `int` objects.\n",
"We keep this overview rather short as such \"low-level\" operations are not needed by the data science practitioner regularly. Yet, it is worthwhile to have heard about them as they form the basis of all of arithmetic in computers.\n",
"The first operator is the **bitwise AND** operator `&`: It looks at the bits of its two operands, `11` and `13` in the example, in a pairwise fashion and *iff* both operands have a $1$ in the same position, will the resulting integer have a $1$ in this position as well. The binary representations of `11` and `13` have $1$s in their respective first and fourth bits, which is why `bin(11 & 13)` evaluates to `\"Ob1001\"`."
"The **bitwise OR** operator `|` evaluates to an `int` object whose bits are set to $1$ if the corresponding bits of either *one* or *both* operands are $1$. So in the example `9 | 13` only the second bit is $0$ for both operands, which is why the expression evaluates to `\"0b1101\"`."
"The **bitwise XOR** operator `^` is a special case of the `|` operator in that it evaluates to an `int` object whose bits are set to $1$ if the corresponding bit of *exactly one* of the two operands is $1$. Colloquially, the \"X\" stands for \"exclusive.\" The `^` operator must *not* be confused with the exponentiation operator `**`! In the example, `9 ^ 13`, only the third bit differs between the two operands, which is why it evaluates to `\"0b100\"` omitting the fourth bit."
"The **bitwise NOT** operator `~`, sometimes also called **inversion** operator, is said to \"flip\" the $0$s into $1$s and the $1$s into $0$s. However, it is based on the aforementioned [Two's Complement](https://en.wikipedia.org/wiki/Two%27s_complement) convention and `~x = -(x + 1)` by definition (cf., the [reference](https://docs.python.org/3/reference/expressions.html#unary-arithmetic-and-bitwise-operations)). The full logic behind this is considered out of scope in this book.\n",
"Lastly, the **bitwise left and right shift** operators, `<<` and `>>`, shift all the bits either to the left or to the right. This corresponds to multiplying or dividing an integer by powers of $2$.\n",
"As we have seen above, some assumptions need to be made as to how the $0$s and $1$s in a computer's memory are to be translated into numbers. This process becomes a lot more involved when we go beyond integers and model [real numbers](https://en.wikipedia.org/wiki/Real_number) (i.e., the set $\\mathbb{R}$) with possibly infinitely many digits to the right of the period like $1.23$.\n",
"The **[Institute of Electrical and Electronics Engineers](https://en.wikipedia.org/wiki/Institute_of_Electrical_and_Electronics_Engineers)** (IEEE, pronounced \"eye-triple-E\") is one of the important professional associations when it comes to standardizing all kinds of aspects regarding the implementation of soft- and hardware.\n",
"The **[IEEE 754](https://en.wikipedia.org/wiki/IEEE_754)** standard defines the so-called **floating-point arithmetic** that is commonly used today by all major programming languages. The standard not only defines how the $0$s and $1$s are organized in memory but also, for example, how values are to be rounded, what happens in exceptional cases like divisions by zero, or what is a zero value in the first place.\n",
"In cases where the dot `.` is unnecessary from a mathematical point of view, we either need to end the number with it nevertheless or use the [float()](https://docs.python.org/3/library/functions.html#float) built-in to cast the number explicitly. [float()](https://docs.python.org/3/library/functions.html#float) can process any numeric object or a properly formatted `str` object."
"In general, if we combine `float` and `int` objects in arithmetic operations, we always end up with a `float` type: Python uses the \"broader\" representation."
"`float` objects may also be created with the **scientific literal notation**: We use the symbol `e` to indicate powers of $10$, so $1.23 * 10^0$ translates into `1.23e0`."
"Syntactically, `e` needs a `float` or `int` object in its literal notation on its left and an `int` object on its right, both without a space. Otherwise, we get a `SyntaxError`."
"\u001b[0;32m<ipython-input-77-eff9d613b2d7>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0me1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"There are also three special values representing \"**not a number,**\" called `nan`, and positive or negative **infinity**, called `inf` or `-inf`, that are created by passing in the corresponding abbreviation as a `str` object to the [float()](https://docs.python.org/3/library/functions.html#float) built-in. These values could be used, for example, as the result of a mathematically undefined operation like division by zero or to model the value of a mathematical function as it goes to infinity."
"`nan` objects *never* compare equal to *anything*, not even to themselves. This happens in accordance with the [IEEE 754](https://en.wikipedia.org/wiki/IEEE_754) standard."
"As a caveat, adding infinities of different signs is an *undefined operation* in math and results in a `nan` object. So, if we (accidentally or unknowingly) do this on a real dataset, we do *not* see any error messages, and our program may continue to run with non-meaningful results! This is an example of a piece of code **failing silently**."
"`float` objects are *inherently* imprecise, and there is *nothing* we can do about it! In particular, arithmetic operations with two `float` objects may result in \"weird\" rounding \"errors\" that are strictly deterministic and occur in accordance with the [IEEE 754](https://en.wikipedia.org/wiki/IEEE_754) standard.\n",
"A popular workaround is to benchmark the difference between the two numbers to be checked for equality against a pre-defined `threshold` *sufficiently* close to `0`, for example, `1e-15`."
"The built-in [format()](https://docs.python.org/3/library/functions.html#format) function allows us to show the **significant digits** of a `float` number as they exist in memory to arbitrary precision. To exemplify it, let's view a couple of `float` objects with `50` digits. This analysis reveals that almost no `float` number is precise! After $14$ or $15$ digits \"weird\" things happen. As we see further below, the \"random\" digits ending the `float` numbers do *not* \"physically\" exist in memory.\n",
"The [format()](https://docs.python.org/3/library/functions.html#format) function is different from the [format()](https://docs.python.org/3/library/stdtypes.html#str.format) method on `str` objects introduced in the next chapter and both work with the so-called [format specification mini-language](https://docs.python.org/3/library/string.html#format-specification-mini-language): `\".50f\"` is the instruction to show `50` digits of a `float` number. But let's not worry too much about these details for now."
"The [format()](https://docs.python.org/3/library/functions.html#format) function does *not* round a `float` object in the mathematical sense! It just allows us to show an arbitrary number of the digits as stored in memory, and it also does *not* change these.\n",
"On the contrary, the built-in [round()](https://docs.python.org/3/library/functions.html#round) function creates a *new* numeric object that is a rounded version of the one passed in as the argument. It adheres to the common rules of math.\n",
"For example, let's round `1 / 3` to five decimals. The obtained value for `roughly_a_third` is also *imprecise* but different from the \"exact\" representation of `1 / 3` above."
"Surprisingly, `0.125` and `0.25` appear to be *precise*, and equality comparison works without the `threshold` workaround: Both are powers of $2$ in disguise."
"To understand these subtleties, we need to look at the **[binary representation of floats](https://en.wikipedia.org/wiki/Double-precision_floating-point_format)** and review the basics of the **[IEEE 754](https://en.wikipedia.org/wiki/IEEE_754)** standard. On modern machines, floats are modeled in so-called double precision with $64$ bits that are grouped as in the figure below. The first bit determines the sign ($0$ for plus, $1$ for minus), the next $11$ bits represent an $exponent$ term, and the last $52$ bits resemble the actual significant digits, the so-called $fraction$ part. The three groups are put together like so:\n",
"A $1.$ is implicitly prepended as the first digit, and both, $fraction$ and $exponent$, are stored in base $2$ representation (i.e., they both are interpreted like integers above). As $exponent$ is consequently non-negative, between $0_{10}$ and $2047_{10}$ to be precise, the $-1023$ centers the entire $2^{exponent-1023}$ term around $1$ and allows the period within the $1.fraction$ part be shifted into either direction by the same amount. Floating-point numbers received their name as the period, formally called the **[radix point](https://en.wikipedia.org/wiki/Radix_point)**, \"floats\" along the significant digits. As an aside, an $exponent$ of all $0$s or all $1$s is used to model the special values `nan` or `inf`.\n",
"As the standard defines the exponent part to come as a power of $2$, we now see why `0.125` is a *precise* float: It can be represented as a power of $2$, i.e., $0.125 = (-1)^0 * 1.0 * 2^{1020-1023} = 2^{-3} = \\frac{1}{8}$. In other words, the floating-point representation of $0.125_{10}$ is $0_2$, $1111111100_2 = 1020_{10}$, and $0_2$ for the three groups, respectively."
"The crucial fact for the data science practitioner to understand is that mapping the *infinite* set of the real numbers $\\mathbb{R}$ to a *finite* set of bits leads to the imprecisions shown above!\n",
"So, floats are usually good approximations of real numbers only with their first $14$ or $15$ digits. If more precision is required, we need to revert to other data types such as a `Decimal` or a `Fraction`, as shown in the next two sections.\n",
"This [blog post](http://fabiensanglard.net/floating_point_visually_explained/) gives another neat and *visual* way as to how to think of floats. It also explains why floats become worse approximations of the reals as their absolute values increase.\n",
"\n",
"The Python [documentation](https://docs.python.org/3/tutorial/floatingpoint.html) provides another good discussion of floats and the goodness of their approximations.\n",
"If we are interested in the exact bits behind a `float` object, we use the [hex()](https://docs.python.org/3/library/stdtypes.html#float.hex) method that returns a `str` object beginning with `\"0x1.\"` followed by the $fraction$ in hexadecimal notation and the $exponent$ as an integer after subtraction of $1023$ and separated by a `\"p\"`."
"one_eighth.hex() # this basically says 2 ** (-3)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Also, the [as_integer_ratio()](https://docs.python.org/3/library/stdtypes.html#float.as_integer_ratio) method returns the two smallest integers whose ratio best approximates a `float` object."
"As seen in [Chapter 1](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/01_elements.ipynb#%28Data%29-Type-%2F-%22Behavior%22), the [is_integer()](https://docs.python.org/3/library/stdtypes.html#float.is_integer) method tells us if a `float` can be casted as an `int` object without any loss in precision."
"As the exact implementation of floats may vary and be dependent on a particular Python installation, we look up the [float_info](https://docs.python.org/3/library/sys.html#sys.float_info) attribute in the [sys](https://docs.python.org/3/library/sys.html) module in the [standard library](https://docs.python.org/3/library/index.html) to check the details. Usually, this is not necessary."
"The [decimal](https://docs.python.org/3/library/decimal.html) module in the [standard library](https://docs.python.org/3/library/index.html) provides a [Decimal](https://docs.python.org/3/library/decimal.html#decimal.Decimal) type that may be used to represent any real number to a user-defined level of precision: \"User-defined\" does *not* mean an infinite or \"exact\" precision! The `Decimal` type merely allows us to work with a number of bits *different* from the $64$ as specified for the `float` type and also to customize the rounding rules and some other settings.\n",
"We import the `Decimal` type and also the [getcontext()](https://docs.python.org/3/library/decimal.html#decimal.getcontext) function from the [decimal](https://docs.python.org/3/library/decimal.html) module."
"[getcontext()](https://docs.python.org/3/library/decimal.html#decimal.getcontext) shows us how the [decimal](https://docs.python.org/3/library/decimal.html) module is set up. By default, the precision is set to $28$ significant digits, which is roughly twice as many as with `float` objects."
"The two simplest ways to create a `Decimal` object is to either **instantiate** it with an `int` or a `str` object consisting of all the significant digits. In the latter case, the scientific notation is also possible."
"It is *not* a good idea to create a `Decimal` from a `float` object. If we did so, we would create a `Decimal` object that internally used extra bits to store the \"random\" digits that are not stored in the `float` object in the first place."
"To verify the precision, we apply the built-in [format()](https://docs.python.org/3/library/functions.html#format) function to the previous code cell and compare it with the same division resulting in a `float` object."
"However, mixing `Decimal` and `float` objects raises a `TypeError`: So, Python prevents us from potentially introducing imprecisions via innocent-looking arithmetic by **failing loudly**."
"To preserve the precision for more advanced mathematical functions, `Decimal` objects come with many methods bound on them. For example, [ln()](https://docs.python.org/3/library/decimal.html#decimal.Decimal.ln) and [log10()](https://docs.python.org/3/library/decimal.html#decimal.Decimal.log10) take the logarithm while [sqrt()](https://docs.python.org/3/library/decimal.html#decimal.Decimal.sqrt) calculates the square root. In general, the functions in the [math](https://docs.python.org/3/library/math.html) module in the [standard library](https://docs.python.org/3/library/index.html) should only be used with `float` objects as they do *not* preserve precision."
"The object returned by the [sqrt()](https://docs.python.org/3/library/decimal.html#decimal.Decimal.sqrt) method is still limited in precision: This must be so as, for example, $\\sqrt{2}$ is an **[irrational number](https://en.wikipedia.org/wiki/Irrational_number)** that *cannot* be expressed with absolute precision using *any* number of bits, even in theory.\n",
"The [quantize()](https://docs.python.org/3/library/decimal.html#decimal.Decimal.quantize) method allows us to cut off a `Decimal` number at any digit that is smaller than the set precision while adhering to the rounding rules from math.\n",
"As with `float` objects, we cannot add infinities of different signs: Now get a module-specific `InvalidOperation` exception instead of a `nan` value. Here, **failing loudly** is a good thing as it prevents us from working with invalid results."
"For more information on the `Decimal` type, see the tutorial at [PYMOTW](https://pymotw.com/3/decimal/index.html) or the official [documentation](https://docs.python.org/3/library/decimal.html)."
"If the numbers in an application can be expressed as [rational numbers](https://en.wikipedia.org/wiki/Rational_number) (i.e., the set $\\mathbb{Q}$), we may model them as a [Fraction](https://docs.python.org/3/library/fractions.html#fractions.Fraction) type from the [fractions](https://docs.python.org/3/library/fractions.html) module in the [standard library](https://docs.python.org/3/library/index.html). As any fraction can always be formulated as the division of one integer by another, `Fraction` objects are inherently precise, just as `int` objects on their own. Further, we maintain the precision as long as we do not use them in a mathematical operation that could result in an irrational number (e.g., taking the square root).\n",
"Among others, there are two simple ways to create a `Fraction` object: We either instantiate one with two `int` objects representing the numerator and denominator or with a `str` object. In the latter case, we have two options again and use either the format \"numerator/denominator\" (i.e., *without* any spaces) or the same format as for `float` and `Decimal` objects above."
"Only the lowest common denominator version is maintained after creation: For example, $\\frac{3}{2}$ and $\\frac{6}{4}$ are the same, and both become `Fraction(3, 2)`."
"`float` objects may *syntactically* be cast as `Fraction` objects as well. However, then we create a `Fraction` object that precisely remembers the `float` object's imprecision: A *bad* idea!"
"`Fraction` objects follow the arithmetic rules from middle school and may be mixed with `int` objects *without* any loss of precision. The result is always a *new* `Fraction` object."
"`Fraction` and `float` objects may also be mixed *syntactically*. However, then the results may exhibit imprecision again, even if we do not see them at first sight! This is another example of code **failing silently**."
"For more examples and discussions, see the tutorial at [PYMOTW](https://pymotw.com/3/fractions/index.html) or the official [documentation](https://docs.python.org/3/library/fractions.html)."
"Some mathematical equations cannot be solved if the solution has to be in the set of the real numbers $\\mathbb{R}$. For example, $x^2 = -1$ can be rearranged into $x = \\sqrt{-1}$, but the square root is not defined for negative numbers. To mitigate this, mathematicians introduced the concept of an [imaginary number](https://en.wikipedia.org/wiki/Imaginary_number) $\\textbf{i}$ that is *defined* as $\\textbf{i} = \\sqrt{-1}$ or often as the solution to the equation $\\textbf{i}^2 = -1$. So, the solution to $x = \\sqrt{-1}$ then becomes $x = \\textbf{i}$.\n",
"If we generalize the example equation into $(mx-n)^2 = -1 \\implies x = \\frac{1}{m}(\\sqrt{-1} + n)$ where $m$ and $n$ are constants chosen from the reals $\\mathbb{R}$, then the solution to the equation comes in the form $x = a + b\\textbf{i}$, the sum of a real number and an imaginary number, with $a=\\frac{n}{m}$ and $b = \\frac{1}{m}$.\n",
"Such \"compound\" numbers are called **[complex numbers](https://en.wikipedia.org/wiki/Complex_number)**, and the set of all such numbers is commonly denoted by $\\mathbb{C}$. The reals $\\mathbb{R}$ are a strict subset of $\\mathbb{C}$ with $b=0$. Further, $a$ is referred to as the **real part** and $b$ as the **imaginary part** of the complex number.\n",
"`complex` numbers are part of core Python. The simplest way to create one is to write an arithmetic expression with the literal `j` notation for $\\textbf{i}$. The `j` is commonly used in many engineering disciplines instead of the symbol $\\textbf{i}$ from math as $I$ in engineering more often than not means [electric current](https://en.wikipedia.org/wiki/Electric_current).\n",
"For example, the answer to $x^2 = -1$ can be written in Python as `1j` like below. This creates a `complex` object with value `1j`. The same syntactic rules apply as with the above `e` notation: No spaces are allowed between the number and the `j`, and the number may be any `int` or `float` literal."
"Alternatively, we may use the [complex()](https://docs.python.org/3/library/functions.html#complex) built-in: This takes two parameters where the second is optional and defaults to `0`. We may either call it with one or two arguments of any numeric type or a `str` object in the format of the previous code cell without any spaces."
"The absolute value of a `complex` number $x$ is defined with the Pythagorean Theorem where $\\lVert x \\rVert = \\sqrt{a^2 + b^2}$ and $a$ and $b$ are the real and imaginary parts. The [abs()](https://docs.python.org/3/library/functions.html#abs) built-in function implements that in Python."
"Also, a conjugate() method is bound to every `complex` object. The [complex conjugate](https://en.wikipedia.org/wiki/Complex_conjugate) is defined to be the complex number with identical real part but an imaginary part reversed in sign."
"The [cmath](https://docs.python.org/3/library/cmath.html) module in the [standard library](https://docs.python.org/3/library/index.html) implements many of the functions from the [math](https://docs.python.org/3/library/math.html) module such that they work with complex numbers."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Numeric Tower"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Analogous to the discussion of containers and iterables in [Chapter 4](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/master/04_iteration.ipynb#Containers-vs.-Iterables), we contrast the *concrete* numeric data types in this chapter with the *abstract* ideas behind [numbers in mathematics](https://en.wikipedia.org/wiki/Number).\n",
"\n",
"The figure below summarizes five *major* sets of [numbers in mathematics](https://en.wikipedia.org/wiki/Number) as we know them from high school:\n",
"\n",
"- $\\mathbb{N}$: [Natural numbers](https://en.wikipedia.org/wiki/Natural_number) are all non-negative count numbers, e.g., $0, 1, 2, ...$\n",
"- $\\mathbb{Z}$: [Integers](https://en.wikipedia.org/wiki/Integer) are all numbers *without* a fractional component, e.g., $-1, 0, 1, ...$\n",
"- $\\mathbb{Q}$: [Rational numbers](https://en.wikipedia.org/wiki/Rational_number) are all numbers that can be expressed as a quotient of two integers, e.g., $-\\frac{1}{2}, 0, \\frac{1}{2}, ...$\n",
"- $\\mathbb{R}$: [Real numbers](https://en.wikipedia.org/wiki/Real_number) are all numbers that can be represented as a distance along a line (negative means \"reversed\"), e.g., $\\sqrt{2}, \\pi, \\text{e}, ...$\n",
"- $\\mathbb{C}$: [Complex numbers](https://en.wikipedia.org/wiki/Complex_number) are all numbers of the form $a + b\\textbf{i}$ where $a$ and $b$ are real numbers and $\\textbf{i}$ is the [imaginary number](https://en.wikipedia.org/wiki/Imaginary_number), e.g., $0, \\textbf{i}, 1 + \\textbf{i}, ...$\n",
"\n",
"In the listed order, the five sets are perfect subsets and $\\mathbb{C}$ is the largest set (to be precise, all sets are infinite, but they still have a different number of elements)."
"The *concrete* data types introduced in this chapter are all *imperfect* models of *abstract* mathematical ideas.\n",
"\n",
"The `int` and `Fraction` types are the models \"closest\" to the idea they implement: Whereas $\\mathbb{Z}$ and $\\mathbb{Q}$ are, by definition, infinite, every computer runs out of bits when representing sufficiently large integers or fractions with a sufficiently large number of decimals. However, within a system-dependent minimum and maximum integer range, we can model an integer or fraction without any loss in precision.\n",
"\n",
"For the other types, in particular, the `float` type, the implications of their imprecision are discussed in detail above.\n",
"\n",
"The abstract concepts behind the four outer-most mathematical sets are part of Python since [PEP 3141](https://www.python.org/dev/peps/pep-3141/) in 2007. The [numbers](https://docs.python.org/3/library/numbers.html) module in the [standard library](https://docs.python.org/3/library/index.html) defines what Pythonistas call the **numeric tower**, a collection of five **[abstract data types](https://en.wikipedia.org/wiki/Abstract_data_type)**, or **abstract base classes** as they are called in Python slang:\n",
"As a reminder, the built-in [help()](https://docs.python.org/3/library/functions.html#help) function is always our friend.\n",
"\n",
"The abstract types' docstrings are unsurprisingly similar to the corresponding concrete types' docstrings (for now, let's not worry about the dunder-style names in the docstrings).\n",
"\n",
"For example, both `numbers.Complex` and `complex` list the `imag` and `real` attributes."
]
},
{
"cell_type": "code",
"execution_count": 197,
"metadata": {
"scrolled": true,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Help on class Complex in module numbers:\n",
"\n",
"class Complex(Number)\n",
" | Complex defines the operations that work on the builtin complex type.\n",
" | \n",
" | In short, those are: a conversion to complex, .real, .imag, +, -,\n",
" | *, /, abs(), .conjugate, ==, and !=.\n",
" | \n",
" | If it is given heterogeneous arguments, and doesn't have special\n",
" | knowledge about them, it should fall back to the builtin complex\n",
" | type as described below.\n",
" | \n",
" | Method resolution order:\n",
" | Complex\n",
" | Number\n",
" | builtins.object\n",
" | \n",
" | Methods defined here:\n",
" | \n",
" | __abs__(self)\n",
" | Returns the Real distance from 0. Called for abs(self).\n",
" | \n",
" | __add__(self, other)\n",
" | self + other\n",
" | \n",
" | __bool__(self)\n",
" | True if self != 0. Called for bool(self).\n",
" | \n",
" | __complex__(self)\n",
" | Return a builtin complex instance. Called for complex(self).\n",
" | \n",
" | __eq__(self, other)\n",
" | self == other\n",
" | \n",
" | __mul__(self, other)\n",
" | self * other\n",
" | \n",
" | __neg__(self)\n",
" | -self\n",
" | \n",
" | __pos__(self)\n",
" | +self\n",
" | \n",
" | __pow__(self, exponent)\n",
" | self**exponent; should promote to float or complex when necessary.\n",
" | \n",
" | __radd__(self, other)\n",
" | other + self\n",
" | \n",
" | __rmul__(self, other)\n",
" | other * self\n",
" | \n",
" | __rpow__(self, base)\n",
" | base ** self\n",
" | \n",
" | __rsub__(self, other)\n",
" | other - self\n",
" | \n",
" | __rtruediv__(self, other)\n",
" | other / self\n",
" | \n",
" | __sub__(self, other)\n",
" | self - other\n",
" | \n",
" | __truediv__(self, other)\n",
" | self / other: Should promote to float when necessary.\n",
"One way to use *abstract* data types is to use them in place of a *concrete* data type.\n",
"\n",
"For example, we may pass them as arguments to the built-in [isinstance()](https://docs.python.org/3/library/functions.html#isinstance) function and check in which of the five mathematical sets the object `1 / 10` is."
]
},
{
"cell_type": "code",
"execution_count": 199,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 199,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"isinstance(1 / 10, float)"
]
},
{
"cell_type": "code",
"execution_count": 200,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 200,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"isinstance(1 / 10, numbers.Number)"
]
},
{
"cell_type": "code",
"execution_count": 201,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 201,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"isinstance(1 / 10, numbers.Complex)"
]
},
{
"cell_type": "code",
"execution_count": 202,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 202,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"isinstance(1 / 10, numbers.Real)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Due to the `float` type's inherent imprecision, `1 / 10` is *not* a rational number."
]
},
{
"cell_type": "code",
"execution_count": 203,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 203,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"isinstance(1 / 10, numbers.Rational)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"The `Fraction` type qualifies as a rational number; however, the `Decimal` type does not."
"Replacing *concrete* data types with *abstract* ones is particularly valuable in the context of input validation: The revised version of the `factorial()` function below allows its user to *take advantage of duck typing*. If a real but non-integer argument `n` is passed in, `factorial()` tries to cast `n` as an `int` object with the [trunc()](https://docs.python.org/3/library/math.html#math.trunc) function from the [math](https://docs.python.org/3/library/math.html) module in the [standard library](https://docs.python.org/3/library/index.html). [trunc()](https://docs.python.org/3/library/math.html#math.trunc) cuts off all decimals and any *concrete* numeric type implementing the *abstract* `numbers.Real` type supports it (cf., [documentation](https://docs.python.org/3/library/numbers.html#numbers.Real))."
]
},
{
"cell_type": "code",
"execution_count": 206,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"import math"
]
},
{
"cell_type": "code",
"execution_count": 207,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 207,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"math.trunc(1 / 10)"
]
},
{
"cell_type": "code",
"execution_count": 208,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"def factorial(n, *, strict=True):\n",
" \"\"\"Calculate the factorial of a number.\n",
"\n",
" Args:\n",
" n (int): number to calculate the factorial for; must be positive\n",
" strict (bool): if n must not contain decimals; defaults to True\n",
"\n",
" Returns:\n",
" factorial (int)\n",
"\n",
" Raises:\n",
" TypeError: if n is not an integer or cannot be cast as such\n",
" ValueError: if n is negative\n",
" \"\"\"\n",
" if not isinstance(n, numbers.Integral):\n",
" if isinstance(n, numbers.Real):\n",
" if n != math.trunc(n) and strict:\n",
" raise ValueError(\"n is not an integer-like value; it has decimals\")\n",
" n = math.trunc(n)\n",
" else:\n",
" raise TypeError(\"Factorial is only defined for integers\")\n",
"\n",
" if n < 0:\n",
" raise ValueError(\"Factorial is not defined for negative integers\")\n",
" elif n == 0:\n",
" return 1\n",
" return n * factorial(n - 1)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"`factorial()` works as before, but now also accepts, for example, `float` numbers."
]
},
{
"cell_type": "code",
"execution_count": 209,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 209,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"factorial(0)"
]
},
{
"cell_type": "code",
"execution_count": 210,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"6"
]
},
"execution_count": 210,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"factorial(3)"
]
},
{
"cell_type": "code",
"execution_count": 211,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"6"
]
},
"execution_count": 211,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"factorial(3.0)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"With the keyword-only argument `strict`, we can control whether or not a passed in `float` object may be rounded. By default, this is not allowed."
]
},
{
"cell_type": "code",
"execution_count": 212,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"ename": "ValueError",
"evalue": "n is not an integer-like value; it has decimals",
"\u001b[0;32m<ipython-input-212-188b816be8b6>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mfactorial\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m3.1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m<ipython-input-208-65a2caa81b4a>\u001b[0m in \u001b[0;36mfactorial\u001b[0;34m(n, strict)\u001b[0m\n\u001b[1;32m 16\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mn\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnumbers\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mReal\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 17\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mn\u001b[0m \u001b[0;34m!=\u001b[0m \u001b[0mmath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtrunc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mn\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mand\u001b[0m \u001b[0mstrict\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 18\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"n is not an integer-like value; it has decimals\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 19\u001b[0m \u001b[0mn\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mmath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtrunc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mn\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 20\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mValueError\u001b[0m: n is not an integer-like value; it has decimals"
]
}
],
"source": [
"factorial(3.1)"
]
},
{
"cell_type": "code",
"execution_count": 213,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"6"
]
},
"execution_count": 213,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"factorial(3.1, strict=False)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"For `complex` numbers, `factorial()` still raises a `TypeError`."
]
},
{
"cell_type": "code",
"execution_count": 214,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"ename": "TypeError",
"evalue": "Factorial is only defined for integers",
"\u001b[0;32m<ipython-input-214-7296a74f5dcf>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mfactorial\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m1\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0;36m2j\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m<ipython-input-208-65a2caa81b4a>\u001b[0m in \u001b[0;36mfactorial\u001b[0;34m(n, strict)\u001b[0m\n\u001b[1;32m 19\u001b[0m \u001b[0mn\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mmath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtrunc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mn\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 20\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 21\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mTypeError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Factorial is only defined for integers\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 22\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 23\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mn\u001b[0m \u001b[0;34m<\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mTypeError\u001b[0m: Factorial is only defined for integers"
"Furthermore, the [standard library](https://docs.python.org/3/library/index.html) adds two more types that can be used as substitutes for `float` objects:\n",