3971 lines
96 KiB
Text
3971 lines
96 KiB
Text
|
{
|
||
|
"cells": [
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"**Note**: Click on \"*Kernel*\" > \"*Restart Kernel and Clear All Outputs*\" in [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) *before* reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it [in the cloud <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_mb.png\">](https://mybinder.org/v2/gh/webartifex/intro-to-python/develop?urlpath=lab/tree/06_text/00_content.ipynb)."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"# Chapter 6: Text & Bytes"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"In this chapter, we continue the study of the built-in data types. The next layer on top of numbers consists of **textual data** that are modeled primarily with the `str` type in Python. `str` objects are more complex than the numeric objects in [Chapter 5 <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_nb.png\">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/develop/05_numbers/00_content.ipynb) as they *consist* of an *arbitrary* and possibly large number of *individual* characters that may be chosen from *any* alphabet in the history of humankind. Luckily, Python abstracts away most of this complexity from us. However, after looking at the `str` type in great detail, we briefly introduce the `bytes` type at the end of this chapter to understand how characters are modeled in memory."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"## The `str` Type"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"To create a `str` object, we use the *literal* notation and type the text between enclosing **double quotes** `\"`."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 1,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"text = \"Lorem ipsum dolor sit amet.\""
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"Like everything in Python, `text` is an object with an *identity*, a *type*, and a *value*."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 2,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"140667764715968"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 2,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"id(text)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 3,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"str"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 3,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"type(text)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"As seen before, a `str` object evaluates to itself in a literal notation with enclosing **single quotes** `'`.\n",
|
||
|
"\n",
|
||
|
"In [Chapter 1 <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_nb.png\">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/develop/01_elements/00_content.ipynb#Value-/-(Semantic)-\"Meaning\"), we specify the double quotes `\"` convention this book follows. Yet, single quotes `'` and double quotes `\"` are *perfect* substitutes. We could use the reverse convention, as well. As [this discussion <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_so.png\">](https://stackoverflow.com/questions/56011/single-quotes-vs-double-quotes-in-python) shows, many programmers have *strong* opinions about such conventions. Consequently, the discussion was \"closed as not constructive\" by the moderators."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 4,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'Lorem ipsum dolor sit amet.'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 4,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"As the single quote `'` is often used in the English language as a shortener, we could make an argument in favor of using the double quotes `\"`: There are possibly fewer situations like the two code cells below, where we must **escape** the kind of quote used as the `str` object's delimiter with a backslash `\"\\\"` inside the text (cf., also the \"*Unicode & (Special) Characters*\" section further below). However, double quotes `\"` are often used as well, for example, to indicate a quote like the one by [Albert Einstein](https://de.wikipedia.org/wiki/Albert_Einstein) below. So, such arguments are not convincing.\n",
|
||
|
"\n",
|
||
|
"Many proponents of the single quote `'` usage claim that double quotes `\"` cause more **visual noise** on the screen. However, this argument is also not convincing as, for example, one could claim that *two* single quotes `''` look so similar to *one* double quote `\"` that a reader may confuse an *empty* `str` object with a missing closing quote `\"`. With the double quotes `\"` convention we at least avoid such confusion (i.e., empty `str` objects are written as `\"\"`).\n",
|
||
|
"\n",
|
||
|
"This discussion is an excellent example of a [flame war <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_wiki.png\">](https://en.wikipedia.org/wiki/Flaming_%28Internet%29#Flame_war) in the programming world: Everyone has an opinion and the discussion leads to *no* result."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 5,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'Einstein said, \"If you can\\'t explain it, you don\\'t understand it.\"'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 5,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"\"Einstein said, \\\"If you can't explain it, you don't understand it.\\\"\""
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 6,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'Einstein said, \"If you can\\'t explain it, you don\\'t understand it.\"'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 6,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"'Einstein said, \"If you can\\'t explain it, you don\\'t understand it.\"'"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"An *important* fact to know is that enclosing quotes of either kind are *not* part of the `str` object's *value*! They are merely *syntax* indicating the literal notation.\n",
|
||
|
"\n",
|
||
|
"So, printing out the sentence with the built-in [print() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#print) function does the same in both cases."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 7,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"name": "stdout",
|
||
|
"output_type": "stream",
|
||
|
"text": [
|
||
|
"Einstein said, \"If you can't explain it, you don't understand it.\"\n"
|
||
|
]
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"print(\"Einstein said, \\\"If you can't explain it, you don't understand it.\\\"\")"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 8,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"name": "stdout",
|
||
|
"output_type": "stream",
|
||
|
"text": [
|
||
|
"Einstein said, \"If you can't explain it, you don't understand it.\"\n"
|
||
|
]
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"print('Einstein said, \"If you can\\'t explain it, you don\\'t understand it.\"')"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"As an alternative to the literal notation, we may use the built-in [str() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#str) constructor to cast non-`str` objects as `str` ones. As [Chapter 11 <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_nb.png\">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/develop/11_classes/00_content.ipynb) reveals, basically any object in Python has a **text representation**. Because of that we may also pass `list` objects, the boolean `True` and `False`, or `None` to [str() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#str)."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 9,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'42'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 9,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"str(42)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 10,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'42.87'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 10,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"str(42.87)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 11,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'[1, 2, 3]'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 11,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"str([1, 2, 3])"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 12,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'True'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 12,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"str(True)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 13,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'False'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 13,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"str(False)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 14,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'None'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 14,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"str(None)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"### User Input"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"As shown in the \"*Guessing a Coin Toss*\" example in [Chapter 4 <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_nb.png\">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/develop/04_iteration/03_content.ipynb#Example:-Guessing-a-Coin-Toss), the built-in [input() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#input) function displays a prompt to the user and returns whatever is entered as a `str` object. [input() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#input) is in particular valuable when writing command-line tools."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 15,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"name": "stdin",
|
||
|
"output_type": "stream",
|
||
|
"text": [
|
||
|
"Whatever you enter is put in a new string: 123\n"
|
||
|
]
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"user_input = input(\"Whatever you enter is put in a new string: \")"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 16,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"str"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 16,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"type(user_input)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 17,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'123'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 17,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"user_input"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"### Reading Files"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"A more common situation where we obtain `str` objects is when reading the contents of a file with the [open() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#open) built-in. In its simplest usage form, to open a [text file <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_wiki.png\">](https://en.wikipedia.org/wiki/Text_file) file, we pass in its path (i.e., \"filename\") as a `str` object."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 18,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"file = open(\"lorem_ipsum.txt\")"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"[open() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#open) returns a **[proxy <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_wiki.png\">](https://en.wikipedia.org/wiki/Proxy_pattern)** object of type `TextIOWrapper` that allows us to interact with the file on disk. `mode='r'` shows that we opened the file in read-only mode and `encoding='UTF-8'` is explained in detail in the [The `bytes` Type <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_nb.png\">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/develop/06_text/01_content.ipynb#The-bytes-Type) section at the end of this chapter."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 19,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"_io.TextIOWrapper"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 19,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"type(file)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 20,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"<_io.TextIOWrapper name='lorem_ipsum.txt' mode='r' encoding='UTF-8'>"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 20,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"file"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"`TextIOWrapper` objects come with plenty of type-specific methods and attributes."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 21,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"True"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 21,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"file.readable()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 22,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"False"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 22,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"file.writable()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 23,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'lorem_ipsum.txt'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 23,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"file.name"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 24,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'UTF-8'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 24,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"file.encoding"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"So far, we have not yet read anything from the file (i.e., from disk)! That is intentional as, for example, the file could contain more data than could fit into our computer's memory. Therefore, we have to explicitly instruct the `file` object to read some of or all the data in the file.\n",
|
||
|
"\n",
|
||
|
"One way to do that, is to simply loop over the `file` object with the `for` statement as shown next: In each iteration, `line` is assigned the next line in the file. Because we may loop over `TextIOWrapper` objects, they are *iterables*."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 25,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"name": "stdout",
|
||
|
"output_type": "stream",
|
||
|
"text": [
|
||
|
"Lorem Ipsum is simply dummy text of the printing and typesetting industry.\n",
|
||
|
"\n",
|
||
|
"Lorem Ipsum has been the industry's standard dummy text ever since the 1500s\n",
|
||
|
"\n",
|
||
|
"when an unknown printer took a galley of type and scrambled it to make a type\n",
|
||
|
"\n",
|
||
|
"specimen book. It has survived not only five centuries but also the leap into\n",
|
||
|
"\n",
|
||
|
"electronic typesetting, remaining essentially unchanged. It was popularised in\n",
|
||
|
"\n",
|
||
|
"the 1960s with the release of Letraset sheets.\n",
|
||
|
"\n"
|
||
|
]
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"for line in file:\n",
|
||
|
" print(line)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"Once we looped over the `file` object, it is **exhausted**: We can *not* loop over it a second time. So, the built-in [print() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#print) function is *never* called in the code cell below!"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 26,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"for line in file:\n",
|
||
|
" print(line)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"After the `for`-loop, the `line` variable is still set and references the *last* line in the file. We verify that it is indeed a `str` object."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 27,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'the 1960s with the release of Letraset sheets.\\n'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 27,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"line"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 28,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"str"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 28,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"type(line)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"An *important* observation is that the `file` object is still associated with an *open* **[file descriptor <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_wiki.png\">](https://en.wikipedia.org/wiki/File_descriptor)**. Without going into any technical details, we note that an operating system can only handle a *limited* number of \"open files\" at the same time, and, therefore, we should always *close* the file once we are done processing it.\n",
|
||
|
"\n",
|
||
|
"`TextIOWrapper` objects have a `closed` attribute on them that indicates if the associated file descriptor is still open or has been closed. We can \"manually\" close any `TextIOWrapper` object with the [close() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/io.html#io.IOBase.close) method."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 29,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"False"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 29,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"file.closed"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 30,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"file.close()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 31,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"True"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 31,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"file.closed"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"The more Pythonic way is to use [open() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#open) within the compound `with` statement (cf., [reference <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/reference/compound_stmts.html#the-with-statement)): In the example below, the indented code block is said to be executed within the **context** of the `file` object that now plays the role of a **[context manager <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/reference/datamodel.html#with-statement-context-managers)**. Many different kinds of context managers exist in Python with different applications and purposes. Context managers returned from [open() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#open) mainly ensure that file descriptors get automatically closed after the last line in the code block is executed."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 32,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"name": "stdout",
|
||
|
"output_type": "stream",
|
||
|
"text": [
|
||
|
"Lorem Ipsum is simply dummy text of the printing and typesetting industry.\n",
|
||
|
"\n",
|
||
|
"Lorem Ipsum has been the industry's standard dummy text ever since the 1500s\n",
|
||
|
"\n",
|
||
|
"when an unknown printer took a galley of type and scrambled it to make a type\n",
|
||
|
"\n",
|
||
|
"specimen book. It has survived not only five centuries but also the leap into\n",
|
||
|
"\n",
|
||
|
"electronic typesetting, remaining essentially unchanged. It was popularised in\n",
|
||
|
"\n",
|
||
|
"the 1960s with the release of Letraset sheets.\n",
|
||
|
"\n"
|
||
|
]
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"with open(\"lorem_ipsum.txt\") as file:\n",
|
||
|
" for line in file:\n",
|
||
|
" print(line)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 33,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"True"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 33,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"file.closed"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"Using syntax familiar from [Chapter 3 <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_nb.png\">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/develop/03_conditionals/00_content.ipynb#The-try-Statement) to explain what the `with open(...) as file:` does above, we provide an alternative formulation with a `try` statement below: The `finally`-branch is *always* executed, even if an exception is raised inside the `for`-loop. Therefore, `file` is sure to be closed too. However, this formulation is somewhat less expressive."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 34,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"name": "stdout",
|
||
|
"output_type": "stream",
|
||
|
"text": [
|
||
|
"Lorem Ipsum is simply dummy text of the printing and typesetting industry.\n",
|
||
|
"\n",
|
||
|
"Lorem Ipsum has been the industry's standard dummy text ever since the 1500s\n",
|
||
|
"\n",
|
||
|
"when an unknown printer took a galley of type and scrambled it to make a type\n",
|
||
|
"\n",
|
||
|
"specimen book. It has survived not only five centuries but also the leap into\n",
|
||
|
"\n",
|
||
|
"electronic typesetting, remaining essentially unchanged. It was popularised in\n",
|
||
|
"\n",
|
||
|
"the 1960s with the release of Letraset sheets.\n",
|
||
|
"\n"
|
||
|
]
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"try:\n",
|
||
|
" file = open(\"lorem_ipsum.txt\")\n",
|
||
|
" for line in file:\n",
|
||
|
" print(line)\n",
|
||
|
"finally:\n",
|
||
|
" file.close()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 35,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"True"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 35,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"file.closed"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"As an alternative to reading the contents of a file by looping over a `TextIOWrapper` object, we may also call one of the methods they come with.\n",
|
||
|
"\n",
|
||
|
"For example, the [read() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/io.html#io.TextIOBase.read) method takes a single `size` argument of type `int` and returns a `str` object with the specified number of characters."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 36,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"file = open(\"lorem_ipsum.txt\")"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 37,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'Lorem Ipsum'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 37,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"file.read(11)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"When we call [read() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/io.html#io.TextIOBase.read) again, the returned `str` object begins where the previous one left off. This is because `TextIOWrapper` objects like `file` simply store a position at which the associated file on disk is being read. In other words, `file` is like a **cursor** pointing into a file."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 38,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"' is simply '"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 38,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"file.read(11)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"On the contrary, the [readline() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/io.html#io.TextIOBase.readline) method keeps reading until it hits a **newline character**. These are shown in `str` objects as `\"\\n\"`."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 39,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'dummy text of the printing and typesetting industry.\\n'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 39,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"file.readline()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"When we call [readline() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/io.html#io.TextIOBase.readline) again, we obtain the next line."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 40,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"\"Lorem Ipsum has been the industry's standard dummy text ever since the 1500s\\n\""
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 40,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"file.readline()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"Lastly, the [readlines() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/io.html#io.IOBase.readlines) method returns a `list` object that holds *all* lines in the `file` from the current position to the end of the file. The latter position is often abbreviated as **EOF** in the documentation. Let's always remember that [readlines() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/io.html#io.IOBase.readlines) has the potential to crash a computer with a `MemoryError`."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 41,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"['when an unknown printer took a galley of type and scrambled it to make a type\\n',\n",
|
||
|
" 'specimen book. It has survived not only five centuries but also the leap into\\n',\n",
|
||
|
" 'electronic typesetting, remaining essentially unchanged. It was popularised in\\n',\n",
|
||
|
" 'the 1960s with the release of Letraset sheets.\\n']"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 41,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"file.readlines()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"Calling [readlines() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/io.html#io.IOBase.readlines) a second time, is as pointless as looping over `file` a second time."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 42,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"[]"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 42,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"file.readlines()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 43,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"file.close()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"Because every `str` object created by reading the contents of a file in any of the ways shown in this section ends with a `\"\\n\"`, we see empty lines printed between each `line` in the `for`-loops above. To print the entire text without empty lines in between, we pass a `end=\"\"` argument to the [print() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#print) function."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 44,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"name": "stdout",
|
||
|
"output_type": "stream",
|
||
|
"text": [
|
||
|
"Lorem Ipsum is simply dummy text of the printing and typesetting industry.\n",
|
||
|
"Lorem Ipsum has been the industry's standard dummy text ever since the 1500s\n",
|
||
|
"when an unknown printer took a galley of type and scrambled it to make a type\n",
|
||
|
"specimen book. It has survived not only five centuries but also the leap into\n",
|
||
|
"electronic typesetting, remaining essentially unchanged. It was popularised in\n",
|
||
|
"the 1960s with the release of Letraset sheets.\n"
|
||
|
]
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"with open(\"lorem_ipsum.txt\") as file:\n",
|
||
|
" for line in file:\n",
|
||
|
" print(line, end=\"\")"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"## A String of Characters"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"A **sequence** is yet another *abstract* concept (cf., the \"*Containers vs. Iterables*\" section in [Chapter 4 <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_nb.png\">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/develop/04_iteration/02_content.ipynb#Containers-vs.-Iterables)).\n",
|
||
|
"\n",
|
||
|
"It unifies *four* [orthogonal <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_wiki.png\">](https://en.wikipedia.org/wiki/Orthogonality) (i.e., \"independent\") concepts into one bigger idea: Any data type, such as `str`, is considered a sequence if it\n",
|
||
|
"\n",
|
||
|
"1. **contains**\n",
|
||
|
"2. a **finite** number of other objects that\n",
|
||
|
"3. can be **iterated** over\n",
|
||
|
"4. in a *predictable* **order**.\n",
|
||
|
"\n",
|
||
|
"[Chapter 7 <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_nb.png\">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/develop/07_sequences/00_content.ipynb#Collections-vs.-Sequences) formalizes these concepts in great detail. Here, we keep our focus on the `str` type that historically received its name as it models a **[string of characters <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_wiki.png\">](https://en.wikipedia.org/wiki/String_%28computer_science%29)**. *String* is simply another term for *sequence* in the computer science literature.\n",
|
||
|
"\n",
|
||
|
"Another example of a sequence is the `list` type. Because of that, `str` objects may be treated like `list` objects in many situations.\n",
|
||
|
"\n",
|
||
|
"Below, the built-in [len() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#len) function tells us how many characters make up `text`. [len() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#len) would not work with an \"infinite\" object. As anything modeled in a program must fit into a computer's finite memory, there cannot exist truly infinite objects; however, [Chapter 8 <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_nb.png\">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/develop/08_mfr/00_content.ipynb#Iterators-vs.-Iterables) introduces specialized iterable data types that can be used to model an *infinite* series of \"things\" and that, consequently, have no concept of \"length.\""
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 45,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'Lorem ipsum dolor sit amet.'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 45,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 46,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"27"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 46,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"len(text)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"Being iterable, we may loop over `text` and do something with the individual characters, for example, print them out with extra space in between them. If it were not for the appropriately chosen name of the `text` variable, we could not tell what *concrete* type of object the `for` statement is looping over."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 47,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"name": "stdout",
|
||
|
"output_type": "stream",
|
||
|
"text": [
|
||
|
"L o r e m i p s u m d o l o r s i t a m e t . "
|
||
|
]
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"for character in text:\n",
|
||
|
" print(character, end=\" \")"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"With the [reversed() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#reversed) built-in, we may loop over `text` in reversed order. Reversing `text` only works as it has a forward order to begin with."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 48,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"name": "stdout",
|
||
|
"output_type": "stream",
|
||
|
"text": [
|
||
|
". t e m a t i s r o l o d m u s p i m e r o L "
|
||
|
]
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"for character in reversed(text):\n",
|
||
|
" print(character, end=\" \")"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"Being a container, we may check if a given `str` object is contained in `text` with the `in` operator, which has *two* distinct usages: First, it checks if a *single* character is contained in a `str` object. Second, it may also check if a shorter `str` object, then called a **substring**, is contained in a longer one."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 49,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"True"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 49,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"\"L\" in text"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 50,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"True"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 50,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"\"ipsum\" in text"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 51,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"False"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 51,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"\"veni, vidi, vici\" in text"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"## Indexing"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"As `str` objects are *ordered* and *finite*, we may **index** into them to obtain individual characters with the **indexing operator** `[]`. This is analogous to how we obtained individual elements of a `list` object in [Chapter 1 <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_nb.png\">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/develop/01_elements/03_content.ipynb#Who-am-I?-And-how-many?)."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 52,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'L'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 52,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text[0]"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 53,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'o'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 53,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text[1]"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"The index must be of type `int`; othewise, we get a `TypeError`."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 54,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"ename": "TypeError",
|
||
|
"evalue": "string indices must be integers",
|
||
|
"output_type": "error",
|
||
|
"traceback": [
|
||
|
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
|
||
|
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
|
||
|
"\u001b[0;32m<ipython-input-54-31fb90bbe515>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mtext\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1.0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
|
||
|
"\u001b[0;31mTypeError\u001b[0m: string indices must be integers"
|
||
|
]
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text[1.0]"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"The last index is one less than the above \"length\" of the `str` object as we start counting at `0`."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 55,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'.'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 55,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text[26] # == text[len(text) - 1]"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"An `IndexError` is raised whenever the index is out of range."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 56,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"ename": "IndexError",
|
||
|
"evalue": "string index out of range",
|
||
|
"output_type": "error",
|
||
|
"traceback": [
|
||
|
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
|
||
|
"\u001b[0;31mIndexError\u001b[0m Traceback (most recent call last)",
|
||
|
"\u001b[0;32m<ipython-input-56-ffa3e1bc2f86>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mtext\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m27\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;31m# == text[len(text)]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
|
||
|
"\u001b[0;31mIndexError\u001b[0m: string index out of range"
|
||
|
]
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text[27] # == text[len(text)]"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"We may use *negative* indexes to start counting from the end of the `str` object, as shown in the figure below. Note how this only works because sequences are *finite*."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"| Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12| 13| 14| 15| 16| 17| 18| 19| 20| 21| 22| 23| 24| 25| 26|\n",
|
||
|
"|:---------:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|\n",
|
||
|
"|**Reverse**|-27|-26|-25|-24|-23|-22|-21|-20|-19|-18|-17|-16|-15|-14|-13|-12|-11|-10|-9 |-8 |-7 |-6 |-5 |-4 |-3 |-2 |-1 |\n",
|
||
|
"| **Character** |`L`|`o`|`r`|`e`|`m`|` `|`i`|`p`|`s`|`u`|`m`|` `|`d`|`o`|`l`|`o`|`r`|` `|`s`|`i`|`t`|` `|`a`|`m`|`e`|`t`|`.`|"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 57,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'.'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 57,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text[-1]"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 58,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'L'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 58,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text[-27] # == text[-len(text)]"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"One reason why programmers like to start counting at `0` is that a positive index and its *corresponding* negative index always add up to the length of the sequence. Here, `6` and `21` add to `27`."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 59,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'i'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 59,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text[6]"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 60,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'i'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 60,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text[-21]"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"## Slicing"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"A **slice** is a substring of a `str` object.\n",
|
||
|
"\n",
|
||
|
"The **slicing operator** is a generalization of the indexing operator: We put one, two, or three integers within the brackets `[]`, separated by colons `:`. The three integers are then referred to as the *start*, *stop*, and *step* values.\n",
|
||
|
"\n",
|
||
|
"Let's start with two integers, *start* and *stop*. Whereas the character at the *start* position is included in the returned `str` object, the one at the *stop* position is not. If both *start* and *stop* are positive, the difference \"*stop* minus *start*\" tells us how many characters the resulting slice has. So, below, `5 - 0 == 5` implies that `\"Lorem\"` consists of `5` characters. So, colloquially speaking, `text[0:5]` means \"taking the first `5 - 0 == 5` characters of `text`.\""
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 61,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'Lorem'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 61,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text[0:5]"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 62,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'dolor sit amet.'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 62,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text[12:len(text)]"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"If left out, *start* defaults to `0` and *stop* to the length of the `str` object (i.e., the end)."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 63,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'Lorem'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 63,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text[:5]"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 64,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'dolor sit amet.'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 64,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text[12:]"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"Not including the character at the *stop* position makes working with individual slices easier as they add up to the original `str` object again (cf., the \"*String Operations*\" section below regarding the overloaded `+` operator)."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 65,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'Lorem ipsum dolor sit amet.'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 65,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text[:5] + text[5:]"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"Slicing and indexing makes it easy to obtain shorter versions of the original `str` object. A common application would be to **parse** out meaningful substrings from raw text data."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 66,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'Lorem ipsum sit amet.'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 66,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text[:11] + text[-10:]"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"By combining a positive *start* with a negative *stop* index, we specify both ends of the slice *relative* to the ends of the entire `str` object. So, colloquially speaking, `[6:-10]` below means \"drop the first six and last ten characters.\" The length of the resulting slice can then *not* be calculated from the indexes and depends only on the length of the original `str` object!"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 67,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'ipsum dolor'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 67,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text[6:-10]"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"For convenience, the indexes do not need to lie within the range from `0` to `len(text)` when slicing. So, no `IndexError` is raised here."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 68,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'Lorem ipsum dolor sit amet.'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 68,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text[-999:999]"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"By leaving out both *start* and *stop*, we take a \"full\" slice that is essentially a *copy* of the original `str` object."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 69,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'Lorem ipsum dolor sit amet.'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 69,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text[:]"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"A *step* value of `i` can be used to obtain only every `i`th character."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 70,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'Lrmismdlrstae.'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 70,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text[::2]"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"A negative *step* size of `-1` reverses the order of the characters."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 71,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'.tema tis rolod muspi meroL'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 71,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text[::-1]"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"## Immutability"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"Whereas elements of a `list` object *may* be *re-assigned*, as shortly hinted at in [Chapter 1 <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_nb.png\">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/develop/01_elements/03_content.ipynb#Who-am-I?-And-how-many?), this is *not* allowed for the individual characters of `str` objects. Once created, they can *not* be changed. Formally, we say that `str` objects are **immutable**. In that regard, they are like the numeric types in [Chapter 5 <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_nb.png\">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/develop/05_numbers/00_content.ipynb).\n",
|
||
|
"\n",
|
||
|
"On the contrary, objects that may be changed after creation, are called **mutable**. We already saw in [Chapter 1 <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_nb.png\">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/develop/01_elements/03_content.ipynb#Who-am-I?-And-how-many?) how mutable objects are more difficult to reason about for a beginner, in particular, if more than one variable references it. Yet, mutability does have its place in a programmer's toolbox, and we revisit this idea in the next chapters.\n",
|
||
|
"\n",
|
||
|
"The `TypeError` indicates that `str` objects are *immutable*: Assignment to an index or a slice are *not* supported."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 72,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"ename": "TypeError",
|
||
|
"evalue": "'str' object does not support item assignment",
|
||
|
"output_type": "error",
|
||
|
"traceback": [
|
||
|
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
|
||
|
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
|
||
|
"\u001b[0;32m<ipython-input-72-d8a1e6b0b786>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mtext\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m\"X\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
|
||
|
"\u001b[0;31mTypeError\u001b[0m: 'str' object does not support item assignment"
|
||
|
]
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text[0] = \"X\""
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 73,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"ename": "TypeError",
|
||
|
"evalue": "'str' object does not support item assignment",
|
||
|
"output_type": "error",
|
||
|
"traceback": [
|
||
|
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
|
||
|
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
|
||
|
"\u001b[0;32m<ipython-input-73-def135e84998>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mtext\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;36m5\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m\"random\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
|
||
|
"\u001b[0;31mTypeError\u001b[0m: 'str' object does not support item assignment"
|
||
|
]
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text[:5] = \"random\""
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"## String Methods"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"Objects of type `str` come with many **methods** bound on them (cf., the [documentation <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#string-methods) for a full list). As seen before, they work like *normal* functions and are accessed via the **dot operator** `.`. Calling a method is also referred to as **method invocation**.\n",
|
||
|
"\n",
|
||
|
"The [find() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#str.find) method returns the index of the first occurrence of a character or a substring. If no match is found, it returns `-1`. A mirrored version searching from the right called [rfind() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#str.rfind) exists as well. The [index() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#str.index) and [rindex() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#str.rindex) methods work in the same way but raise a `ValueError` if no match is found. So, we can control if a search fails *silently* or *loudly*."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 74,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'Lorem ipsum dolor sit amet.'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 74,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 75,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"22"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 75,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text.find(\"a\")"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 76,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"-1"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 76,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text.find(\"b\")"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 77,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"12"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 77,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text.find(\"dolor\")"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"[find() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#str.find) takes optional *start* and *end* arguments that allow us to find occurrences other than the first one."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 78,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"1"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 78,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text.find(\"o\")"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 79,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"13"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 79,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text.find(\"o\", 2)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 80,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"-1"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 80,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text.find(\"o\", 2, 12)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"The [count() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#str.count) method does what we expect."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 81,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'Lorem ipsum dolor sit amet.'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 81,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 82,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"1"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 82,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text.count(\"l\")"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"As [count() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#str.count) is *case-sensitive*, we must **chain** it with the [lower() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#str.lower) method to get the count of all `\"L\"`s and `\"l\"`s."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 83,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"2"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 83,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text.lower().count(\"l\")"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"Alternatively, we can use the [upper() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#str.upper) method and search for `\"L\"`s."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 84,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"2"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 84,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text.upper().count(\"L\")"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"Because `str` objects are *immutable*, [upper() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#str.upper) and [lower() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#str.lower) return *new* `str` objects, even if they do *not* change the value of the original `str` object."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 85,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"example = \"random\""
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 86,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"140667840152112"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 86,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"id(example)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 87,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"lower = example.lower()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 88,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"140667764453680"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 88,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"id(lower)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"`example` and `lower` are *different* objects with the *same* value."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 89,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"False"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 89,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"example is lower"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 90,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"True"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 90,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"example == lower"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"Besides [upper() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#str.upper) and [lower() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#str.lower) there exist also [title() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#str.title) and [swapcase() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#str.swapcase) methods."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 91,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'lorem ipsum dolor sit amet.'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 91,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text.lower()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 92,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'LOREM IPSUM DOLOR SIT AMET.'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 92,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text.upper()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 93,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'Lorem Ipsum Dolor Sit Amet.'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 93,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text.title()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 94,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'lOREM IPSUM DOLOR SIT AMET.'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 94,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text.swapcase()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"Another popular string method is [split() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#str.split): It separates a longer `str` object into smaller ones collected in a `list` object. By default, groups of contiguous whitespace characters are used as the *separator*.\n",
|
||
|
"\n",
|
||
|
"As an example, we use [split() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#str.split) to print out the individual words in `text` with more whitespace in between them."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 95,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"['Lorem', 'ipsum', 'dolor', 'sit', 'amet.']"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 95,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"text.split()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 96,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"name": "stdout",
|
||
|
"output_type": "stream",
|
||
|
"text": [
|
||
|
"Lorem ipsum dolor sit amet. "
|
||
|
]
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"for word in text.split():\n",
|
||
|
" print(word, end=\" \")"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"The opposite of splitting is done with the [join() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#str.join) method. It is typically invoked on a `str` object that represents a separator (e.g., `\" \"` or `\", \"`) and connects the elements provided by an *iterable* argument (e.g., `words` below) into one *new* `str` object."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 97,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"words = [\"This\", \"will\", \"become\", \"a\", \"sentence.\"]"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 98,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"sentence = \" \".join(words)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 99,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'This will become a sentence.'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 99,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"sentence"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"As the `str` object `\"abcde\"` below is an *iterable* itself, its characters (!) are joined together with a space `\" \"` in between."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 100,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'a b c d e'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 100,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"\" \".join(\"abcde\")"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"The [replace() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#str.replace) method creates a *new* `str` object with parts of the original `str` object potentially replaced."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 101,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'This is a sentence.'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 101,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"sentence.replace(\"will become\", \"is\")"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"Note how `sentence` itself remains unchanged. Bound to an immutable object, [replace() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#str.replace) must create *new* objects."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 102,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'This will become a sentence.'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 102,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"sentence"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"As seen previously, the [strip() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#str.strip) method is often helpful in cleaning text data from unreliable sources like user input from unnecessary leading and trailing whitespace. The [lstrip() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#str.lstrip) and [rstrip() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#str.rstrip) methods are specialized versions of it."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 103,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'text with whitespace'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 103,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"\" text with whitespace \".strip()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 104,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'text with whitespace '"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 104,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"\" text with whitespace \".lstrip()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 105,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"' text with whitespace'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 105,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"\" text with whitespace \".rstrip()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"When justifying a `str` object for output, the [ljust() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#str.ljust) and [rjust() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#str.rjust) methods may be helpful."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 106,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'This will become a sentence. '"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 106,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"sentence.ljust(40)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 107,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"' This will become a sentence.'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 107,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"sentence.rjust(40)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"Similarly, the [zfill() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#str.zfill) method can be used to pad a `str` representation of a number with leading `0`s for justified output."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 108,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'0000042.87'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 108,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"\"42.87\".zfill(10)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 109,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'-000042.87'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 109,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"\"-42.87\".zfill(10)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"## String Operations"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"As mentioned in [Chapter 1 <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_nb.png\">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/develop/01_elements/00_content.ipynb#Operator-Overloading), the `+` and `*` operators are *overloaded* and used for **string concatenation**. They always create *new* `str` objects. That has nothing to do with the `str` type's immutability, but is the default behavior of operators."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 110,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'Hello Lore'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 110,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"\"Hello \" + text[:4]"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 111,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'Lorem ipsum Lorem ipsum Lorem ipsum Lorem ipsum Lorem ipsum ...'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 111,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"5 * text[:12] + \"...\""
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"### String Comparison"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"The *relational* operators also work with `str` objects, another example of operator overloading. Comparison is done one character at a time in a pairwise fashion until the first pair differs or one operand ends. However, `str` objects are sorted in a \"weird\" way. For example, all upper case characters come before all lower case characters. The reason for that is given in the \"*Characters are Numbers with a Convention*\" sub-section in the [second part <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_nb.png\">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/develop/06_text/02_content.ipynb#Characters-are-Numbers-with-a-Convention) of this chapter."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 112,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"True"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 112,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"\"Apple\" < \"Banana\""
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 113,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"False"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 113,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"\"apple\" < \"Banana\""
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 114,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"True"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 114,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"\"apple\" < \"Banana\".lower()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"Below is an example with typical German last names that shows how characters other than the first decide the ordering."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 115,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"True"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 115,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"\"Mai\" < \"Maier\" < \"Mayer\" < \"Meier\" < \"Meyer\""
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"## String Interpolation"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"Often, we want to use `str` objects as drafts in the source code that are filled in with concrete text only at runtime. This approach is called **string interpolation**. There are three ways to do that in Python."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"### f-strings"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"**[Formatted string literals <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/reference/lexical_analysis.html#formatted-string-literals)**, of **f-strings** for short, are the least recently added (cf., [PEP 498 <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://www.python.org/dev/peps/pep-0498/) in 2016) and most readable way: We simply prepend a `str` in its literal notation with an `f`, and put variables, or more generally, expressions, within curly braces `{}`. These are then filled in when the string literal is evaluated."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 116,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"name = \"Alexander\"\n",
|
||
|
"time_of_day = \"morning\""
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 117,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'Hello Alexander! Good morning.'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 117,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"f\"Hello {name}! Good {time_of_day}.\""
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"Separated by a colon `:`, various formatting options are available. In the beginning, the ability to round numbers for output may be particularly useful: This can be achieved by adding `:.2f` to the variable name inside the curly braces, which casts the number as a `float` and rounds it to two digits. The `:.2f` is a so-called format specifier, and there exists a whole **[format specification mini-language <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/string.html#formatspec)** to govern how specifiers work."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 118,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"pi = 3.141592653"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 119,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'Pi is 3.14'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 119,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"f\"Pi is {pi:.2f}\""
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"### [format() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#str.format) Method"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"`str` objects also provide a [format() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#str.format) method that accepts an arbitrary number of *positional* arguments that are inserted into the `str` object in the same order replacing empty curly brackets `{}`. String interpolation with the [format() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#str.format) method is a more traditional and probably the most common way as of today. While f-strings are the recommended way going forward, usage of the [format() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#str.format) method is likely not declining any time soon."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 120,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'Hello Alexander! Good morning.'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 120,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"\"Hello {}! Good {}.\".format(name, time_of_day)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"We may use index numbers inside the curly braces if the order is different in the `str` object."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 121,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'Good morning, Alexander'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 121,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"\"Good {1}, {0}\".format(name, time_of_day)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"The [format() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#str.format) method may alternatively be used with *keyword* arguments as well. Then, we must put the keywords' names within the curly brackets."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 122,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'Hello Alexander! Good morning.'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 122,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"\"Hello {name}! Good {time}.\".format(name=name, time=time_of_day)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"Format specifiers work as in the f-string case."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 123,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'Pi is 3.14'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 123,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"\"Pi is {:.2f}\".format(pi)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"### `%` Operator"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"The `%` operator that we saw in the context of modulo division in [Chapter 1 <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_nb.png\">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/develop/01_elements/00_content.ipynb#%28Arithmetic%29-Operators) is overloaded with string interpolation when its first operand is a `str` object. The second operand consists of all expressions to be filled in. Format specifiers work with a `%` instead of curly braces and according to a different set of rules referred to as **[printf-style string formatting <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting)**. So, `{:.2f}` becomes `%.2f`.\n",
|
||
|
"\n",
|
||
|
"This way of string interpolation is the oldest and originates from the [C language <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_wiki.png\">](https://en.wikipedia.org/wiki/C_%28programming_language%29). It is still widely spread, but we should use one of the other two ways instead. We show it here mainly for completeness sake."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 124,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'Pi is 3.14'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 124,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"\"Pi is %.2f\" % pi"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"To insert more than one expression, we must list them in order and between parenthesis `(` and `)`. As [Chapter 7 <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_nb.png\">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/develop/07_sequences/00_content.ipynb#The-tuple-Type) reveals, this literal syntax creates an object of type `tuple`. Also, to format an expression as text, we use the format specifier `%s`."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 125,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"'Hello Alexander! Good morning.'"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 125,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"\"Hello %s! Good %s.\" % (name, time_of_day)"
|
||
|
]
|
||
|
}
|
||
|
],
|
||
|
"metadata": {
|
||
|
"kernelspec": {
|
||
|
"display_name": "Python 3",
|
||
|
"language": "python",
|
||
|
"name": "python3"
|
||
|
},
|
||
|
"language_info": {
|
||
|
"codemirror_mode": {
|
||
|
"name": "ipython",
|
||
|
"version": 3
|
||
|
},
|
||
|
"file_extension": ".py",
|
||
|
"mimetype": "text/x-python",
|
||
|
"name": "python",
|
||
|
"nbconvert_exporter": "python",
|
||
|
"pygments_lexer": "ipython3",
|
||
|
"version": "3.8.6"
|
||
|
},
|
||
|
"livereveal": {
|
||
|
"auto_select": "code",
|
||
|
"auto_select_fragment": true,
|
||
|
"scroll": true,
|
||
|
"theme": "serif"
|
||
|
},
|
||
|
"toc": {
|
||
|
"base_numbering": 1,
|
||
|
"nav_menu": {},
|
||
|
"number_sections": false,
|
||
|
"sideBar": true,
|
||
|
"skip_h1_title": true,
|
||
|
"title_cell": "Table of Contents",
|
||
|
"title_sidebar": "Contents",
|
||
|
"toc_cell": false,
|
||
|
"toc_position": {
|
||
|
"height": "calc(100% - 180px)",
|
||
|
"left": "10px",
|
||
|
"top": "150px",
|
||
|
"width": "384px"
|
||
|
},
|
||
|
"toc_section_display": false,
|
||
|
"toc_window_display": false
|
||
|
}
|
||
|
},
|
||
|
"nbformat": 4,
|
||
|
"nbformat_minor": 4
|
||
|
}
|