"**Note**: Click on \"*Kernel*\" > \"*Restart Kernel and Clear All Outputs*\" in [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) *before* reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it [in the cloud <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_mb.png\">](https://mybinder.org/v2/gh/webartifex/intro-to-python/main?urlpath=lab/tree/09_mappings/05_appendix.ipynb)."
"The [collections <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/collections.html) module in the [standard library <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/index.html) provides specialized mapping types for common use cases."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## The `defaultdict` Type"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"The [defaultdict <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/collections.html#collections.defaultdict) type allows us to define a factory function that creates default values whenever we look up a key that does not yet exist. Ordinary `dict` objects would throw a `KeyError` exception in such situations.\n",
"\n",
"Let's say we have a `list` with *records* of goals scored during a soccer game. The records consist of the fields \"Country,\" \"Player,\" and the \"Time\" when a goal was scored. Our task is to group the goals by player and/or country."
"Using a normal `dict` object, we have to tediously check if a player has already scored a goal before. If not, we must create a *new* `list` object with the first time the player scored. Otherwise, we append the goal to an already existing `list` object."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"{'Müller': [11],\n",
" 'Klose': [23],\n",
" 'Kroos': [24, 26],\n",
" 'Khedira': [29],\n",
" 'Schürrle': [69, 79],\n",
" 'Oscar': [90]}"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"goals_by_player = {}\n",
"\n",
"for _, player, minute in goals:\n",
" if player not in goals_by_player:\n",
" goals_by_player[player] = [minute]\n",
" else:\n",
" goals_by_player[player].append(minute)\n",
"\n",
"goals_by_player"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Instead, with a `defaultdict` object, we can portray the code fragment's intent in a concise form. We pass a reference to the [list() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#func-list) built-in to `defaultdict`."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"from collections import defaultdict"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"defaultdict(list,\n",
" {'Müller': [11],\n",
" 'Klose': [23],\n",
" 'Kroos': [24, 26],\n",
" 'Khedira': [29],\n",
" 'Schürrle': [69, 79],\n",
" 'Oscar': [90]})"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"goals_by_player = defaultdict(list)\n",
"\n",
"for _, player, minute in goals:\n",
" goals_by_player[player].append(minute)\n",
"\n",
"goals_by_player"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"collections.defaultdict"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(goals_by_player)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"A reference to the factory function is stored in the `default_factory` attribute."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"list"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"goals_by_player.default_factory"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"If we want this code to produce a normal `dict` object, we pass `goals_by_player` to the [dict() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#func-dict) constructor."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"{'Müller': [11],\n",
" 'Klose': [23],\n",
" 'Kroos': [24, 26],\n",
" 'Khedira': [29],\n",
" 'Schürrle': [69, 79],\n",
" 'Oscar': [90]}"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dict(goals_by_player)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Being creative, we use a factory function, created with a `lambda` expression, that returns another [defaultdict <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/collections.html#collections.defaultdict) with [list() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/functions.html#func-list) as its factory to group on the country and the player level simultaneously."
"Conversion into a normal and nested `dict` object is now a bit tricky but can be achieved in one line with a comprehension."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"{'Germany': {'Müller': [11],\n",
" 'Klose': [23],\n",
" 'Kroos': [24, 26],\n",
" 'Khedira': [29],\n",
" 'Schürrle': [69, 79]},\n",
" 'Brazil': {'Oscar': [90]}}"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"{country: dict(by_player) for country, by_player in goals_by_country_and_player.items()}"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## The `Counter` Type"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"A common task is to count the number of occurrences of elements in an iterable.\n",
"\n",
"The [Counter <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/collections.html#collections.Counter) type provides an easy-to-use interface that can be called with any iterable and returns a `dict`-like object of type `Counter` that maps each unique elements to the number of times it occurs.\n",
"\n",
"To continue the previous example, let's create an overview that shows how many goals a player scorred. We use a generator expression as the argument to `Counter`."
"Now we can look up individual players. `scores` behaves like a normal dictionary with regard to key look-ups."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"scorers[\"Müller\"]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"By default, it returns `0` if a key is not found. So, we do not have to handle a `KeyError`."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"scorers[\"Lahm\"]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"`Counter` objects have a [.most_common() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/collections.html#collections.Counter.most_common) method that returns a `list` object containing $2$-element `tuple` objects, where the first element is the element from the original iterable and the second the number of occurrences. The `list` object is sorted in descending order of occurrences."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"[('Kroos', 2), ('Schürrle', 2)]"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"scorers.most_common(2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"We can increase the count of individual entries with the [.update() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/collections.html#collections.Counter.update) method: That takes an *iterable* of the elements we want to count.\n",
"\n",
"Imagine if [Philipp Lahm <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_wiki.png\">](https://en.wikipedia.org/wiki/Philipp_Lahm) had also scored against Brazil."
"If we use a `str` object as the argument instead, each individual character is treated as an element to be updated. That is most likely not what we want."
"Consider `to_words`, `more_words`, and `even_more_words` below. Instead of merging the items of the three `dict` objects together into a *new* one, we want to create an object that behaves as if it contained all the unified items in it without materializing them in memory a second time."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"to_words = {\n",
" 0: \"zero\",\n",
" 1: \"one\",\n",
" 2: \"two\",\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"more_words = {\n",
" 2: \"TWO\", # to illustrate a point\n",
" 3: \"three\",\n",
" 4: \"four\",\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"even_more_words = {\n",
" 4: \"FOUR\", # to illustrate a point\n",
" 5: \"five\",\n",
" 6: \"six\",\n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"The [ChainMap <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/collections.html#collections.ChainMap) type allows us to do precisely that."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"from collections import ChainMap"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"We simply pass all mappings as positional arguments to `ChainMap` and obtain a **proxy** object that occupies almost no memory but gives us access to the union of all the items."
"Let's loop over the items in `chain` and see what is \"in\" it. The order is obviously *unpredictable* but all seven items we expected are there. Keys of later mappings do *not* overwrite earlier keys."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"4 four\n",
"5 five\n",
"6 six\n",
"2 two\n",
"3 three\n",
"0 zero\n",
"1 one\n"
]
}
],
"source": [
"for number, word in chain.items():\n",
" print(number, word)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"When looking up a non-existent key, `ChainMap` objects raise a `KeyError` just like normal `dict` objects would."