{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "**Note**: Click on \"*Kernel*\" > \"*Restart Kernel and Clear All Outputs*\" in [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) *before* reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it [in the cloud ](https://mybinder.org/v2/gh/webartifex/intro-to-python/main?urlpath=lab/tree/09_mappings/05_appendix.ipynb)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Chapter 9: Mappings & Sets (Appendix)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "The [collections ](https://docs.python.org/3/library/collections.html) module in the [standard library ](https://docs.python.org/3/library/index.html) provides specialized mapping types for common use cases." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "## The `defaultdict` Type" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "The [defaultdict ](https://docs.python.org/3/library/collections.html#collections.defaultdict) type allows us to define a factory function that creates default values whenever we look up a key that does not yet exist. Ordinary `dict` objects would throw a `KeyError` exception in such situations.\n", "\n", "Let's say we have a `list` with *records* of goals scored during a soccer game. The records consist of the fields \"Country,\" \"Player,\" and the \"Time\" when a goal was scored. Our task is to group the goals by player and/or country." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "goals = [\n", " (\"Germany\", \"Müller\", 11), (\"Germany\", \"Klose\", 23),\n", " (\"Germany\", \"Kroos\", 24), (\"Germany\", \"Kroos\", 26),\n", " (\"Germany\", \"Khedira\", 29), (\"Germany\", \"Schürrle\", 69),\n", " (\"Germany\", \"Schürrle\", 79), (\"Brazil\", \"Oscar\", 90),\n", "]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "Using a normal `dict` object, we have to tediously check if a player has already scored a goal before. If not, we must create a *new* `list` object with the first time the player scored. Otherwise, we append the goal to an already existing `list` object." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "{'Müller': [11],\n", " 'Klose': [23],\n", " 'Kroos': [24, 26],\n", " 'Khedira': [29],\n", " 'Schürrle': [69, 79],\n", " 'Oscar': [90]}" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "goals_by_player = {}\n", "\n", "for _, player, minute in goals:\n", " if player not in goals_by_player:\n", " goals_by_player[player] = [minute]\n", " else:\n", " goals_by_player[player].append(minute)\n", "\n", "goals_by_player" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "Instead, with a `defaultdict` object, we can portray the code fragment's intent in a concise form. We pass a reference to the [list() ](https://docs.python.org/3/library/functions.html#func-list) built-in to `defaultdict`." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from collections import defaultdict" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "defaultdict(list,\n", " {'Müller': [11],\n", " 'Klose': [23],\n", " 'Kroos': [24, 26],\n", " 'Khedira': [29],\n", " 'Schürrle': [69, 79],\n", " 'Oscar': [90]})" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "goals_by_player = defaultdict(list)\n", "\n", "for _, player, minute in goals:\n", " goals_by_player[player].append(minute)\n", "\n", "goals_by_player" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "collections.defaultdict" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(goals_by_player)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "A reference to the factory function is stored in the `default_factory` attribute." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "list" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "goals_by_player.default_factory" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "If we want this code to produce a normal `dict` object, we pass `goals_by_player` to the [dict() ](https://docs.python.org/3/library/functions.html#func-dict) constructor." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "{'Müller': [11],\n", " 'Klose': [23],\n", " 'Kroos': [24, 26],\n", " 'Khedira': [29],\n", " 'Schürrle': [69, 79],\n", " 'Oscar': [90]}" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dict(goals_by_player)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "Being creative, we use a factory function, created with a `lambda` expression, that returns another [defaultdict ](https://docs.python.org/3/library/collections.html#collections.defaultdict) with [list() ](https://docs.python.org/3/library/functions.html#func-list) as its factory to group on the country and the player level simultaneously." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "defaultdict(()>,\n", " {'Germany': defaultdict(list,\n", " {'Müller': [11],\n", " 'Klose': [23],\n", " 'Kroos': [24, 26],\n", " 'Khedira': [29],\n", " 'Schürrle': [69, 79]}),\n", " 'Brazil': defaultdict(list, {'Oscar': [90]})})" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "goals_by_country_and_player = defaultdict(lambda: defaultdict(list))\n", "\n", "for country, player, minute in goals:\n", " goals_by_country_and_player[country][player].append(minute)\n", "\n", "goals_by_country_and_player" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "Conversion into a normal and nested `dict` object is now a bit tricky but can be achieved in one line with a comprehension." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "{'Germany': {'Müller': [11],\n", " 'Klose': [23],\n", " 'Kroos': [24, 26],\n", " 'Khedira': [29],\n", " 'Schürrle': [69, 79]},\n", " 'Brazil': {'Oscar': [90]}}" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "{country: dict(by_player) for country, by_player in goals_by_country_and_player.items()}" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "## The `Counter` Type" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "A common task is to count the number of occurrences of elements in an iterable.\n", "\n", "The [Counter ](https://docs.python.org/3/library/collections.html#collections.Counter) type provides an easy-to-use interface that can be called with any iterable and returns a `dict`-like object of type `Counter` that maps each unique elements to the number of times it occurs.\n", "\n", "To continue the previous example, let's create an overview that shows how many goals a player scorred. We use a generator expression as the argument to `Counter`." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "[('Germany', 'Müller', 11),\n", " ('Germany', 'Klose', 23),\n", " ('Germany', 'Kroos', 24),\n", " ('Germany', 'Kroos', 26),\n", " ('Germany', 'Khedira', 29),\n", " ('Germany', 'Schürrle', 69),\n", " ('Germany', 'Schürrle', 79),\n", " ('Brazil', 'Oscar', 90)]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "goals" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from collections import Counter" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "scorers = Counter(x[1] for x in goals)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "Counter({'Kroos': 2,\n", " 'Schürrle': 2,\n", " 'Müller': 1,\n", " 'Klose': 1,\n", " 'Khedira': 1,\n", " 'Oscar': 1})" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scorers" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "collections.Counter" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(scorers)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "Now we can look up individual players. `scores` behaves like a normal dictionary with regard to key look-ups." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scorers[\"Müller\"]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "By default, it returns `0` if a key is not found. So, we do not have to handle a `KeyError`." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scorers[\"Lahm\"]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "`Counter` objects have a [.most_common() ](https://docs.python.org/3/library/collections.html#collections.Counter.most_common) method that returns a `list` object containing $2$-element `tuple` objects, where the first element is the element from the original iterable and the second the number of occurrences. The `list` object is sorted in descending order of occurrences." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "[('Kroos', 2), ('Schürrle', 2)]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scorers.most_common(2)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "We can increase the count of individual entries with the [.update() ](https://docs.python.org/3/library/collections.html#collections.Counter.update) method: That takes an *iterable* of the elements we want to count.\n", "\n", "Imagine if [Philipp Lahm ](https://en.wikipedia.org/wiki/Philipp_Lahm) had also scored against Brazil." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "scorers.update([\"Lahm\"])" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "Counter({'Kroos': 2,\n", " 'Schürrle': 2,\n", " 'Müller': 1,\n", " 'Klose': 1,\n", " 'Khedira': 1,\n", " 'Oscar': 1,\n", " 'Lahm': 1})" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scorers" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "If we use a `str` object as the argument instead, each individual character is treated as an element to be updated. That is most likely not what we want." ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "scorers.update(\"Lahm\")" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "Counter({'Kroos': 2,\n", " 'Schürrle': 2,\n", " 'Müller': 1,\n", " 'Klose': 1,\n", " 'Khedira': 1,\n", " 'Oscar': 1,\n", " 'Lahm': 1,\n", " 'L': 1,\n", " 'a': 1,\n", " 'h': 1,\n", " 'm': 1})" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scorers" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "## The `ChainMap` Type" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "Consider `to_words`, `more_words`, and `even_more_words` below. Instead of merging the items of the three `dict` objects together into a *new* one, we want to create an object that behaves as if it contained all the unified items in it without materializing them in memory a second time." ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "to_words = {\n", " 0: \"zero\",\n", " 1: \"one\",\n", " 2: \"two\",\n", "}" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "more_words = {\n", " 2: \"TWO\", # to illustrate a point\n", " 3: \"three\",\n", " 4: \"four\",\n", "}" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "even_more_words = {\n", " 4: \"FOUR\", # to illustrate a point\n", " 5: \"five\",\n", " 6: \"six\",\n", "}" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "The [ChainMap ](https://docs.python.org/3/library/collections.html#collections.ChainMap) type allows us to do precisely that." ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from collections import ChainMap" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "We simply pass all mappings as positional arguments to `ChainMap` and obtain a **proxy** object that occupies almost no memory but gives us access to the union of all the items." ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "chain = ChainMap(to_words, more_words, even_more_words)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "Let's loop over the items in `chain` and see what is \"in\" it. The order is obviously *unpredictable* but all seven items we expected are there. Keys of later mappings do *not* overwrite earlier keys." ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4 four\n", "5 five\n", "6 six\n", "2 two\n", "3 three\n", "0 zero\n", "1 one\n" ] } ], "source": [ "for number, word in chain.items():\n", " print(number, word)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "When looking up a non-existent key, `ChainMap` objects raise a `KeyError` just like normal `dict` objects would." ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "ename": "KeyError", "evalue": "10", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", "Cell \u001b[0;32mIn[28], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mchain\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m10\u001b[39;49m\u001b[43m]\u001b[49m\n", "File \u001b[0;32m/usr/lib64/python3.12/collections/__init__.py:1014\u001b[0m, in \u001b[0;36mChainMap.__getitem__\u001b[0;34m(self, key)\u001b[0m\n\u001b[1;32m 1012\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mKeyError\u001b[39;00m:\n\u001b[1;32m 1013\u001b[0m \u001b[38;5;28;01mpass\u001b[39;00m\n\u001b[0;32m-> 1014\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[38;5;21;43m__missing__\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mkey\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m/usr/lib64/python3.12/collections/__init__.py:1006\u001b[0m, in \u001b[0;36mChainMap.__missing__\u001b[0;34m(self, key)\u001b[0m\n\u001b[1;32m 1005\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21m__missing__\u001b[39m(\u001b[38;5;28mself\u001b[39m, key):\n\u001b[0;32m-> 1006\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mKeyError\u001b[39;00m(key)\n", "\u001b[0;31mKeyError\u001b[0m: 10" ] } ], "source": [ "chain[10]" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.2" }, "livereveal": { "auto_select": "code", "auto_select_fragment": true, "scroll": true, "theme": "serif" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": true, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", "width": "384px" }, "toc_section_display": false, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }