**Note**: Click on "*Kernel*" > "*Restart Kernel and Clear All Outputs*" in [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) *before* reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it [in the cloud <img height="12" style="display: inline-block" src="../static/link/to_mb.png">](https://mybinder.org/v2/gh/webartifex/intro-to-python/develop?urlpath=lab/tree/08_mfr/00_content.ipynb).

# Chapter 8: Map, Filter, & Reduce

In this chapter, we continue the study of sequential data by looking at memory-efficient ways to process the elements in a sequence. That is an important topic for the data science practitioner who must be able to work with data that does *not* fit into a single computer's memory.

As shown in [Chapter 4 <img height="12" style="display: inline-block" src="../static/link/to_nb.png">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/develop/04_iteration/02_content.ipynb#Containers-vs.-Iterables), both the `list` objects `[0, 1, 2, 3, 4]` and `[1, 3, 5, 7, 9]` on the one hand and the `range` objects `range(5)` and `range(1, 10, 2)` on the other hand allow us to loop over the same numbers. However, the latter two only create *one* `int` object in every iteration while the former two create *all* `int` objects before the loop even starts. In this respect, we consider `range` objects to be "rules" in memory that know how to calculate the numbers *without* calculating them all in advance.

In [Chapter 7 <img height="12" style="display: inline-block" src="../static/link/to_nb.png">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/develop/07_sequences/01_content.ipynb#The-list-Type), we saw how the built-in [list() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/functions.html#func-list) constructor **materializes** the `range(1, 13)` object into the `list` object `[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]`. In other words, we make `range(1, 13)` calculate *all* numbers at once and store them in a `list` object for further processing.

In many cases, however, it is not necessary to do that, and, in this chapter, we look at other types of "rules" in memory and how we can compose different "rules" together to implement bigger computations.

Next, we take a step back and continue with a simple example involving the familiar `numbers` list. Then, we iteratively replace `list` objects with "rule"-like objects *without* changing the overall computation at all. As computations involving sequential data are commonly classified into the three categories **map**, **filter**, and **reduce**, we do the same for our `numbers` example.

In [1]:
numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]

## Mapping

**Mapping** refers to the idea of applying a transformation to every element in a sequence.

For example, let's square each element in `numbers` and add `1` to the squares. In essence, we apply the transformation $y := x^2 + 1$ as expressed with the `transform()` function below.

In [2]:
def transform(element):
    """Map elements to their squares plus 1."""
    return (element ** 2) + 1

With the syntax we know so far, we revert to a `for`-loop that iteratively appends the transformed elements to an initially empty `transformed_numbers` list.

In [3]:
transformed_numbers = []

for old in numbers:
    new = transform(old)
    transformed_numbers.append(new)

In [4]:
transformed_numbers

[50, 122, 65, 26, 10, 145, 5, 37, 82, 101, 2, 17]

As this kind of data processing is so common, Python provides the [map() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/functions.html#map) built-in. In its simplest usage form, it takes two arguments: A transformation `function` that takes exactly *one* positional argument and an `iterable` that provides the objects to be mapped.
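Beyond this simplest form, `map()` also accepts several iterables at once: the `function` then has to take as many positional arguments as there are iterables, and the `map` object stops as soon as the shortest iterable is exhausted. A minimal sketch with the built-in `pow()` function (the names `bases` and `exponents` are made up for illustration):

```python
# map() with two iterables: pow() receives one element from each in every step
bases = [2, 3, 4]
exponents = [3, 2, 1, 0]  # one element longer than bases on purpose

powers = list(map(pow, bases, exponents))
print(powers)  # [8, 9, 4] -> the extra exponent 0 is never used
```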

We call [map() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/functions.html#map) with a reference to the `transform()` function and the `numbers` list as the arguments and store the result in the variable `transformer` to inspect it.

In [5]:
transformer = map(transform, numbers)

We might expect to get back a materialized sequence (i.e., all elements exist in memory), and a `list` object would feel the most natural because of the type of the `numbers` argument. However, `transformer` is an object of type `map`.

In [6]:
transformer

<map at 0x7f778422fc10>

In [7]:
type(transformer)

map

Like `range` objects, `map` objects generate a series of objects "on the fly" (i.e., one by one). Unlike `range` objects, however, `map` objects are *iterators*, so we can pass them to the built-in [next() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/functions.html#next) function to obtain the next object in line. So, we should think of a `map` object as a "rule" stored in memory that only knows how to calculate the next object of possibly *infinitely* many.

In [8]:
next(transformer)

50

In [9]:
next(transformer)

122

In [10]:
next(transformer)

65
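The "possibly *infinitely* many" part is meant literally: a `map` object may be derived from an endless iterable. The sketch below assumes the standard library's `itertools.count()`, which yields `1`, `2`, `3`, ... without end, and `itertools.islice()` to take a finite number of elements from it; `transform()` is redefined only to keep the cell self-contained.

```python
from itertools import count, islice


def transform(element):
    """Map elements to their squares plus 1."""
    return (element ** 2) + 1


# count(1) yields 1, 2, 3, ... forever, so this map object never runs out
endless = map(transform, count(1))

first_three = [next(endless), next(endless), next(endless)]
print(first_three)  # [2, 5, 10]

# islice() takes a finite slice of the *remaining* elements
next_four = list(islice(endless, 4))
print(next_four)  # [17, 26, 37, 50]
```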

It is essential to understand that by creating a `map` object with the [map() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/functions.html#map) built-in, *nothing* happens in memory except the creation of the `map` object: `transform()` is not called a single time. In particular, no second `list` object derived from `numbers` is created. Conceptually, we may also view `range` objects as a special case of `map` objects: They are constrained to generating `int` objects only, and the `iterable` argument is replaced with `start`, `stop`, and `step` arguments.
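We can make this laziness visible. The sketch below uses a hypothetical `transform_logged()` helper that behaves like `transform()` but also records every element it is called with in the `calls` list.

```python
calls = []  # records every element transform_logged() is called with


def transform_logged(element):
    """Like transform(), but records each call (hypothetical helper)."""
    calls.append(element)
    return (element ** 2) + 1


numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]

lazy = map(transform_logged, numbers)
print(calls)  # [] -> creating the map object ran no transformation

first = next(lazy)
print(first, calls)  # 50 [7] -> exactly one element was transformed
```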

If we are sure that a `map` object generates a *finite* number of elements, we may materialize them into a `list` object with the built-in [list() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/functions.html#func-list) constructor. Below, we "pull out" the remaining `int` objects from `transformer`, which itself is derived from a *finite* `list` object.

In [11]:
list(transformer)

[26, 10, 145, 5, 37, 82, 101, 2, 17]
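After this cell, `transformer` is *exhausted*: a `map` object hands out each element only once. Asking for more yields an empty `list` object or raises a `StopIteration` exception, whereas a `range` object can be looped over repeatedly. Also, `range` objects do not work with `next()` directly, which is one way in which viewing them as a special case of `map` objects is only an analogy. A self-contained sketch:

```python
def transform(element):
    """Map elements to their squares plus 1."""
    return (element ** 2) + 1


numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]
transformer = map(transform, numbers)

first_pass = list(transformer)   # pulls out all 12 elements
second_pass = list(transformer)  # nothing is left to pull out
print(second_pass)  # []

try:
    next(transformer)
except StopIteration:
    print("transformer is exhausted")

# A range object, in contrast, can be looped over again and again ...
digits = range(5)
print(list(digits), list(digits))  # [0, 1, 2, 3, 4] [0, 1, 2, 3, 4]

# ... but it is not an iterator itself, so next() rejects it
try:
    next(digits)
except TypeError:
    print("next() needs an iterator such as iter(digits)")
```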

In summary, instead of creating an empty list first and appending to it in a `for`-loop as above, we write the following one-liner and obtain an equal `transformed_numbers` list.

In [12]:
transformed_numbers = list(map(transform, numbers))

In [13]:
transformed_numbers

[50, 122, 65, 26, 10, 145, 5, 37, 82, 101, 2, 17]

## Filtering

**Filtering** refers to the idea of creating a subset of a sequence with a **boolean filter** `function` that indicates if an element should be kept (i.e., `True`) or not (i.e., `False`).

In the example, let's keep only the even elements in `transformed_numbers`. The `is_even()` function implements that as a filter.

In [14]:
def is_even(element):
    """Filter out odd numbers."""
    if element % 2 == 0:
        return True
    return False

As `element % 2 == 0` is already a boolean expression, we could shorten `is_even()` like so.

In [15]:
def is_even(element):
    """Filter out odd numbers."""
    return element % 2 == 0

As before, we first use a `for`-loop that appends the elements to be kept iteratively to an initially empty `even_numbers` list.

In [16]:
even_numbers = []

for number in transformed_numbers:
    if is_even(number):
        even_numbers.append(number)

In [17]:
even_numbers

[50, 122, 26, 10, 82, 2]

Analogously to the `map` object above, we use the [filter() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/functions.html#filter) built-in to create an object of type `filter` and assign it to `evens`.

In [18]:
evens = filter(is_even, transformed_numbers)

In [19]:
evens

<filter at 0x7f778422fd30>

In [20]:
type(evens)

filter

`evens` works like `transformer` above: With the built-in [next() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/functions.html#next) function we obtain the even numbers one by one. So, the "next" element in line is simply the next even `int` object the `filter` object encounters.

In [21]:
transformed_numbers

[50, 122, 65, 26, 10, 145, 5, 37, 82, 101, 2, 17]

In [22]:
next(evens)

50

In [23]:
next(evens)

122

In [24]:
next(evens)

26
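Since a `filter` object only stops at elements that pass the filter, a single call to `next()` may have to check several elements. In the sketch below, a hypothetical `is_even_logged()` helper records every element it checks; the third call skips over `65` before it reaches `26`.

```python
checked = []  # every element the filter function looks at


def is_even_logged(element):
    """Like is_even(), but records each element it checks (hypothetical helper)."""
    checked.append(element)
    return element % 2 == 0


transformed_numbers = [50, 122, 65, 26, 10, 145, 5, 37, 82, 101, 2, 17]
evens = filter(is_even_logged, transformed_numbers)

a, b = next(evens), next(evens)
print(checked)  # [50, 122]

c = next(evens)
print(checked)  # [50, 122, 65, 26] -> 65 was checked and skipped
```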

As above, we could create a materialized `list` object with the [list() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/functions.html#func-list) constructor.

In [25]:
list(filter(is_even, transformed_numbers))

[50, 122, 26, 10, 82, 2]
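Two related tools may come in handy: passing `None` instead of a function makes `filter()` keep the elements that are *truthy*, and `itertools.filterfalse()` from the standard library keeps exactly the elements for which the filter function returns `False`. A short sketch (the `values` list is made up for illustration):

```python
from itertools import filterfalse


def is_even(element):
    """Filter out odd numbers."""
    return element % 2 == 0


# With None as the function, only truthy elements survive
values = [0, 1, "", "abc", None, [], [0], 2.5]
truthy = list(filter(None, values))
print(truthy)  # [1, 'abc', [0], 2.5]

# filterfalse() is the complement of filter()
transformed_numbers = [50, 122, 65, 26, 10, 145, 5, 37, 82, 101, 2, 17]
odds = list(filterfalse(is_even, transformed_numbers))
print(odds)  # [65, 145, 5, 37, 101, 17]
```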

We may also chain `map` and `filter` objects derived from the original `numbers` list. As the entire cell is *one* big expression consisting of nested function calls, we read it from the inside out.

In [26]:
list(
    filter(
        is_even,
        map(transform, numbers),
    )
)

[50, 122, 26, 10, 82, 2]

Using the [map() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/functions.html#map) and [filter() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/functions.html#filter) built-ins, we can quickly switch the order: Filter first and then transform the remaining elements. This variant equals the "*A simple Filter*" example in [Chapter 4 <img height="12" style="display: inline-block" src="../static/link/to_nb.png">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/develop/04_iteration/03_content.ipynb#Example:-A-simple-Filter). In contrast, code with `for`-loops and `if` statements is more tedious to adapt. Additionally, `map` and `filter` objects loop "at the C level", which often makes them faster than equivalent explicit `for`-loops. For these reasons, experienced Pythonistas tend to *not* use explicit `for`-loops so often.

In [27]:
list(
    map(
        transform,
        filter(is_even, numbers),
    )
)

[65, 145, 5, 37, 101, 17]
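Note that the two orders do *not* compute the same thing. Because $x^2 + 1$ is even exactly when $x$ is odd, transforming first and then filtering for even results keeps the transformed *odd* numbers. A quick check, with a hypothetical `is_odd()` helper:

```python
def transform(element):
    """Map elements to their squares plus 1."""
    return (element ** 2) + 1


def is_even(element):
    """Filter out odd numbers."""
    return element % 2 == 0


def is_odd(element):
    """Filter out even numbers (hypothetical helper)."""
    return element % 2 == 1


numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]

map_then_filter = list(filter(is_even, map(transform, numbers)))
filter_then_map = list(map(transform, filter(is_even, numbers)))

# Transform-then-filter equals transforming the odd numbers
print(map_then_filter == list(map(transform, filter(is_odd, numbers))))  # True
print(map_then_filter == filter_then_map)  # False
```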

## Reducing

Lastly, **reducing** sequential data means summarizing the elements into a single statistic.

A simple example is the built-in [sum() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/functions.html#sum) function.

In [28]:
sum(
    map(
        transform,
        filter(is_even, numbers),
    )
)

370

Other straightforward examples are the built-in [min() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/functions.html#min) or [max() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/functions.html#max) functions.

In [29]:
min(map(transform, filter(is_even, numbers)))

5

In [30]:
max(map(transform, filter(is_even, numbers)))

145
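A filter may leave *no* elements at all, for example, when the input contains only odd numbers. `sum()` then returns `0`, whereas `min()` and `max()` raise a `ValueError` unless we pass them a `default` value. A sketch with a made-up `odd_only` list:

```python
def is_even(element):
    """Filter out odd numbers."""
    return element % 2 == 0


odd_only = [7, 11, 5, 3]  # no element passes is_even()

total = sum(filter(is_even, odd_only))
print(total)  # 0

largest = max(filter(is_even, odd_only), default=None)
print(largest)  # None

try:
    min(filter(is_even, odd_only))
except ValueError:
    print("min() received an empty iterable")
```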

[sum() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/functions.html#sum), [min() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/functions.html#min), and [max() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/functions.html#max) can be regarded as special cases of a more general way of reducing a sequence.

The generic way of reducing a sequence is to apply a function of *two* arguments on a rolling horizon: Its first argument is the reduction of the elements processed so far, and the second the next element to be reduced.
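Written out for four elements, this rolling computation nests the function calls from the left, i.e., $f(f(f(x_1, x_2), x_3), x_4)$. The sketch below makes that nesting visible by "adding" strings symbolically; `show_addition()` is a hypothetical helper that builds a text representation instead of computing a number.

```python
def show_addition(so_far, next_element):
    """Return a string that shows the addition instead of performing it."""
    return f"({so_far} + {next_element})"


elements = ["a", "b", "c", "d"]

result = elements[0]  # the first element is the initial reduction
for element in elements[1:]:
    result = show_addition(result, element)

print(result)  # (((a + b) + c) + d)
```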

For illustration, let's replicate [sum() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/functions.html#sum) as such a function, called `sum_alt()`. Its implementation only adds two numbers.

In [31]:
def sum_alt(sum_so_far, next_number):
    """Reduce a sequence by addition."""
    return sum_so_far + next_number

Further, we create a *new* `map` object derived from `numbers` ...

In [32]:
evens_transformed = map(transform, filter(is_even, numbers))

... and loop over all *but* the first element it generates. The latter is captured separately as the initial `result` with the [next() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/functions.html#next) function. We know from above that `evens_transformed` generates *six* elements. That is why we see *five* growing `result` values resembling a [cumulative sum](http://mathworld.wolfram.com/CumulativeSum.html). The first `210` is the sum of the first two elements generated by `evens_transformed`, `65` and `145`.

So, we also learn that `map` objects, and analogously `filter` objects, are *iterable* as we may loop over them.

In [33]:
result = next(evens_transformed)

for number in evens_transformed:
    result = sum_alt(result, number)
    print(result, end=" ")  # line added for didactical purposes

210 215 252 353 370 

The final `result` is the same `370` as above.

In [34]:
result

370
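If we want to keep all these intermediate results, the standard library's `itertools.accumulate()` yields them lazily, by default as running sums and including the first element itself. A short sketch with the six elements generated by `evens_transformed` typed out:

```python
from itertools import accumulate

elements = [65, 145, 5, 37, 101, 17]  # what evens_transformed generates

running_sums = list(accumulate(elements))
print(running_sums)  # [65, 210, 215, 252, 353, 370]

# accumulate() also accepts a two-argument function, e.g., a running maximum
running_max = list(accumulate(elements, max))
print(running_max)  # [65, 145, 145, 145, 145, 145]
```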

The [reduce() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/functools.html#functools.reduce) function in the [functools <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/functools.html) module in the [standard library <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/index.html) replaces the `for`-loop with a single function call, which is more convenient and typically faster. Like the [map() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/functions.html#map) and [filter() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/functions.html#filter) built-ins, it takes `function` and `iterable` as its first two arguments.

[reduce() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/functools.html#functools.reduce) is **[eager <img height="12" style="display: inline-block" src="../static/link/to_wiki.png">](https://en.wikipedia.org/wiki/Eager_evaluation)**, meaning that all computations implied by the contained `map` and `filter` "rules" are executed immediately, and the code cell below evaluates to `370`. In contrast, [map() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/functions.html#map) and [filter() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/functions.html#filter) create **[lazy <img height="12" style="display: inline-block" src="../static/link/to_wiki.png">](https://en.wikipedia.org/wiki/Lazy_evaluation)** `map` and `filter` objects, and we have to request their elements, for example, one by one with the [next() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/functions.html#next) function.

In [35]:
from functools import reduce

In [36]:
reduce(
    sum_alt,
    map(
        transform,
        filter(is_even, numbers),
    )
)

370
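`reduce()` also accepts an optional third argument, `initializer`, which serves as the starting value of the reduction. It matters most for empty iterables: without it, `reduce()` raises a `TypeError`. A sketch with `sum_alt()` redefined for self-containment:

```python
from functools import reduce


def sum_alt(sum_so_far, next_number):
    """Reduce a sequence by addition."""
    return sum_so_far + next_number


elements = [65, 145, 5, 37, 101, 17]

with_start = reduce(sum_alt, elements, 1000)  # 1000 + 65 + 145 + ...
print(with_start)  # 1370

empty_result = reduce(sum_alt, [], 0)  # the initializer is returned as is
print(empty_result)  # 0

try:
    reduce(sum_alt, [])
except TypeError:
    print("reduce() of an empty iterable needs an initializer")
```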

## Lambda Expressions

[map() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/functions.html#map), [filter() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/functions.html#filter), and [reduce() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/functools.html#functools.reduce) take a `function` object as their first argument, and we defined `transform()`, `is_even()`, and `sum_alt()` to be used precisely for that.

Often, such functions are used *only once* in a program, even though the primary purpose of functions is to *reuse* code. In such cases, it makes more sense to define them "anonymously", right where the first argument goes.

As mentioned in [Chapter 2 <img height="12" style="display: inline-block" src="../static/link/to_nb.png">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/develop/02_functions/00_content.ipynb#Anonymous-Functions), we use `lambda` expressions to create `function` objects *without* a name referencing them.

So, the above `sum_alt()` function could be rewritten as a `lambda` expression like so ...

In [37]:
lambda sum_so_far, next_number: sum_so_far + next_number

<function __main__.<lambda>(sum_so_far, next_number)>

... or even shorter.

In [38]:
lambda x, y: x + y

<function __main__.<lambda>(x, y)>
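Such a `lambda` expression can be passed directly to `reduce()`. For simple operators like `+`, the standard library's `operator` module already provides ready-made functions, such as `operator.add()`, which serve the same purpose:

```python
from functools import reduce
from operator import add

elements = [65, 145, 5, 37, 101, 17]

with_lambda = reduce(lambda x, y: x + y, elements)
with_operator = reduce(add, elements)

print(with_lambda, with_operator)  # 370 370
```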

With the new concepts in this chapter, we can rewrite the entire example in just a few lines of code *without* any `for`, `if`, or `def` statements. The resulting code is concise, easy to read, quick to modify, and often faster in execution. Most importantly, it scales to large amounts of data because *no* intermediate `list` objects are materialized in memory.

In [39]:
numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]
evens = filter(lambda x: x % 2 == 0, numbers)
transformed = map(lambda x: (x ** 2) + 1, evens)
sum(transformed)

370

As `numbers` contains exactly the whole numbers from `1` through `12` and the order of the elements does not matter for a sum, we may instead use the [range() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/functions.html#func-range) built-in and get away *without* any materialized `list` object in memory at all!

In [40]:
numbers = range(1, 13)
evens = filter(lambda x: x % 2 == 0, numbers)
transformed = map(lambda x: (x ** 2) + 1, evens)
sum(transformed)

370

To also avoid the temporary variables `numbers`, `evens`, and `transformed`, we could write the entire computation as *one* expression.

In [41]:
sum(
    map(
        lambda x: (x ** 2) + 1,
        filter(
            lambda x: x % 2 == 0,
            range(1, 13),
        )
    )
)

370
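For this particular input, we could even drop the filter step: the even numbers from `1` through `12` are exactly those generated by `range(2, 13, 2)`, using its `step` argument.

```python
# range() with step 2 generates 2, 4, ..., 12, so no filter is needed
result = sum(map(lambda x: (x ** 2) + 1, range(2, 13, 2)))
print(result)  # 370
```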

PythonTutor visualizes the differences in the number of computational steps and memory usage:
- [Version 1 <img height="12" style="display: inline-block" src="../static/link/to_py.png">](http://pythontutor.com/visualize.html#code=def%20is_even%28element%29%3A%0A%20%20%20%20if%20element%20%25%202%20%3D%3D%200%3A%0A%20%20%20%20%20%20%20%20return%20True%0A%20%20%20%20return%20False%0A%0Adef%20transform%28element%29%3A%0A%20%20%20%20return%20%28element%20**%202%29%20%2B%201%0A%0Anumbers%20%3D%20list%28range%281,%2013%29%29%0A%0Aevens%20%3D%20%5B%5D%0Afor%20number%20in%20numbers%3A%0A%20%20%20%20if%20is_even%28number%29%3A%0A%20%20%20%20%20%20%20%20evens.append%28number%29%0A%0Atransformed%20%3D%20%5B%5D%0Afor%20number%20in%20evens%3A%0A%20%20%20%20transformed.append%28transform%28number%29%29%0A%0Aresult%20%3D%20sum%28transformed%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false): With `for`-loops, `if` statements, and named functions -> **116** steps and **3** `list` objects
- [Version 2 <img height="12" style="display: inline-block" src="../static/link/to_py.png">](http://pythontutor.com/visualize.html#code=numbers%20%3D%20range%281,%2013%29%0Aevens%20%3D%20filter%28lambda%20x%3A%20x%20%25%202%20%3D%3D%200,%20numbers%29%0Atransformed%20%3D%20map%28lambda%20x%3A%20%28x%20**%202%29%20%2B%201,%20evens%29%0Aresult%20%3D%20sum%28transformed%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false): With named `map` and `filter` objects -> **58** steps and **no** `list` object
- [Version 3 <img height="12" style="display: inline-block" src="../static/link/to_py.png">](http://pythontutor.com/visualize.html#code=result%20%3D%20sum%28map%28lambda%20x%3A%20%28x%20**%202%29%20%2B%201,%20filter%28lambda%20x%3A%20x%20%25%202%20%3D%3D%200,%20range%281,%2013%29%29%29%29&cumulative=false&curInstr=0&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false): Everything in *one* expression -> **55** steps and **no** `list` object

Versions 2 and 3 are the same, except for the three additional steps required to create the temporary variables. The *major* downside of Version 1 is that, in the worst case, it may need *three times* the memory as compared to the other two versions!
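To get a rough feeling for the memory difference outside of PythonTutor, we could compare sizes with `sys.getsizeof()`. This is only a sketch: `sys.getsizeof()` reports the *shallow* size of an object in bytes (for a `list` object, excluding the `int` objects it references), and the exact numbers depend on the Python version and platform.

```python
import sys

numbers = range(1, 100_001)
materialized = list(numbers)  # holds references to 100,000 int objects
lazy = map(lambda x: (x ** 2) + 1, numbers)  # holds only the "rule"

print(sys.getsizeof(materialized))  # grows with the number of elements
print(sys.getsizeof(lazy))          # small, independent of the length
print(sys.getsizeof(numbers))       # a range object is small as well
```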

An experienced Pythonista would probably go with Version 2 in a production system to keep the code readable and maintainable.

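The map-filter-reduce paradigm has gained attention in recent years as it enables **[parallel computing <img height="12" style="display: inline-block" src="../static/link/to_wiki.png">](https://en.wikipedia.org/wiki/Parallel_computing)**, which becomes important when dealing with large amounts of data. How the objects work in memory, as shown in this chapter, hints at why: `map` and `filter` process each element independently of all the others, so different parts of the data could be handled by different workers, and the partial results can be combined by a final reduction (as long as the reducing function is associative, like addition).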
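The sketch below only illustrates this *decomposition*: it splits the input into independent chunks (a chunk size of `4` is an arbitrary choice), maps, filters, and reduces each chunk separately, and then reduces the partial results. It assumes `concurrent.futures.ThreadPoolExecutor` from the standard library; in CPython, threads do not execute Python code truly in parallel, so actual speedups would call for processes or a distributed framework instead.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from operator import add


def transform(element):
    """Map elements to their squares plus 1."""
    return (element ** 2) + 1


def is_even(element):
    """Filter out odd numbers."""
    return element % 2 == 0


def partial_result(chunk):
    """Map, filter, and reduce one chunk independently of all others."""
    return sum(map(transform, filter(is_even, chunk)))


# Split 1, ..., 12 into the independent chunks 1-4, 5-8, and 9-12
chunks = [range(start, min(start + 4, 13)) for start in range(1, 13, 4)]

with ThreadPoolExecutor(max_workers=3) as executor:
    # executor.map() returns the results in the order of the chunks
    partials = list(executor.map(partial_result, chunks))

# Combining the partial sums works because addition is associative
total = reduce(add, partials)
print(partials, total)  # [22, 102, 246] 370
```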