"**Note**: Click on \"*Kernel*\" > \"*Restart Kernel and Clear All Outputs*\" in [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) *before* reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it [in the cloud <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_mb.png\">](https://mybinder.org/v2/gh/webartifex/intro-to-data-science/main?urlpath=lab/tree/01_scientific_stack/00_content_numpy.ipynb)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Chapter 1: Python's Scientific Stack (Part 1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Python itself does not come with any scientific algorithms. However, over time, many third-party libraries emerged that are useful to build machine learning applications. In this context, \"third-party\" means that the libraries are *not* part of Python's standard library.\n",
"\n",
"Among the popular ones are [numpy <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_np.png\">](https://numpy.org/) (numerical computations, linear algebra), [pandas <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/) (data processing), [matplotlib <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_plt.png\">](https://matplotlib.org/) (visualisations), and [scikit-learn <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_skl.png\">](https://scikit-learn.org/stable/index.html) (machine learning algorithms)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Extending Python with Third-party Packages"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before we can import these libraries, we must ensure that they installed on our computers. If you installed Python via the Anaconda Distribution that should already be the case. Otherwise, we can use Python's **package manager** `pip` to install them manually.\n",
"\n",
"`pip` is a so-called command-line interface (CLI), meaning it is a program that is run within a terminal window. JupyterLab allows us to run such a CLI tool from within a notebook by starting a code cell with a single `%` symbol. Here, this does not mean Python's modulo operator but is just an instruction to JupyterLab that the following code is *not* Python.\n",
"\n",
"So, let's proceed by installing [numpy <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_np.png\">](https://numpy.org/) and [matplotlib <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_plt.png\">](https://matplotlib.org/)."
"After we have ensured that the third-party libraries are installed locally, we can simply go ahead with the `import` statement. All the libraries are commonly imported with shorter prefixes for convenient use later on."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's see how the data type provided by these scientific libraries differ from Python's built-in ones."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Vectors and Matrices with Numpy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an example, let's start by creating a `list` object."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[1, 2, 3, 4, 5]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"vector = [1, 2, 3, 4, 5]\n",
"\n",
"vector"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We call the `list` object by the name `vector` as that is what the data is supposed to mean conceptually. As we remember from our linear algebra courses, vectors should implement scalar-multiplication. So, the following code cell should result in `[3, 6, 9, 12, 15]` as the answer. Surprisingly, the result is a new `list` with all the elements in `vector` repeated three times. That operation is called **concatenation** and is an example of a concept called **operator overloading**. That means that an operator, like `*` in the example, may exhibit a different behavior depending on the data type of its operands."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"3 * vector"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[numpy <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_np.png\">](https://numpy.org/), among others, provides a data type called an **n-dimensional array**. This may sound fancy at first but when used with only 1 or 2 dimensions, it basically represents vectors and matrices as we know them from linear algebra. Additionally, arrays allow for much faster computations as they are implemented in the very efficient [C language <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_wiki.png\">](https://en.wikipedia.org/wiki/C_%28programming_language%29) and optimized for numerical operations."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To create an array, we use the [np.array() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_np.png\">](https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html#numpy-array) constructor and provide it a `list` of values."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1, 2, 3, 4, 5])"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"v1 = np.array([1, 2, 3, 4, 5])\n",
"\n",
"v1"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"numpy.ndarray"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(v1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The vector `v1` can now be multiplied with a scalar yielding a result meaningful in the context of linear algebra."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 3, 6, 9, 12, 15])"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"v2 = 3 * v1\n",
"\n",
"v2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To model a matrix, we use a `list` of (row) `list`s of values. Note how the output below the cell contains *two* levels of brackets `[` and `]`."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 1, 2, 3, 4, 5],\n",
" [ 6, 7, 8, 9, 10]])"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m1 = np.array([\n",
" [1, 2, 3, 4, 5],\n",
" [6, 7, 8, 9, 10],\n",
"])\n",
"\n",
"m1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can use the [np.dot() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_np.png\">](https://numpy.org/doc/stable/reference/generated/numpy.dot.html#numpy.dot) function to multiply a matrix with a vector to obtain a new vector ..."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 55, 130])"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"v3 = np.dot(m1, v1)\n",
"\n",
"v3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"... or simply transpose it by accessing its [.T <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_np.png\">](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.T.html#numpy.ndarray.T) attribute."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 1, 6],\n",
" [ 2, 7],\n",
" [ 3, 8],\n",
" [ 4, 9],\n",
" [ 5, 10]])"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m1.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The rules from maths still apply and it makes a difference if a vector is multiplied from the left or the right by a matrix. The following operation will fail."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"ename": "ValueError",
"evalue": "shapes (5,) and (2,5) not aligned: 5 (dim 0) != 2 (dim 0)",
"\u001b[0;31mValueError\u001b[0m: shapes (5,) and (2,5) not aligned: 5 (dim 0) != 2 (dim 0)"
]
}
],
"source": [
"np.dot(v1, m1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Indexing & Slicing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to retrieve only a subset of an array's data, we index or slice into it. This is similar to how we index or slice into `list` objects, in particular, if we deal with one-dimensional arrays like `v1`, `v2`, or `v3`."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1, 2, 3, 4, 5])"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"v1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Indexing may be done with either positive (`0`-based) indexes from the left or negative (`1`-based) indexes from the right.\n",
"\n",
"Here, we obtain the first and the last element in `v1`."
"Slicing uses the same `:` notation as with `list`s taking *start*, *stop*, and *step* values."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([2, 3, 4])"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"v1[1:-1]"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([3, 4, 5])"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"v1[2:]"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1, 2, 3, 4])"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"v1[:-1]"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([2, 4])"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"v1[1:-1:2]"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1, 3, 5])"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"v1[::2]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Indexing and slicing become slightly more complicated when working with higher dimensional arrays like `m1`. In principle, we must provide either an index or a slice for *each* dimension."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 1, 2, 3, 4, 5],\n",
" [ 6, 7, 8, 9, 10]])"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, to slice out the first row of the matrix, we write `[0, :]`. The `0` implies taking elements in *only* the *first* row and the `:` means taking elements across *all* columns."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1, 2, 3, 4, 5])"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m1[0, :]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For sure, we could also *not* leave out the *start* and *stop* values along the second dimension and obtain the same result. But that would not only be unneccessary but also communicate a different meaning, namely to \"take elements from the first through the fifth columns\" instead of \"take elements from all columns.\""
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1, 2, 3, 4, 5])"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m1[0, 0:5]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Whenever we take *all* elements in *later* dimensions (i.e., those more to the right), we may also drop the index or slice instead and keep only the *earlier* ones. However, the previous style is a bit more explicit in that we are working with a two-dimensional array."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1, 2, 3, 4, 5])"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m1[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As another example, let's slice out elements across *all* rows but in only the *second* column. Colloquially, we can say that we \"slice out the second column\" from the matrix."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([2, 7])"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m1[:, 1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we wanted to slice out a smaller matrix from a larger one, we slice along both dimensions. Above, we have only sliced along one dimension while indexing into the other.\n",
"\n",
"For example, to obtain a 2x2 square matrix consisting of the two left-most columns in `m1`, we can write the following."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[1, 2],\n",
" [6, 7]])"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m1[:2, :2]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Similarly, to slice out a 2x2 matrix consisting of the two right-most columns, we write the following."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 4, 5],\n",
" [ 9, 10]])"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m1[-2:, -2:]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To access individual elements as scalars, we combine two indexes along both dimensions.\n",
"\n",
"For example, to access the element in the lower-left corner, we write the following."
"Both, the vectors `v1`, `v2`, and `v3`, and the matrix `m1` have the *same* **data type** from a technical point of view, namely `np.ndarray`."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"numpy.ndarray"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(v1)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"numpy.ndarray"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(m1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So, how can we tell that they have different dimensions without looking at how they are displayed below a code cell?\n",
"\n",
"The `np.ndarray` type comes with a couple of **properties** that we can access via the dot notation. Examples are `.ndim` and `.shape`.\n",
"\n",
"While `.ndim` simply tells us how many dimensions an array has, ..."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"v1.ndim"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m1.ndim"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"... `.shape` reveals the structure of the elements. The one-element tuple `(5,)` below says that there is one dimension along which there are five elements ..."
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(5,)"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"v1.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"... whereas the two-element tuple `(2, 5)` indicates that the array's elements are placed along two dimensions spanning a grid of two elements in one and five elements into the other dimension. We know such notations already from our linear algebra courses where we would call `m1` a 2x5 matrix."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(2, 5)"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m1.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There is a relationship behind the dimensionality of an array we are slicing and indexing and what we get back as a result: Whenever we index into an array along some dimension, the result will have one dimension less than the original.\n",
"\n",
"For example, if we start with a two-dimensional array like `m1` and slice along both dimensions, the result will also have two dimensions, even if it holds only one element."
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 1, 2, 3, 4, 5],\n",
" [ 6, 7, 8, 9, 10]])"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m1"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[1]])"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m1[:1, :1] # Note the double brackets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If, on the contrary, we slice only along one of the two dimensions and index into the other, the result is a one-dimensional array. So, both of the below slices have the *same* properties and we cannot tell if one of them was (part of) a row or column in `m1`."
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([6, 7])"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m1[1, :2]"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 5, 10])"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m1[:, -1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, indexing along both dimensions gives us back a scalar value."
"[numpy <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_np.png\">](https://numpy.org/) provides various **constructors** (i.e., functions) to create all kinds of arrays.\n",
"\n",
"For example, [np.linspace() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_np.png\">](https://numpy.org/doc/stable/reference/generated/numpy.linspace.html#numpy.linspace) creates an array of equidistant numbers, often used to model the values on a function's x-axis. [np.pi <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_np.png\">](https://numpy.org/doc/stable/reference/constants.html#numpy.pi) is an alias for Python's built-in `math.pi`. The cell below creates `100` coordinate points arranged between $-3\\pi$ and $+3\\pi$."
"Besides constructors, [numpy <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_np.png\">](https://numpy.org/) provides all kinds of **vectorized** functions. The concept of **vectorization** means that a function is executed once for *each* element in an array. Vectorized functions are one major benefit of using [numpy <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_np.png\">](https://numpy.org/): Not only are they optimized heavily on the C level (i.e., \"behind the scenes\") but also allow us to avoid writing explicit `for`-loops that are for some technical reasons considered \"slow\" in Python.\n",
"\n",
"For example, [np.sin() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_np.png\">](https://numpy.org/doc/stable/reference/generated/numpy.sin.html#numpy.sin) calculates the sine for each element in the `x` array. The resulting `y` array has thus the same `.shape` as `x`."
"Next, let's use [matplotlib <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_plt.png\">](https://matplotlib.org/)'s [plt.plot() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_plt.png\">](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html#matplotlib.pyplot.plot) function to visualize the sine curve between $-3\\pi$ and $+3\\pi$."
"If we use only `10` equidistant points for `x`, the resulting curve does not look like a curve. There is a trade-off behind this: To make nice plots, we need a rather high granularity of data points. That, however, causes more computations in the background making the plotting slower. With toy data like the one here this is not an issue. For real life data that may be different."
"[numpy <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_np.png\">](https://numpy.org/) provides further constructors. Most notably are the ones in the [np.random <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_np.png\">](https://numpy.org/doc/stable/reference/random/index.html#module-numpy.random) module that mirrors the [random <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/random.html) module in the [standard library <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_py.png\">](https://docs.python.org/3/library/index.html): For example, [np.random.normal() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_np.png\">](https://numpy.org/doc/stable/reference/random/generated/numpy.random.normal.html#numpy.random.normal) and [np.random.gamma() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_np.png\">](https://numpy.org/doc/stable/reference/random/generated/numpy.random.gamma.html#numpy.random.gamma) return arrays whose elements follow the normal and gamma probability distributions."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us quickly generate some random data points and draw a scatter plot with [matplotlib <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_plt.png\">](https://matplotlib.org/)'s [plt.scatter() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_plt.png\">](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html#matplotlib.pyplot.scatter) function."
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.collections.PathCollection at 0x7f008b235d00>"