Merge branch 'initial-setup' into develop

Alexander Hess 2020-10-12 15:29:55 +02:00
commit 6891f82c8d
Signed by: alexander
GPG key ID: 344EA5AB10D868E0
16 changed files with 1625 additions and 3 deletions

1
.gitignore vendored

@ -1,2 +1,3 @@
**/.ipynb_checkpoints/
.python-version
.venv/

22
.pre-commit-config.yaml Normal file

@ -0,0 +1,22 @@
default_stages: [commit]
fail_fast: true
repos:
  - repo: local
    hooks:
      - id: fix-branch-references
        name: Check for wrong branch references
        entry: poetry run nox -s fix-branch-references --
        language: system
        stages: [commit, merge-commit]
        types: [text]
  # Enable hooks provided by the pre-commit project to
  # enforce rules that local tools could not that easily.
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v3.2.0
    hooks:
      - id: check-added-large-files
        args: [--maxkb=250]
      - id: check-merge-conflict
      - id: no-commit-to-branch
        args: [--branch, main]
      - id: trailing-whitespace

178
README.md

@ -5,3 +5,181 @@ in programming with **[Python <img height="12" style="display: inline-block" src
The **main goal** is to **prepare** students
for **further studies** in the "field" of **data science**.
### Prerequisites
To be suitable for *beginners*, there are *no* formal prerequisites.
It is only expected that the student has:
- a *solid* understanding of the **English** language,
- knowledge of **basic mathematics** from high school,
- the ability to **think conceptually** and **reason logically**, and
- the willingness to **invest** around **90-120 hours** in this course.
## Getting started
If you are a total beginner,
follow the instructions in the "Installation" section next.
If you are familiar with
the [git](https://git-scm.com/)
and [poetry](https://python-poetry.org/docs/) command-line tools,
you may want to look at the "Alternative Installation" section further below.
### Installation
To follow this course, an installation of **Python 3.8** or higher is expected.
A popular and beginner-friendly way is
to install the [Anaconda Distribution](https://www.anaconda.com/products/individual)
that not only ships Python itself
but also comes pre-packaged with a lot of third-party libraries.
<img src="static/anaconda_download.png" width="50%">
Scroll down to the [download](https://www.anaconda.com/products/individual#Downloads) section
and install the latest version for your operating system
(i.e., *2020-07* with Python 3.8 at the time of this writing).
After installation,
you will find an entry "[Anaconda Navigator](https://docs.anaconda.com/anaconda/navigator/)"
in your start menu.
Click on it.
<img src="static/anaconda_start_menu.png" width="50%">
A window opens giving you several options to start various applications.
In the beginning, we will work mostly with [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/).
Click on "Launch".
<img src="static/anaconda_navigator.png" width="50%">
A new tab in your web browser opens:
The website is "localhost" and some number (e.g., 8888).
This is the [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) application
that is used to display the course materials.
On the left, you see the files and folders on your computer.
This file browser works like any other.
In the center, you see several options to launch (i.e., "create") new files.
<img src="static/jupyter_lab.png" width="50%">
To check if your Python installation works,
double-click on the "Python 3" tile under the "Notebook" section.
That opens a new [Jupyter notebook](https://jupyter-notebook.readthedocs.io/en/stable/)
named "Untitled.ipynb".
<img src="static/jupyter_notebook_blank.png" width="50%">
Enter some basic Python in the **code cell**, for example, `1 + 2`.
Then, press the **Enter** key *while* holding down the **Control** key
(if that does not work, try with the **Shift** key)
to **execute** the snippet.
The result of the calculation, `3` in the example, shows up below the cell.
<img src="static/jupyter_notebook_example.png" width="50%">
After setting up Python,
click on the green "Code" button at the top right of this website
to download the course materials.
As a beginner, choosing "Download ZIP" is likely the easiest option.
Then, unpack the ZIP file into a folder of your choice,
ideally somewhere within your personal user folder
so that the files show up right away in [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/).
<img src="static/repo_download.png" width="50%">
### Alternative Installation (for Instructors)
Python can also be installed in a "pure" way
obtained directly from its core development team [here](https://www.python.org/downloads/).
Then, it comes *without* any third-party packages,
which is *not* a problem at all.
Managing third-party packages can be automated to a large degree,
for example, with tools such as [poetry](https://python-poetry.org/docs/).
However, this may be too "advanced" for a beginner
as it involves working with a [command-line interface <img height="12" style="display: inline-block" src="static/link/to_wiki.png">](https://en.wikipedia.org/wiki/Command-line_interface) (CLI),
also called a **terminal**,
which looks like the one below.
It is used *without* a mouse by typing commands into it.
The following instructions assume that
[git](https://git-scm.com/), [poetry](https://python-poetry.org/docs/),
and [pyenv](https://github.com/pyenv/pyenv) are installed.
<img src="static/cli_install.png" width="50%" align="center">
The screenshot above shows how this project can be set up in an alternative way
with the [zsh](https://en.wikipedia.org/wiki/Z_shell) CLI.
First, the [git](https://git-scm.com/) tool is used
to **clone** the course materials as a **repository**
into a new folder called "*intro-to-python*"
that lives under a "*repos*" folder.
- `git clone https://github.com/webartifex/intro-to-python.git`
The `cd` command is used to "change directories".
In the screenshot, the [pyenv](https://github.com/pyenv/pyenv) tool is used
to set the project's Python version.
[pyenv](https://github.com/pyenv/pyenv)'s purpose is
to manage *many* parallel Python installations on the same computer.
It is highly recommended for professional users;
however, any other way of installing Python works as well.
- `pyenv local ...`
In contrast, the [poetry](https://python-poetry.org/docs/) tool is used
to manage third-party packages within the *same* Python installation
and, more importantly, on a per-project basis.
So, for example,
whereas "Project A" may depend on [numpy](https://numpy.org/) *v1.19*
from June 2020,
"Project B" may use *v1.14* from January 2018 instead
(cf., numpy's [release history](https://pypi.org/project/numpy/#history)).
To achieve this per-project **isolation**,
[poetry](https://python-poetry.org/docs/) uses so-called **virtual environments**
behind the scenes.
While one could do that manually,
for example, by using Python's built-in
[venv <img height="12" style="display: inline-block" src="static/link/to_py.png">](https://docs.python.org/3/library/venv.html) module,
it is more convenient and reliable to have [poetry](https://python-poetry.org/docs/)
automate this.
The following *one* command not only
creates a new virtual environment (manually: `python -m venv venv`)
and *activates* it (manually: `source venv/bin/activate`),
it also installs the versions of the project's third-party dependencies
as specified in the [poetry.lock](poetry.lock) file
(manually: `python -m pip install -r requirements.txt`
if a [requirements.txt](https://docs.python.org/3/tutorial/venv.html#managing-packages-with-pip)
file is used;
the `python -m` part is often left out [but should not be](https://snarky.ca/why-you-should-use-python-m-pip/)):
- `poetry install`
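For readers curious about the first manual step mentioned above, here is a minimal sketch of creating a virtual environment with Python's built-in `venv` module. The helper name `create_environment` and the folder name `venv` are chosen for illustration only, and the sketch assumes `with_pip=False` to keep it fast; it is not how [poetry](https://python-poetry.org/docs/) works internally.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(target):
    """Create a bare virtual environment in `target` (hypothetical helper).

    Roughly what `python -m venv venv` does, minus installing pip.
    """
    venv.create(target, with_pip=False)
    # Every virtual environment contains a "pyvenv.cfg" marker file.
    return Path(target) / "pyvenv.cfg"


# Use a throwaway folder so that the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as folder:
    config = create_environment(Path(folder) / "venv")
    created = config.exists()

print(created)
```

Activating the environment and installing the pinned dependencies are separate steps, which is why having one tool automate all three is convenient.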
[poetry](https://python-poetry.org/docs/) is also used
to execute commands in the project's (virtual) environment.
The command is then prefixed with `poetry run ...`.
For example, to do the equivalent of clicking "Launch" in the Anaconda Navigator:
- `poetry run jupyter lab`
This opens a new tab in your web browser just as above.
The command-line interface stays open in the background,
like in the screenshot below,
and prints log messages as we work in [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/).
<img src="static/cli_jupyter_lab.png" width="50%" align="center">
## About the Author
Alexander Hess is a PhD student
at the Chair of Logistics Management at [WHU - Otto Beisheim School of Management](https://www.whu.edu)
where he conducts research on urban delivery platforms
and teaches coding courses based on Python in the BSc and MBA programs.
Connect with him on [LinkedIn](https://www.linkedin.com/in/webartifex).

140
noxfile.py Normal file

@ -0,0 +1,140 @@
"""Configure nox as the task runner.

Nox provides the following tasks:

- "init-project": install the pre-commit hooks

- "fix-branch-references": adjusts links with git branch references in
  various files (e.g., Markdown or notebooks)

"""
import contextlib
import glob
import os
import re
import shutil
import subprocess
import tempfile
import nox
REPOSITORY = "webartifex/intro-to-python"
# Use a unified .cache/ folder for all develop tools.
nox.options.envdir = ".cache/nox"
# All tools except git and poetry are project dependencies.
# Avoid accidental successes if the environment is not set up properly.
nox.options.error_on_external_run = True
@nox.session(name="init-project", venv_backend="none")
def init_project(session):
    """Install the pre-commit hooks."""
    for type_ in (
        "pre-commit",
        "pre-merge-commit",
    ):
        session.run("poetry", "run", "pre-commit", "install", f"--hook-type={type_}")
@nox.session(name="fix-branch-references", venv_backend="none")
def fix_branch_references(_session):
    """Change git branch references.

    Intended to be run as a pre-commit hook.

    Many files in the project (e.g., README.md) contain links to resources on
    github.com, nbviewer.jupyter.org, or mybinder.org that contain git branch
    labels.

    This task rewrites branch labels into either "main" or "develop".
    """
    # Glob patterns that expand into the files whose links are re-written.
    paths = ["*.md", "**/*.ipynb"]

    branch = (
        subprocess.check_output(
            ("git", "rev-parse", "--abbrev-ref", "HEAD"),
        )
        .decode()
        .strip()
    )

    # If the current branch is only temporary and will be merged into "main", ...
    if branch.startswith("release-") or branch.startswith("hotfix-"):
        branch = "main"
    # If the branch is not "main", we assume it is a feature branch.
    elif branch != "main":
        branch = "develop"

    rewrites = [
        {
            "name": "github",
            "pattern": re.compile(
                fr"((((http)|(https))://github\.com/{REPOSITORY}/((blob)|(tree))/)([\w-]+)/)"
            ),
            "replacement": fr"\2{branch}/",
        },
        {
            "name": "nbviewer",
            "pattern": re.compile(
                fr"((((http)|(https))://nbviewer\.jupyter\.org/github/{REPOSITORY}/((blob)|(tree))/)([\w-]+)/)",
            ),
            "replacement": fr"\2{branch}/",
        },
        {
            "name": "mybinder",
            "pattern": re.compile(
                fr"((((http)|(https))://mybinder\.org/v2/gh/{REPOSITORY}/)([\w-]+)\?)",
            ),
            "replacement": fr"\2{branch}?",
        },
    ]

    for expanded in _expand(*paths):
        with _line_by_line_replace(expanded) as (old_file, new_file):
            for line in old_file:
                for rewrite in rewrites:
                    line = re.sub(rewrite["pattern"], rewrite["replacement"], line)
                new_file.write(line)
def _expand(*patterns):
    """Expand glob patterns into paths.

    Args:
        *patterns: the patterns to be expanded

    Yields:
        path: a single expanded path
    """
    for pattern in patterns:
        yield from glob.glob(pattern.strip())
@contextlib.contextmanager
def _line_by_line_replace(path):
    """Replace/change the lines in a file one by one.

    This generator function yields two file handles, one to the current file
    (i.e., `old_file`) and one to its replacement (i.e., `new_file`).

    Usage: loop over the lines in `old_file` and write the lines to be kept
    to `new_file`. Lines not written to `new_file` are removed!

    Args:
        path: the file whose lines are to be replaced

    Yields:
        old_file, new_file: handles to a file and its replacement
    """
    file_handle, new_file_path = tempfile.mkstemp()
    with os.fdopen(file_handle, "w") as new_file:
        with open(path) as old_file:
            yield old_file, new_file

    shutil.copymode(path, new_file_path)
    os.remove(path)
    shutil.move(new_file_path, path)
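
To illustrate what the "github" rewrite rule in `fix_branch_references` does to a single line, here is a minimal standalone sketch. The pattern and replacement are copied from the noxfile above; the helper name `rewrite_github_links` and the sample URL are assumptions made for this example and are not part of the commit.

```python
import re

REPOSITORY = "webartifex/intro-to-python"

# Same pattern as the "github" entry in `rewrites`: group 2 captures everything
# up to and including "/blob/" or "/tree/", the last group is the branch label.
GITHUB_PATTERN = re.compile(
    rf"((((http)|(https))://github\.com/{REPOSITORY}/((blob)|(tree))/)([\w-]+)/)"
)


def rewrite_github_links(line, branch):
    """Swap the branch label in GitHub links for `branch` (hypothetical helper)."""
    # Keep the URL prefix (group 2) and substitute only the branch label.
    return GITHUB_PATTERN.sub(rf"\2{branch}/", line)


sample = "See https://github.com/webartifex/intro-to-python/blob/main/README.md"
print(rewrite_github_links(sample, "develop"))
# → See https://github.com/webartifex/intro-to-python/blob/develop/README.md
```

Lines without a matching link pass through unchanged, which is why the hook can safely run over every Markdown file and notebook.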

1282
poetry.lock generated

File diff suppressed because it is too large


@ -13,4 +13,9 @@ license = "MIT"
[tool.poetry.dependencies]
python = "^3.8"
jupyterlab = "^2.2.8"
[tool.poetry.dev-dependencies]
# Task runners
nox = "^2020.8.22"
pre-commit = "^2.7.1"

Binary file not shown (79 KiB).

Binary file not shown (74 KiB).

Binary file not shown (702 KiB).

static/cli_install.png Normal file (binary, 60 KiB)

static/cli_jupyter_lab.png Normal file (binary, 109 KiB)

static/jupyter_lab.png Normal file (binary, 52 KiB)

Binary file not shown (43 KiB).

Binary file not shown (44 KiB).

static/link/to_wiki.png Normal file (binary, 503 B)

static/repo_download.png Normal file (binary, 58 KiB)