897 lines
40 KiB
Text
897 lines
40 KiB
Text
{
|
||
"cells": [
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"# An Introduction to Python and Programming"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"This course is set up to be a *thorough* introduction to programming in [Python](https://www.python.org/).\n",
|
||
"\n",
|
||
"It teaches the concepts behind and the syntax of the core Python language as defined by the [Python Software Foundation](https://www.python.org/psf/) in the official [language reference](https://docs.python.org/3/reference/index.html) and introduces additions to the language as distributed with the [standard library](https://docs.python.org/3/library/index.html) that come with every installation.\n",
|
||
"\n",
|
||
"Furthermore, some very popular third-party libraries like [numpy](https://www.numpy.org/), [pandas](https://pandas.pydata.org/), [matplotlib](https://matplotlib.org/), and others are portrayed."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "-"
|
||
}
|
||
},
|
||
"source": [
|
||
"<img src=\"static/logo.png\" width=\"15%\" align=\"left\">"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"## Prerequisites"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"To be suitable for *total beginners*, there are *no* formal prerequisites. It is only expected that the student has:\n",
|
||
"\n",
|
||
"- a *solid* understanding of the **English language** (i.e., usage of *technical* terms with *narrow* and *distinct* meanings),\n",
|
||
"- knowledge of **basic mathematics** from high school (i.e., addition, subtraction, multiplication, division, and a little bit of calculus and statistics),\n",
|
||
"- the ability to **think conceptually** and **reason logically** (i.e., *not* just memorizing), and\n",
|
||
"- the willingness to **invest 2-4 hours a day for a month** (cf., \"ABC\"-rule at the end)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"## Objective"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"The course's **main goal** is to **prepare** the student **for further studies** in the \"field\" of **data science**."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"This includes but is not limited to more advanced courses on topics such as:\n",
|
||
"- linear algebra\n",
|
||
"- statistics & econometrics\n",
|
||
"- data cleaning & wrangling\n",
|
||
"- data visualization\n",
|
||
"- data engineering (incl. SQL databases)\n",
|
||
"- data mining (incl. web scraping)\n",
|
||
"- feature generation, machine learning, & deep learning\n",
|
||
"- optimization & (meta-)heuristics\n",
|
||
"- algorithms & data structures\n",
|
||
"- quantitative finance (e.g., option valuation)\n",
|
||
"- quantitative marketing (e.g., customer segmentation)\n",
|
||
"- quantitative supply chain management (e.g., forecasting)\n",
|
||
"- management science & decision models\n",
|
||
"- backend / API / web development (to serve data products to clients)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Why data science?"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"The term **[data science](https://en.wikipedia.org/wiki/Data_science)** is rather vague and does actually *not* refer to an academic discipline. Instead the term was popularized by the tech industry who also coined non-meaningful job titles such as \"[rockstar](https://www.quora.com/Why-are-engineers-called-rockstars-and-ninjas)\" or \"[ninja developers](https://www.quora.com/Why-are-engineers-called-rockstars-and-ninjas)\". Most *serious* definitions describe the field as being **multi-disciplinary** *integrating* scientific methods, algorithms, and systems thinking to extract knowledge from (structured and unstructured) data *and* also emphasize the importance of **[domain knowledge](https://en.wikipedia.org/wiki/Domain_knowledge)**.\n",
|
||
"\n",
|
||
"Recently, this integration aspect feeds back into the academic world. The [MIT](https://www.mit.edu/), for example, created the new [Stephen A. Schwarzman College of Computing](http://computing.mit.edu) for [artifical intelligence](https://en.wikipedia.org/wiki/Artificial_intelligence) (with a 1 billion dollar initial investment) where students undergo a \"bilingual\" curriculum with half the classes in quantitative and method-centric fields (like the ones mentioned above) and the other half in domains such as biology, business, chemistry, politics, (art) history, or linguistics (cf., the [official Q&As](http://computing.mit.edu/faq/) or this [NYT article](https://www.nytimes.com/2018/10/15/technology/mit-college-artificial-intelligence.html)). Their strategists see a future where programming skills are just as naturally embedded into every students' studies as are nowadays subjects like calculus, statistics, or academic writing. Then, programming literacy is not just another \"nice to have\" skill but a prerequisite (or an enabler) to understanding more advanced topics in the actual domains studied. This could make teaching easier for top-notch researchers who use a lot of programming in their day-to-day lifes answering big questions: The student and the teacher are then more likely to \"speak\" the same language."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"## Installation"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"To follow this course, a working installation of **Python 3.6** or higher is needed.\n",
|
||
"\n",
|
||
"A popular and beginner friendly way is to install the [Anaconda Distribution](https://www.anaconda.com/distribution/) that not only ships Python and the standard library but comes pre-packaged with a lot of third-party libraries from the so-called \"scientific stack\". Just go to the [download](https://www.anaconda.com/download/) page and install the latest version (i.e., *2019-07* with Python 3.7 at the time of this writing) for your operating system.\n",
|
||
"\n",
|
||
"Then, among others, you will find an entry \"Jupyter Notebook\" in your start menu like below. Click on it and a new tab in your web browser will open where you can switch between folders as you could in your computer's default file browser."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"<img src=\"static/anaconda.png\" width=\"40%\">"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"To download the course's materials as a ZIP file, open the accompanying [GitHub repository](https://github.com/webartifex/intro-to-python) in a web browser and click on the green \"Clone or download\" button on the right. Then, unpack the ZIP file into a folder of your choosing (ideally somewhere within your personal user folder so that the files show up right away)."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Alternative Installation"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"Python can also be installed in a \"pure\" way as obtained from its core development team. However, this is somewhat too \"advanced\" for a beginner who would then also be responsible for setting up all the third-party libraries needed to actually view this document. Plus, many of the involveld steps must be done in a [terminal](https://en.wikipedia.org/wiki/Terminal_emulator) window, which tends to be a bit intimidating for most beginners."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"<img src=\"static/terminal.png\" width=\"50%\" align=\"center\">"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"For more \"couragous\" beginners wanting to learn how to accomplish this, here is a rough sketch of the aspects to know. First, all Python realeases are available for free on the official [download](https://www.python.org/downloads/) page for any supported operating system. Choose the one you want, then download and install it (cf., the [instruction notes](https://wiki.python.org/moin/BeginnersGuide/Download)). As this only includes core Python and the standard library, the beginner then needs to learn about the [pip](https://pip.pypa.io/en/stable/) module, which is to be used inside a terminal. With the command `python -m pip install jupyter`, all necessary third-party libraries can be installed (cf., more background [here](https://jupyter.readthedocs.io/en/latest/install.html)). However, this would be done in a *system-wide* fashion and is not recommended. Instead, the best practice is to create a so-called **virtual environment** with the [venv](https://docs.python.org/3/library/venv.html) module with which the installed third-party packages can be *isolated* on a per-project basis (the command `python -m venv env-name` creates a virtual enviroment called \"env-name\"). This tactic is employed to avoid a situation known as **[dependency hell](https://en.wikipedia.org/wiki/Dependency_hell)** Once created, the virtual environment must then be activated each time before resuming work in each terminal (with the command `source env-name/bin/activate`). While there exist convenience tools that automate parts of this (e.g., [poetry](https://poetry.eustace.io/docs/) or [virtualenvwrapper](https://virtualenvwrapper.readthedocs.io/en/latest/)), it only distracts a beginner from actually studying the Python language. Yet, it is still worthwhile to have heard about these terms and concepts as many online resources often implicitly assume the user to know about them."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"## Jupyter Notebooks"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"The document you are viewing is a so-called [Jupyter notebook](https://jupyter-notebook.readthedocs.io/en/stable/notebook.html), a file format introduced by the [Jupyter Project](https://jupyter.org/).\n",
|
||
"\n",
|
||
"\"Jupyter\" is an [acronym](https://en.wikipedia.org/wiki/Acronym) derived from the names of the three major programming languages **[Julia](https://julialang.org/)**, **[Python](https://www.python.org)**, and **[R](https://www.r-project.org/)**, all of which play significant roles in the world of data science. The Jupyter Project's idea is to serve as an integrating platform such that different programming languages and software packages can be used together within the same project easily.\n",
|
||
"\n",
|
||
"Furthermore, Jupyter notebooks have become a de-facto standard for communicating and exchanging results in the data science community (both in academia and business) and often provide a more intuitive alternative to terminal based ways of running Python (e.g., the default [Python interpreter](https://docs.python.org/3/tutorial/interpreter.html) as shown above or a more advanced interactive version like [IPython](https://ipython.org/)) or even a full-fledged [Integrated Development Environment](https://en.wikipedia.org/wiki/Integrated_development_environment) (e.g., the commercial [PyCharm](https://www.jetbrains.com/pycharm/) or the free [Spyder](https://github.com/spyder-ide/spyder)).\n",
|
||
"\n",
|
||
"In particular, they allow to mix plain English text with Python code cells. The plain text can be formatted using the [Markdown](https://guides.github.com/features/mastering-markdown/) language and mathematical expressions can be typeset with [LaTeX](https://www.overleaf.com/learn/latex/Free_online_introduction_to_LaTeX_%28part_1%29). Lastly, we can include pictures, plots, and even videos. Because of these features, the notebooks developed for this course come in a self-contained \"tutorial\" style that enables students to learn and review the material on their own."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Markdown vs. Code Cells"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"A Jupyter notebook consists of cells that have a type associated with them. So far, only cells of type \"Markdown\" have been used, which is the default way to present (formatted) text.\n",
|
||
"\n",
|
||
"The next cell below is an example of a \"Code\" cell containing a line of actual Python code: it simply outputs the text \"Hello world\" when executed. To edit an existing code cell, just enter into it with a mouse click. You know that you are \"in\" a code cell when you see the frame of the code cell turn green.\n",
|
||
"\n",
|
||
"Besides this **edit mode** there is also a so-called **command mode** that you can reach by hitting the \"Escape\" key after entering a code cell, which turns the frame's color into blue. Using the Enter\" and \"Escape\" keys you can now switch between the two modes.\n",
|
||
"\n",
|
||
"To *execute* (or *run*) a code cell, hold the \"Control\" key and press \"Enter\". Note how you do not go to the subsequent cell. Alternatively, you can hold the \"Shift\" key and press \"Enter\", which executes the cell and places your focus on the next cell right after.\n",
|
||
"\n",
|
||
"Similarly, a Markdown cell is also in either the edit or command mode. For example, simply double-click on the text you are just reading, which takes you into edit mode. Now you could change the formatting (e.g., make a word printed in *italics* or **bold** with single or double asterisks) and then \"execute\" the cell to actually render the text as specified.\n",
|
||
"\n",
|
||
"To change a cell's type, choose either \"Code\" or \"Markdown\" in the navigation bar at the top."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 1,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Hello world\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"print(\"Hello world\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"Sometimes a code cell starts with an exclamation mark `!`. Then, the Jupyter notebook behaves as if you just typed the following commands right into a terminal. The cell below asks `python` to show its version number. This is actually *not* Python code but a command in the [Shell](https://en.wikipedia.org/wiki/Shell_script) language. The `!` is useful to execute short commands without leaving a Jupyter notebook."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 2,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "-"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Python 3.7.3\r\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"!python --version"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"## Programming vs. Computer Science vs. IT"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"For this course *programming* is \"defined\" as\n",
|
||
"- a **structured** way of **problem solving**\n",
|
||
"- by **expressing** the steps of a **computation / process**\n",
|
||
"- and thereby **documenting** the process in a formal way\n",
|
||
"\n",
|
||
"Programming is always **concrete** and based on a **particular case**.\n",
|
||
"\n",
|
||
"Programming definitely exhibits elements of an **art** or a **craft** as we will hear programmers call code \"beautiful\" or \"ugly\" or talk about the \"expressive\" power of an application."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"That is different from *computer science*, which is\n",
|
||
"- a field of study comparable to (applied) **mathematics** that\n",
|
||
"- asks **abstract** questions (\"Is something computable at all?\"),\n",
|
||
"- develops and analyses **algorithms** and **data structures**,\n",
|
||
"- and **proves** the **correctness** of a program\n",
|
||
"\n",
|
||
"In a sense, a computer scientist does not need to know a programming language to work. In fact, many computer scientists only know how to produce \"ugly\" code in the eyes of professional programmers."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"*IT* or *information technology* is a term that has many meanings to many people. Often, it has something to do with hardware and devices both of which are out of scope for programmers and computer scientists. In fact, many people from the two aforementioned fields are more than happy if their printer and internet just work as they do not know a lot more about that than non-technical people."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"## Why Python?"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"### What is Python?"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"- [Guido van Rossum](https://en.wikipedia.org/wiki/Guido_van_Rossum) (Python’s **[Benevolent Dictator for Life](https://en.wikipedia.org/wiki/Benevolent_dictator_for_life)**) was bored during a week around Christmas 1989 and started Python as a hobby project \"that would keep \\[him\\] occupied\" for some days\n",
|
||
"- the idea was to create a **general-purpose scripting** language that would allow fast **prototyping** and would **run on every operating system**\n",
|
||
"- Python grew through the 90s as van Rossum promoted it via his \"Computer Programming for Everybody\" initiative that had the **goal to encourage a basic level of coding literacy** as an equal knowledge alongside English literacy and math skills\n",
|
||
"- to become more independent from its creator the next major version **Python 2** (released in 2000; still in heavy use as of today) was **open-sourced** from the get-go which attracted a **large and global community of programmers** that **contributed** their individual knowledge and best practices in their free time to make Python even better\n",
|
||
"- **Python 3** resulted from a major overhaul of the language in 2008 taking into account the **learnings from almost two decades**, streamlining the language, and getting ready for the age of **big data**\n",
|
||
"- the language is named after the sketch comedy group [Monty Python](https://en.wikipedia.org/wiki/Monty_Python)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"#### Summary"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Python is a **general-purpose** programming language that allows for **fast development**, is **easy to comprehend**, **open-source**, long established, unifies the knowledge of **hundreds of thousands of experts** around the world, runs on basically every machine, and can handle the complexities of applications involving **big data**."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Why open-source?"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"Couldn't a company like Google, Facebook, or Microsoft come up with a better programming language? The following argument provides hints why this cannot really be the case.\n",
|
||
"\n",
|
||
"Wouldn't it be weird if professors and scholars of English literature and language studies dictated how we'd have to speak in day-to-day casual conversations or how authors of poesy and novels should use language constructs to achieve a certain type of mood? If you agree with that premise, it makes sense to assume that even programming languages should evolve in a \"natural\" way as users just *use* the language over time and in new and unpredictable contexts making up their own new conventions."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"Loose *communities* are the main building block around which open-source software is built. Someone starts a project (like Guido) and makes it free to use for anybody (e.g., on a code-sharing platform like [GitHub](https://github.com/)). People find it useful enough to solve one of their daily problems and start using it. They see how a project could be improved and provide new use cases (via the popularized concept of a \"[pull request](https://help.github.com/articles/about-pull-requests/)\"). The project grows both in lines of code and number of people using it. After a while, people start local user groups to share their same interests and meet on a regular basis (e.g., this is a big market for companies like [Meetup](https://www.meetup.com/) or non-profits like [PyData](https://pydata.org/)). Out of these local and usually monthly meetups grow yearly conferences on the country or even continental level (e.g., the original [PyCon](https://us.pycon.org/) in the US, [EuroPython](https://europython.eu/), or [PyCon.DE](https://de.pycon.org/)). The content presented at these conferences is made publicly available via GitHub and YouTube (e.g., [PyCon 2019](https://www.youtube.com/channel/UCxs2IIVXaEHHA4BtTiWZ2mQ) or [EuroPython](http://europython.tv/)) and serves as references on what people are working on and introductions to the endless number of specialized fields."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"While these communities are rather loose and constantly changing, smaller in-groups, often democratically organized and elected (e.g., the Python Software Foundation), take care of, for example, the development of the \"core\" Python language itself.\n",
|
||
"\n",
|
||
"Interestingly, Python is just a specification (i.e., a set of rules) as to what is allowed and what not. The current version of Python can always be lookep up in the [Python Language Reference](https://docs.python.org/3/reference/index.html). In order to make changes to that, anyone can make a so-called **[Python Enhancement Proposal](https://www.python.org/dev/peps/)**, or **PEP** for short, where it needs to be specified what exact changes are to be made and argued why that is a good thing to do. These PEPs are reviewed by the [core developers](https://devguide.python.org/coredev/) and anyone interested and are then either accepted, modified, or rejected if, for example, the change introduces internal inconsistencies. This is very similar to the **double blind peer review** established in academia. In fact, many of the contributors held or hold positions in academia, one more indicator of the high quality standards in the Python community. To learn more about PEPs, check out [PEP 1](https://www.python.org/dev/peps/pep-0001/) that describes the entire process.\n",
|
||
"\n",
|
||
"In total, no one single entity can control how the language evolves and the users' needs and ideas always feed back to the language specification via a quality controlled and \"democratic\" process."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"Besides being free as in **\"free beer\"**, a major benefit of open-source is that one can always **look up how something works in detail** (that is the literal meaning of *open* source). This is a huge benefit compared to commercial languages (e.g., MATLAB) since a programmer can always continue to **study best practices** or how things are done. Along this way, many **errors are uncovered** as well. Furthermore, if one runs an open-source application, one can be reasonably sure that no bad people built in a \"backdoor\". [Free software](https://en.wikipedia.org/wiki/Free_software) is consequently free of charge but brings many other freedoms with it, most notable the freedom to change the code."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Isn't C a lot faster?"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"The \"weird\" thing is that the default Python implementation is actually written in the C language.\n",
|
||
"\n",
|
||
"[C](https://en.wikipedia.org/wiki/C_%28programming_language%29) and [C++](https://en.wikipedia.org/wiki/C%2B%2B) (check this [introduction](https://www.learncpp.com/)) are wide-spread and long established (i.e., since the 1970s) programming languages that are employed in many mission critical softwares (e.g., operating systems themselves, low latency databases and web servers, nuclear reactor control systems, airplanes, ...). They are fast mainly because the programmer not only needs to come up with the **business logic** but also manage the computer's memory \"manually\" (and the knowledge necessary to do that is not easy to learn).\n",
|
||
"\n",
|
||
"In contrast, Python automatically manages the memory for the programmer. This speeds up the development process a lot. So speed here is really a trade-off of application run time vs. engineering / development time and in many applications the program's run time is not that important (e.g., what if C needs 0.001 seconds in a case where Python needs 0.1 seconds to train a linear regression model?). When the requirements change and computing speed becomes an issue, the Python community offers many third-party libraries (usually also written in C) where specific problems can be solved in near-C time.\n",
|
||
"\n",
|
||
"**In a nutshell**: While it is of course true that C is a lot faster than Python when it comes to **pure computation time**, in many cases this does not really matter as the **significantly shorter development cycles** are the more important cost factor in a rapidly changing business world."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Who uses it?"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"<img src=\"static/logos.png\" width=\"70%\">"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"While it is usually not the best argument to quote authorative figures like the pope, we will do this in this section anyways:\n",
|
||
"\n",
|
||
"- **[Massachusetts Institute of Technology](https://www.mit.edu/)**\n",
|
||
" - teaches Python in its [introductory course](https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-0001-introduction-to-computer-science-and-programming-in-python-fall-2016/) to computer science independent of the student's major\n",
|
||
" - replaced the infamous course on the [Scheme](https://groups.csail.mit.edu/mac/projects/scheme/) language ([source](https://news.ycombinator.com/item?id=602307))\n",
|
||
"- **[Google](https://www.google.com/)**\n",
|
||
" - used the strategy **\"Python where we can, C++ where we must\"** from its early days on to stay flexible in a rapidly changing environment ([source](https://stackoverflow.com/questions/2560310/heavy-usage-of-python-at-google))\n",
|
||
" - the very first web-crawler was written in **Java and so difficult** to maintain that it was **re-written in Python** right away ([source](https://www.amazon.com/Plex-Google-Thinks-Works-Shapes/dp/1416596585/ref=sr_1_1?ie=UTF8&qid=1539101827&sr=8-1&keywords=in+the+plex))\n",
|
||
" - Python and C++, Java, and Go are the only four server-side languages to be deployed to production\n",
|
||
" - Guido van Rossom was hired by Google from 2005 to 2012 to advance the language there\n",
|
||
"- **[NASA](https://www.nasa.gov/)** open-sources a lot of its projects, many of which are written in Python and regard working with really big data ([source](https://code.nasa.gov/language/python/))\n",
|
||
"- **[Facebook](https://facebook.com/)** uses Python besides C++ and its legacy PHP (a language for building websites; the \"cool kid\" from the early 2000s)\n",
|
||
"- **[Instagram](https://instagram.com/)** operates the largest installation of the popular **web framework [Django](https://www.djangoproject.com/)** ([source](https://instagram-engineering.com/web-service-efficiency-at-instagram-with-python-4976d078e366))\n",
|
||
"- **[Spotify](https://spotify.com/)** bases its data science on Python ([source](https://labs.spotify.com/2013/03/20/how-we-use-python-at-spotify/))\n",
|
||
"- **[Netflix](https://netflix.com/)** also runs its predictive models on Python ([source](https://medium.com/netflix-techblog/python-at-netflix-86b6028b3b3e))\n",
|
||
"- **[Dropbox](https://dropbox.com/)** \"stole\" Guido van Rossom from Google to help scale the platform ([source](https://medium.com/dropbox-makers/guido-van-rossum-on-finding-his-way-e018e8b5f6b1))\n",
|
||
"- **[JPMorgan Chase](https://www.jpmorganchase.com/)** requires new employees to learn Python as part of the onboarding process starting with the 2018 intake ([source](https://www.ft.com/content/4c17d6ce-c8b2-11e8-ba8f-ee390057b8c9?segmentId=a7371401-027d-d8bf-8a7f-2a746e767d56))\n",
|
||
"- ...\n",
|
||
"- ... this list is intentionally shortened as Python is simply very popular ..."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"As images tell more than words, here are two plots of popular languages' \"market shares\" based on the number of questions asked on [Stack Overflow](https://stackoverflow.blog/2017/09/06/incredible-growth-python/), the most relevant platform for answering programming related questions: As of late 2017, Python surpassed [Java](https://www.java.com/en/), heavily used in big corporates, and [JavaScript](https://developer.mozilla.org/en-US/docs/Web/JavaScript), the \"language of the internet\" that does everything in web browsers, in popularity. Two blog posts from \"technical\" people explain this in more depth to the layman: [Stack Overflow](https://stackoverflow.blog/2017/09/14/python-growing-quickly/) and [DataCamp](https://www.datacamp.com/community/blog/python-scientific-computing-case)."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"<img src=\"static/growth_major_languages.png\" width=\"50%\">"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"As the graph below shows, neither Google's very own language **[Go](https://golang.org/)** nor **[R](https://www.r-project.org/)**, a very popular language in the niche of statistics and data science, can compete with Python's year-to-year growth."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"<img src=\"static/growth_smaller_languages.png\" width=\"50%\">"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"Even popular news and media outlets notice the recent popularity of Python: [Economist](https://www.economist.com/graphic-detail/2018/07/26/python-is-becoming-the-worlds-most-popular-coding-language), [Huffington Post](https://www.huffingtonpost.com/entry/why-python-is-the-best-programming-language-with-which_us_59ef8f62e4b04809c05011b9), [TechRepublic](https://www.techrepublic.com/article/why-python-is-so-popular-with-developers-3-reasons-the-language-has-exploded/), and [QZ](https://qz.com/1408660/the-rise-of-python-as-seen-through-a-decade-of-stack-overflow/)."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"## How to learn Programming"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### The ABC Rule"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"Simply put, **always be coding**.\n",
|
||
"\n",
|
||
"Programming is more than just writing code into a text file. It means reading through parts of documentation, blogs with best practices, and tutorials, or researching some problem on Stack Overflow while trying to implement some feature in the application at hand. Also, it means using command line tools to automate some part of the work, or manage different versions of a software simultaneously using software like **[git](https://git-scm.com/)**. In short, programming involves a lot of \"muscle memory\". This can only be built and kept up through near-daily usage.\n",
|
||
"\n",
|
||
"Further, many aspects of software architecture and best practices can only be understood after having implemented some requirement for the very first time. Coding also means \"breaking\" things to find out what makes them actually work to begin with.\n",
|
||
"\n",
|
||
"Coding is learned best by just doing it for some time on a daily or at least regular basis and not right before some task is due, just like learning a \"real\" language."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### The Maker's Schedule"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"[Y Combinator](https://www.ycombinator.com/)'s co-founder [Paul Graham](https://en.wikipedia.org/wiki/Paul_Graham_%28programmer%29) wrote a very popular and often cited [article](http://www.paulgraham.com/makersschedule.html) where he divides every person into belonging to one of two groups:\n",
|
||
"\n",
|
||
"- **Managers**: People that need to organize things and command others (like a \"boss\"). Their schedule is usually organized by the hour or even by 30 minute intervals.\n",
|
||
"- **Makers**: People that create things (like programmers, artists, or writers). Such people think in half days or full days.\n",
|
||
"\n",
|
||
"Have you ever wondered why so many tech nerds work all the nights and sleep during \"weird\" hours? This is mainly because many programming related tasks require a certain \"flow\" state of one's mind. This is hard to achieve when one can get interupted, even if it is only for one short question. Graham describes that only knowing that one has an appointment in three hours can cause a programmer to not get into a flow state.\n",
|
||
"\n",
|
||
"As a result, do not set aside a certain amount of time for learning something but rather plan in an **entire evening** or a **rainy Sunday** where you can work on a problem in an **open end** setting. And do not be surprised any more to hear a \"I looked at it over the weekend\" from a programmer."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Phase Iteration - The $\\pi$ Rule"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"When being asked the above question, most programmers will answer something that can be classified into one of two broader groups.\n",
|
||
"\n",
|
||
"**1) Toy Problem / Case Study / Prototype**: Pick some problem that you want to write a programatic solution for, break it down into smaller packages, and solve them \"backwards\".\n",
|
||
"\n",
|
||
"**2) Books / Video Tutorials / Courses**: Research the best book / blog post / video tutorial for something and work it through from start to end.\n",
|
||
"\n",
|
||
"The truth is that you need to iterate between these two phases.\n",
|
||
"\n",
|
||
"Building a prototype will always reveal issues no book or tutorial can think of before. Data is never clean as it comes. Some algorithm from a text book must be adapted to a very specific but important aspect of a case study. It is important to learn to \"ship a product\", i.e., to finish some project to the very end because only then you will have looked at all the aspects.\n",
|
||
"\n",
|
||
"The major downside of this approach is that you likely learn bad \"patterns\" as whatever you learn is overfitted to the case and you do not get the big picture or mental concepts behind a solution. This is a gap well written books can fill in (e.g., check the Python / Programming books by [Packt](https://www.packtpub.com/packt/offers/free-learning/) or [O’Reilly](https://www.oreilly.com/))."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"## Contents"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"**Part 1: Expressing Logic**\n",
|
||
"\n",
|
||
"- What is a programming language?\n",
|
||
" 1. Structure of a Program\n",
|
||
"- What kind of words exist?\n",
|
||
" 2. Variables, Expressions, Statements, & Comments\n",
|
||
" 3. Functions\n",
|
||
"- How can we form sentences from words?\n",
|
||
" 4. Conditionals\n",
|
||
" 5. Iteration"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"**Part 2: Managing Data**\n",
|
||
"\n",
|
||
"- How is data stored in memory?\n",
|
||
" 6. Numbers\n",
|
||
" 7. Text\n",
|
||
" 8. Sequences\n",
|
||
" 9. Mappings & Sets\n",
|
||
"- How can we create our own data types?\n",
|
||
" 10. Classes & Instances"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"## xkcd Comic"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"As with every good lecture, there always has to be a [xkcd](https://xkcd.com/353/) comic somewhere."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 3,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"import antigravity"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"source": [
|
||
"<img src=\"static/xkcd.png\" width=\"30%\">"
|
||
]
|
||
}
|
||
],
|
||
"metadata": {
|
||
"kernelspec": {
|
||
"display_name": "Python 3",
|
||
"language": "python",
|
||
"name": "python3"
|
||
},
|
||
"language_info": {
|
||
"codemirror_mode": {
|
||
"name": "ipython",
|
||
"version": 3
|
||
},
|
||
"file_extension": ".py",
|
||
"mimetype": "text/x-python",
|
||
"name": "python",
|
||
"nbconvert_exporter": "python",
|
||
"pygments_lexer": "ipython3",
|
||
"version": "3.7.3"
|
||
},
|
||
"livereveal": {
|
||
"auto_select": "code",
|
||
"auto_select_fragment": true,
|
||
"scroll": true,
|
||
"theme": "serif"
|
||
},
|
||
"toc": {
|
||
"base_numbering": 1,
|
||
"nav_menu": {},
|
||
"number_sections": false,
|
||
"sideBar": true,
|
||
"skip_h1_title": true,
|
||
"title_cell": "Table of Contents",
|
||
"title_sidebar": "Contents",
|
||
"toc_cell": false,
|
||
"toc_position": {
|
||
"height": "calc(100% - 180px)",
|
||
"left": "10px",
|
||
"top": "150px",
|
||
"width": "384px"
|
||
},
|
||
"toc_section_display": false,
|
||
"toc_window_display": false
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 2
|
||
}
|