"It teaches the concepts behind and the syntax of the core Python language as defined by the [Python Software Foundation](https://www.python.org/psf/) in the official [language reference](https://docs.python.org/3/reference/index.html). Furthermore, it introduces commonly used functionalities from the [standard library](https://docs.python.org/3/library/index.html) and popular third-party libraries like [numpy](https://www.numpy.org/), [pandas](https://pandas.pydata.org/), [matplotlib](https://matplotlib.org/), and others."
"The term **[data science](https://en.wikipedia.org/wiki/Data_science)** is rather vague and does *not* refer to an academic discipline. Instead, the term was popularized by the tech industry, who also coined non-meaningful job titles such as \"[rockstar](https://www.quora.com/Why-are-engineers-called-rockstars-and-ninjas)\" or \"[ninja developers](https://www.quora.com/Why-are-engineers-called-rockstars-and-ninjas).\" Most *serious* definitions describe the field as being **multi-disciplinary** *integrating* scientific methods, algorithms, and systems thinking to extract knowledge from (structured and unstructured) data *and* also emphasize the importance of **[domain knowledge](https://en.wikipedia.org/wiki/Domain_knowledge)**.\n",
"Recently, this integration aspect feeds back into the academic world. The [MIT](https://www.mit.edu/), for example, created the new [Stephen A. Schwarzman College of Computing](http://computing.mit.edu) for [artificial intelligence](https://en.wikipedia.org/wiki/Artificial_intelligence) (with a 1 billion dollar initial investment) where students undergo a \"bilingual\" curriculum with half the classes in quantitative and method-centric fields (like the ones mentioned above) and the other half in domains such as biology, business, chemistry, politics, (art) history, or linguistics (cf., the [official Q&As](http://computing.mit.edu/faq/) or this [NYT article](https://www.nytimes.com/2018/10/15/technology/mit-college-artificial-intelligence.html)). Their strategists see a future where programming skills are just as naturally embedded into every students' curricula as are nowadays subjects like calculus, statistics, or academic writing. Then, programming literacy is not just another \"nice to have\" skill but a prerequisite, or an enabler, to understanding more advanced topics in the actual domains studied. Top-notch researchers who use programming in their day-to-day lives could then teach students more efficiently in their \"language.\""
"A popular and beginner-friendly way is to install the [Anaconda Distribution](https://www.anaconda.com/distribution/) that not only ships Python and the standard library but comes pre-packaged with a lot of third-party libraries from the so-called \"scientific stack.\" Just go to the [download](https://www.anaconda.com/download/) page and install the latest version (i.e., *2019-10* with Python 3.7 at the time of this writing) for your operating system.\n",
"A new tab in your web browser opens with the website being \"localhost\" and some number (e.g., 8888). This is the [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) application that is used to display and run the Jupyter notebooks mentioned above. On the left, you see the files and folders in your local user folder. This file browser works like any other. In the center, you have several options to launch a new notebook file."
"Next, to download the materials accompanying this book as a ZIP file, open this [GitHub repository](https://github.com/webartifex/intro-to-python) in a web browser, and click on the green \"Clone or download\" button on the top right. Then, unpack the ZIP file into a folder of your choosing, ideally somewhere within your personal user folder so that the files show up right away in JupyterLab."
"The document you are viewing is a so-called [Jupyter notebook](https://jupyter-notebook.readthedocs.io/en/stable/notebook.html), a file format introduced by the [Jupyter Project](https://jupyter.org/).\n",
"\n",
"\"Jupyter\" is an [acronym](https://en.wikipedia.org/wiki/Acronym) derived from the names of the three major programming languages **[Julia](https://julialang.org/)**, **[Python](https://www.python.org)**, and **[R](https://www.r-project.org/)**, all of which play significant roles in the world of data science. The Jupyter Project's idea is to serve as an integrating platform such that different programming languages and software packages can be used together within the same project easily.\n",
"Furthermore, Jupyter notebooks have become a de-facto standard for communicating and exchanging results in the data science community (both in academia and business) and often provide a more intuitive alternative to terminal-based ways of running Python (e.g., the default [Python interpreter](https://docs.python.org/3/tutorial/interpreter.html) as shown above or a more advanced interactive version like [IPython](https://ipython.org/)) or even a full-fledged [Integrated Development Environment](https://en.wikipedia.org/wiki/Integrated_development_environment) (e.g., the commercial [PyCharm](https://www.jetbrains.com/pycharm/) or the free [Spyder](https://github.com/spyder-ide/spyder)).\n",
"In particular, they allow mixing plain English text with Python code cells. The plain text can be formatted using the [Markdown](https://guides.github.com/features/mastering-markdown/) language, and mathematical expressions can be typeset with [LaTeX](https://www.overleaf.com/learn/latex/Free_online_introduction_to_LaTeX_%28part_1%29). Lastly, we can include pictures, plots, and even videos. Because of these features, the notebooks developed for this book come in a self-contained \"tutorial\" style that enables students to learn and review the material on their own."
"A Jupyter notebook consists of cells that have a type associated with them. So far, only cells of type \"Markdown\" have been used, which is the default way to present formatted text.\n",
"The next cell below is an example of a \"Code\" cell containing a line of actual Python code: it merely outputs the text \"Hello world\" when executed. To edit an existing code cell, enter into it with a mouse click. You know that you are \"in\" a code cell when you see the frame of the code cell turn green.\n",
"Besides this **edit mode**, there is also a so-called **command mode** that you can reach by hitting the \"Escape\" key after entering a code cell, which turns the frame's color blue. Using the \"Enter\" and \"Escape\" keys, you can now switch between the two modes.\n",
"To *execute*, or \"*run*,\" a code cell, hold the \"Control\" key and press \"Enter.\" Note how you do not go to the subsequent cell. Alternatively, you can hold the \"Shift\" key and press \"Enter,\" which executes the cell and places your focus on the next cell right after.\n",
"Similarly, a Markdown cell is also in either edit or command mode. For example, double-click on the text you are just reading, which takes you into edit mode. Now you could change the formatting (e.g., make a word printed in *italics* or **bold** with single or double asterisks) and then \"execute\" the cell to render the text as specified.\n",
"Sometimes a code cell starts with an exclamation mark `!`. Then, the Jupyter notebook behaves as if the following command were typed directly into a terminal. The cell below asks `python` to show its version number and is *not* Python code but a command in the [Shell](https://en.wikipedia.org/wiki/Shell_script) language. The `!` is useful to execute short commands without leaving a Jupyter notebook."
"It exhibits elements of a form of **art** or a **craft** as we hear programmers call code \"beautiful\" or \"ugly\" or talk about the \"expressive\" power of an application."
"In a sense, a computer scientist does not need to know a programming language to work, and many computer scientists only know how to produce \"ugly\"-looking code in the eyes of professional programmers."
"*IT* or *information technology* is a term that has many meanings to many people. Often, it has something to do with hardware or physical devices, both of which are out of scope for programmers and computer scientists. Many computer scientists and programmers are more than happy if their printer and internet connection work as they often do not know a lot more about that than non-technical people."
"Here is a brief history of and some background on Python (cf., also this [TechRepublic article](https://www.techrepublic.com/article/python-is-eating-the-world-how-one-developers-side-project-became-the-hottest-programming-language-on-the-planet/) for a more elaborate story):\n",
"- [Guido van Rossum](https://en.wikipedia.org/wiki/Guido_van_Rossum) (Python’s **[Benevolent Dictator for Life](https://en.wikipedia.org/wiki/Benevolent_dictator_for_life)**) was bored during a week around Christmas 1989 and started Python as a hobby project \"that would keep \\[him\\] occupied\" for some days\n",
"- the idea was to create a **general-purpose scripting** language that would allow fast **prototyping** and would **run on every operating system**\n",
"- Python grew through the 90s as van Rossum promoted it via his \"Computer Programming for Everybody\" initiative that had the **goal to encourage a basic level of coding literacy** as an equal knowledge alongside English literacy and math skills\n",
"- to become more independent from its creator the next major version **Python 2** (released in 2000; still in heavy use as of today) was **open-sourced** from the get-go which attracted a **large and global community of programmers** that **contributed** their expertise and best practices in their free time to make Python even better\n",
"- **Python 3** resulted from a significant overhaul of the language in 2008 taking into account the **learnings from almost two decades**, streamlining the language, and getting ready for the age of **big data**\n",
"Python is a **general-purpose** programming language that allows for **fast development**, is **easy to read**, **open-source**, long-established, unifies the knowledge of **hundreds of thousands of experts** around the world, runs on basically every machine, and can handle the complexities of applications involving **big data**."
"Couldn't a company like Google, Facebook, or Microsoft come up with a better programming language? The following argument provides hints on why this cannot be the case.\n",
"Wouldn't it be weird if professors and scholars of English literature and language studies dictated how we'd have to speak in day-to-day casual conversations or how authors of poesy and novels should use language constructs to achieve a particular type of mood? If you agree with that premise, it makes sense to assume that even programming languages should evolve in a \"natural\" way as users *use* the language over time and in new and unpredictable contexts creating new conventions."
"Loose *communities* are the primary building block around which open-source software projects are built. Someone, like Guido, starts a project and makes it free to use for anybody (e.g., on a code-sharing platform like [GitHub](https://github.com/)). People find it useful enough to solve one of their daily problems and start using it. They see how a project could be improved and provide new use cases (via the popularized concept of a \"[pull request](https://help.github.com/articles/about-pull-requests/)\"). The project grows both in lines of code and people using it. After a while, people start local user groups to share their same interests and meet regularly (e.g., this is a big market for companies like [Meetup](https://www.meetup.com/) or non-profits like [PyData](https://pydata.org/)). Out of these local and usually monthly meetups grow yearly conferences on the country or even continental level (e.g., the original [PyCon](https://us.pycon.org/) in the US, [EuroPython](https://europython.eu/), or [PyCon.DE](https://de.pycon.org/)). The content presented at these conferences is made publicly available via GitHub and YouTube (e.g., [PyCon 2019](https://www.youtube.com/channel/UCxs2IIVXaEHHA4BtTiWZ2mQ) or [EuroPython](http://europython.tv/)) and serves as references on what people are working on and introductions to the endless number of specialized fields."
"While these communities are somewhat loose and continuously changing, smaller in-groups, often democratically organized and elected (e.g., the [Python Software Foundation](https://www.python.org/psf/)), take care of, for example, the development of the \"core\" Python language itself.\n",
"Interestingly, Python is just a specification (i.e., a set of rules) as to what is allowed and what not. The current version of Python can always be looked up in the [Python Language Reference](https://docs.python.org/3/reference/index.html). To make changes to that, anyone can make a so-called **[Python Enhancement Proposal](https://www.python.org/dev/peps/)**, or **PEP** for short, where it needs to be specified what exact changes are to be made and argued why that is a good thing to do. These PEPs are reviewed by the [core developers](https://devguide.python.org/coredev/) and interested people and are then either accepted, modified, or rejected if, for example, the change introduces internal inconsistencies. This process is similar to the **double-blind peer review** established in academia. Many of the contributors held or hold positions in academia, one more indicator of the high quality standards in the Python community. To learn more about PEPs, check out [PEP 1](https://www.python.org/dev/peps/pep-0001/) that describes the entire process.\n",
"In total, no one single entity can control how the language evolves, and the users' needs and ideas always feed back to the language specification via a quality controlled and \"democratic\" process."
"Besides being free as in **\"free beer**,\" a major benefit of open-source is that one can always **look up how something works in detail**: That is the literal meaning of *open* source and different as compared to commercial languages (e.g., MATLAB) since a programmer can always continue to **study best practices** or find out how things are implemented. Along this way, many **errors are uncovered** as well. Furthermore, if one runs an open-source application, one can be reasonably sure that no bad people built in a \"backdoor.\" [Free software](https://en.wikipedia.org/wiki/Free_software) is consequently free of charge but brings many other freedoms with it, most notably the freedom to change the code."
"[C](https://en.wikipedia.org/wiki/C_%28programming_language%29) and [C++](https://en.wikipedia.org/wiki/C%2B%2B) (cf., this [introduction](https://www.learncpp.com/)) are wide-spread and long-established (i.e., since the 1970s) programming languages employed in many mission-critical software systems (e.g., operating systems themselves, low latency databases and web servers, nuclear reactor control systems, airplanes, ...). They are fast, mainly because the programmer not only needs to come up with the **business logic** but also manage the computer's memory \"manually\" (and the knowledge necessary to do that is not easy to learn).\n",
"In contrast, Python automatically manages the memory for the programmer. So, speed here is a trade-off between application run time and engineering/development time. Often, the program's run time is not that important: For example, what if C needs 0.001 seconds in a case where Python needs 0.1 seconds to train a linear regression model? When the requirements change and computing speed becomes an issue, the Python community offers many third-party libraries (usually also written in C) where specific problems can be solved in near-C time.\n",
"**In a nutshell**: While it is, of course, true that C is a lot faster than Python when it comes to **pure computation time**, often, this does not matter as the **significantly shorter development cycles** are the more significant cost factor in a rapidly changing business world."
"While it is usually not the best argument to quote authoritative figures like the pope, we briefly look at who uses Python here and leave it up to the reader to decide if this is convincing or not:\n",
"- **[Massachusetts Institute of Technology](https://www.mit.edu/)**\n",
" - teaches Python in its [introductory course](https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-0001-introduction-to-computer-science-and-programming-in-python-fall-2016/) to computer science independent of the student's major\n",
" - replaced the infamous course on the [Scheme](https://groups.csail.mit.edu/mac/projects/scheme/) language (cf., [source](https://news.ycombinator.com/item?id=602307))\n",
" - used the strategy **\"Python where we can, C++ where we must\"** from its early days on to stay flexible in a rapidly changing environment (cf., [source](https://stackoverflow.com/questions/2560310/heavy-usage-of-python-at-google))\n",
" - the very first web-crawler was written in **Java and so difficult to maintain** that it was **rewritten in Python** right away (cf., [source](https://www.amazon.com/Plex-Google-Thinks-Works-Shapes/dp/1416596585/ref=sr_1_1?ie=UTF8&qid=1539101827&sr=8-1&keywords=in+the+plex))\n",
"- **[NASA](https://www.nasa.gov/)** open-sources many of its projects, often written in Python and regarding analyses with big data (cf., [source](https://code.nasa.gov/language/python/))\n",
"- **[Facebook](https://facebook.com/)** uses Python besides C++ and its legacy PHP (a language for building websites; the \"cool kid\" from the early 2000s)\n",
"- **[Instagram](https://instagram.com/)** operates the largest installation of the popular **web framework [Django](https://www.djangoproject.com/)** (cf., [source](https://instagram-engineering.com/web-service-efficiency-at-instagram-with-python-4976d078e366))\n",
"- **[Spotify](https://spotify.com/)** bases its data science on Python (cf., [source](https://labs.spotify.com/2013/03/20/how-we-use-python-at-spotify/))\n",
"- **[Netflix](https://netflix.com/)** also runs its predictive models on Python (cf., [source](https://medium.com/netflix-techblog/python-at-netflix-86b6028b3b3e))\n",
"- **[Dropbox](https://dropbox.com/)** \"stole\" Guido van Rossom from Google to help scale the platform (cf., [source](https://medium.com/dropbox-makers/guido-van-rossum-on-finding-his-way-e018e8b5f6b1))\n",
"- **[JPMorgan Chase](https://www.jpmorganchase.com/)** requires new employees to learn Python as part of the onboarding process starting with the 2018 intake (cf., [source](https://www.ft.com/content/4c17d6ce-c8b2-11e8-ba8f-ee390057b8c9?segmentId=a7371401-027d-d8bf-8a7f-2a746e767d56))"
"As images tell more than words, here are two plots of popular languages' \"market shares\" based on the number of questions asked on [Stack Overflow](https://stackoverflow.blog/2017/09/06/incredible-growth-python/), the most relevant platform for answering programming-related questions: As of late 2017, Python surpassed [Java](https://www.java.com/en/), heavily used in big corporates, and [JavaScript](https://developer.mozilla.org/en-US/docs/Web/JavaScript), the \"language of the internet\" that does everything in web browsers, in popularity. Two blog posts from \"technical\" people explain this in more depth to the layman: [Stack Overflow](https://stackoverflow.blog/2017/09/14/python-growing-quickly/) and [DataCamp](https://www.datacamp.com/community/blog/python-scientific-computing-case)."
"As the graph below shows, neither Google's very own language **[Go](https://golang.org/)** nor **[R](https://www.r-project.org/)**, a domain-specific language in the niche of statistics, can compete with Python's year-to-year growth."
"[IEEE Sprectrum](https://spectrum.ieee.org/computing/software/the-top-programming-languages-2019) provides a more recent comparison of programming language's popularity. Even news and media outlets notice the recent popularity of Python: [Economist](https://www.economist.com/graphic-detail/2018/07/26/python-is-becoming-the-worlds-most-popular-coding-language), [Huffington Post](https://www.huffingtonpost.com/entry/why-python-is-the-best-programming-language-with-which_us_59ef8f62e4b04809c05011b9), [TechRepublic](https://www.techrepublic.com/article/why-python-is-so-popular-with-developers-3-reasons-the-language-has-exploded/), and [QZ](https://qz.com/1408660/the-rise-of-python-as-seen-through-a-decade-of-stack-overflow/)."
"Programming is more than just writing code into a text file. It means reading through parts of the [documentation](https://docs.python.org/), blogs with best practices, and tutorials, or researching problems on Stack Overflow while trying to implement features in the application at hand. Also, it means using command-line tools to automate some part of the work or manage different versions of a program, for example, with **[git](https://git-scm.com/)**. In short, programming involves a lot of \"muscle memory,\" which can only be built and kept up through near-daily usage.\n",
"Further, many aspects of software architecture and best practices can only be understood after having implemented some requirements for the very first time. Coding also means \"breaking\" things to find out what makes them work in the first place.\n",
"Coding is learned best by just doing it for some time on a daily or at least a regular basis and not right before some task is due, just like learning a \"real\" language."
"[Y Combinator](https://www.ycombinator.com/)'s co-founder [Paul Graham](https://en.wikipedia.org/wiki/Paul_Graham_%28programmer%29) wrote a very popular and often cited [article](http://www.paulgraham.com/makersschedule.html) where he divides every person into belonging to one of two groups:\n",
"- **Managers**: People that need to organize things and command others (like a \"boss\"). Their schedule is usually organized by the hour or even 30-minute intervals.\n",
"Have you ever wondered why so many tech people work during nights and sleep at \"weird\" times? The reason is that many programming-related tasks require a \"flow\" state in one's mind that is hard to achieve when one can get interrupted, even if it is only for one short question. Graham describes that only knowing that one has an appointment in three hours can cause a programmer to not get into a flow state.\n",
"As a result, do not set aside a certain amount of time for learning something but rather plan in an **entire evening** or a **rainy Sunday** where you can work on a problem in an **open end** setting. And do not be surprised anymore to hear \"I looked at it over the weekend\" from a programmer."
"Building a prototype always reveals issues no book or tutorial can think of before. Data is never as clean as it should be. An algorithm from a textbook must be adapted to a peculiar aspect of a case study. It is essential to learn to \"ship a product\" because only then will one have looked at all the aspects.\n",
"The major downside of this approach is that one likely learns bad \"patterns\" overfitted to the case at hand, and one does not get the big picture or mental concepts behind a solution. This gap can be filled in by well-written books: For example, check the Python/programming books offered by [Packt](https://www.packtpub.com/packt/offers/free-learning/) or [O’Reilly](https://www.oreilly.com/)."