Notebooks and code for the book "Introduction to Machine Learning with Python"
The book requires the current stable version of scikit-learn, that is 0.20.0. Most of the book can also be used with previous versions of scikit-learn, though you need to adjust the import for everything from the
This repository provides the notebooks from which the book is created, together with the
mglearn library of helper functions to create figures and datasets.
For the curious ones, the cover depicts a hellbender.
All datasets are included in the repository, with the exception of the aclImdb dataset, which you can download from the page of Andrew Maas. See the book for details.
If you get
ImportError: No module named mglearn
you can try to install mglearn into your Python environment using the command
pip install mglearn
in your terminal or
!pip install mglearn
in Jupyter Notebook.
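To find out whether mglearn is visible to the interpreter before running a notebook, a small check like the following may help. It is only a sketch: the helper name is_installed is made up for illustration and is not part of mglearn or the book's code.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    # find_spec returns None when no module of that name is on the import path.
    return importlib.util.find_spec(module_name) is not None


if is_installed("mglearn"):
    print("mglearn is available")
else:
    # Same remedy as described above.
    print("mglearn not found; try: pip install mglearn")
```

If this reports mglearn as missing inside Jupyter but not in your terminal, the notebook kernel is probably using a different Python environment than the one you installed into.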
Please note that the first print of the book is missing the following line when listing the assumed imports:
from IPython.display import display
Please add this line if you see an error involving
The first print of the book used a function called
plot_group_kfold. This has been renamed to
plot_label_kfold because of a rename in scikit-learn.
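If you want code that works regardless of which of the two names your installed copy provides, one possible approach is to look the function up by name at runtime. The sketch below uses a hypothetical helper, first_available, and demonstrates it on the standard library so that it runs without mglearn. The commented-out usage assumes the plotting helpers live in mglearn.plots, which this README does not state.

```python
import math


def first_available(module, names):
    """Return (name, attribute) for the first of `names` defined on `module`.

    Returns None if the module defines none of them.
    """
    for name in names:
        if hasattr(module, name):
            return name, getattr(module, name)
    return None


# Demonstration on the standard library: the first name does not exist,
# so the lookup falls through to the second one.
name, func = first_available(math, ["no_such_function", "sqrt"])
print(name)

# Assumed usage with mglearn (the module path is an assumption):
# import mglearn
# found = first_available(mglearn.plots, ["plot_label_kfold", "plot_group_kfold"])
# if found is not None:
#     found[1]()
```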
To run the code, you need the packages
pillow. Some of the visualizations of decision trees and neural network structures also require
graphviz. The chapter on text processing also requires
The easiest way to set up an environment is by installing Anaconda.
If you already have a Python environment set up, and you are using the
conda package manager, you can get all packages by running
conda install numpy scipy scikit-learn matplotlib pandas pillow graphviz python-graphviz
For the chapter on text processing you also need to install
conda install nltk spacy
If you already have a Python environment and are using pip to install packages, you need to run
pip install numpy scipy scikit-learn matplotlib pandas pillow graphviz
You also need to install the graphviz C library, which is easiest using a package manager. If you are using OS X and Homebrew, you can run
brew install graphviz. If you are on Ubuntu or Debian, you can run
apt-get install graphviz. Installing graphviz on Windows can be tricky, and using conda / Anaconda is recommended. For the chapter on text processing you also need to install
pip install nltk spacy
For the text processing chapter, you need to download the English language model for spacy using
python -m spacy download en
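After installing, it can be useful to confirm what actually ended up in the environment. The following sketch reports the installed version of each package named in the install commands above, and checks whether the graphviz dot executable is on the PATH. The names REQUIRED, TEXT_CHAPTER and installed_versions are illustrative only. The script uses pip distribution names and needs Python 3.8 or later for importlib.metadata.

```python
import shutil
from importlib.metadata import PackageNotFoundError, version

# Distribution names as used in the pip install commands above.
REQUIRED = ["numpy", "scipy", "scikit-learn", "matplotlib", "pandas", "pillow", "graphviz"]
TEXT_CHAPTER = ["nltk", "spacy"]


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in distributions:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIRED + TEXT_CHAPTER).items():
    print(f"{name}: {ver or 'MISSING'}")

# The Python graphviz package only wraps the C library's command-line tools,
# so also check that the `dot` executable can be found.
print("graphviz 'dot' executable:", shutil.which("dot") or "MISSING")
```

A package listed as MISSING here can be installed with the conda or pip commands given above. This check does not verify that the spacy English language model has been downloaded.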
If you have errata for the (e-)book, please submit them via the O'Reilly website. You can submit fixes to the code as pull requests here, but I'd appreciate it if you would also submit them there, as this repository doesn't hold the "master notebooks".