
About the developer

joelgrus
6.4K stars, 3.7K forks, MIT License, 89 commits, 70 open issues

Description

Code for the Data Science from Scratch book.



Data Science from Scratch

Here's all the code and examples from the second edition of my book Data Science from Scratch. They require at least Python 3.6.

(If you're looking for the code and examples from the first edition, that's in the first-edition folder.)
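If you're not sure which interpreter a script will run under, a small guard at the top can fail fast with a readable message instead of an obscure error further down. This is a minimal sketch; `check_python_version` is a hypothetical helper, not part of the repo. It deliberately uses `str.format` rather than an f-string, so that the check itself still parses on interpreters older than 3.6.

```python
import sys

# Minimum version stated in this README for the second-edition code.
MIN_VERSION = (3, 6)


def check_python_version(min_version=MIN_VERSION):
    """Raise RuntimeError if the running interpreter is older than min_version."""
    # sys.version_info[:2] is a (major, minor) tuple; tuples compare element-wise.
    if sys.version_info[:2] < min_version:
        found = ".".join(str(part) for part in sys.version_info[:3])
        needed = ".".join(str(part) for part in min_version)
        raise RuntimeError(
            "Python {}+ required, found {}".format(needed, found))


check_python_version()
```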

If you want to use the code, you should be able to clone the repo and just do things like

In [1]: from scratch.linear_algebra import dot

In [2]: dot([1, 2, 3], [4, 5, 6])
Out[2]: 32

and so on and so forth.
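To see what that call computes, here is a standalone sketch of a dot product: multiply the vectors componentwise and sum the results. It is an illustration under the assumption that vectors are plain Python lists of numbers, not a copy of `scratch.linear_algebra.dot`, whose exact implementation may differ.

```python
from typing import List

# Assumption: a vector is represented as a plain list of numbers.
Vector = List[float]


def dot(v: Vector, w: Vector) -> float:
    """Computes v_1 * w_1 + ... + v_n * w_n."""
    if len(v) != len(w):
        raise ValueError("vectors must be the same length")
    return sum(v_i * w_i for v_i, w_i in zip(v, w))


print(dot([1, 2, 3], [4, 5, 6]))  # prints 32 (1*4 + 2*5 + 3*6)
```

The length check matters because `zip` silently stops at the shorter list, which would otherwise hide a mismatch.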

Two notes:

  1. In order to use the library like this, you need to be in the root directory (that is, the directory that contains the scratch folder). If you are in the scratch directory itself, the imports won't work.
  2. It's possible that it will just work. It's also possible that you may need to add the root directory to your PYTHONPATH; if you are on Linux or OSX, this is as simple as

     export PYTHONPATH=/path/to/where/you/cloned/this/repo

     (substituting in the real path, of course).

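If you are on Windows, it's potentially more complicated. For a single Command Prompt session, set PYTHONPATH=C:\path\to\where\you\cloned\this\repo does the same thing; in PowerShell, use $env:PYTHONPATH = "C:\path\to\where\you\cloned\this\repo".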
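If setting PYTHONPATH is awkward on your platform, one alternative is to extend `sys.path` from inside Python before importing, which works the same way on Linux, OSX, and Windows. This is a minimal sketch: `add_repo_to_path` is a hypothetical helper, not part of the repo, and the path in the usage comment is a placeholder. It also checks the first note above by refusing a directory that does not contain a scratch folder.

```python
import os
import sys


def add_repo_to_path(repo_root):
    """Prepend repo_root to sys.path so that `import scratch...` resolves.

    repo_root must be the directory that *contains* the scratch folder,
    not the scratch folder itself.
    """
    repo_root = os.path.abspath(repo_root)
    if not os.path.isdir(os.path.join(repo_root, "scratch")):
        raise FileNotFoundError(
            "no 'scratch' folder found in {}".format(repo_root))
    # Avoid adding duplicate entries if this is called more than once.
    if repo_root not in sys.path:
        sys.path.insert(0, repo_root)
    return repo_root


# Hypothetical usage; substitute the real path to your clone:
# add_repo_to_path("/path/to/where/you/cloned/this/repo")
# from scratch.linear_algebra import dot
```

Unlike exporting PYTHONPATH, this change only lasts for the current Python process.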

Table of Contents

  1. Introduction
  2. A Crash Course in Python
  3. Visualizing Data
  4. Linear Algebra
  5. Statistics
  6. Probability
  7. Hypothesis and Inference
  8. Gradient Descent
  9. Getting Data
  10. Working With Data
  11. Machine Learning
  12. k-Nearest Neighbors
  13. Naive Bayes
  14. Simple Linear Regression
  15. Multiple Regression
  16. Logistic Regression
  17. Decision Trees
  18. Neural Networks
  19. Deep Learning
  20. Clustering
  21. Natural Language Processing
  22. Network Analysis
  23. Recommender Systems
  24. Databases and SQL
  25. MapReduce
  26. Data Ethics
  27. Go Forth And Do Data Science
