Practice and tutorial-style notebooks covering a wide variety of machine learning techniques
Carefully curated resource links for data science in one place
pip install numpy
pip install pandas
pip install scikit-learn
pip install scipy
pip install statsmodels
pip install matplotlib
pip install seaborn
pip install sympy
pip install flask
pip install wtforms
pip install "tensorflow>=1.15"
pip install keras
pip install pdpipe
You can start with this article that I wrote in Heartbeat magazine (on Medium platform):
Jupyter notebooks covering a wide range of functions and operations on the topics of NumPy, Pandas, Seaborn, Matplotlib, etc.
Simple linear regression with t-statistic generation
Multiple ways to perform linear regression in Python and their speed comparison (check the article I wrote on freeCodeCamp)
Polynomial regression using scikit-learn pipeline feature (check the article I wrote on Towards Data Science)
Decision trees and Random Forest regression (showing how the Random Forest works as a robust/regularized meta-estimator that resists overfitting)
Detailed visual analytics and goodness-of-fit diagnostic tests for a linear regression problem
Robust linear regression using HuberRegressor from Scikit-learn
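To give a flavor of the robust regression item above, here is a minimal sketch. It is not taken from the notebook: the synthetic data, the true slope of 3, and the choice to corrupt the last ten targets are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

# Synthetic data (assumed for illustration): y = 3x + 2 plus small Gaussian noise.
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = 3.0 * X.ravel() + 2.0 + rng.normal(0.0, 0.5, size=100)

# Corrupt the last ten targets with large positive outliers.
y[-10:] += 50.0

# Ordinary least squares is pulled toward the outliers;
# the Huber loss limits how much each large residual can contribute.
ols = LinearRegression().fit(X, y)
huber = HuberRegressor(max_iter=1000).fit(X, y)

print("OLS slope:  ", ols.coef_[0])
print("Huber slope:", huber.coef_[0])
```

On data like this, the Huber slope should stay much closer to the true value than the ordinary least-squares slope does.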
Logistic regression/classification (Here is the Notebook)
k-nearest neighbor classification (Here is the Notebook)
Decision trees and Random Forest Classification (Here is the Notebook)
Support vector machine classification (Here is the Notebook) (check the article I wrote in Towards Data Science on SVM and sorting algorithm)
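As a quick orientation for the classification notebooks listed above, the following sketch fits the four classifier families on the Iris dataset that ships with scikit-learn. The dataset, the split, and the hyperparameters are assumptions chosen for brevity, not the settings used in the notebooks.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Small built-in dataset, split into stratified train and test sets.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# One representative model per notebook topic (illustrative settings).
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm": SVC(kernel="rbf"),
}

# Fit each model and record its accuracy on the held-out test set.
scores = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```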
K-means clustering (Here is the Notebook)
Affinity propagation (showing its time complexity and the effect of damping factor) (Here is the Notebook)
Mean-shift technique (showing its time complexity and the effect of noise on cluster discovery) (Here is the Notebook)
DBSCAN (showing how it can generically detect areas of high density irrespective of cluster shape, which k-means fails to do) (Here is the Notebook)
Hierarchical clustering with dendrograms, showing how to choose the optimal number of clusters (Here is the Notebook)
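The DBSCAN versus k-means point in the list above can be sketched on the classic two-moons dataset. This is an illustrative example, not the notebook's code: the sample size, noise level, `eps=0.2`, and `min_samples=5` are assumed values that tend to work on this particular dataset.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaving half-circles: non-convex clusters.
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)

# k-means assumes roughly spherical clusters and splits the moons with a straight boundary.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN grows clusters through dense neighborhoods, so it can follow each moon.
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Adjusted Rand index: 1.0 means perfect agreement with the true grouping.
ari_kmeans = adjusted_rand_score(y_true, kmeans_labels)
ari_dbscan = adjusted_rand_score(y_true, dbscan_labels)
print(f"k-means ARI: {ari_kmeans:.2f}")
print(f"DBSCAN ARI:  {ari_dbscan:.2f}")
```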
How to use the SymPy package to generate random datasets from symbolic mathematical expressions.
Here is my article on Medium on this topic: Random regression and classification problem generation with symbolic expression
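The idea can be sketched in a few lines. This is a simplified stand-in rather than the code from the notebook or the article: the function name `generate_regression`, the uniform sampling range, and the additive Gaussian noise are assumptions made for illustration.

```python
import numpy as np
import sympy as sp


def generate_regression(expr_str, n_samples=100, noise_std=0.1, seed=0):
    """Sample random features and evaluate a symbolic expression on them.

    Hypothetical helper: features are drawn uniformly from [-1, 1] and
    Gaussian noise with standard deviation `noise_std` is added to the target.
    """
    expr = sp.sympify(expr_str)
    # Fix the column order by sorting the symbols by name.
    symbols = sorted(expr.free_symbols, key=lambda s: s.name)
    # Turn the symbolic expression into a vectorized NumPy function.
    func = sp.lambdify(symbols, expr, modules="numpy")

    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n_samples, len(symbols)))
    y = func(*X.T) + rng.normal(0.0, noise_std, size=n_samples)
    return X, y


X, y = generate_regression("x1**2 + 3*sin(x2)", n_samples=50)
print(X.shape, y.shape)
```

Because the expression is passed as a string, trying a different functional form only requires changing that one argument.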
Serving a linear regression model through a simple HTTP server interface. The user requests predictions by executing a Python script. Uses Flask and Gunicorn.
Serving a recurrent neural network (RNN) through an HTTP webpage, complete with a web form where users can input parameters and click a button to generate text based on the pre-trained RNN model. Uses Flask, Jinja, Keras/TensorFlow, and WTForms.
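For the linear regression deployment described above, a minimal Flask sketch might look like the following. It is not the code from the repository: the `/predict` route, the JSON payload with a `features` key, and the toy model trained at import time are all assumptions made to keep the example self-contained.

```python
import numpy as np
from flask import Flask, jsonify, request
from sklearn.linear_model import LinearRegression

# Toy model trained at startup (assumed data: y = 2x + 1).
X_train = np.arange(10, dtype=float).reshape(-1, 1)
y_train = 2.0 * X_train.ravel() + 1.0
model = LinearRegression().fit(X_train, y_train)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body such as {"features": [3.0]} (hypothetical format).
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict) or "features" not in payload:
        return jsonify(error="missing 'features'"), 400
    X = np.asarray(payload["features"], dtype=float).reshape(1, -1)
    return jsonify(prediction=float(model.predict(X)[0]))
```

Assuming the file is saved as `app.py`, it could be served with `gunicorn app:app`, and a client script could then POST feature values to `/predict` to get a prediction back.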
Implementing some of the core OOP principles in a machine learning context by building your own Scikit-learn-like estimator, and making it better.
See my articles on Medium on this topic.
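A bare-bones version of such an estimator is sketched below. The class name `LeastSquaresRegressor` and its single `fit_intercept` parameter are hypothetical; the only scikit-learn conventions relied on are the `BaseEstimator` and `RegressorMixin` base classes, the `fit`/`predict` interface, and the trailing underscore on learned attributes.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin


class LeastSquaresRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical scikit-learn-style linear regressor."""

    def __init__(self, fit_intercept=True):
        # Constructor only stores hyperparameters, as scikit-learn expects.
        self.fit_intercept = fit_intercept

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        if self.fit_intercept:
            # Prepend a column of ones so the first coefficient is the intercept.
            A = np.column_stack([np.ones(len(X)), X])
        else:
            A = X
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        if self.fit_intercept:
            self.intercept_, self.coef_ = beta[0], beta[1:]
        else:
            self.intercept_, self.coef_ = 0.0, beta
        return self  # returning self allows method chaining

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.coef_ + self.intercept_


X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0
model = LeastSquaresRegressor().fit(X, y)
print(model.predict([[10.0]]))
```

Inheriting from the two base classes provides `get_params`, `set_params`, and an R² `score` method without any extra code.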
Check the files and detailed instructions in the Pytest directory to understand how to write unit-testing code/modules for machine learning models.
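To show the general shape of such tests, here is a small sketch. It does not reproduce the files in the Pytest directory: the helper `make_data`, the test names, and the noise-free synthetic data are assumptions. Pytest collects any function whose name starts with `test_` and treats a failed `assert` as a test failure.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def make_data(n_samples=50, seed=0):
    """Hypothetical helper: noise-free linear data with known coefficients."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, 3))
    y = X @ np.array([1.5, -2.0, 0.5]) + 4.0
    return X, y


def test_prediction_shape():
    # The model should return one prediction per input row.
    X, y = make_data()
    model = LinearRegression().fit(X, y)
    assert model.predict(X).shape == (X.shape[0],)


def test_recovers_known_coefficients():
    # On noise-free data the fitted parameters should match the true ones.
    X, y = make_data()
    model = LinearRegression().fit(X, y)
    np.testing.assert_allclose(model.coef_, [1.5, -2.0, 0.5], atol=1e-8)
    np.testing.assert_allclose(model.intercept_, 4.0, atol=1e-8)
```

Assuming the file is saved as `test_model.py`, running `pytest test_model.py` would execute both tests.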