Awesome Software Engineering for Machine Learning
Software Engineering for Machine Learning comprises techniques and guidelines for building ML applications that concern not the core ML problem (e.g. the development of new algorithms) but the surrounding activities, such as data ingestion, coding, testing, versioning, deployment, quality control, and team collaboration.
Good software engineering practices enhance the development, deployment, and maintenance of production-level applications that use machine learning components.
🎓 Scientific publication
Based on this literature, we compiled a survey on the adoption of software engineering practices for applications with machine learning components.
Feel free to take and share the survey and to read more!
These resources cover all aspects of software engineering for machine learning.
How to manage the data sets you use in machine learning.
How to organize your model training experiments.
Deployment and Operation
How to deploy and operate your models in a production environment.
How to organize teams and projects to ensure effective collaboration and accountability.
Tooling can make your life easier.
We only share open source tools, or commercial platforms that offer substantial free packages for research.
Airflow - Programmatically author, schedule and monitor workflows.
Auto-PyTorch - Automatic architecture search and hyperparameter optimization for PyTorch.
Ax - Optimize any kind of experiment, including machine learning experiments, A/B tests, and simulations.
Data Version Control (DVC) - A management tool for data and ML experiments.
Facets Overview / Facets Dive - Robust visualizations to aid in understanding machine learning datasets.
Git Large File Storage (LFS) - Replaces large files such as datasets with text pointers inside Git.
HParams - A thoughtful approach to configuration management for machine learning projects.
Kubeflow - A platform for data scientists who want to build and experiment with ML pipelines.
Label Studio - A multi-type data labeling and annotation tool with standardized output format.
MLflow - Manage the ML lifecycle, including experimentation, deployment, and a central model registry.
Neptune.ai - Experiment tracking tool bringing organization and collaboration to data science projects.
Neuraxle - Sklearn-like framework for hyperparameter tuning and AutoML in deep learning projects.
OpenML - An inclusive movement to build an open, organized, online ecosystem for machine learning.
Spark Machine Learning - Spark’s ML library consisting of common learning algorithms and utilities.
Sacred - A tool to help you configure, organize, log and reproduce experiments.
TensorBoard - TensorFlow's Visualization Toolkit.
TensorFlow Extended (TFX) - An end-to-end platform for deploying production ML pipelines.
Weights & Biases - Experiment tracking, model optimization, and dataset versioning.
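To make the "text pointers" idea behind Git LFS concrete, here is a minimal sketch of what such a pointer looks like for a small dataset. It assumes the commonly documented pointer layout (a version line, a SHA-256 object ID, and a byte size). The function name `lfs_pointer` and the sample data are hypothetical and exist only for illustration; this is not how Git LFS itself is implemented or invoked.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Build a Git LFS-style pointer text for the given file content.

    Hypothetical helper for illustration: the real tool creates pointers
    for you once a file pattern is tracked.
    """
    # The object ID is the SHA-256 digest of the file's raw bytes.
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


if __name__ == "__main__":
    # Made-up sample data standing in for a large dataset file.
    dataset = b"feature,label\n0.1,0\n0.9,1\n"
    print(lfs_pointer(dataset))
```

The pointer stays a few lines long regardless of how large the dataset is, which is why the repository itself remains small while the actual file lives in separate storage.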
Contributions welcome! Please read the contribution guidelines first.