by SE-ML

SE-ML /awesome-seml

A curated list of articles that cover the software engineering best practices for building machine l...

477 stars, 48 forks, 100 commits, 0 releases. License: Creative Commons Zero v1.0 Universal.

Awesome Software Engineering for Machine Learning

Software Engineering for Machine Learning comprises the techniques and guidelines for building ML applications that concern not the core ML problem (e.g., the development of new algorithms) but the surrounding activities, such as data ingestion, coding, testing, versioning, deployment, quality control, and team collaboration. Good software engineering practices improve the development, deployment, and maintenance of production-level applications that use machine learning components.

⭐ Must-read

🎓 Scientific publication

Based on this literature, we compiled a survey on the adoption of software engineering practices for applications with machine learning components.

Feel free to take and share the survey and to read more!


Broad Overviews

These resources cover all aspects.

Data Management

How to manage the data sets you use in machine learning.
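A recurring practice in this area is versioning data by content rather than by file name. As a minimal sketch, assuming a local directory of data files, the snippet below records a SHA-256 hash per file and reports which files changed between two snapshots. Content-addressed tools such as DVC build on a similar idea, but `file_digest`, `snapshot`, and `changed_files` are hypothetical names, not part of any tool's API.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(data_dir: Path) -> dict:
    """Map every file under data_dir (as a relative POSIX path) to its content hash."""
    return {
        p.relative_to(data_dir).as_posix(): file_digest(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(old: dict, new: dict) -> list:
    """List files that were added, removed, or modified between two snapshots."""
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))


if __name__ == "__main__":
    # Hypothetical usage: snapshot a toy dataset, modify one file, compare.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "train.csv").write_text("x,y\n1,0\n")
        before = snapshot(root)
        (root / "train.csv").write_text("x,y\n1,1\n")
        after = snapshot(root)
        print(json.dumps(changed_files(before, after)))  # ["train.csv"]
```

Storing such a snapshot next to each trained model makes it possible to tell later exactly which data version the model saw.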

Model Training

How to organize your model training experiments.
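Organizing training experiments usually starts with recording, for every run, the exact configuration and the resulting metrics. The sketch below assumes a plain JSON Lines file as the log and derives a run ID from a hash of the configuration (so a random seed belongs in the config if repeated runs must be told apart). It only illustrates the bookkeeping that tools such as MLflow, Sacred, or Weights & Biases automate; `run_id`, `log_run`, and `best_run` are hypothetical names.

```python
import hashlib
import json
import tempfile
import time
from pathlib import Path


def run_id(config: dict) -> str:
    """Derive a short, stable ID from the config; key order does not matter."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def log_run(log_path: Path, config: dict, metrics: dict) -> str:
    """Append one run (ID, timestamp, config, metrics) to a JSON Lines log."""
    rid = run_id(config)
    record = {"run_id": rid, "time": time.time(), "config": config, "metrics": metrics}
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
    return rid


def best_run(log_path: Path, metric: str, maximize: bool = True) -> dict:
    """Return the logged run with the best value of `metric`."""
    with log_path.open(encoding="utf-8") as f:
        runs = [json.loads(line) for line in f if line.strip()]
    pick = max if maximize else min
    return pick(runs, key=lambda r: r["metrics"][metric])


if __name__ == "__main__":
    # Hypothetical usage: log two runs and look up the better one.
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "runs.jsonl"
        log_run(log, {"lr": 0.1, "seed": 0}, {"val_acc": 0.81})
        log_run(log, {"lr": 0.01, "seed": 0}, {"val_acc": 0.86})
        print(best_run(log, "val_acc")["config"])  # {'lr': 0.01, 'seed': 0}
```

An append-only log keeps every past run, including failed or worse ones, which is what makes comparisons reproducible later.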

Deployment and Operation

How to deploy and operate your models in a production environment.
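Operating a model includes checking whether production inputs still resemble the training data. As a deliberately simple sketch, assuming a single numeric feature and roughly independent samples, the check below measures how many standard errors the mean of a live batch lies from the training mean. The function name and the default threshold of 3 are illustrative assumptions; real monitoring setups typically rely on richer distribution tests.

```python
import math
import statistics


def mean_shift_alert(train_values, live_values, z_threshold=3.0):
    """Return (alert, z) for a shift in the mean of one numeric feature.

    z = (mean(live) - mean(train)) / (stdev(train) / sqrt(len(live))),
    i.e. how many standard errors the live-batch mean lies from the training mean.
    """
    n = len(live_values)
    sigma = statistics.stdev(train_values)  # sample standard deviation of training data
    if n == 0 or sigma == 0:
        raise ValueError("need a non-empty live batch and non-constant training data")
    diff = statistics.fmean(live_values) - statistics.fmean(train_values)
    z = diff / (sigma / math.sqrt(n))
    return abs(z) > z_threshold, z


if __name__ == "__main__":
    train = [float(x) for x in range(100)]  # hypothetical training feature values
    print(mean_shift_alert(train, [45.0, 50.0, 55.0])[0])  # False: mean close to training mean
    print(mean_shift_alert(train, [90.0] * 25)[0])  # True: mean far above training mean
```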

Social Aspects

How to organize teams and projects to ensure effective collaboration and accountability.

Tooling

Tooling can make your life easier.

We only list open-source tools, or commercial platforms that offer substantial free packages for research.

  • Airflow - Programmatically author, schedule and monitor workflows.
  • Auto-PyTorch - Automatic architecture search and hyperparameter optimization for PyTorch.
  • Ax - Optimize any kind of experiment, including machine learning experiments, A/B tests, and simulations.
  • Data Version Control (DVC) - DVC is a data and ML experiments management tool.
  • Facets Overview / Facets Dive - Robust visualizations to aid in understanding machine learning datasets.
  • Git Large File Storage (LFS) - Replaces large files such as datasets with text pointers inside Git.
  • HParams - A thoughtful approach to configuration management for machine learning projects.
  • Kubeflow - A platform for data scientists who want to build and experiment with ML pipelines.
  • Label Studio - A multi-type data labeling and annotation tool with standardized output format.
  • MLflow - Manage the ML lifecycle, including experimentation, deployment, and a central model registry.
  • - Experiment tracking tool bringing organization and collaboration to data science projects.
  • Neuraxle - Sklearn-like framework for hyperparameter tuning and AutoML in deep learning projects.
  • OpenML - An inclusive movement to build an open, organized, online ecosystem for machine learning.
  • Spark Machine Learning - Spark’s ML library consisting of common learning algorithms and utilities.
  • Sacred - A tool to help you configure, organize, log and reproduce experiments.
  • TensorBoard - TensorFlow's Visualization Toolkit.
  • TensorFlow Extended (TFX) - An end-to-end platform for deploying production ML pipelines.
  • Weights & Biases - Experiment tracking, model optimization, and dataset versioning.
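To illustrate what Git LFS stores in the repository in place of a large file, the sketch below builds a pointer in the v1 pointer format: a `version` line with the spec URL, an `oid` line with the SHA-256 of the content, and a `size` line in bytes. The helpers `lfs_pointer` and `parse_pointer` are hypothetical; in practice the `git lfs` client writes and resolves these pointers itself.

```python
import hashlib

# Spec URL used in the "version" line of Git LFS v1 pointer files.
LFS_SPEC = "https://git-lfs.github.com/spec/v1"


def lfs_pointer(data: bytes) -> str:
    """Build the text of an LFS-style pointer for the given file content."""
    oid = hashlib.sha256(data).hexdigest()
    return f"version {LFS_SPEC}\noid sha256:{oid}\nsize {len(data)}\n"


def parse_pointer(text: str) -> dict:
    """Split each 'key value' line of a pointer into a dict entry."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


if __name__ == "__main__":
    # Hypothetical usage: a small byte string stands in for a large dataset file.
    pointer = lfs_pointer(b"hello")
    print(pointer, end="")
    print(parse_pointer(pointer)["size"])  # 5
```

Because the pointer is a few short lines of text, Git history stays small while the actual content lives in separate LFS storage.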


Contributions welcome! Please read the contribution guidelines first.
