An End-to-end Outlier Detection System


Official Website:

PyODDS is an end-to-end Python system for outlier detection with database support. PyODDS provides outlier detection algorithms that meet the needs of users in different fields, with or without a data science or machine learning background. PyODDS can execute machine learning algorithms in-database, without moving data out of the database server or over the network. It also provides access to a wide range of outlier detection algorithms, including statistical analysis and more recent deep-learning-based approaches. It is developed by

at Texas A&M University.

PyODDS features:

  • Full Stack Service, which supports operation and maintenance from a lightweight SQL-based database to the back-end machine learning algorithms and improves throughput;

  • State-of-the-art Anomaly Detection Approaches, including statistical, machine learning, and deep learning models, with unified APIs and detailed documentation;

  • Powerful Data Analysis Mechanism, which supports both static and time-series data analysis with flexible time-slice (sliding-window) segmentation;

  • Automated Machine Learning: PyODDS is a first attempt to incorporate automated machine learning with outlier detection, and one of the first attempts to extend automated machine learning concepts to real-world data mining tasks.
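The time-slice (sliding-window) segmentation mentioned in the feature list can be illustrated with a small NumPy sketch. The helper name `sliding_windows` and its `window` and `stride` parameters are hypothetical, chosen for illustration only; they are not part of the PyODDS API, and the package's own segmentation may differ.

```python
import numpy as np


def sliding_windows(series, window, stride=1):
    """Split a time series into overlapping windows.

    Hypothetical helper for illustration; not part of the PyODDS API.
    Returns an array of shape (n_windows, window, ...).
    """
    series = np.asarray(series)
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    # Each window starts `stride` steps after the previous one.
    starts = range(0, len(series) - window + 1, stride)
    return np.stack([series[s:s + window] for s in starts])


readings = np.arange(10)
segments = sliding_windows(readings, window=4, stride=2)
print(segments.shape)  # (4, 4)
print(segments[1])     # [2 3 4 5]
```

Each window can then be scored as a unit, so an anomaly is reported for a time slice rather than for a single reading.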

The Full API Reference can be found in


API Demo:

from utils.import_algorithm import algorithm_selection
from utils.utilities import output_performance, connect_server, query_data

# connect to the database
conn, cursor = connect_server(host, user, password)

# query data from a specific time range
data = query_data(conn, cursor, database_name, table_name, start_time, end_time)

# train the anomaly detection algorithm
# (X_train and X_test are assumed to be splits of the queried data)
clf = algorithm_selection(algorithm_name, contamination)
clf.fit(X_train)

# get outlier results and scores
prediction_result = clf.predict(X_test)
outlierness_score = clf.decision_function(X_test)

# visualize the prediction result
# (visualize_distribution must be imported separately)
visualize_distribution(X_test, prediction_result, outlierness_score)

Cite this work

Yuening Li, Daochen Zha, Praveen Kumar Venugopal, Na Zou, Xia Hu. "PyODDS: An End-to-end Outlier Detection System with Automated Machine Learning".

Biblatex entry:

    author = {Li, Yuening and Zha, Daochen and Venugopal, Praveen and Zou, Na and Hu, Xia},
    title = {PyODDS: An End-to-End Outlier Detection System with Automated Machine Learning},
    year = {2020},
    isbn = {9781450370240},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {},
    doi = {10.1145/3366424.3383530},
    booktitle = {Companion Proceedings of the Web Conference 2020},
    pages = {153--157},
    numpages = {5},
    keywords = {Automated Machine Learning, Outlier Detection, Open Source Package, End-to-end System},
    location = {Taipei, Taiwan},
    series = {WWW '20}

Quick Start

python --ground_truth --visualize_distribution

Results are shown as

connect to TDengine success
Load dataset and table
Loading cost: 0.151061 seconds
Load data successful
Start processing:
100%|████████████████████████████████████| 10/10 [00:00<00:00, 14.02it/s]
Results in Algorithm dagmm are:
accuracy_score: 0.98
precision_score: 0.99
recall_score: 0.99
f1_score: 0.99
roc_auc_score: 0.99
processing time: 15.330137 seconds
connection is closed
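For readers unfamiliar with the metrics in this report, the sketch below shows how they can be computed from ground-truth labels, predicted labels, and outlier scores. It assumes labels use 1 for outliers and 0 for inliers, which is an assumption rather than something the log states. The function names `binary_metrics` and `roc_auc` are hypothetical; the log's metric names match functions in `sklearn.metrics`, which are the usual production choice.

```python
import numpy as np


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = outlier, assumed)."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(y_true & y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """ROC-AUC: probability that a random outlier outscores a random inlier."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    # Compare every outlier score with every inlier score; ties count as half.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))


ground_truth = [0, 0, 0, 1, 1, 0, 1, 0]
prediction = [0, 0, 1, 1, 1, 0, 0, 0]
scores = [0.1, 0.2, 0.7, 0.9, 0.8, 0.3, 0.4, 0.2]
print(binary_metrics(ground_truth, prediction))
print(roc_auc(ground_truth, scores))
```

Note that ROC-AUC is computed from the continuous scores, while the other four metrics depend on the thresholded predictions.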


To install the package, use pip. Install the released package from PyPI, or install directly from the GitHub repository:

pip install pyodds
pip install git+git@github.com:datamllab/PyODDS.git

Note: PyODDS is only compatible with Python 3.6 and above.

Required Dependencies

- pandas>=0.25.0
- taos==1.4.15
- tensorflow==2.0.0b1
- numpy>=1.16.4
- seaborn>=0.9.0
- torch>=1.1.0
- luminol==0.4
- tqdm>=4.35.0
- matplotlib>=3.1.1
- scikit_learn>=0.21.3

To compile and package the JDBC driver source code, you need JDK 8 or higher and Apache Maven 2.7 or higher installed. To install OpenJDK 8 on Ubuntu:

sudo apt-get install openjdk-8-jdk

To install Apache Maven on Ubuntu:

sudo apt-get install maven

To install the TDengine as the back-end database service, please refer to this instruction.

To enable the Python client APIs for TDengine, please follow this handbook.

To ensure the locale in the config file is valid:

sudo locale-gen "en_US.UTF-8"
export LC_ALL="en_US.UTF-8"

To start the service after installation, in a terminal, use:


Implemented Algorithms

Statistical Based Methods


Method            Algorithm
CBLOF             Clustering-Based Local Outlier Factor
HBOS              Histogram-based Outlier Score
IFOREST           Isolation Forest
KNN               k-Nearest Neighbors
LOF               Local Outlier Factor
OCSVM             One-Class Support Vector Machines
PCA               Principal Component Analysis
RobustCovariance  Robust Covariance
SOD               Subspace Outlier Detection
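To give a feel for how a statistical method in this family scores points, here is a simplified sketch of the histogram-based outlier score (HBOS) idea: build a histogram per feature, then sum the log inverse densities. It uses equal-width bins and treats features as independent. The function name `hbos_scores` and its parameters are hypothetical, and this is not the PyODDS implementation.

```python
import numpy as np


def hbos_scores(X, n_bins=10, eps=1e-12):
    """Histogram-based outlier scores; higher means more anomalous.

    Simplified sketch: equal-width bins, features treated as independent.
    """
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    scores = np.zeros(n_samples)
    for j in range(n_features):
        counts, edges = np.histogram(X[:, j], bins=n_bins)
        density = counts / counts.max()  # normalise so the tallest bin is 1
        # Map each value to its bin; the inner edges give indices 0..n_bins-1.
        idx = np.clip(np.digitize(X[:, j], edges[1:-1]), 0, n_bins - 1)
        scores += np.log(1.0 / (density[idx] + eps))
    return scores


rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 1.0, size=(200, 2))
outlier = np.array([[8.0, 8.0]])
X = np.vstack([inliers, outlier])
scores = hbos_scores(X)
# The appended point sits alone in its bin for both features,
# so it receives the highest score.
print(scores[-1] == scores.max())  # True
```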

Deep Learning Based Methods


Method       Algorithm
autoencoder  Outlier detection using replicator neural networks
dagmm        Deep autoencoding Gaussian mixture model for unsupervised anomaly detection
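Both methods above build on reconstruction: a model learns to compress and rebuild normal data, and points it rebuilds poorly are flagged. The sketch below illustrates that principle only. It swaps the neural network for a linear PCA projection computed with NumPy, which is a deliberate simplification and not how the PyODDS autoencoder or dagmm classes work. The function name `reconstruction_scores` is hypothetical.

```python
import numpy as np


def reconstruction_scores(X_train, X, n_components=1):
    """Score rows of X by squared reconstruction error after a PCA projection.

    PCA stands in here for a trained autoencoder (an illustrative swap).
    """
    X_train = np.asarray(X_train, dtype=float)
    X = np.asarray(X, dtype=float)
    mean = X_train.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    components = vt[:n_components]
    encoded = (X - mean) @ components.T    # "encoder": project down
    decoded = encoded @ components + mean  # "decoder": project back
    return np.sum((X - decoded) ** 2, axis=1)


# Training data lies on the line y = 2x.
t = np.linspace(-1.0, 1.0, 50)
X_train = np.column_stack([t, 2.0 * t])
X_test = np.array([[0.5, 1.0],    # on the line
                   [0.5, -1.0]])  # far from the line
scores = reconstruction_scores(X_train, X_test)
print(scores[1] > scores[0])  # True
```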

Time Series Methods


Method      Algorithm
lstmad      Long short-term memory networks for anomaly detection in time series
lstmencdec  LSTM-based encoder-decoder for multi-sensor anomaly detection
luminol     LinkedIn's luminol

APIs Cheatsheet

The Full API Reference can be found in


  • connect_server(hostname, username, password): Connect to the back-end TDengine service.

  • query_data(connection, cursor, database_name, table_name, start_time, end_time): Query data from table table_name in database database_name within a given time range.

  • algorithm_selection(algorithm_name, contamination): Select an algorithm as the detector.

  • fit(X): Fit the detector to X.

  • predict(X): Predict whether each instance in X is an outlier.

  • decision_function(X): Output the anomaly score of each instance in X.

  • output_performance(algorithm_name, ground_truth, prediction_result, outlierness_score): Report evaluation metrics for the prediction result: accuracy, precision, recall, F1 score, ROC-AUC score, and elapsed time.

  • visualize_distribution(X, prediction_result, outlierness_score): Visualize the detection result together with the data distribution.

  • visualize_outlierscore(outlierness_score, prediction_result, contamination): Visualize the detection result together with the outlier scores.
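The fit / predict / decision_function pattern listed above, together with the contamination parameter, can be sketched with a toy detector. `ZScoreDetector` is a hypothetical stand-in written for illustration; it is not a PyODDS class. The label convention (1 for outliers, 0 for inliers) and the way contamination sets the threshold are assumptions.

```python
import numpy as np


class ZScoreDetector:
    """Toy detector with the fit / predict / decision_function interface.

    Hypothetical stand-in for illustration; not a PyODDS class.
    """

    def __init__(self, contamination=0.1):
        self.contamination = contamination

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0) + 1e-12  # avoid division by zero
        # Threshold chosen so that roughly `contamination` of the
        # training points score above it.
        train_scores = self.decision_function(X)
        self.threshold_ = np.quantile(train_scores, 1.0 - self.contamination)
        return self

    def decision_function(self, X):
        """Anomaly score: largest absolute z-score across features."""
        z = (np.asarray(X, dtype=float) - self.mean_) / self.std_
        return np.abs(z).max(axis=1)

    def predict(self, X):
        """Return 1 for outliers and 0 for inliers (an assumed convention)."""
        return (self.decision_function(X) > self.threshold_).astype(int)


rng = np.random.default_rng(42)
X_train = rng.normal(size=(500, 3))
X_test = np.array([[0.0, 0.1, -0.2], [6.0, 0.0, 0.0]])

clf = ZScoreDetector(contamination=0.05).fit(X_train)
print(clf.predict(X_test))  # [0 1]
```

Here contamination is the expected fraction of outliers in the training data; raising it lowers the threshold and flags more points.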


You may use this software under the MIT License.
