A list of software and papers related to automatic and fast Exploratory Data Analysis
This list covers software and papers related to automated Exploratory Data Analysis (EDA), including R packages, Python libraries and other tools.
Pull requests adding software, papers and conference presentations are welcome.
dataMaid (CRAN package) - automated checks of data validity.
DataExplorer (CRAN package) - automated data exploration (including univariate and bivariate plots, PCA) and treatment.
funModeling (CRAN package) - automated EDA, simple feature engineering and outlier detection.
visdat (CRAN package) - 6 exploratory/diagnostic plots for initial data analysis.
dlookr (CRAN package) - tools for data quality diagnosis, basic exploration and feature transformations.
arsenal (CRAN package) - statistical summaries (models and exploration) and quick reporting.
exploreR (CRAN package) - exploration based on univariate linear regression.
summarytools (CRAN package) - tables for summarising datasets and performing simple uni- and bivariate analyses.
explore (CRAN package) - interactive Shiny app for comprehensive dataset exploration (including uni- and bivariate relationships, correlation analysis and simple modeling with decision trees) and a stand-alone function for quick exploration. Examples are given in a vignette.
AEDA (GitHub package) - summary statistics, correlation analysis, cluster analysis, PCA & other projections.
automatic-data-explorer (GitHub package) - basic EDA and creation of Markdown reports from multiple R scripts.
xda (GitHub package) - basic data summaries.
modeler (GitHub package) - tools for exploration and pre-processing.
IEDA (GitHub package) - EDA simplified through interactive visualization.
dfvis (GitHub package) - ggplot2-based implementation of tabplot.
ExPanDaR - package for interactive data visualization. Designed for longitudinal data, but can also be used with other types of data after setting an artificial time variable. Shiny apps with examples are provided on the package's GitHub website.
brolgar (GitHub package) - tools to assist in longitudinal data analysis.
POMA (Bioconductor package) - structured, reproducible and easy-to-use workflow for the visualization, pre-processing, exploratory data analysis and statistical analysis of mass spectrometry data. An R/Shiny version of POMA is also available.
featuretoolsR (CRAN package) - R port of the Python library featuretools for automated feature engineering.
report - automated modeling report generation.
FactoInvestigate (CRAN package) - provides an automatic reporting module that selects the best plots for summarising the results of different projection techniques.
gtsummary (GitHub package) - presentation-ready tables summarizing data sets, regression models, and more.
clean (CRAN package) - fast data cleaning.
finalfit (CRAN package) - tables and plots to quickly visualize regression results.
modelsummary (GitHub package) - summary tables for regression models.
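Several of the R packages above (for example dataMaid, dlookr and visdat) begin with automated data-quality checks. The following is a minimal Python sketch, using only the standard library, of the kind of per-column checks such tools run: counting missing values, flagging constant columns and flagging columns with mixed value types. It assumes the data is a list of dict rows with `None` marking a missing value; `validity_report` is a hypothetical helper for illustration, not the API of any listed package.

```python
def validity_report(rows):
    """Run simple data-quality checks on a list of dict rows (hypothetical helper).

    Returns a dict mapping each column name to a dict with:
      missing     - number of rows where the value is None or the key is absent
      constant    - True if all non-missing values are identical
      types       - sorted list of Python type names among non-missing values
      mixed_types - True if more than one type name was seen
    """
    # Collect every column name that appears in at least one row.
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]  # absent keys become None
        present = [v for v in values if v is not None]
        type_names = sorted({type(v).__name__ for v in present})
        report[col] = {
            "missing": len(values) - len(present),
            # repr() lets us compare values without requiring them to be hashable
            "constant": len({repr(v) for v in present}) <= 1,
            "types": type_names,
            "mixed_types": len(type_names) > 1,
        }
    return report


rows = [
    {"id": 1, "age": 34, "country": "PL"},
    {"id": 2, "age": None, "country": "PL"},
    {"id": 3, "age": "41", "country": "PL"},
]
report = validity_report(rows)

# Print only the columns that triggered at least one check.
for col, info in report.items():
    if info["missing"] or info["constant"] or info["mixed_types"]:
        print(col, info)
```

Here `age` is flagged for one missing value and for mixing integers with a string, and `country` is flagged as constant; real tools add many more checks (outliers, duplicated rows, suspicious codes) and render the results as reports.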
DataPrep (pip library) - data preparation library with an EDA component (DataPrep.EDA).
Dora (pip library) - data cleaning, feature engineering and simple modeling tools.
statsmodels (pip library) - collection of statistical tools, including EDA.
TPOT (pip library) - AutoML tool with a feature engineering module.
HoloViews (pip library) - automated visualization based on short data annotations.
pandas-profiling - popular library for quick data summaries and correlation analysis.
speedML (pip library) - large library for ML with a module dedicated to fast EDA.
edaviz - Python library for fast data exploration, with functions for dataset overviews, bivariate plots and finding good predictors (the free version only works for small datasets).
AutoViz - Python library for automated visualization.
ExploriPy - Python library for various EDA tasks.
pandas-summary - simple extension of the pandas DataFrame.describe function.
sweetviz - visualizations for automated EDA.
featuretools - library for automated feature engineering.
pyvtreat - Python version of the R vtreat package.
autoimpute - easier handling of missing values.
Auto_TS - automated time series modeling.
DIVE - MIT's tool for data exploration that tries to choose the most informative visualizations.
Automatic Statistician - tool for automated EDA and modeling.
auto-eda - automatic EDA with SQL.
elycite - tools for exploration and modelling, available (locally) as a web application. Designed for NLP problems.
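Some tools in this list try to surface the most informative variables automatically (edaviz's search for good predictors, DIVE's choice of informative visualizations). A minimal Python sketch of one simple heuristic for this, assuming numeric columns of equal length without missing values: score each candidate column by the absolute Pearson correlation with a target and sort the scores. This is a swapped-in illustration, not how any listed tool works internally, and `pearson` and `rank_predictors` are hypothetical helpers. Note that Pearson correlation only captures linear relationships.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        # A constant column has undefined correlation; treat it as uninformative.
        return 0.0
    return cov / (sx * sy)


def rank_predictors(columns, target):
    """Return (name, |r|) pairs sorted from most to least correlated with target."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


target = [1.0, 2.0, 3.0, 4.0, 5.0]
columns = {
    "x_linear": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly linear in target
    "x_negative": [5.0, 3.0, 4.0, 2.0, 1.0],  # strong negative relationship
    "x_noise": [3.0, 1.0, 4.0, 1.0, 5.0],  # weak relationship
    "x_constant": [7.0] * 5,  # no variation at all
}
ranking = rank_predictors(columns, target)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Using the absolute value means a strong negative relationship (here `x_negative`, with r = -0.9) ranks as highly as an equally strong positive one; a more robust variant could swap in rank correlation or mutual information.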