Data Analysis Baseline Library
The data analysis baseline library.
Find more information on the website.
pip install dabl
This library is still very much under development. The current code focuses mostly on exploratory visualization and preprocessing. There are also drop-in replacements for GridSearchCV and RandomizedSearchCV that use successive halving, and preliminary portfolios in the style of POSH auto-sklearn for finding strong models quickly. In essence, that boils down to a quick search over different gradient boosting models and other tree ensembles, and potentially kernel methods.
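To make the successive halving idea concrete, here is a minimal, library-independent sketch in plain Python. It is not dabl's implementation: the function names (successive_halving, toy_score), the parameters (min_budget, factor), and the toy objective are all assumptions made for illustration. The idea is to score every candidate on a small budget, keep only the best fraction, and repeat with a larger budget until one candidate remains.

```python
def successive_halving(candidates, evaluate, min_budget=1, factor=2):
    """Return the candidate that survives successive halving.

    candidates: non-empty list of hyperparameter settings (any objects).
    evaluate(candidate, budget) -> score, where higher is better.
    All names here are illustrative, not part of dabl's API.
    """
    survivors = list(candidates)
    budget = min_budget
    while len(survivors) > 1:
        # Score every surviving candidate at the current budget, best first.
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        # Keep the top 1/factor of the candidates (always at least one).
        survivors = ranked[: max(1, len(ranked) // factor)]
        # Survivors get a larger budget in the next round.
        budget *= factor
    return survivors[0]


def toy_score(learning_rate, budget):
    # Hypothetical stand-in for a cross-validated score: it peaks at a
    # learning rate of 0.1 and improves for everyone as the budget grows.
    return -abs(learning_rate - 0.1) - 1.0 / budget


grid = [0.001, 0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0]
best = successive_halving(grid, toy_score)
print(best)
```

In a real search the budget would typically be the number of training samples or boosting iterations, so poor candidates are discarded cheaply and only promising ones are trained at full cost.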
Lux is an awesome project for easy interactive visualization of pandas dataframes within notebooks.
Pandas Profiling can provide a thorough summary of the data in a single line of code. Using ProfileReport(), you can generate an HTML report of your data that helps you find correlations and identify missing data.
dabl focuses less on statistical measures of individual columns and more on providing a quick overview via visualizations, as well as convenient preprocessing and model search for machine learning.