Description

A library to model multivariate data using copulas.

An Open Source Project from the Data to AI Lab, at MIT

Copulas

Overview

  • Website: https://sdv.dev
  • Documentation: https://sdv.dev/Copulas
  • Repository: https://github.com/sdv-dev/Copulas
  • License: MIT
  • Development Status: Pre-Alpha

Copulas is a Python library for modeling multivariate distributions and sampling from them using copula functions. Given a table containing numerical data, we can use Copulas to learn the distribution and later on generate new synthetic rows following the same statistical properties.
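To make the idea of "learning the distribution" more concrete, the following is a minimal sketch of the general Gaussian copula technique for two columns, written with the Python standard library only. It is not the library's implementation: the helper names (`pearson`, `to_normal_scores`, `empirical_quantile`, `fit_and_sample`) are hypothetical, and the marginals are modeled with a plain empirical CDF as a simplifying assumption.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def to_normal_scores(column):
    """Map a column to standard-normal scores through its empirical CDF."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # rank / (n + 1) keeps the probability strictly inside (0, 1)
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def empirical_quantile(sorted_column, u):
    """Invert the empirical CDF: return the observed value at probability u."""
    index = min(int(u * len(sorted_column)), len(sorted_column) - 1)
    return sorted_column[index]


def fit_and_sample(xs, ys, n_samples, seed=0):
    """Fit a two-column Gaussian copula and draw synthetic (x, y) rows."""
    # The dependence structure is summarized by one correlation coefficient
    rho = pearson(to_normal_scores(xs), to_normal_scores(ys))
    sorted_xs, sorted_ys = sorted(xs), sorted(ys)
    rng = random.Random(seed)
    rows = []
    for _ in range(n_samples):
        # Draw a pair of standard normals with correlation rho
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        # Map each value back to the data scale through its own marginal
        x = empirical_quantile(sorted_xs, STD_NORMAL.cdf(z1))
        y = empirical_quantile(sorted_ys, STD_NORMAL.cdf(z2))
        rows.append((x, y))
    return rho, rows
```

The key design point is the separation of concerns: each column keeps its own marginal distribution, while the copula captures only how the columns move together.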

Some of the features provided by this library include:

  • A variety of distributions for modeling univariate data.
  • Multiple Archimedean copulas for modeling bivariate data.
  • Gaussian and Vine copulas for modeling multivariate data.
  • Automatic selection of univariate distributions and bivariate copulas.
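As an illustration of what automatic selection of a univariate distribution can look like, the sketch below fits two candidate distributions and keeps the one with the smallest Kolmogorov-Smirnov statistic. This is an assumption about one reasonable selection criterion, not a description of how Copulas selects distributions internally, and every name in it (`ks_statistic`, `fit_gaussian`, `fit_uniform`, `CANDIDATES`, `select_distribution`) is hypothetical.

```python
from statistics import NormalDist, fmean, stdev


def ks_statistic(data, cdf):
    """Largest gap between the empirical CDF of data and a candidate CDF."""
    ordered = sorted(data)
    n = len(ordered)
    gap = 0.0
    for i, value in enumerate(ordered):
        p = cdf(value)
        # The empirical CDF jumps from i / n to (i + 1) / n at this value
        gap = max(gap, abs(p - i / n), abs((i + 1) / n - p))
    return gap


def fit_gaussian(data):
    """Fit a Gaussian by sample mean and standard deviation; return its CDF."""
    return NormalDist(fmean(data), stdev(data)).cdf


def fit_uniform(data):
    """Fit a uniform distribution on [min, max]; return its CDF."""
    low, high = min(data), max(data)
    return lambda x: (x - low) / (high - low)


# Hypothetical candidate set; a real library would offer many more families
CANDIDATES = {"gaussian": fit_gaussian, "uniform": fit_uniform}


def select_distribution(data):
    """Return the best-fitting candidate name and all candidate scores."""
    scores = {name: ks_statistic(data, fit(data)) for name, fit in CANDIDATES.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

Calling `select_distribution(values)` on a list of floats returns the name of the winning candidate together with the per-candidate scores, so the choice can be inspected rather than taken on trust.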

Supported Distributions

Univariate

  • Beta
  • Gamma
  • Gaussian
  • Gaussian KDE
  • Log-Laplace
  • Student T
  • Truncated Gaussian
  • Uniform

Archimedean Copulas (Bivariate)

  • Clayton
  • Frank
  • Gumbel
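For readers new to Archimedean copulas, here is a small standard-library sketch of the Clayton copula: its CDF and a sampler based on conditional inversion. It illustrates the textbook formulas only and does not use or mirror the Copulas API; the function names `clayton_cdf` and `sample_clayton` are hypothetical, and the sketch assumes `theta > 0`.

```python
import random


def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta), theta > 0."""
    return (u ** -theta + v ** -theta - 1) ** (-1 / theta)


def sample_clayton(n, theta, seed=0):
    """Draw n pairs (u, v) with uniform marginals and Clayton dependence."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        # 1 - random() lies in (0, 1], which avoids raising 0 to a negative power
        u = 1 - rng.random()
        w = 1 - rng.random()
        # Conditional inversion: solve dC/du(u, v) = w for v
        v = ((w ** (-theta / (1 + theta)) - 1) * u ** -theta + 1) ** (-1 / theta)
        pairs.append((u, v))
    return pairs
```

For the Clayton family, Kendall's tau equals `theta / (theta + 2)`, so `theta = 2` corresponds to a moderate positive dependence of about 0.5, concentrated in the lower tail.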

Multivariate

  • Gaussian Copula
  • D-Vine
  • C-Vine
  • R-Vine

Install

Requirements

Copulas is part of the SDV project and is automatically installed alongside it. For details about this process, please visit the SDV Installation Guide.

Optionally, Copulas can also be installed as a standalone library using the following commands:

Using pip:

pip install copulas

Using conda:

conda install -c sdv-dev -c conda-forge copulas

For more installation options, please visit the Copulas Installation Guide.

Quickstart

In this short quickstart, we show how to model a multivariate dataset and then generate synthetic data that resembles it.

import warnings
warnings.filterwarnings('ignore')

from copulas.datasets import sample_trivariate_xyz
from copulas.multivariate import GaussianMultivariate
from copulas.visualization import compare_3d

# Load a dataset with 3 columns that are not independent
real_data = sample_trivariate_xyz()

# Fit a gaussian copula to the data
copula = GaussianMultivariate()
copula.fit(real_data)

# Sample synthetic data
synthetic_data = copula.sample(len(real_data))

# Plot the real and the synthetic data to compare
compare_3d(real_data, synthetic_data)

The output will be a figure with two plots, one showing the real data and the other showing the synthetic data that you just generated.

What's next?

For more details about Copulas and all its possibilities and features, please check the documentation site.

There you can also learn how to contribute to Copulas and help us develop new features or cool ideas.

Credits

Copulas is an open source project from the Data to AI Lab at MIT which has been built and maintained over the years by the following team:

The Synthetic Data Vault

This repository is part of The Synthetic Data Vault Project:

  • Website: https://sdv.dev
  • Documentation: https://sdv.dev/SDV
