License: Apache License 2.0

Mypy stubs, i.e., type information, for numpy, pandas and matplotlib

⚠️ **this project has mostly stopped development** ⚠️

The pandas team and the numpy team are both in the process of integrating type stubs into their codebases, and we don't see the point of competing with them.

This is a PEP-561-compliant stub-only package which provides type information for matplotlib, numpy and pandas. Once it is installed, the mypy type checker (or pytype or PyCharm) can recognize the types in these packages.
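As a rough way to see what a type checker will pick up, the sketch below lists directories whose names end in `-stubs`, which is the naming scheme PEP 561 prescribes for stub-only packages. The helper name `find_stub_packages` is hypothetical and not part of this package, and the sketch assumes the stubs are installed as ordinary directories on `sys.path`.

```python
import os
import sys
from typing import Iterable, List, Optional


def find_stub_packages(search_paths: Optional[Iterable[str]] = None) -> List[str]:
    """Return the names of directories ending in '-stubs' on the given paths.

    Hypothetical helper for illustration; defaults to scanning sys.path.
    """
    paths = sys.path if search_paths is None else search_paths
    found = set()
    for path in paths:
        # Skip entries that are not real directories (zip files, '', etc.).
        if not os.path.isdir(path):
            continue
        for entry in os.listdir(path):
            full = os.path.join(path, entry)
            if entry.endswith("-stubs") and os.path.isdir(full):
                found.add(entry)
    return sorted(found)


# Lists whatever stub-only packages are visible in the current environment.
print(find_stub_packages())
```

An empty list only means that no stub-only directories were found on the searched paths; packages that ship inline type information are not detected by this sketch.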

Many functions are already typed, but a *lot* is still missing (NumPy and pandas are *huge* libraries).
Chances are, you will see a message from Mypy claiming that a function does not exist when it does exist.
If you encounter missing functions, we would be delighted for you to send a PR.
If you are unsure of how to type a function, we can discuss it.

You can get this package from PyPI:

pip install data-science-types

To get the most up-to-date version, install it directly from GitHub:

pip install git+https://github.com/predictive-analytics-lab/data-science-types

Or clone the repository somewhere and, from inside it, run

pip install -e .

These are the kinds of things that can be checked:

    import numpy as np

    arr1: np.ndarray[np.int64] = np.array([3, 7, 39, -3])  # OK
    arr2: np.ndarray[np.int32] = np.array([3, 7, 39, -3])  # Type error
    arr3: np.ndarray[np.int32] = np.array([3, 7, 39, -3], dtype=np.int32)  # OK
    arr4: np.ndarray[float] = np.array([3, 7, 39, -3], dtype=float)  # Type error: the type of ndarray can not be just "float"
    arr5: np.ndarray[np.float64] = np.array([3, 7, 39, -3], dtype=float)  # OK

    import numpy as np

    arr1: np.ndarray[np.int64] = np.array([3, 7, 39, -3])
    arr2: np.ndarray[np.int64] = np.array([4, 12, 9, -1])

    result1: np.ndarray[np.int64] = np.divide(arr1, arr2)  # Type error
    result2: np.ndarray[np.float64] = np.divide(arr1, arr2)  # OK

    compare: np.ndarray[np.bool_] = (arr1 == arr2)

    import numpy as np

    arr: np.ndarray[np.float64] = np.array([[1.3, 0.7], [-43.0, 5.6]])

    sum1: int = np.sum(arr)  # Type error
    sum2: np.float64 = np.sum(arr)  # OK
    sum3: float = np.sum(arr)  # Also OK: np.float64 is a subclass of float
    sum4: np.ndarray[np.float64] = np.sum(arr, axis=0)  # OK

    # the same works with np.max, np.min and np.prod

The goal is not to recreate the APIs exactly.
The main goal is to have *useful* checks on our code.
Often the actual APIs in the libraries are more permissive than the type signatures in our stubs,
but this is (usually) a feature and not a bug.
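To make this trade-off concrete, here is a minimal sketch that is unrelated to the actual stubs: `total` is a hypothetical function whose annotation is deliberately narrower than what it accepts at runtime. A type checker would reject the second call even though it runs, which is the kind of "useful" strictness described above.

```python
from typing import List


def total(values: List[float]) -> float:
    """Sum a list of numbers; the annotation promises only lists of floats."""
    result = 0.0
    for value in values:
        # At runtime float() also accepts numeric strings, so the function
        # is more permissive than its signature suggests.
        result += float(value)
    return result


# Accepted by the type checker and fine at runtime.
strict_ok = total([1.5, 2.5])

# Runs without error, but a type checker would flag the str items;
# the ignore comment suppresses that report for this demonstration.
permissive = total(["1.5", "2.5"])  # type: ignore
```

Flagging the second call is arguably helpful: passing strings where numbers are expected is more often a mistake than an intentional use of the permissive runtime behaviour.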

We always welcome contributions. All pull requests are subject to CI checks. We check for compliance with Mypy and that the file formatting conforms to our Black specification.

You can install these dev dependencies via

pip install -e '.[dev]'

This will also install NumPy, pandas, and Matplotlib to be able to run the tests.

We include a script for running the CI checks that are triggered when a PR is opened. To test these out locally, you need to install the type stubs in your environment. Typically, you would do this with

pip install -e .

Then use the `check_all.sh` script to run all tests:

./check_all.sh

Below we describe how to run the various checks individually, but `check_all.sh` should be easier to use.

The settings for Mypy are specified in the `mypy.ini` file in the repository. Just running

mypy tests

from the base directory should take these settings into account. We enforce 0 Mypy errors.

We use Black to format the stub files. First, install `black` and then run

black .

from the base directory.

Run the tests with pytest:

python -m pytest -vv tests/

Lint the stub directories with flake8:

flake8 *-stubs
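The individual checks above can also be chained from Python. The following is only a sketch and not a description of what `check_all.sh` actually does: the function names `build_commands` and `run_checks` are hypothetical, the commands are copied from this README, and the shell glob `*-stubs` is expanded with the `glob` module.

```python
import glob
import subprocess
import sys
from typing import List, Sequence


def build_commands() -> List[List[str]]:
    """Commands as listed in this README (assumed, not read from check_all.sh)."""
    stub_dirs = sorted(glob.glob("*-stubs"))  # mirrors the shell's `*-stubs`
    return [
        ["mypy", "tests"],
        ["black", "."],
        [sys.executable, "-m", "pytest", "-vv", "tests/"],
        ["flake8", *stub_dirs],
    ]


def run_checks(commands: Sequence[Sequence[str]]) -> List[str]:
    """Run each command in order and return those that exited non-zero."""
    failed = []
    for cmd in commands:
        # subprocess.run raises FileNotFoundError if the tool is not installed.
        result = subprocess.run(list(cmd))
        if result.returncode != 0:
            failed.append(" ".join(cmd))
    return failed
```

Calling `run_checks(build_commands())` from the base directory would run all four checks and return the failing commands, so one failure does not hide the others. Note that `black .` reformats files in place rather than only reporting differences.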