A framework for data augmentation for 2D and 3D image classification and segmentation
batchgenerators is a Python package that we developed at the Division of Medical Image Computing at the German Cancer Research Center (DKFZ) to suit all our deep learning data augmentation needs. It is not (yet) perfect, but we feel it is good enough to be shared with the community. If you encounter a bug, feel free to contact us or open a GitHub issue.
If you use it, please cite the following work:
Isensee Fabian, Jäger Paul, Wasserthal Jakob, Zimmerer David, Petersen Jens, Kohl Simon, Schock Justus, Klein Andre, Roß Tobias, Wirkert Sebastian, Neher Peter, Dinkelacker Stefan, Köhler Gregor, Maier-Hein Klaus (2020). batchgenerators - a python framework for data augmentation. doi:10.5281/zenodo.3632567
We support a variety of augmentations, all of which are compatible with 2D and 3D input data! (This is something that was missing in most other frameworks.)
Note: Stack transforms by using batchgenerators.transforms.abstract_transforms.Compose. Finish it up by plugging the composed transform into our multithreader: batchgenerators.dataloading.multi_threaded_augmenter.MultiThreadedAugmenter
The working principle is simple: Derive from the DataLoaderBase class, reimplement the generate_train_batch member function and use it to stack your augmentations! For a simple example see
We also now have an extensive example for BraTS2017/2018 with both 2D and 3D DataLoader and augmentations:
There are also CIFAR10/100 datasets and DataLoader available at
The data structure that is used internally (and with which you have to comply when implementing generate_train_batch) is kept simple as well: It is just a regular Python dictionary! We did this to allow maximum flexibility in the kind of data that is passed along through the pipeline. The dictionary must have a 'data' key:value pair. It can optionally hold a 'seg' key:value pair containing a segmentation. If a 'seg' key:value pair is present, all spatial transformations will also be applied to the segmentation! Apart from 'data' and 'seg' you are free to add whatever you want (your image classification/regression target for example). All key:value pairs other than 'data' and 'seg' will be passed through the pipeline unmodified.
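The sketch below illustrates what a generate_train_batch implementation might return. It is a standalone stand-in, not the library's class: RandomSampleLoader, its constructor arguments, the 'label' key and the iterator methods are all hypothetical, and it deliberately does not inherit from DataLoaderBase because that class's constructor is not described here. In real use you would derive from DataLoaderBase and keep only the generate_train_batch logic.

```python
import numpy as np


class RandomSampleLoader:
    """Stand-in for a loader derived from DataLoaderBase (hypothetical class)."""

    def __init__(self, images, segmentations, labels, batch_size, seed=0):
        self.images = images                # array with samples on the first axis
        self.segmentations = segmentations  # one segmentation per sample
        self.labels = labels                # one classification target per sample
        self.batch_size = batch_size
        self.rng = np.random.default_rng(seed)

    def generate_train_batch(self):
        # Draw a random subset of samples for this batch
        idx = self.rng.choice(len(self.images), self.batch_size, replace=False)
        return {
            "data": self.images[idx],        # required key
            "seg": self.segmentations[idx],  # optional key, follows spatial transforms
            "label": self.labels[idx],       # extra key, passed through unmodified
        }

    # Assumption: iterating over the loader simply yields one batch per step
    def __iter__(self):
        return self

    def __next__(self):
        return self.generate_train_batch()


loader = RandomSampleLoader(
    images=np.zeros((10, 3, 32, 32), dtype=np.float32),
    segmentations=np.zeros((10, 1, 32, 32), dtype=np.int16),
    labels=np.arange(10),
    batch_size=4,
)
batch = next(loader)
print(sorted(batch.keys()))
```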
'data' value must have shape (b, c, x, y) for 2D or shape (b, c, x, y, z) for 3D! 'seg' value must have shape (b, c, x, y) for 2D or shape (b, c, x, y, z) for 3D! The channel dimension of 'seg' may be used to hold several segmentation maps. If you have only one segmentation, make sure it has shape (b, 1, x, y (, z)).
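A common pitfall is a segmentation stored without a channel axis. The following sketch, which uses only NumPy, adds the missing axis and checks a batch against the layout described above. Both helpers (add_channel_axis and check_batch) are hypothetical and not part of batchgenerators. The check that 'seg' matches 'data' in batch size and spatial size is an assumption: the text implies it, since spatial transforms are applied to both, but does not state it.

```python
import numpy as np


def add_channel_axis(seg):
    """Hypothetical helper: turn (b, x, y[, z]) into (b, 1, x, y[, z])."""
    return seg[:, None]


def check_batch(batch):
    """Hypothetical helper: validate the (b, c, x, y[, z]) layout."""
    if "data" not in batch:
        raise KeyError("batch must contain a 'data' entry")
    data = batch["data"]
    if data.ndim not in (4, 5):
        raise ValueError(f"'data' must be 4D (2D images) or 5D (3D images), got {data.ndim}D")
    seg = batch.get("seg")
    if seg is not None:
        if seg.ndim != data.ndim:
            raise ValueError("'seg' needs an explicit channel axis, e.g. (b, 1, x, y)")
        # Assumed: batch size and spatial size must match; channel count may differ
        if seg.shape[0] != data.shape[0] or seg.shape[2:] != data.shape[2:]:
            raise ValueError("'seg' must match 'data' in batch size and spatial shape")
    return batch


# 3D example: 2 single-channel volumes of 32x32x16 voxels
data_3d = np.zeros((2, 1, 32, 32, 16), dtype=np.float32)
seg_without_channel = np.zeros((2, 32, 32, 16), dtype=np.int16)

batch_3d = check_batch({
    "data": data_3d,
    "seg": add_channel_axis(seg_without_channel),  # now (2, 1, 32, 32, 16)
})
print(batch_3d["seg"].shape)
```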
pip install --upgrade batchgenerators
Import as follows
from batchgenerators.transforms.color_transforms import ContrastAugmentationTransform
Batchgenerators makes heavy use of Python multiprocessing, and Python multiprocessing on Windows works differently than on Linux. To prevent the workers from freezing on Windows, you have to guard your code with if __name__ == '__main__' and use multiprocessing's freeze_support. The executed script may then look like this:
    # some imports and functions here

    def main():
        # do some stuff
        pass

    if __name__ == '__main__':
        from multiprocessing import freeze_support
        freeze_support()
        main()
This is not required on Linux.
(only highlights, not an exhaustive list)