🗂 Split folders with files (e.g. images) into training, validation and test (dataset) folders
Split folders with files (e.g. images) into train, validation and test (dataset) folders.
The input folder should have the following format:
    input/
        class1/
            img1.jpg
            img2.jpg
            ...
        class2/
            imgWhatever.jpg
            ...
        ...
In order to give you this:
    output/
        train/
            class1/
                img1.jpg
                ...
            class2/
                imga.jpg
                ...
        val/
            class1/
                img2.jpg
                ...
            class2/
                imgb.jpg
                ...
        test/
            class1/
                img3.jpg
                ...
            class2/
                imgc.jpg
                ...
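To check that a split produced the layout shown above, you can count the files per split and class. The following is a minimal sketch using only the standard library; `count_split_files` is a hypothetical helper name, not part of split-folders, and it assumes the `output/<split>/<class>/<file>` nesting described here.

```python
from collections import Counter
from pathlib import Path


def count_split_files(output_dir):
    """Count files per (split, class) in an output/<split>/<class>/ layout."""
    counts = Counter()
    # Exactly three levels below the output folder: split / class / file.
    for path in Path(output_dir).glob("*/*/*"):
        if path.is_file():
            split, class_name = path.parts[-3], path.parts[-2]
            counts[(split, class_name)] += 1
    return counts
```

Calling `count_split_files("output")` returns a `Counter` keyed by `(split, class)` pairs, which makes it easy to spot a class that ended up empty in `val` or `test`.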
This should get you started to do some serious deep learning on your data. Read here why it's a good idea to split your data into three different sets.
pip install split-folders
If you are working with a large number of files, you may want a progress bar. Install tqdm to get visual updates while files are copied.
pip install split-folders tqdm
You can use split-folders as a Python module or as a Command Line Interface (CLI).
If your dataset is balanced (each class has the same number of samples), choose ratio; otherwise choose fixed. NB: oversampling is turned off by default. Oversampling is only applied to the train folder since having duplicates in val or test would be considered cheating.
import splitfolders # or import split_folders
Split with a ratio.
To split into training and validation sets only, pass a two-element tuple to ratio, e.g. (.8, .2).
splitfolders.ratio("input_folder", output="output", seed=1337, ratio=(.8, .1, .1), group_prefix=None) # default values
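To illustrate what a seeded ratio split does conceptually, here is a minimal sketch on a plain list of file names. It is not the library's implementation: `split_by_ratio` is a hypothetical function, and the rounding of the cut points is an assumption made for this example.

```python
import random


def split_by_ratio(files, ratio=(0.8, 0.1, 0.1), seed=1337):
    """Shuffle files reproducibly and cut them into len(ratio) consecutive parts."""
    if abs(sum(ratio) - 1.0) > 1e-9:
        raise ValueError("ratio must sum to 1")
    items = sorted(files)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(items)  # same seed, same shuffle
    parts, start, cumulative = [], 0, 0.0
    for r in ratio[:-1]:
        cumulative += r
        end = round(cumulative * len(items))  # cut point for this part
        parts.append(items[start:end])
        start = end
    parts.append(items[start:])  # the last part takes whatever remains
    return parts
```

With ten file names and the default ratio, the three parts hold 8, 1 and 1 items; passing `ratio=(.8, .2)` yields two parts of 8 and 2.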
Split val/test with a fixed number of items e.g. 100 for each set.
To split into training and validation sets only, pass a single number to fixed, e.g. 100.
splitfolders.fixed("input_folder", output="output", seed=1337, fixed=(100, 100), oversample=False, group_prefix=None) # default values
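The idea behind a fixed split with optional oversampling can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `split_fixed` is a hypothetical function, and it assumes oversampling means duplicating random training items until every class matches the largest training set.

```python
import random


def split_fixed(files_by_class, fixed=(100, 100), oversample=False, seed=1337):
    """Take fixed-size val/test sets per class; the remainder becomes train."""
    if isinstance(fixed, int):
        fixed = (fixed,)  # a single number means train/val only
    rng = random.Random(seed)
    splits = {}
    for class_name, files in sorted(files_by_class.items()):
        items = sorted(files)
        rng.shuffle(items)
        if len(items) <= sum(fixed):
            raise ValueError(f"class {class_name!r} has too few files")
        held_out, start = [], 0
        for n in fixed:
            held_out.append(items[start:start + n])
            start += n
        splits[class_name] = [items[start:]] + held_out  # [train, val(, test)]
    if oversample:
        target = max(len(parts[0]) for parts in splits.values())
        for parts in splits.values():
            train = parts[0]
            # Duplicate training items only; val and test stay duplicate-free.
            train.extend(rng.choices(train, k=target - len(train)))
    return splits
```

Because duplicates are drawn only from a class's own training items, no file can appear in both train and a held-out set, which is the point of restricting oversampling to the train folder.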
Occasionally you may have things that comprise more than a single file (e.g. picture (.png) + annotation (.txt)).
splitfolders lets you split files into equally-sized groups based on their prefix. Set group_prefix to the length of the group (e.g. 2). Once it is set, every file must be part of a group.
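The grouping idea can be illustrated with a simplified sketch. It assumes that files belonging together share the same name without extension (for example `img1.png` and `img1.txt`); the library matches on a common prefix, so its actual behaviour may differ. `group_by_stem` is a hypothetical helper, not part of split-folders.

```python
from collections import defaultdict
from pathlib import Path


def group_by_stem(filenames, group_size=2):
    """Group file names that share a stem; every group must have group_size members."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        groups[Path(name).stem].append(name)
    # Every file has to belong to a complete group, otherwise refuse to continue.
    incomplete = sorted(s for s, members in groups.items() if len(members) != group_size)
    if incomplete:
        raise ValueError(f"groups with the wrong size: {incomplete}")
    return [tuple(members) for _, members in sorted(groups.items())]
```

Each returned tuple can then be treated as a single item when shuffling and splitting, so a picture and its annotation always land in the same set.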
    Usage:
        splitfolders [--output] [--ratio] [--fixed] [--seed] [--oversample] [--group_prefix] folder_with_images
    Options:
        --output        path to the output folder. defaults to `output`. Get created if non-existent.
        --ratio         the ratio to split. e.g. for train/val/test `.8 .1 .1` or for train/val `.8 .2`.
        --fixed         set the absolute number of items per validation/test set. The remaining items constitute
                        the training set. e.g. for train/val/test `100 100` or for train/val `100`.
        --seed          set seed value for shuffling the items. defaults to 1337.
        --oversample    enable oversampling of imbalanced datasets, works only with --fixed.
        --group_prefix  split files into equally-sized groups based on their prefix
    Example:
        splitfolders --ratio .8 .1 .1 folder_with_images
Instead of the command splitfolders you can also use split_folders or split-folders.
Install and use poetry.
If you have a question, found a bug or want to propose a new feature, have a look at the issues page.
Pull requests are especially welcome when they fix bugs or improve the code quality.