About the developer

jfilter
213 Stars 37 Forks MIT License 48 Commits 9 Opened issues

split-folders

Split folders with files (e.g. images) into train, validation and test (dataset) folders.

The input folder should have the following format:

input/
    class1/
        img1.jpg
        img2.jpg
        ...
    class2/
        imgWhatever.jpg
        ...
    ...

In order to give you this:

output/
    train/
        class1/
            img1.jpg
            ...
        class2/
            imga.jpg
            ...
    val/
        class1/
            img2.jpg
            ...
        class2/
            imgb.jpg
            ...
    test/
        class1/
            img3.jpg
            ...
        class2/
            imgc.jpg
            ...
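
To experiment with these layouts without real data, a throwaway input tree can be generated first. This is a minimal sketch using only the standard library; the helper name `make_toy_input` and the file naming are hypothetical and not part of split-folders.

```python
import tempfile
from pathlib import Path


def make_toy_input(root: Path, files_per_class: dict[str, int]) -> Path:
    """Create root/input/<class>/imgN.jpg with empty placeholder files."""
    input_dir = root / "input"
    for class_name, count in files_per_class.items():
        class_dir = input_dir / class_name
        class_dir.mkdir(parents=True, exist_ok=True)
        for i in range(count):
            # Empty files are enough to exercise the folder layout.
            (class_dir / f"img{i + 1}.jpg").touch()
    return input_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        input_dir = make_toy_input(Path(tmp), {"class1": 3, "class2": 2})
        for path in sorted(input_dir.rglob("*.jpg")):
            print(path.relative_to(input_dir))
```

The returned path can then be passed as the input folder to the module functions or the CLI described under Usage.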

This should get you started to do some serious deep learning on your data. Read here why it's a good idea to split your data into three different sets.

  • Split files into a training set and a validation set (and optionally a test set).
  • Works on any file types.
  • The files get shuffled.
  • A seed makes splits reproducible.
  • Allows randomized oversampling for imbalanced datasets.
  • Optionally group files by prefix.
  • (Should) work on all operating systems.
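
The idea behind "a seed makes splits reproducible" can be illustrated with the standard library alone. This is a sketch of the general technique, not the library's own code; the function name `shuffled` is made up for illustration, and sorting before shuffling is an assumption of this sketch.

```python
import random


def shuffled(items: list[str], seed: int) -> list[str]:
    """Return a shuffled copy of items; the same seed gives the same order."""
    result = sorted(items)  # sort first so the incoming order does not matter
    random.Random(seed).shuffle(result)
    return result


if __name__ == "__main__":
    files = ["img3.jpg", "img1.jpg", "img2.jpg", "img4.jpg"]
    first = shuffled(files, seed=1337)
    second = shuffled(list(reversed(files)), seed=1337)
    # Both runs start from the same sorted list and the same seed.
    print(first == second)
```

Using a private `random.Random(seed)` instance instead of the global generator keeps the result independent of any other code that draws random numbers.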

Install

pip install split-folders

If you are working with a large number of files, you may want a progress bar. Install tqdm to get visual updates while files are copied.

pip install split-folders tqdm

Usage

You can use `split-folders` as a Python module or as a Command Line Interface (CLI).

If your dataset is balanced (each class has the same number of samples), choose `ratio`; otherwise choose `fixed`. NB: oversampling is turned off by default. Oversampling is only applied to the train folder, since having duplicates in val or test would be considered cheating.
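
To decide between the two modes, it can help to count the files per class first. The following is a small sketch under the folder layout shown earlier; `count_files_per_class` and `suggest_mode` are hypothetical helpers, not functions provided by split-folders.

```python
from pathlib import Path


def count_files_per_class(input_dir: Path) -> dict[str, int]:
    """Map each class subfolder name to the number of files directly inside it."""
    return {
        class_dir.name: sum(1 for p in class_dir.iterdir() if p.is_file())
        for class_dir in sorted(input_dir.iterdir())
        if class_dir.is_dir()
    }


def suggest_mode(counts: dict[str, int]) -> str:
    """Suggest "ratio" when all classes have the same size, otherwise "fixed"."""
    return "ratio" if len(set(counts.values())) <= 1 else "fixed"


if __name__ == "__main__":
    print(suggest_mode({"class1": 500, "class2": 500}))  # ratio
    print(suggest_mode({"class1": 500, "class2": 120}))  # fixed
```

The strict equality check mirrors the definition of "balanced" given above; in practice a small tolerance may be more useful.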

Module

import splitfolders  # or import split_folders

Split with a ratio.

To split into only a training and a validation set, pass a two-element tuple as `ratio`, e.g. `(.8, .2)`.

splitfolders.ratio("input_folder", output="output", seed=1337, ratio=(.8, .1, .1), group_prefix=None) # default values
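
To make the effect of the `ratio` tuple concrete, here is a sketch of how a list of already shuffled files could be cut into consecutive parts. It is an illustration only: the function `split_by_ratio` is hypothetical, and the rounding rule (floor every part except the last) is an assumption that may differ from what the library does.

```python
def split_by_ratio(items: list[str], ratio: tuple[float, ...]) -> list[list[str]]:
    """Cut an already shuffled list into consecutive parts sized by ratio.

    Assumption of this sketch: every part except the last is floored,
    and the last part takes whatever remains.
    """
    if abs(sum(ratio) - 1.0) > 1e-9:
        raise ValueError("ratio values must sum to 1")
    parts, start = [], 0
    for share in ratio[:-1]:
        end = start + int(len(items) * share)
        parts.append(items[start:end])
        start = end
    parts.append(items[start:])  # the remainder goes to the last set
    return parts


if __name__ == "__main__":
    files = [f"img{i}.jpg" for i in range(20)]
    train, val, test = split_by_ratio(files, (.8, .1, .1))
    print(len(train), len(val), len(test))  # 16 2 2
```

With a two-element tuple such as `(.8, .2)` the same function returns only a train part and a val part.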

Split val/test with a fixed number of items e.g. 100 for each set.

To split into only a training and a validation set, pass a single number as `fixed`, e.g. `10`.

splitfolders.fixed("input_folder", output="output", seed=1337, fixed=(100, 100), oversample=False, group_prefix=None) # default values
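
The combination of a fixed split with oversampling can be sketched as follows. The README does not state how duplicates are chosen, so padding each training class up to the size of the largest one by random duplication is an assumption here, and both function names are hypothetical.

```python
import random


def split_fixed(items: list[str], val: int, test: int) -> tuple[list[str], list[str], list[str]]:
    """Reserve val and test items from the front; the rest becomes train."""
    if val + test >= len(items):
        raise ValueError("not enough items left for the training set")
    return items[val + test:], items[:val], items[val:val + test]


def oversample(train_by_class: dict[str, list[str]], seed: int) -> dict[str, list[str]]:
    """Pad every class to the size of the largest class by random duplication.

    Only the training files are touched, so val and test stay free of duplicates.
    """
    rng = random.Random(seed)
    target = max(len(files) for files in train_by_class.values())
    return {
        name: files + rng.choices(files, k=target - len(files))
        for name, files in train_by_class.items()
    }


if __name__ == "__main__":
    train, val, test = split_fixed([f"a{i}.jpg" for i in range(10)], val=2, test=2)
    balanced = oversample({"big": train, "small": ["b0.jpg", "b1.jpg"]}, seed=1337)
    print({name: len(files) for name, files in balanced.items()})
```

Because the val and test sizes are absolute, every class contributes the same number of evaluation samples regardless of how imbalanced the training data is.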

Occasionally you may have things that comprise more than a single file (e.g. picture (.png) + annotation (.txt)).

`splitfolders` lets you split files into equally-sized groups based on their prefix. Set `group_prefix` to the length of the group (e.g. `2`). Note that every file must then belong to a group.
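
The grouping idea can be sketched like this: files that share a name before the extension are kept together, and every group must have the expected size. Treating the file stem as the "prefix" is an assumption of this sketch, and `group_by_stem` is a hypothetical helper rather than the library's implementation.

```python
from collections import defaultdict
from pathlib import PurePath


def group_by_stem(filenames: list[str], group_size: int) -> list[tuple[str, ...]]:
    """Group file names that share a stem and check every group has group_size members."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in sorted(filenames):
        groups[PurePath(name).stem].append(name)
    for stem, members in groups.items():
        if len(members) != group_size:
            raise ValueError(f"{stem!r} has {len(members)} files, expected {group_size}")
    return [tuple(members) for members in groups.values()]


if __name__ == "__main__":
    names = ["img1.png", "img1.txt", "img2.txt", "img2.png"]
    print(group_by_stem(names, group_size=2))
    # [('img1.png', 'img1.txt'), ('img2.png', 'img2.txt')]
```

Shuffling and splitting would then operate on the groups instead of on single files, so a picture and its annotation always land in the same set.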

CLI

Usage:
    splitfolders [--output] [--ratio] [--fixed] [--seed] [--oversample] [--group_prefix] folder_with_images
Options:
    --output        path to the output folder. defaults to `output`. Get created if non-existent.
    --ratio         the ratio to split. e.g. for train/val/test `.8 .1 .1 --` or for train/val `.8 .2 --`.
    --fixed         set the absolute number of items per validation/test set. The remaining items constitute
                    the training set. e.g. for train/val/test `100 100` or for train/val `100`.
    --seed          set seed value for shuffling the items. defaults to 1337.
    --oversample    enable oversampling of imbalanced datasets, works only with --fixed.
    --group_prefix  split files into equally-sized groups based on their prefix
Example:
    splitfolders --ratio .8 .1 .1 -- folder_with_images

Because of some Python quirks, you have to add `--` after the values passed to `--ratio`, before the folder argument.
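
One plausible explanation, assuming the CLI is built on `argparse` with a variable-length `--ratio` option, is that the option keeps collecting values until something tells it to stop. The parser below is a stand-in written for illustration, not the real split-folders CLI.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """A stand-in parser: one variable-length option plus one positional argument."""
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--ratio", nargs="+", type=float)
    parser.add_argument("folder")
    return parser


if __name__ == "__main__":
    parser = build_parser()
    # With "--", the option stops collecting values and the folder is parsed on its own.
    print(parser.parse_args(["--ratio", ".8", ".1", ".1", "--", "folder_with_images"]))
    # Without "--", the folder name is also handed to --ratio and parsing exits with an error.
    try:
        parser.parse_args(["--ratio", ".8", ".1", ".1", "folder_with_images"])
    except SystemExit:
        print("parse failed without --")
```

The `--` separator is a common command-line convention for marking the end of options.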

Instead of the command `splitfolders` you can also use `split_folders` or `split-folders`.

Development

Install and use poetry.

Contributing

If you have a question, found a bug or want to propose a new feature, have a look at the issues page.

Pull requests are especially welcome when they fix bugs or improve the code quality.

License

MIT
