A new version of phraug (pron. frog) with improved command-line argument parsing, thanks to jofusa.
This is a set of simple Python scripts for pre-processing large files: things like splitting and format conversion. The name phraug comes from a great book, Made to Stick, by Chip and Dan Heath.
See http://fastml.com/processing-large-files-line-by-line/ for the basic idea.
There's always at least one input file and usually one or more output files. An input file always stays unchanged.
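To make the line-by-line idea concrete, here is a minimal sketch of a format conversion that streams its input instead of loading it into memory. It is not taken from phraug itself: the function name `csv_to_tsv` and its parameters are hypothetical, chosen only for illustration.

```python
import csv


def csv_to_tsv(input_path, output_path):
    """Hypothetical helper: stream a CSV file into a tab-separated file."""
    # The input is opened read-only, so it stays unchanged.
    with open(input_path, newline="") as src, \
            open(output_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        # Only one row is held in memory at a time, so file size
        # is limited by disk space rather than RAM.
        for row in reader:
            writer.writerow(row)
```

Calling `csv_to_tsv("data.csv", "data.tsv")` would write the converted copy to `data.tsv` and leave `data.csv` as it was.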
For documentation:

* try calling a script with -h; most will display usage information
* see the phraug docs
* see http://fastml.com/introducing-phraug/
    >python split.py
    usage: split.py [-h] [-p PROBABILITY] [-r RANDOM_SEED] [-s] [-c]
                    input_file output_file1 output_file2
    split.py: error: too few arguments

    >python split.py -h
    usage: split.py [-h] [-p PROBABILITY] [-r RANDOM_SEED] [-s] [-c]
                    input_file output_file1 output_file2

    split a file into two randomly, line by line.

    positional arguments:
      input_file            path to an input file
      output_file1          path to the first output file
      output_file2          path to the second output file

    optional arguments:
      -h, --help            show this help message and exit
      -p PROBABILITY, --probability PROBABILITY
                            probability of writing to the first file (default 0.9)
      -r RANDOM_SEED, --random_seed RANDOM_SEED
                            random seed
      -s, --skip_headers    skip the header line
      -c, --copy_headers    copy the header line to both output files
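The behaviour described by the help text can be sketched roughly as follows. This is an illustration written from the usage message alone, not the actual split.py source: the function `split_file` and its keyword names are assumptions that mirror the command-line options.

```python
import random


def split_file(input_path, output_path1, output_path2,
               probability=0.9, random_seed=None,
               skip_headers=False, copy_headers=False):
    """Hypothetical sketch of a random, line-by-line two-way split."""
    # A dedicated generator keeps the split reproducible for a given seed.
    rng = random.Random(random_seed)
    with open(input_path) as src, \
            open(output_path1, "w") as out1, \
            open(output_path2, "w") as out2:
        if skip_headers or copy_headers:
            header = src.readline()  # consume the first line
            if copy_headers:
                out1.write(header)
                out2.write(header)
        for line in src:
            # Each line goes to the first file with the given probability,
            # otherwise to the second file.
            if rng.random() < probability:
                out1.write(line)
            else:
                out2.write(line)
```

With the default probability of 0.9, about 90% of the lines end up in the first output file, which suits a train/validation split. Because each line is routed independently, the exact proportion varies from run to run unless a seed is fixed.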