Author: zygmuntz. License: BSD 2-Clause "Simplified" License.

phraug2

A new version of phraug (pronounced "frog") with improved command-line argument parsing, thanks to jofusa.

This is a set of simple Python scripts for pre-processing large files: things like splitting and format conversion. The name phraug comes from a great book, Made to Stick, by Chip and Dan Heath.

See http://fastml.com/processing-large-files-line-by-line/ for the basic idea.

There's always at least one input file and usually one or more output files. An input file always stays unchanged.
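The line-by-line idea can be sketched as a small format converter. `tsv_to_csv` is a hypothetical example, not one of the phraug scripts. It assumes tab-separated input and writes comma-separated output.

The converter holds only one row in memory at a time, so the input can be larger than RAM. It opens the input read-only, which leaves it unchanged.

```python
import csv


def tsv_to_csv(input_path, output_path):
    """Hypothetical converter: tab-separated -> comma-separated, one row at a time.

    The input file is opened read-only and never modified; only the current
    row is held in memory, so file size is not limited by RAM.
    """
    with open(input_path, newline="") as src, open(output_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")  # yields rows lazily, line by line
        writer = csv.writer(dst)                  # default comma delimiter
        for row in reader:
            writer.writerow(row)
```

Calling `tsv_to_csv("train.tsv", "train.csv")` would produce the converted copy next to the untouched original. The file names are only illustrative.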

For documentation:
* try calling a script with -h; most will display usage information.
* see the phraug docs.
* see http://fastml.com/introducing-phraug/

Example:

>python split.py
usage: split.py [-h] [-p PROBABILITY] [-r RANDOM_SEED] [-s] [-c]
                input_file output_file1 output_file2
split.py: error: too few arguments

>python split.py -h
usage: split.py [-h] [-p PROBABILITY] [-r RANDOM_SEED] [-s] [-c]
                input_file output_file1 output_file2

split a file into two randomly, line by line.

positional arguments:
  input_file            path to an input file
  output_file1          path to the first output file
  output_file2          path to the second output file

optional arguments:
  -h, --help            show this help message and exit
  -p PROBABILITY, --probability PROBABILITY
                        probability of writing to the first file (default 0.9)
  -r RANDOM_SEED, --random_seed RANDOM_SEED
                        random seed
  -s, --skip_headers    skip the header line
  -c, --copy_headers    copy the header line to both output files
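The help text above pins down the interface well enough to sketch the behaviour with `argparse`. This is a hypothetical re-implementation, not the actual split.py. It makes the following assumptions:

* A line goes to the first file when a uniform random draw falls below PROBABILITY.
* RANDOM_SEED is an integer.
* -s drops the first line, while -c writes it to both outputs.

The names `split_file` and `build_parser` are illustrative.

```python
import argparse
import random


def split_file(input_file, output_file1, output_file2,
               probability=0.9, random_seed=None,
               skip_headers=False, copy_headers=False):
    """Hypothetical sketch: route each line to output_file1 with the given
    probability, otherwise to output_file2. The input is only read."""
    rng = random.Random(random_seed)  # seeded generator -> reproducible split
    with open(input_file) as src, open(output_file1, "w") as o1, open(output_file2, "w") as o2:
        if skip_headers or copy_headers:
            header = src.readline()   # consume the header line
            if copy_headers:          # assumption: -c writes it to both files
                o1.write(header)
                o2.write(header)
        for line in src:              # stream the rest, one line at a time
            if rng.random() < probability:
                o1.write(line)
            else:
                o2.write(line)


def build_parser():
    """Parser whose usage line matches the help output shown above."""
    p = argparse.ArgumentParser(description="split a file into two randomly, line by line.")
    p.add_argument("input_file", help="path to an input file")
    p.add_argument("output_file1", help="path to the first output file")
    p.add_argument("output_file2", help="path to the second output file")
    p.add_argument("-p", "--probability", type=float, default=0.9,
                   help="probability of writing to the first file (default 0.9)")
    p.add_argument("-r", "--random_seed", type=int, default=None,  # assumption: int seed
                   help="random seed")
    p.add_argument("-s", "--skip_headers", action="store_true", help="skip the header line")
    p.add_argument("-c", "--copy_headers", action="store_true",
                   help="copy the header line to both output files")
    return p
```

The parsed argument names match the function's parameters, so `split_file(**vars(build_parser().parse_args()))` wires the two together.

Under Python 3, argparse reports missing positionals as "the following arguments are required: ..." rather than "too few arguments". Python 3.10 and later also label the second help section "options:".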
