plait.py - a fake data modeler
plait.py is a program for generating fake data from composable yaml templates.
The idea behind plait.py is that it should be easy to model fake data that has an interesting shape. Currently, many fake data generators model their data as a collection of independent and identically distributed (IID) variables; with plait.py we can stitch those variables together into a more coherent model.
some example uses for plait.py are:
    # a person generator
    define:
      min_age: 10
      minor_age: 13
      working_age: 18

    fields:
      age:
        random: gauss(25, 5)
        # minimum age is $min_age
        finalize: max($min_age, value)

      gender:
        mixture:
          - value: M
          - value: F

      name: "#{name.name}"
      job:
        value: "#{job.title}"
        onlyif: this.age > $working_age

      address:
        template: address/usa.yaml

      phone:
        # add a phone if the person is older than the minor age
        template: device/phone.yaml
        onlyif: this.age > ${minor_age}

      # we model our height as a gaussian that varies based on
      # age and gender
      height:
        lambda: this._base_height * this._age_factor

      _base_height:
        switch:
          - onlyif: this.gender == "F"
            random: gauss(60, 5)
          - onlyif: this.gender == "M"
            random: gauss(70, 5)

      _age_factor:
        switch:
          - onlyif: this.age < 15
            lambda: 1 - (20 - (this.age + 5)) / 20
          - default:
            value: 1
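To make the dependencies in the template above concrete, here is a rough plain-Python sketch of the same model. It is not how plait.py evaluates templates internally; it only mirrors the field definitions by hand. The equal weighting of the gender mixture, the placeholder job title, and the names `gen_person` and `age_factor` are assumptions made for illustration.

```python
import random

# Constants mirroring the `define:` section of the template.
MIN_AGE = 10
MINOR_AGE = 13
WORKING_AGE = 18


def age_factor(age):
    """Scale factor for height: below 1 for ages under 15, otherwise 1."""
    if age < 15:
        return 1 - (20 - (age + 5)) / 20.0
    return 1


def gen_person(rng):
    """Build one record, evaluating fields in dependency order."""
    person = {}
    # random: gauss(25, 5), then finalize: max($min_age, value)
    person["age"] = max(MIN_AGE, rng.gauss(25, 5))
    # mixture of two values (equal weights are an assumption)
    person["gender"] = rng.choice(["M", "F"])
    # switch: the base height distribution depends on gender
    if person["gender"] == "F":
        base_height = rng.gauss(60, 5)
    else:
        base_height = rng.gauss(70, 5)
    # lambda: height combines the two intermediate values
    person["height"] = base_height * age_factor(person["age"])
    # onlyif: the field exists only when the condition holds
    if person["age"] > WORKING_AGE:
        person["job"] = "placeholder job title"  # stands in for "#{job.title}"
    return person


rng = random.Random(0)
people = [gen_person(rng) for _ in range(200)]
```

Because height is computed from age and gender rather than drawn on its own, the generated records are no longer a collection of independent variables, which is the point of stitching the fields together.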
some specific examples of what plait.py can do:
    # install with python
    pip install plaitpy

    # or with pypy
    pypy-pip install plaitpy

    git clone https://github.com/plaitpy/plaitpy

    # get the fakerb repo
    git submodule init
    git submodule update
specify a template as a yaml file, then generate records from that yaml file.
    # a simple example (if cloning plait.py repo)
    python main.py templates/timestamp/uniform.yaml

    # if plait.py is installed via pip
    plait.py templates/timestamp/uniform.yaml
    import plaitpy

    # load a template from a yaml file
    t = plaitpy.Template("templates/timestamp/uniform.yaml")

    # generate a single record
    print(t.gen_record())

    # generate ten records
    print(t.gen_records(10))
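Once records are generated, a common next step is writing them out for another tool. The sketch below assumes that each record is dict-like and JSON-serializable, which this README does not confirm; the helper name `records_to_jsonl` and the `time` field in the stand-in data are hypothetical.

```python
import json


def records_to_jsonl(records):
    """Serialize an iterable of dict-like records, one JSON object per line."""
    return "\n".join(json.dumps(dict(r), sort_keys=True) for r in records)


# Stand-in records with a hypothetical field name. With plaitpy installed,
# the list might instead come from t.gen_records(10), assuming it returns
# dict-like records.
sample = [{"time": 1}, {"time": 2}]
print(records_to_jsonl(sample))
```

Sorting the keys keeps the output stable between runs, which makes generated fixtures easier to diff.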
plait.py also simplifies looking up faker fields:
    # list faker namespaces
    plait.py --list

    # lookup faker namespaces
    plait.py --lookup name

    # lookup faker keys
    # (-ll is short for --lookup)
    plait.py --ll name.suffix
To simulate data that comes from many markov processes (a markov ecosystem), see the plaitpy-ipc repository.
If you have ideas on features to add, open an issue. Feedback is appreciated!