ProPPR

by TeamCohen


Graph-algorithm inferences over local groundings of first-order logic programs



For regular updates, subscribe to our Google group at: https://groups.google.com/forum/#!forum/proppr

==============

2.0 QUICKSTART

  1. Write a rulefile as *.ppr:

$ cat > test.ppr
predict(X,Y) :- hasWord(X,W),isLabel(Y),related(W,Y) {r}.
related(W,Y) :- {w(W,Y)}.
^D

  2. Compile a rulefile:

$ python src/scripts/compile.py serialize test.ppr | tee test.wam
0 comment predict(-1,-2) :- hasWord(-1,-3), isLabel(-2), related(-3,-2) {r} #v:['X', 'Y', 'W'].
1 predict/2 allocate 3 ['W', 'Y', 'X']
2 initfreevar -1 -2
3 initfreevar -2 -1
4 fclear
5 fpushstart r 0
6 freport
7 pushboundvar -1
8 pushfreevar -3
9 callp hasWord/2
10 pushboundvar -2
11 callp isLabel/1
12 pushboundvar -3
13 pushboundvar -2
14 callp related/2
15 returnp
16 comment related(-1,-2) :- {w(-1,-2)} #v:['W', 'Y'].
17 related/2 allocate 2 ['Y', 'W']
18 initfreevar -1 -2
19 initfreevar -2 -1
20 fclear
21 fpushstart w 2
22 fpushboundvar -1
23 fpushboundvar -2
24 freport
25 returnp

  3. Write arity-2 facts in a database file as *.graph:

$ cat > test.graph
hasWord dh a
hasWord dh pricy
hasWord dh doll
hasWord dh house
hasWord ft a
hasWord ft little
hasWord ft red
hasWord ft fire
hasWord ft truck
hasWord rw a
hasWord rw red
hasWord rw wagon
hasWord sc a
hasWord sc pricy
hasWord sc red
hasWord sc sports
hasWord sc car
...
^D

  4. Write arity-N facts in a database file as *.cfacts:

$ cat > test.cfacts
isLabel neg
isLabel pos
^D

  5. Write training examples:

$ cat > test_train.data
predict(dh,Y) -predict(dh,neg) +predict(dh,pos)
predict(ft,Y) -predict(ft,neg) +predict(ft,pos)
predict(rw,Y) -predict(rw,neg) +predict(rw,pos)
predict(sc,Y) -predict(sc,neg) +predict(sc,pos)
...
^D

  6. Ground training examples:

$ java -cp conf:bin:lib/* edu.cmu.ml.proppr.Grounder --programFiles test.wam:test.graph:test.cfacts --queries test_train.data --grounded test_train.grounded
Time 461 msec
Done.

  7. Train parameters:

$ java -cp conf:bin:lib/* edu.cmu.ml.proppr.Trainer --train test_train.grounded --params test.wts
INFO [Trainer] edu.cmu.ml.proppr.util.ModuleConfiguration:
  Walker: edu.cmu.ml.proppr.learn.L2PosNegLossTrainedSRW
  Trainer: edu.cmu.ml.proppr.Trainer
  Weighting Scheme: edu.cmu.ml.proppr.learn.tools.ReLUWeightingScheme

INFO [Trainer] Training model parameters on test_train.grounded...
INFO [Trainer] epoch 1 ...
INFO [Trainer] epoch 2 ...
INFO [Trainer] epoch 3 ...
INFO [Trainer] epoch 4 ...
INFO [Trainer] epoch 5 ...
INFO [Trainer] Finished training in 650 ms
INFO [Trainer] Saving parameters to test.wts...

  8. Write testing examples:

$ cat > test_testing.data
predict(pb,Y) -predict(pb,neg) +predict(pb,pos)
predict(yc,Y) -predict(yc,neg) +predict(yc,pos)
predict(rb2,Y) -predict(rb2,neg) +predict(rb2,pos)
...
^D

  9. Get untrained rankings:

$ java -cp conf:bin:lib/* edu.cmu.ml.proppr.QueryAnswerer --programFiles test.wam:test.graph:test.cfacts --queries test_testing.data --solutions pre.testing.solutions.txt
edu.cmu.ml.proppr.QueryAnswerer.QueryAnswererConfiguration:
  Prover: edu.cmu.ml.proppr.prove.DprProver
  Weighting Scheme: edu.cmu.ml.proppr.learn.tools.ReLUWeightingScheme

INFO [QueryAnswerer] Running queries from test_testing.data; saving results to pre.testing.solutions.txt
INFO [QueryAnswerer] Querying: predict(pb,-1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(yc,-1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(rb2,-1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(rp,-1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(bp,-1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(he,-1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(wt,-1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...

  10. Measure untrained performance:

$ python scripts/answermetrics.py --data test_testing.data --answers pre.testing.solutions.txt --metric mrr --metric recall

metric mrr (Mean Reciprocal Rank): averages 1/rank for all positive answers
. micro: 0.5
. macro: 0.5
. details:
. . predict(he,-1) #v:[?]. 0.5
. . predict(pb,-1) #v:[?]. 0.5
. . predict(yc,-1) #v:[?]. 0.5
. . predict(bp,-1) #v:[?]. 0.5
. . predict(rb2,-1) #v:[?]. 0.5
. . predict(wt,-1) #v:[?]. 0.5
. . predict(rp,-1) #v:[?]. 0.5

metric recall (Recall): fraction of positive examples that are proposed as solutions anywhere in the ranking
. micro: 1.0
. macro: 1.0
. details:
. . predict(he,-1) #v:[?]. 1.0
. . predict(pb,-1) #v:[?]. 1.0
. . predict(yc,-1) #v:[?]. 1.0
. . predict(bp,-1) #v:[?]. 1.0
. . predict(rb2,-1) #v:[?]. 1.0
. . predict(wt,-1) #v:[?]. 1.0
. . predict(rp,-1) #v:[?]. 1.0

  11. Get trained rankings (note the added --params option and the new --solutions file):

$ java -cp conf:bin:lib/* edu.cmu.ml.proppr.QueryAnswerer --programFiles test.wam:test.graph:test.cfacts --queries test_testing.data --solutions post.testing.solutions.txt --params test.wts
edu.cmu.ml.proppr.QueryAnswerer.QueryAnswererConfiguration:
  Prover: edu.cmu.ml.proppr.prove.DprProver
  Weighting Scheme: edu.cmu.ml.proppr.learn.tools.ReLUWeightingScheme

INFO [QueryAnswerer] Running queries from test_testing.data; saving results to post.testing.solutions.txt
INFO [QueryAnswerer] Querying: predict(pb,-1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(yc,-1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(rb2,-1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(rp,-1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(bp,-1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(he,-1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(wt,-1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...

  12. Measure trained performance:

$ python scripts/answermetrics.py --data test_testing.data --answers post.testing.solutions.txt --metric mrr --metric recall

metric mrr (Mean Reciprocal Rank): averages 1/rank for all positive answers
. micro: 1.0
. macro: 1.0
. details:
. . predict(he,-1) #v:[?]. 1.0
. . predict(pb,-1) #v:[?]. 1.0
. . predict(yc,-1) #v:[?]. 1.0
. . predict(bp,-1) #v:[?]. 1.0
. . predict(rb2,-1) #v:[?]. 1.0
. . predict(wt,-1) #v:[?]. 1.0
. . predict(rp,-1) #v:[?]. 1.0

metric recall (Recall): fraction of positive examples that are proposed as solutions anywhere in the ranking
. micro: 1.0
. macro: 1.0
. details:
. . predict(he,-1) #v:[?]. 1.0
. . predict(pb,-1) #v:[?]. 1.0
. . predict(yc,-1) #v:[?]. 1.0
. . predict(bp,-1) #v:[?]. 1.0
. . predict(rb2,-1) #v:[?]. 1.0
. . predict(wt,-1) #v:[?]. 1.0
. . predict(rp,-1) #v:[?]. 1.0

==============================================

ProPPR: PROGRAMMING WITH PERSONALIZED PAGERANK

This is a Java package for using graph walk algorithms to perform inference tasks over local groundings of first-order logic programs. The package makes use of parallelization to substantially speed processing, making it practical even for large databases.

Contents:

1. Build
2. Run
   2.0. Overview of Java main() classes
        2.0.0. Grounder: Construct a proof graph for each query
        2.0.1. Trainer: Train feature weights on the proof graphs
        2.0.2. QueryAnswerer: Generate [un]trained ranked candidate solutions for queries
   2.1. Utilities
        2.1.0. compiler.py: Convert ProPPR rulefiles (.ppr) to WAM instructions (.wam)
        2.1.1. answermetrics.py: Measure performance
        2.1.2. sparseGraphTools: Construct memory- and CPU-efficient ProPPR databases
3. Use
   3.0. Developing a ProPPR program and database
   3.1. Typical workflow for experiments

1. BUILD

ProPPR $ ant clean build

2. RUN

For all run phases, control logging output using conf/log4j.properties.
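For instance, to keep the default INFO-level console output but silence a single noisy class, conf/log4j.properties can be adjusted along these lines. This is a minimal sketch assuming the log4j 1.x properties syntax implied by the file name; the repository ships its own default file, so edit that rather than replacing it wholesale:

    log4j.rootLogger=INFO, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%-5p [%c{1}] %m%n
    # Raise the threshold for one chatty class only, e.g. the Grounder:
    log4j.logger.edu.cmu.ml.proppr.Grounder=WARN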

2.0. RUN: JAVA MAIN CLASSES

edu.cmu.ml.proppr.Grounder
edu.cmu.ml.proppr.Trainer
edu.cmu.ml.proppr.QueryAnswerer

2.0.0. RUN: MAIN CLASSES: GROUNDER

$ java -cp conf:bin:lib/* edu.cmu.ml.proppr.Grounder \
      --queries inputFile \
      --grounded outputFile.grounded \
      --programFiles file.wam:file.cfacts:file.graph \
      [--ternaryIndex true|false] \
      [--threads integer] \
      [--prover ppr[:depth] | dpr[:eps[:alph[:strat]]] | tr[:depth]]

Grounder will read the list of queries from inputFile, the WAM program from file.wam, and the various database plugin files file.cfacts and file.graph, and produce the proof graph for each query in outputFile.grounded.

Optional parameters:

  • If your database contains facts of arity 3 or more, use --ternaryIndex true to spend some memory and increase the speed of lookups.
  • If you are on a multi-core machine, set --threads up to (#cores-2) to ground queries in parallel (one thread is used as the controller, one for writing output, and the others are worker threads).

  • The default prover is dpr:1e-4:0.1, which will fail in graphs with a maximum out-degree >10. Reduce alpha to 1/(max out-degree) to suit your dataset; see the example below.
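For example, on a dataset whose maximum out-degree is around 20, a grounding run over the quickstart files might look like the sketch below (the alpha of 0.05 and the thread count are illustrative assumptions, not recommended defaults):

    $ java -cp conf:bin:lib/* edu.cmu.ml.proppr.Grounder \
          --programFiles test.wam:test.graph:test.cfacts \
          --queries test_train.data --grounded test_train.grounded \
          --prover dpr:1e-4:0.05 --threads 3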

2.0.1. RUN: MAIN CLASSES: TRAINER

$ java -cp conf:bin:lib/* edu.cmu.ml.proppr.Trainer \
      --train inputFile.grounded \
      --params outputFile \
      [--threads integer] \
      [--epochs integer] \
      [--traceLosses] \
      [--force] \
      [--weightingScheme linear | sigmoid | tanh | ReLU | exp]

Trainer will read the grounded proof graphs from inputFile.grounded, then perform stochastic gradient descent to optimize the weights assigned to the edge labels, storing the resulting parameter vector in outputFile.

Optional arguments:

  • If you are on a multicore machine, specify --threads up to (#cores-2) and ProPPR will process examples in parallel (1 controller thread, 1 thread for managing output, N worker threads). Training is threadsafe, but currently (fall 2014) programs with a small number of non-db features (or a large number of db lookups) may experience reduced parallelization speedup due to resource contention.
  • Increase or decrease the number of training iterations using --epochs.
  • Turn on --traceLosses to view a readout of log loss and regularization loss every epoch.
  • Turn on --force to use different settings for training than were used for grounding (not recommended unless you know what you're doing).
  • Set --weightingScheme to your desired wrapper function, controlling how the weight of an edge is computed from its features; see the example below.
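As a concrete sketch, the run below trains for 20 epochs with --threads 3, the sigmoid weighting scheme, and per-epoch loss reporting. The file names come from the quickstart; the particular settings are illustrative assumptions, not tuned recommendations:

    $ java -cp conf:bin:lib/* edu.cmu.ml.proppr.Trainer \
          --train test_train.grounded --params test.wts \
          --epochs 20 --threads 3 --traceLosses --weightingScheme sigmoid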

2.2. RUN: UTILITIES

2.2.0. RUN: UTILITIES: QUERYANSWERER

If you want to use a program to answer a series of queries, you can use the QueryAnswerer class. If you are running this step you should already have a compiled program and a file containing a list of queries, one per line. Each query is a single goal.

ProPPR $ cat testcases/family.queries
sim(william,X)
sim(rachel,X)

ProPPR $ java edu.cmu.ml.proppr.QueryAnswerer \
      --programFiles testcases/family.cfacts:testcases/family.crules \
      --queries testcases/family.queries --output answers.txt
INFO [Component] Loading from file 'testcases/family.cfacts' with alpha=0.0 ...
INFO [Component] Loading from file 'testcases/family.crules' with alpha=0.0 ...

ProPPR $ cat answers.txt

proved sim(william,-1) 47 msec

1 0.8838968104504825 -1=c[william]
2 0.035512510088781264 -1=c[lottie]
3 0.035512510088781264 -1=c[rachel]
4 0.035512510088781264 -1=c[sarah]
5 0.002391414820793351 -1=c[poppy]
6 0.0017935611155950133 -1=c[lucas]
7 0.0017935611155950133 -1=c[charlotte]
8 0.0017935611155950133 -1=c[caroline]
9 0.0017935611155950133 -1=c[elizabeth]

proved sim(rachel,-1) 18 msec

1 0.9094251636624519 -1=c[rachel]
2 0.0452874181687741 -1=c[caroline]
3 0.0452874181687741 -1=c[elizabeth]

2.2.1. RUN: UTILITIES: PROMPT

An interactive prompt can be useful while debugging logic program issues, because you can examine a single query in detail. If you are running this step you should already have a compiled program.

Starting up the prompt:

""" ProPPR $ java -cp conf/:bin/:lib/* edu.cmu.ml.proppr.prove.Prompt --programFiles ${PROGRAMFILES%:} Starting up beanshell... prv set: [email protected] INFO [Component] Loading from file 'kbpprototype/doc.crules' with alpha=0.0 ... INFO [Component] Loading from file 'kbpprototype/kb.cfacts' with alpha=0.0 ... INFO [Component] Loading from file 'kbpprototype/lppredicateSFENG_001-50doc.graph' with alpha=0.0 ... lp set: [email protected]

Type 'help();' for help, 'quit();' to quit; 'list();' for a variable listing.

BeanShell 2.0b4 - by Pat Niemeyer ([email protected])
bsh %
"""

When it starts up, Prompt instantiates the logic program from the command line as 'lp', along with a default prover as 'prv'; the prover prints a depth-first-search-style proof of a query (default maximum depth is 5). You can specify a different prover on the command line if you wish. For information on built-in commands and interpreter syntax, type 'help();':

""" bsh % help(); This is a beanshell, a command-line interpreter for java. A full beanshell manual is available at http://www.beanshell.org/manual/contents.html.

Type java statements and expressions at the prompt. Don't forget semicolons.

Type 'help();' for help, 'quit();' to quit; 'list();' for a variable listing.

'show();' will toggle automatic printing of the results of expressions. Otherwise you must use 'print( expr );' to see results.

'javap( x );' will list the fields and methods available on an object. Be warned; beanshell has trouble locating methods that are only defined on the superclass.

'[sol = ]run(prover,logicprogram,"functor(arg,arg,...,arg)")' will prove the associated state.

'pretty(sol)' will list solutions first, then intermediate states in descending weight order.

bsh %
"""
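For example, with the family program from the QueryAnswerer section loaded, proving and inspecting a single query might look like the sketch below, using the 'prv' and 'lp' bindings reported at startup. This is only an illustration based on the help text above; exact output will vary:

    bsh % sol = run(prv, lp, "sim(william,X)");
    bsh % pretty(sol);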

3. FILE FORMATS

****** File format: *.rules

Example:
predict(X,Y) :- hasWord(X,W),isLabel(Y),related(W,Y) #r.
related(W,Y) :- # w(W,Y).

Grammar:
line= rhs ':-' lhs ('#' featureList)? '.'
rhs= goal
lhs=
   |= goal (',' goal)*
featureList=
   |= goal (',' goal)*
goal= functor
   |= functor '(' argList ')'
argList= constantArgList
       |= variableArgList
       |= constantArgList ',' variableArgList
constantArgList= constantArg (',' constantArg)*
variableArgList= variableArg (',' variableArg)*
constantArg= [a-z][a-zA-Z0-9]*
variableArg= [A-Z][a-zA-Z0-9]*
functor= [a-z][a-zA-Z0-9]*

****** File format: *.facts

Example:
isLabel(pos)
isLabel(neg)

Grammar:
line= goal

****** File format: *.graph

Example:
hasWord bk punk
hasWord bk queen
hasWord bk barbie
hasWord bk and
hasWord bk ken
hasWord rb a
hasWord rb little
hasWord rb red
hasWord rb bike
hasWord mv a
hasWord mv big
hasWord mv 7-seater
hasWord mv minivan
hasWord mv with
hasWord mv an
hasWord mv automatic
hasWord mv transmission
hasWord hs a
hasWord hs big
hasWord hs house
hasWord hs in
hasWord hs the
hasWord hs suburbs
hasWord hs with
hasWord hs crushing
hasWord hs mortgage

Grammar:
line= edge '\t' sourcenode '\t' destnode
edge= functor
sourcenode,destnode= constantArg

****** File format: *.data

Example:
predict(bk,Y) -predict(bk,neg) +predict(bk,pos)
predict(rb,Y) -predict(rb,neg) +predict(rb,pos)
predict(mv,Y) +predict(mv,neg) -predict(mv,pos)
predict(hs,Y) +predict(hs,neg) -predict(hs,pos)

Grammar:
line= query '\t' exampleList
query= goal
exampleList= example ('\t' example)*
example= positiveExample
       |= negativeExample
positiveExample= '+' goal
negativeExample= '-' goal
