Quality information extraction at web scale.
** DEPRECATED! ** Please see https://github.com/dair-iitd/OpenIE-standalone, which has combined multiple projects into a single project and maintains the latest version of Open IE (Open IE 5). It is based on another repository https://github.com/allenai/openie-standalone, which has an older version of Open IE.
This project contains the principal Open Information Extraction (Open IE) system from the University of Washington (UW). An Open IE system runs over sentences and creates extractions that represent relations in text. For example, consider the following sentence.
The U.S. president Barack Obama gave his speech on Tuesday to thousands of people.
There are many binary relations in this sentence that can be expressed as a triple (A, B, C), where A and B are arguments and C is the relation between those arguments. Since Open IE is not aligned with an ontology, the relation is a phrase of text. Here is a possible list of the binary relations in the above sentence:
(Barack Obama, is the president of, the U.S.)
(Barack Obama, gave, his speech)
(Barack Obama, gave his speech, on Tuesday)
(Barack Obama, gave his speech, to thousands of people)
The first extraction in the above list is a "noun-mediated extraction", because its relation phrase is described by the noun "president". The other extractions are very similar. In fact, they can be represented more informatively as a single n-ary extraction. An n-ary extraction can have 0 or more secondary arguments. Here is a possible list of the n-ary relations in the sentence:
(Barack Obama, is the president of, the U.S.)
(Barack Obama, gave, [his speech, on Tuesday, to thousands of people])
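The n-ary form above can be modelled with a small data structure. The following is only an illustrative sketch in Python: the class `NaryExtraction` and its methods are hypothetical names, not part of Open IE, and `to_binary` is a naive flattening that pairs the relation with each secondary argument. It does not reproduce the binary list shown earlier, where the relation phrase itself grows.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class NaryExtraction:
    """Hypothetical container: one first argument, one relation phrase,
    and zero or more secondary arguments."""
    arg1: str
    rel: str
    arg2s: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Render in the notation used in this README."""
        parts = [self.arg1, self.rel]
        if len(self.arg2s) == 1:
            parts.append(self.arg2s[0])
        elif len(self.arg2s) > 1:
            # Several secondary arguments are shown as a bracketed list.
            parts.append("[" + ", ".join(self.arg2s) + "]")
        return "(" + ", ".join(parts) + ")"

    def to_binary(self) -> List[Tuple[str, str, str]]:
        """Naive flattening: one triple per secondary argument."""
        return [(self.arg1, self.rel, arg2) for arg2 in self.arg2s]


president = NaryExtraction("Barack Obama", "is the president of", ["the U.S."])
speech = NaryExtraction(
    "Barack Obama", "gave",
    ["his speech", "on Tuesday", "to thousands of people"],
)
print(president.render())
print(speech.render())
```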
Extractions can include more than just the arguments and relation as well. For example, we might be interested in whether the extraction is a negative assertion or a positive assertion, or if it is conditional in some way. Consider the following sentence:
Some people say Barack Obama was born in Kenya.
We would not want to extract (Barack Obama, was born, in Kenya) alone, because this is not true. However, if we have the condition as well, we can have a correct extraction.
Some people say:(Barack Obama, was born in, Kenya)
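One way to keep the condition attached to the triple is to carry it as an optional field. This is a minimal sketch, assuming a hypothetical `ContextualExtraction` class (not part of Open IE) that renders in the `context:(arg1, rel, arg2)` notation shown above and lets a consumer tell qualified assertions from plain ones.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContextualExtraction:
    """Hypothetical triple with an optional attributing context."""
    arg1: str
    rel: str
    arg2: str
    context: Optional[str] = None  # e.g. "Some people say"

    def is_qualified(self) -> bool:
        """True when the triple only holds under its context."""
        return self.context is not None

    def render(self) -> str:
        triple = f"({self.arg1}, {self.rel}, {self.arg2})"
        # Prefix the context, if any, in the README's notation.
        return f"{self.context}:{triple}" if self.is_qualified() else triple


claim = ContextualExtraction(
    "Barack Obama", "was born in", "Kenya", context="Some people say"
)
print(claim.render())
```

A downstream consumer that wants only unconditioned facts could then drop every extraction for which `is_qualified()` returns true.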
To see an example of Open IE being used, please visit http://openie.cs.washington.edu/.
Open IE 4 is a combination of SRLIE and Relnoun. The closest papers for these two are:
A survey paper summarizing about ten years of progress in Open IE:
Open IE 4.x is the successor to Ollie. Whereas Ollie used bootstrapped dependency parse paths to extract relations (see Open Language Learning for Information Extraction), Open IE 4.x uses similar argument and relation expansion heuristics to create Open IE extractions from SRL frames. Open IE 4.x also extends the definition of Open IE extractions to include n-ary extractions (extractions with 0 or more secondary arguments).
openie uses java-7-openjdk and the sbt build system, so downloading dependencies and compiling is simple. Just run:
You can run openie with sbt or create a stand-alone jar.
openie requires substantial memory. sbt is configured to use these options by default:
sbt 'run-main edu.knowitall.openie.OpenIECli'
First create the stand-alone jar.
sbt clean compile assembly
You may need to add the above memory options.
sbt -J-Xmx2700M clean compile assembly
Then you can run the resulting jar file as normal.
java -jar openie-assembly.jar
You may need to add the above memory options.
java -Xmx4g -XX:+UseConcMarkSweepGC -jar openie-assembly.jar
openie takes one sentence per line unless --split is specified. If --split is specified, the input text will be split into sentences. You can either pipe input from Standard Input, specify an input file (an optional first argument), or type sentences interactively. Output will be written to Standard Output unless a second optional argument is specified for an output file.
openie takes a number of command line arguments. To see them all, run java -jar openie-assembly.jar --usage. Of particular interest are --ignore-errors, which continues running even if an exception is encountered, --binary, which gives the binary (triples) output, and --split, which splits the input document text into sentences.
There are two formats: a simple format made for ease of reading and a columnated format used for machine processing. The columnated format can be selected with --format column. The simple format is chosen by default.
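When openie is driven from a script, the options described above can be assembled programmatically. The sketch below is an assumption-laden illustration in Python: `build_openie_command` is a hypothetical helper, and placing the flags before the positional input and output files is a guess rather than documented behavior. The flags and memory options themselves are the ones named in this README.

```python
from typing import List, Optional


def build_openie_command(
    jar_path: str = "openie-assembly.jar",
    input_path: Optional[str] = None,
    output_path: Optional[str] = None,
    split: bool = False,
    binary: bool = False,
    ignore_errors: bool = False,
    column_format: bool = False,
) -> List[str]:
    """Build an argument list for running the stand-alone jar."""
    if output_path is not None and input_path is None:
        # The output file is the second positional argument, so it
        # cannot be given without an input file.
        raise ValueError("output_path requires input_path")

    # Memory options as shown in this README.
    cmd = ["java", "-Xmx4g", "-XX:+UseConcMarkSweepGC", "-jar", jar_path]
    if split:
        cmd.append("--split")
    if binary:
        cmd.append("--binary")
    if ignore_errors:
        cmd.append("--ignore-errors")
    if column_format:
        cmd += ["--format", "column"]
    # Assumption: positional file arguments follow the flags.
    if input_path is not None:
        cmd.append(input_path)
    if output_path is not None:
        cmd.append(output_path)
    return cmd


print(" ".join(build_openie_command(split=True, column_format=True)))
```

The resulting list can be handed to `subprocess.run`, for example with `input="One sentence per line.\n"` and `text=True` to pipe sentences through Standard Input.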
A simple Java demo that uses Open IE is available at https://github.com/OpenIE-HelperCodes/OpenIEDemo1.
> John ran down the road to fetch a pail of water.
John ran down the road to fetch a pail of water.
0.86 (John; ran; down the road; to fetch a pail of water)
0.82 John ran:(John; ran down the road to fetch; a pail of water)
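Although the simple format is meant for reading, its extraction lines are regular enough to parse. The following Python sketch is based only on the two sample lines above: it assumes a leading confidence, an optional `context:` prefix, and semicolon-separated fields that contain no semicolons or parentheses of their own. `SimpleExtraction` and `parse_simple_line` are hypothetical names, not part of Open IE.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# confidence, optional "context:" prefix, then "(field; field; ...)"
_LINE = re.compile(r"^(\d+(?:\.\d+)?)\s+(?:(.*?):)?\((.*)\)$")


@dataclass
class SimpleExtraction:
    confidence: float
    context: Optional[str]
    arg1: str
    rel: str
    arg2s: List[str]


def parse_simple_line(line: str) -> Optional[SimpleExtraction]:
    """Parse one extraction line; return None for any other line."""
    match = _LINE.match(line.strip())
    if match is None:
        return None
    confidence, context, body = match.groups()
    fields = [part.strip() for part in body.split(";")]
    if len(fields) < 2:
        return None  # need at least arg1 and a relation
    return SimpleExtraction(
        confidence=float(confidence),
        context=context,
        arg1=fields[0],
        rel=fields[1],
        arg2s=fields[2:],
    )


sample = "0.82 John ran:(John; ran down the road to fetch; a pail of water)"
print(parse_simple_line(sample))
```

Lines that echo the input sentence do not match the pattern and yield `None`, so the function can be applied to every output line.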
Columns are separated by tab, making it hard to read in this README.
0.8576784836790008 John ran down the road; to fetch a pail of water John ran down the road to fetch a pail of water.
0.8195727266148489 John ran John ran down the road to fetch a pail of water John ran down the road to fetch a pail of water.
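Because the column format is tab-separated, it can be read with a plain split. The sketch below is deliberately generic: it relies only on what the sample shows (the confidence comes first and the source sentence last) and keeps every column in between as an uninterpreted list, since the exact column layout is not stated here. `ColumnRecord`, `parse_column_line`, and the tab positions in the demo line are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ColumnRecord:
    confidence: float
    fields: List[str]  # middle columns, left uninterpreted
    sentence: str


def parse_column_line(line: str) -> ColumnRecord:
    """Split one tab-separated output line into a record."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < 2:
        raise ValueError("expected at least a confidence and a sentence")
    return ColumnRecord(
        confidence=float(columns[0]),
        fields=columns[1:-1],
        sentence=columns[-1],
    )


# Hypothetical line: the tab positions here are illustrative guesses.
demo = "\t".join([
    "0.8195727266148489",
    "John ran",
    "John",
    "ran down the road to fetch",
    "a pail of water",
    "John ran down the road to fetch a pail of water.",
])
print(parse_column_line(demo))
```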