(formerly om.el) A functional library for org-mode
org-ml-set-caption! have been merged with
org-ml-get/set/map-property and will be removed in a later revision
org-ml-get-(some-)subtrees are now deprecated. Use
Install from MELPA:
M-x package-install RET org-ml RET
Alternatively, clone this repository to somewhere in your load path:
git clone https://github.com/ndwarshuis/org-ml ~/somewhere/in/load/path
Then require in your emacs config:
(require 'org-ml)
Explicit versions noted above have been tested. Other versions may work but are not currently supported.
Org-mode comes with a powerful, built-in parse-tree generator specified in org-element.el. The generated parse-tree is simply a heavily-nested list which can be easily manipulated using (mostly pure) functional code. This contrasts with the majority of functions normally used to interface with org-mode files, which are imperative in nature (outline-next-heading, etc) as they depend on the mutable state of Emacs buffers. In general, functional code is (arguably) more robust, readable, and testable, especially in use-cases such as this where a stateless abstract data structure is being transformed and queried.
org-element.el provides a minimal API for handling this parse-tree in a functional manner, but does not provide higher-level functions necessary for intuitive, large-scale use. The org-ml package is designed to provide this API. Furthermore, it is highly compatible with the dash.el package, which is a generalized functional library for emacs-lisp.
Parsing a buffer with the function org-element-parse-buffer will yield a parse tree composed of nodes. Nodes have types and properties associated with them. See the org-element API documentation for a list of all node types and their properties (also see the terminology conventions and property omissions used in this package).
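As a rough illustration of what parsing yields, the sketch below uses only the built-in org-element functions (not org-ml) to parse a small document in a temporary buffer and read one property from every headline node. The helper name my-headline-titles is made up for this example.

```emacs-lisp
(require 'org)
(require 'org-element)

;; Hypothetical helper, not part of org-ml or org-element.
(defun my-headline-titles (org-text)
  "Return the raw titles of all headlines found in ORG-TEXT."
  (with-temp-buffer
    (insert org-text)
    (org-mode)
    ;; Parse the whole buffer into a tree of nodes ...
    (let ((tree (org-element-parse-buffer)))
      ;; ... then visit every node of type `headline' and read a property.
      (org-element-map tree 'headline
        (lambda (headline)
          (org-element-property :raw-value headline))))))

(my-headline-titles "* TODO buy milk\nsome text\n* call mom\n")
;; => ("buy milk" "call mom")
```

Note that :raw-value holds the headline title without the TODO keyword, which is stored in a separate property.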
Each node is represented by a list where the first member is the type and the second member is a plist describing the node's properties:
(type (:prop1 value1 :prop2 value2 ...))
Node types may be either leaves or branches, where branches may have zero or more child nodes and leaves may not have child nodes at all. Leaves will always have lists of the form shown above. Branches, on the other hand, have their children appended to the end:
(type (:prop1 value1 :prop2 value2) child1 child2 ...)
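To make the two shapes concrete, here is a minimal sketch that builds a leaf and a branch by hand and takes them apart with plain list functions. The accessor names (my-node-type and friends) and the property values are invented for illustration; real code would normally go through org-element or org-ml accessors rather than car and cddr.

```emacs-lisp
;; Hypothetical accessors over the list layout described above.
(defun my-node-type (node)
  "Return the type of NODE (its first member)."
  (car node))

(defun my-node-property (prop node)
  "Return the value of PROP from the plist of NODE (its second member)."
  (plist-get (cadr node) prop))

(defun my-node-children (node)
  "Return the children of NODE (everything after the plist), or nil."
  (cddr node))

;; A leaf: only a type and a plist.
(defconst my-leaf '(line-break (:post-blank 0)))

;; A branch: the same layout with children appended to the end.
(defconst my-branch
  '(paragraph (:post-blank 1)
              (bold (:post-blank 0))
              (line-break (:post-blank 0))))

(my-node-type my-branch)                  ; => paragraph
(my-node-property :post-blank my-branch)  ; => 1
(length (my-node-children my-branch))     ; => 2
(my-node-children my-leaf)                ; => nil
```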
In addition to leaves and branches, node types can belong to one of two classes:
- Objects: roughly correspond to raw, possibly-formatted text
- Elements: more complex structures which may be built from objects
Within the branch node types, there are restrictions on which class is allowed to be a child depending on the type. There are three of these restrictions:
- Branch elements with child elements (aka 'greater elements'): these are element types that are generally nestable inside one another (eg headlines, plain-lists, items)
- Branch elements with child objects (aka 'object containers'): these are element types that hold textual information (eg paragraph)
- Branch objects with child objects (aka 'recursive objects'): these are object types used primarily for text formatting (bold, italic, underline, etc)
Note: it is never allowed for an element type to be a child of a branch object type.
This package deviates from the original terminology found in org-element.el in several ways:
- 'node' is used here to describe a vertex in the parse tree, where 'element' and 'object' are two classes used to describe said vertex (org-element.el seems to use 'element' to generally mean 'node' and uses 'object' to further specify)
- 'child' and 'children' are used here instead of 'content' and 'contents'
- 'branch' is used here instead of 'container'. Furthermore, 'leaf' is used to describe the converse of 'branch' (there does not seem to be an equivalent term in org-element.el)
- org-element.el uses 'attribute(s)' and 'property(ies)' interchangeably to describe nodes; here only 'property(ies)' is used
All properties specified by org-element.el are readable by this API (eg one can query them with functions like
post-affiliated are not settable by this API as they are not necessary for manipulating the textual representation of the parse tree. In addition to these, some properties unique to certain types are not settable for the same reason. Each type's build function describes the properties that are settable.
Each function that operates on an element/object will take the element/object as its right-most argument. This allows convenient function chaining using
dash.el's right-threading operators (
-some->>). The examples in the API reference almost exclusively demonstrate this pattern. Additionally, the right-argument convention also allows convenient partial application using
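The following sketch shows why a right-most node argument makes chaining and partial application convenient. To stay dependency-free it uses Emacs's built-in thread-last and apply-partially, which play the same role here as dash.el's right-threading and partial-application helpers. The functions my-set-property and my-get-property are hypothetical stand-ins operating on hand-written lists, not org-ml functions.

```emacs-lisp
(require 'subr-x) ; provides `thread-last'

;; Hypothetical helpers; note that NODE is always the last argument.
(defun my-set-property (prop value node)
  "Return a copy of NODE with PROP set to VALUE."
  (cons (car node)
        (cons (plist-put (copy-sequence (cadr node)) prop value)
              (cddr node))))

(defun my-get-property (prop node)
  "Return the value of PROP in NODE."
  (plist-get (cadr node) prop))

;; Each step receives the result of the previous one as its last argument.
(thread-last '(headline (:level 1))
  (my-set-property :todo-keyword "TODO")
  (my-set-property :level 2)
  (my-get-property :level))
;; => 2

;; Fixing the leading arguments leaves a unary function of the node,
;; which can be mapped over a list of nodes.
(mapcar (apply-partially #'my-get-property :level)
        '((headline (:level 1)) (headline (:level 3))))
;; => (1 3)
```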
Higher-order functions (functions that take other functions as arguments) have two forms. The first takes a (usually unary) function and applies it:
(org-ml-map-property :value (lambda (s) (concat "foo" s)) node)
(org-ml-map-property :value (-partial #'concat "foo") node)
This can equivalently be written using an anaphoric form, whose name is the original function name with * appended. The symbol it carries the value of the unary argument (unless otherwise specified):
(org-ml-map-property* :value (concat "foo" it) node)
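For readers unfamiliar with anaphoric forms, the sketch below shows one common way such a pair can be defined: a higher-order function plus a macro that wraps its form argument in a lambda binding it. This is only an illustration of the general technique on hand-written lists; my-map-property and my-map-property* are hypothetical names and this is not how org-ml itself is implemented.

```emacs-lisp
;; Hypothetical higher-order function: FUN receives the old property value.
(defun my-map-property (prop fun node)
  "Return a copy of NODE with PROP replaced by FUN applied to its old value."
  (let ((plist (copy-sequence (cadr node))))
    (cons (car node)
          (cons (plist-put plist prop (funcall fun (plist-get plist prop)))
                (cddr node)))))

;; Hypothetical anaphoric counterpart: FORM is wrapped in a lambda whose
;; single parameter is deliberately named `it'.
(defmacro my-map-property* (prop form node)
  "Anaphoric form of `my-map-property'; FORM sees the old value as `it'."
  `(my-map-property ,prop (lambda (it) ,form) ,node))

(my-map-property :value (lambda (s) (concat "foo" s)) '(verbatim (:value "bar")))
;; => (verbatim (:value "foobar"))

(my-map-property* :value (concat "foo" it) '(verbatim (:value "bar")))
;; => (verbatim (:value "foobar"))
```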
All functions that read and write from buffers are named like
OPERATION is some operation to be performed on THING in the current buffer. All these functions take point as one of their arguments to denote where in the buffer to perform
All of these functions have current-point convenience analogues that are named as
THING carry the same meaning, but OPERATION is done at the current point and point is not an argument to the function.
For the sake of brevity, only the former form of these functions is given in the API reference.
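The relationship between a point-taking function and its current-point analogue can be sketched as follows. Both function names are hypothetical and the bodies use only built-in org-element calls; the sketch only shows the shape of the pairing, where the convenience form delegates to the general one with (point).

```emacs-lisp
(require 'org)
(require 'org-element)

;; Hypothetical point-taking form.
(defun my-get-headline-title-at (point)
  "Return the raw title of the headline at POINT in the current buffer."
  (save-excursion
    (goto-char point)
    (org-element-property :raw-value (org-element-at-point))))

;; Hypothetical current-point analogue: same operation, no POINT argument.
(defun my-get-this-headline-title ()
  "Like `my-get-headline-title-at', but operate at the current point."
  (my-get-headline-title-at (point)))

(with-temp-buffer
  (insert "* first\n* second\n")
  (org-mode)
  (goto-char (point-min))
  (list (my-get-this-headline-title)
        (my-get-headline-title-at (line-beginning-position 2))))
;; => ("first" "second")
```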
For comprehensive documentation of all available functions see the API reference.
By default, the org-element API does not parse timestamp habits. This means that if you parse an org-mode buffer with timestamp habits and try to convert it back to a string, the habits will be lost.
org-ml has a wrapper function to add this functionality; enable it by setting org-ml-parse-habits to t. Since habits are an extension of timestamp repeaters, this option will also impact the behavior of org-ml-timestamp-map-repeater (see their docstrings for details).
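In an Emacs config this would presumably look like the fragment below. Whether the variable needs to be set before or after the package is loaded is not stated here, so treat the placement as an assumption and check the variable's docstring.

```emacs-lisp
;; Opt in to habit parsing, as described above.
(setq org-ml-parse-habits t)
```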
Benchmarking this library is still in the early stages.
Intuitively, the most costly operations are going to be those that go back-and-forth between raw buffer text (here called "buffer space") and its node representations (here called "node space") since those involve complicated string formatting, regular expressions, buffer searching, etc (examples: org-ml-update-this-THING and friends). Once the data is in node space, execution should be very fast since nodes are just lists. Thus if you have performance-intensive code that requires many small edits to org-mode files, it might be better to use org-mode's built-in functions. On the other hand, if most of the complicated processing can be done in node space while limiting the number of conversions to/from buffer space, org-ml will be much faster.
To be more scientific, the current tests in the suite (see here) seem to support the following conclusions when comparing org-ml to equivalent code written using built-in org-mode functions (in line with the intuitions above):
* reading data (a one way conversion from buffer to node space) is up to an order of magnitude slower, specifically when the data to be obtained isn't very large (eg, reading the TODO state from a headline)
* manipulating text (going from buffer to node space, then modifying the node, then going back to buffer space) is several times slower for single modifications (eg setting the TODO state of a headline)
* larger numbers of manipulations on one node at once are faster (eg changing the TODO state, setting a property, and setting a SCHEDULED timestamp on a headline)
To run the benchmark suite:
For all pattern-matching functions (eg
PATTERN parameter is processed into a lambda function which computationally carries out the pattern matching. If there are many calls using the same or a few unique patterns, the overhead of generating this lambda can be avoided through memoization by setting org-ml-memoize-match-patterns. See this variable's documentation for details.
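The general idea behind this kind of memoization can be sketched with a hash table that maps each pattern to the function generated from it, so that repeated calls with an equal pattern skip the generation step. Everything below is a simplified stand-in with made-up names (my-compile-pattern and so on) and a toy pattern format; it is not org-ml's actual pattern compiler or cache.

```emacs-lisp
(defvar my-pattern-cache (make-hash-table :test #'equal)
  "Maps a pattern (compared with `equal') to its generated matcher.")

(defvar my-compile-count 0
  "Number of times a matcher was actually generated (for illustration).")

;; Hypothetical stand-in for an expensive pattern-to-lambda step.
;; Here a pattern is just a list of node types to accept.
(defun my-compile-pattern (pattern)
  "Generate a predicate that accepts nodes whose type is in PATTERN."
  (setq my-compile-count (1+ my-compile-count))
  `(lambda (node) (memq (car node) ',pattern)))

(defun my-compile-pattern-memoized (pattern)
  "Return the cached matcher for PATTERN, generating it on first use."
  (or (gethash pattern my-pattern-cache)
      (puthash pattern (my-compile-pattern pattern) my-pattern-cache)))

;; Two calls with equal patterns trigger only one generation.
(my-compile-pattern-memoized '(headline item))
(my-compile-pattern-memoized '(headline item))
my-compile-count
;; => 1
```

The trade-off is the usual one for caches: memory grows with the number of distinct patterns, so this pays off mainly when a few patterns are reused many times.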