Simple Python library to parse and interact with unified diff data.

Installing unidiff


$ pip install unidiff

Quick start


>>> import urllib.request
>>> from unidiff import PatchSet
>>> diff = urllib.request.urlopen('')
>>> encoding = diff.headers.get_charsets()[0]
>>> patch = PatchSet(diff, encoding=encoding)
>>> patch
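<PatchSet: [<PatchedFile: ...>, <PatchedFile: ...>, <PatchedFile: ...>]>
>>> patch[0]
<PatchedFile: ...>
>>> patch[0].is_added_file
>>> patch[0].added
>>> patch[1]
<PatchedFile: ...>
>>> patch[1].added, patch[1].removed
(20, 11)
>>> len(patch[1])
>>> patch[1][2]
<Hunk: ...>
>>> patch[2]
<PatchedFile: ...>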
>>> print(patch[2])
diff --git a/unidiff/ b/unidiff/
index eae63e6..29c896a 100644
--- a/unidiff/
+++ b/unidiff/
@@ -37,4 +37,3 @@
 # - deleted line
 # \ No newline case (ignore)
 RE_HUNK_BODY_LINE = re.compile(r'^([- \+\\])')

Load unified diff data by instantiating :code:`PatchSet` with a file-like object as argument, or by using the :code:`PatchSet.from_filename` class method to read the diff from a file.

A :code:`PatchSet` is a list of the files updated by the given patch. For each patched file you can get stats (whether it is a new, removed, or modified file; the source/target lines; etc.), as well as access to each of its hunks (which also behave like a list) and their respective info.

At any point you can get the string representation of the current object (for example with :code:`str()` or :code:`print()`), which returns its unified diff data.

For a quick example of what can be done, check the bin/unidiff file.

Also, once installed, unidiff provides a command-line program that displays information from diff data (read from a file or from stdin). For example:


$ git diff | unidiff
------- +6 additions, -0 deletions

1 modified file(s), 0 added file(s), 0 removed file(s)
Total: 6 addition(s), 0 deletion(s)

Load a local diff file

To instantiate :code:`PatchSet` from a local file, you can use:


>>> from unidiff import PatchSet
>>> patch = PatchSet.from_filename('tests/samples/bzr.diff', encoding='utf-8')
>>> patch
<PatchSet: [<PatchedFile: ...>, <PatchedFile: ...>, <PatchedFile: ...>]>

Notice the (optional) :code:`encoding` parameter. If it is not specified, unicode input will be expected. Alternatively:


>>> import codecs
>>> from unidiff import PatchSet
>>> with codecs.open('tests/samples/bzr.diff', 'r', encoding='utf-8') as diff:
...     patch = PatchSet(diff)
>>> patch
<PatchSet: [<PatchedFile: ...>, <PatchedFile: ...>, <PatchedFile: ...>]>

Finally, you can also instantiate :code:`PatchSet` by passing any iterable of lines (and an encoding, if needed):


>>> from unidiff import PatchSet
>>> with open('tests/samples/bzr.diff', 'r') as diff:
...     data = diff.readlines()
>>> patch = PatchSet(data)
>>> patch
<PatchSet: [<PatchedFile: ...>, <PatchedFile: ...>, <PatchedFile: ...>]>

If you don't need to be able to rebuild the original unified diff input, you can pass :code:`metadata_only=True` (defaults to :code:`False`), which should make parsing more efficient:


>>> from unidiff import PatchSet
>>> patch = PatchSet.from_filename('tests/samples/bzr.diff', encoding='utf-8', metadata_only=True)



