137 Stars 40 Forks MIT License 102 Commits 17 Opened issues


Unified diff python parsing/metadata extraction library

Simple Python library to parse and interact with unified diff data.

Installing unidiff


$ pip install unidiff

Quick start


>>> import urllib.request
>>> from unidiff import PatchSet
>>> diff = urllib.request.urlopen('')
>>> encoding = diff.headers.get_charsets()[0]
>>> patch = PatchSet(diff, encoding=encoding)
>>> patch
, , ]>
>>> patch[0]

>>> patch[0].is_added_file
>>> patch[0].added
>>> patch[1]

>>> patch[1].added, patch[1].removed
(20, 11)
>>> len(patch[1])
>>> patch[1][2]

>>> patch[2]

>>> print(patch[2])
diff --git a/unidiff/ b/unidiff/
index eae63e6..29c896a 100644
--- a/unidiff/
+++ b/unidiff/
@@ -37,4 +37,3 @@
# - deleted line
# \ No newline case (ignore)
RE_HUNK_BODY_LINE = re.compile(r'^([- \+\\])')

Load unified diff data by instantiating :code:`PatchSet` with a file-like object as argument, or by using the :code:`PatchSet.from_filename` class method to read the diff from a file.

A :code:`PatchSet` is a list of files updated by the given patch. For each file you can get stats (whether it is a new, removed or modified file; the source/target lines; etc.), and you can access each of its hunks (a file also behaves like a list) along with their respective info.

At any point you can get the string representation of the current object, which returns its unified diff data.

As a quick example of what can be done, check the bin/unidiff file.

Also, once installed, unidiff provides a command-line program that displays information from diff data (a file, or stdin). For example:


$ git diff | unidiff
------- +6 additions, -0 deletions

1 modified file(s), 0 added file(s), 0 removed file(s)
Total: 6 addition(s), 0 deletion(s)

Load a local diff file

To instantiate :code:`PatchSet` from a local file, you can use:


>>> from unidiff import PatchSet
>>> patch = PatchSet.from_filename('tests/samples/bzr.diff', encoding='utf-8')
>>> patch
, , ]>

Notice the (optional) :code:`encoding` parameter. If it is not specified, unicode input will be expected. Alternatively:


>>> import codecs
>>> from unidiff import PatchSet
>>> with codecs.open('tests/samples/bzr.diff', 'r', encoding='utf-8') as diff:
...     patch = PatchSet(diff)
>>> patch
, , ]>

Finally, you can also instantiate :code:`PatchSet` passing any iterable (and encoding, if needed):


>>> from unidiff import PatchSet
>>> with open('tests/samples/bzr.diff', 'r') as diff:
...     data = diff.readlines()
>>> patch = PatchSet(data)
>>> patch
, , ]>

If you don't need to be able to rebuild the original unified diff input, you can pass :code:`metadata_only=True` (defaults to :code:`False`), which should make parsing more efficient:


>>> from unidiff import PatchSet
>>> patch = PatchSet.from_filename('tests/samples/bzr.diff', encoding='utf-8', metadata_only=True)


