by internetarchive

warc: Python library to work with WARC files
============================================

.. image:: https://secure.travis-ci.org/anandology/warc.png?branch=master
   :alt: build status
   :target: http://travis-ci.org/anandology/warc

WARC (Web ARChive) is a file format for storing web crawls.

http://bibnum.bnf.fr/WARC/
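The following sketch illustrates the format. It is based on the WARC/1.0 record layout, not on this library's code. Each record consists of:

* a version line,
* a set of ``Name: value`` header lines,
* a blank line,
* the content block,
* two CRLFs.

The helper ``make_warc_record`` is hypothetical and not part of the ``warc`` API. It is a minimal stdlib-only sketch that serializes one ``resource`` record and omits optional fields such as digests.

```python
import uuid
from datetime import datetime, timezone


def make_warc_record(target_uri, payload, content_type="text/plain"):
    """Serialize one WARC/1.0 ``resource`` record to bytes.

    Hypothetical helper for illustration only; not part of the warc library.
    ``payload`` must be bytes; Content-Length counts those bytes.
    """
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", "<urn:uuid:%s>" % uuid.uuid4()),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ]
    # Version line, one "Name: value" line per field, then a blank line.
    head = "WARC/1.0\r\n" + "".join("%s: %s\r\n" % field for field in headers) + "\r\n"
    # The content block is followed by two CRLFs that end the record.
    return head.encode("utf-8") + payload + b"\r\n\r\n"


record = make_warc_record("http://example.com/", b"Hello, world!")
print(record.decode("utf-8"))
```

Real crawlers usually write ``response`` records containing full HTTP messages and compress each record with gzip. This sketch leaves both out to keep the record structure visible.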

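The ``warc`` library makes it easy to read and write WARC files. For example,
to print the target URI and content length of every record in a file::

    from __future__ import print_function  # print() works on Python 2 and 3
    import warc

    # Iterate over the records of a WARC file and show two header fields
    f = warc.open("test.warc")
    for record in f:
        print(record['WARC-Target-URI'], record['Content-Length'])
    f.close()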
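The loop above relies on the library to split the file into records. The generator below gives a rough idea of what that involves. ``iter_warc_records`` is hypothetical, stdlib-only and not part of the ``warc`` package. It reads an uncompressed WARC stream and yields each record's headers and content block. It assumes well-formed input, uses ``Content-Length`` to find the end of each block, and does not handle gzip-compressed files.

```python
import io


def iter_warc_records(stream):
    """Yield (headers, payload) for each record in an uncompressed WARC stream.

    Hypothetical sketch of the record layout, not the warc library's
    implementation; it performs no validation beyond the version line.
    """
    while True:
        version = stream.readline()
        if not version:
            return  # end of input
        if not version.strip():
            continue  # tolerate stray blank lines between records
        if not version.startswith(b"WARC/"):
            raise ValueError("not a WARC record: %r" % version)
        headers = {}
        for line in iter(stream.readline, b""):
            if line in (b"\r\n", b"\n"):
                break  # a blank line ends the header block
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        payload = stream.read(int(headers["Content-Length"]))
        stream.read(4)  # skip the CRLF CRLF that terminates each record
        yield headers, payload


data = (b"WARC/1.0\r\n"
        b"WARC-Type: resource\r\n"
        b"WARC-Target-URI: http://example.com/\r\n"
        b"Content-Length: 13\r\n"
        b"\r\n"
        b"Hello, world!\r\n\r\n")
for headers, payload in iter_warc_records(io.BytesIO(data)):
    print(headers["WARC-Target-URI"], headers["Content-Length"])
```

Reading exactly ``Content-Length`` bytes, rather than scanning for a delimiter, matters because the content block may itself contain blank lines or binary data.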

Documentation
-------------

Documentation for the ``warc`` library is available at http://warc.readthedocs.org/.

License
-------

This software is licensed under the GNU General Public License v2 (GPL v2). See the LICENSE_ file for details.

.. _LICENSE: http://github.com/internetarchive/warc/blob/master/LICENSE
