alephdata / memorious

Distributed crawling framework for documents and structured data.



The solitary and lucid spectator of a multiform, instantaneous and almost intolerably precise world.

-- *Funes the Memorious*, Jorge Luis Borges

``memorious`` is a distributed web scraping toolkit. It is a lightweight tool that schedules, monitors and supports scrapers that collect structured or unstructured data. This includes the following use cases:

  • Maintain an overview of a fleet of crawlers
  • Schedule crawler execution at regular intervals
  • Store execution information and error messages
  • Distribute scraping tasks across multiple machines
  • Make crawlers modular and simple tasks reusable
  • Get out of your way as much as possible

.. image:: docs/memorious-ui.png


When writing a scraper, you often need to paginate through an index page, then download an HTML page for each result, and finally parse that page and insert or update a record in a database.

``memorious`` handles this by managing a set of ``crawlers``, each of which can be composed of multiple ``stages``. Each ``stage`` is implemented using a Python function, which can be reused across different ``crawlers``.
The basic steps of writing a Memorious crawler:

  1. Create a YAML crawler configuration file
  2. Add different stages
  3. Write code for stage operations (optional)
  4. Test, rinse, repeat
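
The steps above can be sketched in plain Python. This is only an approximation: a dictionary stands in for the YAML file, and the key names (``pipeline``, ``method``, ``params``, ``handle``) as well as the ``run`` helper and the stage functions are assumptions made for this sketch, so check the Memorious documentation for the actual configuration schema.

```python
# Step 3: stage operations are plain functions, looked up by name.
def seed(data, params):
    # Yield (rule, item) pairs; the rule selects the next stage.
    for number in range(1, params["pages"] + 1):
        yield "pass", {"page": number}


def classify(data, params):
    # Route each item depending on its content.
    rule = "even" if data["page"] % 2 == 0 else "odd"
    yield rule, data


def collect(data, params):
    # Final operation: annotate the item; no handler follows this rule.
    yield "store", {**data, "kind": params["kind"]}


METHODS = {"seed": seed, "classify": classify, "collect": collect}

# Steps 1 and 2: the configuration names the stages and wires them together.
CONFIG = {
    "name": "example_crawler",
    "pipeline": {
        "init": {"method": "seed", "params": {"pages": 3},
                 "handle": {"pass": "classify"}},
        "classify": {"method": "classify",
                     "handle": {"even": "even_pages", "odd": "odd_pages"}},
        "even_pages": {"method": "collect", "params": {"kind": "even"}},
        "odd_pages": {"method": "collect", "params": {"kind": "odd"}},
    },
}


def run(config, start="init"):
    """Step 4: execute the pipeline and return items that reached the end."""
    stored = []
    pending = [(start, {})]
    while pending:
        stage_name, data = pending.pop(0)
        stage = config["pipeline"][stage_name]
        method = METHODS[stage["method"]]
        handlers = stage.get("handle", {})
        for rule, item in method(data, stage.get("params", {})):
            if rule in handlers:
                pending.append((handlers[rule], item))
            else:
                # No handler for this rule: treat the item as a final result.
                stored.append(item)
    return stored
```

Note that ``collect`` is used by two different stages with different parameters, which is the kind of reuse the configuration-driven approach is meant to allow.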


The documentation for Memorious is available online. Feel free to edit the source files in the ``docs`` folder and send pull requests for improvements.

To build the documentation, run ``make html`` inside the ``docs`` folder.

You'll find the resulting HTML files in ``/docs/_build/html``.
