spookystuff

by tribbloid

Scalable query engine for web scraping/data mashup/acceptance QA, powered by Apache Spark

The latest documentation has moved to:

http://tribbloid.github.io/spookystuff/

SpookyStuff

Join the chat at https://gitter.im/tribbloid/spookystuff

... is a scalable query engine for web scraping/data integration/acceptance QA. The goal is to allow the Web to be queried and ETL'ed like a relational database.

SpookyStuff is the fastest big data collection engine in history, with a speed record of querying 330404 dynamic pages per hour on 300 cores.
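To put the quoted record in per-core terms, here is a minimal arithmetic sketch. It uses only the two figures stated above (330404 pages per hour, 300 cores) and assumes the load was spread evenly across cores, which the original does not state. The function name `throughput` and the dictionary keys are illustrative and are not part of SpookyStuff.

```python
# Figures quoted in the README; everything else here is illustrative.
PAGES_PER_HOUR = 330_404
CORES = 300


def throughput(pages_per_hour: int, cores: int) -> dict:
    """Derive aggregate and per-core rates from an hourly page count.

    Assumes an even split of work across cores (an assumption, not a
    documented property of the benchmark).
    """
    pages_per_second = pages_per_hour / 3600          # cluster-wide rate
    pages_per_core_hour = pages_per_hour / cores      # average per core
    seconds_per_page_per_core = 3600 / pages_per_core_hour
    return {
        "pages_per_second": pages_per_second,
        "pages_per_core_hour": pages_per_core_hour,
        "seconds_per_page_per_core": seconds_per_page_per_core,
    }


if __name__ == "__main__":
    for name, value in throughput(PAGES_PER_HOUR, CORES).items():
        print(f"{name}: {value:.2f}")
```

Under that even-split assumption, the record works out to roughly 92 pages per second across the cluster, or about 3.3 seconds per dynamic page on each core.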

SpookyStuff-UAV (alpha component)

Join the chat at https://gitter.im/spookystuff-UAV/Lobby

... allows the same engine to be used to control a swarm of aerial robots for photogrammetry and sensor data acquisition. It is still a work in progress; please refer to this proposal for an overview of its features and implementation.

Powered by

  • Apache Spark
  • Selenium
  • JSoup
  • Apache Tika
  • Apache Maven
  • PhantomJS/GhostDriver
  • (UAV) MAVLink

License

Copyright © 2014 by Peng Cheng @tribbloid, Sandeep Singh @techaddict, Terry Lin @ithinkicancode, Long Yao @l2yao and contributors.

Published under the Apache License 2.0; see LICENSE.
