robobrowser

by jmcarp

jmcarp / robobrowser
License: BSD 3-Clause "New" or "Revised" License


RoboBrowser: Your friendly neighborhood web scraper
===================================================

.. image:: https://badge.fury.io/py/robobrowser.png
    :target: http://badge.fury.io/py/robobrowser

.. image:: https://travis-ci.org/jmcarp/robobrowser.png?branch=master
    :target: https://travis-ci.org/jmcarp/robobrowser

.. image:: https://coveralls.io/repos/jmcarp/robobrowser/badge.png?branch=master
    :target: https://coveralls.io/r/jmcarp/robobrowser

Homepage: `http://robobrowser.readthedocs.org/ <http://robobrowser.readthedocs.org/>`_

RoboBrowser is a simple, Pythonic library for browsing the web without a standalone web browser. RoboBrowser can fetch a page, click on links and buttons, and fill out and submit forms. If you need to interact with web services that don't have APIs, RoboBrowser can help.

.. code-block:: python

    import re
    from robobrowser import RoboBrowser

    # Browse to Genius
    browser = RoboBrowser(history=True)
    browser.open('http://genius.com/')

    # Search for Porcupine Tree
    form = browser.get_form(action='/search')
    form                                 # shows the form's fields
    form['q'].value = 'porcupine tree'
    browser.submit_form(form)

    # Look up the first song
    songs = browser.select('.song_link')
    browser.follow_link(songs[0])
    lyrics = browser.select('.lyrics')
    lyrics[0].text                       # \nHear the sound of music ...

    # Back to results page
    browser.back()

    # Look up my favorite song
    song_link = browser.get_link('trains')
    browser.follow_link(song_link)

    # Can also search HTML using regex patterns
    lyrics = browser.find(class_=re.compile(r'\blyrics\b'))
    lyrics.text                          # \nTrain set and match spied under the blind...
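
Conceptually, submitting the search form means collecting the form's named fields, updating ``q``, and sending the fields to the form's ``action`` URL, resolved against the current page. The stdlib-only snippet below is a rough sketch of that idea. It is not RoboBrowser's implementation. The HTML snippet is made up, and ``FormFields`` is a hypothetical helper. The sketch handles only ``<input>`` fields and GET submission, whereas ``submit_form`` also deals with other field types and methods.

```python
# Hypothetical sketch, not RoboBrowser's implementation: collect the named
# <input> fields of the form whose action is '/search', fill in 'q', and
# build the URL that a GET submission of that form would request.
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormFields(HTMLParser):
    """Collect name/value pairs from <input> tags inside one target form."""

    def __init__(self, action):
        super().__init__()
        self.action = action
        self.inside = False   # True while parsing the target form
        self.method = 'get'
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'form' and attrs.get('action') == self.action:
            self.inside = True
            self.method = (attrs.get('method') or 'get').lower()
        elif tag == 'input' and self.inside and attrs.get('name'):
            # Valueless attributes come through as None; treat them as ''.
            self.fields[attrs['name']] = attrs.get('value') or ''

    def handle_endtag(self, tag):
        if tag == 'form':
            self.inside = False


# Made-up markup standing in for the real search page.
page_url = 'http://genius.com/'
html = '''
<form action="/search" method="get">
  <input name="q" value="">
  <input type="submit" value="Search">
</form>
'''

parser = FormFields(action='/search')
parser.feed(html)
parser.fields['q'] = 'porcupine tree'   # analogous to form['q'].value = ...
url = urljoin(page_url, parser.action) + '?' + urlencode(parser.fields)
print(url)  # http://genius.com/search?q=porcupine+tree
```

The unnamed submit button is skipped. ``urlencode`` encodes the space in ``porcupine tree`` as ``+``, as a browser does for a GET form.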

RoboBrowser combines the best of two excellent Python libraries: Requests and BeautifulSoup. RoboBrowser represents browser sessions using Requests and HTML responses using BeautifulSoup, transparently exposing methods of both libraries:

.. code-block:: python

    import re
    from robobrowser import RoboBrowser

    browser = RoboBrowser(user_agent='a python robot')
    browser.open('https://github.com/')

    # Inspect the browser session
    browser.session.cookies['_gh_sess']            # BAh7Bzo...
    browser.session.headers['User-Agent']          # a python robot

    # Search the parsed HTML
    browser.select('div.teaser-icon')              # [<div class="teaser-icon">...</div>, ...]
    browser.find(class_=re.compile(r'column', re.I))  # first tag whose class matches 'column'
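
Passing a compiled regex as ``class_`` matches the pattern against class names, so an exact class string is not required. BeautifulSoup applies the pattern with ``search`` to individual class names, so ``re.compile(r'column', re.I)`` also matches a class like ``one-third Column``. The stdlib-only snippet below is a rough illustration of that behavior. ``FirstClassMatch`` is a hypothetical helper, not part of RoboBrowser or BeautifulSoup. It records only the tag name and class string, whereas ``find`` returns a BeautifulSoup ``Tag``.

```python
# Hypothetical sketch, stdlib only: report the first start tag that has a
# class name matching a compiled regex, loosely mirroring
# browser.find(class_=re.compile(...)).
import re
from html.parser import HTMLParser


class FirstClassMatch(HTMLParser):
    """Record (tag, class attribute) for the first tag whose class matches."""

    def __init__(self, pattern):
        super().__init__()
        self.pattern = pattern
        self.match = None

    def handle_starttag(self, tag, attrs):
        if self.match is not None:
            return  # keep only the first hit, like find()
        classes = dict(attrs).get('class') or ''
        # Test each whitespace-separated class name on its own.
        if any(self.pattern.search(name) for name in classes.split()):
            self.match = (tag, classes)


# Made-up markup; only the second <div> has a class containing "column".
html = '<div class="header"></div><div class="one-third Column"><p>hi</p></div>'
finder = FirstClassMatch(re.compile(r'column', re.I))
finder.feed(html)
print(finder.match)  # ('div', 'one-third Column')
```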
