Goutte

by FriendsOfPHP

8.0K stars, 936 forks. Last release: v3.1.0 (about 5 years ago). MIT License. 312 commits, 24 releases.

Goutte, a simple PHP Web Scraper

Goutte is a screen scraping and web crawling library for PHP.

Goutte provides a nice API to crawl websites and extract data from the HTML/XML responses.

Requirements

Goutte depends on PHP 7.1+.

Installation

Add ``fabpot/goutte`` as a require dependency in your ``composer.json`` file:

.. code-block:: bash

    composer require fabpot/goutte

Usage

Create a Goutte Client instance (which extends
``Symfony\Component\BrowserKit\HttpBrowser``):

.. code-block:: php

    use Goutte\Client;

    $client = new Client();

Make requests with the ``request()`` method:

.. code-block:: php

    // Go to the symfony.com website
    $crawler = $client->request('GET', 'https://www.symfony.com/blog/');

The method returns a ``Crawler`` object
(``Symfony\Component\DomCrawler\Crawler``).

To use your own HTTP settings, create an HttpClient instance and pass it to Goutte. For example, to set a 60-second request timeout:

.. code-block:: php

    use Goutte\Client;
    use Symfony\Component\HttpClient\HttpClient;

    $client = new Client(HttpClient::create(['timeout' => 60]));
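
Because the client accepts an HttpClient instance, the same constructor argument can also be used to run scraping code without network access. The following is a minimal sketch, assuming the ``MockHttpClient`` and ``MockResponse`` classes from the ``symfony/http-client`` package and a Composer autoloader in the working directory. The URL and the HTML body are made up for illustration:

.. code-block:: php

    // Assumes dependencies were installed with Composer in this directory.
    require 'vendor/autoload.php';

    use Goutte\Client;
    use Symfony\Component\HttpClient\MockHttpClient;
    use Symfony\Component\HttpClient\Response\MockResponse;

    // Hypothetical canned page; no real site is contacted.
    $html = '<html><body><h1>Hello</h1></body></html>';
    $response = new MockResponse($html, [
        'http_code' => 200,
        'response_headers' => ['Content-Type' => 'text/html'],
    ]);

    // The mock client stands in for the real HTTP client.
    $client = new Client(new MockHttpClient($response));
    $crawler = $client->request('GET', 'https://example.test/');

    // Inspect the response status and the parsed document.
    print $client->getResponse()->getStatusCode()."\n";
    print $crawler->filter('h1')->text()."\n";

A setup like this may be useful in unit tests, where selectors can be checked against a fixed page instead of a live site.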

Click on links:

.. code-block:: php

    // Click on the "Security Advisories" link
    $link = $crawler->selectLink('Security Advisories')->link();
    $crawler = $client->click($link);

Extract data:

.. code-block:: php

    // Get the latest post in this category and display the titles
    $crawler->filter('h2 > a')->each(function ($node) {
        print $node->text()."\n";
    });
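
The ``each()`` method also returns an array of whatever the closure returns, so extracted values can be collected instead of printed. The following is a minimal sketch, assuming the DomCrawler and CssSelector components installed alongside Goutte and a Composer autoloader in the working directory. The HTML fragment is hypothetical and stands in for a fetched page:

.. code-block:: php

    // Assumes dependencies were installed with Composer in this directory.
    require 'vendor/autoload.php';

    use Symfony\Component\DomCrawler\Crawler;

    // Hypothetical markup standing in for a fetched page.
    $html = '<html><body>'
        .'<h2><a href="/blog/first">First post</a></h2>'
        .'<h2><a href="/blog/second">Second post</a></h2>'
        .'</body></html>';

    $crawler = new Crawler($html);

    // Collect the text and the href attribute of every matching link.
    $posts = $crawler->filter('h2 > a')->each(function (Crawler $node) {
        return [
            'title' => $node->text(),
            'url' => $node->attr('href'),
        ];
    });

    print_r($posts);

The same closure should work on a ``Crawler`` returned by ``request()``, since it is the same class.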

Submit forms:

.. code-block:: php

    $crawler = $client->request('GET', 'https://github.com/');
    $crawler = $client->click($crawler->selectLink('Sign in')->link());
    $form = $crawler->selectButton('Sign in')->form();
    $crawler = $client->submit($form, ['login' => 'fabpot', 'password' => 'xxxxxx']);
    $crawler->filter('.flash-error')->each(function ($node) {
        print $node->text()."\n";
    });

More Information

Read the documentation of the `BrowserKit`_, `DomCrawler`_, and `HttpClient`_
Symfony Components for more information about what you can do with Goutte.

Pronunciation

Goutte is pronounced ``goot``, i.e. it rhymes with ``boot`` and not ``out``.

Technical Information

Goutte is a thin wrapper around the following Symfony Components:
`BrowserKit`_, `CssSelector`_, `DomCrawler`_, and `HttpClient`_.

License

Goutte is licensed under the MIT license.

.. _`Composer`: https://getcomposer.org
.. _`BrowserKit`: https://symfony.com/components/BrowserKit
.. _`DomCrawler`: https://symfony.com/doc/current/components/dom_crawler.html
.. _`CssSelector`: https://symfony.com/doc/current/components/css_selector.html
.. _`HttpClient`: https://symfony.com/doc/current/components/http_client.html
