FriendsOfPHP

Goutte, a simple PHP Web Scraper
================================

Goutte is a screen scraping and web crawling library for PHP.

Goutte provides a nice API to crawl websites and extract data from the HTML/XML responses.

Requirements
------------

Goutte depends on PHP 7.1+.

Installation
------------

Add ``fabpot/goutte`` as a require dependency in your ``composer.json`` file:

.. code-block:: bash

    composer require fabpot/goutte

Usage
-----

Create a Goutte Client instance (which extends ``Symfony\Component\BrowserKit\HttpBrowser``):

.. code-block:: php

    use Goutte\Client;

    $client = new Client();

Make requests with the ``request()`` method:

.. code-block:: php

    // Go to the symfony.com website
    $crawler = $client->request('GET', 'https://www.symfony.com/blog/');

The method returns a ``Crawler`` object (``Symfony\Component\DomCrawler\Crawler``).

To use your own HTTP settings, create an HttpClient instance and pass it to Goutte. For example, to set a 60-second request timeout:

.. code-block:: php

    use Goutte\Client;
    use Symfony\Component\HttpClient\HttpClient;

    $client = new Client(HttpClient::create(['timeout' => 60]));
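Since ``Client`` accepts an HttpClient instance, it can also be given Symfony's ``MockHttpClient``, which serves canned responses instead of making network calls. This is handy for testing scraping code offline. The sketch below assumes Goutte 4.x installed with Composer (so ``vendor/autoload.php`` exists); the HTML page is made up for illustration.

.. code-block:: php

    <?php
    // Assumption: Goutte 4.x was installed with Composer in this directory.
    require __DIR__.'/vendor/autoload.php';

    use Goutte\Client;
    use Symfony\Component\HttpClient\MockHttpClient;
    use Symfony\Component\HttpClient\Response\MockResponse;

    // Hypothetical canned page returned instead of a real HTTP response.
    $html = '<html><head><title>Symfony Blog</title></head><body></body></html>';
    $mock = new MockHttpClient(new MockResponse($html, [
        'http_code' => 200,
        'response_headers' => ['Content-Type' => 'text/html; charset=UTF-8'],
    ]));

    // Any HttpClientInterface implementation can be passed to the constructor.
    $client = new Client($mock);
    $crawler = $client->request('GET', 'https://www.symfony.com/blog/');

    // BrowserKit keeps the last response; the Crawler wraps its parsed body.
    $status = $client->getResponse()->getStatusCode();
    $title = $crawler->filter('title')->text();

    echo $status, ' ', $title, "\n";

When an array of ``MockResponse`` objects is passed instead of a single one, they are returned in order, one per request.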

Click on links:

.. code-block:: php

    // Click on the "Security Advisories" link
    $link = $crawler->selectLink('Security Advisories')->link();
    $crawler = $client->click($link);
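To observe what ``link()`` and ``click()`` do without touching a live site, the sketch below gives the client Symfony's ``MockHttpClient`` loaded with two hypothetical pages, served in order. It assumes Goutte 4.x installed with Composer; ``example.com`` and the page contents are placeholders.

.. code-block:: php

    <?php
    // Assumption: Goutte 4.x was installed with Composer in this directory.
    require __DIR__.'/vendor/autoload.php';

    use Goutte\Client;
    use Symfony\Component\HttpClient\MockHttpClient;
    use Symfony\Component\HttpClient\Response\MockResponse;

    $headers = ['response_headers' => ['Content-Type' => 'text/html']];

    // Hypothetical pages, returned in order: one per request.
    $client = new Client(new MockHttpClient([
        new MockResponse('<html><body><a href="/security">Security Advisories</a></body></html>', $headers),
        new MockResponse('<html><body><h1>Security Advisories</h1></body></html>', $headers),
    ]));

    $crawler = $client->request('GET', 'https://example.com/blog/');

    // link() turns the <a> element into a Link object with an absolute URI.
    $link = $crawler->selectLink('Security Advisories')->link();
    $linkUri = $link->getUri();

    // click() sends a GET request to that URI and returns the new page's Crawler.
    $crawler = $client->click($link);
    $heading = $crawler->filter('h1')->text();

    echo $linkUri, "\n", $heading, "\n";

``link()`` resolves the relative ``href`` against the URI of the current page, which is why the clicked request goes to an absolute URL.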

Extract data:

.. code-block:: php

    // Get the latest posts in this category and display their titles
    $crawler->filter('h2 > a')->each(function ($node) {
        print $node->text()."\n";
    });
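``each()`` returns whatever the closure returns, so results can be collected into an array instead of printed. The sketch below builds a ``Crawler`` directly from a hypothetical HTML string (no HTTP request), passing the page URI as the second argument so that ``link()->getUri()`` can resolve relative ``href`` values. It assumes the DomCrawler and CssSelector components are installed with Composer, as they are alongside Goutte.

.. code-block:: php

    <?php
    // Assumption: the Symfony components were installed with Composer.
    require __DIR__.'/vendor/autoload.php';

    use Symfony\Component\DomCrawler\Crawler;

    // Hypothetical markup shaped like a blog listing.
    $html = '<html><body>'
        .'<h2><a href="/blog/first-post">First post</a></h2>'
        .'<h2><a href="/blog/second-post">Second post</a></h2>'
        .'</body></html>';

    // The second argument is the page URI, used to resolve relative links.
    $crawler = new Crawler($html, 'https://symfony.com/blog/');

    // Collect the closure's return values: one [title, url] pair per link.
    $posts = $crawler->filter('h2 > a')->each(function (Crawler $node) {
        return [
            'title' => $node->text(),
            'url' => $node->link()->getUri(),
        ];
    });

    print_r($posts);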

Submit forms:

.. code-block:: php

    $crawler = $client->request('GET', 'https://github.com/');
    $crawler = $client->click($crawler->selectLink('Sign in')->link());
    $form = $crawler->selectButton('Sign in')->form();
    $crawler = $client->submit($form, ['login' => 'fabpot', 'password' => 'xxxxxx']);
    $crawler->filter('.flash-error')->each(function ($node) {
        print $node->text()."\n";
    });
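The form flow can be exercised offline by serving a hypothetical login page and a hypothetical error page from Symfony's ``MockHttpClient``. After ``submit()``, ``getInternalRequest()`` exposes the request BrowserKit built from the form, which makes it easy to check the method, the resolved ``action`` URL, and the submitted values. The sketch assumes Goutte 4.x installed with Composer, including the symfony/mime component that HttpBrowser requires for requests with a body; the markup and URLs are placeholders.

.. code-block:: php

    <?php
    // Assumption: Goutte 4.x and symfony/mime were installed with Composer.
    require __DIR__.'/vendor/autoload.php';

    use Goutte\Client;
    use Symfony\Component\HttpClient\MockHttpClient;
    use Symfony\Component\HttpClient\Response\MockResponse;

    // Hypothetical login page with a form whose submit button reads "Sign in".
    $loginPage = '<html><body><form action="/session" method="post">'
        .'<input type="text" name="login"><input type="password" name="password">'
        .'<button type="submit">Sign in</button>'
        .'</form></body></html>';
    // Hypothetical page returned after the POST.
    $errorPage = '<html><body><div class="flash-error">Incorrect username or password.</div></body></html>';

    $headers = ['response_headers' => ['Content-Type' => 'text/html']];
    $client = new Client(new MockHttpClient([
        new MockResponse($loginPage, $headers),
        new MockResponse($errorPage, $headers),
    ]));

    $crawler = $client->request('GET', 'https://example.com/login');
    $form = $crawler->selectButton('Sign in')->form();
    $crawler = $client->submit($form, ['login' => 'fabpot', 'password' => 'xxxxxx']);

    // The request BrowserKit built from the form: method, absolute action URL, fields.
    $sent = $client->getInternalRequest();
    $error = $crawler->filter('.flash-error')->text();

    echo $sent->getMethod(), ' ', $sent->getUri(), "\n", $error, "\n";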

More Information
----------------

Read the documentation of the `BrowserKit`_, `DomCrawler`_, and `HttpClient`_ Symfony Components for more information about what you can do with Goutte.

Pronunciation
-------------

Goutte is pronounced ``goot``, i.e. it rhymes with ``boot`` and not ``out``.

Technical Information
---------------------

Goutte is a thin wrapper around the following Symfony Components: `BrowserKit`_, `CssSelector`_, `DomCrawler`_, and `HttpClient`_.
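The division of labor between these components shows up in ``filter()``: CssSelector translates a CSS selector into XPath, and DomCrawler evaluates that XPath against the document. The sketch below makes the translation explicit and checks that ``filter()`` and ``filterXPath()`` select the same element. It assumes the components are installed with Composer; the markup is made up.

.. code-block:: php

    <?php
    // Assumption: the Symfony components were installed with Composer.
    require __DIR__.'/vendor/autoload.php';

    use Symfony\Component\CssSelector\CssSelectorConverter;
    use Symfony\Component\DomCrawler\Crawler;

    // CssSelector turns a CSS selector into the equivalent XPath expression.
    $converter = new CssSelectorConverter();
    $xpath = $converter->toXPath('h2 > a');
    echo $xpath, "\n"; // descendant-or-self::h2/a

    // DomCrawler's filter() performs this conversion internally, so both
    // calls below select the same <a> element.
    $crawler = new Crawler('<html><body><h2><a href="/post">A post</a></h2></body></html>');
    $viaCss = $crawler->filter('h2 > a')->text();
    $viaXPath = $crawler->filterXPath($xpath)->text();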

License
-------

Goutte is licensed under the MIT license.

.. _`Composer`: https://getcomposer.org
.. _`BrowserKit`: https://symfony.com/components/BrowserKit
.. _`DomCrawler`: https://symfony.com/doc/current/components/dom_crawler.html
.. _`CssSelector`: https://symfony.com/doc/current/components/css_selector.html
.. _`HttpClient`: https://symfony.com/doc/current/components/http_client.html
