This project is looking for maintainer(s): #36


scraper

HTML parsing and querying with CSS selectors.

scraper is on Crates.io and GitHub.

Scraper provides an interface to Servo's html5ever and selectors crates, for browser-grade parsing and querying.

Examples

Parsing a document

use scraper::Html;

let html = r#"
    <!DOCTYPE html>
    <meta charset="utf-8">
    <title>Hello, world!</title>
    <h1 class="foo">Hello, <i>world!</i></h1>
"#;

let document = Html::parse_document(html);

Parsing a fragment

use scraper::Html;
let fragment = Html::parse_fragment("<h1>Hello, <i>world!</i></h1>");

Parsing a selector

use scraper::Selector;
let selector = Selector::parse("h1.foo").unwrap();

Selecting elements

use scraper::{Html, Selector};

let html = r#"
    <ul>
        <li>Foo</li>
        <li>Bar</li>
        <li>Baz</li>
    </ul>
"#;

let fragment = Html::parse_fragment(html);
let selector = Selector::parse("li").unwrap();

for element in fragment.select(&selector) {
    assert_eq!("li", element.value().name());
}

Selecting descendent elements

use scraper::{Html, Selector};

let html = r#"
    <ul>
        <li>Foo</li>
        <li>Bar</li>
        <li>Baz</li>
    </ul>
"#;

let fragment = Html::parse_fragment(html);
let ul_selector = Selector::parse("ul").unwrap();
let li_selector = Selector::parse("li").unwrap();

let ul = fragment.select(&ul_selector).next().unwrap();
for element in ul.select(&li_selector) {
    assert_eq!("li", element.value().name());
}

Accessing element attributes

use scraper::{Html, Selector};

let fragment = Html::parse_fragment(r#"<input name="foo" value="bar">"#);
let selector = Selector::parse(r#"input[name="foo"]"#).unwrap();

let input = fragment.select(&selector).next().unwrap();
assert_eq!(Some("bar"), input.value().attr("value"));

Serializing HTML and inner HTML

use scraper::{Html, Selector};

let fragment = Html::parse_fragment("<h1>Hello, <i>world!</i></h1>");
let selector = Selector::parse("h1").unwrap();

let h1 = fragment.select(&selector).next().unwrap();

assert_eq!("<h1>Hello, <i>world!</i></h1>", h1.html());
assert_eq!("Hello, <i>world!</i>", h1.inner_html());

Accessing descendent text

use scraper::{Html, Selector};

let fragment = Html::parse_fragment("<h1>Hello, <i>world!</i></h1>");
let selector = Selector::parse("h1").unwrap();

let h1 = fragment.select(&selector).next().unwrap();
let text = h1.text().collect::<Vec<_>>();

assert_eq!(vec!["Hello, ", "world!"], text);

License: ISC
