HandsomeSoup

Current Status: Usable and stable. Needs GHC 7.6. Please file bugs!

HandsomeSoup is the library I wish I had when I started parsing HTML in Haskell.

It is built on top of HXT and adds a few functions that make it easier to work with HTML.

Most importantly, it adds CSS selectors to HXT. The goal of HandsomeSoup is to be a complete CSS2 selector parser for HXT.

Install

cabal install HandsomeSoup

Example

Nokogiri, the HTML parser for Ruby, has an example showing how to scrape Google search results. This is easy in HandsomeSoup:

import Text.XML.HXT.Core
import Text.HandsomeSoup

main = do
    let doc = fromUrl "http://www.google.com/search?q=egon+schiele"
    links <- runX $ doc >>> css "h3.r a" ! "href"
    mapM_ putStrLn links

What can HandsomeSoup do for you?

Easily parse an online page using fromUrl

let doc = fromUrl "http://example.com"

Or a local page using parseHtml

contents <- readFile filename
let doc = parseHtml contents
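
A document built with parseHtml is an HXT arrow, so nothing is parsed until it is run with runX. The following is a minimal sketch of that step, assuming the hxt and HandsomeSoup packages are installed. The names sampleHtml and linkTargets are hypothetical, and an inline string stands in for the contents of a file.

```haskell
import Text.XML.HXT.Core
import Text.HandsomeSoup

-- Hypothetical sample markup, used here instead of a file read from disk.
sampleHtml :: String
sampleHtml = "<html><body><p>Hello <a href=\"/about\">about</a></p></body></html>"

-- Parse an HTML string and collect the href of every <a> element.
-- runX executes the arrow and returns all results as a list.
linkTargets :: String -> IO [String]
linkTargets html = runX $ parseHtml html >>> css "a" ! "href"

main :: IO ()
main = linkTargets sampleHtml >>= mapM_ putStrLn
```

To work from a file instead, read it with readFile and pass the resulting string to linkTargets.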

Easily extract elements using css

Here are some valid selectors:

doc >>> css "a"
doc >>> css "*"
doc >>> css "a#link1"
doc >>> css "a.foo"
doc >>> css "p > a"
doc >>> css "p strong"
doc >>> css "#container h1"
doc >>> css "img[width]"
doc >>> css "img[width=400]"
doc >>> css "a[class~=bar]"
doc >>> css "a:first-child"
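
To see what a selector matches, one option is to run it against a small page and print the text inside each match. This is a sketch under assumptions: samplePage and textOf are hypothetical names, the page is made up for illustration, and the text is extracted with HXT's own //> and getText rather than anything HandsomeSoup provides. The document is composed with the selector using >>>, as in the Google example above.

```haskell
import Text.XML.HXT.Core
import Text.HandsomeSoup

-- Hypothetical sample page for trying selectors without a network request.
samplePage :: String
samplePage = concat
  [ "<html><body><div id=\"container\"><h1>Title</h1>"
  , "<p><a class=\"foo\" href=\"/a\">first</a> <strong>bold</strong></p>"
  , "</div></body></html>"
  ]

-- Run one selector against an HTML string and collect the text nodes
-- found anywhere below each matched element.
textOf :: String -> String -> IO [String]
textOf selector html = runX $ parseHtml html >>> css selector //> getText

main :: IO ()
main = mapM_ report ["a.foo", "p strong", "#container h1"]
  where
    report sel = do
      texts <- textOf sel samplePage
      putStrLn (sel ++ " -> " ++ show texts)
```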

Easily get attributes using (!)

doc >>> css "img" ! "src"
doc >>> css "a" ! "href"
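
Because (!) yields plain strings, the element itself is gone once the attribute has been read. When an attribute is needed together with other data from the same element, one possible approach is to select the element first and fan out with the standard arrow combinator &&&. The sketch below assumes the hxt and HandsomeSoup packages are installed. linksWithText and sampleLinks are hypothetical names, and getAttrValue is HXT's attribute accessor, used here in place of (!).

```haskell
import Text.XML.HXT.Core
import Text.HandsomeSoup

-- Hypothetical sample markup with two links.
sampleLinks :: String
sampleLinks = "<html><body><a href=\"/one\">One</a><a href=\"/two\">Two</a></body></html>"

-- For every <a> element, pair its href attribute with the text it contains.
linksWithText :: String -> IO [(String, String)]
linksWithText html =
  runX $ parseHtml html >>> css "a" >>> (getAttrValue "href" &&& deep getText)

main :: IO ()
main = linksWithText sampleLinks >>= mapM_ print
```

Note that &&& on list arrows pairs every result of the left arrow with every result of the right one, so a link containing several text nodes would produce several pairs.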

Docs

Find Haddock docs on Hackage.

I also wrote The Complete Guide To Parsing HXT With Haskell.

Credits

Made by Adit.
