API
Extractor public API
- class Extractor
- keep(xpath)
Adds an XPath expression matching the elements to keep.
- Parameters:
xpath (str) – The XPath expression to add
- Returns:
The Extractor instance itself, so calls can be chained
- Return type:
Extractor
- discard(xpath)
Adds an XPath expression matching the elements to discard.
- Parameters:
xpath (str) – The XPath expression to add
- Returns:
The Extractor instance itself, so calls can be chained
- Return type:
Extractor
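Because keep() and discard() both return the instance, calls can be chained. The sketch below illustrates that pattern only: ExtractorSketch is a hypothetical stand-in that records the expressions it receives, not the library's Extractor class, and the XPath expressions are made-up examples.

```python
class ExtractorSketch:
    """Hypothetical stand-in (NOT the real Extractor): it only records expressions."""

    def __init__(self):
        self.kept = []
        self.discarded = []

    def keep(self, xpath):
        self.kept.append(xpath)
        return self  # returning self is what makes chaining possible

    def discard(self, xpath):
        self.discarded.append(xpath)
        return self


# Chained configuration, mirroring the documented keep()/discard() signatures.
sketch = (
    ExtractorSketch()
    .keep("//div[@id='content']")
    .discard("//script")
    .discard("//div[@class='ads']")
)
```

With the real class, the same chain would presumably end in a call to extract().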
- extract(html_contents, css_contents=None, base_url=None)
Extracts the cleaned HTML tree as a string, along with only the CSS rules that match the cleaned HTML tree.
- Parameters:
html_contents (str) – The HTML contents to parse
css_contents (str, optional) – The CSS contents to parse
base_url (str, optional) – The base page URL used to convert relative links to absolute links
- Returns:
The cleaned HTML contents, or a tuple (cleaned HTML contents, cleaned CSS contents)
- Return type:
str or tuple
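Since extract() may return either a str or a tuple, calling code may want to normalize the result. A minimal sketch follows, assuming the tuple form is the (HTML, CSS) pair described above; split_result is a hypothetical helper, not part of the library.

```python
def split_result(result):
    """Return (html, css) whether `result` is a str or an (html, css) tuple.

    Hypothetical helper: css is None when only HTML was returned.
    """
    if isinstance(result, tuple):
        html, css = result
        return html, css
    return result, None


# Works the same way for both documented return shapes.
html_only = split_result("<p>hello</p>")
html_and_css = split_result(("<p>hello</p>", "p { color: red; }"))
```

Normalizing once at the call site avoids repeating isinstance checks wherever the result is used.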