API
Extractor public API
- class Extractor
- keep(xpath)
Adds an XPath expression selecting content to keep.
Parameters: xpath (str) – The XPath expression to add
Returns: The instance itself (self), so calls can be chained
Return type: Extractor
- discard(xpath)
Adds an XPath expression selecting content to discard.
Parameters: xpath (str) – The XPath expression to add
Returns: The instance itself (self), so calls can be chained
Return type: Extractor
- extract(html_contents, css_contents=None, base_url=None)
Extracts the cleaned HTML tree as a string, along with only those CSS rules that match the cleaned tree.
Parameters:
- html_contents (str) – The HTML contents to parse
- css_contents (str, optional) – The CSS contents to parse
- base_url (str, optional) – The base page URL, used to convert relative links into absolute ones
Returns: The cleaned HTML contents, or a tuple of (cleaned HTML contents, cleaned CSS contents)
Return type: str or tuple
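To make the contract of keep(), discard() and extract() concrete, here is a minimal sketch built on the standard library. Everything in it is an assumption for illustration: `SketchExtractor`, `_selectors_in` and `_filter_css` are hypothetical names, not the library's code. It assumes well-formed XHTML input (ElementTree supports only a subset of XPath), applies discards before keeps, wraps kept subtrees in a `<div>`, keeps the whole tree when no keep expression is given, rewrites only `href` and `src` links, and returns the tuple exactly when `css_contents` is given. The CSS filter is deliberately naive: a rule survives only if one of its comma-separated selectors is a bare tag, `.class` or `#id` present in the cleaned tree.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Optional, Set, Tuple, Union
from urllib.parse import urljoin


def _selectors_in(root: ET.Element) -> Set[str]:
    """Bare tag, .class and #id selectors that match something in the tree."""
    found: Set[str] = set()
    for el in root.iter():
        found.add(el.tag)
        found.update("." + name for name in el.get("class", "").split())
        if el.get("id"):
            found.add("#" + el.get("id"))
    return found


def _filter_css(css: str, present: Set[str]) -> str:
    """Keep a rule if any comma-separated selector is in `present`.

    Naive: compound selectors ("div p") never match, and nested blocks
    such as @media are not handled correctly.
    """
    kept = []
    for selectors, body in re.findall(r"([^{}]+)\{([^{}]*)\}", css):
        if any(s.strip() in present for s in selectors.split(",")):
            kept.append(f"{selectors.strip()} {{{body.strip()}}}")
    return "\n".join(kept)


class SketchExtractor:
    """Hypothetical, simplified stand-in for Extractor (not its real code)."""

    def __init__(self) -> None:
        self._keep: List[str] = []
        self._discard: List[str] = []

    def keep(self, xpath: str) -> "SketchExtractor":
        self._keep.append(xpath)
        return self  # returning self allows chained calls

    def discard(self, xpath: str) -> "SketchExtractor":
        self._discard.append(xpath)
        return self

    def extract(
        self,
        html_contents: str,
        css_contents: Optional[str] = None,
        base_url: Optional[str] = None,
    ) -> Union[str, Tuple[str, str]]:
        root = ET.fromstring(html_contents)  # assumes well-formed XHTML

        # Remove every element matched by a discard expression.
        doomed = {id(el) for xp in self._discard for el in root.findall(xp)}
        for parent in list(root.iter()):  # snapshot before mutating
            for child in list(parent):
                if id(child) in doomed:
                    parent.remove(child)  # also drops the child's .tail text

        # Keep only subtrees matched by keep expressions, if any were given.
        if self._keep:
            container = ET.Element("div")
            for xp in self._keep:
                container.extend(root.findall(xp))
            root = container

        # Resolve relative links; urljoin leaves absolute URLs unchanged.
        if base_url:
            for el in root.iter():
                for attr in ("href", "src"):
                    if el.get(attr):
                        el.set(attr, urljoin(base_url, el.get(attr)))

        html = ET.tostring(root, encoding="unicode")
        if css_contents is None:
            return html
        return html, _filter_css(css_contents, _selectors_in(root))


page = (
    "<html><body><div class='ad'>Buy now!</div>"
    "<article><a class='nav' href='/about'>About</a><p>Hi</p></article>"
    "</body></html>"
)
css = "p { color: red; }\n.nav { font-weight: bold; }\n.ad { display: none; }"
html, kept_css = (
    SketchExtractor()
    .keep(".//article")
    .discard(".//div[@class='ad']")
    .extract(page, css, base_url="https://example.com/blog/")
)
print(html)
print(kept_css)
```

In this run the `.ad` rule is dropped because its element was discarded, while the `p` and `.nav` rules survive. A real implementation would need a full HTML parser, full XPath support and a proper CSS selector engine.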