class Crawler implements Countable, IteratorAggregate
Crawler eases navigation of a list of \DOMNode objects.
Methods
__construct(mixed $node = null, string $currentUri = null, string $baseHref = null)
string | getUri() Returns the current URI.
string | getBaseHref() Returns base href.
clear() Removes all the nodes.
add(DOMNodeList|DOMNode|array|string|null $node) Adds a node to the current list of nodes.
addContent(string $content, null|string $type = null) Adds HTML/XML content.
addHtmlContent(string $content, string $charset = 'UTF-8') Adds HTML content to the list of nodes.
addXmlContent(string $content, string $charset = 'UTF-8', int $options = LIBXML_NONET) Adds XML content to the list of nodes.
addDocument(DOMDocument $dom) Adds a \DOMDocument to the list of nodes.
addNodeList(DOMNodeList $nodes) Adds a \DOMNodeList to the list of nodes.
addNodes(array $nodes) Adds an array of \DOMNode instances to the list of nodes.
addNode(DOMNode $node) Adds a \DOMNode instance to the list of nodes.
Crawler | eq(int $position) Returns a node given its position in the node list.
array | each(Closure $closure) Calls an anonymous function on each node of the list.
Crawler | slice(int $offset, int $length = null) Slices the list of nodes by $offset and $length.
Crawler | reduce(Closure $closure) Reduces the list of nodes by calling an anonymous function.
Crawler | first() Returns the first node of the current selection.
Crawler | last() Returns the last node of the current selection.
Crawler | siblings() Returns the sibling nodes of the current selection.
Crawler | nextAll() Returns the next sibling nodes of the current selection.
Crawler | previousAll() Returns the previous sibling nodes of the current selection.
Crawler | parents() Returns the parent nodes of the current selection.
Crawler | children() Returns the child nodes of the current selection.
string|null | attr(string $attribute) Returns the attribute value of the first node of the list.
string | nodeName() Returns the node name of the first node of the list.
string | text() Returns the node value of the first node of the list.
string | html() Returns the first node of the list as HTML.
array | extract(array $attributes) Extracts information from the list of nodes.
Crawler | filterXPath(string $xpath) Filters the list of nodes with an XPath expression.
Crawler | filter(string $selector) Filters the list of nodes with a CSS selector.
Crawler | selectLink(string $value) Selects links by name or alt value for clickable images.
Crawler | selectImage(string $value) Selects images by alt value.
Crawler | selectButton(string $value) Selects a button by name or alt value for images.
Link | link(string $method = 'get') Returns a Link object for the first node in the list.
Link[] | links() Returns an array of Link objects for the nodes in the list.
Image | image() Returns an Image object for the first node in the list.
Image[] | images() Returns an array of Image objects for the nodes in the list.
Form | form(array $values = null, string $method = null) Returns a Form object for the first node in the list.
setDefaultNamespacePrefix(string $prefix) Overloads a default namespace prefix to be used with XPath and CSS expressions.
registerNamespace(string $prefix, string $namespace)
static string | xpathLiteral(string $s) Converts string for XPath expressions.
DOMElement|null | getNode(int $position)
int | count()
ArrayIterator | getIterator()
Details
__construct(mixed $node = null, string $currentUri = null, string $baseHref = null)
string getUri()
Returns the current URI.
string getBaseHref()
Returns base href.
clear()
Removes all the nodes.
add(DOMNodeList|DOMNode|array|string|null $node)
Adds a node to the current list of nodes.
This method uses the appropriate specialized add*() method based on the type of the argument.
addContent(string $content, null|string $type = null)
Adds HTML/XML content.
If the charset is not set via the content type, it is assumed to be ISO-8859-1, which is the default charset defined by the HTTP 1.1 specification.
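As a rough sketch of the charset rule above: passing the charset in the content type keeps the content from being read as ISO-8859-1. The autoloader path and the sample markup are assumptions, and the component is assumed to be installed through Composer.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumed Composer autoloader location

use Symfony\Component\DomCrawler\Crawler;

// Sample markup (made up); this file is assumed to be saved as UTF-8.
$html = '<html><body><p>Café</p></body></html>';

$crawler = new Crawler();
// Declare the charset in the content type so it is not assumed to be ISO-8859-1.
$crawler->addContent($html, 'text/html; charset=UTF-8');

$text = $crawler->filterXPath('//p')->text();
echo $text, "\n";
```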
addHtmlContent(string $content, string $charset = 'UTF-8')
Adds HTML content to the list of nodes.
The libxml errors are disabled when the content is parsed.
If you want to get parsing errors, be sure to enable internal errors via libxml_use_internal_errors(true) and then get the errors via libxml_get_errors(). Be sure to clear errors with libxml_clear_errors() afterward.
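The error-handling steps above might look like the following sketch. The malformed markup is invented for illustration, and whether any errors are actually reported depends on the installed libxml version; the autoloader path is an assumption.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumed Composer autoloader location

use Symfony\Component\DomCrawler\Crawler;

// Ask libxml to collect errors internally; remember the previous setting.
$previous = libxml_use_internal_errors(true);

$crawler = new Crawler();
$crawler->addHtmlContent('<p>Unclosed <b>bold</p><madeuptag>');

// Read whatever libxml collected while parsing.
$errors = libxml_get_errors();
foreach ($errors as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}

// Clear the error buffer and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);
```

Restoring the previous setting keeps the change local to this parse, which matters when other code in the same process relies on libxml warnings.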
addXmlContent(string $content, string $charset = 'UTF-8', int $options = LIBXML_NONET)
Adds XML content to the list of nodes.
The libxml errors are disabled when the content is parsed.
If you want to get parsing errors, be sure to enable internal errors via libxml_use_internal_errors(true) and then get the errors via libxml_get_errors(). Be sure to clear errors with libxml_clear_errors() afterward.
addDocument(DOMDocument $dom)
Adds a \DOMDocument to the list of nodes.
addNodeList(DOMNodeList $nodes)
Adds a \DOMNodeList to the list of nodes.
addNodes(array $nodes)
Adds an array of \DOMNode instances to the list of nodes.
addNode(DOMNode $node)
Adds a \DOMNode instance to the list of nodes.
Crawler eq(int $position)
Returns a node given its position in the node list.
array each(Closure $closure)
Calls an anonymous function on each node of the list.
The anonymous function receives the node (wrapped in a Crawler instance) and its position as arguments.
Example:
$crawler->filter('h1')->each(function ($node, $i) {
return $node->text();
});
Crawler slice(int $offset, int $length = null)
Slices the list of nodes by $offset and $length.
Crawler reduce(Closure $closure)
Reduces the list of nodes by calling an anonymous function.
To remove a node from the list, the anonymous function must return false.
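A small sketch of the rule above, keeping only the nodes at even positions. The list markup is made up and the autoloader path is an assumption.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumed Composer autoloader location

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler('<ul><li>one</li><li>two</li><li>three</li><li>four</li></ul>');

// Returning false drops the node; any other return value keeps it.
$kept = $crawler->filterXPath('//li')->reduce(function (Crawler $node, $i) {
    return 0 === $i % 2;
});

$texts = $kept->each(function (Crawler $node, $i) {
    return $node->text();
});

print_r($texts); // one, three
```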
Crawler first()
Returns the first node of the current selection.
Crawler last()
Returns the last node of the current selection.
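The positional methods eq(), slice(), first() and last() can be compared side by side in a sketch like this one. The markup is illustrative and the autoloader path is an assumption; positions are zero-based.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumed Composer autoloader location

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler('<ol><li>a</li><li>b</li><li>c</li><li>d</li></ol>');
$items = $crawler->filterXPath('//li');

$first  = $items->first()->text(); // "a"
$last   = $items->last()->text();  // "d"
$second = $items->eq(1)->text();   // "b" (position 1 is the second node)

// Two nodes starting at offset 1.
$middle = $items->slice(1, 2)->each(function (Crawler $node, $i) {
    return $node->text();
});

print_r($middle); // b, c
```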
Crawler siblings()
Returns the sibling nodes of the current selection.
Crawler nextAll()
Returns the next sibling nodes of the current selection.
Crawler previousAll()
Returns the previous sibling nodes of the current selection.
Crawler parents()
Returns the parent nodes of the current selection.
Crawler children()
Returns the child nodes of the current selection.
string|null attr(string $attribute)
Returns the attribute value of the first node of the list.
string nodeName()
Returns the node name of the first node of the list.
string text()
Returns the node value of the first node of the list.
string html()
Returns the first node of the list as HTML.
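The four accessors above (attr(), nodeName(), text(), html()) all read from the first node of the list, even when several nodes match. A minimal sketch with made-up markup and an assumed autoloader path:

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumed Composer autoloader location

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler('<p><a href="/docs" title="Docs">Read <b>more</b></a><a href="/blog">Blog</a></p>');

// Two <a> elements match, but the accessors below only look at the first one.
$links = $crawler->filterXPath('//a');

$href    = $links->attr('href'); // "/docs"
$missing = $links->attr('rel');  // null, the attribute does not exist
$name    = $links->nodeName();   // "a"
$text    = $links->text();       // "Read more"
$inner   = $links->html();       // "Read <b>more</b>"

echo $href, ' ', $name, ' ', $text, ' ', $inner, "\n";
```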
array extract(array $attributes)
Extracts information from the list of nodes.
You can extract attributes and/or the node value (_text).
Example:
$crawler->filter('h1 a')->extract(array('_text', 'href'));
Crawler filterXPath(string $xpath)
Filters the list of nodes with an XPath expression.
The XPath expression is evaluated in the context of the crawler, which is considered as a fake parent of the elements inside it. This means that a child selector "div" or "./div" will match only the div elements of the current crawler, not their children.
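To illustrate that the expression is evaluated relative to the crawler's own nodes, this sketch runs the same `//li` query first on the whole document and then on a crawler that holds only the `ul` element. The markup is invented and the autoloader path is an assumption.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumed Composer autoloader location

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler('<ul><li>one</li><li>two</li></ul><ol><li>three</li></ol>');

// Evaluated against the whole document: every <li> matches.
$all = $crawler->filterXPath('//li');

// Evaluated against a crawler that contains only the <ul>:
// the <li> inside the <ol> is outside that context.
$inList = $crawler->filterXPath('//ul')->filterXPath('//li');

echo count($all), ' ', count($inList), "\n"; // 3 2
```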
Crawler filter(string $selector)
Filters the list of nodes with a CSS selector.
This method only works if you have installed the CssSelector Symfony Component.
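A short sketch of CSS filtering, assuming both the DomCrawler and CssSelector components are installed through Composer. The markup and class name are illustrative only.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumed Composer autoloader location

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler('<ul><li class="active">Home</li><li>About</li></ul>');

// CSS selectors are translated to XPath by the CssSelector component.
$active = $crawler->filter('ul > li.active')->text();
$total  = count($crawler->filter('li'));

echo $active, ' ', $total, "\n"; // Home 2
```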
Crawler selectLink(string $value)
Selects links by name or alt value for clickable images.
Crawler selectImage(string $value)
Selects images by alt value.
Crawler selectButton(string $value)
Selects a button by name or alt value for images.
Link link(string $method = 'get')
Returns a Link object for the first node in the list.
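selectLink() and link() are typically chained, as in this sketch. The current URI passed to the constructor is what lets the Link resolve relative hrefs; the host name, markup and autoloader path are assumptions.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumed Composer autoloader location

use Symfony\Component\DomCrawler\Crawler;

$html = '<a href="/about">About us</a><a href="/team"><img src="t.png" alt="Team"></a>';

// The second argument is the URI of the page the markup came from.
$crawler = new Crawler($html, 'http://example.com/start');

// Match by link text.
$about = $crawler->selectLink('About us')->link();
// Match by the alt text of a clickable image.
$team = $crawler->selectLink('Team')->link();

$aboutUri = $about->getUri();
$teamUri  = $team->getUri();

echo $aboutUri, "\n", $teamUri, "\n";
```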
Link[] links()
Returns an array of Link objects for the nodes in the list.
Image image()
Returns an Image object for the first node in the list.
Image[] images()
Returns an array of Image objects for the nodes in the list.
Form form(array $values = null, string $method = null)
Returns a Form object for the first node in the list.
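A form is usually reached through one of its buttons. The sketch below selects the submit button by its value and overrides one field; the form markup, field name and host are hypothetical, and the autoloader path is an assumption.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumed Composer autoloader location

use Symfony\Component\DomCrawler\Crawler;

$html = '<form action="/login" method="post">'
      . '<input type="text" name="user" value="">'
      . '<input type="submit" value="Log in">'
      . '</form>';

$crawler = new Crawler($html, 'http://example.com/');

// Select the button, then build the Form it belongs to with overridden values.
$form = $crawler->selectButton('Log in')->form(array('user' => 'alice'));

$uri    = $form->getUri();
$method = $form->getMethod();
$values = $form->getValues();

echo $method, ' ', $uri, "\n";
print_r($values);
```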
setDefaultNamespacePrefix(string $prefix)
Overloads a default namespace prefix to be used with XPath and CSS expressions.
registerNamespace(string $prefix, string $namespace)
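As an illustration of registerNamespace(), the sketch below binds a prefix of our choosing to the Atom namespace so that it can be used in an XPath expression. The feed content and the prefix name `atom` are made up, and the autoloader path is an assumption.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumed Composer autoloader location

use Symfony\Component\DomCrawler\Crawler;

$xml = '<?xml version="1.0" encoding="UTF-8"?>'
     . '<feed xmlns="http://www.w3.org/2005/Atom">'
     . '<entry><title>First</title></entry>'
     . '<entry><title>Second</title></entry>'
     . '</feed>';

$crawler = new Crawler();
$crawler->addXmlContent($xml);

// Bind the (arbitrary) prefix "atom" to the document's default namespace URI.
$crawler->registerNamespace('atom', 'http://www.w3.org/2005/Atom');

$titles = $crawler->filterXPath('//atom:entry/atom:title')->each(function (Crawler $node, $i) {
    return $node->text();
});

print_r($titles);
```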
static string xpathLiteral(string $s)
Converts string for XPath expressions.
Escaped characters are: double quotes (") and apostrophes (').
Examples:
echo Crawler::xpathLiteral('foo " bar');
//prints 'foo " bar'
echo Crawler::xpathLiteral("foo ' bar");
//prints "foo ' bar"
echo Crawler::xpathLiteral('a\'b"c');
//prints concat('a', "'", 'b"c')
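One place this helps is when a search value containing both quote characters has to be embedded in an XPath expression. The sketch below is an assumption-laden illustration: the markup and search string are invented, and the autoloader path is assumed.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumed Composer autoloader location

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler('<p><a href="/p">O\'Reilly "Press"</a><a href="/q">Other</a></p>');

// The value contains both an apostrophe and double quotes,
// so it cannot be wrapped in either quote style directly.
$needle  = 'O\'Reilly "Press"';
$literal = Crawler::xpathLiteral($needle);

$matches = $crawler->filterXPath('//a[. = ' . $literal . ']');
$href = $matches->attr('href');

echo count($matches), ' ', $href, "\n";
```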