`media.domain.com` since they will be ignored when crawling `domain.com`.

The `page` object in the `on_every_page` block above has a `.doc` method, which returns the Nokogiri document for the HTML body of the page. This means you can use Nokogiri selectors inside the `on_every_page` block, such as `page.doc.css('div#id')`.
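
As a minimal sketch of the selector usage described above, the snippet below collects the text of every node matching a CSS selector on each crawled page. The helper names `texts_for` and `crawl_headings`, and the choice of the `h1` selector, are hypothetical and only serve the illustration. The sketch assumes `page.doc` can be `nil` for responses that are not HTML, so the helper guards against that case.

```ruby
# Return the stripped text of every node matching `selector`.
# `doc` is expected to be a Nokogiri document such as page.doc.
def texts_for(doc, selector)
  return [] if doc.nil? # assumed: no document for non-HTML responses
  doc.css(selector).map { |node| node.text.strip }
end

# Hypothetical wrapper: crawl start_url and map each page URL
# to the text of its <h1> elements.
def crawl_headings(start_url)
  require 'anemone' # loaded lazily so texts_for works on its own
  headings = {}
  crawler = Anemone::Core.new(start_url)
  crawler.on_every_page do |page|
    headings[page.url.to_s] = texts_for(page.doc, 'h1')
  end
  crawler.run
  headings
end
```

Calling `crawl_headings('http://domain.com')` would start a real crawl, so it is best tried against a site you control. Keeping the selector logic in a separate helper also makes it easy to check without any network access.
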
require 'anemone'

pages = []
crawler = Anemone::Core.new(url, options)
crawler.on_every_page do |page|
  pages << page.url  # collect the URL of each crawled page
end
crawler.run
Parameter | Details |
---|---|
url | URL to be crawled (including the protocol) |
options | Optional hash of crawler settings; see the Anemone documentation for the full list |
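
For illustration, an `options` hash might look like the sketch below. The keys shown are ones Anemone is understood to accept, but the values are hypothetical and should be checked against the option list for the installed version.

```ruby
# Hypothetical crawler settings; the values are examples only.
options = {
  depth_limit: 2,            # follow links at most two levels from the start URL
  delay: 1,                  # seconds to wait between requests
  obey_robots_txt: true,     # skip pages disallowed by robots.txt
  user_agent: 'MyCrawler/1.0' # made-up identifier sent with each request
}
```

A hash like this would be passed as the second argument, as in `Anemone::Core.new(url, options)`.
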