Will the limits specified in robots.txt be taken into account when crawling?
This is optional, but enabled by default. If this option is ticked, our bot follows the Allow and Disallow rules in the general User-agent section (User-agent: *).
Bot-specific User-agent sections (for example, for Googlebot or Yandex) are taken into account when you choose the corresponding search-bot crawler mode.
In addition, you may create a separate section specifically for Mysitemapgenerator:
User-agent: Mysitemapgenerator
Below is an example of a robots.txt file:
# No robots should visit any URL starting with /noindex-directory/
User-agent: *
Disallow: /noindex-directory/

# Google does not need to visit a specific URL
User-agent: Googlebot
Disallow: /noindex-directory/disallow-google.html

# Yandex does not need to visit any URL starting with /noindex-directory/,
# but is allowed to index a specific page
User-agent: Yandex
Disallow: /noindex-directory/
Allow: /noindex-directory/allow-yandex.html

# Mysitemapgenerator does not need to visit any URL starting with /noindex-directory/,
# but is allowed to index pages with a specific extension
User-agent: Mysitemapgenerator
Disallow: /noindex-directory/
Allow: /noindex-directory/*.html
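If you want to check rules like these yourself, here is a minimal sketch using Python's standard-library urllib.robotparser (the script and the example.com URLs are just illustrations, not part of our crawler). Note that urllib.robotparser applies simple first-match prefix rules and does not support wildcards, so the Allow overrides in the Yandex and Mysitemapgenerator sections above are better verified with the search engines' own robots.txt testing tools.

# Local check of robots.txt rules (illustrative only).
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /noindex-directory/

User-agent: Googlebot
Disallow: /noindex-directory/disallow-google.html
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A bot without its own section falls back to the general (*) rules.
print(parser.can_fetch("SomeBot", "https://example.com/noindex-directory/page.html"))  # False
print(parser.can_fetch("SomeBot", "https://example.com/other-page.html"))              # True

# Googlebot uses only its own section, so other pages in the directory stay allowed.
print(parser.can_fetch("Googlebot", "https://example.com/noindex-directory/disallow-google.html"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/noindex-directory/other.html"))            # True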