Will the limits specified in robots.txt be taken into account when crawling?
This is optional, but enabled by default. If this option is ticked, our bot follows the Allow and Disallow rules in the general User-agent section (User-agent: *).
Bot-specific User-agent sections (for example, for Googlebot or Yandex) are taken into account when you choose the corresponding search-bot crawler mode.
In addition, you may create a separate section specifically for Mysitemapgenerator:
User-agent: Mysitemapgenerator
Below is an example of a robots.txt file:
# No robots should visit any URL starting with /noindex-directory/
User-agent: *
Disallow: /noindex-directory/

# Google does not need to visit a specific URL
User-agent: Googlebot
Disallow: /noindex-directory/disallow-google.html

# Yandex does not need to visit any URL starting with /noindex-directory/,
# but is allowed to index a specific page
User-agent: Yandex
Disallow: /noindex-directory/
Allow: /noindex-directory/allow-yandex.html

# Mysitemapgenerator does not need to visit any URL starting with /noindex-directory/,
# but is allowed to index pages with a specific extension
User-agent: Mysitemapgenerator
Disallow: /noindex-directory/
Allow: /noindex-directory/*.html
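If you want to check rules like these yourself, here is a minimal sketch using Python's standard-library urllib.robotparser (the script and the example.com URLs are just illustrations, not part of our crawler). Note that urllib.robotparser applies simple first-match prefix rules and does not support wildcards, so the Allow overrides in the Yandex and Mysitemapgenerator sections above are better verified with the search engines' own robots.txt testing tools.

# Local check of robots.txt rules (illustrative only).
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /noindex-directory/

User-agent: Googlebot
Disallow: /noindex-directory/disallow-google.html
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A bot without its own section falls back to the general (*) rules.
print(parser.can_fetch("SomeBot", "https://example.com/noindex-directory/page.html"))  # False
print(parser.can_fetch("SomeBot", "https://example.com/other-page.html"))              # True

# Googlebot uses only its own section, so other pages in the directory stay allowed.
print(parser.can_fetch("Googlebot", "https://example.com/noindex-directory/disallow-google.html"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/noindex-directory/other.html"))            # True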