What is robots.txt?
A robots.txt file is a plain-text file that tells web crawlers (such as Googlebot) which parts of your website they may access. Crawlers look for it at the root of your domain (for example, https://www.yourwebsite.com/robots.txt), and it is a simple way to control how search engines crawl your site.
Why is it important for a streaming service?
Optimizing crawl budget: Search engines allocate a limited crawl budget to each site. By using robots.txt to block low-value URLs, you steer crawlers toward the pages that matter most, helping those pages get indexed quickly and accurately.
Blocking duplicate and non-public pages: Robots.txt can keep search engines from crawling duplicate or non-public pages, such as URL-parameter variants of the same page. This improves crawl efficiency and prevents crawlers from wasting time on pages that are not relevant to users.
Hiding resources: Robots.txt can keep crawlers away from resources such as login pages, administrative pages, and other areas not meant for search results. Keep in mind that robots.txt only asks well-behaved crawlers to stay out; it is not an access-control mechanism, so truly private content still needs authentication. The example below illustrates all three uses.
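For instance, here is a minimal sketch of these three uses for a streaming site. All paths are hypothetical and would need to match your actual URL structure, and the * wildcard inside paths is honored by major crawlers such as Googlebot but is not guaranteed for every bot:
User-agent: *
# Crawl budget: keep bots out of internal search results
Disallow: /search
# Duplicates: block URL-parameter variants of catalog pages
Disallow: /*?sort=
# Non-public resources: keep login and admin pages out of results
Disallow: /account/login
Disallow: /admin/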
How to add/edit your robots.txt file?
From the Responsive website section of the left menu, expand Website and then select Basic info.
Select Asset Details.
Scroll down to the Robots file and format section and add your robots.txt content there in .txt format.
Press Save. Once the change is published, you can verify the file by visiting /robots.txt at the root of your domain.
What should your robots.txt contain?
The syntax for a robots.txt file is as follows:
User-agent: <user-agent>
Disallow: <path>
Allow: <path>
Sitemap: <url>
The User-agent line specifies the crawler that the following directives apply to; use * to match all crawlers.
The Disallow line specifies a path that the search engine bot should not crawl.
The Allow line specifies a path that the search engine bot may crawl, even if a broader Disallow rule would otherwise block it.
The Sitemap line specifies the URL of a sitemap file that the search engine bot can use to crawl your website.
EXAMPLE
User-agent: Googlebot
Disallow: /private-page
Allow: /
Sitemap: https://www.yourwebsite.com/sitemap.xml
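You can also define several groups, each starting with its own User-agent line; a crawler follows the most specific group that matches it. For example, this sketch gives every crawler full access except one (ExampleBot is a hypothetical bot name):
User-agent: *
Allow: /

User-agent: ExampleBot
Disallow: /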