robots.txt - How to disallow bots from a single page or file
How do I disallow bots from crawling a single page, while allowing all other content to be crawled? It's important that I don't get this wrong, and I can't find a definitive answer elsewhere. Is this correct?
User-agent: *
Disallow: /dir/mypage.html
Allow: /
The Disallow line is all that's needed. It blocks access to any URL that starts with "/dir/mypage.html".
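A quick way to sanity-check this behavior, if you have Python handy, is the standard-library robots.txt parser, urllib.robotparser. This sketch just feeds it the questioner's proposed file and asks about a few paths (the "otherpage" and "mypage.html2" paths are made-up examples to show the contrast):

```python
import urllib.robotparser

# The questioner's proposed robots.txt, verbatim.
robots_txt = """\
User-agent: *
Disallow: /dir/mypage.html
Allow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())
rp.modified()  # mark the rules as loaded so can_fetch() evaluates them

print(rp.can_fetch("*", "/dir/mypage.html"))     # False: explicitly disallowed
print(rp.can_fetch("*", "/dir/otherpage.html"))  # True: falls through to Allow: /
print(rp.can_fetch("*", "/dir/mypage.html2"))    # False: Disallow is a prefix match
```

The last line is worth noting: because Disallow matches by prefix, it also blocks any other URL that happens to start with "/dir/mypage.html".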
The Allow line is superfluous. The default in robots.txt is Allow: /; in general, an Allow line is not required. It exists so that you can override access to something that would otherwise be disallowed. For example, suppose you want to disallow access to the "/images" directory, except for the images in its "public" subdirectory. You would write:
Allow: /images/public
Disallow: /images
Note that the order is important here. Crawlers are supposed to use a "first match" algorithm. If you wrote the Disallow line first, a crawler would assume that access to "/images/public" is also blocked.
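Python's urllib.robotparser happens to apply rules in this same first-match order, so it can illustrate why the ordering matters. A minimal sketch, with hypothetical file names under /images:

```python
import urllib.robotparser

def allowed(rules, path):
    """Parse a robots.txt body and ask whether a generic bot may fetch path."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(rules.splitlines())
    rp.modified()  # mark the rules as loaded so can_fetch() evaluates them
    return rp.can_fetch("*", path)

correct = "User-agent: *\nAllow: /images/public\nDisallow: /images\n"
swapped = "User-agent: *\nDisallow: /images\nAllow: /images/public\n"

# With Allow first, the public subdirectory stays crawlable.
print(allowed(correct, "/images/public/logo.png"))    # True
print(allowed(correct, "/images/private/photo.png"))  # False

# With Disallow first, a first-match crawler blocks the public files too.
print(allowed(swapped, "/images/public/logo.png"))    # False
```

Be aware that this is only the behavior of first-match crawlers; some crawlers use other tie-breaking rules, so keeping the more specific Allow line first is the safe choice.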