robots.txt - How to disallow bots from a single page or file


How do I disallow bots from crawling a single page, while allowing all other content to be crawled?

It's important that I don't get this wrong, so I'm asking here; I can't find a definitive answer elsewhere.

Is this correct?

    User-agent: *
    Disallow: /dir/mypage.html
    Allow: /

The Disallow line is the only one that's needed. It blocks access to anything whose path starts with "/dir/mypage.html".

The Allow line is superfluous. The default for robots.txt is Allow: /, so in general Allow is not required. It's there so that you can override access that would otherwise be disallowed. For example, suppose you want to disallow access to the "/images" directory, except for images in its "public" subdirectory. You would write:

    Allow: /images/public
    Disallow: /images

Note that order is important here. Crawlers are supposed to use a "first match" algorithm. If you wrote the Disallow line first, the crawler would assume that access to "/images/public" is blocked.
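As a rough illustration, the "first match" rule described above can be sketched in Python. This is a hypothetical simplification (real crawlers also handle wildcards and per-agent groups, and some, like Googlebot, use longest-match rather than first-match), but it shows why rule order matters:

```python
def is_allowed(path, rules):
    """Check a path against robots.txt rules using first-match semantics.

    rules: list of (directive, prefix) tuples in file order,
           where directive is "allow" or "disallow".
    The first rule whose prefix matches the path wins; if no
    rule matches, crawling is allowed by default.
    """
    for directive, prefix in rules:
        if path.startswith(prefix):
            return directive == "allow"
    return True  # no rule matched: allowed by default


# Allow listed before Disallow, as in the example above:
rules = [("allow", "/images/public"), ("disallow", "/images")]
print(is_allowed("/images/public/logo.png", rules))   # True
print(is_allowed("/images/private/logo.png", rules))  # False

# With the order reversed, Disallow matches first and
# "/images/public" is blocked too:
reversed_rules = [("disallow", "/images"), ("allow", "/images/public")]
print(is_allowed("/images/public/logo.png", reversed_rules))  # False
```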

