Interview question: Honeypots and web crawlers


I was reading a book to prepare for an interview and came across the following question:

What happens when your crawler runs into a honeypot that generates an infinite subgraph for it to wander about in?

I wanted to find some solutions to this question. Personally, I would use some form of depth-limited search to prevent the crawler from traversing continuously. Or perhaps use some form of machine learning to detect patterns. Thoughts?

Most commonly, infinite subgraphs are prevented by limiting link depth. You gain an initial set of URLs and traverse each one only to a finite depth. While limiting the traversal depth, you may use heuristics to adjust it dynamically according to the characteristics of the web page. More information can be found e.g. here.
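As a rough illustration of the depth limit, here is a minimal sketch in Python. The names `crawl`, `fetch_links` and `honeypot_links` are made up for this example, and the honeypot is simulated in memory instead of fetched over the network, so treat it as a sketch under those assumptions and not as a real crawler.

```python
from collections import deque
from typing import Callable, Iterable, List, Set


def crawl(seeds: Iterable[str],
          fetch_links: Callable[[str], Iterable[str]],
          max_depth: int) -> Set[str]:
    """Breadth-first crawl that never expands pages at max_depth."""
    visited: Set[str] = set()
    queue = deque((url, 0) for url in seeds)
    while queue:
        url, depth = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        if depth == max_depth:
            continue  # depth budget spent: record the page, do not follow its links
        for link in fetch_links(url):
            if link not in visited:
                queue.append((link, depth + 1))
    return visited


def honeypot_links(url: str) -> List[str]:
    """Hypothetical trap: every page links to two brand-new pages."""
    return [url + "/a", url + "/b"]


pages = crawl(["http://trap.example"], honeypot_links, max_depth=3)
print(len(pages))  # 1 + 2 + 4 + 8 = 15
```

Without the `max_depth` check this loop would never finish on the simulated trap. With it, the crawl stops after a bounded number of pages no matter how many links the trap invents.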

Another option is to try some sort of pattern matching. Depending on the algorithm that produces the subgraph, this can be a pretty (very, very) hard task. At the least, it will be a pretty expensive operation.
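One cheap form of pattern matching (an assumption on my part, not a standard algorithm) is to collapse each URL into a rough "shape" and cap how many URLs of the same shape the crawler accepts. The sketch below uses hypothetical names `url_shape` and `ShapeBudget`. It only catches traps whose generated URLs differ by numbers, so a smarter honeypot would slip past it.

```python
import re
from collections import Counter
from urllib.parse import urlsplit


def url_shape(url: str) -> str:
    """Collapse digit runs so /page/1 and /page/2 map to the same shape."""
    parts = urlsplit(url)
    return parts.netloc + re.sub(r"\d+", "<n>", parts.path)


class ShapeBudget:
    """Refuse URLs once too many with the same shape have been accepted."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.seen: Counter = Counter()

    def allow(self, url: str) -> bool:
        shape = url_shape(url)
        if self.seen[shape] >= self.limit:
            return False  # this shape has used up its budget
        self.seen[shape] += 1
        return True


budget = ShapeBudget(limit=3)
urls = [f"http://trap.example/calendar/{day}" for day in range(10)]
accepted = [u for u in urls if budget.allow(u)]
print(len(accepted))  # 3
```

The limit is a trade-off: set it too low and legitimate paginated content gets cut off, set it too high and the crawler wastes time inside the trap before giving up.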

As for the interview question (about detecting infinite loops):

If they ask this question, they probably want to hear a reference to the halting problem:

Alan Turing proved in 1936 that a general algorithm to solve the halting problem for all possible program-input pairs cannot exist.

