Interview question: Honeypots and web crawlers

- March 15, 2014

I was reading a book to prepare for an interview and came across the following question:

What would you do when your crawler runs into a honeypot that generates an infinite subgraph for it to wander about?

I wanted to find some solutions to this question. Personally, I would use some form of depth-limited search to keep the crawler from traversing continuously. Or perhaps some form of machine learning could be used to detect patterns. Any thoughts?

Most commonly, infinite subgraphs are prevented by limiting link depth. You obtain an initial set of URLs and traverse each one only to a finite depth. While limiting the traversal depth, you may use heuristics to adjust it dynamically according to the characteristics of the webpage. More information can be found e.g. here.
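As a rough illustration of the depth limit, here is a minimal sketch of a breadth-first crawl that stops expanding pages once they are `max_depth` hops from a seed. The names `crawl`, `fetch_links`, and `honeypot_links`, as well as the `trap.example` URL, are assumptions made up for this example; a real crawler would fetch pages over HTTP instead of calling a stub.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(seeds: Iterable[str],
          fetch_links: Callable[[str], Iterable[str]],
          max_depth: int = 3) -> set[str]:
    """Breadth-first crawl that never expands pages at max_depth hops from a seed."""
    visited: set[str] = set()
    queue = deque((url, 0) for url in seeds)
    while queue:
        url, depth = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        if depth == max_depth:
            continue  # depth budget exhausted: record the page, do not follow its links
        for link in fetch_links(url):
            if link not in visited:
                queue.append((link, depth + 1))
    return visited


def honeypot_links(url: str) -> list[str]:
    """Hypothetical trap: every page links to two brand-new pages, forever."""
    return [url + "/a", url + "/b"]


pages = crawl(["http://trap.example"], honeypot_links, max_depth=3)
print(len(pages))  # 15 pages: 1 + 2 + 4 + 8
```

Without the depth check this loop would never finish on `honeypot_links`. With it, the amount of work is bounded by the branching factor and `max_depth`, which is why a dynamic heuristic for picking the depth per site can matter in practice.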

Another option is to try some sort of pattern matching. Depending on the algorithm that produces the subgraph, this can be a (very, very) hard task. At the least, it is a pretty expensive operation.
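To make the pattern-matching idea concrete, here is one possible (and deliberately naive) heuristic: flag a URL as suspicious when its path is unusually deep or keeps repeating the same segment. The function name `looks_like_trap` and both thresholds are assumptions chosen for illustration. A trap that generates random-looking paths would slip past a check like this, which is exactly why the general case is hard.

```python
from collections import Counter
from urllib.parse import urlsplit


def looks_like_trap(url: str, max_segments: int = 10, max_repeats: int = 2) -> bool:
    """Heuristic only: very deep paths or heavily repeated segments look suspicious."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) > max_segments:
        return True  # path is deeper than most real sites would produce
    counts = Counter(segments)
    # The same path segment appearing more than max_repeats times suggests a loop.
    return any(n > max_repeats for n in counts.values())


print(looks_like_trap("http://trap.example/a/b/a/b/a/b"))  # True: "a" appears 3 times
print(looks_like_trap("http://example.com/docs/api/v1"))   # False
```

A check like this is cheap because it looks only at the URL string; comparing page content or link structure would catch more traps but costs considerably more per page.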

As for the interview question (about detecting infinite loops):

If they ask this question, they probably want to hear a reference to the halting problem:

Alan Turing proved in 1936 that a general algorithm to solve the halting problem for all possible program-input pairs cannot exist.
