Interview question: Honeypots and web crawlers
I was reading a book to prepare for an interview and came across the following question:
What would you do when your crawler runs into a honeypot that generates an infinite subgraph for it to wander about?
I wanted to find some solutions to this question. Personally, I would use some form of depth-limited search to prevent the crawler from traversing continuously. Or perhaps use some form of machine learning to detect patterns. Thoughts?
Most commonly, infinite subgraphs are prevented by limiting link depth. You obtain an initial set of URLs and traverse from each one only to a finite depth. While limiting the traversal depth, you may use heuristics to adjust it dynamically according to webpage characteristics. More information can be found e.g. here.
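To make the depth-limit idea concrete, here is a minimal sketch in Python. It is not a real crawler: `crawl`, `get_links` and `honeypot_links` are names I made up for illustration, and the "honeypot" is just a function that returns a brand-new link for every page, standing in for real HTTP fetching and HTML parsing.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl(seeds: Iterable[str],
          get_links: Callable[[str], Iterable[str]],
          max_depth: int) -> List[str]:
    """Breadth-first crawl that never expands pages deeper than max_depth."""
    seen = set()
    order = []
    queue = deque()
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            queue.append((seed, 0))
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:
            # Depth budget used up: record the page but do not follow its links.
            continue
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return order


def honeypot_links(url: str) -> List[str]:
    # Hypothetical trap: every page links to a fresh, never-before-seen page.
    return [url + "/next"]


visited = crawl(["http://trap.example"], honeypot_links, max_depth=3)
print(len(visited))
```

Without the `max_depth` check the loop above would never terminate on this trap, because the `seen` set alone does not help when every generated URL is new. A dynamic variant could pass a per-site or per-page depth budget instead of one fixed number.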
Another option is to try some sort of pattern matching. Depending on the algorithm that produces the subgraph, though, this can be a pretty (very, very) hard task. At the least it is a pretty expensive operation.
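As a rough illustration of what cheap pattern matching could look like, the sketch below flags URLs whose path is unusually long or repeats the same segment many times. The function name `looks_like_trap` and both thresholds are assumptions of mine, not an established rule, and a trap that generates varied-looking URLs would get past it easily.

```python
from collections import Counter
from urllib.parse import urlparse


def looks_like_trap(url: str, max_segments: int = 10, max_repeats: int = 3) -> bool:
    """Heuristic only: very deep paths or heavily repeated path segments."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) > max_segments:
        return True
    counts = Counter(segments)
    # Flag the URL if any single segment occurs more than max_repeats times.
    return any(n > max_repeats for n in counts.values())


print(looks_like_trap("http://example.com/a/b/c"))
print(looks_like_trap("http://example.com/cal/next/next/next/next"))
```

A check like this only looks at the URL string, so it costs almost nothing per link; anything that inspects page content or graph structure is where the expense mentioned above comes in.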
For the interview question (about detecting infinite loops):
If they ask this question, they want to hear a reference to the halting problem:
Alan Turing proved in 1936 that a general algorithm to solve the halting problem for all possible program-input pairs cannot exist.