In today's iCamp Live Session, we discussed the system design of a web crawler.
We started with the difference between web crawlers and search engines. People often confuse the two during interviews, so it's important to draw a clear line between web crawler functionality and search engine functionality. If you are asked to design a web crawler, you don't want to drift into search-engine-specific topics.
Then we discussed the concept and design of a crawler, including topics like recrawl speed, architecture, handling duplicate URLs, fault tolerance, and scaling.
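To make the duplicate-URL topic concrete, here is a minimal Python sketch of one common approach: normalize each URL, then keep a set of URLs already seen before queuing new ones. This is an illustration under our own assumptions, not necessarily the design covered in the session. The names `normalize_url` and `Frontier` are hypothetical, and the normalization rules (lowercase scheme and host, drop the fragment, drop default ports, drop user info) are a simplification.

```python
from collections import deque
from urllib.parse import urlsplit, urlunsplit

# Assumption: only http/https default ports are treated as redundant.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Return a canonical form so trivially different URLs compare equal."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it is not the default for the scheme.
    if port is None or DEFAULT_PORTS.get(scheme) == port:
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"
    # The fragment is dropped because it is not sent to the server.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


class Frontier:
    """Hypothetical in-memory crawl queue that skips already-seen URLs."""

    def __init__(self):
        self._seen = set()
        self._queue = deque()

    def add(self, url: str) -> bool:
        """Queue the URL if it is new; return True if it was added."""
        key = normalize_url(url)
        if key in self._seen:
            return False
        self._seen.add(key)
        self._queue.append(key)
        return True

    def next_url(self):
        """Return the next URL to crawl, or None when the queue is empty."""
        return self._queue.popleft() if self._queue else None


frontier = Frontier()
frontier.add("HTTP://Example.com:80/a#top")
frontier.add("http://example.com/a")  # duplicate after normalization
```

An in-memory set only works for a single small crawler. At larger scale, the same check is usually backed by something like a Bloom filter or a shared key-value store, which is where the scaling and fault tolerance topics come in.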
We have attached notes below. As always, to access these sessions live, please join our trial at Interview Camp!