Posted 09/02/08 at 09:56:21 PM by Mark Edward Soper

The search engine startup Cuil (pronounced "Cool"), which we first told you about in July, isn't very "cool" in the way its indexing robot treats websites. TechCrunch reports that Cuil's Twiceler web crawler is bringing many websites to their knees.
What is Twiceler doing? Last year, posters on The Admin Zone forum pointed out that Twiceler was opening many connections in a short amount of time, resulting in a de facto denial-of-service "attack" on the sites being crawled. While Twiceler doesn't work the same way now, it's still behaving badly.
For example, the JazzyChad blog reported recently that Twiceler was indexing invalid addresses that would become 404 (file not found) errors when Cuil users tried to follow them. Joe Kirp's Popular Science and Technology blog reports that:
The Twiceler bot is probably the most stupid crawler I've ever seen, it just downloads everything it can find and it seems that it just won't ever stop. If there's a page using dynamic input in a URL (a calendar for example) it will download the same page 100,000 and more times, simply by following all kinds of dynamic links it can find without using any kind of intelligent limitation.
By downloading thousands of pages per hour on each website it can cause an incredible traffic on a server, and dynamic scripts (written in Perl, Python or PHP for example) start causing an immense CPU load that may even take your entire server down (as reported by several webmasters). Twiceler is really harmful and can cost both money and downtime. A well written crawler such as Googlebot or Slurp (Yahoo) would never affect a website in such a malicious way.
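To make the "intelligent limitation" mentioned in the quote above concrete, here is a minimal sketch of one way a crawler could cap how many variants of the same dynamic page it fetches. This is an illustration only, not how Twiceler, Googlebot, or Slurp actually work; the class name `DynamicUrlLimiter`, the `max_variants_per_path` parameter, and the example URLs are all hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


class DynamicUrlLimiter:
    """Hypothetical helper: caps query-string variants fetched per page."""

    def __init__(self, max_variants_per_path=20):
        # Assumed cap; a real crawler would tune this per site.
        self.max_variants_per_path = max_variants_per_path
        # Maps (host, path) to the set of query strings already fetched.
        self._seen = defaultdict(set)

    def should_fetch(self, url):
        parts = urlsplit(url)
        key = (parts.netloc.lower(), parts.path or "/")
        variants = self._seen[key]
        if parts.query in variants:
            return False  # this exact page was already fetched
        if len(variants) >= self.max_variants_per_path:
            return False  # too many variants of this dynamic page
        variants.add(parts.query)
        return True


if __name__ == "__main__":
    limiter = DynamicUrlLimiter(max_variants_per_path=3)
    # A calendar script that generates a new link for every month.
    for month in range(1, 11):
        url = f"http://example.com/calendar.php?month={month}"
        print(url, limiter.should_fetch(url))
```

With a cap of 3, only the first three calendar links are accepted and the rest are skipped, so an endless calendar cannot trap the crawler into downloading the same script thousands of times. A per-host delay between requests would be a natural companion to this check.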
How can you stop Twiceler from bringing your website to a crashing halt? To find out how, and to sound off on your Twiceler problems, follow the jump.