<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Oyvinds World &#187; The Internet(s)</title>
	<atom:link href="http://oyvinds.livelyblog.com/category/technology/the-internets/feed/" rel="self" type="application/rss+xml" />
	<link>http://oyvinds.livelyblog.com</link>
	<description>Don't read between the lines, a train could run over you</description>
	<lastBuildDate>Sun, 19 Apr 2009 19:33:23 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Yanga WorldSearch Bot v1.1/beta &#8211; legit and misbehaved</title>
		<link>http://oyvinds.livelyblog.com/2009/04/19/yanga-worldsearch-bot-v11beta-legit-and-misbehaved/</link>
		<comments>http://oyvinds.livelyblog.com/2009/04/19/yanga-worldsearch-bot-v11beta-legit-and-misbehaved/#comments</comments>
		<pubDate>Sun, 19 Apr 2009 19:33:23 +0000</pubDate>
		<dc:creator>oyvinds</dc:creator>
				<category><![CDATA[Web spiders]]></category>

		<guid isPermaLink="false">http://oyvinds.livelyblog.com/?p=74</guid>
		<description><![CDATA[I was looking in my hitlogs and I noticed that Yanga WorldSearch Bot v1.1/beta was fetching a whole lot of pages and (ab)using a truckload of bandwidth.
I searched around and found blogs with debates about it being &#8220;legit&#8221; or not.
It&#8217;s legit
The Yanga gang do actually have a search engine interface where you can search among [...]]]></description>
			<content:encoded><![CDATA[<p>While looking through my hit logs, I noticed that Yanga WorldSearch Bot v1.1/beta was fetching <em>a whole lot of pages</em> and (ab)using <em>a truckload of bandwidth</em>.</p>
<p>I searched around and found <a href="http://www.joewein.net/blog/2009/01/12/yanga-worldsearch-bot-v11beta-www-yanga-co-uk/">blogs debating</a> whether it is &#8220;legit&#8221; or not.</p>
<h2>It&#8217;s legit</h2>
<p>The Yanga gang <a href="http://www.yanga.co.uk/">do actually have a search engine interface</a> where you can search among the pages they crawl. A few searches for things like my own name did produce results. The evidence suggests that this is in fact a useful crawler that powers a public service.</p>
<h2>&#8230;and misbehaved</h2>
<p>It must be mentioned that this crawler ate more than <em>four thousand pages</em> off <em>one website</em> during the last 24 hours. That really is a whole lot of pages. The logs further indicate that their crawler is <em>very stupid</em> and <em>impolite</em>.</p>
<p><strong>I&#8217;ll allow Yanga for now, and I recommend allowing it since it does appear to be useful, provided your server can handle the immense load it generates when eating pages.</strong> Supporting alternatives to the heavily censored Google search engine is a good thing. I do, however, recommend that those who host sites with heavy PHP/MySQL usage on weak servers simply block their IPs with an iptables <code>-j DROP</code> rule, as the crawler does strain servers to the point where users may notice a slowdown.</p>
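<p>If you want to check numbers like these against your own hit logs before deciding whether to block, a few lines of Python will do. This is a hypothetical sketch, not a description of how the figures above were obtained: it assumes an Apache/nginx &#8220;combined&#8221; access-log format, and the function names <code>count_bot_hits</code> and <code>drop_rules</code>, the threshold and the sample log lines (documentation-range IPs) are all made up for illustration. It counts requests per IP whose User-Agent contains a given string within the last 24 hours, then turns the busiest IPs into <code>iptables ... -j DROP</code> commands as plain strings, without touching the firewall.</p>
<pre><code class="language-python">import re
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumed "combined" access-log format (Apache/nginx), one request per line:
# IP - - [19/Apr/2009:19:33:23 +0000] "GET /path HTTP/1.1" 200 512 "referer" "user-agent"
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"')


def count_bot_hits(lines, agent_substring, now, window=timedelta(hours=24)):
    """Count requests per client IP whose User-Agent contains agent_substring
    and whose timestamp lies within the window ending at now (timezone-aware)."""
    cutoff = now - window
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in combined format
        ip, stamp, agent = match.groups()
        if agent_substring not in agent:
            continue
        # %b expects English month abbreviations such as "Apr"
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        if when >= cutoff:
            hits[ip] += 1
    return hits


def drop_rules(hits, threshold):
    """Build (but do not run) one iptables DROP command per IP at or above threshold."""
    return [f"iptables -A INPUT -s {ip} -j DROP"
            for ip, count in sorted(hits.items()) if count >= threshold]


# Hypothetical sample lines using documentation IP ranges.
sample = [
    '192.0.2.10 - - [19/Apr/2009:10:00:00 +0000] "GET /a HTTP/1.1" 200 512 "-" "Yanga WorldSearch Bot v1.1/beta"',
    '192.0.2.10 - - [19/Apr/2009:11:00:00 +0000] "GET /b HTTP/1.1" 200 512 "-" "Yanga WorldSearch Bot v1.1/beta"',
    '198.51.100.7 - - [19/Apr/2009:12:00:00 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '192.0.2.10 - - [17/Apr/2009:12:00:00 +0000] "GET /c HTTP/1.1" 200 512 "-" "Yanga WorldSearch Bot v1.1/beta"',
]
now = datetime(2009, 4, 19, 19, 33, 23, tzinfo=timezone.utc)
hits = count_bot_hits(sample, "Yanga", now)
print(hits)
for rule in drop_rules(hits, threshold=2):
    print(rule)
</code></pre>
<p>For the sample lines this prints <code>Counter({'192.0.2.10': 2})</code> followed by a single DROP command, since the third line comes from an ordinary browser and the fourth falls outside the 24-hour window. Pick a threshold that suits your own traffic; the value used here only fits the toy sample.</p>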
<!-- sphereit end --><br/><span style="margin-bottom:40px; border-bottom:none;"><a class="iconsphere" title="Sphere: Related Content" onclick="return Sphere.Widget.search('http://oyvinds.livelyblog.com/2009/04/19/yanga-worldsearch-bot-v11beta-legit-and-misbehaved/')" href="http://www.sphere.com/search?q=sphereit:http://oyvinds.livelyblog.com/2009/04/19/yanga-worldsearch-bot-v11beta-legit-and-misbehaved/">Sphere: Related Content</a></span><br/>]]></content:encoded>
			<wfw:commentRss>http://oyvinds.livelyblog.com/2009/04/19/yanga-worldsearch-bot-v11beta-legit-and-misbehaved/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
