<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>SEO and Web Marketing Research</title>
	<atom:link href="http://www.seoresearcher.com/feed" rel="self" type="application/rss+xml" />
	<link>http://www.seoresearcher.com</link>
	<description>A comprehensive SEO and Web Marketing study</description>
	<pubDate>Tue, 24 Jun 2008 07:49:13 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
	<language>en</language>
			<item>
		<title>SEO for WordPress Part II</title>
		<link>http://www.seoresearcher.com/seo-for-wordpress-part-ii.htm</link>
		<comments>http://www.seoresearcher.com/seo-for-wordpress-part-ii.htm#comments</comments>
		<pubDate>Sun, 27 May 2007 15:22:34 +0000</pubDate>
		<dc:creator>oleg.ishenko</dc:creator>
		
		<category><![CDATA[Search Engine Optimization]]></category>

		<category><![CDATA[WordPress Blogging]]></category>

		<guid isPermaLink="false">http://www.seoresearcher.com/seo-for-wordpress-part-ii.htm</guid>
		<description><![CDATA[This  is the second part of the essential SEO tips for WordPress blogs covering the  topics of Google Sitemaps plugins, pings and ping servers, valid (X)HTML, importance  of a layout that puts post content ahead of sidebars and navigation, and displaying  post excerpts and teaser text on the home page.
You should [...]]]></description>
			<content:encoded><![CDATA[<p><img width="250" height="169" align="left" src="http://www.seoresearcher.com/images/articles/seo-for-wordpress.jpg" />This  is the second part of the <strong>essential SEO tips for WordPress blogs</strong> covering the  topics of Google Sitemaps plugins, pings and ping servers, valid (X)HTML, importance  of a layout that puts post content ahead of sidebars and navigation, and displaying  post excerpts and teaser text on the home page.</p>
<p>You should also check out other articles relevant to SEO for blogs: <a href="http://www.seoresearcher.com/how-to-make-your-wordpress-blog-duplicate-content-safe.htm">How to Make a WordPress Blog Duplicate Content Safe</a> and <a href="http://www.seoresearcher.com/seo-for-wordpress.htm">SEO for WordPress Part 1</a>.</p>
<p><span id="more-53"></span></p>
<h2>Google Sitemaps</h2>
<p>To keep the quality of web search high, Google’s spiders constantly crawl the Internet looking for new or updated content. The main way Google discovers a new page is by following links that point to it. Some pages don’t have enough incoming links to be discovered quickly, and it may take weeks for them to appear in the index.</p>
<p>To speed up the indexing process, Google allows webmasters to upload a specially formatted XML file called a ‘<strong>sitemap</strong>’, which contains links to all the pages of a website and the frequency of their updates. This not only increases the chances that a new or updated page is picked up quickly, but also makes indexing more efficient: instead of crawling at random, spiders can be sent directly to the new content.</p>
<p>In my experience WordPress blogs are usually indexed without much trouble, but it can still be useful to create a Google account and upload a <a target="_blank" href="http://www.google.com/webmasters/sitemaps/">sitemap file</a> for your blog. There is a handy plugin for WordPress that allows you to create sitemaps with little or no knowledge of PHP and XML. Check it out:</p>
<ul>
<li><a target="_blank" href="http://www.arnebrachhold.de/2005/06/05/google-sitemaps-generator-v2-final">WordPress Sitemaps plugin</a> from <em>Arne Brachhold</em>. It builds a new XML sitemap every time a post is written or updated, can set the priority of a page based on the number of comments on it, and has a friendly user interface for customizing all the parameters. There is also an informative <a target="_blank" href="http://www.andrechaperon.com/2005/07/google-sitemaps-tutorial/">video tutorial</a> by <em>Andre Chaperon</em> explaining how to install the plugin and work with sitemaps.</li>
<li>To display your XML sitemap in your blog as a regular sitemap (which helps visitors browse your blog), use the <a target="_blank" href="http://bueltge.de/wp-sitemapview-plugin/63/">SiteView plugin</a>. The page is in German, so here is a link to the automated <a target="_blank" href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=de_en&#038;trurl=http%3A//bueltge.de/wp-sitemapview-plugin/63/">English translation</a>.</li>
</ul>
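<p>For illustration, here is a rough sketch of what a minimal sitemap file looks like. It assumes the sitemaps.org 0.9 format; the URL, date and values are made-up placeholders, and a plugin-generated file will contain one <em>url</em> entry per page of your blog:</p>
<pre>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"&gt;
  &lt;!-- placeholder entry for the home page --&gt;
  &lt;url&gt;
    &lt;loc&gt;http://yourblog.com/&lt;/loc&gt;
    &lt;lastmod&gt;2007-05-27&lt;/lastmod&gt;
    &lt;changefreq&gt;daily&lt;/changefreq&gt;
    &lt;priority&gt;1.0&lt;/priority&gt;
  &lt;/url&gt;
  &lt;!-- placeholder entry for a single post --&gt;
  &lt;url&gt;
    &lt;loc&gt;http://yourblog.com/post-title.html&lt;/loc&gt;
    &lt;changefreq&gt;monthly&lt;/changefreq&gt;
    &lt;priority&gt;0.5&lt;/priority&gt;
  &lt;/url&gt;
&lt;/urlset&gt;</pre>
<p>Only <em>loc</em> is required for each entry; <em>lastmod</em>, <em>changefreq</em> and <em>priority</em> are optional hints for the crawler.</p>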
<h2>Ping Servers</h2>
<p>Each time you publish or update a post, your WordPress engine attempts to notify ping servers about the new content on your site. <strong>Ping servers</strong> provide lists of recently updated blogs to <strong>blog search engines</strong> and <strong>aggregators</strong>, helping them show the most recent content to their users. You can manage the list of servers to ping in the <em>Options -> Writing</em> section. The more servers you ping the better, but be aware that notifying a long list of ping servers puts extra load on your web server and makes you wait every time you publish an update. The best solution is to choose a few popular ping servers that make sure all the major blog search engines and aggregators are notified about your new post. Here is a list of recommended ping servers:</p>
<ul>
<li>http://api.feedster.com/ping.php</li>
<li>http://api.my.yahoo.com/RPC2</li>
<li>http://api.my.yahoo.com/rss/ping</li>
<li>http://blogsearch.google.com/ping/RPC2</li>
<li>http://bulkfeeds.net/rpc</li>
<li>http://ping.feedburner.com</li>
<li>http://rpc.icerocket.com:10080</li>
<li>http://rpc.newsgator.com</li>
<li>http://rpc.pingomatic.com</li>
<li>http://rpc.technorati.com/rpc/ping</li>
<li>http://rpc.weblogs.com/RPC2</li>
</ul>
<p>You can find a comprehensive <a target="_blank" href="http://en.wikipedia.org/wiki/Ping-server#Available_ping_servers">list of active ping servers</a> on Wikipedia.</p>
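<p>Under the hood, a ping to an XML-RPC server is a small request sent over HTTP POST. As a sketch, the standard <em>weblogUpdates.ping</em> call carries just the blog name and its URL (both are placeholders here); individual servers may accept additional parameters or use a different interface altogether:</p>
<pre>&lt;?xml version="1.0"?&gt;
&lt;methodCall&gt;
  &lt;methodName&gt;weblogUpdates.ping&lt;/methodName&gt;
  &lt;params&gt;
    &lt;!-- first parameter: the name of the blog (placeholder) --&gt;
    &lt;param&gt;&lt;value&gt;&lt;string&gt;Your Blog Name&lt;/string&gt;&lt;/value&gt;&lt;/param&gt;
    &lt;!-- second parameter: the URL of the blog (placeholder) --&gt;
    &lt;param&gt;&lt;value&gt;&lt;string&gt;http://yourblog.com/&lt;/string&gt;&lt;/value&gt;&lt;/param&gt;
  &lt;/params&gt;
&lt;/methodCall&gt;</pre>
<p>WordPress builds and sends this request for you; the sketch only shows why every extra server in the list costs one more round trip when you publish.</p>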
<h2>Valid (X)HTML</h2>
<p>Only a small percentage of pages on the Web fully conform to the standards of the <a target="_blank" href="http://www.w3.org/">W3C</a>, and even some big websites publish documents that do not validate against the W3C rules. Modern browsers are able to display such pages despite the HTML errors, and search engine crawlers are mostly able to index them. But sometimes structural (X)HTML errors may prevent your pages from being indexed correctly. To make sure that your pages are valid, use the <a href="http://validator.w3.org/">W3C validation service</a> or a validation plugin for your browser, such as <a target="_blank" href="http://users.skynet.be/mgueury/mozilla/">this one</a> based on <a href="http://www.w3.org/People/Raggett/tidy/">Tidy</a>.</p>
<h2>Post Content above Navigation</h2>
<p>Your blog navigation and the content of your sidebar are repeated across the blog, while the content of your posts is mostly unique. It is wise to put your posts above the navigation to take advantage of content prominence (one of the factors used to judge the relevance of a page).</p>
<p>To see how the content and sidebar navigation are arranged in your pages, use a text-only browser like <a target="_blank" href="http://lynx.browser.org/">Lynx</a>, or temporarily disable CSS in your browser options. Better yet, install the <a target="_blank" href="http://chrispederick.com/work/webdeveloper/">Web Developer plugin for Firefox</a>, which lets you enable and disable CSS in one click. Once you have disabled CSS, you can see your blog much as search engine crawlers see it.</p>
<p>Designers of WordPress themes place post content above the navigation and sidebar by editing the CSS file associated with the theme. The most popular blog layout (posts to the left, sidebar to the right) doesn’t require any special adjustments, as the sidebar appears after the post content. But if you want to use a three-column layout or a layout with a left sidebar, you have to make sure that the theme you are going to use puts posts above the navigation and sidebars in the CSS-disabled view.</p>
<h2>Showing Teaser Text or Text Excerpts on the Home Page</h2>
<p>If you prefer to write long posts, you should think about showing only a part of each post on your home page. The reasons for that are:</p>
<ul>
<li>decreased loading time for your home page,</li>
<li>improved visibility of your previous posts,</li>
<li>precaution against duplicate content penalties.</li>
</ul>
<p>Simply put the &lt;!--more--&gt; tag after the first or second paragraph of your post, and make sure that the first lines displayed on the home page are able to capture the attention of your readers, motivating them to read the entire post. Copyblogger gives excellent tips on <a target="_blank" href="http://www.copyblogger.com/5-simple-ways-to-open-your-blog-post-with-a-bang/">writing captivating teaser text</a>.</p>
<h2>Some more resources on SEO for Blogs:</h2>
<p><a target="_blank" href="http://problogger.net/archives/2005/05/21/the-importance-of-title-tags-in-search-engine-optimization/">The importance of Title Tags in Search Engine Optimization</a></p>
<p><a target="_blank" href="http://www.problogger.net/archives/2005/08/15/search-engine-optimization-for-blogs/">Search Engine Optimization for Blogs - SEO</a></p>
<p><a target="_blank" href="http://searchenginewatch.com/showPage.html?page=3625832">SEO for Blogs and RSS</a></p>
<p>This article is largely based on <a target="_blank" href="http://sw-guide.de/2006-07/seo-fuer-wordpress-die-besten-tipps-teil-2/">SEO für WordPress – die besten Tipps – Teil 2</a> by <em>Michael Wöhrer</em> with some new input by me.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.seoresearcher.com/seo-for-wordpress-part-ii.htm/feed</wfw:commentRss>
		</item>
		<item>
		<title>SEO for WordPress</title>
		<link>http://www.seoresearcher.com/seo-for-wordpress.htm</link>
		<comments>http://www.seoresearcher.com/seo-for-wordpress.htm#comments</comments>
		<pubDate>Sun, 13 May 2007 12:24:54 +0000</pubDate>
		<dc:creator>oleg.ishenko</dc:creator>
		
		<category><![CDATA[Search Engine Optimization]]></category>

		<category><![CDATA[WordPress Blogging]]></category>

		<guid isPermaLink="false">http://www.seoresearcher.com/seo-for-wordpress.htm</guid>
		<description><![CDATA[WordPress is without question the most popular stand-alone blog platform. It is flexible and customizable; there are lots of useful plugins providing any functionality a blogger can think of. However, a fresh installation of a WordPress blog leaves a lot of room for improvement. For instance, search engine optimization [...]]]></description>
			<content:encoded><![CDATA[<p><strong><img align="left" alt="SEO for Wordpress" src="http://www.seoresearcher.com/images/articles/seo-for-wordpress.jpg" />WordPress</strong> is without question the most popular stand-alone blog platform. It is flexible and customizable; there are lots of useful plugins providing any functionality a blogger can think of. However, a fresh installation of a WordPress blog leaves <strong>a lot of room for improvement</strong>, for instance in <strong>search engine optimization</strong> and <strong><a href="http://www.seoresearcher.com/how-to-make-your-wordpress-blog-duplicate-content-safe.htm">duplicate content proofing</a></strong>.</p>
<p>Below is a rundown of useful tips that can help improve your blog’s position in search engines and provide some additional benefits to your readers.</p>
<p><span id="more-52"></span></p>
<h2>Permalinks</h2>
<p>By default URLs to WordPress posts look like this: <a target="_blank" rel="noindex, nofollow" href="http://yourblog.com/?p=321">http://yourblog.com/?p=321</a>.   This URL calls the PHP engine to show a post or a page identified by its number,   in this case 321.</p>
<p>This is a perfectly valid URL, and search engines (at least the major ones: <a href="http://www.google.com">Google</a>, <a href="http://www.yahoo.com">Yahoo</a> and <a href="http://www.msn.com">MSN</a>) no longer have problems with indexing dynamic content. However, a wise webmaster is aware that having keywords in the URL is an advantage over meaningless parameter values. Keywords in the URL are in fact one of the biggest factors determining the relevancy of a page to a specific search query.</p>
<p>The <strong>permalinks</strong> feature of WordPress makes it easy to create meaningful URLs. Just go to the <em>Options</em> page of your blog’s control panel and click the menu item ‘<em>permalinks</em>’. Here you can choose, for example, date-based permalinks, which include the title of your post as well as the month and the day of posting. While definitely an improvement over the post-ID-based URL, this is still not perfect. What is the use of the month and day? Let’s get rid of them: click the ‘Custom’ option and type /%postname%.html in. Now your URL will look like <a rel="noindex, nofollow" target="_blank" href="http://yourblog.com/post-title.html">http://yourblog.com/post-title.html</a>. You can further customize the post URL by providing a different ‘<em>post slug</em>’ when writing your posts. You can find the post slug option in the right sidebar of your post editing page.</p>
<p><a target="_blank" href="http://codex.wordpress.org/Using_Permalinks">More     info on customizing the permalinks structure</a>.</p>
<h2>Page Title</h2>
<p>The page title is another important factor influencing the relevancy score of a page in a search engine index. Besides, the title is what will be shown on a search engine results page as the link to your post. Again, the default WordPress setting for this feature is far from ideal. A fresh install of a WP blog shows page titles as <em>The Name of Your Blog » Post Title</em>. Since this structure is repeated on every page of your blog, you might suffer from a <strong>duplicate content penalty</strong> (<a href="http://www.seoresearcher.com/duplicate-content-what-everybody-ought-to-know-about.htm">see a more detailed description of the duplicate content problem here</a>). This can be sorted out by editing the header file of your current WordPress theme. In fact, many theme authors are aware of this problem and publish their themes with it already fixed.</p>
<p>In your dashboard go to the <em>presentation</em> page and click the <em>theme editor</em> menu item. Then locate and click the <em>header</em> link in the right sidebar. This will open a text editor with the upper part of the source code shared by all the posts and pages in your blog.</p>
<p>Take a look at this piece of code:</p>
<p><img width="624" height="177" alt="PHP code excerpt" src="http://www.seoresearcher.com/images/articles/swp-code01.gif" /></p>
<p>So let’s delete all but the last one:</p>
<p><img width="214" height="39" alt="PHP code excerpt" src="http://www.seoresearcher.com/images/articles/swp-code02.gif" /></p>
<p>No wait! What about the home page? This will leave it without the title! Change   the code as follows:</p>
<p><img width="430" height="33" alt="PHP code excerpt" src="http://www.seoresearcher.com/images/articles/swp-code03.gif" /></p>
<p>Now the code checks whether the current page is the home page and, if so, uses your blog name as the title; otherwise it uses the post title.</p>
<h2>Headings Structure</h2>
<p>A clear headings structure is beneficial both for users, as it improves readability, and for search engines, as it describes the content of the page. Generally it is advised to have one <em>h1</em> tag per page, ideally containing your post title, a few <em>h2</em> tags for the subtopics of your post, and a few <em>h3</em> tags wherever necessary to emphasize or give a title to a paragraph within a subtopic. These are just guidelines; you are not required to create <em>h2</em> and <em>h3</em> headings in every post you write, for example in one consisting of two paragraphs. But keep in mind that longer posts should be logically divided into subtopics so that users can stop at headings while skimming the page (a common reading pattern on the Web).</p>
<p>Do not overuse headings! Once webmasters realized how much weight keywords in headings carried in relevancy scores, headings were often abused. Numerous headings, sometimes disguised with CSS as text of normal height and weight, were filled with target keywords to manipulate the relevancy algorithms of search engines. This practice, however, is now detectable by search engines, and you might get punished for using it.</p>
<p>Changing the headings structure requires slightly more advanced skills and some knowledge of PHP and CSS. Always back up your current theme before editing it!</p>
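<p>As a rough sketch, the heading markup of a post page that follows these guidelines could be outlined like this. The heading texts are only examples, and a real theme will wrap them in its own layout elements:</p>
<pre>&lt;!-- one h1 per page: the post title --&gt;
&lt;h1&gt;SEO for WordPress&lt;/h1&gt;

&lt;!-- h2 for each subtopic of the post --&gt;
&lt;h2&gt;Permalinks&lt;/h2&gt;
&lt;p&gt;...&lt;/p&gt;

&lt;h2&gt;Page Title&lt;/h2&gt;
&lt;!-- h3 only where a paragraph within a subtopic needs its own title --&gt;
&lt;h3&gt;Editing the header file&lt;/h3&gt;
&lt;p&gt;...&lt;/p&gt;</pre>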
<h2>Alt Tags for Pictures</h2>
<p>Whenever you insert an image into your post, take the time to add a meaningful description of it as an alt tag (strictly speaking, the <em>alt</em> attribute of the <em>img</em> tag). There are two basic advantages of doing so. First, many of your potential readers browse the Net with images turned off. Instead of an empty box, they will see the description. Likewise, visually impaired users benefit when their text-to-voice software reads the description aloud. Second, your page can be discovered by users doing an image search for the keywords you provided in the alt description.</p>
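<p>For example, an image tag with a description might look like the sketch below. The file name, dimensions and wording are placeholders; the point is that the <em>alt</em> text describes what the picture shows instead of repeating the file name or a list of keywords:</p>
<pre>&lt;!-- placeholder image; alt describes the content of the picture --&gt;
&lt;img src="http://yourblog.com/images/sitemap-plugin-settings.jpg"
     alt="Settings page of a WordPress sitemap plugin"
     width="400" height="300" /&gt;</pre>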
<h2>Tagging</h2>
<p>Tags are a relatively new and powerful feature in website promotion. It is not just the page title and the content of headings that determine the relevancy of content. To an even greater degree, it is the job of links pointing to the page. Keywords in the link anchor and URL are the most important factors determining which pages will be shown for a given query.</p>
<p>Linked tags you place in your own post have far less power than links pointing to your page. But they still help search engines determine which topic your post belongs to, thus increasing your topical score.</p>
<p>Here are some popular tagging plugins for WordPress:</p>
<ul>
<li><a target="_blank" href="http://www.neato.co.nz/ultimate-tag-warrior/">Ultimate Tag Warrior</a></li>
<li><a target="_blank" href="http://vapourtrails.ca/wp-keywords">Jerome Keywords</a></li>
<li><a href="http://sw-guide.de/wordpress/plugins/category-tagging/">Category     Tagging</a></li>
</ul>
<h2>Links to Similar Posts</h2>
<p>This is one of the most powerful features for blog promotion. It helps users discover posts similar to the one they’ve just read. This is much more convenient than browsing through archives or searching for a keyword. In fact, this is one of the factors that made <a rel="nofollow" target="_blank" href="http://www.youtube.com">YouTube</a> so successful: links to similar videos make users stay at <a rel="nofollow" target="_blank" href="http://www.youtube.com">YouTube</a> and spend on average 30 minutes a day there.</p>
<p>In SEO terms such links help build tight topical linking structures, again to the benefit of your blog.</p>
<p>This functionality can be added to your blog by installing this <a target="_blank" href="http://sw-guide.de/wordpress/plugins/jeromes-keywords-related-posts/">extension plugin</a> to Jerome Keywords.</p>
<p><a href="http://www.seoresearcher.com/seo-for-wordpress-part-ii.htm">Continued: SEO for  WordPress Part II</a></p>
<p>Largely based on the article <a target="_blank" href="http://sw-guide.de/weblog/2006-07-01/seo-fuer-wordpress-die-besten-tipps-teil-1/">SEO für WordPress – die besten Tipps – Teil 1</a>, with some new input by me.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.seoresearcher.com/seo-for-wordpress.htm/feed</wfw:commentRss>
		</item>
		<item>
		<title>How Much Blog Spam? A Study of a Ping Dataset</title>
		<link>http://www.seoresearcher.com/how-much-blog-spam-a-study-of-a-ping-dataset.htm</link>
		<comments>http://www.seoresearcher.com/how-much-blog-spam-a-study-of-a-ping-dataset.htm#comments</comments>
		<pubDate>Mon, 12 Feb 2007 12:44:54 +0000</pubDate>
		<dc:creator>oleg.ishenko</dc:creator>
		
		<category><![CDATA[Search Engine Optimization]]></category>

		<category><![CDATA[Search Engines Technology]]></category>

		<category><![CDATA[WordPress Blogging]]></category>

		<guid isPermaLink="false">http://www.seoresearcher.com/how-much-blog-spam-a-study-of-a-ping-dataset.htm</guid>
		<description><![CDATA[How much blog spam is produced in 5 minutes on a quiet Sunday evening? What is the ratio of spam blogs in the most popular blog services? To answer these questions I present the results of an experiment analyzing ping data and [...]]]></description>
			<content:encoded><![CDATA[<p><img width="180" height="165" align="left" src="http://www.seoresearcher.com/images/articles/sunday-spam.jpg" />How much <strong>blog spam</strong> is produced in 5 minutes on a quiet Sunday evening? What is the <strong>ratio of spam blogs</strong> in the most popular blog services? To answer these questions I present the results of an experiment analyzing ping data and manually reviewing blogs.</p>
<p>The relative ease of creating and maintaining blogs makes them ideal tools for spamming search engines. Spam blogs, or <strong>splogs</strong>, serve two basic purposes: <strong>making money from advertising and affiliate programs</strong>, and participating in <strong>link farms</strong>. But making money from AdSense and providing nepotistic links are not enough to call a blog a splog. Otherwise we would have to classify all blogs showing ads or promoting a business as spam, and there are thousands of popular, quality blogs that would fall into this category. The distinctive feature of a splog is that it is of no use to its visitors. Should Google ban a splog from AdSense and prevent its links from passing on authority, such a splog would have no more value or purpose of existence. So my definition of a splog would be “<em>a blog with the <strong>only</strong> purpose of showing contextual or affiliate ads, or boosting link popularity of certain target sites</em>”.</p>
<p><span id="more-51"></span></p>
<p>How active are these splogs? This question calls for a little experiment, similar to the one described by P. Kolari, A. Java and T. Finin in their paper “<a target="_blank" href="http://www.blogpulse.com/www2006-workshop/papers/splogosphere.pdf">Characterizing the Splogosphere</a>”. They did their experiment in early 2006, and I am going to repeat it on a smaller scale now, in early 2007.</p>
<p>Every time a blog is updated it sends a <a target="_blank" href="http://en.wikipedia.org/wiki/Ping_blog"><strong>ping</strong></a> to one of many ping servers in order to invite search engine crawlers to index the new post. I am going to use ping data provided by one of the most popular ping servers, <a target="_blank" href="http://www.weblogs.com/">Weblogs.com</a>. Due to the limited scale of the experiment I will be using the smaller dataset covering the last 5 minutes of pings. It’s pretty big though: 8117 pings. I’ve written a simple Java application to parse the XML file and extract the URLs and names of the blogs in the dataset. Some of the blogs were also classified by blog platform: <a target="_blank" href="http://www.blogger.com/">Blogspot (Blogger)</a>, <a target="_blank" href="http://www.myspace.com/">MySpace</a>, <a target="_blank" href="http://spaces.live.com/">Spaces.Live.com</a> etc. I discovered a number of popular blog services that I hadn’t come across before, such as the popular Taiwanese site <a target="_blank" href="http://www.wretch.cc/">Wretch.cc</a>, or the Italian <a target="_blank" href="http://libero.it/">Libero.it</a> and <a target="_blank" href="http://www.splinder.com/">Splinder.com</a>. I was surprised to see how few pings came from some other popular blog services; <a href="http://www.livejournal.com/">Livejournal</a>, for instance, had only 6 pings! Obviously LJ doesn’t rely much on Weblogs.com, but LJ has little to do with my experiment anyway, as it is known to have a very small percentage of splogs.</p>
<p>Below is a breakdown of blogs by platform, according to a ping dataset retrieved on a Sunday evening, Feb. 11. Do not confuse the blogs in the <a target="_blank" href="http://www.wordpress.com/">Wordpress.com</a> category with blogs using WP as a <strong>blog engine</strong>. Only blogs hosted by Wordpress.com are included in this category.</p>
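<p>For reference, the Weblogs.com changes file is, as far as I can tell, a flat list of <em>weblog</em> elements, roughly like the sketch below. The blog names and URLs are invented placeholders and the other attribute values are elided, so treat the exact layout as an assumption rather than a specification:</p>
<pre>&lt;weblogUpdates version="..." updated="..." count="..."&gt;
  &lt;!-- one element per ping: name and url are the values to extract --&gt;
  &lt;weblog name="Example Blog" url="http://example.blogspot.com/" when="..." /&gt;
  &lt;weblog name="Another Example" url="http://blog.example.com/" when="..." /&gt;
  ...
&lt;/weblogUpdates&gt;</pre>
<p>With a layout like this, extracting the data comes down to reading the <em>name</em> and <em>url</em> attributes of every <em>weblog</em> element, and the platform can be guessed from the host name in the URL.</p>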
<p><img width="400" height="312" src="http://www.seoresearcher.com/images/articles/spam02.jpg" /></p>
<p><em>Fig. 1 Popular Blog Services in the Sunday Weblogs Dataset</em></p>
<p>The huge ‘<strong>Rest</strong>’ category consists of <strong>standalone</strong> blogs and blogs hosted by <strong>minor blog services</strong>.<br />
A few words on the blogs in the dataset: a lot of them were not in English, I think as many as 70%. For instance, all Wretch.cc blogs and many Spaces.Live.com ones are in Chinese, and there are also a lot of blogs in Italian, Spanish, Russian, Japanese and German.</p>
<p>Once the dataset was downloaded and processed, I started manually reviewing the blogs and looking for spam. Of course I couldn’t visit all 8117 blogs, so I randomly selected 20 blogs from each category.</p>
<p>How did I classify spam blogs? While blogs with automatically generated content or dictionary dumps are easily classified as spam, those with plagiarized content or in foreign languages required a bit more effort. Nepotistic links with keyword-stuffed anchors were a good indicator of spam. <a href="http://www.copyscape.com/">Copyscape.com</a> helped a lot in discovering plagiarized posts. Finally, affiliate and contextual ads completed the picture. It has to be noted that very few blogs in languages other than English were classified as spam. I can be sure about my judgment of German and Russian blogs, since I know these languages, but when dealing with others I relied only on excessive advertising and nepotistic links as spam indicators. I skipped the Wretch.cc and Explog.jp samples as I was totally unable to judge Chinese and Japanese blogs. In total, 36 of the 177 reviewed blogs were classified as spam.</p>
<p>Below you can see two charts, one showing the ratio of spam within each sample, and another showing how much each blog platform contributes to the total amount of spam.</p>
<p><img width="414" height="309" src="http://www.seoresearcher.com/images/articles/spam03.jpg" /></p>
<p><em>Fig. 2 Percentage of Spam Blogs in 20-Blog Samples</em></p>
<p><em><img width="306" height="254" src="http://www.seoresearcher.com/images/articles/spam01.jpg" /></em></p>
<p><em>Fig. 3 Contribution of Each Category to the Total Blog Spam</em></p>
<p>With the notable exception of Blogspot, the majority of blogs hosted by popular    blog services are spam free. Of course one can question their quality, as many    of them are of little value to others. But let’s not forget that most    of those blogs are private diaries or personal playgrounds never intended to    have big audiences; and as long as they have value to the author and his/her    close circle of friends we can’t call them spam.</p>
<p>Thus, according to my reviews, blogs hosted by beon.ru, Libero.it, Spaces.Live.com, Livejournal.com, splinder.com, and typepad.com showed no instances of blog spam in the 20-blog samples. Among 20 MySpace blogs I discovered 1 splog, and the Wordpress.com sample contained 2. Google’s popular service Blogspot has confirmed its unofficial name of <span style="font-weight: bold">Splogspot</span> with a 50% spam ratio. ‘The Rest’ category, made up of standalone blogs and blogs attached to commercial sites, showed an even bigger proportion of blog spam: 23 of the 27 blogs reviewed were classified as spam. The relatively low number of splogs hosted by public services can be explained by anti-spam actions taken by the administration of such services. Standalone splogs, however, are not subject to such moderation, which allows them to thrive, producing tons of junk content for SE crawlers and overloading ping servers with spam pings.</p>
<p>As you might have noticed I used the same style of charts introduced by the    famous blog <a target="_blank" href="http://www.modernlifeisrubbish.co.uk/">ModernLifeIsRubbish.co.uk</a>,    which has an excellent tutorial on <a target="_blank" href="http://www.modernlifeisrubbish.co.uk/article/howto-make-pretty-pie-charts">how    to create pretty pie charts in Adobe Illustrator</a>. Highly recommended!</p>
<p>If anybody is interested, here is the dataset I used: <a href="http://www.seoresearcher.com/files/WeblogsDataset.xls">Dataset</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.seoresearcher.com/how-much-blog-spam-a-study-of-a-ping-dataset.htm/feed</wfw:commentRss>
		</item>
		<item>
		<title>Articles Directories List: Ordered by Alexa Rating</title>
		<link>http://www.seoresearcher.com/articles-directories-list-alexa-rating-ordered.htm</link>
		<comments>http://www.seoresearcher.com/articles-directories-list-alexa-rating-ordered.htm#comments</comments>
		<pubDate>Wed, 31 Jan 2007 16:19:36 +0000</pubDate>
		<dc:creator>oleg.ishenko</dc:creator>
		
		<category><![CDATA[Search Engine Optimization]]></category>

		<category><![CDATA[Website is a Marketing Being]]></category>

		<guid isPermaLink="false">http://www.seoresearcher.com/articles-directories-list-alexa-rating-ordered.htm</guid>
		<description><![CDATA[Are  article submissions worth the time? When doing manual submissions  it takes me 10 to 15 minutes in average to login, format and submit an article  to an article directory. I have to make at least 10 submissions to get a feasible  exposure for my articles. So this process can take [...]]]></description>
			<content:encoded><![CDATA[<p><img width="225" height="185" align="left" src="http://www.seoresearcher.com/images/articles/article-dirs.jpg" />Are  <strong>article submissions</strong> worth the time? When doing manual submissions  it takes me 10 to 15 minutes in average to login, format and submit an article  to an article directory. I have to make at least 10 submissions to get a feasible  exposure for my articles. So this process can take more than two hours a day!  To make the most of my time I have to make sure that the article directories I  am submitting to, are able to bring me as many visitors and backlinks as possible.  I can name top five directories: <a href="http://ezinearticles.com/">EzineArticles.com</a>,  <a href="http://www.buzzle.com/">Buzzle.com</a>, <a href="http://www.goarticles.com/">GoArticles.com</a>,  <a href="http://www.articlesfactory.com/">ArticlesFactory.com</a> and <a href="http://www.webpronews.com/">WebProNews.com</a>.  Submitting to these is a must! EzineArticles.com and Buzzle.com can bring you  a lot of traffic, GoArticles.com brings you backlinks, and ArticlesFactory.com  – PageRank (my profile page there now is PR5 – with only 12 submitted  articles).</p>
<p><span id="more-50"></span>But the top five directories are not enough. As I said above, I usually do at least 10 manual submissions, so I have to choose my directories wisely. The only two ratings easily available to web users for judging the quality of websites and webpages are <a href="http://www.alexa.com">Alexa Rating</a> and Google <strong>PageRank</strong>. Both are flawed and often do not reflect the real authority or traffic of a page. But since there is nothing better, I have to stick to these two when deciding whether a directory is worth submitting to. Of course, one must not forget the power of themed sites. Whenever I have a choice between a general directory and a <strong>directory focusing on my topic</strong>, I choose the latter.</p>
<p>To make the choice easier, I obtained a list of article directories with their names and URLs. Yesterday I downloaded Eclipse and wrote a simple Java application that queries <a href="http://www.alexa.com/site/devcorner/web_info_services">Alexa Web Service</a> for rating data and makes <a href="http://www.trynt.com/trynt-google-pagerank-api/">PageRank lookups</a>. I ran this application on my list this morning. Here are the top results, limited to directories with an Alexa rating below 100,000 and sorted in ascending order:</p>
<table width="650" cellspacing="1" cellpadding="4" bgcolor="#333300">
<tr bgcolor="#ccff99">
<td>Directory Name</td>
<td>URL</td>
<td align="right">Alexa Rating</td>
<td>PageRank</td>
</tr>
<tr>
<td bgcolor="#ffffff">Ezine Articles.com</td>
<td bgcolor="#ffffff">http://www.ezinearticles.com</td>
<td bgcolor="#ffffff" align="right">450</td>
<td bgcolor="#ffffff" align="right">6</td>
</tr>
<tr>
<td bgcolor="#ffffff">GoArticles</td>
<td bgcolor="#ffffff">http://www.goarticles.com</td>
<td bgcolor="#ffffff" align="right">2465</td>
<td bgcolor="#ffffff" align="right">6</td>
</tr>
<tr>
<td bgcolor="#ffffff">Web Pro News</td>
<td bgcolor="#ffffff">http://www.webpronews.com</td>
<td bgcolor="#ffffff" align="right">2903</td>
<td bgcolor="#ffffff" align="right">7</td>
</tr>
<tr>
<td bgcolor="#ffffff">Site Reference</td>
<td bgcolor="#ffffff">http://site-reference.com</td>
<td bgcolor="#ffffff" align="right">3652</td>
<td bgcolor="#ffffff" align="right">0</td>
</tr>
<tr>
<td bgcolor="#ffffff">AD</td>
<td bgcolor="#ffffff">http://www.articledashboard.com</td>
<td bgcolor="#ffffff" align="right">4360</td>
<td bgcolor="#ffffff" align="right">6</td>
</tr>
<tr>
<td bgcolor="#ffffff">Free Articles</td>
<td bgcolor="#ffffff">http://www.topica.com/lists/free_articles</td>
<td bgcolor="#ffffff" align="right">5140</td>
<td bgcolor="#ffffff" align="right">0</td>
</tr>
<tr>
<td bgcolor="#ffffff">Articles Base Directory</td>
<td bgcolor="#ffffff">http://www.articlesbase.com</td>
<td bgcolor="#ffffff" align="right">6319</td>
<td bgcolor="#ffffff" align="right">5</td>
</tr>
<tr>
<td bgcolor="#ffffff">Articles.Web.Com</td>
<td bgcolor="#ffffff">http://www.articles.web.com</td>
<td bgcolor="#ffffff" align="right">8506</td>
<td bgcolor="#ffffff" align="right">4</td>
</tr>
<tr>
<td bgcolor="#ffffff">Gobala Krishnan</td>
<td bgcolor="#ffffff">http://articles.easywordpress.com</td>
<td bgcolor="#ffffff" align="right">14794</td>
<td bgcolor="#ffffff" align="right">2</td>
</tr>
<tr>
<td bgcolor="#ffffff">DirectoryGold Article Directory</td>
<td bgcolor="#ffffff">http://articles.directorygold.com</td>
<td bgcolor="#ffffff" align="right">25357</td>
<td bgcolor="#ffffff" align="right">4</td>
</tr>
<tr>
<td bgcolor="#ffffff">Afro Articles - Article        Marketing Directory</td>
<td bgcolor="#ffffff">http://www.afroarticles.com/article-dashboard/</td>
<td bgcolor="#ffffff" align="right">34900</td>
<td bgcolor="#ffffff" align="right">4</td>
</tr>
<tr>
<td bgcolor="#ffffff">InfoWizards Free Content Articles</td>
<td bgcolor="#ffffff">http://content.infowizards.com</td>
<td bgcolor="#ffffff" align="right">39158</td>
<td bgcolor="#ffffff" align="right">3</td>
</tr>
<tr>
<td bgcolor="#ffffff">Article Friendly</td>
<td bgcolor="#ffffff">http://www.articlefriendly.com</td>
<td bgcolor="#ffffff" align="right">41660</td>
<td bgcolor="#ffffff" align="right">3</td>
</tr>
<tr>
<td bgcolor="#ffffff">Article Submission</td>
<td bgcolor="#ffffff">http://www.articlewheel.com/</td>
<td bgcolor="#ffffff" align="right">43045</td>
<td bgcolor="#ffffff" align="right">5</td>
</tr>
<tr>
<td bgcolor="#ffffff">Top-Affiliate.com</td>
<td bgcolor="#ffffff">http://www.top-affiliate.com/articles</td>
<td bgcolor="#ffffff" align="right">46932</td>
<td bgcolor="#ffffff" align="right">4</td>
</tr>
<tr>
<td bgcolor="#ffffff">Free Articles for Reprint</td>
<td bgcolor="#ffffff">http://www.articles-hub.com</td>
<td bgcolor="#ffffff" align="right">49035</td>
<td bgcolor="#ffffff" align="right">6</td>
</tr>
<tr>
<td bgcolor="#ffffff">Article Ardvaark</td>
<td bgcolor="#ffffff">http://nero.byethost15.com</td>
<td bgcolor="#ffffff" align="right">49967</td>
<td bgcolor="#ffffff" align="right">0</td>
</tr>
<tr>
<td bgcolor="#ffffff">Submit Your New Article</td>
<td bgcolor="#ffffff">http://www.submityournewarticle.com</td>
<td bgcolor="#ffffff" align="right">52215</td>
<td bgcolor="#ffffff" align="right">4</td>
</tr>
<tr>
<td bgcolor="#ffffff">ArticleRich.com</td>
<td bgcolor="#ffffff">http://www.articlerich.com</td>
<td bgcolor="#ffffff" align="right">53604</td>
<td bgcolor="#ffffff" align="right">4</td>
</tr>
<tr>
<td bgcolor="#ffffff">ArticleCafe.net</td>
<td bgcolor="#ffffff">http://www.articlecafe.net</td>
<td bgcolor="#ffffff" align="right">58106</td>
<td bgcolor="#ffffff" align="right">3</td>
</tr>
<tr>
<td bgcolor="#ffffff">Your Free Satellite</td>
<td bgcolor="#ffffff">http://www.your-free-satellite.com/index-2.html</td>
<td bgcolor="#ffffff" align="right">63374</td>
<td bgcolor="#ffffff" align="right">0</td>
</tr>
<tr>
<td bgcolor="#ffffff">The Add Articles Directory</td>
<td bgcolor="#ffffff">http://www.add-articles.com</td>
<td bgcolor="#ffffff" align="right">67840</td>
<td bgcolor="#ffffff" align="right">4</td>
</tr>
<tr>
<td bgcolor="#ffffff">Article Blotter</td>
<td bgcolor="#ffffff">http://www.articleblotter.com</td>
<td bgcolor="#ffffff" align="right">68054</td>
<td bgcolor="#ffffff" align="right">3</td>
</tr>
<tr>
<td bgcolor="#ffffff">1Article World</td>
<td bgcolor="#ffffff">http://www.1articleworld.com</td>
<td bgcolor="#ffffff" align="right">68395</td>
<td bgcolor="#ffffff" align="right">3</td>
</tr>
<tr>
<td bgcolor="#ffffff">ABC Article Directory</td>
<td bgcolor="#ffffff">http://www.abcarticledirectory.com/</td>
<td bgcolor="#ffffff" align="right">74153</td>
<td bgcolor="#ffffff" align="right">0</td>
</tr>
<tr>
<td bgcolor="#ffffff">eArticlesOnline.com</td>
<td bgcolor="#ffffff">http://www.earticlesonline.com</td>
<td bgcolor="#ffffff" align="right">84051</td>
<td bgcolor="#ffffff" align="right">4</td>
</tr>
<tr>
<td bgcolor="#ffffff">dk-article</td>
<td bgcolor="#ffffff">http://www.article.com</td>
<td bgcolor="#ffffff" align="right">84513</td>
<td bgcolor="#ffffff" align="right">1</td>
</tr>
<tr>
<td bgcolor="#ffffff">ArticleSnatch - The        Best Place to Grab Art<span style="display: none">icles</span></td>
<td bgcolor="#ffffff">http://www.articlesnatch.com</td>
<td bgcolor="#ffffff" align="right">88123</td>
<td bgcolor="#ffffff" align="right">4</td>
</tr>
<tr>
<td bgcolor="#ffffff">Talkin Mince Article Directory</td>
<td bgcolor="#ffffff">http://www.talkinmince.com</td>
<td bgcolor="#ffffff" align="right">89689</td>
<td bgcolor="#ffffff" align="right">4</td>
</tr>
<tr>
<td bgcolor="#ffffff">Just Articles</td>
<td bgcolor="#ffffff">http://www.JustArticles.com/</td>
<td bgcolor="#ffffff" align="right">93655</td>
<td bgcolor="#ffffff" align="right">5</td>
</tr>
<tr>
<td bgcolor="#ffffff">Article-Buzz - Free Article Directory</td>
<td bgcolor="#ffffff">http://www.article-buzz.com/</td>
<td bgcolor="#ffffff" align="right">94347</td>
<td bgcolor="#ffffff" align="right">4</td>
</tr>
<tr>
<td bgcolor="#ffffff">Free Ezine Articles Site</td>
<td bgcolor="#ffffff">http://freezinesite.com</td>
<td bgcolor="#ffffff" align="right">94529</td>
<td bgcolor="#ffffff" align="right">4</td>
</tr>
<tr>
<td bgcolor="#ffffff">Tips Tricks Resource Portal</td>
<td bgcolor="#ffffff">http://www.tips.com.my</td>
<td bgcolor="#ffffff" align="right">95787</td>
<td bgcolor="#ffffff" align="right">5</td>
</tr>
</table>
<p>The complete table has over 550 directories and can be downloaded as a <a href="http://www.seoresearcher.com/files/article-dirs.txt">tab-delimited text file</a> or as an <a href="http://www.seoresearcher.com/files/article-dirs.xls">Excel sheet</a>.</p>
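<p>The selection step behind the table can be sketched in a few lines. This is not the Java application mentioned above and it does not call the Alexa or PageRank services; it is a minimal Python sketch that assumes the ratings are already in a tab-delimited list. The column names (<em>name</em>, <em>url</em>, <em>alexa</em>, <em>pagerank</em>) and the <em>min_pagerank</em> filter are assumptions for illustration, and the layout of the downloadable file may differ.</p>
<pre>
```python
import csv
import io

# A few rows copied from the table above. The column names are assumptions;
# the layout of the downloadable article-dirs.txt file may differ.
SAMPLE_TSV = (
    "name\turl\talexa\tpagerank\n"
    "GoArticles\thttp://www.goarticles.com\t2465\t6\n"
    "Ezine Articles.com\thttp://www.ezinearticles.com\t450\t6\n"
    "Site Reference\thttp://site-reference.com\t3652\t0\n"
    "Web Pro News\thttp://www.webpronews.com\t2903\t7\n"
)


def top_directories(tsv_text, max_alexa=100000, min_pagerank=0):
    """Return rows with an Alexa rating under max_alexa, best rating first."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = [
        row for row in rows
        if max_alexa > int(row["alexa"]) and int(row["pagerank"]) >= min_pagerank
    ]
    # A lower Alexa rating means more traffic, so sort ascending.
    return sorted(kept, key=lambda row: int(row["alexa"]))


for row in top_directories(SAMPLE_TSV, min_pagerank=4):
    print(row["alexa"], row["pagerank"], row["name"])
```
</pre>
<p>Raising <em>min_pagerank</em> drops directories such as Site Reference, which has a good Alexa rating but a PageRank of 0 in the table.</p>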
]]></content:encoded>
			<wfw:commentRss>http://www.seoresearcher.com/articles-directories-list-alexa-rating-ordered.htm/feed</wfw:commentRss>
		</item>
		<item>
		<title>Google&#8217;s New Algorithm to Rank Pages and Detect Spam: &#8220;PhraseRank&#8221;?</title>
		<link>http://www.seoresearcher.com/googles-new-algorithm-to-rank-pages-and-detect-spam-phrase-rank.htm</link>
		<comments>http://www.seoresearcher.com/googles-new-algorithm-to-rank-pages-and-detect-spam-phrase-rank.htm#comments</comments>
		<pubDate>Sun, 07 Jan 2007 01:32:44 +0000</pubDate>
		<dc:creator>oleg.ishenko</dc:creator>
		
		<category><![CDATA[Search Engine Optimization]]></category>

		<category><![CDATA[Search Engines Technology]]></category>

		<guid isPermaLink="false">http://www.seoresearcher.com/googles-new-algorithm-to-rank-pages-and-detect-spam-phrase-rank.htm</guid>
		<description><![CDATA[Will the system described in Google’s recent patent application become a new ranking algorithm to augment the existing PageRank?
PhraseRank
From the very beginning, Google’s distinctive feature was its hyperlink-induced popularity ranking. Algorithms using text content    to evaluate relevancy of web [...]]]></description>
			<content:encoded><![CDATA[<p><img width="270" height="210" align="left" alt="Phrase Rank" src="http://www.seoresearcher.com/images/articles/phrase-rank.jpg" />Will the system described in <a target="_blank" href="http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&#038;Sect2=HITOFF&#038;u=%2Fnetahtml%2FPTO%2Fsearch-adv.html&#038;r=1&#038;p=1&#038;f=G&#038;l=50&#038;d=PG01&#038;S1=20060294155.PGNR.&#038;OS=dn/20060294155&#038;RS=DN/20060294155">Google’s recent patent application</a> become a <strong>new ranking algorithm</strong> to augment the existing <strong>PageRank</strong>?</p>
<h2>PhraseRank</h2>
<p>From the very beginning, Google’s distinctive feature was its <strong>hyperlink-induced popularity ranking</strong>. Algorithms using <strong>text content</strong> to evaluate the relevancy of web documents played a much lesser role. The reasons for this disparity are purely pragmatic: authors of web documents have total control over their content and are at liberty to modify it to <strong>deceive ranking algorithms</strong> and get higher positions in search results. Hyperlinks, however, are much harder for webmasters to influence and provide a more reliable measure of authority (link weight) and relevance (link anchor).</p>
<p>Now Google introduces a new way to evaluate the relevancy of a web document based on its content, one that might prove to be <strong>immune to manipulation attempts</strong><span id="more-49"></span> such as adjusting the keyword density or automatically generating keyword-rich web pages. The new system could even become a remedy against <strong>MFA</strong> (Made For AdSense) sites that display meaningless, scraped, keyword-rich content alongside paid contextual advertisements.</p>
<div id="advertical"><!--adsense#vertical_post--></div>
<p>The new indexing and ranking system is based on the use of <strong>phrases</strong>. From a user’s point of view, search queries in most cases are phrases or ‘concepts’ rather than sets of keywords. Despite this, conventional indexing systems still rely on <strong>individual terms</strong>. Indexing of phrases is avoided because identifying all possible combinations of words would require immense computational and memory resources. For example, a lexicon of 200,000 unique words could yield approximately 3.2&#215;10<sup>26</sup> phrases, and no system is capable of storing that amount of data in memory or manipulating it efficiently.</p>
<p>The new system solves this problem by identifying only phrases that are sufficiently frequent and distinctive in the crawled documents. By detecting such phrases and marking them as ‘valid’, the system can identify multi-word phrases of varying length without indexing every possible combination of words.</p>
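<p>The size of that number is easy to check. The figure of 3.2&#215;10<sup>26</sup> matches the count of ordered sequences of exactly five words drawn from a 200,000-word lexicon. The five-word length is an assumption inferred from the arithmetic, not something stated above. A short Python sketch:</p>
<pre>
```python
LEXICON_SIZE = 200_000


def ordered_sequences(vocabulary_size, length):
    """Number of ordered word sequences of exactly `length` words."""
    return vocabulary_size ** length


# Growth of the candidate phrase space with phrase length.
for n in range(2, 6):
    print(n, f"{ordered_sequences(LEXICON_SIZE, n):.1e}")
```
</pre>
<p>Each extra word multiplies the candidate space by 200,000, which is why the system keeps only phrases that actually occur often enough in the crawled documents.</p>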
<p>Another important feature is the ability of phrases to predict the presence of other phrases in a webpage. For example, the phrase ‘<em>President of the United States</em>’ indicates that the document most likely also contains the phrase ‘<em>White House</em>’. For every phrase, the system creates a corresponding list of related phrases ordered by their significance. This enables the system to detect spam pages based on an excessive number of related phrases.</p>
<p>So how does the system work?</p>
<h2>Indexing</h2>
<p>The process of indexing includes the identification of phrases and related phrases. The system analyses sequences of words and marks them as ‘good’ or ‘bad’ phrases. ‘Good’ phrases are those that occur frequently across the indexed documents or have a distinguished appearance, e.g. are delimited by markup tags, punctuation or other markers. Another distinguishing feature is the ability of a ‘good’ phrase to <strong>predict a related phrase</strong>: in the example above, ‘<em>President of the United States</em>’ predicts ‘<em>White House</em>’. Some phrases, such as idioms (‘<em>out of the blue</em>’, ‘<em>sitting ducks</em>’, etc.), tend to appear with many different, unrelated phrases and are not able to predict anything. Therefore idioms and colloquialisms don’t count as ‘good’ phrases.</p>
<p>At the end of the indexing process the system produces a list of valid phrases along with a co-occurrence matrix that serves as a predictive measure. The estimated size of the list is 650,000 phrases.</p>
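<p>To make the idea of a predictive co-occurrence measure more concrete, here is a rough Python sketch on a toy corpus. The measure used, the actual co-occurrence rate divided by the rate expected if the two phrases appeared independently, is an assumption chosen for illustration. The exact formula and thresholds of the described system are not given above, and the sample documents are invented.</p>
<pre>
```python
from collections import Counter
from itertools import permutations

# Toy corpus (invented): each document is reduced to the set of phrases in it.
DOCS = [
    {"president of the united states", "white house", "out of the blue"},
    {"president of the united states", "white house"},
    {"white house", "press briefing"},
    {"out of the blue", "article directory"},
    {"article directory", "press briefing"},
    {"out of the blue", "link exchange"},
]


def cooccurrence(docs):
    """Count documents per phrase and per ordered phrase pair."""
    singles, pairs = Counter(), Counter()
    for phrases in docs:
        singles.update(phrases)
        pairs.update(permutations(sorted(phrases), 2))
    return singles, pairs


def prediction_strength(docs, j, k):
    """Actual co-occurrence rate of (j, k) divided by the rate expected
    if j and k appeared independently. This ratio is an assumed measure."""
    singles, pairs = cooccurrence(docs)
    n = len(docs)
    expected = (singles[j] / n) * (singles[k] / n)
    actual = pairs[(j, k)] / n
    return actual / expected if expected else 0.0


print(round(prediction_strength(DOCS, "president of the united states", "white house"), 2))
print(round(prediction_strength(DOCS, "out of the blue", "white house"), 2))
```
</pre>
<p>In this toy data the first pair co-occurs about twice as often as chance would suggest, while the idiom co-occurs with ‘<em>White House</em>’ less often than chance, so it predicts nothing.</p>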
<p>The list of good phrases, or <strong>posting list</strong>, has the following structure:</p>
<pre>Phrase i: list:(document d, [list: related phrase count][related phrase information])</pre>
<p>For each phrase <em>i</em> there is a list of documents <em>d</em> containing <em>i</em>. For each document the list stores the number of occurrences of the phrases related to <em>i</em>, and a bit vector containing information about the related phrases.</p>
<p>The <strong>bit vector</strong> consists of pairs of bits, one pair per related phrase. In each pair, the value 1 in the first position indicates that a related phrase <em>k</em> is present in the document <em>d</em>; otherwise the value is 0. The second position indicates whether a phrase <em>l</em> related to phrase <em>k</em> is present. The related phrases <em>l</em> of related phrases <em>k</em> are called ‘<em>secondary related phrases of i</em>&#8217;. The bit vector is very important, as it is used to determine the relevancy of a document when the search results are ranked.</p>
<h3>Example of a bit vector</h3>
<pre>Phrase <em>i</em>: document <em>d</em>: [related phrase counts:{3,4,3,0,0,2,1,1,0}]</pre>
<pre>related phrase bit vector:={11 11 10 00 00 10 10 10 01}</pre>
<p>For phrase <em>i</em> there are 9 related phrases <em>k</em>. Now take a look at the bit vector. The first pair indicates that both the related phrase <em>k<sub>1</sub></em> and one of its related phrases <em>l</em> are present in the document. The fourth and fifth pairs show that neither <em>k<sub>4</sub></em> and <em>k<sub>5</sub></em> nor their related phrases <em>l</em> are found. The last pair shows that although there is no occurrence of phrase <em>k<sub>9</sub></em>, one of its related phrases <em>l</em> is present.</p>
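<p>The example can also be read mechanically. The following Python sketch decodes the bit vector shown above into one pair of flags per related phrase; the function names and the text representation of the vector are assumptions made for illustration, not the storage format of the described system.</p>
<pre>
```python
COUNTS = [3, 4, 3, 0, 0, 2, 1, 1, 0]
BIT_VECTOR = "11 11 10 00 00 10 10 10 01"


def decode(bit_vector):
    """Turn 'kl kl ...' into (related_present, secondary_present) tuples."""
    return [(pair[0] == "1", pair[1] == "1") for pair in bit_vector.split()]


def encode(flags):
    """Inverse of decode: build the bit string from a list of flag tuples."""
    return " ".join(f"{int(k)}{int(l)}" for k, l in flags)


for index, (k_present, l_present) in enumerate(decode(BIT_VECTOR), start=1):
    count = COUNTS[index - 1]
    print(f"k{index}: related={k_present} secondary={l_present} count={count}")
```
</pre>
<p>Note that the first bit of every pair agrees with the related phrase counts: it is 1 exactly where the count is greater than zero.</p>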
<p>For each phrase <em>i</em> the documents <em>d</em> are sorted in descending order of the information-retrieval-type score assigned to them with respect to the given phrase. This pre-ranking significantly improves the performance of the system. To calculate the ranking score the system can employ a link-popularity algorithm such as PageRank.</p>
<p align="left"><img width="405" height="310" alt="Phrase Identification Process" src="http://www.seoresearcher.com/images/articles/phrase-identification.jpg" /></p>
<p align="left"><em>Phrase Identification. For a detailed description of the process    please refer to <a target="_blank" href="http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&#038;Sect2=HITOFF&#038;u=/netahtml/PTO/search-adv.html&#038;r=1&#038;p=1&#038;f=G&#038;l=50&#038;d=PG01&#038;S1=20060294155.PGNR.&#038;OS=dn/20060294155&#038;RS=DN/20060294155">[1]</a>    (paragraphs 0026 - 0102)</em></p>
<h2>Searching</h2>
<p>The search system receives a query and identifies the phrases in it. Once the set <em>Q</em> of query phrases is created, the system retrieves the posting lists for the query phrases in <em>Q</em>. The posting lists are intersected to determine which documents appear on more than one list.</p>
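<p>A minimal Python sketch of the intersection step follows. The posting lists here are reduced to plain lists of document ids, and both the ids and the query phrases are hypothetical; a real posting list would also carry the related phrase counts and bit vectors described earlier.</p>
<pre>
```python
from collections import Counter

# Hypothetical posting lists: query phrase -> ids of documents containing it.
POSTING_LISTS = {
    "article directory": [3, 7, 12, 18],
    "alexa rating": [7, 9, 12],
    "pagerank": [1, 7, 18],
}


def documents_on_all_lists(posting_lists):
    """Documents that contain every query phrase."""
    id_sets = [set(ids) for ids in posting_lists.values()]
    return sorted(set.intersection(*id_sets)) if id_sets else []


def documents_on_several_lists(posting_lists, minimum=2):
    """Documents that appear on at least `minimum` of the posting lists."""
    hits = Counter(doc for ids in posting_lists.values() for doc in set(ids))
    return sorted(doc for doc, n in hits.items() if n >= minimum)


print(documents_on_all_lists(POSTING_LISTS))
print(documents_on_several_lists(POSTING_LISTS))
```
</pre>
<p>The second function follows the wording above (documents that appear on more than one list), while the first is the stricter reading in which a document must contain every query phrase.</p>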
<h2>Phrase Based Document Ranking</h2>
<p>Documents can be ranked according to their<strong> bit vector values</strong>.    A document containing the most relevant phrases has the highest bit vector value    and gets the highest ranking. Note that this approach uses the <strong>information    about related phrases</strong> to rank search results, so even documents with    low frequency of the query phrase <em>q</em> can get high rankings provided    they have sufficiently high frequency of related phrases.</p>
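<p>One way to picture the ‘bit vector value’ is to read the whole vector as a single binary number. Because related phrases are ordered by significance, the leftmost pairs then carry the most weight. Treating the value as a plain unsigned integer is an assumption made for this Python sketch, and the documents other than the earlier example vector are invented.</p>
<pre>
```python
def bit_vector_value(bit_vector):
    """Read the pairs as one binary number; earlier pairs weigh more."""
    return int(bit_vector.replace(" ", ""), 2)


DOCUMENTS = {
    "doc_a": "11 11 10 00 00 10 10 10 01",  # the example vector shown earlier
    "doc_b": "10 00 00 00 00 11 11 11 11",  # invented
    "doc_c": "11 11 11 00 00 00 00 00 00",  # invented
}

ranked = sorted(DOCUMENTS, key=lambda d: bit_vector_value(DOCUMENTS[d]), reverse=True)
print(ranked)  # ['doc_c', 'doc_a', 'doc_b']
```
</pre>
<p>Here doc_b contains more related and secondary phrases than doc_c, yet it ranks last because it lacks the most significant ones.</p>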
<p>To produce the final ranking score, the ‘<strong>body hit</strong>’ scores calculated above are combined with ‘<strong>anchor hit</strong>’ scores in the form of a linear function with adjustable weights, e.g.</p>
<pre>Rank = (body hit score)*weight1 + (anchor hit score)*weight2.</pre>
<p>For each phrase the indexing system also creates lists of documents in which    the given phrase is an <strong>anchor</strong> in incoming and outgoing links.    So the <strong>anchor hit</strong> score for document <em>d</em> can be calculated    as a function of the related phrase bit vectors of the query phrases <em>Q</em>,    where <em>Q</em> is an anchor term in a document that references document <em>d</em>.</p>
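<p>The linear combination is simple enough to try out. In the Python sketch below the weights and the per-document scores are hypothetical, and the scores are assumed to be normalised to the range 0 to 1 so that the two terms are comparable.</p>
<pre>
```python
def final_rank(body_hit, anchor_hit, body_weight=0.7, anchor_weight=0.3):
    """Rank = (body hit score) * weight1 + (anchor hit score) * weight2."""
    return body_hit * body_weight + anchor_hit * anchor_weight


# Hypothetical (body hit, anchor hit) scores, normalised to the range 0..1.
SCORES = {"doc_a": (0.9, 0.2), "doc_b": (0.5, 0.9), "doc_c": (0.4, 0.4)}


def order(scores, **weights):
    """Sort document names by their final rank, best first."""
    return sorted(scores, key=lambda d: final_rank(*scores[d], **weights), reverse=True)


print(order(SCORES))
print(order(SCORES, body_weight=0.3, anchor_weight=0.7))
```
</pre>
<p>With the body-heavy default weights doc_a comes first; shifting the weight towards anchor hits moves doc_b to the top, which shows how the adjustable weights trade page content against link text.</p>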
<h2>Detecting Spam Documents</h2>
<p>The new phrase-based approach enables the future indexing system to detect and penalize spam documents. A statistical analysis of the document collection shows that a web page normally contains 8 to 20 related phrases. A spam document that deceives a search ranking system with an inflated keyword density is expected to contain an excessive number of related phrases, such as 100 or more. Deviations from the expected number of related phrases can therefore be used to detect and combat spam in search results.</p>
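<p>As a rough illustration, the check can be written as a simple threshold test. The figures 8 to 20 and 100 come from the paragraph above; the intermediate ‘above normal’ band, the labels and the function names are assumptions for this Python sketch, and a real system would presumably use a statistical deviation rather than fixed cut-offs.</p>
<pre>
```python
EXPECTED_RANGE = (8, 20)   # typical number of related phrases, per the text
SPAM_THRESHOLD = 100       # "100 or more" is treated as suspicious


def related_phrase_count(bit_vector):
    """Count related phrases present: the first bit of each pair."""
    return sum(pair[0] == "1" for pair in bit_vector.split())


def classify(count):
    """Label a document by how many related phrases it contains."""
    if count >= SPAM_THRESHOLD:
        return "likely spam"
    if count > EXPECTED_RANGE[1]:
        return "above normal"  # assumed middle band, not from the text
    return "normal"


print(classify(related_phrase_count("11 11 10 00 00 10 10 10 01")))
print(classify(150))
```
</pre>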
<p>This system can also be applied to identify automatically generated content intended to be displayed along with paid contextual advertisements. This sort of content is often used on MFA (Made for AdSense) sites and is nothing more than a meaningless sequence of keyword-rich text blocks scraped from other websites, RSS feeds or search engine results pages. Although conventional indexing systems are already quite effective at keeping these sites out of search results for popular terms, they can still occasionally appear in results for long-tail terms.</p>
<h2>To Sum Up</h2>
<p>The new indexing and ranking system proposed by Google uses page content (<strong>phrases</strong>) to rank search results in a way that is highly immune to manipulation attempts. The properties of a web document used for ranking, i.e. phrases and the relations between them, are influenced by the properties of all the other documents in the index, and are therefore outside the control of webmasters.</p>
<p>The phrase-based approach also enhances the ability of search engines to detect unnatural patterns in text content, such as inflated keyword density or scraped content. It also enables search engines to provide more topically focused results by culling documents that cover multiple topics.</p>
<p>The new approach can be used as an augmentation to the existing link-popularity    based ranking systems as an additional parameter in the final score formula.    Link popularity values are also used to pre-rank documents in posting lists    to improve the performance of the search system.</p>
<h2>Reference:</h2>
<p>1. Patterson, A.L. &#8220;<a target="_blank" href="http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&#038;Sect2=HITOFF&#038;u=/netahtml/PTO/search-adv.html&#038;r=1&#038;p=1&#038;f=G&#038;l=50&#038;d=PG01&#038;S1=20060294155.PGNR.&#038;OS=dn/20060294155&#038;RS=DN/20060294155">Detecting spam documents in a phrase based information retrieval system</a>&#8221;, United States Patent Application, December 28, 2006</p>
]]></content:encoded>
			<wfw:commentRss>http://www.seoresearcher.com/googles-new-algorithm-to-rank-pages-and-detect-spam-phrase-rank.htm/feed</wfw:commentRss>
		</item>
		<item>
		<title>Guidelines to a Perfect Link Exchange Scam</title>
		<link>http://www.seoresearcher.com/guidelines-to-a-perfect-link-exchange-scam.htm</link>
		<comments>http://www.seoresearcher.com/guidelines-to-a-perfect-link-exchange-scam.htm#comments</comments>
		<pubDate>Sun, 24 Dec 2006 23:14:46 +0000</pubDate>
		<dc:creator>oleg.ishenko</dc:creator>
		
		<category><![CDATA[Link Popularity Algorithms]]></category>

		<category><![CDATA[Search Engine Optimization]]></category>

		<guid isPermaLink="false">http://www.seoresearcher.com/48.htm</guid>
		<description><![CDATA[Reciprocal link exchange is still an important strategy for building link popularity, despite all the measures taken by the search engines to diminish its effect. Back in 1999-2001 obtaining a quality link exchange was not difficult, and webmasters used to respond more willingly to an e-mail request. But as more people [...]]]></description>
			<content:encoded><![CDATA[<p><img width="190" height="160" align="left" alt="Reciprocal link exchange" src="http://www.seoresearcher.com/images/articles/link-exchange.jpg" /><strong>Reciprocal link exchange</strong> is still an important strategy for building link popularity, despite all the measures taken by the search engines to diminish its effect. Back in 1999-2001 obtaining a quality link exchange was not difficult, and webmasters used to respond more willingly to an e-mail request. But as more people became aware of this strategy, the <strong>reciprocal linking scam</strong> became a common practice.</p>
<p>Sometimes I check my old ‘link exchange’ e-mail account I used to build    link popularity for my very first website. There are lots of people contacting    me daily with exchange proposals. Well, not actually people – they are    mostly <strong>bots</strong>.</p>
<p>Probably one of the reasons I still maintain that e-mail account is that those requests are a source of <strong>persistent amusement</strong> for me. One example: a request in pink letters with images of dancing puppies and bouncing hearts, written by a ‘blond chick’ (picture attached), asking me to link to her pharmacy site! Or maybe I just enjoy reading the admiring comments on the look and content of my site that precede every exchange proposal?</p>
<p>The link exchange scam is an interesting subject for a study <em>per se</em> and still awaits its researchers. In the meantime, the SEO community has been successful in summarizing the <strong>guidelines for the most perfect link exchange scam</strong>.</p>
<p><span id="more-48"></span></p>
<h2>Filing an Exchange Request</h2>
<div id="advertical"><!--adsense#vertical_post--></div>
<ul>
<li>Send an automated e-mail request or use a bot to submit it via an online contact form. Combining both methods is preferred whenever possible.</li>
<li>Use a free e-mail account such as Gmail, or better yet, some foreign free e-mail service to send your message. Sending a duplicate request from your company account is also beneficial.</li>
<li>Do send follow-up e-mails. Sooner or later your victim will give up and read one of them.</li>
<li>Send a minimum of 100-300 automated requests every day. Push your mail server’s spam detection to the limits.</li>
<li>Make sure that the website you are trying to contact is absolutely unrelated      to your field.</li>
<li>Send your request to every e-mail address you can find on the target site.      Let the sales or customer support guys forward them to the webmaster.</li>
</ul>
<h2>Writing Your Request</h2>
<ul>
<li><em>Address properly</em>. No names required. Best thing is to use the website’s      title or at least the URL: “<em>Dear Blue Cheap Online Widgets</em>”,      or “<em>Hello www.bluewidgets.com</em>”</li>
<li><em>Kiss ass</em>. Tell your victims how much you adore their websites.      Do use superlatives.</li>
<li><em>Inform</em>. Let your recipients know how important PageRank and incoming      links are. Go in depth with the mysteries and magnificence of the PageRank      and how the high PageRank will ensure them the first positions in Google.</li>
<li><em>Scare</em>. Notify them that their link popularity is low, and their      positions in search engines are threatened.</li>
<li><em>Share a secret</em>. Tell them that the three-way linking is more effective,      since search engines detect and ignore two-way links.</li>
<li><em>Threaten</em>. Notify them that their link will be removed from your      high quality directory if they do not provide a link back in the specified      number of days.</li>
<li><em>Show your scale</em>. Make your message easily detectable as a bulk mailing by setting a different font size and color for the recipient’s address and site name.</li>
<li><em>Be unofficial</em>. Use the Internet argot in your e-mail. Like ‘u      r’ instead of ‘you are’. This is the Internet – formalism      is unacceptable.</li>
<li><em>Threaten them again</em>. With hundreds of reminder e-mails.</li>
<li><em>Use a girl’s name</em>. Most webmasters are male and will not resist a lady asking for a favor.</li>
</ul>
<h2>Prepare a Sound Links Page</h2>
<ul>
<li>Your links page must have at least 100 outgoing links, preferably uncategorized. Make sure that at least 50% point to pharmacy and gambling websites.</li>
<li>Your proposed links page has to be deeply buried in a keyword-rich URL      like: <em>http://www.yoursite.com/widgets/cheap-widgets/amazingly-cheap-widgets/widgets-links/</em></li>
<li>Make sure the links page URL contains at least one poison keyword like      ‘<em>links</em>’, ‘<em>partners</em>’, ‘<em>directory</em>’,      or ‘<em>exchanges</em>’.</li>
<li>Alternatively provide a dynamic URL with a minimum of 100 characters of      meaningless parameter values.</li>
<li>Choose links pages that are in Google’s supplementary index.</li>
<li>The PageRank for your page has to be between 0 and 3 with 0 being the best.</li>
<li>Make your page look more credible by putting AdSense ads on it. “<em>Well,      if Google approves this page, then it is worth having a link from it</em>”.</li>
<li>Disguise your low PR links pages by opening them in a high PR frame.</li>
<li>Orphan pages are the best.</li>
</ul>
<h2>Use SEO Tricks</h2>
<ul>
<li>Link to your partners using one of the following options:
<ul>
<li>‘nofollow’ attribute</li>
<li>javascript links</li>
<li>302 ‘Found’ redirects</li>
</ul>
</li>
<li>Edit <em>robots.txt</em> to restrict spiders from indexing your links pages.</li>
<li>Double protect your links pages from indexing by adding meta ‘<em>noindex,nofollow</em>’      tags.</li>
</ul>
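<p>Seen from the other side of the table, a webmaster who receives an exchange proposal can test for the last two tricks before agreeing to anything. The Python sketch below uses the standard library’s <em>urllib.robotparser</em> to check a robots.txt file and a small helper to read the content of a robots meta tag. The sample robots.txt, the URLs and the helper names are hypothetical, and fetching the real file and page is left out.</p>
<pre>
```python
from urllib import robotparser

# Hypothetical robots.txt of a site proposing a link exchange.
ROBOTS_TXT = "User-agent: *\nDisallow: /widgets/widgets-links/\n"


def blocked_by_robots(robots_txt, url, agent="Googlebot"):
    """True if the given robots.txt text disallows crawling `url`."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


def robots_meta_flags(content):
    """Split the content attribute of a robots meta tag into flags."""
    values = {value.strip().lower() for value in content.split(",")}
    return {"noindex": "noindex" in values, "nofollow": "nofollow" in values}


links_page = "http://www.yoursite.com/widgets/widgets-links/"
print(blocked_by_robots(ROBOTS_TXT, links_page))
print(robots_meta_flags("noindex,nofollow"))
```
</pre>
<p>If either check comes back positive for the page that is supposed to carry your link, the exchange is worthless to you, however flattering the request was.</p>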
<p>The above guidelines are compiled from my own experience and the hilarious    thread ‘<a target="_blank" href="http://www.webmasterworld.com/forum12/3154.htm">SEO    Link Exchange</a>’ from the <strong>WebMasterWorld</strong> forum.</p>
<p>The list can be continued. Any suggestions?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.seoresearcher.com/guidelines-to-a-perfect-link-exchange-scam.htm/feed</wfw:commentRss>
		</item>
		<item>
		<title>Emerging SEM Markets: Portugal</title>
		<link>http://www.seoresearcher.com/emerging-sem-markets-portugal.htm</link>
		<comments>http://www.seoresearcher.com/emerging-sem-markets-portugal.htm#comments</comments>
		<pubDate>Wed, 20 Dec 2006 01:31:09 +0000</pubDate>
		<dc:creator>oleg.ishenko</dc:creator>
		
		<category><![CDATA[Website is a Marketing Being]]></category>

		<guid isPermaLink="false">http://www.seoresearcher.com/emerging-sem-markets-portugal.htm</guid>
		<description><![CDATA[Recently I was approached by a colleague from Portugal who offered me the following    article on the SEO and online advertising market in his country. I am gladly publishing    this report by Nuno Hipólito    here.
Why invest in SEM in small to medium European markets
According    [...]]]></description>
			<content:encoded><![CDATA[<p>Recently I was approached by a colleague from Portugal who offered me the following article on the SEO and online advertising market in his country. I am gladly publishing this report by <a target="_blank" href="http://www.searchmarketing.pt">Nuno Hipólito</a> here.<span id="more-47"></span></p>
<h3>Why invest in SEM in small to medium European markets</h3>
<p><img width="353" height="333" align="left" alt="Portuguese online advertising in 2005" src="http://www.seoresearcher.com/images/articles/portuguese-online-advertising.GIF" />According    to <strong>SEMPO</strong>, the Search Engine Marketing Professionals Organization,    the investment in SEM in the US will reach $11 billion in 2011.</p>
<p>Let’s face it. When it comes to new technologies and adjusting to new technology, no one can beat the US. It’s a very big market, and there is an infrastructure built to advance new ideas and especially investment in new industries.</p>
<p>The “start up” never works that well outside the US, and most of the time it’s not even because of financial matters. Americans think “future”, always, even when they vote conservatively.</p>
<p>So SEM seemed a logical thing when search engines gained importance in the    sales process. People research things online – people will shop more and    more online or at least will make decisions based on web research.</p>
<p>But that’s easy.</p>
<p>Let’s talk SEM in <strong>emerging markets</strong>.</p>
<div id="advertical"><!--adsense#vertical_post--></div>
<p>When should one begin to think about SEO and PPC? Does it really pay to advertise online when most of the country you are targeting does not use the Internet for shopping? Or when parents there are hesitant to discover the online world, because they still look at the TV remote with suspicion?</p>
<p>When you talk about the UK, France or Germany the problem isn’t the same.    But let’s consider much smaller and underdeveloped markets: Portugal,    Greece, etc…</p>
<p>I can speak about the Portuguese market because our company is Portuguese.    We are considering other markets, but for now we have to stay within the confines    of Portugal.</p>
<p>When we have a meeting with a potential client, say a hotel manager, we try to make him realize the benefits of investing online. He can attract customers at a very low cost, his website should work 24/7 at that task, and he should be proactive in influencing people’s minds when they search for hotels near his.</p>
<p>He immediately looks at us and asks the killer question: “do people search    for hotels online?”.</p>
<p>We nod yes with a nervous smile. Sure they do.</p>
<p>Maybe not a lot of them, but some do. We can even tell you roughly how many. And you can think about attracting foreign customers at a low cost. And we can provide estimates and promise certain results and concrete objectives. Do other marketing campaigns give you that? As a cool side effect, your brand image will get a makeover.</p>
<p>Hmm… he looks interested, but unconvinced.</p>
<p>To give you an idea, online advertising investment in Portugal was a mere 30 million euros in 2005 (roughly 37 million dollars), of which 5 million euros (roughly 6 million dollars) went to PPC ads.</p>
<p>No wonder the hotel manager is reluctant. No one invests in online marketing!</p>
<p>So why should he?</p>
<p>Two words: “<strong>Low, Low cost</strong>”. Ok, three words. Sure,    our market is small, but that means you can have a dominant position with a    smaller investment. And if you look forward, the market will grow, and your    company will be prepared. If you play your online cards right, you will be a    leader.</p>
<p>The risk is very small too. PPC can be done with very low budgets, as low as 1 euro a day (about 1.25 dollars). Yes, that’s the entire daily budget, not just one keyword. That’s emerging markets for you…</p>
<p><strong>Low cost = big results</strong>. That’s our pitch.</p>
<p>And even if the costs are low, we, as an SEM company, will make sure they get even lower, and that the right keywords are researched, content is created, and new customers are attracted.</p>
<p>In 2006 the growth of online advertising in Portugal will be <strong>26%</strong>. That’s massive, and above the European average.</p>
<p>Did he know that people who search for the keyword “<em>holidays in Lisbon</em>” could be interested in hotels? He didn’t. But they are. We recommend a landing page with info about the city, interesting tourist routes, where to eat, what shows to see…</p>
<p>He likes the idea.</p>
<p>We have experience doing SEO and PPC in the Portuguese market. We even do Spanish PPC. So rest assured, we tell him, we’ll deliver results, measurable results in a short time, and we have a long-term plan for your online future.</p>
<p>He finally looks convinced and smiles.</p>
<p>The nervous smile comes off our faces and we shake hands. It’s difficult    to get clients for SEM in Portugal, but it will only get easier in the future.</p>
<p>In all emerging markets, the difficulties local SEM companies go through are not that different. First educate your potential customer and he will understand your pitch. He should, because you add value to his business; that’s your role.</p>
<p>At the end of the day, he will have a smile on his face.</p>
<p>Slowly he will gain more customers online. And when the market is mature, he will wonder in disbelief how he failed to see how important online advertising would become.<br />
Nuno Hipólito<br />
SEO consultant.<br />
<a target="_blank" href="http://www.searchmarketing.pt">www.searchmarketing.pt</a></p>
<p>If you speak Portuguese, check out this site about SEO: <a target="_blank" href="http://esquilloseocontest.home.sapo.pt/">http://esquilloseocontest.home.sapo.pt/</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.seoresearcher.com/emerging-sem-markets-portugal.htm/feed</wfw:commentRss>
		</item>
		<item>
		<title>Making Money From Your Blog&#8217;s RSS Feed</title>
		<link>http://www.seoresearcher.com/making-money-from-your-blogs-rss-feed.htm</link>
		<comments>http://www.seoresearcher.com/making-money-from-your-blogs-rss-feed.htm#comments</comments>
		<pubDate>Tue, 19 Dec 2006 16:18:39 +0000</pubDate>
		<dc:creator>oleg.ishenko</dc:creator>
		
		<category><![CDATA[Website is a Marketing Being]]></category>

		<category><![CDATA[WordPress Blogging]]></category>

		<guid isPermaLink="false">http://www.seoresearcher.com/making-money-from-your-blogs-rss-feed.htm</guid>
		<description><![CDATA[Some blog for fun, some blog for money, some blog for both. There are numerous options to monetize a blog: AdSense ads, affiliate links, paid reviews, links to your products, you name it. If your blog receives enough visitors you can start making a living online. To make the most of [...]]]></description>
			<content:encoded><![CDATA[<p><img width="200" height="125" align="left" alt="RSS Feedvertising" src="http://www.seoresearcher.com/images/articles/rss.jpg" />Some blog for fun, some blog for money, some blog for both. There are numerous options to monetize a blog: AdSense ads, affiliate links, paid reviews, links to your products, you name it. If your blog receives enough visitors you can start making a living online. To make the most of your visitors you must keep in mind where they come from. Those who arrive at your blog from search engine results, or are directed to you by links from other websites, see your pages in full. But your revenue-generating ads and links are hidden from those who read your <strong>RSS feeds</strong>. This means that your online money-machine loses clicks from a substantial portion of your most loyal visitors. Is there a way to make money in RSS feeds? Yes: try ‘<strong>feedvertising</strong>’.</p>
<p><span id="more-46"></span></p>
<p><strong>Feedvertising</strong> is a technology that enables bloggers to run text ads in their RSS feeds. One service I discovered lately that provides such technology is <strong><a target="_blank" rel="nofollow" href="http://www.seoresearcher.com/jump.php?m=tla">Text Link Ads</a></strong>. If you have a <strong>WordPress blog</strong> you can join the network, which already features such popular blogs as <a target="_blank" href="http://www.techcrunch.com/">TechCrunch</a> and <a target="_blank" href="http://www.problogger.net/">Problogger</a>. Here is an example of feedvertising; this is what an affiliate link looks like in Problogger’s RSS:</p>
<p><img width="600" height="355"  alt="feedvertising screenshot" src="http://www.seoresearcher.com/images/articles/feedvertising.gif" /></p>
<p><strong>Feedvertising</strong> is very flexible. You can choose your advertisers (your affiliate links, your own products or advertisers suggested by <a target="_blank" rel="nofollow" href="http://www.seoresearcher.com/jump.php?m=tla">Text Link Ads</a>), provide your own custom prefix to the ad, such as ‘<em>sponsored by</em>’ or ‘<em>thanks to our sponsor</em>’, and write your own text after the link to express your opinion about the advertised product or service. You can also let Text Link Ads run paid links not only in your RSS but across your entire blog. <a target="_blank" rel="nofollow" href="http://www.seoresearcher.com/jump.php?m=tla">Text Link Ads</a> provides you with a plugin customized to your blog, which is installed and managed just like any other WordPress plugin. Unfortunately this also means that if you have a <strong>Blogger</strong> account you are not able to use this service.</p>
<p>Feedvertising is<strong> not a contextual ads provider</strong> so you can    keep running your AdSense ads <strong>without violating the TOS</strong>. Your    payouts depend on the popularity of your blog, which is measured as a combination    of <em>Technorati</em> and <em>Alexa</em> rankings, and can be up to $250 per    month per link for the top publishers or $40-70 for moderately popular blogs.</p>
<p>For more information on creating an account in <a target="_blank" rel="nofollow" href="http://www.seoresearcher.com/jump.php?m=tla">Feedvertising</a>, as well as instructions on setting up the plugin, please refer to the excellent video by <a target="_blank" href="http://www.tubetorial.com/feedvertising/">Tubetorial</a>.
</p>
]]></content:encoded>
			<wfw:commentRss>http://www.seoresearcher.com/making-money-from-your-blogs-rss-feed.htm/feed</wfw:commentRss>
		</item>
		<item>
		<title>Avoiding Keyword Stuffing Ban</title>
		<link>http://www.seoresearcher.com/average-keyword-saturation-google-msn-yahoo.htm</link>
		<comments>http://www.seoresearcher.com/average-keyword-saturation-google-msn-yahoo.htm#comments</comments>
		<pubDate>Mon, 18 Dec 2006 20:24:55 +0000</pubDate>
		<dc:creator>oleg.ishenko</dc:creator>
		
		<category><![CDATA[Search Engine Optimization]]></category>

		<guid isPermaLink="false">http://www.seoresearcher.com/average-keyword-saturation-google-msn-yahoo.htm</guid>
		<description><![CDATA[Average Keyword Saturation for Google, MSN and Yahoo
When deciding upon keyword placement we all try to get the most out of our target keyword saturation. At the same time no one wants to get penalized for accidentally inserting too many keywords in the page copy, or for including too many words between H1 tags. Since [...]]]></description>
			<content:encoded><![CDATA[<h2>Average Keyword Saturation for Google, MSN and Yahoo</h2>
<p>When deciding upon keyword placement we all try to get the most out of our target keyword saturation. At the same time no one wants to get penalized for accidentally inserting too many keywords in the page copy, or for including too many words between H1 tags. Since search engines would never publish the exact numbers for the maximum allowed keyword frequency or keyword prominence, all we can do is study the top pages in SERPs and make more or less informed guesses. Or we can conduct an experiment and calculate the average numbers for top pages in the results of the major search engines: Google, Yahoo! and MSN. For the tables below I used data provided by WebPosition software, which calculates the average scores of the top 5 positions for dozens of keyword searches conducted by WebTrends Inc.<span id="more-22"></span></p>
<p>Of course, aligning your parameters with the top averages will not guarantee high rankings, but it can help keep your keyword saturation within the allowed boundaries.</p>
<h3>Google Averages</h3>
<p>Partial matching enabled,  Non-Exact Search,  Non-Case Sensitive</p>
<table width="90%" cellspacing="2" cellpadding="3" border="0" class="basic">
<tr>
<td style="width: 40%"><strong>Areas</strong></td>
<td style="width: 15%">Frequency</td>
<td style="width: 15%">Words</td>
<td style="width: 15%">Weight</td>
<td style="width: 15%">Average Prominence</td>
</tr>
<tr>
<td colspan="5"><strong>Head</strong></td>
</tr>
<tr>
<td>TITLE tag</td>
<td>1.0</td>
<td>6.7</td>
<td>72.0%</td>
<td>62%</td>
</tr>
<tr>
<td>META Description tag</td>
<td>0.7</td>
<td>10.8</td>
<td>32%</td>
<td>65.2%</td>
</tr>
<tr>
<td colspan="5"><strong>Body</strong></td>
</tr>
<tr>
<td>Headings</td>
<td>0.5</td>
<td>7.9</td>
<td>32.1%</td>
<td>64.9%</td>
</tr>
<tr>
<td>Link Text</td>
<td>4.7</td>
<td>166.9</td>
<td>14.2%</td>
<td>55.8%</td>
</tr>
<tr>
<td>Hyperlink URL</td>
<td>9.2</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
</tr>
<tr>
<td>Body Text</td>
<td>9.7</td>
<td>618</td>
<td>7.8%</td>
<td>56.2%</td>
</tr>
</table>
<h3>MSN Averages</h3>
<p>Partial matching disabled,  Non-Exact Search,  Non-Case Sensitive.</p>
<table width="90%" cellspacing="2" cellpadding="3" border="0" class="basic">
<tr>
<td style="width: 40%"><strong>Areas</strong></td>
<td style="width: 15%">Frequency</td>
<td style="width: 15%">Words</td>
<td style="width: 15%">Weight</td>
<td style="width: 15%">Average Prominence</td>
</tr>
<tr>
<td colspan="5"><strong>Head</strong></td>
</tr>
<tr>
<td>TITLE tag</td>
<td>1.0</td>
<td>6.2</td>
<td>81.0%</td>
<td>68.7%</td>
</tr>
<tr>
<td colspan="5"><strong>Body</strong></td>
</tr>
<tr>
<td>Headings</td>
<td>0.4</td>
<td>6.6</td>
<td>26.7%</td>
<td>64.6%</td>
</tr>
<tr>
<td>Link Text</td>
<td>2.2</td>
<td>135.1</td>
<td>8.1%</td>
<td>56.9%</td>
</tr>
<tr>
<td>Hyperlink URL</td>
<td>7.4</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
</tr>
<tr>
<td>Body Text</td>
<td>10.5</td>
<td>608.1</td>
<td>8.6%</td>
<td>58.0%</td>
</tr>
</table>
<h3>Yahoo! Averages</h3>
<p>Partial matching disabled,  Non-Exact Search,  Non-Case Sensitive.</p>
<table width="90%" cellspacing="2" cellpadding="3" border="0" class="basic">
<tr>
<td style="width: 40%"><strong>Areas</strong></td>
<td style="width: 15%">Frequency</td>
<td style="width: 15%">Words</td>
<td style="width: 15%">Weight</td>
<td style="width: 15%">Average Prominence</td>
</tr>
<tr>
<td colspan="5"><strong>Head</strong></td>
</tr>
<tr>
<td>TITLE tag</td>
<td>0.9</td>
<td>6.1</td>
<td>71.8%</td>
<td>65.7%</td>
</tr>
<tr>
<td>META Keywords tag</td>
<td>1.1</td>
<td>13.7</td>
<td>38.4%</td>
<td>70.5%</td>
</tr>
<tr>
<td>META Description tag</td>
<td>0.5</td>
<td>9.3</td>
<td>25.9%</td>
<td>65.4%</td>
</tr>
<tr>
<td colspan="5"><strong>Body</strong></td>
</tr>
<tr>
<td>Headings</td>
<td>0.4</td>
<td>11.3</td>
<td>16.4%</td>
<td>72.6%</td>
</tr>
<tr>
<td>Link Text</td>
<td>2.4</td>
<td>142.4</td>
<td>3.3%</td>
<td>59%</td>
</tr>
<tr>
<td>Body Text</td>
<td>7.8</td>
<td>760.6</td>
<td>5.1%</td>
<td>57.2%</td>
</tr>
</table>
<p>Head - words between HEAD tags, including TITLE</p>
<p>Body - words between BODY tags including:</p>
<ul>
<li>Headings - words in H1, H2 and H3  tags</li>
<li>Link Text - anchor text of outgoing links</li>
<li>Hyperlink URL - words in URL of the outgoing links</li>
<li>Body Text - words in your page copy, excluding the content of ALT attributes and HTML comments</li>
</ul>
<h2>Parameters&#8217; Definitions and Calculations</h2>
<h3>Frequency</h3>
<p>When defining keyword/key-phrase <strong>frequency</strong> we distinguish between <strong>exact</strong>, <strong>non-exact</strong> and <strong>partial matching</strong>. <strong>Exact matching</strong> means counting only exact matches of the whole key-phrase. It applies when a user performs a search with quotation marks around the search terms. For example, if the content of an H1 tag is <em>“Bahamian Paradise. Bahamas Islands: All inclusive Atlantis Bahamas Deals”</em> then the frequency of <em>“Atlantis Bahamas”</em> by exact match is 1 (one occurrence). By <strong>non-exact matching</strong> the frequency for the same phrase is 1.5: one occurrence of <em>‘Atlantis’</em> plus two occurrences of <em>‘Bahamas’</em>, divided by 2, the number of words in the search phrase, so (1 + 2) / 2 = 1.5. <strong>Partial matching</strong> or <strong>keyword stemming</strong> also counts keyword modifications as matches. In this case the frequency for <em>“Atlantis Bahamas”</em> will be 2, because the word <em>‘Bahamian’</em> is counted as a match for <em>‘Bahamas’</em>: (1 + 3) / 2 = 2.</p>
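<p>The three counts from the H1 example can be reproduced with a few lines of Python. This is only a sketch of the arithmetic described above, not WebPosition’s actual implementation: the function names are made up for illustration, and partial matching is approximated with a hand-written map of word variants rather than a real stemmer.</p>

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def exact_frequency(area, phrase):
    """Count occurrences of the whole phrase, words adjacent and in order."""
    words, target = tokenize(area), tokenize(phrase)
    n = len(target)
    return sum(words[i:i + n] == target for i in range(len(words) - n + 1))

def non_exact_frequency(area, phrase, variants=None):
    """Sum the per-word occurrences and divide by the phrase length.

    `variants` (an assumption of this sketch) maps a word form to the keyword
    it should count as; passing it approximates partial matching.
    """
    variants = variants or {}
    words = [variants.get(w, w) for w in tokenize(area)]
    target = tokenize(phrase)
    return sum(words.count(t) for t in target) / len(target)

h1 = "Bahamian Paradise. Bahamas Islands: All inclusive Atlantis Bahamas Deals"
print(exact_frequency(h1, "Atlantis Bahamas"))                                 # 1
print(non_exact_frequency(h1, "Atlantis Bahamas"))                             # 1.5
print(non_exact_frequency(h1, "Atlantis Bahamas", {"bahamian": "bahamas"}))    # 2.0
```

<p>A real tool would replace the variants map with a stemming algorithm; the map just keeps the example self-contained.</p>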
<h3>Words</h3>
<p>This is simply the total number of words in the analyzed area. Be careful not to put too many words between H1 or H2 tags, or in link text, since that might be considered spam.</p>
<h3>Keyword weight</h3>
<p>This parameter measures the degree to which a specific keyword or phrase dominates a given area. It is calculated by multiplying the number of words in the key-phrase by its frequency and dividing the result by the total number of words in the area.</p>
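<p>As a quick worked example of that formula, here is a small Python sketch. The function name is hypothetical; the numbers come from the H1 example in the Frequency section (a two-word phrase with a non-exact frequency of 1.5 in a nine-word heading).</p>

```python
def keyword_weight(phrase_words, frequency, total_words):
    """Weight = (words in the key-phrase * frequency) / total words in the area."""
    if total_words == 0:
        return 0.0  # an empty area cannot contain the phrase
    return phrase_words * frequency / total_words

# Two-word phrase, frequency 1.5, nine words in the heading.
weight = keyword_weight(phrase_words=2, frequency=1.5, total_words=9)
print(f"{weight:.1%}")  # 33.3%
```

<p>Note that the averages in the tables above are averaged per column across many pages, so multiplying the average frequency by the phrase length and dividing by the average word count will not necessarily reproduce the average weight.</p>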
<h3>Average Prominence</h3>
<p>This parameter shows how close your keyword or phrase is to the start of the area. Most search algorithms assign more weight to more prominent keywords, and therefore it is beneficial to have your targeted keywords at the top of the page or at the beginning of the page copy. However, in order to avoid spam penalties the keyword distribution must be as natural as possible, and you might find it necessary to put a keyword in the middle or at the end of your page. Prominence is calculated as follows:</p>
<ul>
<li>If a keyword appears at the beginning of an area, its prominence will be 100%.</li>
<li>If a keyword appears in the middle of an area, its prominence will be around 50%.</li>
<li>If the keyword appears at the beginning of the area and is repeated at the end of the area, its prominence will be 50%.</li>
<li>If the keyword appears at the end of the area, its prominence will be 0%.</li>
<li>If the area consists of multiple parts (for example, 3 heading tags on the page), then all the parts are treated as a single contiguous area when prominence is calculated.</li>
</ul>
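<p>The rules above are consistent with a simple linear scale averaged over all occurrences. The Python sketch below assumes exactly that (100% for the first word, 0% for the last word, a straight line in between); the real WebPosition formula is not given in this article and may differ in detail, and the function name is made up for illustration.</p>

```python
def average_prominence(positions, total_words):
    """Average prominence in percent for 0-based keyword positions in an area.

    Assumed formula: 100% at the first word, 0% at the last word, linear in
    between, averaged over all occurrences of the keyword.
    """
    if not positions:
        return 0.0
    if total_words == 1:
        return 100.0
    last = total_words - 1
    scores = [100.0 * (last - p) / last for p in positions]
    return sum(scores) / len(scores)

# An 11-word area, keyword at the start, middle, start plus end, and end.
print(average_prominence([0], 11))      # 100.0
print(average_prominence([5], 11))      # 50.0
print(average_prominence([0, 10], 11))  # 50.0
print(average_prominence([10], 11))     # 0.0
```

<p>For an area made of several parts, such as three headings, the positions would be counted in the concatenated text of all parts.</p>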
]]></content:encoded>
			<wfw:commentRss>http://www.seoresearcher.com/average-keyword-saturation-google-msn-yahoo.htm/feed</wfw:commentRss>
		</item>
		<item>
		<title>How to Make a WordPress Blog Duplicate Content Safe</title>
		<link>http://www.seoresearcher.com/how-to-make-your-wordpress-blog-duplicate-content-safe.htm</link>
		<comments>http://www.seoresearcher.com/how-to-make-your-wordpress-blog-duplicate-content-safe.htm#comments</comments>
		<pubDate>Thu, 30 Nov 2006 23:05:03 +0000</pubDate>
		<dc:creator>oleg.ishenko</dc:creator>
		
		<category><![CDATA[Search Engine Optimization]]></category>

		<category><![CDATA[Tutorials]]></category>

		<category><![CDATA[WordPress Blogging]]></category>

		<guid isPermaLink="false">http://www.seoresearcher.com/how-to-make-your-wordpress-blog-duplicate-content-safe.htm</guid>
		<description><![CDATA[In one of my recent posts I wrote about the duplicate content issue. This topic is especially important to me since my blog uses the WordPress content management system which, when used with the default configuration, is not duplicate content proof. In fact this CMS is capable of rendering almost [...]]]></description>
			<content:encoded><![CDATA[<p><img width="230" height="167" align="left" alt="Supplementary index" src="http://www.seoresearcher.com/images/articles/duplicated-2.jpg" />In one of my recent posts I wrote about the <strong>duplicate content issue</strong>. This topic is especially important to me since my blog uses the <strong>WordPress content management system</strong> which, when used with the default configuration, is <strong>not</strong> duplicate content proof. In fact this CMS is capable of rendering almost <strong>100%</strong> of your content duplicate. As usual, the fault of the system has its roots in its advantages. WordPress has many features facilitating blogging and linking, such as RSS feeds for posts and comments, trackback URLs, monthly archives and so on. At the same time this variety of URLs returning similar or identical pages represents a clear case of duplicate content.<span id="more-45"></span></p>
<h2>WordPress And Duplicate Content</h2>
<p>The first evidence of duplicate content produced by your WordPress CMS can be found in your sidebar: <strong>category pages</strong> and <strong>monthly/daily archives</strong>. Category pages list the articles posted under the same topic, a category. Such pages have no unique content; they are just a collection of your previous posts. Monthly and daily archives likewise simply group your previous articles by posting date. Sometimes, when you have only one post on a given day, the archive page for that date and your post are identical.</p>
<p>The next case of duplicate content is even more prominent. It can be your <strong>home page</strong> itself. If it contains the full text of your posts rather than excerpts, then it duplicates your post pages. This also applies to the ‘<em>next/previous entries</em>’ pages, those accessible via <em>/page/2, /3, /4</em> etc.</p>
<p><strong>Feeds</strong>. Search engine spiders crawl all the content they can reach, and of course this includes RSS feeds. The additional problem with feeds is that Google may choose to display your RSS URL in the search results instead of the link to the original post. In this case the user who clicks this result will see an XML-formatted page which is not ‘human-friendly’.</p>
<p><strong>Trackback URLs</strong>. Many WordPress templates add trackback links after posts. These links enable authors to track who links to their posts. Usually, if your post URL looks like ‘<em>www.yoursite.com/2006-11-30/yourpost/</em>’, its trackback URL will be ‘<em>www.yoursite.com/2006-11-30/yourpost/trackback/</em>’.</p>
<p><strong>Identical meta-description</strong>. By default WordPress doesn’t provide a tool to add unique meta description tags to your posts, so they either have none or share a single site-wide description. Having no meta description at all is a disadvantage, as a properly written one can make your snippet stand out in a SERP. Having an identical description for all your pages is a <strong>threat</strong>, as Google might filter them out as too similar (see a thread <a target="_blank" href="http://www.webmasterworld.com/google/3131048.htm">here</a>).</p>
<p>Because of duplicate content, Google search can return less desirable URLs (such as feeds or archives instead of original posts), and your pages can be dropped from its index or placed into the supplemental results, which are rarely displayed to users.</p>
<h2>Solving the Duplicate Content Issue in WordPress</h2>
<h3>Adding ‘<em>noindex, follow</em>’ tags</h3>
<p>What can you do to avoid this problem? You can tell the search engines which URLs to index by using a ‘<em>noindex, follow</em>’ meta tag, <em>robots.txt</em> exclusions or <em>301 redirects</em>. Let’s say you want Google to index your front page, posts, single pages and category pages, and to keep archives and ‘<em>next entries</em>’ pages (<em>page/2, /3, </em>…) out of the index. (Feeds do not use your theme’s header, so they are handled separately through <em>robots.txt</em>.) To do this you have to add the following code to your header.php:</p>
<div class='code_parent'>
<div class='code_title'>Code:</div>
<div class='code_child'><code>
<div class='pre_container'>
<pre>&#60;?php // index the first home page, posts, pages and categories; noindex the rest
if ((is_home() &#038;&#038; ($paged &#60; 2)) || is_single() || is_page() || is_category()) {
    echo '&#60;meta name="robots" content="index,follow" />';
} else {
    echo '&#60;meta name="robots" content="noindex,follow" />';
} ?></pre>
</div>
</code></div>
</div>
<p>For those not familiar with editing templates in WordPress: <em>in your dashboard click the <strong>Presentation</strong> menu item and, after the new page opens, click <strong>Theme Editor</strong>. In the Theme Editor choose ‘<strong>header.php</strong>’ and paste the above code into the editor form. The code has to be inserted anywhere between the <strong>head</strong> tags</em>.</p>
<p>Here the <em>&#60;meta name="robots" content="index,follow" /></em> tag is added to the home page but not to the ‘<em>next entries</em>’ pages <em>(is_home() and $paged &#60; 2)</em>; to your posts <em>(is_single())</em>; to standalone pages, like ‘About me’, if you created any <em>(is_page())</em>; and to category pages <em>(is_category())</em>. If you don’t want your categories to be indexed, just delete <em>|| is_category()</em>. All the other pages will get <em>&#60;meta name="robots" content="noindex,follow" /></em>. They will not be indexed, but crawlers will still follow their outgoing links.</p>
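<p>If you want to reason about which page types end up indexable before touching the theme, the same decision can be modelled outside WordPress. The Python sketch below only mirrors the template condition; the page type labels and the function name are made up for illustration and are not part of WordPress.</p>

```python
# Page types that the template condition always marks as indexable.
INDEXABLE = {"single", "page", "category"}

def robots_content(page_type, paged=0):
    """Return the robots meta content for a page, mirroring the template rule.

    page_type is a hypothetical label such as "home", "single", "page",
    "category", "archive" or "search"; paged is the pagination number
    (0 or 1 on the first page, 2 and up on 'next entries' pages).
    """
    first_home_page = page_type == "home" and paged in (0, 1)
    if first_home_page or page_type in INDEXABLE:
        return "index,follow"
    return "noindex,follow"

for page_type, paged in [("home", 0), ("home", 2), ("single", 0), ("archive", 0)]:
    print(page_type, paged, robots_content(page_type, paged))
```

<p>Removing "category" from the set corresponds to deleting the category check from the template.</p>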
<h3>Adding unique meta description</h3>
<p>For this purpose I use the <a href="http://guff.szub.net/head-meta-description/">Head Meta Description</a> plugin. It can be configured to use an <strong>excerpt</strong> of your post as the meta description, which is especially useful if you have to add this tag to hundreds of existing pages. Or you can add your own description manually as a custom field, which is my personal preference.</p>
<h3>Using <em>more</em> tag</h3>
<p>By using this tag you tell WordPress to display only the first few lines of your post on the home page. This greatly reduces the similarity between the home page and your articles. If you have too many existing posts to edit, you can use an ‘excerpt’ plugin, such as this one from <a href="http://www.semiologic.com/software/fancy-excerpt/">Semiologic</a>.</p>
<h3>Redirect to a canonical URL</h3>
<p>You should edit your <em>.htaccess</em> file to perform <em>301 redirects</em>. Non-www addresses like <em>yoursite.com</em> should be redirected to <em>www.yoursite.com</em>. URLs without a trailing slash, like <em>www.yoursite.com/category</em>, should be rewritten to include it: <em>www.yoursite.com/category/</em>. The www redirect can be done by inserting the following code into your <em>.htaccess</em> file, ahead of the standard WordPress permalink rules shown with it:</p>
<p><em><br />
RewriteEngine On<br />
# Send any host other than www.yoursite.com to the www address (permanent redirect)<br />
RewriteCond %{HTTP_HOST} !^www\.yoursite\.com$ [NC]<br />
RewriteRule ^(.*)$ http://www.yoursite.com/$1 [R=301,L]<br />
# Standard WordPress permalink rules<br />
RewriteBase /<br />
RewriteCond %{REQUEST_FILENAME} !-f<br />
RewriteCond %{REQUEST_FILENAME} !-d<br />
RewriteRule . /index.php [L]<br />
</em></p>
<p>For more details I advise you to read this: <a target="_blank" href="http://httpd.apache.org/docs/2.0/misc/rewriteguide.html#url">the process of rewriting the URL layout</a>.</p>
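<p>To check a list of your URLs against the canonical form described in this section, a small script can compute where each one should end up. This Python sketch is an illustration only: the function name is made up, <em>www.yoursite.com</em> is the placeholder host used in this article, and treating any last path segment that contains a dot as a file (so that it gets no trailing slash) is an assumption of the sketch.</p>

```python
from urllib.parse import urlsplit, urlunsplit

def canonical_url(url, host="www.yoursite.com"):
    """Return the address a URL should be redirected to.

    Any other host name is replaced with the canonical www host, and a
    trailing slash is added unless the last path segment looks like a file.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    if not path.endswith("/") and "." not in last_segment:
        path += "/"
    # The scheme is fixed to http, as in the redirect target used above.
    return urlunsplit(("http", host, path, parts.query, ""))

print(canonical_url("http://yoursite.com/category"))
# http://www.yoursite.com/category/
```

<p>Any URL for which the result differs from the input is one that should answer with a 301 redirect.</p>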
<h3>Preventing spiders from crawling feeds and auxiliary pages</h3>
<p>For this purpose you should edit your <em>robots.txt</em> file by inserting the following code:</p>
<p><em>User-agent: *<br />
Disallow: /wp-<br />
Disallow: /search<br />
Disallow: /feed<br />
Disallow: /comments/feed<br />
Disallow: /feed/$<br />
Disallow: /*/feed/$<br />
Disallow: /*/feed/rss/$<br />
Disallow: /*/trackback/$<br />
Disallow: /*/*/feed/$<br />
Disallow: /*/*/feed/rss/$<br />
Disallow: /*/*/trackback/$<br />
Disallow: /*/*/*/feed/$<br />
Disallow: /*/*/*/feed/rss/$<br />
Disallow: /*/*/*/trackback/$</em></p>
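<p>Before uploading the file it is worth checking which paths the rules actually block. The Python sketch below is a simplified matcher, not a full robots.txt parser: it assumes the wildcard semantics used by the major crawlers (<em>*</em> matches any run of characters, a trailing <em>$</em> anchors the end of the path, everything else is a prefix match), and the function names are made up for illustration.</p>

```python
import re

ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-
Disallow: /search
Disallow: /feed
Disallow: /comments/feed
Disallow: /feed/$
Disallow: /*/feed/$
Disallow: /*/feed/rss/$
Disallow: /*/trackback/$
Disallow: /*/*/feed/$
Disallow: /*/*/feed/rss/$
Disallow: /*/*/trackback/$
Disallow: /*/*/*/feed/$
Disallow: /*/*/*/feed/rss/$
Disallow: /*/*/*/trackback/$
"""

def rule_to_regex(rule):
    """Translate one Disallow pattern into a compiled regular expression."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))

RULES = [rule_to_regex(line.split(":", 1)[1].strip())
         for line in ROBOTS_TXT.splitlines() if line.startswith("Disallow:")]

def is_blocked(path):
    """True if any Disallow rule matches the path from its start."""
    return any(rule.match(path) for rule in RULES)

for path in ["/feed/", "/2006-11-30/yourpost/trackback/",
             "/2006-11-30/yourpost/", "/wp-admin/"]:
    print(path, is_blocked(path))
```

<p>Under these semantics a single <em>*</em> can also span several path segments, so the deeper patterns overlap with the shorter ones; keeping them does no harm.</p>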
<h3>Another two practical tips</h3>
<p>Some people find it useful to restrict the number of posts displayed on the home page to 4-5, so that fewer posts are duplicated.</p>
<p>Here is a great <a target="_blank" href="http://codex.wordpress.org/Customizing_the_Read_More">article</a> on customizing the <strong>more</strong> tag in WordPress.</p>
<h2>To Sum Up:</h2>
<ul>
<li>To avoid the duplicate content issue in WordPress, you should:</li>
<li>Add a <em>‘noindex, follow’</em> meta tag to your monthly/weekly/daily archives, ‘<em>next entries</em>’ pages and, if necessary, category pages</li>
<li>Ensure that all your pages have unique meta-description tags</li>
<li>Set up <em>301 redirects</em> for your non-www URLs and URLs without trailing slashes</li>
<li>Keep search engine crawlers away from your feeds and trackbacks</li>
<li>Use the <strong>more</strong> tag to show excerpts on your home page instead of full posts</li>
<li>Restrict the number of posts displayed on your home page</li>
</ul>
<h2>References:</h2>
<ul>
<li><a target="_blank" href="http://www.webmasterworld.com/google/3097706.htm">WordPress      And Google: Avoiding Duplicate Content Issues</a> thread in the WebmasterWorld      forum</li>
<li><a href="http://www.webmasterworld.com/google/3084893.htm">Google indexing      /feed URLs Issues</a> thread in the WebmasterWorld forum</li>
<li><a target="_blank" href="http://www.beyondink.com/howtos/301-redirect.html">301 Redirect: a How-to</a> by Beyondink.com</li>
<li><a target="_blank" href="http://httpd.apache.org/docs/2.0/misc/rewriteguide.html">URL      Rewriting Guide</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.seoresearcher.com/how-to-make-your-wordpress-blog-duplicate-content-safe.htm/feed</wfw:commentRss>
		</item>
	</channel>
</rss>
