Duplicate Content

Take a look at your website. How much of your content might be considered duplicate by a search engine algorithm? Even if you never copy anyone, you can’t answer ‘none’, because someone may be copying you. Duplicate content is one of the biggest issues both for search engines trying to keep their results relevant and for webmasters trying to avoid search engine penalties.

Penalties for duplicate content can be really harmful: not just a downgrade in rankings but a move to the supplemental results, which are hardly visible to most web users. Normally Google is expected to select one URL over another to display in the SERPs, while the duplicates end up in the supplemental results. Unfortunately, this is not always the case. In this thread [1] on the WebmasterWorld forum you can read about a case where an original, high-quality and authoritative page was removed from Google’s index together with its duplicates. Considering that this can happen even to the most honest webmaster, one can imagine how much attention this issue gets on any SEO forum.

Types of Duplicate Content

Duplicate content has a wider definition than ‘copy-paste’ plagiarism; it is not just content scraped from a competitor’s site, a SERP or an RSS feed. Apart from this, there are a few more cases that are generally referred to as duplicate content.

Circular Navigation

Jake Baille of TrueLocal loosely defines circular navigation as having multiple paths across a website [2]. In practice this means the same content is accessible via different URLs. An example of circular navigation would be an article that can be reached through links like:
- www.example.com/articles/1/
- www.example.com/article1/
- www.example.com/articles.php?id=1

Another legitimate use of multiple URLs is forum threads. Each thread can be reached by a link like www.myforum.com/index.php/topic.1201.html, and each message within the thread has a URL like www.myforum.com/index.php/topic.1201.msg.01.html. In the eyes of a search engine all these links lead to different pages with identical content. The solution? Settle on a consistent way of linking, or apply robots.txt exclusion rules.
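
For the forum example above, a minimal robots.txt sketch could keep the per-message URLs out of the index while leaving the thread pages crawlable. It assumes the URL patterns shown and a crawler that honours wildcards in Disallow rules, which is an extension supported by Googlebot rather than part of the original robots.txt standard:

    User-agent: *
    # Block per-message URLs such as /index.php/topic.1201.msg.01.html;
    # the plain thread URLs like /index.php/topic.1201.html stay crawlable.
    Disallow: /index.php/topic.*.msg.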

This can also happen when other people link to you using differently formatted URLs. Since these external links are out of your control, you should create a 301 redirect to the canonical URL you want displayed. A tutorial on 301 redirects can be found here [3].
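
As a quick illustration, here is a sketch of such a redirect for the article URLs listed earlier, assuming an Apache server where mod_alias and mod_rewrite are available (the tutorial in [3] covers other setups):

    # .htaccess: send the alternative URLs to the canonical one
    Redirect 301 /article1/ http://www.example.com/articles/1/

    # The query-string variant needs mod_rewrite, because Redirect
    # only matches the path, not the query string.
    RewriteEngine On
    RewriteCond %{QUERY_STRING} ^id=1$
    RewriteRule ^articles\.php$ http://www.example.com/articles/1/? [R=301,L]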

Printer-Friendly Versions

Making a printer-friendly version is a common practice, and it adds value for your visitors. But a printer-friendly version is also a prominent example of duplicate content! Fortunately, a simple solution, adding a ‘noindex’ meta tag to your print pages, takes care of the issue.
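
A minimal example of that tag, placed in the head section of every printer-friendly page; the ‘follow’ value is an optional addition that still lets crawlers follow the links on the page:

    <!-- keep the print version out of the index -->
    <meta name="robots" content="noindex, follow">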

Product-Only Pages

Similar-looking product pages are common in online stores, since they are typically generated from a single template. Often two different product pages share a description that varies in just a few words or numbers, which causes them to be filtered out as duplicate content. This issue has no easy solution. Either you edit robots.txt to allow only one product description to be crawled and lose search engine traffic to the rest, or you roll up your sleeves and add something different to each product page, such as testimonials, which is time-consuming or nearly impossible depending on the number of product types in your stock.
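
If you take the robots.txt route, the idea is simply to leave one page per shared description crawlable and block its near-identical siblings. The paths below are hypothetical, for illustration only:

    User-agent: *
    # Near-duplicate colour variants sharing one description;
    # only /products/widget-blue.html is left crawlable.
    Disallow: /products/widget-red.html
    Disallow: /products/widget-green.html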

How Do Duplicate Content Filters Work?

There are several data mining algorithms that aim to detect similar text passages. The one claimed to be used by search engines [2] is w-shingling [4]. Each document is reduced to a fingerprint, its set of shingles: the contiguous subsequences of w tokens (blocks of text) it contains. The ratio of the sizes of the intersection and the union of two documents’ shingle sets can then be used to measure their resemblance. Other algorithms that can be used for duplicate detection are the Levenshtein distance [5] and Soundex [6].
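
To make the resemblance measure concrete, here is a minimal w-shingling sketch in Python. The whitespace tokenizer and the shingle width of four tokens are arbitrary choices for the example, not parameters any search engine is known to use:

    # w-shingling: fingerprint a document by its set of w-token windows,
    # then compare two documents by the Jaccard ratio of those sets.

    def shingles(text, w=4):
        """Return the set of contiguous w-token subsequences of a text."""
        tokens = text.lower().split()
        return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}

    def resemblance(doc_a, doc_b, w=4):
        """|intersection| / |union| of the two shingle sets (1.0 = identical)."""
        a, b = shingles(doc_a, w), shingles(doc_b, w)
        if not a and not b:
            return 1.0
        return len(a & b) / len(a | b)

    original = "Take a look at your website and count the duplicate pages it may have"
    rewrite = "Take a look at your website and count the duplicate pages it might have"
    # Near-duplicates: the two texts differ in a single token and score 0.69.
    print(f"resemblance: {resemblance(original, rewrite):.2f}")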

It is natural to expect a duplicate content filter to be able to identify the original and rank it higher. The simplest way to detect the original would be to compare indexing dates, on the assumption that the original source is published and crawled earlier than its copies. But with the advent of RSS feeds new content can be distributed instantaneously, and this approach is no longer reliable.

As for the original’s right to be ranked higher, this is not always implemented. In this article [9] you can read about an article distribution experiment. An article was syndicated twice, producing as many as 19,000 copies. After some time Google, Yahoo and MSN purged their indices, leaving just a few of the duplicates. MSN’s filter managed not only to identify the original but also to put it at the top of the search results. Yahoo also identified the original, but in the results page for the article’s title its position fluctuated, evidently reflecting the way Yahoo weighs relevancy and authority.

To the tester’s amusement, Google’s refined index did not include the original at all! Evidently Google kept only those pages carrying copies of the article that it considered relevant and authoritative, with no regard to the original source of the content. I have already mentioned a thread [1] where a similar problem is discussed. Both stories took place in 2005 and early 2006, and so far I have found no evidence that this issue has been resolved.

References and Further Reading on Duplicate Content

  1. ‘Duplicate Content Observation’. WebmasterWorld.com
  2. ‘Duplicate Content Issues’. SERoundtable.com, 2006-02-28
  3. ‘301 Redirect — a How-To’. BeyondInk.com
  4. ‘W-Shingling’. Wikipedia
  5. ‘Levenshtein Distance’. Wikipedia
  6. ‘Soundex’. Wikipedia
  7. ‘Duplicate Content Filter: What It Is and How It Works’. WebConfs.com
  8. CopyScape.com — discovers copied and similar pages.
  9. ‘Duplicate Content Penalties: Problems with Google’s Filter’ by J.S. Cassidy. SEOChat.com

Responses to “Duplicate Content – What You Ought to Know About”

  1. Andy Beard Says:

    Some forms of duplicate content might actually get a bonus in SERPs

  2. oleg.ishenko Says:

    How?

  3. Las Vegas Seo Champion Says:

    Rand of seomoz.org has also illustrated how this process takes place at Google when it encounters duplicate content. Here it is: http://www.seomoz.org/blog/the-illustrated-guide-to-duplicate-content-in-the-search-engines

  4. Introspective Says:

    Should I stop publishing my articles on article directories? I used to submit them regularly, but now I wonder whether I should stop because of the risk of a duplicate content penalty.

  5. James George Says:

    http://copyscape.com/about.php

    This video explains how Copyscape helps you protect your content. I was reading about Copyscape today, and that’s why this article caught my eye.

    Enjoy

  6. Remy Goddess Says:

    It’s true, it does take time to keep your blog from looking like just another product page, so adding extra information is time-consuming. I usually share anecdotes and media about the products and the fashion industry. That way I vary it up and it looks natural. Hopefully ;) Good post.
