A limited amount of duplicate content may not hurt your search engine rankings.
However, the problem can get out of control and start to affect your rankings without your knowing it.
This article describes the duplicate content penalty and some precautions you can take against it.
Recognize Duplicate Content
If you update your blog regularly, make sure no two of your posts have the same content, and that no two URLs point to the same page.
In self-hosted blogs, features like print preview, monthly/weekly archive URLs, and topic (category) pages can cause duplicate content. When search engines encounter duplicate content, normally one of the pages is ranked lower. But if the duplicate pages appear intended to manipulate search engine rankings, they may be ranked lower or removed from the index completely.
So, the duplicate content penalty can be very serious.
How Duplicate Content Happens
As noted earlier, make sure no two different URLs point to the same content. An example of this is the Recent Comments widget used in Blogger blogs. If you have this widget in the sidebar, each new comment it shows links to a special URL of the form: http://blogname.blogspot.com/2008/07/thepost.html?showComment=5533434#54545456645
The post’s original URL ends at “thepost.html”. The rest is a query string and fragment added by the widget, which Google will see as a separate URL even though it points to the same page. So, the search engine will treat it as a duplicate page on the same site. If your posts have many comments, the number of duplicate URLs grows with them.
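To make the point concrete, here is a minimal Python sketch that reduces such URLs to the underlying post URL and groups the variants. The function names `canonical_url` and `group_duplicates` are made up for this illustration, and dropping the whole query string is an assumption that suits this Blogger example; on sites where the query string selects different content, it would be too aggressive.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    # Assumption: query and fragment never change the page content.
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, "", ""))


def group_duplicates(urls):
    """Map each canonical URL to its variants, keeping only real duplicates."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_url(url)].append(url)
    return {page: variants for page, variants in groups.items() if len(variants) > 1}


if __name__ == "__main__":
    found = group_duplicates([
        "http://blogname.blogspot.com/2008/07/thepost.html",
        "http://blogname.blogspot.com/2008/07/thepost.html?showComment=5533434#54545456645",
    ])
    for page, variants in found.items():
        print(page, "has", len(variants), "URL variants")
```

Running the same grouping over a list of URLs copied from a site: search is one rough way to spot which posts have picked up extra URLs.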
Another example is the monthly archives in Blogger. These pages are not disallowed by a robots.txt directive, and each one contains all the posts published in a particular month, so they can cause duplicate content.
Fortunately, the category (label) pages in Blogger are disallowed automatically in Blogger's robots.txt file (accessible from: ). So, they are not indexed by search engines and do not cause duplicate content issues.
How to Fight Duplicate Content
You can check whether search engines have indexed any duplicate URLs from your blog with the site: search operator:
1. Search Google for site: followed by your blog's URL (no space after the colon).
2. Google will show you the number of search results. This is roughly the number of pages indexed from your site. If it is much larger than the total number of posts on your blog, duplicate URLs have very likely been indexed.
3. Now, you have to request removal of all duplicate URLs in Google Webmaster Tools.
Removal Request at Google Webmaster Tools
You can request removal of all your duplicate URLs in Google Webmaster Tools. However, before sending a request, make sure that you have dealt with the duplicate content on your blog. You can do any of these:
1. Remove the duplicate content entirely from your blog, so that the URL returns a 404 (Not Found) status.
2. Disallow the duplicate content with a robots.txt Disallow directive.
3. Use a robots meta noindex tag to stop the page from being indexed.
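For the third option, it can help to confirm that the tag is really present in the page before you send the request. The following is a rough sketch using only Python's standard library; `RobotsMetaParser` and `has_noindex` are names invented for this example, and it only inspects HTML you pass in, so fetching the page is left to you.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() == "robots":
            # The content attribute is a comma-separated list of directives.
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page carries a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(has_noindex(page))
```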
After doing one of these, request a removal in Webmaster Tools:
1. Log in to Google Webmaster Tools.
2. Go to your site’s profile and choose Tools.
3. Under Remove URLs, click New Removal Request, and follow the site directions.
Once the removal request has been submitted, it will be in pending status and will be approved or rejected within a day’s time.
In self-hosted blogs, you have control over the robots.txt file. You should disallow all pages that may cause duplicate content. This includes monthly archive pages, categories, and print preview pages.
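As an illustration, the sketch below holds a sample robots.txt in a Python string and uses the standard library's `urllib.robotparser` to check which URLs it blocks. The paths `/print/`, `/category/`, and `/archives/` and the `example.com` URLs are hypothetical; use whatever paths your own blog software generates. The helper name `is_crawlable` is also made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for a self-hosted blog; adjust the paths to your own site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
Disallow: /category/
Disallow: /archives/
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://example.com/2008/07/thepost.html",
        "http://example.com/print/thepost.html",
        "http://example.com/category/seo/",
    ):
        status = "allowed" if is_crawlable(ROBOTS_TXT, url) else "disallowed"
        print(url, "->", status)
```

Testing the rules this way before uploading the file is a cheap guard against accidentally disallowing the posts themselves.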
Even with these precautions in place, you should check that nobody links to these pages from their own sites. If you find that another blogger links to one of your disallowed pages, contact them and explain the situation.
As we said at the beginning, duplicate content can be a serious problem.
All bloggers who want their pages to rank high in search results should be aware of it.
Content duplication is bad not only for search rankings but also for user experience.
If you scrape content from a popular blog and one of your readers finds out, that reader will most probably unsubscribe from yours.