How To Find Duplicate Content & Catch Content Stealers

May 21, 2012

Article provided by iClimber Content Writing Service.

After publishing a lot of guest posts on this blog recently, I found two main issues that I really had to pay attention to.

The first issue was setting standards that would ensure that only quality content was posted.

I had to knock some people back, simply because I felt that if I had posted those articles it would have degraded the quality of the website.

No one wants to visit a site that has poorly written content, full of spelling and grammar mistakes, or substandard rehashed information in general.

The second (and major) issue was publishing duplicate content on the site. Publishing duplicate content might not have mattered that much in the early days of the internet, but it certainly does now. With Google’s strict rules and algorithms now in place, publishing duplicate content is detrimental to your site because Google can penalise your entire site for it.

A small percentage of the content writers that I dealt with submitted duplicate content, in the hope that I would not realise it had already been published. I have always been a person who doesn’t trust anyone that I don’t know very well, so I made sure that I checked every article for duplicate content before I even considered whether the quality was good enough to publish.

So, now that we have established that you should never publish duplicate content, the question is: how do we detect that content is not unique and has been copied?

Two simple methods for finding duplicate content:

1. Copyscape

This is probably the best-known way to check for duplicate content. You simply copy and paste the URL of your webpage into the search box and click on “Go”.

Copyscape is not bad for a basic duplicate content checker, but to get decent results from it you need to upgrade to the paid version. The free version only checks for exact matches of the entire page and does not check individual paragraphs of text, so you would need to check paragraphs separately.

Obviously they want you to pay for the good version, and only provide a very basic version of the real thing for free.

2. Google Search

Yes, I know this sounds too obvious, but Google search is a perfect way of checking for duplicate content. The trick is to copy and paste only small paragraphs into the search box, and to check random blocks of your content. Google only accepts the first 32 words of the phrase (at the time of writing), so there is no point in pasting in a whole article.

The image above shows the result of searching for the previous paragraph of text. The words that Google has matched to the search query are shown in bold text. As you can see, there are not many bolded words, and there are ellipses (…) separating various sentences. This is a sure sign that the paragraph is not duplicate content.

But, if you were to see the entire search result in bold text (without … ), then the paragraph is an exact match and definitely duplicate content. You will more than likely find other paragraphs of the content are copied as well.
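Once a search turns up a suspicious page, the two texts can also be compared directly rather than by eye. The following is only a minimal sketch of one common approach (word "shingles", that is, overlapping runs of words), not something the search method above depends on. The function names and the 5-word window are assumptions chosen for illustration.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def copied_fraction(submitted, published, size=5):
    """Fraction of the submitted text's shingles that also occur in the published text."""
    submitted_shingles = shingles(submitted, size)
    if not submitted_shingles:
        # Text shorter than one shingle: nothing to compare.
        return 0.0
    shared = submitted_shingles & shingles(published, size)
    return len(shared) / len(submitted_shingles)
```

A result close to 1.0 suggests the submitted text is largely lifted from the published page, while a result close to 0.0 suggests the overlap is incidental. Where to draw the line is a judgment call.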

You only need to check 3 or 4 random paragraphs of a piece of content to discover if it is duplicate content or not. This is without doubt my preferred and recommended method of checking for unique content, and it is free.
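If you check a lot of submissions, picking the random paragraphs and trimming them to the word limit can be scripted. This is a rough sketch only: the function names are hypothetical, the 32-word limit is the figure quoted above and may change, and the resulting phrases (or URLs) are still meant to be opened and read by hand in a browser.

```python
import random
import re
from urllib.parse import quote_plus

MAX_QUERY_WORDS = 32  # limit quoted in the article; treat it as an assumption


def split_paragraphs(text):
    """Split text into non-empty paragraphs separated by blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def build_queries(text, sample_size=4, seed=None):
    """Pick random paragraphs and trim each one to the query word limit."""
    paragraphs = split_paragraphs(text)
    rng = random.Random(seed)
    chosen = rng.sample(paragraphs, min(sample_size, len(paragraphs)))
    return [" ".join(p.split()[:MAX_QUERY_WORDS]) for p in chosen]


def search_url(phrase):
    """Build a Google search URL for the phrase."""
    return "https://www.google.com/search?q=" + quote_plus(phrase)
```

Calling build_queries(article_text) returns up to four phrases ready to paste into the search box, and search_url(phrase) turns one of them into a link.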

What if other people copy my content and use it on their site?

Once you have published content you automatically run the risk of other people stealing it, and publishing it on their site. And if people subscribe to your RSS feed, they will get instant notification that you have just published new content.

The big problem with this is that whichever site gets crawled first by the Google bots will be credited with “owning the content”. So if an RSS subscriber quickly copies and publishes your fresh content, and their site gets crawled before yours does, then Google assumes that they created the content.

Google will scan your website, do its best to determine the interval between your blog posts, and crawl your website at a frequency based on that. So the more regularly you add content, the more regularly your site is going to be crawled.

It is for this reason that I schedule my blog posts to be published every 12 hours precisely to the minute, and this way Google picks up on that and crawls my site every 12 hours or so as well.
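To plan such a schedule in advance, the publish times can be generated with a few lines of code. This is just an illustrative sketch: the function name and the example start time are made up, and the times still have to be entered into your blog's own post scheduler.

```python
from datetime import datetime, timedelta


def publish_times(first, count, interval_hours=12):
    """Return `count` publish times spaced `interval_hours` apart, starting at `first`."""
    step = timedelta(hours=interval_hours)
    return [first + i * step for i in range(count)]


if __name__ == "__main__":
    # Hypothetical start time; prints one scheduled slot per line.
    for slot in publish_times(datetime(2012, 5, 21, 8, 0), count=4):
        print(slot.strftime("%Y-%m-%d %H:%M"))
```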

So, this means that there is an extremely small window of opportunity for content stealers to publish my content before my website gets crawled, and my content is picked up by Google.

If the content is published elsewhere after my site has been crawled then the content stealer gets penalised for duplicate content. So to sum up, whichever site Google crawls first will be credited as the creator of the content.

A plugin to stop people highlighting text or saving images

To make it even harder for content stealers, it is recommended to install a plugin called WP-Protect.

This is an awesome plugin that blocks visitors from selecting text or saving images by disabling the usual options to highlight and copy text, and to copy or save images. Content thieves will often just highlight an entire article, then copy and paste it onto their own blog. This free plugin will put a stop to that immediately, and is highly recommended.

If you are not taking duplicate content seriously, then it is time to change your opinion and ensure that all content published on your blog or website is credited to you by the major search engines. If it is not, your rankings will suffer a lot from duplicate content penalties.


Article provided by iClimber. iClimber offers social media marketing and content writing services. Visit their site today to learn more and increase your website traffic.