Alternative Views: A list of webpages that have been affected by Panda and should not have been

The purpose of this article is to list "credible" webpages that have been affected by Google Panda but should not have been. Note that I am using the word "webpages" and not "websites": the distinction has implications for the "sitewide" demotion that Panda attempts to levy on sites of "poor quality".

A lengthy discussion on the Google Webmaster forum has revealed a large number of websites affected by Google Panda, most of which were probably demoted in the true spirit of "demoting low value" websites. But a large number of webpages seem to have been unduly affected by this update. Please note that this list is accurate at the time of writing and things may change with time.

Most of this list has been taken from the Google Webmaster forum thread started by Wysz, a Google employee, who says: "If you know of a high quality site that has been negatively affected by this change, please bring it to our attention in this thread."

We have selected only those websites that are close to what Wysz wanted, with one change: while Wysz wanted a list of "high quality sites", we are instead listing "high quality webpages". We sincerely believe that a search engine should work on a page basis instead of a site basis, since otherwise a low quality page on a high quality site can rank higher than it deserves (and a high quality page on a low quality site lower), which is not the result a search engine or a searcher should be trying to achieve.

One of the most serious issues we have seen is the reports by website owners of scraping sites ranking higher than the original websites. And this is not about one or two cases but a plethora of cases. While this issue existed before the Panda update, it affected only the websites that were manually degraded. Post-Panda, all a scraper needs to do is find websites affected by Panda, scrape them and repost the content, and it can be assured of a high SERP position.

Many of the pro-Googlers have taken the stand that it is not Google's responsibility to check for duplicate content; it is the website owner's responsibility to file a DMCA notice and get the duplicated content taken down. There are several drawbacks to this scheme. First, filing a DMCA notice is a time-consuming process. You need to find the website owner's details and email address. You then need to draft an email detailing the copied content. And if the copied content runs over several webpages, or if the scraper copies several webpages, you will have to show which pages have been copied. The time taken to file the DMCA notice can exceed the time it takes to create fresh content. Having webmasters chase individual scrapers with DMCA notices is not a practical solution.

The author has seen cases where a scraper took down the website's content after a DMCA notice was filed, but the webpage was back the next day at the same URL with the content changed. The scraper realises that this URL must have value, since it attracted a DMCA notice. He therefore does a quick respin of the article and starts getting traffic again. The loser is the original content creator.

There is another practical issue when filing a DMCA complaint. If you run AdSense ads the site that you demand to remove your content can click on the ads in such a way that you get your account cancelled, or, if it is a really big wicked scraper site, its staff can run denial of service (DOS) attacks on your site.

- by hotwinduk at http://www.google.com/support/forum/p/Webmasters/thread?tid=76830633df82fd8e&hl=en&start=3880

So, coming back to the point, here is the list, with the results and the ranks.

1. The user clickonf5 has reported scrapers ranking highly with content taken from his webpage.

The search term in question is

"things to do before selling old pc"

The screenshot showing this is here

http://www.clickonf5.org/wp-content/uploads/2011/06/google-panda-effect-1.png

Except for the top Yahoo Answers pages, all 6 results are scraped content from the original webpage at www.clickonf5.org. Reportedly, the original page does not figure in the results.

2. Another good example, in which a so-called "authority site" carrying stolen content outranks a small publisher's original content, has been provided by Marcus S at

http://www.google.com/support/forum/p/Webmasters/thread?tid=76830633df82fd8e&hl=en&start=3880

example: search 'average speed of a cyclist'
result 1 is wiki.answers
result 5 is road-bike.co.uk (my site)
the answers . com page is taken directly without permission from my site and reposted without permission

3. At

http://www.google.com/support/forum/p/Webmasters/thread?tid=76830633df82fd8e&hl=en&start=3760

Pete Carpenter reports on his website rc-airplane-world.com. He adds: "I have already expressed my dissatisfaction with the new-look search results for anything radio control related, but more recently my site, which ranked #1 for the term "rc airplanes" for several years, is now preceded by this result -"

"Dr. Thomas C. Smith
Dr. Thomas Smith, has been providing chiropractic and acupuncture services, in the Camrose area, since 1998. Supporting a broad range of patients, ...
www.rcairplanes.com/ - Cached"

4. webbartie from http://www.google.com/support/forum/p/Webmasters/thread?tid=76830633df82fd8e&hl=en&start=3760

adds

Having been hit hard by Panda, I've started to chase those sites which have duplicated my copyright material, I find these by putting phrases from my diydata.com pages into google search. From the first three pages I've checked: -

Putting in "If you are replacing an existing dado rail, it is probably best to keep to the same height" - brings up as number one http://cawehirur68.multiply.com/journal/item/9/Dado_Rail_Height which is just gibberish, no meaningful content and just links to other sites.

Putting in "We explain the most common types of catches used around the house below." - although I'm number one, the other result is www.growinglifestyle.co.uk/uk/j22275056 , but selecting that takes you to http://growinglifestyle.co.uk/j/catches-latches-and-locks/ which is just a marketplace - looking at the google cached version (10 May) gives a completely different page.

Both these sites seem to have been constructed purely to get good google rankings which should, I consider, be probably banned by google to 'improves overall search quality.'

5. EricLegge complains at

http://www.google.com/support/forum/p/Webmasters/thread?tid=76830633df82fd8e&hl=en&start=3760

The Google PageRank of this site - http://www.scribd.com/ - is still a huge 8 and it has copies whole pages from my site pcbuyerbeware.co.uk, such as these:

http://www.scribd.com/doc/32164038/CPU-Motherboard-Properties-and-Installation

The links to my site are left in in that scrape.

Here is a copy that I found in Google's cache that has the internal links to my site removed - http://tinyurl.com/3qjcgzo - but they were probably left in in the copied page.

It copies a great deal of its information from the web, including from Wikipedia, such as this page - http://www.scribd.com/doc/38304580/Motherboard

If you search for the term "motherboard", almost all of the content is scraped from other sites.

If I search for a sentence taken from my site of any of those copied pages in Google, that site has a higher PageRank than mine so the search returns the link from that site higher than the one from my site.

This is why I think that my site took such a bad hit from the Panda updates. Pages have been copied by other scraper sites that have a higher PageRank than mine so my site is deemed as the copying site and demoted. What can you do about that? Google clearly can't tell the difference between a scraper site and the original even when the links to your site are left in the scraped material.

The site runs AdSense, surprise, surprise. How did it get permission to run AdSense when anyone who examines the site can see that it scrapes from other sites?

6. http://www.google.com/support/forum/p/Webmasters/thread?tid=60052b1100203d42&hl=en

Here are two urls that are 100% duplicate content. The orginial url http://bit.ly/lrlz4b has been around since 2001. http://bit.ly/kfxY5u has been registered since 2008. http://bit.ly/lrlz4b beginning last week was ranking #6 in google for teacup yorkie. Just recently http://bit.ly/kfxY5u showed up in google ranking #4 for teacup yorkie and http://bit.ly/lrlz4b is now nowhere to be found for teacup yorkie.

Does this sound familiar to anyone? It’s no wonder I see so many complaints about stolen content ranking higher than the original. And in this case one can’t say http://bit.ly/kfxY5u is ranking because it is on a more authoritive site or it is better optimized.

I believe that by google penalizing the original site and rewarding the second site without performing due diligence on who has the orginial content they have just opened themselves up to a class action lawsuit.
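Several of the reports above, webbartie's in particular, find copies by searching for exact phrases taken from the original page. Once a suspected copy has been fetched, the same comparison can be approximated offline. What follows is only a minimal sketch in Python, assuming both pages are already available as plain text; the function names `shingles` and `containment` and the choice of 5-word phrases are illustrative assumptions, and nothing here describes how Google itself detects duplicates.

```python
import re

def shingles(text, k=5):
    """Return the set of k-word phrases ("shingles") in text.

    Text is lowercased and punctuation is dropped, so small formatting
    changes made by a scraper do not hide a copied phrase.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def containment(original, candidate, k=5):
    """Fraction of the original's k-word phrases that also appear in candidate."""
    source = shingles(original, k)
    if not source:
        # Original is shorter than k words: nothing to compare.
        return 0.0
    return len(source & shingles(candidate, k)) / len(source)

# Example using a sentence quoted earlier in this article.
original = ("If you are replacing an existing dado rail, it is probably "
            "best to keep to the same height")
suspect = "Welcome to our site. " + original + " Click here for more."
score = containment(original, suspect)
```

A score close to 1 means nearly every phrase of the original appears in the suspected copy; a respun article of the kind described earlier would score lower, so any threshold is a judgment call rather than a fixed rule.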

A list of sites that should have been demoted but have not been

1. zimbio.com
2. www.ebay.com
3. www.ehow.com

Friday, June 10, 2011
