I have a suspicion that Google makes a shitload of money off of spam sites. I have a friend who makes upwards of 1000 dollars a month off of scraped content.
If I were 'economically rational', I ought to be doing the same thing instead of screwing around with things like langpop and squeezed books.
I've been reading a certain blog that deals with 'adsensing' and making money on the web through joint ventures, arbitrage, etc. There are people out there who make $100K a month (and some who make over $1M a month) using all kinds of schemes. Yes, Google makes a ton of money off these guys as well. I've never tried any of these things myself...
Why is TechCrunch not suing (or threatening to sue) Google here? Personally, I am very skeptical and critical of our copyright system, but these sites, which are copying content AND getting paid to do it, gain no sympathy from me. How does Google get away with keeping these people in business?
How many largish bloggers are there? 10,000 maybe? Certainly, when the PageRank hits a certain point, keeping a human-verified whitelist of sites is doable.
That is, if something is getting a high enough position on Google, a Google rep could look for a phone number and verify that a human, not a bot, is behind the content.
Of course, this will hurt short-term ad revenue...
You forget something: people who download said kill-splog feature are not going to be running into splogs often (I don't). Other than that, I think it's just fine... good for the economy. Eventually, as Eric S. said, advertisers will wise up and start paying less. If they don't, good for Google.
Spammers are also using Amazon Mechanical Turk to reword posts and articles in dozens of different ways. This helps them get around services such as Attributor.
I've seen software (I think they charge over $4K per install) that uses Markov chains and similar techniques to create content that looks like it was written by humans. You give it a few keywords that you'd like to optimize your site/pages for and this thing does the rest. Google does ban a few of them, but they're getting more sophisticated with every new version, so Google has a really hard time keeping up.
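To give a sense of how simple the core trick can be, here's a minimal sketch of a word-level Markov-chain text generator. This is just my own illustration of the general technique, not the commercial software I mentioned; the corpus filename is made up.

    import random
    from collections import defaultdict

    def build_chain(text, order=2):
        # Map each tuple of `order` consecutive words to the words seen after it.
        words = text.split()
        chain = defaultdict(list)
        for i in range(len(words) - order):
            chain[tuple(words[i:i + order])].append(words[i + order])
        return chain

    def generate(chain, length=100):
        # Random-walk the chain to emit `length` words of plausible-looking text.
        key = random.choice(list(chain.keys()))
        output = list(key)
        for _ in range(length - len(key)):
            next_words = chain.get(key)
            if not next_words:  # dead end: restart from a random state
                key = random.choice(list(chain.keys()))
                next_words = chain[key]
            output.append(random.choice(next_words))
            key = tuple(output[-len(key):])
        return " ".join(output)

    if __name__ == "__main__":
        corpus = open("scraped_articles.txt").read()  # hypothetical scraped corpus
        print(generate(build_chain(corpus)))

Feed it a few thousand scraped articles on your target keywords and the output is gibberish to a careful reader, but it's keyword-dense and statistically "natural" enough to look like real prose to a crawler.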
Mm, so as I understand it, Google has duplicate content detection to punish people who do this kind of thing.
However! If you have a high-ranking site and you scrape from a low-ranking site, Google might index you first and think the originating site was stealing "your" content!
And then there's the fact that it wouldn't be hard to replace words with synonyms, chop up multiple articles, etc., so that Google probably wouldn't detect it. This is what spammers do.
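To see why that can work, here's a rough sketch of shingle-based near-duplicate detection, which is one standard approach (I'm not claiming it's what Google actually uses). Even a couple of synonym swaps knock the shingle overlap well below 1.0, and chopping up multiple sources dilutes it further.

    def shingles(text, k=4):
        # Set of k-word shingles: a common building block for near-duplicate detection.
        words = text.lower().split()
        return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

    def jaccard(a, b):
        # Overlap between two shingle sets: 1.0 = identical, 0.0 = disjoint.
        return len(a & b) / len(a | b) if a | b else 0.0

    original = "the quick brown fox jumps over the lazy dog near the old barn"
    # Same sentence with two words swapped for synonyms, as a spammer's tool might do.
    reworded = "the speedy brown fox leaps over the lazy dog near the old barn"

    print(jaccard(shingles(original), shingles(reworded)))  # roughly 0.33, not 1.0

Whether 0.33 still trips a duplicate filter depends entirely on where the threshold is set, which is exactly the cat-and-mouse game the spammers are playing.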