I have a suspicion that Google makes a shitload of money off of spam sites. I have a friend who makes upwards of 1000 dollars a month off of scraped content.
If I were 'economically rational', I ought to be doing the same thing instead of screwing around with things like langpop and squeezed books.
I've been reading a certain blog that deals with 'adsensing' and making money on the web through joint ventures, arbitrage, etc. There are people out there who make $100K a month (and some who make over $1M a month) using all kinds of schemes. Yes, Google makes a ton of money off these guys as well. I've never tried any of these things myself...
Why is TechCrunch not suing (or threatening to sue) Google here? Personally, I am very skeptical and critical of our copyright system, but these sites, which are copying content AND getting paid to do it, gain no sympathy from me. How does Google get away with keeping these people in business?
How many largish bloggers are there? 10,000 maybe? Certainly, when the PageRank hits a certain point, keeping a human-verified whitelist of sites is doable.
That is, if something is getting a high enough position on Google, a Google rep could look for a phone number and verify that a human, not a bot, is behind the content.
Of course, this will hurt short-term ad revenue...
You forget something: people who download said kill-splog feature are not going to be running into splogs often (I don't). Other than that, I think it's just fine... good for the economy. Eventually, as Eric S. said, advertisers will wise up and start paying less. If they don't, good for Google.
Spammers are also using Amazon Mechanical Turk to reword posts and articles in dozens of different ways. This helps them get around services such as Attributor.
I've seen software (I think they charge over $4K per install) that uses Markov chains and similar techniques to create content that looks like it was written by humans. You give it a few keywords that you'd like to optimize your site/pages for and this thing does the rest. Google does ban a few of them, but they're getting more sophisticated with every new version, so Google has a really hard time keeping up.
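To give a sense of how simple the core trick can be, here's a minimal sketch of a word-level Markov-chain text generator. This is just my own illustration of the general technique, not the commercial software I mentioned; the corpus filename is made up.

    import random
    from collections import defaultdict

    def build_chain(text, order=2):
        # Map each tuple of `order` consecutive words to the words seen after it.
        words = text.split()
        chain = defaultdict(list)
        for i in range(len(words) - order):
            chain[tuple(words[i:i + order])].append(words[i + order])
        return chain

    def generate(chain, length=100):
        # Random-walk the chain to emit `length` words of plausible-looking text.
        key = random.choice(list(chain.keys()))
        output = list(key)
        for _ in range(length - len(key)):
            next_words = chain.get(key)
            if not next_words:  # dead end: restart from a random state
                key = random.choice(list(chain.keys()))
                next_words = chain[key]
            output.append(random.choice(next_words))
            key = tuple(output[-len(key):])
        return " ".join(output)

    if __name__ == "__main__":
        corpus = open("scraped_articles.txt").read()  # hypothetical scraped corpus
        print(generate(build_chain(corpus)))

Feed it a few thousand scraped articles on your target keywords and the output is gibberish to a careful reader, but it's keyword-dense and statistically "natural" enough to look like real prose to a crawler.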
Mm, so as I understand it, Google has duplicate content detection to punish people who do this kind of thing.
However! If you have a high-ranking site and you scrape from a low-ranking site, Google might index you first and think the originating site was stealing "your" content!
And then there's the fact that it wouldn't be hard to replace words with synonyms, chop up multiple articles, etc., so that Google probably wouldn't detect it. This is what spammers do.
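To see why that can work, here's a rough sketch of shingle-based near-duplicate detection, which is one standard approach (I'm not claiming it's what Google actually uses). Even a couple of synonym swaps knock the shingle overlap well below 1.0, and chopping up multiple sources dilutes it further.

    def shingles(text, k=4):
        # Set of k-word shingles: a common building block for near-duplicate detection.
        words = text.lower().split()
        return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

    def jaccard(a, b):
        # Overlap between two shingle sets: 1.0 = identical, 0.0 = disjoint.
        return len(a & b) / len(a | b) if a | b else 0.0

    original = "the quick brown fox jumps over the lazy dog near the old barn"
    # Same sentence with two words swapped for synonyms, as a spammer's tool might do.
    reworded = "the speedy brown fox leaps over the lazy dog near the old barn"

    print(jaccard(shingles(original), shingles(reworded)))  # roughly 0.33, not 1.0

Whether 0.33 still trips a duplicate filter depends entirely on where the threshold is set, which is exactly the cat-and-mouse game the spammers are playing.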