There are teams that actively (and solely) work on spam and abuse detection. They're larger than you seem to believe, though I won't give exact numbers. There are also, obviously, SRE teams that maintain uptime. (Note I said teams, and there's a public approximate minimum size for an SRE team at Google of 8-12 people.)
The problem is that spam can't be "solved". Reputation is easy to solve: only accept email from a known list of good senders. Gmail, Yahoo, MailChimp (or not), etc. But that makes people on HN complain. So you have to try and infer the reputation of mailservers on shared hosts. And spammers are always trying to beat you, and there are thousands, maybe tens of thousands of spam outfits. And they're sneaky. They try to use AWS or GCP to send email, or even send spam from Gmail, or even trickier things. So you're left to defend against a spam campaign from Yahoo while also trying to not block everyone at Yahoo, and detect the spammers who are using Gmail to spam Yahoo too.
And the spammers are always innovating, so you have to as well.
My personal belief is that Google likely considers spam detection to be an area of competitive advantage so investments are warranted.
> The problem is that spam can't be "solved". Reputation is easy to solve: only accept email from a known list of good senders. Gmail, Yahoo, MailChimp (or not), etc. But that makes people on HN complain. So you have to try and infer the reputation of mailservers on shared hosts. And spammers are always trying to beat you, and there are thousands, maybe tens of thousands of spam outfits. And they're sneaky. They try to use AWS or GCP to send email, or even send spam from Gmail, or even trickier things. So you're left to defend against a spam campaign from Yahoo while also trying to not block everyone at Yahoo, and detect the spammers who are using Gmail to spam Yahoo too.
Spam absolutely can be solved.
1) enforce identity. If the sender isn't authentic, then the sender is spam.
2) enforce reportability. If the user reports the sender as spam, then don't permit the sender to send more messages to the person who complained. If a lot of people report the problem, then block the sender.
3) enforce liability. If an ISP hosts spammers, then block the ISP.
If someone complains then walk them through the process. Just like people shouldn't drive vehicles without understanding that vehicles are dangerous, the same should be done with computers.
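For concreteness, the three rules proposed above (identity, reportability, liability) could be sketched roughly as follows. This is only an illustration: the `SpamPolicy` class, its method names, and both thresholds are hypothetical and do not come from the thread, and `authenticated` is assumed to be decided elsewhere.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical thresholds; the thread gives no numbers.
SENDER_BLOCK_THRESHOLD = 3  # distinct reporters before a sender is blocked for everyone
DOMAIN_BLOCK_THRESHOLD = 2  # blocked senders before the whole domain is blocked


def domain_of(address):
    """Return the part of an address after the last '@'."""
    return address.rsplit("@", 1)[-1].lower()


@dataclass
class SpamPolicy:
    # sender address -> set of recipients who reported that sender
    reports: dict = field(default_factory=lambda: defaultdict(set))
    blocked_senders: set = field(default_factory=set)
    blocked_domains: set = field(default_factory=set)

    def report(self, recipient, sender):
        """Rule 2 (reportability), escalating to rule 3 (liability)."""
        self.reports[sender].add(recipient)
        if len(self.reports[sender]) >= SENDER_BLOCK_THRESHOLD:
            self.blocked_senders.add(sender)
            domain = domain_of(sender)
            blocked_here = sum(
                1 for s in self.blocked_senders if domain_of(s) == domain
            )
            if blocked_here >= DOMAIN_BLOCK_THRESHOLD:
                self.blocked_domains.add(domain)

    def accepts(self, recipient, sender, authenticated):
        """Decide whether a message from sender to recipient is delivered."""
        if not authenticated:  # rule 1 (identity)
            return False
        if domain_of(sender) in self.blocked_domains:  # rule 3
            return False
        if sender in self.blocked_senders:  # rule 2, many reporters
            return False
        # Rule 2, single reporter: only that recipient stops receiving mail.
        return recipient not in self.reports.get(sender, set())
```

Note that once a domain crosses the second threshold, every other sender at that domain is rejected too, which is the trade-off the replies below argue about.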
What is identity? A name? An SSN? How do you verify that for people in all the countries of the world?
How do you do that at scale? With hundreds of millions of users, you can't exactly call them up.
How many users are you going to have after you start adding measures to verify their identities at signup? How will the board of directors feel about that? And feel free to run your own company into the ground doing the right thing, but there are other email providers in the world who will happily accept the users you drive away.
What happens when people have their accounts taken over and start spamming? Were the accounts ever "real"? How can you even know?
What happens when the reports themselves are spam? Spammers will report other spammers to remove the competition. Or they'll just overwhelm it with useless fake reports to DOS your human reviewers.
You have to realize that every input to your system is a potential avenue for abuse. There are people sitting there all day thinking about how to prevent you from achieving your goals. Humanity went to the moon; we're problem solvers. If there's a way to manipulate and undermine your spam defenses, it will be found.
> If the sender isn't authentic, then the sender is spam.
What's your definition of authentic? Is a self-hosted email server authentic? How do you decide?
> If the user reports the sender as spam, then don't permit the sender to send more messages to the person who complained.
If 10000 Yahoo accounts are sending spam emails to other websites, what do you do? Block all Yahoo senders? Try to block the Yahoo accounts as they appear?
> 3) enforce liability. If an ISP hosts spammers, then block the ISP.
All major ISPs host spammers. Often they don't know that they do. Is it worth cutting off all Comcast users nationwide from being able to use email? If anything, this would further centralize email on one or two trustworthy hosts, because those providers are essentially their own ISPs.
> What's your definition of authentic? Is a self-hosted email server authentic? How do you decide?
Authentic in terms of DNS. That means using and enforcing DKIM at the minimum.
Also in terms of from: and reply-to: addresses matching each other.
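A rough sketch of those two checks, using only Python's standard `email` package. The function names are made up for illustration. It also assumes that your own receiving server has already verified the DKIM signature and recorded the outcome in an `Authentication-Results` header; this code does not verify any signature itself, and a header added by an untrusted hop would prove nothing.

```python
from email import message_from_string
from email.utils import parseaddr


def dkim_passed(msg):
    """True if an Authentication-Results header records dkim=pass.

    Assumption: the header was added by our own, trusted receiving server.
    """
    for value in msg.get_all("Authentication-Results", []):
        for part in value.split(";"):
            if part.strip().lower().startswith("dkim=pass"):
                return True
    return False


def reply_to_matches_from(msg):
    """True if Reply-To is absent or names the same address as From."""
    _, from_addr = parseaddr(msg.get("From", ""))
    if not from_addr:
        return False
    reply_to = msg.get("Reply-To")
    if reply_to is None:
        return True
    _, reply_addr = parseaddr(reply_to)
    return from_addr.lower() == reply_addr.lower()


def looks_authentic(msg):
    """Both checks from the comment above, combined."""
    return dkim_passed(msg) and reply_to_matches_from(msg)


raw = (
    "Authentication-Results: mx.example.org; dkim=pass header.d=example.com\n"
    "From: Alice <alice@example.com>\n"
    "Reply-To: alice@example.com\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)
print(looks_authentic(message_from_string(raw)))
```

Treating a missing `Reply-To` as a match is a choice made for this sketch, not something the comment specifies.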
> If 10000 Yahoo accounts are sending spam emails to other websites, what do you do? Block all Yahoo senders? Try to block the Yahoo accounts as they appear?
If 10000 yahoo accounts are sending spam emails, then that's a Yahoo problem. Yes, I would refuse to accept incoming mail from @yahoo.com until they've fixed their complicity.
> All major ISPs host spammers. Often they don't know that they do.
I disagree about not knowing that they do. ISPs must respond to fraud and abuse reports or they would lose the ability to do business. ISPs not responding to spam reports are offloading the cost of policing their users onto you.
> Authentic in terms of DNS. That means using and enforcing DKIM at the minimum.
Sure, these are basic things that are generally used as strong signals, but all this does is filter out the incompetent spammers. If you're sending from Yahoo or from Gmail, you've already solved the reputation problem. And there are other ways of doing the same.
> If 10000 yahoo accounts are sending spam emails, then that's a Yahoo problem. Yes, I would refuse to accept incoming mail from @yahoo.com until they've fixed their complicity.
I'd expect that this is approximately the baseline number of Yahoo accounts sending spam when they aren't being actively targeted. It's less than 1% of 1% of the monthly active accounts on Yahoo. So you'd like to just block Yahoo constantly?
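To spell out that arithmetic: "1% of 1%" is one account in 10,000, so 10,000 spamming accounts being less than that share implies more than 100 million monthly active accounts. A tiny sketch, using only the figures quoted in this thread (the names are made up for illustration, and no real Yahoo account count is assumed):

```python
SPAM_ACCOUNTS = 10_000  # figure from the comment being replied to
ONE_IN = 100 * 100      # "1% of 1%" is 1 account in 10,000


def implied_minimum_accounts(spam_accounts, one_in):
    """Account total at which spam_accounts is exactly 1 in one_in."""
    return spam_accounts * one_in


# "Less than 1% of 1%" means the real total must exceed this figure.
print(implied_minimum_accounts(SPAM_ACCOUNTS, ONE_IN))  # 100000000
```

Put differently, a domain-wide block at that ratio rejects at least 9,999 non-spamming accounts for every spamming one.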
> I disagree about not knowing that they do.
Sure they know, in the sense that I also know that there are always people spamming from every major ISP. That doesn't mean that they can immediately address things. And while you're busy blocking all Comcast users from sending your users email, your users are busy moving to a different email provider that identifies individual spam senders so that they can still receive legitimate email.
In closing, a simple question: if solving spam is this straightforward, why hasn't an upstart competitor (Yahoo, ProtonMail, etc.) taken advantage of this strategy to fix the spam problem? It appears you're presuming a centralized system, which defeats the point of email and significantly simplifies the problem.
Internet systems are one part technology and one part social.
If my mail server is banning mail from Yahoo, I can't communicate with my grandparents and I stop using that mail server. Enough people do that and the mail server has no users.
inetknght, I get the sense that you run a mail server of your own. Have you taken your own advice here and blocked @yahoo.com incoming? Is it inconvenient? Is it more inconvenient than the two-step process of setting up a Gmail account?
> If my mail server is banning mail from Yahoo, I can't communicate with my grandparents and I stop using that mail server. Enough people do that and the mail server has no users.
Why are your grandparents using Yahoo instead of your mail server?
> Have you taken your own advice here and blocked @yahoo.com incoming? Is it inconvenient? Is it more inconvenient than the two-step process of setting up a Gmail account?
I haven't had any correspondence from anyone who uses @yahoo.com. Or, if I have, they haven't complained about me not receiving their email. Or, if they have, their complaint was also not received in which case it doesn't exactly matter. If it did matter then I would address it then. And, importantly, it also means there's another (less noisy) communication medium available already.
> Why are your grandparents using Yahoo instead of your mail server?
Because internet systems are one part technology and one part social. My grandparents already have Yahoo accounts and are unwilling to change that.
And if your solution to interoperating with Yahoo servers is "I don't have anyone to talk to using Yahoo servers," then I'm afraid it sounds like you're trying to solve a problem other than the one email is designed to solve.