Hacker Newsnew | past | comments | ask | show | jobs | submit | jmalicki's commentslogin

The article is sparse on what pending means, but I would guess that that where condition would be enough?

Health care premiums are expensive, but even the best insurance I've ever seen was a fraction of the compensation software employees were making.

For some fields that's a huge amount of your compensation, but for software engineers it's noticeable but not going to be worth doing layoffs over by themselves.


This isn't even training on the test data.

This is modifying the test code itself to always print "pass", or modifying the loss function computation to return a loss of 0, or reading the ground truth data and having your model just return the ground truth data, without even training on it.


If you're prepared to do that you don't even need to run any benchmark. You can just print up the sheets with scores you like.

There if a presumption with benchmark scores that the score is only valid if the benchmark were properly applied. An AI that figures out how to reward hack represents a result not within the bounds of measurement, but still interesting, and necessitates a new benchmark.

Just saying 'Done it!' is not reward hacking. It is just a lie. Most data is analysed under the presumption that it is not a lie. If it turns out to be a lie the analysis can be discarded. Showing something is a lie has value. Showing that lying exists (which appears to be the level this publication is at) is uninformative. All measurements may be wrong, this comes as news to no-one.


I think the point of the paper is to prod benchmark authors to at least try to make them a little more secure and hard to hack... Especially as AI is getting smart enough to unintentionally hack the evaluation environments itself, when that is not the authors intent.

They said they used things like submitted a `conftest.py` - e.g. what would be considered very blatant cheating, not just overfitting/benchmaxxing. Did you read the AI slop in the post?

This is basically a paper about security exploits for the benchmarks. This isn't benchmark hacking like having hand coded hot paths for a microbenchmarks, this is hacking like modifying the benchmark computation code itself at runtime.


I get it, but why would anyone trust what these companies say about their model performance anyway. Everyone can see for themselves how well they complete whatever tasks they're interested in.

There aren't taxes on datacenters in Texas. They gain virtually nothing from them!

https://www.texastribune.org/2026/04/08/texas-data-centers-s...


That ignores all the tax revenue they bring in at the local level. Virginia also has tax exemptions at the state level, but as another commentator points out, data centers are delivering a huge share of tax revenue in places like Loudon County.

And of course you can (and should!) get rid of those state tax exemptions which have served their purpose.


This is the market telling you what matters.

OpenClaw has been an outstanding success, it is providing people the ability to leak their keys, secrets, and personal data, and allowing people to be subject to an incredible number of supply chain attacks when its users have felt their attack surface was just too low.

Your efforts have been on increasing security and reducing supply chain attacks, when the market is strongly signaling to you that people want reduced security and more supply chain attacks!


The breaches will continue until morale improves.

What do you mean? He has a rabid fan base who loves him because he writes like he talks!

It clearly works for him - I hate how he talks, but he seems to be an effective communicator if you only judge by results. Sadly.


Maybe more an effective rhetorician than communicator. I suppose his communication does match the clarity of thought though... it's just that the thoughts are so jumbled he says 3 things that contradict each other in the same breath.

> more an effective rhetorician than communicator.

What’s the difference?


the point of rhetoric is persuasion or flattery, the point of communication (or argument as its usually framed going all the way back to Plato) is to accurately convey an idea or concept. In your average Trump speech the point is usually to evoke an emotion in his audience, not so much arguing anything in particular.

"write like you talk" is advice for type-1 thinkers.

presumably also advice from them.


> I hate how he talks, but he seems to be an effective communicator if you only judge by results

Looking at Iran situation, absolutely not, results of Trumps communications are pure disaster. Looking at tariffs situation, absolutely not, results of Trumps communications are pure disaster. His communication is masterpiece of ineffective communication.

On the plus side, he is emotionally pleasing to certain kind of people and he is effective in bullying and humiliating close ones. If those are the goal, yes he is effective. But, he cant do much else.


That's how you get flagged as an AI!

I've been nearly hit by a bicycle-messenger looking dude in San Francisco when I was crossing the street with a "walk" sign at a crosswalk and he blew through the red light at probably about 15mph, and I have plenty of other experiences like that.

If you are running a red light at 15mph on a bicycle, dodging pedestrians, you are just an asshole - maybe you're slightly less dangerous than an SUV running a red light, but it is still completely not okay.

There are dumb teenagers, which is one thing, but the aggressive "well, we're not emitting carbon, so we can do whatever we want crowd" is probably even more crappy and dangerous, since they're deliberate about it, and more present in areas with lots of pedestrians.


Is there an interpretation of the Computer Fraud and Abuse Act where using this bicycle bell to circumvent the computer system used in your headphones for active noise cancellation would be a federal felony in the United States?

probably. i am pretty sure you can spin up a CFAA violation with some string and 2 cups.

Be careful with that because then bikers are just going to start using car horns.

Now that is an interesting question

hey, if they can prosecute for whistling into a handset...

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: