Hacker News | aoeusnth1's comments

> Wobbly assumption that increasing the size of these models yields better performance.

I'm assuming you disagree that larger models are better? Can you expand on what indicates that AI will hit a wall in scaling given the evidence of the last 9 years of scaling transformers (or other models)? Where on the plot does the line go from exponential to flat?


Leaks from within OpenAI have made it pretty clear that they've been struggling to achieve significant improvements lately by simply scaling up parameter count. Experts like LeCun have also been vocal that blindly scaling up is a dead end.

(Incidentally, the line of skill improvement isn't "exponential". It's been incremental in improvements per generation, but generations have been coming thick and fast of late, and have grown in parameter count exponentially since 2017.)

Speaking more broadly, LLMs don't have to "hit a wall" in scaling to become uneconomical. If incremental improvement continues to come at exponential cost, however, then the fundamental value argument falls apart.
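That trade-off is easy to see in a toy model. Here's a minimal sketch assuming loss follows a power law in training compute, L(C) = a·C^(−b), in the spirit of published scaling-law papers; the constants a and b are made up for illustration, not fitted to any real model.

```python
# Toy illustration: assume loss is a power law in training compute,
# L(C) = a * C**(-b). Constants are invented for demonstration only.
a, b = 10.0, 0.05

def loss(compute: float) -> float:
    return a * compute ** (-b)

def compute_for_loss(target: float) -> float:
    # Invert L(C) = a * C**(-b)  =>  C = (a / L)**(1 / b)
    return (a / target) ** (1 / b)

# Compute required for three successive 10% loss reductions:
c1 = compute_for_loss(5.0)
c2 = compute_for_loss(4.5)
c3 = compute_for_loss(4.05)

# Each fixed-percentage improvement costs the same multiplicative
# factor more compute (~8.2x here), i.e. cost grows exponentially
# while gains stay incremental.
print(c2 / c1, c3 / c2)
```

Under these (made-up) constants, every further 10% cut in loss demands roughly 8x the compute of the previous one, which is the "incremental improvement at exponential cost" pattern described above.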

Setting all that aside, even presuming that model performance scales linearly with dimensionality, there are fundamental limits to the size of training corpora. Quality training data is not unbounded. Given the same size corpus of training data, there's a hard theoretical limit to how much meaning and inference a model can wring out of it.

And then there are other issues with the whole business model. For one thing, it's predicated on continual full-scale retraining to achieve even modest gains in skill and relevancy. Even topical, targeted learning requires full retraining. Et cetera.

I think that the next generation of AI will lean more heavily on RL to be useful beyond a few months. I also think that the energy requirements of a particular technology are a good proxy to whether it's got a realistic future.


Why do you believe progress is currently exponential? There’s one dubious chart showing “exponential growth” in a single narrow domain, and otherwise zero evidence to suggest exponential improvement.

The evidence is the last 9 years of scaling.

The curve flattened out years ago. The exponential was going from GPT-2 to GPT-4 (or thereabouts). After that, it was painfully obvious to anyone observing without a vested interest in believing otherwise that the progress had slowed.

Now, it's not just that progress has slowed: it's that the exponential has reversed. In order to get marginal gains, they have to throw exponentially more hardware at the training.


Even if training is hitting a wall, I think they're shifting more to the reasoning phase to get better results... and that is inference-compute scaling.

In my experience the models haven't gotten any better, just the hype.

And companies know this, hence the heavy astroturfing: if their new product has minimal improvements, they'll just gaslight you into thinking otherwise.

How would you know if it wasn't an extrapolation of current knowledge? Can you point me to something humans have done which isn't an extrapolation?

That was my point: "I am not necessarily saying humans do something different".

You could reimplement it as public domain on your machine, and then edit it by hand and copyleft your own edits.


Citation needed? Tall tasks are standard practice to improve utilization and reduce hotspots by reducing load variance across tasks.


I don't think he's misleading, I think he is valuing Claude's contributions as essentially having cracked the problem open while the humans cleaned it up into something presentable.


I think your theory might be missing an extremely relevant and timely counterexample?


Unclear how much damage the designation will do to their dealmaking ability in the meantime. How long will it take for the court to reverse the order?


The longer it takes, the better the impact on their reputation.


I imagine they're also benchgooning on SVG generation


It absolutely has quasi-identity, in the sense that projecting identity on it gives better predictions about its behavior than not. Whether it has true identity is a philosophy exercise unrelated to the predictive powers of quasi-identity.


I'm an environmentalist and I agree with this framing. The solution is going to be painful and must increase prices on the products and services for which fossil fuels are currently the cheapest input. If you're not willing to personally sacrifice anything to reduce fossil fuel consumption, you can see why carbon taxes are not popular, right? France's protests against them are a good example of a populist reaction against attempts to regulate the economy toward lower emissions.

