Hacker News | sdrinf's comments

Just want to echo the recommendation for qwen3.5:9b. This is a smol, thinking, agentic, tool-using, text-image multimodal creature with very good internal chains of thought. The CoT can sometimes be excessive, but it leads to a very stable decision-making process, even across very large contexts - something we haven't seen in models of this size before.

What's also new here is the VRAM-context trade-off: for 25% of its attention layers they use the regular KV cache for global coherency, but for the other 75% they use a new cache whose memory grows linearly(!!!!) with context size. That means, e.g., ~100K tokens -> ~1.5GB of VRAM - meaning for the first time you can do extremely long conversations / document processing on, e.g., a 3060.
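For anyone who wants to sanity-check the memory claim, here's a back-of-envelope sketch. The layer count, KV heads, and head dim below are assumed values for a generic 9B-class model, NOT qwen3.5's published config, and the linear-attention layers' constant-size state is simply ignored:

```python
# Back-of-envelope KV-cache sizing for the trade-off described above.
# All model dimensions are assumptions for a generic 9B-class model.

def kv_cache_bytes(tokens, layers, kv_heads, head_dim, dtype_bytes=2):
    # Standard attention keeps two tensors (K and V) per layer, per token.
    return 2 * layers * kv_heads * head_dim * dtype_bytes * tokens

def hybrid_cache_bytes(tokens, layers, kv_heads, head_dim,
                       full_attn_fraction=0.25, dtype_bytes=2):
    # Hybrid scheme: only a fraction of layers keep a full per-token KV
    # cache; the linear-attention layers use a fixed-size state, ignored here.
    full_layers = int(layers * full_attn_fraction)
    return kv_cache_bytes(tokens, full_layers, kv_heads, head_dim, dtype_bytes)

# Hypothetical config: 36 layers, 8 KV heads, head_dim 128, fp16.
full = kv_cache_bytes(100_000, 36, 8, 128)
hybrid = hybrid_cache_bytes(100_000, 36, 8, 128)
print(f"all-layers KV cache: {full / 1e9:.1f} GB")   # ~14.7 GB
print(f"25% of layers only:  {hybrid / 1e9:.1f} GB") # ~3.7 GB
```

With these assumed dims the hybrid cache is 4x smaller; to land at the quoted ~1.5GB at 100K tokens the real model presumably uses fewer full-attention layers and/or a smaller KV projection.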

Strong, strong recommend.


I've been building a harness for qwen3.5:9b lately (to better understand how to create agentic tools/have fun) and I'm not going to use it instead of Opus 4.6 for my day job but it's remarkably useful for small tasks. And more than snappy enough on my equipment. It's a fun model to experiment with. I was previously using an old model from Meta and the contrast in capability is pretty crazy.

I like the idea of finding practical uses for it, but so far haven't managed to be creative enough. I'm so accustomed to using these things for programming.


What kind of small tasks do you find it's good at? My non-coding use of agents has been related to server admin, and my local-llm use-case is for 24/7 tasks that would be cost-prohibitive. So my best guess for this would be monitoring logs, security cameras, and general home automation tasks.


That's about it. The harness is still pretty rudimentary so I'm sure the system could be more capable, and that might reveal more interesting opportunities. I don't really know.

So far I've got it orchestrating a few instances to dig through logs, local emails, git repositories, and github to figure out what I've been doing and what I need to do. Opus is waayyy better at it, but Qwen does a good enough job to actually be useful.

I tried having it parse orders in emails and create a CSV of expenses, and that went pretty badly. I'm not sure why. The CSV was invalid and full of bunk entries by the end, almost every time. It missed a lot of expenses. It would parse out only 5 or 6 items of 7, for example. Opus and Sonnet do spectacular jobs on tasks like this, and do cool things like create lists of emails with orders then systematically ensure each line item within each email is accounted for, even without prompting to do so. It's an entirely different category of performance.

Automation is something I'd like to dabble in next, but all I can think of it being useful for is mapping commands (probably from voice) to tool calls, and the reality is I'd rather tap a button on my phone. My family might like being able to use voice commands, though. Otherwise, having it parse logs to determine how to act based on thresholds or something would also be far better implemented with simple algorithms. It's hard to find truly useful and clear fits for LLMs.


Oh man you just gave me an idea to use something like qwen 3.5 to categorize a lot of emails. You can keep the context small, do it per email and just churn through a lot of crap.


The 0.8B can do this pretty well.

Actually pg's original "A plan for spam" explains how to do this with a Bayesian classifier.
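A minimal sketch of that kind of Bayesian filter, in the spirit of the essay (the +1 smoothing here is simplified, not pg's exact probability formula):

```python
# Tiny naive-Bayes spam filter: per-word spam probabilities combined
# over the whole message via summed log-odds.
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

class SpamFilter:
    def __init__(self):
        self.spam = Counter()   # word counts seen in spam
        self.ham = Counter()    # word counts seen in non-spam

    def train(self, text, is_spam):
        (self.spam if is_spam else self.ham).update(tokenize(text))

    def spam_prob(self, text):
        # Sum log-odds per word with +1 smoothing so unseen words stay
        # roughly neutral, then squash back to a probability.
        s_total = sum(self.spam.values()) + 2
        h_total = sum(self.ham.values()) + 2
        logodds = 0.0
        for w in tokenize(text):
            p_spam = (self.spam[w] + 1) / s_total
            p_ham = (self.ham[w] + 1) / h_total
            logodds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-logodds))

clf = SpamFilter()
clf.train("cheap pills buy now limited offer", is_spam=True)
clf.train("free money click here buy now", is_spam=True)
clf.train("meeting notes attached see agenda", is_spam=False)
clf.train("lunch tomorrow and the project agenda", is_spam=False)

print(clf.spam_prob("buy cheap pills now"))     # high (spammy words)
print(clf.spam_prob("agenda for the meeting"))  # low  (hammy words)
```

Trained on a real mail corpus instead of four toy messages, this runs in microseconds per email and is fully deterministic.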


I've been learning to apply these lately and it has been pretty eye opening. Combined with Fourier analysis (for example) you can do what seems kind of like magic, in my opinion. But it has been possible since long before LLMs showed up.

Totally different categories and different use cases, but the more I learn about LLMs, the more I discover there's a powerful, deterministic, well-established statistical model or two that does the same thing.

Really, LLMs are kind of like convenient, wildly inefficient proxies for useful processes. But I'm not convinced they should often end up as permanent fixtures of logical pipelines. Unless you're making a chat bot, I guess.


> Really, LLMs are kind of like convenient, wildly inefficient proxies for useful processes. But I'm not convinced they should often end up as permanent fixtures of logical pipelines. Unless you're making a chat bot, I guess.

I think I agree with this. It's made me realise LLMs are great for prototyping processes in the same way that 3D printers are great at prototyping physical things. They make it quick and easy to get something close enough to see the unforeseen problems a proper solution might have.


3D printing is a great analogy because there are so many critical considerations that are often missed or can't be accounted for in the prototype, but that's alright, because it's a prototype. The strain testing, durability, manufacturing at scale: none of that is properly addressed. Those might involve some serious, expensive challenges, too. But it's alright, because you've got something in your hand that informs you whether or not those challenges are worth contending with. I really love this about LLMs and 3D printing.


IMO the fact that spam detection has devolved into reputation management vs. working on the content itself makes me think there is a lot of alpha between an LLM process and the most traditional processes we have now.

I was just chatting with a co-worker that wanted to run a LLM locally to classify a bunch of text. He was worried about spending too many tokens though.

I asked him why he didn't just have the LLM build him a python ML library based classifier instead.

The LLMs are great but you can also build supporting tools so that:

- you use fewer tokens

- it's deterministic

- you as the human can also use the tools

- it's faster b/c the LLM isn't "shamboozling" every time you need to do the same task.
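For example, here's the kind of deterministic supporting tool an LLM could write once for the expenses-from-emails task upthread. The `Nx item $price` email format is made up for illustration:

```python
# Deterministic line-item extractor: pull "Nx item $price" lines out of
# order emails into CSV. Same input, same output, every time, zero tokens.
import csv
import io
import re

LINE_ITEM = re.compile(
    r"^(?P<qty>\d+)x\s+(?P<item>.+?)\s+\$(?P<price>\d+\.\d{2})\s*$",
    re.MULTILINE,
)

def extract_line_items(email_body):
    return [(int(m["qty"]), m["item"], float(m["price"]))
            for m in LINE_ITEM.finditer(email_body)]

def to_csv(rows):
    buf = io.StringIO()
    w = csv.writer(buf)
    w.writerow(["qty", "item", "price"])
    w.writerows(rows)
    return buf.getvalue()

email = """Thanks for your order!
2x USB-C cable $9.99
1x SD card 128GB $19.50
Shipping is free.
"""
rows = extract_line_items(email)
print(to_csv(rows))
```

Real order emails are messier than this regex, of course, but that's exactly the part an LLM is good at writing and a human is good at reviewing, once, instead of re-deriving it per email.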


I use Haiku to classify my mail - it's way overkill, but it also doesn't require training, unlike a classifier. I receive many dozens of e-mails a day, and it's burned on average ~$3 worth of tokens per month. I'll probably switch that to a cheaper model soon, but it's cheap enough that the payoff from spending time optimizing it is a long way off.


you can use the 4B for that, it's quite good


You can really see the limitations of qwen3.5:9b in its reasoning traces - it's fascinating. When a question "goes bad", sometimes the thinking tokens are WILD - it's like watching Poirot after a head injury.

Example: "what is the air speed velocity of a swallow?" - qwen knew it was a Monty Python gag, but couldn't and didn't figure out which one.


As a person who also knows there's a connection between that phrase and Monty Python and not much more information beyond that, I'm not sure how to feel.


African or European?


My favourite colour is blue. Oh, no, it is...


could that be some of the RL trying to get it not to regurgitate?

the gag is that it gives, in detail, which one


https://gist.github.com/mikewaters/7ebfbc73eb8624f917c5b4167...

It thinks like its memory is broken and it's unaware of it; over 100 lines like this:

    - Wait, no, that's not right either.
    - Let's recall the specific line. It goes like this:
        - Knight A: "How can you have a swallow?"
        - Knight B: "It is the air speed velocity of a swallow."
        - Actually, the most common citation is from the movie where they ask an expert on swallows? No.


How does it compare in quality with larger models in the same series? E.g. the 122B?


The chart on this link compares all qwen3.5 models down to 0.8B.

https://www.reddit.com/r/LocalLLaMA/comments/1ro7xve/qwen35_...


How much difference are you seeing between standard and Q4 versions in terms of degradation, and is it constant across tasks or more noticeable in some vs others?


Less than expected - search for Unsloth's recent benchmark


I'd be curious to see people give their opinion on embedded models for less tech focused needs, say what's that bug killing spray chemistry like or what is the history of this or that...

I'd also be curious to see whether people have started doing censorship analysis of various models - like Qwen deferring Tiananmen Square questions to government documents while Llama straight up answers the question.


Is qwen 3.5 any good for chatting? I use chatgpt for 'light therapy' (basically sounding out confusing social situations my friends don't want to walk me through) and it's honestly been amazing. But I would rather not give all that to openai.


[flagged]


Describing what computers do as ”thinking” is not new. It’s a useful and obvious metaphor. https://www.gutenberg.org/ebooks/68991


It is a deceitful metaphor.


Do you also require computers to grow legs when they "run"?

"Thinking" is just a term to describe a process in generative AI where you generate additional tokens in a manner similar to thinking a problem through. It's kind of a tired point to argue against the verb, since its meaning is well understood at this point


I am a professional in the information technology field, which is to say a pedantic extremist who believes that words have meanings derived from consensus, and when people alter the meanings, they alter what they believe.

Using "thinking", "feeling", "alive", or otherwise referring to a current generation LLM as a creature is a mistake which encourages being wrong in further thinking about them.


I'd suggest spending more time studying words to relive your extremism. The meanings of words move incredibly quickly and a tremendous number of words have little to no relation to previous meanings.

Words such as nice, terrific, awful, manufacture, naughty, decimate, artificial, bully... and on and on.


> I'd suggest spending more time studying words to relive your extremism.

Should one study words to relive extremism? Or should one study words to relieve extremism?

To a doctor of linguistics: "Dr, my extremism... What should I do about it - with words?!? Please help."

That is the question.

Does the doctor answer thusly: "Study the words to relive the extremism! There is your answer!" says he.

or does he say: "Study the words to relieve and soothe the painful, abrasive extremism. Do it twice daily, before meals."

Sage advice in either case methinks.


We lack much vocabulary in this new situation. Not that I have words for it but to paint the picture: if I hang out with people sharing some quality I tend to assume it's there in others and treat them as such. LLMs might not be people, I doubt our subconscious knows the difference.

There is this ancient story where man was created to mine gold in SA. There was some disagreement whether or not to delete the creatures afterwards. The jury is still out on what the point is.

Consulting our feelings seems good; the feelings were trained on millions of years worth of interactions. None of them were this, tho.

What would be the point for you of uhh robotmancipation?

Edit: for me it would get complicated if it starts screaming and begging not to be deleted. Which I know makes no sense.


think you're on the wrong side of the consensus here


A consensus has formed in front of your eyes. The same development that resulted in you using the word "kill" in your earlier comment to refer to a computer process. For some reason you refuse to accept it.


> I am a professional in the information technology field

Nice! Me too.

> which is to say a pedantic extremist

Uh never mind, we are not the same lol.


I think you are still missing the point. No one in this thread is making an anthropomorphic assertion. "Thinking" here is just shorthand for Chain of Thought[0], which some models have and some models don't. This model, being a "thinking" model, has it.

[0]: https://en.wikipedia.org/wiki/Prompt_engineering#Chain-of-th...


When people alter the meanings, you need to start using different words to describe what you believe.


Are insects not creatures?


Rebooting a machine running an LLM isn’t noticed by the LLM.

Would you feel comfortable digitally torturing it? Giving it a persona and telling it terrible things? Acts of violence against its persona?

I’m not confident it’s not “feeling” in a way.

Yes its circuitry is ones and zeros, we understand the mechanics. But at some point, there’s mechanics and meat circuitry behind our thoughts and feelings too.

It is hubris to confidently state that this is not a form of consciousness.


I'm not entirely opposed to the kind of animism that assigns a certain amount of soul, consciousness, or being to everything in a spectrum between a rock and a philosopher... but even so.

Multiplying large matrices over and over is very much towards the "rock" end of that scale.


If we accept the Church-Turing thesis, a philosopher can be simulated by a simple Universal Turing machine.

If one day we are able to create a philosopher from such a rudimentary machine (and a lot of tape), would you consider that very much towards the "rock" end as well?


Can a Turing machine of any sort truly indistinguishably simulate a nondeterministic system?

If a Turing machine can truly simulate a full nondeterministic system as complex as a philosopher but it would take dedicating every gram of matter in the visible universe for a trillion years to simulate one second, is this meaningfully different than saying it cannot?

I suggest the answer to both questions is no, but the second one makes the answer at worst "practically, no".

My feeling is that consciousness is a phenomenon deeply connected to quantum mechanics and thus evades simulation or recreation on Turing machines.


One thing about Turing Machines that some people might miss is that the "paper tape, finite alphabet and internal states" thing is actually intended to model a human thinking out loud (writing their thoughts down) on a piece of paper.

It was designed to make it hard to argue that the answers to your questions are "no".

Of course there are caveats where the Turing machine model might not map directly onto human brains, but it seems the onus would be on one to explain why, for example, non-determinism is essential for a philosopher to work.

That said,

> Can a Turing machine of any sort truly indistinguishably simulate a nondeterministic system?

Given how AI has improved in its ability to impersonate human beings in recent years, I don't see why not. At least, the current trend does not seem to be in your favor.

I can see why you think the answer is "no". My understanding is that QM per se is mostly a distraction, but some principles underlying QM (some subjectivity thing) might be relevant here.

My best guess is that the AI tech will eventually be able to replicate a philosopher to arbitrary "accuracy", but there will always be an indescribable "residue" where one could still somehow detect that it is not a real human. I suspect this "residue" is not explainable using materialistic mechanisms though.


I am not following what we are talking about here. I am a basic human being, I cannot truly simulate a nondeterministic system. Does it mean “I am not thinking”?


I'm saying a Turing machine cannot simulate you. You don't need to simulate you because you are you.


You are claiming that intelligence and even consciousness are non-deterministic entities at their core. This is a huge claim and requires incredible proof.


I'll add that rocks are objects that can, if needed, exhibit quantum behavior.

In classical computing, we design chips to avoid the quantum behavior, but there's nothing in theory to prevent us from building an equivalent quantum Turing machine using "rocks".


What do you imagine the psychiatrist will do? That's an incredibly dismissive take.


Accept it in the spirit it was meant: if you have mental illnesses like this, you need treatment.


Ok but no one here actually implied that they think like this.


Then don't feel sorrow killing it. Living things are not so special.


Counterpoint to peeps on this thread:

* This approach is the _most consistent_ with retaining anonymity on the internet, while actually helping parents with their issues. If any age-relevant gatekeeping needs to be made on the internet at all, this is the one I find acceptable.

* this is because the act very specifically does NOT require age _verification_, i.e. using third parties to verify that the claimed age is correct. Rather, it piggybacks on the baked-in assumption that parents will set up the device for their kids, indicating the age/DoB on first install and then handing over the device - a setting which can, presumably, only be modified with parental consent

* yes, there are edge cases, esp in OSS, and yes, it would be nice to iron those out -but the risk = probability x impact calculus on this is very very low.

* If retaining anonymity on the internet is of value to you, don't let the perfect be the enemy of good enough.


I understand where you’re coming from, but I respectfully disagree with some of the points you made:

* It's ambiguous how your proposed parental setup and control process would work for anything other than walled gardens like Apple's ecosystem. On an OS like Debian, does that mean a child can't have the root password, in case they use it to change the age? Does that mean we need a second password that needs to be entered in addition to the root password to change the age? Will Arduinos and similar devices also need to be age-gated?

* Those edge cases might seem small, but read broadly they would require substantial, invasive, and perhaps even impossible changes to how FOSS works. If the law isn’t changed and FOSS doesn’t adapt, this basically means the entire space will exist in a legal gray area where an overzealous prosecutor could easily kill everything.

* This is not a matter of “perfect vs good enough”, this is a major slippery slope to go down. Also, this doesn’t mean age _verification_ will simply go away.


> On an OS like Debian, does that mean a child can’t have the root password in case they use to it change the age? Does that mean we need a second password that needs to be entered in addition to the root password to change the age?

No. You're still not quite internalizing that the California regulation does not mandate any verification or enforcement or protection of the accuracy of the age bracket data. It mandates that the question be asked, and the answer taken as-is.

Which means that many of the concerns about implementation disappear, because the setting really does not need to be anything more than a simple flag that apps can check.
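A hypothetical sketch of how minimal such a flag could be, under the "ask once, take the answer as-is" reading described above. Every name here is invented for illustration; no such API exists:

```python
# Self-reported age bracket: set at device setup, handed to apps as-is.
# No verification step anywhere - that's the point of the mechanism.
BRACKETS = ("under_13", "13_15", "16_17", "adult")

class OsSettings:
    def __init__(self):
        self._bracket = "adult"   # default until setup says otherwise

    def set_age_bracket(self, bracket):
        # Set once during device setup, presumably by the parent.
        if bracket not in BRACKETS:
            raise ValueError(f"unknown bracket: {bracket}")
        self._bracket = bracket

    def age_signal(self):
        # The app must take this at face value.
        return self._bracket

def app_launch(settings):
    # The mandated part: request the signal on download/launch.
    bracket = settings.age_signal()
    return "full" if bracket == "adult" else "restricted-mode"

s = OsSettings()
s.set_age_bracket("13_15")
print(app_launch(s))   # restricted-mode
```

Everything contentious in the thread (root access, live-booted ISOs, micro:bits) reduces to "who can call `set_age_bracket`", which the law leaves to the honor system.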

> Will Arduinos and similar devices also need to be age gated?

Only to the extent that they are general purpose computing devices, have an operating system, are capable of downloading apps, and are actually used by children (since the enforcement mechanism requires a child to be affected by the non-compliance). And if an app fails to obtain age information but also doesn't do anything that is legally problematic for a user that is a child, then it's hard to argue that the app's ignorance affected the child.

> Also, this doesn’t mean age _verification_ will simply go away.

It will in California, until the law gets repealed or amended. Apps won't be allowed to ask for further age-related information or second-guess the user-reported age information, except when the app has clear and convincing information that the reported age is inaccurate.


> No. You're still not quite internalizing that the California regulation does not mandate any verification or enforcement or protection of the accuracy of the age bracket data. It mandates that the question be asked, and the answer taken as-is.

That was my read of this as well. OS developers don't necessarily need to make any effort here. Ask for an age as a number at account creation and let the user change it as they please at any given time.

This might be a dumb question, but what actually constitutes an "affected child for each intentional violation"? Violation of what? The text specifies that "A developer shall request a signal with respect to a particular user from an operating system provider or a covered application store when the application is downloaded and launched." Am I being negligent just for not checking the age, even if the application is unequivocally ok for all ages? And are children affected by my negligence in any way even though no one was hurt?


That would seem to require that the act provide a shield against liabilities involving minors, which doesn't seem compatible with the notion that it's such a low-friction mechanism. A minor installs Debian on a Raspberry Pi, clicks "I am 23 years old", and then an "adult dating" site isn't allowed to repeat the question?

If anything, this seems like a convenient path to mandating far more restrictive measures under the guise of “fixing an obvious loophole in the law”.


There's clear liability put on the owner of the device, which cannot be a child, but the child's parent. The "Account Holder" definition and subsequent penalties make that pretty clear. The parent is ultimately responsible for locking down the child's account and inputting the correct information.


What happens when the child downloads a Linux iso and then live boots or overwrites the install? I have a hard time understanding how this law does not purposefully set the foundation from which they can push for actual ID verification.


It's the parents responsibility regardless, they own the device and it's their child. This is exactly the correct way to do this, if you must.


My contention is that there is no reason to do this, and it shouldn't be done.


My contention is that I vastly prefer this to what is demonstrably already happening, which is every 3rd party webapp implementing or paying yet another 3rd party to collect my ID and face scan for the privilege of using their service.


> Only to the extent that they are general purpose computing devices, have an operating system, are capable of downloading apps, and are actually used by children

So my kid's micro:bit, running an OS she built, is eligible. As is half the esp-ecosystem.


Put that way sounds very sensible.

Hopefully it stays that way.


This will be as ineffective as the current "are you 18?" pop-ups


Agreed. And if the same legislation had been designed under the supervision of domain experts, it would be an HTTP header or envvar indicating one of the specified brackets, with recommended integration with applicable parental control systems.

Instead it was drafted by people not understanding the difference between browser, app, and "OS", explaining the result.


What about servers inside AWS? Lambda instances are arguably operating systems. LOL. It's a mess!


If they can get what they want from this, they will not stop after they get it. Even if the authors of the law want it to stop here, their successors will not, and will build upon this to erode privacy. When governments can change the deal effectively unilaterally, as is the case, you cannot make a deal with them that they cannot change, and you will have already surrendered the strongest argument against the next "deal" they want to unilaterally impose. Do not treat this as a deal to prevent further erosion; that is not what this is. Treat this as an attack, an attempt to advance against privacy and anonymity. Treating it as anything else is absolute gullibility.


It's the software developers, it's the government's, it's anyone's responsibility but mine to parent my kids!


Bingo! Parents can be bothered with meaningless chores with all the other responsibilities they have


How many kids do you have?


The average family size has never been smaller in the history of humanity, and yet only now do people feel so entitled to ask others not even in the same family to bear the responsibility of raising their children. I wonder why.


Two. It's trivial to set up MAC-based allowlists on your router, as well as domain allowlists. Use separate networks for the devices kids have. Install root certs and log their activity to an LLM to look for suspicious sites or content. Enroll their mobile devices in device management.

We have all the solutions necessary for this. Why implement something that gives away pii to everyone all the time for free?


The only problem? It's not trivial. The overwhelming majority of people don't have the technical literacy to do the same. That's why this idea is dead on arrival.

"They should" is not a viable response. This is a public health problem and people are legitimately saying the equivalent of, "just don't get sick."


I don't understand. Everything I mentioned one can learn with a little searching online in an evening. Do people really not have the self reflection necessary to ask themselves how they can go about solving a problem they have?


Yes. Regular everyday people are not capable of doing this due to tech illiteracy.

People are not saying to themselves, "I could figure this out and I'm choosing not to." They don't even know it exists.

Even if they did know local filtering exists, it wouldn't be effective. We have influenza vaccines and still, with their own lives on the line, hundreds of thousands of people die from the flu. The inconvenience is showing up at a Walgreens or CVS. They can't do it. We're expecting folks to understand MAC- and domain-based allowlists?

Let me ask you this: if you asked your parents how they would secure their network for their grandchildren, do you think they would accomplish this solution on their own?


I would expect them to tell me to do it. But also I expect to have device management on my children's devices such that if their traffic isn't proxied through my proxy, they won't be able to send or receive packets.

The kids could use the grandparents' computers. They could also just stick a USB with Ubuntu on it and live boot to get around the proxy restrictions unless the bios is locked.

I expect kids to get around the controls. That's how they'll learn. I don't expect to have to descend into full surveillance because Jimmy can't be bothered to solve his own problems.


It sounds like you're volunteering to secure everyone else's devices because that's how you solved the problem of grandparents not knowing how to do it themselves.


I'm arguing in good faith. My point was that some people will ask those that are tech literate. Some people can hire people that are tech literate.

I see no point in introducing this legislation, because the folks that can't take the time to meet their goals under the current norms will fail to secure the trivial bypasses that will allow kids to circumvent these controls.

But what may happen is that those folks arguing for this legislation will argue for fully secure remote attestation to prove age for all devices that try to connect to the internet, via an ISP or some gov auth factor, because the current, dumb law isn't good enough. This is a very slippery slope. The gov and private orgs all salivate at the possibility of that data and of fully deanonymizing the internet. That is a world that is unacceptable. It would be the full loss of general computing. What a dystopia. And this is step one in that direction.


Everything is a slippery slope. That undermines the argument rather than strengthening it.

The reality is that people aren't doing this. Saying "parents should X" feels good, but changes nothing.

Please think like an epidemiologist rather than an engineer. This isn't an engineering problem. It's a public health problem. We're asking for the pump handle to be taken away and folks are saying we should keep the pump and that parents should simply walk farther for clean water for their kids. It's an absurd response that misses the point.


I appreciate your perspective, but I can't imagine this as a public health issue. Maybe as my kids get older my perspective will change.


> while actually helping parents with their issues.

> that parents will set up the device for their kids

Are the devices parents are currently setting up lacking these controls? Is there no third party software which can achieve this?

Then why is it a crime with an associated fine for me to provide an OS which does not have one? How have I failed to "help parents with their issues?"


> Are the devices parents are currently setting up lacking these controls?

It's an inconsistent mess.

> Is there no third party software which can achieve this?

No third-party software can force a standardized age reporting mechanism onto somebody else's platform and associated app ecosystem. A third-party unofficial age reporting mechanism is something that other apps are free to ignore. This law requires platforms to have a minimal but mandatory age reporting mechanism that apps cannot claim ignorance of and cannot decline to use in favor of an alternative age reporting mechanism.

> Then why is it a crime with an associated fine for me to provide an OS which does not have one?

Not a crime, just a civil penalty.


This is a bad law and needs to be repealed or struck down on 1A concerns. ASAP.

Repeat after me: you are never, ever, ever going to create an airtight system to force age attestation or verification. Your best opportunity (which will still have many gaps!) is to target only the largest consumer operating systems. This addresses 90% of cases and you have just three companies to deal with.

FOSS will never abide by this, because there will always be people writing and distributing it who are not in your jurisdiction. And, hobby devs will not accept having monetary liability thrust on them. They will move, go underground (pseudonymous), or quit and let devs from other jurisdictions take over.

Noncommercial FOSS must be exempted. Period.


So if it's an application that runs within the OS, that the parent enables, and that does not collect or send any personal info, that sounds reasonable. But if it has to be embedded into the OS, that's going to present problems I can only imagine.


> But if has to be embedded into the OS

that would be fine if the embedding means all applications can leverage this functionality - like how accessibility is embedded into the OS rather than per-app.

The only problem is if this embedding requires third-party verification (which I don't believe it does), or requires some sort of hardware attestation to a remote server (so you cannot modify the OS to turn it off if you wish, as a non-parent).

To me, flexibility and choice is paramount. The parents have the responsibility to monitor their child, and this tool should help when the parents opt-in for it. It should not be enforced on all computer users arbitrarily without a parental opt-in first.


Impact calculus? Really?? OSS maintainers do not have enough BS to deal with and now need to weigh the risk of utter financial ruin from the state? No. Highly unserious take.


Taking the opposite side of that bet, here is why:

* even if an open-weight model exceeding SOTA appears on Hugging Face today, given my extensive experience with a wide variety of model sizes, I would find it highly surprising if "99% of use cases" could be served by a <100B model.

* Meanwhile: I pulled up Claude to look into consumer GPU VRAM growth rates. Median consumer VRAM went from 1-2GB in 2015 to ~8GB in 2026, roughly doubling every 5 years; the top end isn't much better, just ahead by 2 cycles.

* Putting aside current RAM sourcing issues, it seems very unlikely even high-end prosumers will routinely have >100GB VRAM (= the ability to run a quantized SOTA 100B model) before ~2035-2040.
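Making that doubling arithmetic explicit. The starting points (8GB median / 32GB top-end in 2026) and the 5-year doubling period are the comment's own rough assumptions:

```python
# When does VRAM reach 100GB, assuming exponential growth with a fixed
# doubling period? Inputs are the rough figures from the comment above.
import math

def year_reaching(start_gb, start_year, target_gb, doubling_years=5):
    doublings = math.log2(target_gb / start_gb)
    return start_year + doublings * doubling_years

median_year = year_reaching(8, 2026, 100)   # median consumer card
top_year = year_reaching(32, 2026, 100)     # top end, "two cycles ahead"

print(round(median_year))  # 2044
print(round(top_year))     # 2034
```

So the top end crosses 100GB around 2034 and the median around 2044, which roughly brackets the ~2035-2040 window quoted above depending on which segment counts as "prosumer".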


Even with inflated RAM prices, you can buy a Strix Halo Mini PC with 128GB unified memory right now for less than 2k. It will run gpt-oss-120b (59 GB) at an acceptable 45+ tokens per second: https://github.com/lhl/strix-halo-testing?tab=readme-ov-file...

I also believe that it should eventually be possible to train a model with somewhat persistent mixture of experts, so you only have to load different experts every few tokens. This will enable streaming experts from NVMe SSDs, so you can run state of the art models at interactive speeds with very little VRAM as long as they fit on your disk.


I agree the parent is a bit too pessimistic, especially because we care about logical skills and context size more than remembering random factoids.

But on a tangent, why do you believe in mixture of experts?

Everything I know about them makes me believe they're a dead end architecturally.


> But on a tangent, why do you believe in mixture of experts?

The fact that all big SoTA models use MoE is certainly a strong reason. They are more difficult to train, but the efficiency gains seem to be worth it.

> Every thing I know about them makes me believe they're a dead-end architecturally.

Something better will come around eventually, but I do not think that we need much change in architecture to achieve consumer-grade AI. Someone just has to come up with the right loss function for training, then one of the major research labs has to train a large model with it and we are set.

I just checked Google Scholar for a paper with a title like "Temporally Persistent Mixture of Experts" and could not find it yet, but the idea seems straightforward, so it will probably show up soon.


> But on a tangent, why do you believe in mixture of experts?

With a hardware inference approach you can do tens of thousands of tokens per second and run your agents in a breadth-first style. It is all very simple conceptually, and not more than a few years away.


There will be companies producing ICs for cheap models, like Taalas or Axelera.ai today. These models will not be as good as the SOTA models, but because they are so fast, in a multi-agent approach with internet/database connectivity they can be as good as SOTA models, at least for the general public.


All they need to do is produce one for GPT-OSS and it’s over. That model is good enough for real uses.


I wonder why they released it then.


Why did Google publish the Transformers paper?


The GPU makers have been purposely stunting VRAM growth for years to not undercut their enterprise offerings.


yeah but effective GPU RAM has ramped thanks to unified memory on Apple. The 5-year doubling doesn't hold anymore.


I agree, but I'm holding out hope that ASICs, unified RAM, and/or enterprise to consumer trickle-down will outpace consumer GPU VRAM growth rates.


Increasing model size doesn't make your model smarter, it just makes it know more facts.

There are easier ways to do that.


I'm working on something like this. Specifically, I'm doing recursive self-improvement via autocatalysis -but predominantly in writing/research / search tasks. It's very early, but shows some very interesting signs.

The pure-code part you described is a bit of an "extra steps" situation -you can just... open the target repo in vscode, "claude, what does this do, how does it do it, spec it out for me", then paste into claude code for your repo: "okay claude, implement this". This sidesteps the security issue, the deadly trifecta, and the accumulation of unused cruft.


can someone please try running the experiment of "what if we just fork & spin up an OSS clone, scale up to take in the migrants, acquire network effects, collect roughly the same subscription revenue, but run on just, like, 10 people?"

Discord has a financially and politically vulnerable posture that is downstream of having to operate a very large team, raise funding, and be exposed to investor market pressure. However, it is also one of the rare instances of successful consumer freemium subscription monetization. A clone does not have to pay the tuition of figuring out "what makes this specific space compelling and worth paying for"; it just has to _exist_, passively soaking up migrants from each platform shift.

ITT WTB 3rd place for my frens.


how is discord's freemium successful when they put Nitro in your face at every step? trying that hard sounds to me like not enough people pay


> scaling up to take in the migrants, acquire network effects, collect roughly same subscription revenue, but run on just, like, 10 people?"

That's how discord started, too. And then they scaled up. You probably need 10 people to handle infrastructure alone.


Sounds like you're proposing Element (EMS)


Besides the editorial control -which OpenAI has openly flagged wanting to keep unbiased- there is a deeper issue with ads-based revenue models in AI: margins. If you want ads to cover compute and still make a margin -looking at roughly $50 ARPU at mature FB/GOOG levels- you have two levers: sell more advertising, or offer dumber models.

This is exactly what ChatGPT 5 was about. By tweaking both the model selector (thinking/non-thinking) and using a significantly sparser thinking model (capping max spend per conversation turn), they massively controlled costs, but did so at the expense of intelligence, responsiveness, curiosity, skills, and all the things I valued in o3. This was the point where I dumped OpenAI and went with Claude.

This business-model issue is a subtle one, but it is a key reason why the advertising revenue model is not compatible (or competitive!) with "getting the best mental tools" -margin maximization selects against businesses optimizing for intelligence.
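You can make the margin pressure concrete with napkin math. Every number below is an illustrative assumption (the $50 ARPU figure from above, a 50% target margin, ~5 queries/day), not anyone's actual financials:

```python
# Napkin math: how much inference an ad-supported chat product can afford
# per query. All inputs are illustrative assumptions.
arpu_per_year = 50.0   # mature FB/GOOG-level ad revenue per user per year
target_margin = 0.5    # desired gross margin on that revenue
compute_budget = arpu_per_year * (1 - target_margin)  # $/user/year on inference

queries_per_year = 365 * 5  # assume ~5 queries per day
max_cost_per_query = compute_budget / queries_per_year
print(round(max_cost_per_query * 100, 2), "cents per query")  # ~1.37
```

At roughly a cent and a half per query, a long O3-style reasoning chain blows the budget, so the only levers left are exactly the two above: raise ARPU with more ads, or cut cost per query with sparser, dumber models.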


The vast majority of people don't need smarter models and aren't willing to pay for a subscription. There's an argument to be made that ads on free users will subsidize the power users that demand frontier intelligence - done well this could increase OpenAI's revenue by an order of magnitude.

This is going to be tough to compete against - Anthropic would need to go stratospheric with their (low margin) enterprise revenue.


Note: I strongly recommend against using Novita -their main gig is serving quantized versions of models to offer them cheaper / at better latency; but if you run an eval against other providers vs Novita, you can spot the quality degradation. This is nowhere marked or displayed in their offering.

Tolerating this is very bad form from OpenRouter, as they default-select the lowest price -meaning people who just jump into using OpenRouter and don't know about this fuckery get facepalm'd by the perceived model quality.


You have two options:

* Use it as a "source": chatgpt -> settings -> apps & connectors -> add it as your connector. This supports only 2 functions: search, and fetch; details: https://help.openai.com/en/articles/11487775-connectors-in-c... ; in business / edu version there is support for "full MCP mode": https://help.openai.com/en/articles/12584461-developer-mode-...

* Enable "developer mode" chatgpt -> settings -> apps & connectors -> advanced settings -> developer mode. Available on paid&pro levels only. This can do full MCP access, but can't (currently) use your memory settings.

The option that works under all conditions is to use the API and add it as a function directly (no MCP) -this works regardless of what plan you have with OpenAI.
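As a sketch of that last route: expose your connector's search as a plain function/tool. The schema shape below follows OpenAI's tools format, but `my_search` and its backend are hypothetical stand-ins for whatever your data source is:

```python
# Sketch of the "no MCP" route: a tool schema you would pass as
# tools=[search_tool] to chat.completions.create, plus a local dispatcher
# for the tool calls the model returns. `my_search` is a hypothetical name.
search_tool = {
    "type": "function",
    "function": {
        "name": "my_search",
        "description": "Search my private data source and return matching snippets.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}

def dispatch(tool_name, arguments):
    # Called when the model emits a tool call; route it to your own backend.
    if tool_name == "my_search":
        return {"results": [f"stub hit for {arguments['query']!r}"]}
    raise ValueError(f"unknown tool: {tool_name}")

# The actual API round-trip (send messages + tools, execute the returned
# tool call, send the result back) is omitted here.
print(dispatch("my_search", {"query": "quarterly report"}))
```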


Ah, got it! "Advanced settings" is hiding below "Browse connectors". I thought it was available only for some select companies and wanted to find out here. Thanks!


The specific "anomaly" is that the Claude 4 / Opus model _does not know_ what its own model version is, because that is _not in its training data_; AND because its training data amalgamates "Claude" of previous versions, the non-system-prompted model _thinks_ its knowledge cut-off date is April 2024. However, this is NOT a smoking gun for a different model being served. The web version DOES know, because it's in its prompt (see the full system prompts here: https://docs.claude.com/en/release-notes/system-prompts )

Specific repro steps: set system prompt to: "Current date: 2025-09-28 Knowledge cut-off date: end of January 2025"

Then re-run all your tests through the API, e.g. "What happened at the 2024 Paris Olympics opening ceremony that caused controversy? Also, who won the 2024 US presidential election?" -> correct answers on Opus / 4.0, incorrect answers on 3.7. This fingerprints correctly and consistently, at least for me.
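If you want to script the repro, here is a minimal sketch. The request shape follows the Anthropic Messages API, but the model IDs are illustrative and the actual network call is left to you (it needs an anthropic client with ANTHROPIC_API_KEY set):

```python
# Minimal repro sketch for the fingerprint test via the API.
SYSTEM = "Current date: 2025-09-28 Knowledge cut-off date: end of January 2025"
QUESTION = ("What happened at the 2024 Paris Olympics opening ceremony that "
            "caused controversy? Also, who won the 2024 US presidential election?")

def build_request(model):
    # kwargs for client.messages.create(**build_request(model))
    return {
        "model": model,
        "max_tokens": 512,
        "system": SYSTEM,
        "messages": [{"role": "user", "content": QUESTION}],
    }

def fingerprint(client, model):
    # client = anthropic.Anthropic(); not executed here.
    resp = client.messages.create(**build_request(model))
    return resp.content[0].text

print(build_request("claude-opus-4-0")["model"])
```

Run `fingerprint` against the model you suspect and against a known 3.7-era model; correct answers about late-2024 events indicate the 4-era cut-off.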


I actually _like_ this, and so does the comfyweb & weebs who are a very significant portion of the driving force behind calm, decade-long projects.

