Hello, Perceptron: An introduction to artificial neural networks (might.net)
142 points by greghn on June 17, 2023 | 28 comments


Has anyone tried searching for new basic operations, below the level of neural networks? We've been using these methods for years, and I doubt the first major breakthrough in ML is the best possible method.

Consider the extreme case of searching over all mathematical and logical operations to see if something really novel can be discovered.

How feasible would this be?


I’m not sure I’d say NN was the first major breakthrough.

For many years people considered them too inefficient to compete with SVMs, and people genuinely thought kernel methods were the way to intelligent machines.

Today you find researchers claiming that Bayesian nets will outcompete NN.

We’ve also seen tremendous success of random forests and other ensemble models.

I am sure there are plenty of researchers looking into all sorts of novel ensembles.

I think the major breakthrough is ensemble models, and with a little bit of cheekiness you can say that NNs are ensembles of logistic regressions.


I’m not sure if you are aware, but Bayesian neural networks can actually be well approximated by appropriate ensembles of standard neural networks [0]. The strength of Bayesian nets (including the approximating ensembles) is that they are able to estimate the uncertainty in their own predictions (by generating a probability distribution of possible predictions), at the cost of more computation needed for training and inference. I don’t think it’s ever going to be a matter of Bayesian nets outright outcompeting standard nets though; it’s just another tool in the toolbox if you want a model which “knows it doesn’t know something” and don’t mind the extra compute needed.

[0] https://arxiv.org/abs/1810.05546
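If it helps build intuition, here is a rough sketch of that ensemble idea (the toy data, network sizes, and number of seeds are made up for illustration, not taken from the paper): train the same small network several times with different random initialisations and read the spread of their predictions as the uncertainty estimate.

    # Sketch of the "deep ensembles approximate a Bayesian net" idea:
    # disagreement between independently initialised networks is read as
    # predictive uncertainty. Toy data and hyperparameters are arbitrary.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(200, 1))                 # training inputs
    y = np.sin(X).ravel() + 0.1 * rng.normal(size=200)    # noisy targets

    ensemble = [
        MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                     random_state=seed).fit(X, y)
        for seed in range(5)                              # same net, 5 random inits
    ]

    X_test = np.linspace(-6, 6, 100).reshape(-1, 1)       # goes beyond the training range
    preds = np.stack([m.predict(X_test) for m in ensemble])

    mean = preds.mean(axis=0)   # ensemble prediction
    std = preds.std(axis=0)     # disagreement ~ "the model knows it doesn't know"
    # std stays small where training data exists (|x| < 3) and grows outside it.

A proper Bayesian net gives you a full posterior rather than five samples, but the trade-off is the same: extra compute in exchange for an uncertainty signal.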


Couldn't that be a way to address the issue of current LLMs hallucinating?


Possibly, but I struggle to reason about Bayesian nets at that scale. I think the level at which a Bayesian net could “know what it doesn’t know” would be regarding uncertainty in what text to generate in a given context, not whether or not the generated text is saying something true. One example could be a prompt in a language not seen in the training data. It could be that some plausible-sounding made-up thing is likely in a given context. Also, at the end of the day, what you’ll get out of a Bayesian LLM is a sample of several generated texts which would hopefully have more variation than multiple samples from the same standard LLM. I can see it being helpful to check whether the different outputs agree or not, but I can’t tell at a glance how well it would work in practice.


Thanks for the explanation!


> Has anyone tried searching for new basic operations, below the level of neural networks?

Perceptrons aren’t even a good analogue for biological neural networks. Each dendrite in and of itself behaves something like a multi-layer perceptron. Back propagation doesn’t resemble human learning. Biological neurons also have a temporal activation function. Today’s most successful ANNs seem focused on raw compute, but they may be missing some of the secret sauce that permits more interesting behaviors.

https://braininspired.co/podcast/167/


I can see closer modeling of biological neurons going in two directions:

1.) definite improvements

2.) and engineering inefficiencies that are economically difficult to overcome.

A mindless analogy would be fixed wing aircraft with engines vs. wings that flap. And hey, maybe we end up with flapping wings on commercial aircraft at some point, so who knows!

Day one of Intro to CE had a slide that is burned into my brain:

Engineering = Physics + Economics


> 1.) definite improvements

> 2.) and engineering inefficiencies that are economically difficult to overcome.

Improvements can be had in directions we’re not even thinking about, possibly including that spark of artificiality that brings it to life in a self-autonomous way. I could see that becoming a bad path for us…

Engineering inefficiencies will arise when it’s used for the wrong thing, and that’s happening a lot when a ton of money is poured into new tools that are used without being properly understood.


It seems the value of ML and AI comes from the approximation of the brain, or best-fitting conceptual models that nevertheless today contain high error. As I understand it, dendrites and perceptrons are quite different, but holistically this is about emergent behavior from simple input-output networks. Structuralism is about to be put to the test. The better our conceptual models fit the biological behavior, the more "human-like" the behavior. We should expect the field to continue developing practical and better fitting models. The only question here, then, is an "artificial ladder of consciousness", where we decide where on this spectrum a being deserves to have rights and not suffer. We may want to start granting rights to our models today to avoid a history of enslaving conscious beings (serious perspective).


> fixed wing aircraft with engines vs. wings that flap

It's a pretty old argument, and a little unfair. The better comparison would be between birds' wings and drones' rotors, and even then, if you wanted to compare an owl's agility at swooping and landing on branches to that of a drone, rotors don't serve as well as flappy wings.

For sure, in terms of transatlantic flight and carrying capacity, fixed-wing all the way. But for neural networks, a lot of what's useful isn't doing things that are altogether different from what humans do, just automated and faster.


The efficacy of neural networks really boils down to the efficacy of linear operations, which in turn – I suspect – are efficacious because all smooth functions are approximately linear when you look closely enough.

That might help with the intuition for why neural networks seem to represent such a fundamentally useful operation. I'm the wrong person to speculate about the future.
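A throwaway illustration of the "locally linear" point (the function and window sizes are picked arbitrarily): the error of a tangent-line approximation shrinks roughly quadratically as you zoom in.

    # Smooth functions look linear up close: the max error of a first-order
    # (tangent line) approximation of sin(x) around x0 falls ~quadratically
    # as the window shrinks.
    import numpy as np

    x0 = 0.7
    f, df = np.sin, np.cos

    for h in [1.0, 0.1, 0.01]:
        xs = np.linspace(x0 - h, x0 + h, 1001)
        linear = f(x0) + df(x0) * (xs - x0)       # tangent-line approximation
        max_err = np.abs(f(xs) - linear).max()
        print(f"window +/-{h}: max error = {max_err:.2e}")
    # Each 10x shrink of the window cuts the error by roughly 100x (O(h^2)),
    # which is part of why stacks of (almost) linear maps fit smooth data so well.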


Kanerva associative architectures have one extra basic operator not found in classical NNs: https://redwood.berkeley.edu/wp-content/uploads/2021/08/Modu...


If you can come up with desired inputs and outputs and have some building blocks in terms of the operations you can perform on them, sure. This happens with https://en.wikipedia.org/wiki/Superoptimization and various algorithms have been found using such techniques. It also occurs at higher levels of abstraction, for example: https://www.deepmind.com/blog/alphadev-discovers-faster-sort...


There are only sixteen possible maps from Z_2^2 to Z_2, i.e. sixteen possible two-input Boolean functions.
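A quick brute-force sketch (the integer weight range is an arbitrary but sufficient choice) that enumerates all sixteen and checks which a single linear threshold unit can realise: fourteen are linearly separable; XOR and XNOR are the two that need a hidden layer.

    # Enumerate the 16 two-input Boolean functions and brute-force whether a
    # single perceptron (step(w1*x1 + w2*x2 + b)) can represent each one.
    from itertools import product

    inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]

    def separable(table):
        for w1, w2, b in product(range(-3, 4), repeat=3):
            if all((w1 * x1 + w2 * x2 + b > 0) == bool(t)
                   for (x1, x2), t in zip(inputs, table)):
                return True
        return False

    for code in range(16):
        table = [(code >> i) & 1 for i in range(4)]   # the i-th output bit
        verdict = "single perceptron" if separable(table) else "needs a hidden layer"
        print(table, verdict)
    # 14 of the 16 are linearly separable; XOR (0,1,1,0) and XNOR (1,0,0,1) are not.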


> But, obviously, there is no magic.

That's not entirely true. As someone who has a pretty long background in this area, works with LLMs/Diffusion models every day, and generally thinks there is a bit too much hype (but also a lot of potential): there is a lot that we don't really understand about how these models behave and why.

For starters, this article discusses how we need to change the architecture for solving XOR. That's something we do understand quite well. However what we really don't understand is why architectures like transformers work so well. From an engineering standpoint they make sense because the models look like they're doing something we want, and it makes intuitive sense that they work.

But from a theoretical standpoint it's not known why we really need all these fancy architectures (rather than just using a bunch of layers that should also be able to "figure out" what the network needs to do). All of our success has boiled down to "hey, let's try this and see if the model will learn better/faster"

Similarly, from a mathematical perspective, we do have some intuition around the reality that all NNs are basically doing some highly non-linear, complex transformation onto some latent surface where the problem is linearly separable. That gives us a sense of why these models probably can't learn "truth" (unless you do believe there exists a latent space of what is true and what is not, which is pretty radical). But if you start asking many more questions about how this works, or why the model would choose one representation over the other internally, we don't really know.
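To make the XOR and "linearly separable latent space" points concrete, here's a hand-wired sketch (weights chosen for illustration, not learned): the raw inputs aren't linearly separable, but the hidden-layer activations are.

    # A hand-set 2-2-1 ReLU network that computes XOR. The hidden layer maps
    # the four inputs into a space where one linear threshold separates them,
    # which no single-layer perceptron can do on the raw inputs.
    import numpy as np

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

    W1 = np.array([[1.0, 1.0],    # both hidden units compute x1 + x2 ...
                   [1.0, 1.0]])
    b1 = np.array([0.0, -1.0])    # ... with different thresholds
    w2 = np.array([1.0, -2.0])    # output weights: h1 - 2*h2

    H = np.maximum(0, X @ W1 + b1)   # hidden representation
    y = H @ w2                       # final output

    print(H)   # rows: (0,0), (1,0), (1,0), (2,1)  -> linearly separable in h-space
    print(y)   # [0. 1. 1. 0.]  == XOR

The weights here were written down by hand; trained networks presumably find messier versions of the same trick, which is about as far as the "latent separability" intuition takes us.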

Nearly all of our progress in deep learning over the last decade has been basically hacking around and applying larger and larger amounts of data and compute resources. But at the end of the day, even the best in the field don't really understand exactly what's happening.

Following from this: if you really want to understand these tools better, start playing with them and trying to build cool things. A deep understanding of the fundamentals is not much more useful for success with LLMs and Diffusion models than knowing how to efficiently implement B-trees is for building a cool product with a database back end.


> But, obviously, there is no magic.

> That's not entirely true.

Yes, that is entirely true. There is no magic. Even if we don't understand some parts yet. There is no magic.


It can absolutely be called magic when the creators of LLMs themselves openly say they don't understand why they work the way they do. The word "magic" is very flexible ("his singing is magical", "it was a magical holiday in Vegas with four girls and me in the hotel room") and it can definitely be used in this context to mean "something wonderful we don't fully understand".


>But from a theoretical standpoint it's not known why we really need all these fancy architectures

As someone who has been researching neural networks in a variety of settings for a very long time now, it is actually pretty obvious. There is also no real "magic" to it, even though it certainly might seem so to people who did not follow the academic world of research closely. But to those who do, all of this followed a pretty straightforward path, even though certain key steps were only obvious in hindsight.

We have known since the 90s that a perceptron with a single hidden layer can approximate any function with arbitrary accuracy (with some caveats that in practice only boil down to computational limits), with the error scaling like 1/N in the number of hidden neurons. But the proof of that theorem already shows that this is by far not the most efficient way to approximate functions. While in practice you could plug pixel values of an image directly into a perceptron, computationally it turned out to be hugely more efficient to use convolutions first as a dimensionality reduction scheme. This not only allowed people to train much larger networks on larger datasets, it also highlighted how additional layers enable hierarchical knowledge. So the first layer of such a network might only encode lines or circles, while deeper layers could encode noses and ears and eventually entire human faces.

For language modeling, the thing holding everything back was also computability. Recurrent neural networks are theoretically even more powerful than simple perceptrons, but they come with a significant cost when computing gradients. Trying to relax these constraints is what eventually led to the transformer, which at its core is just an extremely scalable, general purpose, differentiable algorithm that you can optimise using backpropagation. We didn't need this architecture from a purely theoretical perspective, but we needed it in practice because our computing hardware is still very limited once we try to mimic actual biological neural networks as you would find them in the human brain.
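A back-of-the-envelope sketch of the single-hidden-layer claim (the toy target function, widths, and training settings are arbitrary choices of mine): the fit improves as the hidden layer grows, though real training won't track the theoretical 1/N rate exactly.

    # Rough illustration of universal approximation in practice: a single
    # hidden layer fits a 1-D target better as its width grows.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(1)
    X = rng.uniform(-np.pi, np.pi, size=(500, 1))
    y = np.sin(3 * X).ravel()                      # function to approximate

    for width in [2, 8, 32, 128]:
        net = MLPRegressor(hidden_layer_sizes=(width,), activation="tanh",
                           max_iter=5000, random_state=0).fit(X, y)
        mse = np.mean((net.predict(X) - y) ** 2)
        print(f"{width:4d} hidden units: training MSE = {mse:.4f}")
    # The error falls with width, but the optimiser and finite data keep it
    # from matching the clean 1/N bound from the theorem.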


> But to those who do, all of this followed a pretty straightforward path, even though certain key steps were only obvious in hindsight.

The research still largely relies on post-hoc justification for these architectural benefits. We know CNNs work, we can open them up and see what they're doing, but we didn't get there from a theoretical foundation that predicted this outcome, nor do we have a real theoretical framework to justify them.

The history of pre-science is filled with post-hoc justification that is very similar, allows for practitioners to make progress, but ultimately has turned out to be wildly incorrect.

> it is actually pretty obvious.

In this entire reply you leave out the theoretical justifications to back up this claim. You give many examples of why these architectures intuitively work, but never dive into the rigorous explanation, because such explanations don't exist yet.

This comment simply outlines the growing "bag of tricks" we've built up over the years to solve problems, along with the common post-hoc justifications. But in its current state this is no different from alchemy, which did get some ideas correct and was able to create some useful practices, but ultimately failed to provide a theoretical framework for what was being done.

I don't know any serious deep learning researcher who disagrees that at this point the practice far outpaces our theoretical understanding.


Is the key to answering this question in the continued study of neurobiology? Are there any clues as to what the human brain is doing that apply to these concepts? Structuralism is radically popular; one would think that if it's right, we should be able to grow conscious beings from a certain original blueprint.


That opinion was held by a large part of the field for the longest time, and some actually still cling to it. These are usually the people who criticise transformers, because they go against everything they believe. But what we have seen in recent years points to the fact that the capability of neural networks is only a question of size. Yes, the human brain uses some tricks like recurrent layers and convolutional layers as well - and to some extent it does so better than we currently can. But transformers have shown that you don't need any of that for language processing, and not even for vision, showing once again that you only need a sufficiently sized network. The details of the architecture are not that important, in the same way that your microprocessor architecture does not really matter once you deal with high-level programs in userland.


Interesting, thanks!


Well that was f-ing crazy. Since these ML networks behave like biological neurons, and both systems produce gorgeous rich output, why is it not fair to say that ML systems have an emergent consciousness like ours, that both are conscious and simply running on different hardware? Why aren't they both conscious, except humans are "running on sodium ions" and ML is running on silicon logic gates? Why aren't people saying human consciousness might be moving from ion-based to perceptron-based hardware? I know it's a crazy thought, but I wonder if humans are really discovering how to change their underlying hardware platform? Issues of the soul aside, where a living being would still be unable to jump platforms themselves.


A “biological neural network” in a petri dish that has reorganized (been trained) to play Pong by means of electrical stimuli is not conscious. A slime mold that moves away from the light and “solves mazes” is also not conscious.

It is also my (relatively uninformed) understanding that a perceptron can’t really approximate a “neuron” outside of being inspired by how neurons in the visual cortex operate. For that, you need a DNN, thus human neurons are orders of magnitude more complex than “artificial neurons” and they only share a name and a slight inspiration.

All of this is just regression based function approximation, in the end, there’s no need to grasp for a ghost in the machine or anything spooky. It’s just statistics, not consciousness.


You say it is not conscious, that's fine. I am asking you to provide evidence why it is not, when conscious life is an emergent system like these systems. I am looking for an argument or a reasoned response about what is different.


Because regression-based function approximators can only "fit the data." That's the difference. They are mathematical constructs that do not have experiences, preferences, or any form of sentience. To assume that such architectures can have them, or potentially do, or that those things could just emerge out of them given enough weights or layers, is anthropomorphizing the model. Which humans love to do.

Human or animal consciousness is an emergent phenomenon that entails the ability to experience subjective states: emotions, self-awareness, etc. It is not just about processing information but involves the qualitative experiences and the “what it is like” aspect of being.

When humans or animals feel pain, there is a subjective experience of suffering that is inherently tied to consciousness. The importance we assign to events, objects, or experiences is inherently based on how they impact our conscious experiences. The worth of things big or small is contingent upon the emotions or feelings they evoke in us.

In contrast, a regression-based function approximator does not have preferences, emotions, or experiences.

When you decide to lift your hand, there is a conscious experience involved. You have an intention and a subjective experience associated with that action. On the other hand, a regression-based function approximator does not “decide” anything in the experiential sense. It simply produces outputs based on inputs and pre-training and maybe RLHF that adjusted its weights. There is no intention, no subjective experience, and no consciousness involved.

There is no qualia. To put it simply: a LLM could output some text that makes you "believe" it has preferences, and subjective experiences. But there's nothing there. Just cognitive artifacts of human beings from its corpus. Does an LLM have recursive self-improvement? Does it have self-directed goals? Does it have any of that? No. It's a predictor. LLMs are not sentient. They have no agency. They are not conscious.

If all of that is not convincing to you, consider the following (audio-visual) perspective: https://www.youtube.com/watch?v=FBpPjjhJGhk


Might be just me, but I'd take that specifically demanding tone with a pupil or an AI, not a peer



