Ask HN: Can you crowdfund the compute for GPT?
153 points by asim on Jan 12, 2023 | 146 comments
I'm just curious to know whether it's possible to crowdfund the compute costs for GPT models? It seems like this stuff is going to start to get beyond what any one individual can run, meaning it's in the hands of corporations or people with deep pockets. Can groups of people pool together money to run shared models? Because the alternative is that the companies just run away with the technology and leave the rest of us to wait for APIs or use whatever they give us.


I've often wondered why a service doesn't exist that allows you to rent out your graphics card for the large data processing needed for training models. Like mining bitcoin except you are doing something actually useful and getting paid actual money for it. Example:

- Company Alpha needs $40,000,000 worth of cloud computing for their training model.

- Company Beta provides them said cloud computing for $30,000,000 from their pool of connected graphics cards.

- Individuals can connect their computers to the Company Beta network and receive compensation for doing so. In total $20,000,000 is distributed.

Company Alpha gets their cloud computing done for cheap, Company Beta pockets the $10,000,000 difference for running a network, the individuals make money with their graphics cards, except this time it's actual United States Dollars. What am I missing here that would make this type of business unfeasible?
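The proposed split works out as simple arithmetic; a tiny sketch using the hypothetical figures from the comment above (function name is illustrative):

```python
def marketplace_split(customer_value, network_price, contributor_payout):
    """Compute who gets what in the hypothetical GPU marketplace.

    customer_value: what the job would cost on a traditional cloud
    network_price: what Company Beta charges Company Alpha
    contributor_payout: total paid out to the GPU owners
    """
    customer_savings = customer_value - network_price
    operator_margin = network_price - contributor_payout
    return customer_savings, operator_margin

# The figures from the example above:
savings, margin = marketplace_split(40_000_000, 30_000_000, 20_000_000)
print(savings)  # 10000000: Company Alpha's savings vs. traditional cloud
print(margin)   # 10000000: Company Beta's cut for running the network
```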


Surprised no one has commented on this, but the latency means the model has to be trained in tiny fragments on each device, which is currently an open field of research. As it stands now, basically all of a model needs to be loaded into memory.

There’s a whole field here, with people exploring this problem. Loosely speaking, solving it would enable Federated Learning, and whoever figures it out will far eclipse OpenAI (if it’s ever solved).


To piggyback on this comment: Federated Learning actually has a lot of other uses beyond crowdsourcing, including benefits to data privacy. The medical industry has struggled to share data with other institutions without compromising patient information, and is using FL to decentralize the model-building process.

Intel is doing some work with Penn on the subject now, if people want to read further: https://www.intel.com/content/www/us/en/newsroom/news/intel-...


AFAIK Federated Learning is not a magic solution. This post explains how FL can be exploited when the server is untrusted (http://www.cleverhans.io/2022/04/17/fl-privacy.html)


Well, that explains the lack of a BOINC client for ML tasks at the moment.


Where can I read more about this field of research?


A broad covering term is MLOps (like DevOps).

https://en.wikipedia.org/wiki/MLOps

Armed with that term, we get (haven’t read):

Machine Learning Operations (MLOps): Overview, Definition, and Architecture

https://arxiv.org/abs/2205.02302

PS:

Better resource: https://ml-ops.org/

Based on my (limited) exposure to date, there is tremendous opportunity for software engineers and architects to make an impact in ML systems. There is a pronounced lack of seasoned engineering talent (outside of big players like Databricks, et al.), and this knowledge gap sits behind an experience curve that mere IQ can’t jump over. Our experience as software architects and engineers is very valuable.

Know this and recognize the value you will bring to the table.


This initiative is getting some pretty good practical results: https://arxiv.org/abs/2209.01188. Taking a look at their citations should lead you to other examples in the field.


https://arxiv.org/ has a ton of papers on it.


Do you have some good search terms to get started down the rabbit hole?


Probably the biggest recent result: https://arxiv.org/abs/2209.04836 (author thread: https://twitter.com/SamuelAinsworth/status/15697194946455265...)

See also: https://github.com/learning-at-home/hivemind

and more to OP's incentive structure: https://docs.bittensor.com/

Latter two intend to beat latency with Mixture-of-Expert models (MoEs). If the results of the former hold, it shows that with a simple algorithmic transformation you can merge two independently trained models in weight-space and have performance functionally equivalent to a model trained monolithically.
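The weight-space merging idea can be sketched minimally like this. Note this shows only the naive interpolation step; the paper's actual contribution is first permuting one network's hidden units to align with the other's, which this sketch omits (all names are illustrative):

```python
import numpy as np

def merge_weights(state_a, state_b, alpha=0.5):
    """Naively interpolate two models' weights in weight-space.

    The Git Re-Basin result says that AFTER permuting state_b's hidden
    units to align with state_a, this kind of interpolation can land in
    a low-loss region; without that alignment step, plain averaging of
    independently trained nets usually does not.
    """
    assert state_a.keys() == state_b.keys()
    return {name: alpha * state_a[name] + (1 - alpha) * state_b[name]
            for name in state_a}

# Toy example: two 'models' that are each a single weight matrix.
a = {"w": np.zeros((2, 2))}
b = {"w": np.ones((2, 2))}
merged = merge_weights(a, b)
print(merged["w"])  # every entry is 0.5, halfway between the two models
```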


I too would like to go down this rabbit hole. I am going to poke around using the terms “distributed learning” and “federated learning” (They’re different areas, but somewhat related as far as I understand).


This one is a few years old, but seems interesting: https://arxiv.org/abs/1802.05799v2


BitTorrent for GPUs?


isn't this already implemented in iPhones for their autocomplete suggestions?


Lots of these services:

- Fluidstack: https://fluidstack.io

- Vast: https://vast.ai

- QBlocks: https://qblocks.cloud

- RunPod: https://runpod.io

- Sonm: https://sonm.com (blockchain)

- Golem: https://golem.network (blockchain)

- Rentaflop: https://rentaflop.com (rendering specific, blockchain)

- RNDR: https://rendertoken.com (rendering specific, blockchain)

If you want HPC specific cloud providers:

- Crusoe Cloud: https://crusoecloud.com

- Coreweave: https://coreweave.com

- Lambda Labs: https://lambdalabs.com

- Paperspace: https://paperspace.com

As others have pointed out, the decentralized clouds can't offer high performance interconnects (e.g. InfiniBand) that a lot of folks are using for LLM training. There are definitely initiatives underway to reduce dependence on these interconnects and build performant distributed training (again, some threads below mention this), but I think it's mostly academic at this point.

Disclosure: I run product at Crusoe Cloud, which aims to provide ML training at half the cost of a hyperscaler, while also being carbon reducing (https://crusoecloud.com/climate-impact/).


There are a lot of problems. 1. How can I confirm that you've done the computation? 2. Privacy and security: can I trust you to process my sensitive information? 3. Availability: is there a guarantee you won't just do half of it and then be on hiatus for 2 months? Wherever these problems are solved, we already have decentralized cloud computing; for the rest, you need to solve them first.


>1. How can I confirm that you've done the computation?

The same problem applies to things like Mechanical Turk and other crowdsourcing. The way I've dealt with the issue in the past is to start with zero trust and have them do computations that I already know the answer to. After that, they do computations that are matched with a random other participant (the two results should match; if they don't, compare against a third random participant).

Later, when trust has been developed, you can start assuming their work is trustworthy, but still check it randomly with computations you know the answer to, or with a second person doing the same computation.

Yes, this adds overhead (roughly 10% in aggregate) but it works fairly well. It works even better if there are penalties that you can impose for fraudulent results (like cancelling ALL owed payouts).
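The zero-trust scheme described above (known answers first, then randomized cross-checking) could be sketched roughly like this; the function and parameter names are illustrative:

```python
def verify_result(task, result, known_answers, peer_results):
    """Accept a worker's result only if it survives cross-checking.

    known_answers: task -> trusted answer, for tasks we precomputed
    peer_results: task -> answers already submitted by other workers
    """
    if task in known_answers:          # gold-standard spot check
        return result == known_answers[task]
    peers = peer_results.get(task, [])
    if not peers:                      # no one to compare against yet
        return None                    # undecided: wait for a peer
    votes = peers + [result]
    # Accept only if this result matches a strict majority of all
    # submissions (a third participant naturally breaks 1-vs-1 ties).
    return votes.count(result) * 2 > len(votes)

print(verify_result("t1", 42, {"t1": 42}, {}))        # True: matches gold
print(verify_result("t2", 42, {}, {"t2": [42]}))      # True: matches peer
print(verify_result("t3", 41, {}, {"t3": [42, 42]}))  # False: outvoted
```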


> have them do computations that I already know the answer to.

By definition, this is not having them do any computation. The proper solution at this time would be some trapdoor function that is easily verifiable (proof of work), at least while P != NP


>By definition, this is not having them do any computation.

If I ask you to compute the first million digits of pi to the power of 1.23456 and I already know the answer to validate it, how is this "by definition" not computation?


Good questions. Hmm do you think maybe it would be viable if the payouts were based on [model] performance rather than ostensible training time?

There's a useful asymmetry we can exploit: finding weights that perform well is computationally intensive and takes time, but scoring a set of weights is fast and easy.

A number cruncher could spend 2 weeks training a model, and then when they submit the results it takes me 10 seconds to score the model - to verify the quality of the results, and calculate the performance-based payout. In the #1 or #3 scenario where they didn't do or didn't complete the computations, they wouldn't have a well-trained model to submit for payout. (The lost time in #3 is inconvenient in time-sensitive situations, but mechanisms exist to address that - SLAs, up-front collateral, etc)
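The payout-by-score idea above could be sketched like this, assuming we keep a private evaluation set to score submissions against (all names hypothetical):

```python
def performance_payout(baseline_loss, submitted_loss, budget):
    """Pay for measured improvement over the baseline model.

    A worker who did no training submits weights no better than the
    baseline and earns nothing; the payout scales with the fraction
    of the baseline loss they managed to remove.
    """
    if submitted_loss >= baseline_loss:
        return 0.0                       # no improvement, no payout
    improvement = (baseline_loss - submitted_loss) / baseline_loss
    return round(budget * improvement, 2)

print(performance_payout(2.0, 2.1, 1000))  # 0.0   (worse than baseline)
print(performance_payout(2.0, 1.5, 1000))  # 250.0 (25% of loss removed)
```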

Regarding privacy, that's an EXTREMELY good and important question. There's some really neat prior art for privacy-preserving machine learning that could be useful here, e.g. https://arxiv.org/abs/2106.07229 "Privacy-Preserving Machine Learning with Fully Homomorphic Encryption for Deep Neural Network"

(note I'm approaching this as an interesting DistML thought experiment, not proposing it as an immediately viable or sensible initiative)


SETI@home and Folding@home had to deal with these problems decades ago - even with a closed-source client people would mod it in questionable ways to cheat the leaderboards.

Any computation can be verified as having been done by, at a minimum, checking for reproducibility. This requires having each work unit be done twice and only issuing credit if both units match. For deep-learning applications "match" is relative: different compute accelerators are going to give different results. So, instead we can insist that all the floating-point outputs on the model have to match up to the first n bits of mantissa. Neural networks are actually really insensitive to small perturbations in their weights, and it's common to train on 16-bit floats to save time.
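The "first n bits of mantissa" comparison can be done directly on the IEEE-754 bit patterns; a toy version for float32 values (the function name and default n are illustrative):

```python
import struct

def mantissa_match(a, b, n=10):
    """True if two float32 values agree in sign, exponent, and the
    first n bits of the 23-bit mantissa."""
    ai = struct.unpack("<I", struct.pack("<f", a))[0]
    bi = struct.unpack("<I", struct.pack("<f", b))[0]
    keep = 23 - n                 # low-order mantissa bits to discard
    return (ai >> keep) == (bi >> keep)

# Two accelerators computing 1/3 with slightly different rounding
# agree; genuinely different answers do not:
print(mantissa_match(0.33333334, 0.33333333))  # True
print(mantissa_match(0.333, 0.334))            # False
```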

We can also exploit the loss function itself as a verification mechanism. Generally speaking training is more compute-heavy than inference[0], so we can just run the updated model on the training set and confirm that your trained model is better than the original you were provided with to start from. This will need upper bounds, too - if only to catch people trying to overfit the model to guarantee they get credit.

As for privacy and security... the answer is to not train on private data or things that people do not want to be trained. Period. This isn't even a problem solely with distributed computing. All AI training should be limited to either data provided with consent, or data that's so old that training on it would not cause harm.

Availability is a problem, but not necessarily one that most distributed computing projects actually have to deal with. There is a minor incentive to participate with the credit system; there's a leaderboard for the fastest/highest credit users and teams. And people do compete for those leaderboard slots, because that's effectively ad space.

[0] Model execution.


Some blockchain projects attempted to tackle this. Check out Akash.


Akash is more about k8s, paid for by tokens.


There are a number of entities working on this problem. Plenty of papers on arXiv about it as well.


The Lightning Network could be used for invoices, identity, and payment for use of the model. It could likely also be used to handle prompts. Regarding privacy: not all things need to be private, and those that do should run on single-tenant infrastructure. OpenAI is a multi-tenant cloud model, so any privacy guarantee there is protected only by the license and use agreement.


https://petals.ml/ seems to be headed in that direction.


ML typically needs decently high bandwidth and low latency links between computers too.

Sometimes the algorithm or the data is commercially sensitive.

Those are the two main reasons a 'rent out my GPU while I sleep' scheme wouldn't work.

Neither are insurmountable though.


Yeah someone posted something like that on HN maybe a year ago.

It's not especially useful though because most companies (who are actually going to pay for this service) aren't going to want to send their training data to random people, and ML training needs high performance links between the cards. Plus you'd have to deal with the fact that you're running on 100 different GPU models.


In addition to privacy, performance, and portability, also:

* Servers in a datacenter are much more reliable than a network of PCs (power goes off, someone decides to play Crysis, etc)

* People will find ways to scam you (pretend like they’re doing the calculation while not actually doing it)

* Economies of scale means a datacenter will probably be cheaper than what you’d have to pay the PC owners (power consumption, network, wear, etc)

* The PC owners will have to trust this arbitrary code that they’re running (can you assure that there’s never a jailbreak resulting in a huge botnet?)

I pitched this idea more than a decade ago and quickly realized it has many issues.


What if you remove the financial incentive?

I'd contribute my GPU time to a Folding@Home style project if it meant that we had powerful, open LLMs that were free to use. I'm positive many others would as well.

As far as worrying about scammers, could you send the same compute task and training data to multiple clients and validate the results against each other? If they differed, you could throw out the results and try again, or send it to a third client to break the tie.


You could remove the financial incentive, but then you’re really limiting the pool of resources (you have to rely on people’s kindness, which only works, partially, in noble cases like finding a cure for cancer or searching for extraterrestrial life).

By sending the work to 3 people each time, you’re effectively cutting your (already limited) resource pool by 66%.



Also https://www.runpod.io/ (which I’ve been a very happy user of)



Nice! Knew once I thought about it that someone had to have done it already.


Company Alpha will probably just call up AWS or Azure and sleep easy. Company Beta can't really compete because they have to pay both themselves and their contributors; AWS could sell ML compute at-cost and still turn a profit.

What made this type of business feasible/attractive for cryptocurrency is that the miners were Company Beta. There weren't two mouths to feed or two pockets to line, just a direct and transparent reward scheme for people donating compute. Projects like Folding@Home have leveraged world-scale networks before, but I'm not aware of anyone who's managed to monetize it.


Large enough miners don't have the right hardware. Mining requires a tiny CPU, very little RAM, and GPUs with a max of 8 GB, oh, and 100 Mbit. There was never an incentive to buy hardware that could be reused for other purposes since mining itself was such a cash cow.

Company Beta likely started off with hardware for AI/ML and mined on the side when they didn't have enough customers.


It would require a problem that reasonably fits the solution.

You, for example, get a keyboard and a set of hands with each node, plus a webcam and a microphone.

I have no real idea but it seems the hand of cards isn't hopeless.


>AWS could sell ML compute at-cost and still turn a profit

How? Definitionally, they wouldn't be turning a profit.


Traditionally, people don't just get one thing from AWS. If their ML compute doesn't make money, S3 Object Storage will. My bigger point is that AWS and Azure have much greater control over their margins than a business like the one OP was describing.


I tried building a startup around this back in ~2012 called "Netkine". The basic premise was that compute is fungible and that all compute has a price (regardless of whether it is from a datacenter, or from someone's desktop PC). We seeded different compute on different networks around the SF Bay Area (either in low cost datacenters, or different networks like Comcast/AT&T) and had a customized version of Linux which could securely boot on a Windows gaming machine. You could then carve out different sized virtual machines on any machine in the network, and it would seamlessly create a VPN tying each of the instances together.

Our biggest problem was finding customers on the demand side. What is the killer application which can make use of that kind of compute with those kinds of networking properties? What are the use cases? No one has written software that takes advantage of that kind of compute, because it's essentially a solution looking for a problem.


You can rent out your GPU to vast.ai

But the issue with large language models is that you need ms-latency access to TBs of data, or else you won't be able to saturate your GPUs with useful work.



There a few crypto projects that aim to do this, this one is focused on artists

https://rendertoken.com

I'm not sure whether or not they will be successful.


It does exist! The most notable are vast.ai and runpod.io. I've been able to run ML experiments that would have cost tens of thousands of dollars for a tiny fraction of that using Vast.


The compute workloads for AI/ML are vastly different from mining, which means that the hardware requirements are totally different (and actually a huge capex investment).

Individuals renting out their home compute for things like this suffer from the problem of supporting so many different types of hardware profiles. BOINC is a prime example of this, but it doesn't have a funding mechanism.


Considering the power of these AI models, it is quite clear to me that companies like Microsoft and Google should ABSOLUTELY not have the monopoly on them because they simply cannot be trusted as is, let alone with powerful AI.

If there was a feasible crowdfunded solution to this and putting it in the hands of the people - I would certainly be prepared to lay down up to £5K.


The ACT token is a decentralized, token-based solution for powering AI inference that can be used to purchase compute resources on GCP, AWS, Azure, and the decentralized network. Each token is equivalent to a certain number of inferences by the AI Compute Unit (ACU), which is a stable average of GPU compute instances available in the pool.


Sheepit[0] does this with peoples Blender files, it’s awesome. I use it often and donate gpu time when I’m not using it.

[0] https://www.sheepit-renderfarm.com/home


Here's a whitepaper for a project called AI Compute Token:

https://docs.google.com/document/d/1NTsnUVRzBfK6y7WNX2JURogD...


When the PlayStation 3 came out, it shipped with a default app (Folding@home) that you could leave running to use your CPU for protein folding calculations, helping researchers figure out the structure of proteins.


There is a service like that on Ethereum. It's called Golem [https://www.golem.network/].


> Like mining bitcoin except you are doing something actually useful and getting paid actual money for it

But then how would crypto-scammers run ponzi after ponzi after ponzi if they did that?


oh? That seems to be the answer. They won't actually have to care what the compute costs if they can print money backstage.


Sounds like work, as long as they can prove they did it somehow. Perhaps all the other computers that are sharing compute could also verify the work was done?


>Like mining Bitcoin except you are doing something actually useful and getting paid actual money for it

A business paying USD is never going to be competitive with a decentralized crypto compute market. Better to build something just like mining BTC, except you do useful work, and are paid in a cryptocurrency you can exchange for USD. Then businesses can build on top of that to make it more user friendly.

The Golem network already lets you do this with CPU. It's way cheaper than centralized cloud computing could ever be, less than $0.003 per core per hour.

There are plans to add GPU support to Golem, but they have been on the backlog for a long time because there is a shortage of devs.


I don't think businesses would ever be able to trust a decentralized compute market. Where is the guarantee that the data/code sent to the network won't be intercepted and analyzed?

It's much safer to sign a contract with a real organization, especially the one which has a reputation to uphold. If someone like DigitalOcean steals data from servers, they can be sued, and there will be penalties. If some decentralized miner somewhere does that? Nothing you can do.

This only leaves decentralized compute for applications where neither the data nor the code matters. While I am sure there are applications like that (probably related to open source or cryptocurrencies), I doubt they'll bring much money.

(And don't say "homomorphic encryption" -- the overhead there is so high it is much cheaper to just get a centralized server)


Data locality is an issue with large models even when using supercomputers with very high bandwidth fabric.



Why not just use AWS?


> Why not just use AWS?

Physical goods vendors that sell through Amazon have found that once they sell something highly in demand, and profitable, Amazon uses all of their internal knowledge of the sales to create their own Amazon Basics version of the same good and then promotes that as a cheaper alternative, capturing most of the profit.

There is absolutely no reason not to believe that AWS will do the same thing and use their knowledge of the workloads and exact hardware requested to compete with you. The fact that they haven't done so yet isn't much evidence against them doing this in the future, since other parts of the same company already follow this approach.


Price and dependency on a single entity.


Vast.ai does this


This exists already, but I can't find a link.

Edit: I was probably thinking of vast.ai


It's not that easy. Access to enough compute is one thing, but you also need a proper dataset (beyond Common Crawl and Wikipedia), excellent research expertise, and engineering capabilities. So even if you throw money or free cloud-compute credits out there, it will not be enough. We've seen this happen with EleutherAI, who were not able to reach their initial target of "replicating" GPT-3 and could only deliver the GPT-NeoX 20B model despite all the free compute.


We solved the proper dataset part at least. https://arxiv.org/abs/2101.00027

My contribution was around 19,000 books.


> My contribution was around 19,000 books

what does this mean? not meaning to cross examine you, just curious how people contribute to The Pile since it seemingly appeared out of nowhere


Not at all, I love talking about it.

I was convinced that a model needed to be able to read like we do. And what do we do when we read? Pick up a book.

That turns out to be surprisingly hard, at least for training data. Step one is to acquire the books. Step two is to turn them into a readable format for computers.

Both steps were very hard. I lucked out on step one because The Eye happened to host all of Bibliotik, which came to around 30k books or so.

Trouble is, lots of those are PDFs. And although humans are great at reading those, they fucking suck for blind people. And a gpt is a blind person in a sense, because it needs to follow a linear sequence of words — something that PDFs are horrible at giving.

But one day I realized that epubs were merely HTML files, and aaronsw happened to have written an amazing HTML-to-text converter. I had to hack it to fix a few corner cases, but after a few days I ran it across all 19,000 epubs I spidered, then zipped the whole thing up and called it books3: https://twitter.com/theshawwn/status/1320282149329784833?s=4...

It’s one of the larger components of the pile, I think around 35%. Which is quite the hefty sum when it’s purely text. I still have a hard time wrapping my head around just how mindbogglingly big 800GB of text is.
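The pipeline described above can be sketched with the standard library alone, since an epub is essentially a zip of XHTML files. This is a toy tag-stripper, not the html2text converter the commenter actually used:

```python
import io
import zipfile
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document, dropping all markup."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

def epub_to_text(epub_bytes):
    """Concatenate the visible text of every (X)HTML file in an epub."""
    out = []
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        for name in zf.namelist():
            if name.endswith((".xhtml", ".html", ".htm")):
                parser = TextExtractor()
                parser.feed(zf.read(name).decode("utf-8", errors="ignore"))
                out.extend(parser.chunks)
    return "\n".join(out)

# Build a one-chapter 'epub' in memory to demo:
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("ch1.xhtml",
                "<html><body><p>Call me <b>Ishmael</b>.</p></body></html>")
print(epub_to_text(buf.getvalue()))
```

A real version needs the hacks the commenter mentions: honoring the epub's spine order, handling entities, and preserving paragraph breaks in a linear reading order.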


oh wow, that was a surprisingly awesome story. thanks for sharing!

this maaaay be covered in the Pile's writeup (which i have not yet read) but i wonder who was curating the overall "mix" of the content. seems easily biased to, say, public domain books, since the corpus is easily available.

when people say things like "GPT3 has been trained on all of the internet" i suspect this is a gross exaggeration. In reality it's just C4/commoncrawl, so that's like 800GB of text.


Stella! She’s awesome. https://twitter.com/blancheminerva?s=21&t=Gt6YrATJHnmY046Mdz...

Also bmk. https://twitter.com/nabla_theta?s=21&t=Gt6YrATJHnmY046MdzhYD...

They did the legwork of writing the paper and getting everything into a presentable format. A bunch of other people helped too; I wasn’t as involved as I could’ve been.

It was all discord-based. As far as I know it was the first serious research collaboration to happen solely via chatroom.

bmk also got the 50GB of code from GitHub, I think. So that’s where GPT-J’s coding ability likely came from.


thank you for those names, ive added to my watchlist.

could be a good story to write up, i think the people who do the hard work behind all the datasets dont get any of the glory of the ML models that get built on top of it


isn't Common Crawl much, much larger than this? ~6 pebibytes from what I remember


Yeah, but the hard part is filtering. It’s pretty easy to scrape a massive amount. Turning it into quality training data is the trick.

bmk was the magician there. https://twitter.com/nabla_theta?s=21&t=Gt6YrATJHnmY046MdzhYD...


This is my impression as well. Here is an example of the engineering capabilities required to train the modern models: https://github.com/facebookresearch/metaseq/blob/main/projec... [PDF]

It's a 114 page document detailing the months of full-time work for multiple engineers that went into training a 175B language model at meta AI.


Agreed, I was thinking the same thing. Raw compute is probably the cheapest/easiest part of this problem.


There is Open Street Map (or Wikipedia for that matter). A large enough army of volunteers could produce or tag a dataset that rivals Google's data, but it would be a lot of work.


> tag a dataset that rivals Google's data

Yeah, try rivaling Google's Street View data.


Open Street Map is used by Amazon, Apple, Baidu Maps, Facebook and Microsoft. It might not be as good as Google, but it is decent.

And Google have no equivalent to Wikipedia.


> Common Crawl

is there a crowdsourced list of text corpuses somewhere? i bet thats the starting point for all this. i'm only aware of C4 and The Pile.


Data isn't the hard part here, plenty is available, even with all the necessary preprocessing.


A large amount of data is not an issue, but obtaining a high amount of high-quality data is challenging. This is why open-source models do not perform as well as GPT-3 models in real-world usage.


Sorry, since the Pile and C4 https://huggingface.co/datasets/c4 (and, more generally, Common Crawl) and BigCode https://www.bigcode-project.org/ became available, this argument ceased to be the real moat.

The real moat is more about the lack of concentrated compute, ML engineering, and, more generally, the prosaic lack of political will outside a few orgs.



Chat demo here: http://chat.petals.ml/

Seems to work well, albeit very slowly…



Unfortunately, it can be difficult to crowd-source funding when crowdfunding services like Kickstarter cave to opposition to controversial compute projects. Unstable Diffusion, despite quickly reaching its funding goal, was suspended on Kickstarter. https://www.kickstarter.com/projects/unstablediffusion/unsta...


I’d donate if the model and code became open source afterwards like stable diffusion.


AFAIK, while the code is open source, the Stable Diffusion model license isn't, as it has usage restrictions that you wouldn't find in a real open source license.


Seconded.


count me in too.


Funny, I was asking myself the same question this morning, I was wondering if there was the equivalent of SETI@Home but to pool resources for model training.



I created a subReddit for this in case enough people want to actually try to make this happen.

https://www.reddit.com/r/AiCrowdFund/


Star Citizen raised half a billion dollars for a game that is never going to be released and all they did was promise a game in space with spaceships, sooo ...

Yeah, I think we could crowdfund a billion dollars for this, but we'd need some really competent people making sure it gets used optimally.


I think the second part of what you said is key. Seeing this thread hit the top of HN, I can see there's a desire or need for something like this, but how you'd even go about implementing it, I have no idea. I have mildly speculative ideas but nothing that would guarantee a successful outcome.

So it's like: OK, let's say you crowdsource even $20m. Now what? What's the structure of this thing that has $20m, how do you use it, what do you build, how do you give access to other people? My gut says a co-op style entity: no shares, membership based, people put in the money for that and infra costs. You don't try to do anything fancy. Buy compute for cheap (Hetzner, bare metal elsewhere, whatever), find the best open source project currently for GPT and start to run it, provide an API to it, and then start to gauge feedback. Do everything in the open, hold weekly meetings, be totally transparent about the costs.

Ultimately sustainability is still hard because people are crowdfunding you, so is that forever? Is there a self-sustaining model there? So many questions.


You'd need a good volunteer team of experts willing to shepherd it first and foremost. For it to be a volunteer team you'd need everyone to ideologically believe in the idea. They should be able to build an outline, line item wise, how each dollar is spent.

I think that will help get it started. Then swap to a non-profit monthly sub model where people are simply paying for use (pay out the cost for labor and hardware at that point only). A utility bill essentially.


I like the analogy of a utility bill, that makes sense.


It's no different from any other type of server I think? Crowdfunding compute comes with a lot of security and privacy concerns and often if you're building a product putting that part of it in the hands of someone reliable is preferable.

It depends on what part of what you're building is your core intellectual property and such.

I'll ask back: what products would you build on top of a crowdfunded-compute GPT like model?


Personally? One I'm interested in is a personal assistant that has access to 100+ APIs and that I can command solely with text: e.g. the ability to schedule future tasks, get answers to questions about real-world events and data available through APIs, and combine a lot of these things. What Alexa could/should have been.

Separately, examination of religious and historic texts in other languages, providing superior transliteration and translation into English beyond what humans could achieve.


From my understanding that is exactly what Microsoft's XiaoIce is doing in the Chinese market. Just with a text interface instead of a voice interface.

I guess there's not much interest to develop something equivalent for the English-speaking market because it's difficult to monetize.

https://en.wikipedia.org/wiki/Xiaoice

https://arxiv.org/pdf/1812.08989.pdf (actually a better introduction than the wikipedia)


The examples in the pdf are really advanced for something that was launched in 2014.


It’s not clear to me why your first example needs AI. If you have a defined text query setup (maybe with prefixes depending on what you want) the rest is basic case-based API selection, no?


Initially yes, simple commands to APIs work; I've done this. What's harder, and actually requires programming, is when you want to combine the use of multiple APIs into some sort of workflow. You could start to write some sort of DSL for this, but seeing how ChatGPT works, it's going to be far more effective to have something interpret human language and write something that triggers the APIs as and when it needs to.
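The "simple commands to APIs" baseline is basically keyword routing; a crude sketch, with all the API names made up, that also shows why multi-step workflows outgrow it:

```python
# Hypothetical single-intent router: fine for "one command -> one API",
# but it has no way to express "fetch the forecast, THEN schedule a
# reminder if it rains" -- the multi-step workflows where having a
# language model emit the plan becomes genuinely useful.
ROUTES = {
    "weather": "GET /v1/forecast",
    "remind": "POST /v1/reminders",
    "schedule": "POST /v1/calendar/events",
}

def route(command):
    """Map a text command to the first API whose keyword it mentions."""
    for keyword, endpoint in ROUTES.items():
        if keyword in command.lower():
            return endpoint
    return None  # no match: a real assistant would fall back to a model

print(route("What's the weather tomorrow?"))  # GET /v1/forecast
print(route("Remind me to call mom at 5pm"))  # POST /v1/reminders
print(route("play some music"))               # None
```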


> Separately, examination of religious and historic texts in alternate languages providing superior transliteration and translation into english than what humans could achieve.

I'd be interested in something similar to this as well - but I think part of the problem is that depending on the language, the text might only be available as digital scans without OCR. In that case, given that it would be in difficult-to-OCR languages, how could it be fed to an AI model?


Write a model to do image-to-text? In the case of things like the Quran, pretty much all of it is digitized already, so it's consumable by a model. Other things could be figured out over time. I think for a lot of things this could lead to huge breakthroughs if done right.


I was thinking of ancient Arabic and Chinese texts that are obscure enough that no one's bothered to translate them yet.



Ah, interesting. So I guess if they validate the thesis, we could see a number of these, right? Maybe co-op style, because you'd want some level of governance: not just "here's the cash, let's go run it," or a BitTorrent-style setup with no vetting of what it's being used for. E.g., I don't want models being built for nefarious stuff on my computer.


You might be interested in the LEAM.AI initiative, which is basically the EU planning to fund the creation of an Open Source GPT-3 competitor.

Their planning documents contain pages upon pages on all the related challenges, such as generating and storing the dataset, keeping your cluster running, and tolerating node failures. In short: the compute time alone won't make you succeed.


But what about something for individuals? You contribute n GPU minutes to the community ad hoc pool, and you get to use n minutes in parallel from said pool. Perhaps with some proof-of-work blockchain underlying it.

Just for individuals, no promises of deep data / model privacy.

Has anyone implemented something like this?
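The accounting side of that idea is simple to sketch. Everything below (class and method names) is hypothetical; the hard parts, scheduling and trust, are entirely omitted:

```python
# Illustrative sketch of the "contribute n minutes, use n minutes" ledger.
# All names here are made up; this only shows the credit accounting.

class GpuCreditPool:
    def __init__(self):
        self.credits = {}  # user -> GPU-minutes banked

    def contribute(self, user: str, minutes: int) -> None:
        self.credits[user] = self.credits.get(user, 0) + minutes

    def spend(self, user: str, minutes: int) -> bool:
        if self.credits.get(user, 0) < minutes:
            return False  # no banked contribution, no compute
        self.credits[user] -= minutes
        return True

pool = GpuCreditPool()
pool.contribute("alice", 120)
assert pool.spend("alice", 60)     # uses 60 of her 120 banked minutes
assert not pool.spend("bob", 10)   # freeloaders are rejected
```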


Check out Akash Network

https://akash.network/


Support for GPUs is still in the works.

The larger issue here though is the amount of bandwidth and memory required. Downloading billions of images to train a model just isn't going to work.

People are working on splitting the training into smaller chunks (together.xyz is one example), but they aren't quite there yet.


Maybe someone can make a cryptocurrency where mining and "proof of work" is accomplished by submitting an adjustment to weights and biases such that it improves the test score on a public dataset.
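A toy sketch of how such a "proof of useful work" check might be judged: re-evaluate the submitted weights on a public held-out set and accept them only if the score improves. The one-parameter linear model below is purely illustrative:

```python
# Sketch: accept a submitted weight update only if it improves the loss
# on a public held-out test set. Single-parameter model for illustration.

def loss(w, data):
    # mean squared error of the model y ≈ w * x
    return sum((y - w * x) ** 2 for x, y in data) / len(data)

def accept_update(w_current, w_proposed, public_test_set) -> bool:
    return loss(w_proposed, public_test_set) < loss(w_current, public_test_set)

test_set = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # true relation: y = 2x
assert accept_update(1.5, 1.9, test_set)         # closer to 2.0: accepted
assert not accept_update(1.9, 1.0, test_set)     # worse fit: rejected
```

A real scheme would also need to stop miners from overfitting the public set, which is one of the open problems with this idea.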


What is your desired objective and what are you going to train on? There are plenty of publicly available model checkpoints so you don’t have to start from scratch with large GPU clusters. There’s no point in repeating the pretraining that has already been done.

I’m not sure that compute is still the bottleneck right now; it’s fairly cheap to train LLMs. Many optimizations like DeepSpeed dramatically reduce computation requirements and increase throughput (e.g., training a 13B model on a single GPU).

If we’ve learned anything from the massive LLMs like PaLM, it’s that scaling autoregressive models to infinity has diminishing returns and is not feasible to serve in inference infrastructure. Google themselves acknowledge this resource limitation in the Med-PaLM paper when they discuss fine-tuning a 540B-parameter model.

We’re really in more of a dataset-and-training-task era of AI/NLP gains. Scaling masked the issues of poor-quality training data and conventional language modelling up to a certain point, but we’re starting to see the problems with that (hallucination) in all of the big models; Galactica is an example of those limitations.

OpenAI’s main advantage is that they paid humans to build a large labelled dataset for their RL objective. They’re offering ChatGPT for free to get more training data.


Because GPT models will be made smaller eventually.

Like how people mirror the initial Stable Diffusion model, only to find better, faster and smaller new versions later.


Side hot take:

1. “AI” requires compute time for training (GPT, etc.).

2. If you use any “AI” service in the future, you could have to share idle computing power to improve that “AI.”

3. This may be tokenized via crypto: more contributions = more “AI” usage available for you.

Could the use of tokens or cryptocurrency incentivize participants to contribute their idle computing power to facilitate the process beyond those with deep pockets?


ACT includes a proof-of-compute mechanism to ensure the reliability of the resources on the decentralized network. Users can register their instances with the system and be rewarded with ACT tokens for doing so. The system will also continuously monitor the instances and remove those that fail to meet quality standards or provide inaccurate results.

ACT also includes a reputation system that allows holders to track the performance of the resources on the decentralized network. Users that provide accurate and stable resources will have a higher reputation score and be more likely to be selected to provide compute.

The decentralized network is managed by a process for adding and removing resources, which ensures that the network always provides high-quality resources and maintains the stability of the ACU value.

https://docs.google.com/document/d/1NTsnUVRzBfK6y7WNX2JURogD...


Would this work? I can't see meaningful work being done on the dataset piecewise.

Don't you need to load the entire parameter set to backpropagate, and potentially at quite high precision? Meanwhile a typical consumer card has at most about 8 GB of VRAM, and then each node would have to send everything back.

I think you really need dedicated data centres if only to move the parameters around and load the entire thing at once.
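A rough back-of-envelope supports this. Assuming fp16 weights plus gradients and Adam-style optimizer state (a common rule of thumb of roughly 16 bytes per parameter), training state for a GPT-3-scale model dwarfs any consumer card:

```python
# Back-of-envelope: training-state memory for a GPT-3-scale model
# vs. a typical consumer GPU. The 16 bytes/param figure is a common
# rough estimate (fp16 weights + gradients + fp32 Adam moments).

params = 175e9            # GPT-3 parameter count
bytes_per_param = 16      # approximate training state per parameter
training_gb = params * bytes_per_param / 1e9

consumer_vram_gb = 8      # typical consumer card
print(f"~{training_gb:,.0f} GB of training state")
print(f"= ~{training_gb / consumer_vram_gb:,.0f} consumer GPUs of VRAM")
```

That's about 2,800 GB of live state, before even counting activations, which is why sharding it over home GPUs means constantly shuttling parameters over slow links.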


Or we could just...pay for stuff. My dear mother isn't going to spin up a GPU so she can talk to her ai assistant. I don't want to be told that I can't do stuff on my phone because I need to purchase a second stack of GPUs for my home because my kids talked to Elmo too much or whatever. We can make data centers that will gladly take money in exchange for their services, and I'm happy to provide money for them to supply those services.


Distributed training could be done, but how would data validation work? There would need to be some kind of voting per item, as you need to assume adversarial input (honestly, a crypto token could work here for governance). EleutherAI could be used to bootstrap, but the hardware requirements for GPT-NeoX are quite steep (around 20 GB of CUDA memory per card, so you could only allow those who have a 3090+ and are skilled enough to install CUDA 11 and do devops). I'm happy to contribute though!


I’d be more interested in closed co-ops where individuals provide compute power in exchange for access to the model. I think freeloaders will eat up compute time.


I guess I may be behind the times on some of these AI efforts, but I'm guessing that the model involves a big black box blob of data.

I'd be interested to know how models evolve with additional training, and wonder if there are additive identities that would allow two separately trained models to be combined.


It's no good for training, only inference, but there's salad.com, which has tens of thousands of DAU with GPUs. They've built a managed container service, with an inference API coming soon: https://salad.com/salad-inference-endpoints


The easy way to find out is to try...

You start a kickstarter, and when we get to our target of $10M, we can make a start on the computation!


Here's a project called AI Compute Token thats gaining some traction

https://docs.google.com/document/d/1NTsnUVRzBfK6y7WNX2JURogD...


I believe distributed compute *could* be one of the few applications where crypto might make sense? You could earn tokens by supplying compute and pay to get it, and the token price is a natural market price for compute.


Ever heard of banano? It's pretty much a joke, but it is basically this.


https://github.com/THUDM/GLM-130B might be a useful place to look


Yes, you can. It's not rocket science - lambdalabs and preemptible instances on various public clouds are possible choices.

And you probably should.


If only crypto had started after gpt, then we would have had a clear goal of what to compute and how to incentivize the economics.


So is blockchain just an economic and architectural model to run crowdsourced things of value like GPT models? I mean it seems like a good first use case.


They usually aren’t calculating anything interesting, but in principle something like that could be made.


The last one I saw was a project aiming to be a fast decentralized REPL, but with blockchain.

https://github.com/Kindelia/Kindelia-Chain


> Can groups of people pool together money to run shared models?

Then share the profit among the group of people? It’s called a company.


Cerebras will be the cheapest way, I think. Let's do the math.


I didn't even think about this. They do have a cloud product in the works https://www.cerebras.net/product-cloud/

Wonder if crowdfunding could be used to gain shared access to this



What type of hardware is needed as the bare bones for this?


Depends on which version you're running. According to https://github.com/amirgholami/ai_and_memory_wall and a video from the YouTube channel "Asianometry" (https://www.youtube.com/watch?v=5tmGKTNW8DQ), the full GPT-3 model needs thousands of high-end GPUs (meaning: not your home 1080 GPU).
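Even just holding the weights for inference makes the point. Assuming fp16 storage (2 bytes per parameter) and 80 GB datacenter-class cards as a reference point:

```python
# Rough estimate: why GPT-3 inference alone exceeds home hardware.
# Assumes fp16 weights (2 bytes/param); activations and KV cache excluded.

params = 175e9                          # GPT-3 parameter count
weights_gb = params * 2 / 1e9           # fp16 storage for weights alone
gpus_needed = weights_gb / 80           # e.g. 80 GB A100-class cards

print(f"~{weights_gb:.0f} GB of weights -> at least "
      f"{gpus_needed:.1f} x 80 GB GPUs just to hold the model")
```

That's 350 GB before any working memory, so a single consumer GPU isn't in the running even for inference, let alone training.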


Salad.com is years ahead and ready for just this. :)


there were crypto projects like that, but I don't think any survived until today


Maybe a model like seti@home?


I will fund it. Worst case, if it's close-crowdfunded, I'll invest there too.


EleutherAI?


You can, Microsoft is a publicly traded company



