IRC channel logs

2023-04-14.log


<fossy>doras: I am quite ok with that!
<river>hello
<stikonas>hi
<river>i want to discuss bootstrapping relating to LLMs. but I want to be respectful of people who aren't interested and don't care about LLMs. in that case they can just ignore the next messages from me i guess
<stikonas>what is LLM, large language model?
<river>yeah
<river>I read a paper about bootstrapping a 'pretrained model' which just generates text into a 'helpful' model which has been optimized towards responding to questions; then this was used with a "constitution" to perform analysis of its responses in order to further optimize it to produce 'helpful+harmless' responses
<river>I am also thinking about how these LLMs are capable of writing PyTorch neural network programs, and PyTorch is a library that could be used to implement an LLM. they are able to produce valid pieces of code and explain the functionality and so on.
<river>so there is a potential that it could process and produce modified/improved versions of its own code. that said, it seems to be 0.0001% code and 99.9999% data (many GB of data)
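To make river's point concrete: a toy autoregressive language model in PyTorch fits in a few dozen lines of code, while nearly all of the bulk lives in the trained weights and the training corpus. This is a minimal sketch with made-up sizes, not any particular real model.

```python
# A minimal sketch of a tiny autoregressive language model in PyTorch.
# All sizes are arbitrary; this illustrates how small the *code* is
# compared to the weights and training data, not any real model.
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    def __init__(self, vocab_size=256, dim=128, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, tokens):              # tokens: (batch, seq)
        seq = tokens.size(1)
        # causal mask: each position may only attend to earlier positions
        mask = torch.triu(torch.full((seq, seq), float("-inf")), diagonal=1)
        x = self.blocks(self.embed(tokens), mask=mask)
        return self.head(x)                 # logits: (batch, seq, vocab)

model = TinyLM()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

# one training step on random bytes, standing in for a real text corpus
tokens = torch.randint(0, 256, (8, 64))
logits = model(tokens[:, :-1])              # predict the next token
loss = loss_fn(logits.reshape(-1, 256), tokens[:, 1:].reshape(-1))
opt.zero_grad()
loss.backward()
opt.step()
```

Scaling this shape of code up to a real LLM is almost entirely a matter of more parameters, more data and more compute, which is where the bootstrapping difficulty discussed below lives.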
<doras>fossy: great! Should I just refer to you as "fossy" in the presentation? :)
<oriansj>river: well the largest problem in LLM bootstrapping is the dataset. The code is usually a relatively simple state machine in a standard language. And from what I can tell most of the code used to train it comes from software with free software licenses. Which is mostly an open question for the courts, which have historically considered anything slightly similar to this a copyright violation.
<oriansj>The quality of the code it currently produces is about equal to a half-drunk programmer who is so jaded that they only deliver what was requested, to the letter of the specification.
<oriansj>and let's be honest, _A tables in the database with the corresponding update/delete triggers aren't something people tend to ask for, but they should.
<oriansj>so from my perspective, I see it driving the cost of CRUD apps into the ground.
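For anyone unfamiliar with the "_A table" pattern oriansj mentions: each base table gets a companion audit table, and update/delete triggers copy the old row into it before it changes. A minimal sketch using Python's built-in sqlite3; the table and column names are made up for illustration.

```python
# Audit ("_A") table pattern: every UPDATE or DELETE on a table first
# copies the old row into a companion audit table via triggers.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE users_A (id INTEGER, name TEXT, email TEXT,
                      action TEXT, changed_at TEXT DEFAULT CURRENT_TIMESTAMP);

CREATE TRIGGER users_audit_update BEFORE UPDATE ON users
BEGIN
    INSERT INTO users_A (id, name, email, action)
    VALUES (OLD.id, OLD.name, OLD.email, 'update');
END;

CREATE TRIGGER users_audit_delete BEFORE DELETE ON users
BEGIN
    INSERT INTO users_A (id, name, email, action)
    VALUES (OLD.id, OLD.name, OLD.email, 'delete');
END;
""")

db.execute("INSERT INTO users (name, email) VALUES ('alice', 'a@example.org')")
db.execute("UPDATE users SET email = 'alice@example.org' WHERE name = 'alice'")
db.execute("DELETE FROM users WHERE name = 'alice'")
print(db.execute("SELECT * FROM users_A").fetchall())  # two audit rows
```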
<stikonas>and training big LLMs is not something you can do at home. You need a big data centre full of specialized GPUs and a small power plant next to it
<stikonas>fossy, rickmasters: README of live-bootstrap is also a bit out of date regarding kernel bootstrap
<rickmasters>stikonas: Do you mean the Get me started! section which advises using a kernel?
<stikonas>and also in the "Comparison between GNU Guix and live-bootstrap" section
<rickmasters>stikonas: The description of the size of seeds?
<rickmasters>stikonas: Yeah, there is the "Use of kernel" line
<pabs3>river: a big problem is the training data licenses, usually it isn't even redistributable. Debian's ML policy has some stuff about that https://salsa.debian.org/deeplearning-team/ml-policy
<oriansj>stikonas: yeah, if it costs $10M to do a single "build", even basic sanity checks like reproducible builds become a non-starter; but perhaps as a community we could find a way to spread the work out and obtain the collective benefit.
<pabs3>the other big problem is the amount of money it costs in hardware and electricity to do the training
<pabs3>then there is the CUDA problem, lots of these things rely on nvidia GPUs
<oriansj>??a folding@home solution potentially??
<rickmasters>stikonas: I think I've been a bit more hesitant than fossy to change some of the claims and defaults ... because kernel bootstrapping is not done.
<oriansj>ironically the big LLMs being produced are only 64KB
<stikonas>rickmasters: yeah, we can wait a bit to update that
<stikonas>still, you have made huge progress on kernel bootstrapping
<stikonas>I was already doing some testing (to test PRs before merging) with builder-hex0 and fiwix
<rickmasters>stikonas: I appreciate your comments and those from others. It helps keep me motivated.
<rickmasters>I want to highlight Mikaku's contribution of Fiwix and his recent work to fix the hard drive problems.
<rickmasters>The hard drive issue was a lot of work for both of us and was just closed: https://github.com/mikaku/Fiwix/issues/27
<oriansj>of course, Mikaku clearly has spent years doing excellent work to get Fiwix to its current state.
<river>oriansj: yes indeed! there is an interesting person called shawwwn who created one of the datasets that may be involved in a court case about whether the 'weights' for these models should be libre
<river>that link about the debian ML policy is very interesting, thank you
<river>that's a great point about cuda too. i don't know anything about the libre GPU computation stuff
<river>but that is a big issue
<oriansj>but it is hard to overstate the importance of your work rickmasters; you have saved me more than a decade of work
<stikonas>well, ROCm is libre, but AMD has a smaller market share than NVIDIA for GPU training
<oriansj>I like Debian's Policy (The toxicCandy bit was a nice touch)
<rickmasters>oriansj: thank you. I appreciate that but I think you're probably underestimating what can be accomplished with sustained focus over a long period.
<rickmasters> https://quoteinvestigator.com/2019/01/03/estimate/
<pabs3>oriansj: I don't think the folding@home approach would work, I've read the extra network latency would blow out the training time a lot
<avih>re folding@home, for those who care, home electricity is not free either, and recent GPUs consume 200-400W if you let them
<avih>so this can add up to non-negligible electricity bills...
<stikonas[m]>Yes, you need very low network latency and lots of RAM
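A back-of-the-envelope sketch of why latency and bandwidth dominate here: with plain data-parallel training, every optimizer step has to synchronize the full gradient across all workers, so per-step communication is bounded below by a round trip plus gradient size divided by link bandwidth. All the numbers below are assumptions chosen only to show the orders of magnitude.

```python
# Naive lower bound on per-step communication for data-parallel training.
# All numbers are assumptions for illustration, not measurements.
PARAMS = 70e9                      # assumed model size: 70B parameters
GRAD_BYTES = PARAMS * 2            # fp16 gradients
STEPS = 1e6                        # assumed number of optimizer steps

def per_step_comms(latency_s, bandwidth_bytes_per_s):
    """One gradient all-reduce per step: a round trip plus the transfer."""
    return latency_s + GRAD_BYTES / bandwidth_bytes_per_s

datacenter = per_step_comms(10e-6, 400e9 / 8)   # ~10 us, 400 Gb/s fabric
internet = per_step_comms(50e-3, 100e6 / 8)     # ~50 ms, 100 Mb/s uplink

print(f"per step:  datacenter ~{datacenter:.1f} s, internet ~{internet / 3600:.1f} h")
print(f"all steps: datacenter ~{datacenter * STEPS / 86400:.0f} days, "
      f"internet ~{internet * STEPS / (86400 * 365):.0f} years")
```

Real datacenter setups overlap this communication with compute and have much higher aggregate bandwidth, but over home internet links there is no way to hide hours of transfer per step, which is the point being made about a folding@home-style approach.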
<muurkha>river: I think it's a really important problem, possibly the most important in the history of the universe
<river>libre GPU?
<muurkha>oriansj: I've been impressed with the quality of the code GPT-3.5 spits out
<fossy>stikonas[m], rickmasters: README is a bit outdated, yes, but i'm ok with waiting to change that until kernel bootstrapping is more done
<fossy>the primary reason i wanted it default is to avoid regressions
<fossy>(after knowing that it wouldn't be fully complete on first PR)
<muurkha>river: bootstrapping AI
<river>ahh
<fossy>river: first we need open LLMs lol...
<fossy>i think it's going to take at *least* 5 years until today's LLMs are reproducible for <$50k (even if they are open)
<fossy>the training is just so prohibitive
<fossy>last year I heard that the then-current revision of Google's LaMDA was already trained on >300TB (iirc) of data
<fossy>i imagine that today's are in the PBs of data
<muurkha>oriansj: like, yesterday, http://sprunge.us/xFnyrV — but maybe it's just repeating some code from its training set nearly verbatim
<muurkha>there's a bug in that the triangles it draws are always invisible because they are inside the rectangle of the same color
<muurkha>it took under a minute to produce that
<muurkha>fossy: I think the future is a lot less predictable than you think it is
<fossy>how do you mean
<muurkha>it's easy to imagine more breakthroughs that improve trainability by orders of magnitude, like ReLU did
<muurkha>but there are also second-order and third-order effects
<muurkha>also, algorithmic improvements tend to have bigger effects on larger data: an algorithmic improvement that only saves you 50% at N=1000 might save you 99% at N = 1e9
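A toy illustration of that last point, with an assumed cost model: if the original algorithm has a quadratic term with a small constant factor and the improvement replaces it with linear work, the saving is modest at small N and essentially total at large N.

```python
# Toy cost model (assumed, for illustration only): the improvement removes
# a quadratic term whose constant factor is small.
def old_cost(n):
    return n + n**2 / 1000    # hypothetical original algorithm

def new_cost(n):
    return n                  # hypothetical improved algorithm

for n in (1e3, 1e6, 1e9):
    saving = 1 - new_cost(n) / old_cost(n)
    print(f"N={n:.0e}: saving {saving:.4%}")
# N=1e+03: saving 50.0000%
# N=1e+06: saving 99.9001%
# N=1e+09: saving 99.9999%
```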
<fossy>oh, for sure, i did explicitly mean today's LLMs in terms of the way they are trained and their dataset. but before you even get to training - you need the raw compute + bandwidth to obtain the training corpus, let alone process it. algorithmic improvements will do very little to help that, it's just good old storage+network engineering that improves that
<muurkha>oh, it's very likely that today's LLMs aren't reproducible in that sense at all
<muurkha>not just because they were tweaking the code during the training process but also because it's probably actually nondeterministic
<muurkha>it's evident that current training methods are very inefficient in their use of training data
<fossy>right, which is why i think AI bootstrapping, if it ever does occur, will not occur through any historical path
<muurkha>I agree
<fossy>it won't be feasible to do it in the same way we have done other software (effectively a bit of time travel)
<muurkha>second- and third-order effects: what happens when we can use AI to explore the design space of chip fabrication?
<fossy>yeah, that shall be very interesting
<muurkha>or asteroid mining?
<Mikaku>I appreciate your kind words, rickmasters has done and continues to do a really good job and I'm amazed at how quickly he hacked the Fiwix kernel
<river>unrelated, i just saw this on HN https://intuitiveexplanations.com/tech/kalyn
<stikonas>river: at least for now this is not bootstrappable
<stikonas>it needs haskell...
<gforce_d11977>rickmasters: oh my god, what have you done. back from vacation and builder-hex0 works. As delivered in the prophecy. Where is the Party/barbeque starting?
<rickmasters>gforce_d11977: lol. I still need to get Fiwix to launch Linux so we have a full bootstrap but yeah a lot of it is working and integrated into live-bootstrap.
<rickmasters>gforce_d11977: I'm in NYC right now. Sadly, good bbq is hard to find.
<gforce_d11977>rickmasters: I'm from the central BBQ state in Germany, where roasting a sausage is a religion (so I'm used to it). A good BBQ does not need much: it consists of a small set of people with spare time, a campfire and some stories to tell. It's essential at least once a week 8-)
<gforce_d11977>Isn't there the "Central Park", where you can just start a small BBQ? e.g. https://prod-metro-markets.imgix.net/item_image/79bf9851-4e59-42b8-8eae-8c91ba17263c?auto=format,compress
<rickmasters>gforce_d11977: That sounds fun.
<rickmasters>gforce_d11977: I think you can BBQ in Central Park but I haven't. I'll have to look into that.
<muurkha>NYC?
<muurkha>interesting!
<j-k-web>trying to build mes using kaem.run & my mescc-tools (with M2-planet). It goes pretty well; I get to the M1 part on line 126, but it has a tantrum: `/build/mes.M1:19 :Received invalid other; lea_eax,[ebp+DWORD]`. The INSTALL doc says it's known to work with mescc-tools 1.4.0, so maybe I should just be using that instead of
<j-k-web>`e8ffea3b2ab1cad652d37c0beafe93c8e75b6ddf` (I'm using release 0.24.2 of mes)
<stikonas>j-k-web: mes 0.24.2 needs a patch to work with latest stage0-posix
<stikonas> https://git.savannah.gnu.org/cgit/mes.git/commit/?h=wip&id=8b18e71c73e9367e586c378a1056837de7c23729
<stikonas>or use this workaround https://github.com/fosslinux/live-bootstrap/blob/master/sysa/mes-0.24.2/mes-0.24.2.kaem#L50
<j-k-web>perfect. I'll give that a go now. tyvm
<rickmasters>muurkha: My home base is Seattle
<oriansj>rickmasters: as the father of a 3-year-old who disassembles doors, sustained focus on anything is a rare thing for me these days.
<oriansj>river: making your own self-hosting language is pretty trivial (especially if you allow inline assembly)