IRC channel logs

2023-04-25.log

<oriansj>river: the real question perhaps is how much effort would be required to make LLM code generation reproducible.
<oriansj>Then we could just treat LLM code generation as an abstracted compiler
<oriansj>and the source code just becomes the human written description which provided the basis of the generated code.
<oriansj>doras: finally saw your talk, good job. If you want, you can add your slides and notes to: https://github.com/oriansj/talk-notes (as well as any additional bits you think other people doing presentations might find useful).
<pabs3>oriansj: you could make it repeatable, but I think you probably wouldn't get deterministic builds across GPU vendors
<pabs3>river: where is the blog post?
<roconnor> https://github.com/oriansj/mescc-tools/blob/master/Kaem/kaem.c#L1285
<roconnor>how unsafe is this considered?
<roconnor>M2libc does have a strncpy
<pabs3>definitely better not to use strcpy
<muurkha>not if the alternative is strncpy
<muurkha>in this case we're talking about an environment variable, which really is nul-terminated
<muurkha>strncpy doesn't always nul-terminate its strings
<muurkha>it's for filling in fixed-size fields in structs
<river>pabs3: https://gist.github.com/rain-1/9c948a5931d9b0a15a985d5b11921e9e
<river>oriansj: that's a really good point. I never thought about this, I almost think that these systems cannot do things in a reproducible way
<river>maybe only possible if you really pin down every detail of your spec
<doras><oriansj> "Dor Askayo: finally saw your..." <- Thanks! I can share my slides, sure. Though I'm traveling so I can't create a PR. I can try to share the slides through Matrix to see what happens.
<doras>ACTION posted a file: (286KiB) < https://libera.ems.host/_matrix/media/v3/download/matrix.org/hSDcnaJaUeVonQdaBvOKyFvq/Bootstrappable%20Freedesktop%20SDK%20-%20LAS%202023.pdf >
<doras>Sent. I never tried this before, so please let me know if it worked.
<river>maybe you can add strlcpy https://www.openbsd.org/papers/strlcpy-paper.pdf
<fossy>we still want mescc-tools to be compilable by a standard toolchain..
<fossy>(fwiw, the use of strcpy there is because m2libc did not exist when this was written)
<Mikaku>doras: I see your link here and it works, so I guess oriansj will be able to push it
<muurkha>strlcpy is reasonable
<muurkha>dynamically allocated strings would probably be better
<muurkha>like golang byte slices or qmail strallocs
<muurkha>but that's carpet-bombing the codebase, while strlcpy is a surgical strike
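For readers unfamiliar with strlcpy, here is a minimal sketch of a helper with strlcpy-like semantics. The name sketch_strlcpy is hypothetical and this is not the OpenBSD implementation or anything from M2libc; it targets a standard C toolchain and has not been checked against M2-Planet.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper with strlcpy-like semantics (illustration only):
 * copy at most size-1 bytes, always nul-terminate when size > 0,
 * and return strlen(src) so the caller can detect truncation. */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);
    if (size > 0) {
        size_t n = (len < size - 1) ? len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}
```

A caller can treat a return value greater than or equal to the buffer size as "the string was truncated".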
<river>hehe
<minima>hi, fwiw, there's a micro typo at this page https://www.gnu.org/software/mes/ in case anyone has write access; the LISP-1.5 link at the 5th paragraph is broken, it's spelled http::// instead of https://
<janneke>minima: the fixes should make it to gnu.org within an hour
<janneke>thanks!
<minima>yay :) thanks
<doras>Mikaku: thanks for verifying
<roconnor> https://github.com/oriansj/mescc-tools/pull/40/files
<roconnor>my proposal is to use strncpy with a requirement that the last byte of the array is null.
<roconnor>Heh, there is a distrubing lack of calls to free. I guess that is understandable, but I at the envp_line allocation should be moved out of the main loop.
<stikonas[m]>roconnor: free is not strictly necessary, until recently free was noop in M2libc anyway
<stikonas[m]>It now has some simple linked list based implementation
<roconnor>oh. fancy.
<stikonas[m]>As we had to implement it for UEFI
<stikonas[m]>UEFI doesn't free memory on exit unlike linux
<roconnor>oh interesting.
<stikonas[m]>Well uefi is lower level
<roconnor>how many programs actually free their memory?
<stikonas[m]>So you have to close all file descriptors there
<river>the OS frees memory once the process dies :P
<roconnor>obviously kaem does not.
<roconnor>currently.
<stikonas[m]>Once kaem exits, all memory is freed
<roconnor>because the runtime frees it?
<stikonas[m]>OS frees it
<stikonas[m]>Our runtime (M2libc) frees it in uefi
<stikonas[m]>But on Linux it really is the kernel
<roconnor>How does M2libc know if it is in uefi or not?
<stikonas[m]>ifdef
<stikonas[m]>You need to build it with different files
<stikonas[m]>And uefi.c defines some stuff, probably __uefi__
<stikonas[m]>Those are different kaem scripts
<stikonas[m]>So we just feed different build command
<stikonas[m]>In fact different repos
<stikonas[m]>One is stage0-posix, the other is stage0-uefi
<roconnor>oh I see. main M2libc has a uefi directory.
<stikonas[m]>And Linux too
<roconnor>I didn't see it before because I was on an early revision.
<stikonas[m]>You build either one or the other
<stikonas[m]>And order is also important
<stikonas[m]>M2libc C files can't be in arbitrary order
<stikonas[m]>E.g. here https://github.com/oriansj/stage0-posix-x86/blob/master/mescc-tools-full-kaem.kaem
<stikonas[m]>M2-Planet has no #include support
<stikonas[m]>So it's a bit trickier to use than normal compiler
<roconnor>Since I have you here, maybe I can ask you what M2-Mesoplanet is?
<stikonas[m]>It's C preprocessor
<stikonas[m]>It supports a bit more of the C macros
<stikonas[m]>Including some support for #include
<stikonas[m]>But also more powerful #defines
<stikonas[m]>And it can also call M2-planet, M1, hex2 to spit out binary in one go
<stikonas[m]>A bit like gcc
<stikonas[m]>gcc can call cc1, as, ld
<roconnor>Thanks.
<muurkha>roconnor: I don't think anyone should ever use strncpy because it's so bug-prone; it's much worse than strcpy. instead define a function that does what people naively expect strncpy to do
<muurkha>except of course in the case where the destination really is fixed-size rather than nul-terminated
<river>I think that strncpy can be used correctly
<river>and in this way it improves upon strcpy
<river>but it requires you to do careful bounds checking stuff. strlcpy lets you avoid that
<muurkha>strcpy can also be used correctly, and it's easier than strncpy
<river>oh you are right, i think i was thinking about gets
<muurkha>right
<muurkha>strlcpy or something similar seems clearly safer to me
<roconnor>muurkha: What is your suggestion for using strcpy correctly here
<roconnor>maybe use strlen and compare to MAX_STRING before strcpy?
<muurkha>require(strlen(envp[1]) < MAX_STRING - 1); I think? I'm not deeply familiar with the code
<muurkha>yeah
<muurkha>but really I think using strlcpy is a better solution
<roconnor>less efficiently, but I suppose we don't want to microoptimise. :)
<river>does this same bounds check not work with strncpy?
<muurkha>it does, but it avoids the need to use strncpy, which is a red flag
<roconnor>actually strncpy does a bunch of padding with 0s so maybe strlen is more efficient.
<river>so strncpy basically gives no benefit over strcpy. except I guess it helps avoid writing beyond array bounds if you forget the if
<muurkha>probably in the usual case, yeah
<roconnor>okay I'll update my PR.
<muurkha>river: the idea of strncpy is to use in cases like struct employee { char firstname[8]; char lastname[10]; ... };
<muurkha>where it's valid for firstname or lastname to not be nul-terminated
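The fixed-size-field use of strncpy described above can be sketched as follows. The struct follows muurkha's example with the elided members left out; the function names set_names and print_employee are hypothetical.

```c
#include <stdio.h>
#include <string.h>

struct employee {
    char firstname[8];   /* fixed-size field, not necessarily nul-terminated */
    char lastname[10];
};

/* Fill the fixed-size fields: strncpy truncates long names and
 * zero-pads short ones, but adds no terminator when the name fills the field. */
void set_names(struct employee *e, const char *first, const char *last)
{
    strncpy(e->firstname, first, sizeof e->firstname);
    strncpy(e->lastname, last, sizeof e->lastname);
}

/* Print with an explicit precision so an unterminated field is never overrun. */
void print_employee(const struct employee *e)
{
    printf("%.*s %.*s\n",
           (int)sizeof e->firstname, e->firstname,
           (int)sizeof e->lastname, e->lastname);
}
```

The fields must never be passed to functions that expect a nul-terminated string, which is why the print helper bounds each read with the field size.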
<river>oh i see!
<muurkha>I don't think I've ever seen a correct use of strncpy in the wild until now (roconnor's is correct, of course, and the first time I've seen one)
<roconnor>Ha. I'm flattered.
<muurkha>well, of course you have a very different relationship with correctness than C programmers do
<muurkha>the issue is that people naturally think strncpy is analogous to strncat, which is of course what you would expect from its name
<river>ahh
<river>what a minefield lol
<river>so strncat is the ok one, strncpy is bad/pointless
<muurkha>strncat actually does what you want in cases like this except that it is harder than it could be to tell if it has truncated the result
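A minimal sketch of the strncat approach mentioned above, assuming a destination buffer of at least one byte. The helper name copy_truncating is hypothetical; the extra strlen call is the "harder than it could be" part of detecting truncation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: truncating copy built on strncat.
 * Requires size > 0. Returns 1 if src did not fit completely, 0 otherwise. */
int copy_truncating(char *dst, const char *src, size_t size)
{
    dst[0] = '\0';                 /* start from an empty string */
    strncat(dst, src, size - 1);   /* appends at most size-1 bytes, then a nul */
    return strlen(src) > size - 1; /* truncation has to be checked separately */
}
```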
<roconnor>I claim that require(strlen(envp[1]) < MAX_STRING) is correct.
<roconnor>let me try to reason it out to make sure
<roconnor>strlen(envp[1]) is the length of the string without the null terminator
<muurkha>Yes, I agree.
<roconnor>strlen(envp[1]) + 1 is the number of bytes to be copied.
<roconnor>the destination holds MAX_STRING characters.
<muurkha>river: yeah, I mean, you can tell that the folks that hacked this stuff together in the 01970s had a very empirical sort of notion of "correctness"
<roconnor>so we need strlen(envp[1]) + 1 <= MAX_STRING
<roconnor>which is the same as strlen(envp[1]) < MAX_STRING
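roconnor's reasoning can be sketched as a checked copy. This is an illustration under assumptions, not the code from the PR: the MAX_STRING value is assumed, the function names are hypothetical, and an error return stands in for the require() call discussed in the channel.

```c
#include <string.h>

#define MAX_STRING 4096  /* assumed value, for illustration only */

/* Returns 1 if src, including its nul terminator, fits in MAX_STRING bytes. */
int fits_in_max_string(const char *src)
{
    /* strlen(src) + 1 <= MAX_STRING  is the same as  strlen(src) < MAX_STRING */
    return strlen(src) < MAX_STRING;
}

/* Checked copy into a buffer of MAX_STRING bytes.
 * Returns 0 on success, -1 if src is too long (nothing is written). */
int copy_checked(char *dst, const char *src)
{
    if (!fits_in_max_string(src)) return -1;
    strcpy(dst, src);  /* safe: at most MAX_STRING bytes are written */
    return 0;
}
```

A string of exactly MAX_STRING - 1 characters is the longest one accepted, since its terminator lands in the last byte of the buffer.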
<roconnor>Is strncpy really that old?
<roconnor>I figured it was circa 01990.
<muurkha>I think it was used for struct dirent in 6th Edition UNIX
<muurkha>here are uses in 7th Edition: https://lwn.net/Articles/723722/
<muurkha>one of the uses in there is evidently to explicitly truncate a string by deliberately passing a too-short length; nowadays we'd use memcpy, but memcpy (and bcopy) didn't exist yet
<muurkha>longer discussion in http://web.archive.org/web/20220315035032/https://minnie.tuhs.org/pipermail/tuhs/2013-January/thread.html#5947
<muurkha>Steve Johnson says strncpy in Unix probably predates C itself: http://web.archive.org/web/20220316142741/https://minnie.tuhs.org/pipermail/tuhs/2013-January/005954.html
<rickmasters>Mikaku: Using Limine boot loader I've been able to boot linux 4.9.10 using sysb initramfs to launch sysc.
<Mikaku>rickmasters: wow, that was quick! :-)
<rickmasters>Mikaku: So it's viable for our purposes.
<muurkha>but Ron Natalie says that by that name it dates only to 7th Edition: http://web.archive.org/web/20220316142749/https://minnie.tuhs.org/pipermail/tuhs/2013-January/005956.html
<muurkha>rickmasters: wow, fantastic!
<Mikaku>rickmasters: so you kexec Limine from Fiwix and then Limine loads the Linux kernel?
<rickmasters>Mikaku: No, no. Don't get too excited folks. :)
<Mikaku>:-)
<rickmasters>I just created a fresh disk image, installed Limine with the kernel and initramfs taken from live-bootstrap and launched it and it worked.
<Mikaku>ah ok
<rickmasters>I just want to verify it actually boots our kernel and ramfs before porting it into Fiwix.
<Mikaku>I see
<Mikaku>but the basic idea doesn't change? I mean, kexec GRUB/Limine (whatever bootloader) from Fiwix to load the Linux kernel?
<rickmasters>Mikaku: Sort of/not exactly. I'm thinking of taking the Limine code and integrating it into Fiwix to directly support Linux boot protocol
<Mikaku>ah yes, that seems a better approach to me
<rickmasters>I'm thinking very similar to your kexec for multiboot but using two ram drives - one for kernel, one for initramfs file.
<Mikaku>hmm, well, in that case the current implementation already covers this
<Mikaku>you can provide kexec_* parameters and also the initrd= and ramdisksize= parameter in the same cmdline
<Mikaku>I've not tested it but the idea was that they could coexist
<Mikaku>(I'll be away for a while, I'll read your posts later)
<rickmasters>Right, I don't see any need for new kernel parameters at this point. It would use kexec_proto=linux
<rickmasters>Talk to you later...
<roconnor> https://github.com/oriansj/mescc-tools/blob/master/Kaem/kaem.c#L1295 https://github.com/oriansj/mescc-tools/blob/master/Kaem/kaem.c#L1312
<roconnor>some strangeness here. These values can never be NULL as they are required to be not NULL at the top of the loop.
<roconnor>I'm starting to regret looking into this file. :P
<stikonas[m]>roconnor: what if envp is NULL?
<stikonas[m]>Or something similar
<stikonas[m]>There is an assignment after top of the loop check
<Mikaku>rickmasters: yes, 'kexec_proto=linux' this is exactly as I thought
<stikonas[m]>Hmm, what happens if fiwix is booted with kexec but e.g. build fails and we exit 1
<roconnor> https://github.com/oriansj/mescc-tools/blob/master/Kaem/kaem.c#L1323-L1326
<roconnor>I don't even ...
<Mikaku>stikonas: I've not tested it but kexec, at least in Fiwix, is a point of no return
<stikonas[m]>That's fine anyway, I was just curious what happens
<stikonas>roconnor: yeah, this should just be n = env...
<roconnor> https://github.com/oriansj/mescc-tools/blob/master/Kaem/kaem.c#L1277-L1282
<roconnor>I was inclined to suspend disbelief about this comment.
<stikonas>roconnor: that is not a nonsense comment, there was a good reason
<roconnor>Really?
<roconnor>because strcpy literally just goes through character by character.
<stikonas>yes, but keep in mind that this had to run on M2-Planet
<stikonas>and M2-Planet is not as good as gcc
<stikonas>I think this particular problem was fixed in M2-Planet
<stikonas>let me try to find it
<roconnor>oooh
<stikonas>roconnor: probably this https://github.com/oriansj/M2-Planet/commit/546cb1ac957cacfcc34a0e7b58f8e43d3392e417
<roconnor>now I see that envp_line predates https://github.com/oriansj/mescc-tools/commit/87fdb3fa955229c33bde2905f501d1275702d01a
<stikonas>if you look at M2-Planet 3 years ago, it was far less capable
<roconnor>Understandable.
<stikonas>still, I guess we don't need this workaround anymore
<stikonas>so you could try to rewrite it in a simpler way
<stikonas>(as long as it builds with the latest git M2-Planet)
<roconnor>So I can locally test my changes, but is that good enough to make a PR? kaem runs on all sorts of strange platforms and I don't want to break things needlessly.
<stikonas>one platform for kaem should be good enough
<rickmasters>stikonas: If build fails, then init exits and fiwix will halt with an error message. I added a patch to also sync disks in that situation.
<stikonas>as long as you can run make test-x86 or make test-amd64 on stage0-posix, I don't expect other arches to break
<stikonas>M2-Planet should have the same features/bugs on different arches
<roconnor>ha, running make.
<roconnor>But yeah, okay I'll look into that.
<stikonas>make is not really needed
<stikonas>you can just run ./bootstrap-seeds/POSIX/AMD64/kaem-optional-seed
<stikonas>that make target is just a simple wrapper around 2 commands
<roconnor>does that run the tests?
<stikonas>it runs ./bootstrap-seeds/POSIX/AMD64/kaem-optional-seed and then sha256sums...
<stikonas>we don't have particular tests for Kaem
<rickmasters>stikonas: Of course Fiwix is currently launched by a user space program on builder-hex0.
<stikonas>there are only tests for M2-Planet
<roconnor>oh okay. Well I can certainly run kaem through its usual hoops.
<stikonas>stage0-posix kaem scripts are usually good enough for testing
<roconnor>on x86 at least.
<roconnor>I have everything up to bash, which is I guess everything that kaem is used for.
<stikonas>yeah, but stage0-posix should already exercise all features
<stikonas>including env variables and aliases
<rickmasters>stikonas: Maybe you meant to ask if Linux is booted by Fiwix's kexec and build fails? In that case, it's no different than today - drops to bash.
<stikonas>rickmasters: no, I was thinking what would happen if something in Fiwix system errors out
<stikonas>so we exit but prepared kernel and initramfs are not yet built
<stikonas>so those ramdisks are empty
<stikonas[m]>Well any behaviour is fine I guess
<stikonas[m]>Linux just hangs if you exit pid1
<rickmasters>stikonas: currently Fiwix checks the ramdisk for a valid ELF header and only attempts kexec if it's valid.
<stikonas[m]>OK, makes sense
<rickmasters>stikonas: The Limine code checks for appropriate magic numbers for linux boot protocol so it should work the same.
<stikonas>roconnor: as for multiple arches, M2-Planet emits basically identical assembly on all arches...
<stikonas>so e.g. on amd64 and x86 we are using the same number of registers (despite amd64 having double the number of registers)
<stikonas>(or on risc-v that has 32 registers we still use only very few)
<mihi>roconnor, you will find more dead code once you have a look at M1-Macro (in particular around blob type handling). I planned to collect all of the dead code I know and submit a pull request to eliminate it, but so far did not find time to.
<mihi>especially since M1 is also used by Mes, so I'd have to run all the Mes tests and the bootstrap to the point where tcc is built by mes, to make sure I did not break anything.
<roconnor>oh, does mes have tests?
<stikonas>yes, I think mes does have tests
<stikonas>though usually running bootstrap would catch most if not all issues
<stikonas>or at least those issues that people care about
<roconnor>Heh. Mes doesn't have a very long life. It builds tcc, and maybe that is it.
<roconnor>I guess it builds itself too.
<stikonas>roconnor: right now in live-bootstrap we do rebuild mes with mes
<stikonas>depending on the bugs, mes-m2 sometimes might also build tcc directly
<stikonas>but mes built with mescc is a bit faster anyway
<stikonas>so it does not take too much extra time to rebuild it with itself
<stikonas>and is less prone to breakages