IRC channel logs
2023-04-25.log
<oriansj>river: the real question perhaps is how much effort would be required to make LLM code generation reproducible.
<oriansj>Then we could just treat LLM code generation as just an abstracted compiler
<oriansj>and the source code just becomes the human written description which provided the basis of the generated code.
<oriansj>doras: finally saw your talk, good job. If you want, you can add your slides and notes to: https://github.com/oriansj/talk-notes (as well as any additional bits you think other people doing presentations might find useful).
<pabs3>oriansj: you could make it repeatable, but I think you probably wouldn't get deterministic builds across GPU vendors
<pabs3>river: where is the blog post?
<pabs3>definitely better not to use strcpy
<muurkha>in this case we're talking about an environment variable, which really is nul-terminated
<muurkha>strncpy doesn't always nul-terminate its strings
<muurkha>it's for filling in fixed-size fields in structs
<river>oriansj: that's a really good point. I never thought about this, I almost think that these systems cannot do things in a reproducible way
<river>maybe only possible if you really pin down every detail of your spec
<doras><oriansj> "Dor Askayo: finally saw your..." <- Thanks! I can share my slides, sure. Though I'm traveling so I can't create a PR. I can try to share the slides through Matrix to see what happens.
<doras>Sent. I never tried this before, so please let me know if it worked.
<fossy>we still want mescc-tools to be compilable by a standard toolchain..
<fossy>(fwiw, the use of strcpy there is because m2libc did not exist when this was written)
<Mikaku>doras: I see your link here and it works, so I guess oriansj will be able to push it
<muurkha>dynamically allocated strings would probably be better
<muurkha>like golang byte slices or qmail strallocs
<muurkha>but that's carpet-bombing the codebase, while strlcpy is a surgical strike
<minima>hi, fwiw, there's a micro typo at this page https://www.gnu.org/software/mes/ in case anyone has write access; the LISP-1.5 link at the 5th paragraph is broken, it's spelled http::// instead of https://
<janneke>minima: the fixes should make it to gnu.org within an hour
<doras>Mikaku: thanks for verifying
<roconnor>my proposal is to use strncpy with a requirement that the last byte of the array is null.
<roconnor>Heh, there is a disturbing lack of calls to free. I guess that is understandable, but I think the envp_line allocation should be moved out of the main loop.
<stikonas[m]>roconnor: free is not strictly necessary, until recently free was noop in M2libc anyway
<stikonas[m]>It now has some simple linked list based implementation
<roconnor>how many programs actually free their memory?
<river>the OS frees memory once the process dies :P
<roconnor>How does M2libc know if it is in uefi or not?
<roconnor>oh I see. main M2libc has a uefi directory.
<roconnor>I didn't see it before because I was on an early revision.
<roconnor>Since I have you here, maybe I can ask you what M2-Mesoplanet is?
<stikonas[m]>And it can also call M2-planet, M1, hex2 to spit out binary in one go
<muurkha>roconnor: I don't think anyone should ever use strncpy because it's so bug-prone; it's much worse than strcpy. instead define a function that does what people naively expect strncpy to do
<muurkha>except of course in the case where the destination really is fixed-size rather than nul-terminated
<river>I think that strncpy can be used correctly
<river>and in this way it improves upon strcpy
<river>but it requires you to do careful bounds checking stuff. strlcpy lets you avoid that
<muurkha>strcpy can also be used correctly, and it's easier than strncpy
<river>oh you are right, i think i was thinking about gets
<muurkha>strlcpy or something similar seems clearly safer to me
<roconnor>muurkha: What is your suggestion for using strcpy correctly here
<roconnor>maybe use strlen and compare to MAX_STRING before strcpy?
<muurkha>require(strlen(envp[1]) < MAX_STRING - 1); I think? I'm not deeply familiar with the code
<muurkha>but really I think using strlcpy is a better solution
<roconnor>less efficiently, but I suppose we don't want to microoptimise. :)
<river>does this same bounds check not work with strncpy?
<muurkha>it does, but it avoids the need to use strncpy, which is a red flag
<roconnor>actually strncpy does a bunch of padding with 0s so maybe strlen is more efficient.
<river>so strncpy basically gives no benefit over strcpy. except I guess it helps avoid writing beyond array bounds if you forget the if
<muurkha>river: the idea of strncpy is to use in cases like struct employee { char firstname[8]; char lastname[10]; ... };
<muurkha>where it's valid for firstname or lastname to not be nul-terminated
<muurkha>I don't think I've ever seen a correct use of strncpy in the wild until now (roconnor's is correct, of course, and the first time I've seen one)
<muurkha>well, of course you have a very different relationship with correctness than C programmers do
<muurkha>the issue is that people naturally think strncpy is analogous to strncat, which is of course what you would expect from its name
<river>so strncat is the ok one, strncpy is bad/pointless
<muurkha>strncat actually does what you want in cases like this except that it is harder than it could be to tell if it has truncated the result
<roconnor>I claim that require(strlen(envp[1]) < MAX_STRING) is correct.
<roconnor>let me try to reason it out to make sure
<roconnor>strlen(envp[1]) is the length of the string without the null terminator
<roconnor>strlen(envp[1]) + 1 is the number of bytes to be copied.
<roconnor>the destination holds MAX_STRING characters.
<muurkha>river: yeah, I mean, you can tell that the folks that hacked this stuff together in the 01970s had a very empirical sort of notion of "correctness"
<roconnor>so we need strlen(envp[1]) + 1 <= MAX_STRING
<roconnor>which is the same as strlen(envp[1]) < MAX_STRING
<muurkha>I think it was used for struct dirent in 6th Edition UNIX
<muurkha>one of the uses in there is evidently to explicitly truncate a string by deliberately passing a too-short length; nowadays we'd use memcpy, but memcpy (and bcopy) didn't exist yet
<rickmasters>Mikaku: Using Limine boot loader I've been able to boot linux 4.9.10 using sysb initramfs to launch sysc.
<Mikaku>rickmasters: wow, that was quick! :-)
<Mikaku>rickmasters: so you kexec Limine from Fiwix and then Limine loads the Linux kernel?
<rickmasters>I just created a fresh disk image, installed Limine with the kernel and initramfs taken from live-bootstrap and launched it and it worked.
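Note on the strcpy/strncpy thread above: the bound roconnor derives (strlen(src) + 1 <= MAX_STRING, i.e. strlen(src) < MAX_STRING) and muurkha's point that strncpy does not always nul-terminate can be sketched in C roughly as follows. This is only an illustration, not the kaem code: `copy_if_fits`, `strncpy_left_unterminated` and the value of `MAX_STRING` are hypothetical names and numbers chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer size, kept small for illustration;
 * the real MAX_STRING in the code under discussion may differ. */
#define MAX_STRING 16

/* Copy src into dst only if the string AND its terminator fit.
 * strlen(src) excludes the terminator, so strlen(src) + 1 bytes are
 * copied, and the copy is safe exactly when strlen(src) < dst_size.
 * Returns 0 on success, -1 if src is too long (dst is left untouched). */
int copy_if_fits(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t len = strlen(src);
    if (len >= dst_size) {
        return -1;              /* would need len + 1 > dst_size bytes */
    }
    memcpy(dst, src, len + 1);  /* includes the nul terminator */
    return 0;
}

/* Run strncpy(dst, src, n) and report whether it left dst WITHOUT a
 * nul terminator. strncpy always writes exactly n bytes: it pads with
 * zeros when src is shorter than n, and writes no terminator at all
 * when strlen(src) >= n. Returns 1 if dst is unterminated, else 0. */
int strncpy_left_unterminated(char *dst, size_t n, const char *src)
{
    assert(dst != NULL && src != NULL);
    strncpy(dst, src, n);
    return memchr(dst, '\0', n) == NULL;
}
```

With a 4-byte destination, `copy_if_fits` accepts "abc" and rejects "abcd", while `strncpy` on "abcd" fills all 4 bytes and leaves no terminator. Rejecting instead of truncating is a design choice made for this sketch; strlcpy-style truncation, as muurkha suggests, is the other common option.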
<rickmasters>I just want to verify it actually boots our kernel and ramfs before porting it into Fiwix.
<Mikaku>but the basic idea doesn't change? I mean, kexec GRUB/Limine (whatever bootloader) from Fiwix to load the Linux kernel?
<rickmasters>Mikaku: Sort of/not exactly. I'm thinking of taking the Limine code and integrating it into Fiwix to directly support Linux boot protocol
<Mikaku>ah yes, that seems a better approach to me
<rickmasters>I'm thinking very similar to your kexec for multiboot but using two ram drives - one for kernel, one for initramfs file.
<Mikaku>hmm, well, in that case the current implementation already covers this
<Mikaku>you can provide kexec_* parameters and also the initrd= and ramdisksize= parameter in the same cmdline
<Mikaku>I've not tested it but the idea was that they could coexist
<Mikaku>(I'll be away for a while, I'll read your posts later)
<rickmasters>Right, I don't see any need for new kernel parameters at this point. It would use kexec_proto=linux
<roconnor>some strangeness here. These values can never be NULL as they are required to be not NULL at the top of the loop.
<roconnor>I'm starting to regret looking into this file. :P
<Mikaku>rickmasters: yes, 'kexec_proto=linux' this is exactly as I thought
<stikonas[m]>Hmm, what happens if fiwix is booted with kexec but e.g. build fails and we exit 1
<Mikaku>stikonas: I've not tested it but kexec, at least in Fiwix, is a point of no return
<stikonas>roconnor: yeah, this should just be n = env...
<roconnor>I was inclined to suspend disbelief about this comment.
<stikonas>roconnor: that is not a nonsense comment, there was a good reason
<roconnor>because strcpy literally just goes through character by character.
<stikonas>yes, but keep in mind that this had to run on M2-Planet
<stikonas>I think this particular problem was fixed in M2-Planet
<stikonas>if you look at M2-Planet 3 years ago, it was far less capable
<stikonas>still, I guess we don't need this workaround anymore
<stikonas>so you could try to rewrite it in a simpler way
<stikonas>(as long as it builds with the latest git M2-Planet)
<roconnor>So I can locally test my changes, but is that good enough to make a PR? kaem runs on all sorts of strange platforms and I don't want to break things needlessly.
<stikonas>one platform for kaem should be good enough
<rickmasters>stikonas: If build fails, then init exits and fiwix will halt with an error message. I added a patch to also sync disks in that situation.
<stikonas>as long as you can run make test-x86 or make test-amd64 on stage0-posix, I don't expect other arches to break
<stikonas>M2-Planet should have the same features/bugs on different arches
<stikonas>you can just run ./bootstrap-seeds/POSIX/AMD64/kaem-optional-seed
<stikonas>that make target is just a simple wrapper around 2 commands
<stikonas>it runs ./bootstrap-seeds/POSIX/AMD64/kaem-optional-seed and then sha256sums...
<rickmasters>stikonas: Of course Fiwix is currently launched by a user space program on builder-hex0.
<roconnor>oh okay. Well I can certainly run kaem through its usual hoops.
<stikonas>stage0-posix kaem scripts are usually good enough for testing
<roconnor>I have everything up to bash, which is I guess everything that kaem is used for.
<stikonas>yeah, but stage0-posix should already exercise all features
<rickmasters>stikonas: Maybe you meant to ask if Linux is booted by Fiwix's kexec and build fails? In that case, it's no different than today - drops to bash.
<stikonas>rickmasters: no, I was thinking what would happen if something in Fiwix system errors out
<stikonas>so we exit but prepared kernel and initramfs are not yet built
<rickmasters>stikonas: currently Fiwix checks the ramdisk for a valid ELF header and only attempts kexec if it's valid.
<rickmasters>stikonas: The Limine code checks for appropriate magic numbers for linux boot protocol so it should work the same.
<stikonas>roconnor: as for multiple arches, M2-Planet emits basically identical assembly on all arches...
<stikonas>so e.g. on amd64 and x86 we are using the same number of registers (despite amd64 having double the number of registers)
<stikonas>(or on risc-v that has 32 registers we still use only very few)
<mihi>roconnor, you will find more dead code once you have a look at M1-Macro (in particular around blob type handling). I planned to collect all of the dead code I know and submit a pull request to eliminate it, but so far did not find time to.
<mihi>especially since M1 is also used by Mes, so I'd have to run all the Mes tests and the bootstrap to the point where tcc is built by mes, to make sure I did not break anything.
<stikonas>though usually running bootstrap would catch most if not all issues
<stikonas>or at least those issues that people care about
<roconnor>Heh. Mes doesn't have a very long life. It builds tcc, and maybe that is it.
<stikonas>roconnor: right now in live-bootstrap we do rebuild mes with mes
<stikonas>depending on the bugs, mes-m2 sometimes might also build tcc directly
<stikonas>but mes built with mescc is a bit faster anyway
<stikonas>so it does not take too much extra time to rebuild it with itself