IRC channel logs

2021-12-03.log

<oriansj>which riscv32 doesn't actually have nor wait4
<stikonas>well, yes, so it has to be implemented using waitid
<stikonas>and its (minimal) implementation in kaem-minimal might also be wrong
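The idea of building waitpid on top of waitid can be sketched in C roughly as follows. This is a minimal illustration that goes through the libc waitid() wrapper, not the M2libc or kaem-minimal code; the names waitpid_via_waitid and demo_exit_status are hypothetical, and stopped or continued children are deliberately not handled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not the M2libc code): emulate waitpid() with waitid(). */
static pid_t waitpid_via_waitid(pid_t pid, int *status, int options)
{
    siginfo_t info;
    idtype_t idtype;
    id_t id;

    /* Translate the waitpid() pid convention into waitid()'s (idtype, id) pair. */
    if (pid > 0)        { idtype = P_PID;  id = (id_t)pid; }
    else if (pid == -1) { idtype = P_ALL;  id = 0; }
    else if (pid == 0)  { idtype = P_PGID; id = (id_t)getpgrp(); }
    else                { idtype = P_PGID; id = (id_t)(-pid); }

    info.si_pid = 0;  /* stays 0 when WNOHANG finds nothing to report */

    /* Only WNOHANG is passed through; this sketch waits for terminated children only. */
    if (waitid(idtype, id, &info, (options & WNOHANG) | WEXITED) != 0)
        return -1;

    /* Re-encode siginfo into the classic wait status word. */
    if (status != NULL && info.si_pid != 0) {
        if (info.si_code == CLD_EXITED)
            *status = (info.si_status & 0xff) << 8;
        else if (info.si_code == CLD_DUMPED)
            *status = (info.si_status & 0x7f) | 0x80;
        else  /* CLD_KILLED */
            *status = info.si_status & 0x7f;
    }
    return info.si_pid;
}

/* Usage sketch: fork a child that exits with 7 and read its exit code back. */
static int demo_exit_status(void)
{
    int status = 0;
    pid_t reaped;
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0)
        _exit(7);

    reaped = waitpid_via_waitid(child, &status, 0);
    assert(reaped == child);
    assert(WIFEXITED(status));
    return WEXITSTATUS(status);
}
```

The status re-encoding is the part that is easy to get wrong: waitid reports the exit code and the cause separately, while callers of waitpid expect both packed into one integer.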
<stikonas>somehow now riscv32 bootstrap fails for me with
<stikonas>./riscv32/bin/M1 --architecture riscv32 --little-endian -f ./M2libc/riscv32/riscv32_defs.M1 -f ./M2libc/riscv32/libc-full.M1 -f ./riscv32/artifact/hex2_linker-1.M1 -f ./riscv32/artifact/hex2_linker-1-footer.M1 -o ./riscv32/artifact/hex2_linker-1.hex2
<stikonas>but if I run it manually, process successfully completes and exits with 0
<stikonas>oh, something is actually wrong with ./bin/M1...
<stikonas>so probably M2libc problem
<oriansj>no I think it might be a waitid problem in the kaem-optional-seed
<oriansj>because if manual running works fine, it isn't the binaries but the bit running the binaries
<stikonas>well, manual exits fine with exit code 0 but does not output any file
<stikonas>and without -o its output is empty
<stikonas>so perhaps two problems
<stikonas> https://github.com/stikonas/M2libc/commit/08be85b83bc7f261b0ddd7b8a2fea54641d746be
<stikonas>(this is what I'm testing)
<stikonas>this should fail once full kaem is built (due to missing waitpid) but fails earlier
<stikonas>oh, it's actually riscv32/bin/M1 is empty
<stikonas>so the error is earlier
<stikonas>but at least it makes more sense...
<stikonas>empty file is just executed in bash
<stikonas>ok, that's because lseek is not there on riscv32...
<stikonas>how come we didn't see any breakage in hex2_riscv32.hex1
<stikonas>that one uses lseek
<oriansj>but it doesn't use M2libc either
<stikonas>oriansj: no, it probably works by accident...
<stikonas>I'm still investigating
<stikonas>but it looks like I have to use llseek
<stikonas>and maybe lseek(0) just happened to work with wrong call
<oriansj>quite possibly
<oriansj>I probably should get a proper RISC-V 32bit syscall table
<stikonas>yes, llseek helped...
<stikonas>I've now managed to build up to kaem
<stikonas>which fails due to missing waitpid implementation...
<stikonas>but hex1 and hex2 work only by accident...
<stikonas>we are setting whence to 0 which sets required register for offset to be 0
<stikonas>so lseek(0) worked...
<oriansj>so a functional hack; guess we need to add some comments to explain what actually is happening
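The "works by accident" effect described above can be modelled in plain C. This is only a sketch: it assumes the usual RISC-V Linux convention of passing syscall arguments in a0..a4 and the _llseek argument order from llseek(2) (fd, offset_high, offset_low, result, whence). The names llseek_args, decode_llseek and split_offset are hypothetical and no real syscall is made.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* How the five _llseek arguments would map onto a0..a4 on a 32-bit target. */
struct llseek_args {
    uint32_t fd;      /* a0 */
    int64_t  offset;  /* a1 = high half, a2 = low half */
    uint32_t result;  /* a3: user pointer that receives the new position */
    uint32_t whence;  /* a4 */
};

/* Model of how the argument registers are interpreted for _llseek. */
static struct llseek_args decode_llseek(const uint32_t a[5])
{
    struct llseek_args v;
    v.fd = a[0];
    v.offset = (int64_t)(((uint64_t)a[1] << 32) | a[2]);
    v.result = a[3];
    v.whence = a[4];
    return v;
}

/* What a correct caller has to do: split a 64-bit offset into two halves. */
static void split_offset(int64_t offset, uint32_t *high, uint32_t *low)
{
    assert(high != NULL && low != NULL);
    *high = (uint32_t)((uint64_t)offset >> 32);
    *low  = (uint32_t)((uint64_t)offset & 0xffffffffu);
}
```

With an lseek-style call lseek(fd, 0, 0), a1 holds the offset (0) and a2 holds whence (0), so the decoded 64-bit offset is 0. The whence that is actually used comes from a4, which such a caller never sets, so the call only behaves as intended while a4 happens to be 0.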
<stikonas>fixed wip commit https://github.com/stikonas/M2libc/commit/c51b5b1e07480482f4c8a7a208709c7db56fb858
<stikonas>oriansj: or maybe set the registers to what we need and reencode
<stikonas>I'm not sure if on baremetal we'll have them at 0
<stikonas>s/baremetal/real hw
<stikonas>anyway, bed time
<oriansj>you are right, risc-v does not zero the registers on exec
<muurkha>what does it put in them?
<muurkha>hopefully not just whatever the previous program left there, that's a potentially major security hole
<stikonas[m]>Prob whatever kernel left
<stikonas[m]>Not previous program
<oriansj>muurkha: kernel space register values
<oriansj>which is actually a major security hole
<oriansj>but one that has already been reported by us
<oriansj>if they don't address it, we will probably have to get a CVE for it
<stikonas[m]>Well, maybe most of them are unimportant temp stuff while preparing for process launch
<muurkha>a problem is that changing that unimportant temp stuff will break userland programs
<muurkha>well. can.
<muurkha>and the compiler might do that without asking you when you recompile the kernel
<muurkha>so even if it's unimportant now, it may not be unimportant next week
<stikonas[m]>Well, compiler should initialize variables...
<muurkha>probably
<muurkha>but ld.so might not be written with a compiler
<muurkha>similarly crt0
<oriansj>muurkha: what breakage could occur for a well written program in a high level language?
<oriansj>assembly programs that use it for random data source might break (they shouldn't be using that as a source anyway)
<muurkha>I don't think it matters how well the program is written; it matters how the program's startup code is written
<oriansj>or they are already zeroing the register or setting it to a known value before use.
<muurkha>I was thinking that if one of those random registers happens to be set to 0 at present, the startup assembly code might assume that will always be the case
<muurkha>or that its value would always be, for example, positive, or have the high 24 bits set to 0
<muurkha>and if they aren't clearing the registers they probably aren't clearing the flags either, so DF might happen to be set
<muurkha>which would affect the behavior of lods, stos, cmps, movs
<oriansj>yes, however zeroing of those registers is what all other ports do already
<muurkha>all other ports of the crt0 and ld.so in glibc, dietlibc, tinycc, and musl?
<muurkha>anyway if your high-level-language implementation's startup code accidentally depends on DF or one of those register values, recompiling the kernel might break it. but maybe not for a few years
<oriansj>muurkha: depending on random register values is just stupid
<muurkha>presumably you wouldn't do it on purpose!
<oriansj>set to zero isn't random and something reasonable to depend upon
<muurkha>agreed
<oriansj>so random (not true random as it is leaked kernel state)
<muurkha>right
<muurkha>I'm saying you'd include an assumption like the ones I mentioned above in your startup code by accident, and then only find out when you upgraded the kernel and none of your programs worked
<muurkha>the ones compiled with Free Pascal or statically linked with musl or whatever
<oriansj>isn't something anyone should depend upon; we only discovered this issue because the zero we were expecting turned out to be non-zero
<muurkha>oh, maybe you meant all other ports of Linux, not all other ports of crt0 and ld.so. that makes more sense
<oriansj>muurkha: well we were talking about what the kernel does on exec
<muurkha>right
<muurkha>I agree, and as we've found in the past, Linux has been willing to break such dubious programs in the past, even if technically that "breaks userland"
<oriansj>so what one's libc does isn't our problem as we don't use that until after stage0-posix
<muurkha>no, I'm saying it's potentially Linux's problem
<oriansj>muurkha: yes we know and reported it to them as such
<muurkha>like, what are the considerations the kernel developers might think about in deciding whether or not to apply your kernel patch?
<oriansj>as it is leaking kernel state to processes
<oriansj>well it will slow down the creation of new processes (the time it takes to zero the registers) and that would be it
<muurkha>right, and it changes the interface in a non-backward-compatible way
<muurkha>but it's a non-backward-compatible way that it's very unlikely anyone is depending on
<muurkha>and not fixing the problem will probably make similar non-backward-compatible changes happen from time to time when compilers upgrade, or when compiler options change
<oriansj>no libc or any runtime library honestly would start with hey lets trust random crap to be zero and they will just zero the registers again for all sane architectures as well as risc-v
<muurkha>sure, that's what they *should* do
<muurkha>but if they don't do it, you might not notice for a while
<oriansj>muurkha: ummm if random garbage is what you are getting and you are a libc writer, you'll have bugs that randomly appear if you don't properly set registers before use. So I can't imagine any case where depending on random values (which change from exec to exec) even makes sense for any libc or runtime
<gbrlwck>muurkha: WDYM with "DF"?
<gbrlwck>oriansj: +1 for the FOSDEM proposal! having heard your 2' rap i figure you're more than ready to say the same but maybe a little more verbose and maybe a tiny bit slower ;)
<gbrlwck>oriansj: did we report the non-zeroing registers issue on lkml? do you have a link handy (or a date or something i can maybe find the report)?
<stikonas>gbrlwck: I asked a bit about it on IRC
<stikonas>but I was told that there shouldn't be any information leak from old program
<stikonas>at some point there was but it was fixed
<stikonas>in any case, we should zero any registers ourselves too and not rely on them being zeroed
<gbrlwck>stikonas: yes, that's also what i took from that issue :) if there was like an official report i'd have that included in my thesis....
<gbrlwck>janneke: what does arch:test-r do in MEScc? does it test for equality? but on twice the same register (which would always be true)?
<janneke>gbrlwck: ah, i guess here mescc shows its x86 roots
<janneke>test %eax,%eax === cmp %eax,0
<janneke>(iow, it's only true if register r is zero)
<gbrlwck>so MEScc only uses it to test a value for 0?
<janneke>gbrlwck: yes, and then possibly jump accordingly
<stikonas>something similar to beqz in risc-v
<gbrlwck>stikonas: true, but test-r also happens in not, gt, ne, etc (where the jump does not immediately follow the comparison)
<stikonas>gbrlwck: there are also set-if-less-than instructions in risc-v
<stikonas>slt/sltu/sltiu
<stikonas>M2-Planet also uses them to implement "<"
<gbrlwck>i see!
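The two styles of zero test discussed above can be compared with a small C model. This is a sketch of the instruction semantics only, not MEScc or M2-Planet code, and all function names are hypothetical. The x86 function returns the state of the zero flag, while the RISC-V functions return the value written to the destination register, since RISC-V has no flags register.

```c
#include <assert.h>
#include <stdint.h>

/* x86 `test r, r`: ANDs r with itself and sets ZF when the result is zero. */
static int x86_test_sets_zf(uint32_t r) { return (r & r) == 0; }

/* RISC-V slt / sltu: rd = 1 when rs1 < rs2 (signed / unsigned), else 0. */
static uint32_t rv_slt(int32_t rs1, int32_t rs2)    { return rs1 < rs2 ? 1u : 0u; }
static uint32_t rv_sltu(uint32_t rs1, uint32_t rs2) { return rs1 < rs2 ? 1u : 0u; }

/* RISC-V sltiu: the immediate is sign-extended, then compared as unsigned. */
static uint32_t rv_sltiu(uint32_t rs1, int32_t imm)
{
    return rs1 < (uint32_t)imm ? 1u : 0u;
}

/* `seqz rd, rs` is the standard pseudo-instruction for `sltiu rd, rs, 1`:
 * the only unsigned value below 1 is 0. */
static uint32_t rv_seqz(uint32_t rs) { return rv_sltiu(rs, 1); }

/* Self-check: both models agree on which values count as zero. */
static void check_zero_test(void)
{
    const uint32_t samples[] = {0u, 1u, 0x80000000u, 0xffffffffu};
    unsigned i;

    for (i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(rv_seqz(samples[i]) == (uint32_t)x86_test_sets_zf(samples[i]));
}
```

Because the RISC-V result lands in an ordinary register, a compiler that was designed around the x86 zero flag has to decide which register carries that result between the comparison and the later jump.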
<janneke>gbrlwck: it could be that a cleanup is in order; i.e. rename arch:test-r to arch:cpm-r-0, and just implement that using test on x86
<janneke>*cmp-r-0
<muurkha>gbrlwck: uh, an i386 thing, totally inapplicable on RISC-V
<gbrlwck>that might make sense! though that dash ("-") notation is ambiguous, it's also used to mean subtract...
<muurkha>actually the high-24-bits thing is also I think totally irrelevant on RISC-V
<gbrlwck>and i've been stumbling on a couple of other things that might need some cleanup.. e.g. the function-locals allocating (* 4 1025) bytes while the comment says it's space for 4*1024 variables, some code duplication, .. i guess i'll make some PR as soon as my changes work :)
<janneke>right, like r-cmp-value => r-cmp-0 would be better
<gbrlwck>stikonas: not 100% sure this would actually work here... slt writes 1 to the destination register; so we'd have to either write into the register we want to test (which would overwrite the value we're testing) or we'd use another register which either is uniquely used for (only) that purpose or we'd have to return the register we used ...
<stikonas>well, you can use some temp register like t1
<stikonas>but I haven't looked at the code
<stikonas>so not sure what's the best
<gbrlwck>i'll have to think about it... mescc seems to heavily depend on that zero-flag; maybe it's best to dedicate one register (x31/t6) to be used for that purpose?
<stikonas>maybe. riscv has a lot of registers and it's not easy to use them all in simple compilers as M2-Planet and mescc
<stikonas>since they are designed with x86 in mind which has very few registers
***pgreco_ is now known as pgreco
<stikonas>oriansj: I think I've got waitpid working in M2libc/riscv32. There are other issues that prevent full run from succeeding but I guess we can merge what we have...
<stikonas>one of the issues might be in kaem-minimal as it fails to start full kaem
<stikonas>and also env variables seem to be broken...
<stikonas>oriansj: https://github.com/oriansj/M2libc/pull/10
<stikonas>hopefully this is also good enough to enable riscv32 tests in M2-Planet
<stikonas>(if I run kaem.riscv32 with full kaem it proceeds a bit further than with kaem-optional-seed, so there must be a problem in the seed)
<oriansj>stikonas: your M2libc work has been merged
<stikonas[m]>thanks
<oriansj>gbrlwck: DF in this context I believe means Dirty flag; the way the kernel knows what register values have been changed since last allocating CPU runtime to speed up the saving of register values when context switching between processes.
<stikonas>ok, I've enabled riscv32 tests for M2-Planet now https://github.com/oriansj/M2-Planet/pull/38
<stikonas>all passing
<stikonas>oh and I think I found env variable bug...
<stikonas>that means another PR to M2libc...
<stikonas>oh, and actually it fixed all other problems and everything runs to completion...
<stikonas>at least on qemu
<stikonas>which is here : https://github.com/oriansj/M2libc/pull/11
<stikonas>hmm, ok it's not yet running to completion... kaem probably fails to detect non-zero error status
<oriansj>and M2-Planet work merged
<oriansj>greetings wnklmnn
<stikonas>oriansj: thanks. Once you merge M2libc PR/11 I think I can get riscv32 run to completion myself (I have push access to mescc-tools-extra and stage0-posix)
<stikonas>there will still be some small bugs but those can be fixed once everything runs
<oriansj>stikonas: it has been merged
<stikonas>thanks
<stikonas>so I'll try to get make test-riscv32 working and will sort kaem issues later
<stikonas>in any case I think we are getting quite close
<stikonas>and stage0-posix update pushed