IRC channel logs

2022-12-04.log


<Christoph[m]>Ah. I didn't think about buying new hardware, I assumed you'd make Linux work on your machine. Seems a bit naïve now. In what sense does the hardware have to support linux?
<Christoph[m]>Actually, it's bootstrapping, why shouldn't one start with nothing?
<stikonas[m]>I also have an aarch64 system (rockpro64).
<stikonas[m]>Christoph: e.g. my riscv32 system also doesn't support linux
<stikonas[m]>(Pinecil)
<stikonas[m]>It's just not powerful enough for Linux
<stikonas[m]>And yes, we do work on some lower level bootstrapping without linux
<stikonas[m]>rickmasters' work uses BIOS calls to do I/O
<stikonas[m]>We also have stage0-uefi
<stikonas[m]>So these rely on system firmware
<stikonas[m]>Anything without firmware would probably have to be hardware specific rather than arch specific
<oriansj>Christoph[m]: There are different levels of bootstrapping
<oriansj>for some, building everything upon a trusted kernel is good enough. For others, building upon trusted firmware is good enough. For others, building upon trusted hardware is good enough, and others need to make their own hardware
<oriansj>we support all levels of bootstrapping here
<oriansj>so, to enable that, when we wish to test stage0-posix's steps upon a trusted kernel, we need hardware that is able to run a POSIX kernel
<oriansj>if we were doing firmware or baremetal bootstrapping then the requirements would potentially be different
<oriansj>because as we build paths on an architecture, it makes other paths on that architecture easier to do as well
<oriansj>as we need to figure out a great many details about the architecture in question. Things like instruction encoding rules, memory map and odd bits which we wouldn't have known about until we started bootstrapping.
<oriansj>It is usually easiest to start upon a kernel (as good debugging tools are available) and then move down the levels until you are making your own hardware.
<Christoph[m]>I see, thank you!
<stikonas>indeed, you can use gdb on linux. When I was writing stage0-uefi I had almost no debugging
<stikonas>most of the stuff was figured out by exiting early and looking at the return code
<oriansj>which is a very hard way to do development
<oriansj>and the lower one goes, the less help one can get
<stikonas>yes, once you get rid of firmware, you lose easy-to-read exit codes too
<oriansj>So it is easiest to first port an architecture upon a kernel and then the next steps become much easier to do
<stikonas>unless you have specialized hardware for that
<stikonas>hmm, something is still quite broken with uint*_t types...
<stikonas>despite test0030 passing, when I try to use them to create uefi device_path struct, I get back some garbage
<stikonas>i.e. store 1 into struct member, get some large negative number back...
<stikonas>maybe I should actually use that struct for test0030...
<stikonas>oh, it might actually be caused by something more subtle
<stikonas>if I have uint8_t type; and type is set to 1
<stikonas>then uint8_t rval = device_path->type works but unsigned rval = device_path->type is garbage
<stikonas>and uint8_t r = device_path->type; unsigned rval = r; also works
<stikonas>dirty stack?
<oriansj>sign vs zero extension
<stikonas>oh, possibly...
<stikonas>well, I should first create a better test, then I'll fix M2-Planet to pass it
<stikonas>I might know where to fix it but we'll see
<stikonas>I might need to add signed vs unsigned loading in load_value
<oriansj>yeah, stores don't care about signed or unsigned, but loads always should
<muurkha>once you get rid of firmware your exit codes are beep sequences from the PC speaker or LED blinking sequences
<muurkha>depending on the hardware
<oriansj>or nothing at all if you are unlucky >.<
<muurkha>yeah, that's the hardest exit code to interpret
<stikonas>hmm, somehow managed to segfault M2-Planet... :(
<stikonas>ok, my C file is illegal, though a segfault should not happen even for illegal C
<stikonas>oh, that's a bug in load_value...
<stikonas>it should never return NULL
<oriansj>stikonas: I'll do some fuzzing and clear out the possible segfaults the new code paths may have introduced.
<stikonas>and I still haven't managed to get that stage0-uefi testcase reproduced as M2-Planet test
<oriansj>well, if fuzzing doesn't find anything in 24 hours, odds are the bug is in the UEFI functions, and then figuring it out becomes much harder.
<stikonas>no, the bug is not in UEFI... it must be in M2-Planet
<stikonas>I can reproduce that bug before calling UEFI function
<stikonas>oriansj: for now segfault fix: https://github.com/oriansj/M2-Planet/pull/48
<oriansj>merged
<oriansj>I'll still be fuzzing
<stikonas>sure, that's still useful
<stikonas>oriansj: and another segfault...
<stikonas>oh, actually this one is not in M2-Planet
<stikonas>but in the test
<oriansj>which is a sign the M2-Planet generated code isn't exactly correct
<stikonas>indeed
<stikonas>well, that is expected, as I was trying to modify the test to trigger that UEFI bug...
<stikonas>though on UEFI I had no crash, just wrong value
<oriansj>well, UEFI doesn't do proper memory protection, right? so a segfault wouldn't show up, as you would just be reading memory address zero
<stikonas>oh I still had illegal C...
<stikonas>hmm, that is true...
<stikonas>backtrace shows FOR_THEN__malloc_free_list_1
<stikonas>but then the incorrect code might have been earlier...
<stikonas>well, I should probably first try to fix load_value
<oriansj>oh malloc and free bugs are always fun
<oriansj>got to remember to get out the coloring book and crayons and use color to try to figure this out
<stikonas>oriansj: oh, that segfaults on the first calloc
<stikonas>might be that libc.M1 does not initialize it
<stikonas>yeah, I need to switch to libc-full...
<stikonas>it's possible that the current crash is due to not everything being ported to load/store_value...
<stikonas>oh, actually it's because stuff like array[1]->member is not yet supported
<stikonas>oriansj: so I've been investigating a bit, signed load might be one thing we need to fix, but I think the problem I saw in UEFI is also caused by incorrect load size
<stikonas>i.e. when I'm doing uint16_t rval = device_path->subtype
<stikonas>it has 3 from device_path->type = HARDWARE_DEVICE_PATH, but the higher 8 bits also hold 24 from device_path->length = sizeof(struct efi_device_path_protocol)
<stikonas>hmm, that's not exactly what I did, let me try again
<stikonas>I was reading 2nd element of the struct
<stikonas>which was device_path->subtype = MEMORY_MAPPED;
<stikonas>subtype is uint8_t
<stikonas>but it's contaminated with the 3rd element of the struct, which ends up in the higher bits of rval
<stikonas>which probably happens because the read size is now taken from uint16_t rval
<stikonas>but should be uint8_t...
<stikonas>though I don't understand why I can't reproduce it in M2-Planet tests...
<oriansj>stikonas: I guess I'll need to dig into it to figure out why
<stikonas>well, that might help solve it quicker. I'm also digging into it, but it's a bit hard when I can't reproduce it exactly on POSIX