IRC channel logs

2022-12-19.log

<rickmasters>Before the autoconf problem I worked on a memory corruption problem. At some point when Fiwix switched to the idle task, it rebooted.
<stikonas>oh, that is hard to track...
<stikonas>reboots wipe most of the state
<stikonas>but yes, I've found that in UEFI memory corruption can often cause a reboot too
<rickmasters>Just finding the trigger was a long slog and then I looked closely at the stack it was restoring and before reboot the contents had been corrupted.
<rickmasters>So I just checked that piece of memory everywhere, all the time, ala if (*0xC000FF6C != 0x00000082) then printk("corrupted here: 22!")
<stikonas>oh, my sizes might be wrong because M2 might be treating + and * as same priority in arithmetic operations
<stikonas>so I need to add more brackets
<rickmasters>Ultimately found that the kernel stack memory had not been reserved and was allocated to an application...
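The "check that piece of memory everywhere" approach rickmasters describes can be sketched roughly as below. This is only an illustration, not Fiwix code: `check_canary` and `CHECK_CANARY` are hypothetical names, `printf` stands in for the kernel's `printk`, and a real kernel would pass the fixed address cast to a pointer (for example `(volatile uint32_t *)0xC000FF6C`) rather than the address of a local variable.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: compare a watched word against its known-good value
 * and report the call site on mismatch.
 * Returns 1 if the word is intact, 0 if it has been corrupted. */
static int check_canary(const volatile uint32_t *watched, uint32_t expected,
                        const char *file, int line)
{
    uint32_t seen = *watched;   /* read once so the report matches the test */
    if (seen != expected) {
        /* In a kernel this would be printk(); printf() is a stand-in. */
        printf("corrupted here: %s:%d (got 0x%08x)\n",
               file, line, (unsigned)seen);
        return 0;
    }
    return 1;
}

/* Sprinkle this at many call sites; the first site that reports a mismatch
 * brackets where the corruption happened. */
#define CHECK_CANARY(ptr, expected) \
    check_canary((ptr), (expected), __FILE__, __LINE__)
```

Dropping `CHECK_CANARY(...)` between suspect calls and bisecting on the first failing site narrows the window in which the watched word changed, which is the same idea as the numbered "corrupted here: 22!" messages.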
<rickmasters>It's too bad M2 isn't a clean subset of C
<stikonas>yeah, a bit annoying...
<stikonas>though it might not be the only problem that is causing infinite loop for me
<muurkha>what was the benefit of building your own qemu?
<oriansj>stikonas: we can also make blood-elf create symbols for globals
<rickmasters>muurkha: I couldn't boot Fiwix - just a blank screen.
<stikonas>oriansj: well, that wouldn't help with my current UEFI problems, though that might be useful in general
<oriansj>it is the locals which are a bit harder to create; and honestly not quite sure how to do that.
<rickmasters>muurkha: So Fiwix can output to the qemu debug port so I wanted to see if qemu was reading anything
<stikonas>in principle we could also make * and / have higher priority than + - in M2-Planet
<stikonas>that is not hard
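To illustrate why the precedence question matters for size calculations, here is a small sketch in standard C. It assumes, as stikonas only suspects, that a compiler treats `+` and `*` as one left-associative precedence level; the function names and the header/count/element layout are made up for the example and do not come from M2-Planet or the UEFI code being discussed.

```c
#include <stddef.h>

/* Standard C: '*' binds tighter than '+', so this is header + (count * elem). */
static size_t size_c_precedence(size_t header, size_t count, size_t elem)
{
    return header + count * elem;
}

/* What a parser with a single shared precedence level would compute for the
 * same source text, reading left to right: (header + count) * elem. */
static size_t size_flat_precedence(size_t header, size_t count, size_t elem)
{
    return (header + count) * elem;
}

/* With explicit brackets the result is the same under either rule. */
static size_t size_bracketed(size_t header, size_t count, size_t elem)
{
    return header + (count * elem);
}
```

With a 16-byte header and 4 elements of 8 bytes, the first and third functions give 48 while the flat reading gives 160, which is why adding brackets is a safe workaround regardless of how the compiler ranks the operators.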
<rickmasters>muurkha: In certain scenarios Fiwix would boot but the keyboard didn't work.
<muurkha>rickmasters: the standard qemu build didn't have a debug port?
<stikonas>just need to split additive_expr_stub into a few different functions
<stikonas>anyway, I'll have to continue debugging it some other day...
<rickmasters>muurkha: Maybe (don't remember well), but I don't think so - you need ./configure --enable-debug
<rickmasters>muurkha: Also, later I was using a large ramdrive and wanted to see where qemu was placing it in memory
<rickmasters>muurkha: and I wanted to see if qemu was finding the multiboot header in the Fiwix binary
<muurkha>rickmasters: aha, I see
<muurkha>so above all for debugging the boot process
<rickmasters>muurkha: yes, but ultimately my issues were more on the Fiwix side
<muurkha>debugging silent boot failures sucks
<rickmasters>muurkha: I spent an enormous amount of time figuring out that Fiwix requires compiling with -fno-pie because my distro's gcc turns on PIE which breaks Fiwix
<rickmasters>I submitted a PR for that so others with a non-standard gcc don't fall into that pit.
<rickmasters>BTW, I defined CONFIG_QEMU_DEBUGCON in Fiwix to get debug output but it didn't work - it was #undef'd in include/fiwix/config.h, which was unfortunate, and it took me too long to find that
<muurkha>rickmasters: that sounds super annoying
<muurkha>why does PIE break Fiwix?
<rickmasters>muurkha: it's a long story but briefly: gcc creates several ELF sections that Fiwix's linker script doesn't expect, PIE code uses ebx which some Fiwix assembly routines would trash,
<muurkha>oh, I should have realized this was i386 PIE
<rickmasters>also, Fiwix copies the init_trampoline code into a user segment and jumps to it to exec init, but PIE code is compiled with relative offsets to data structures which get lost
<muurkha>it needs ebx as a globals pointer?
<rickmasters>gcc PIE uses thunk call that moves the return address to ebx in order to locate the instruction pointer and then it finds data structures relative to that
<muurkha>ugh, I didn't realize that
<muurkha>that's worse than I thought
<rickmasters>it took me quite a while to figure out what it was doing because it seemed so convoluted
<rickmasters>then I had to figure out that it was using ebx for that and ebx was getting trashed, but I guess -fno-pie seems to avoid using ebx and so Fiwix assembly that trashed ebx didn't have problems
<muurkha>I'm guessing it still assumes ebx is callee-saved though
<rickmasters>with -fpie every single function relies on ebx right away to locate global data, and then many assembly functions would trash it so it caused havoc
<rickmasters>I'm surprised there weren't problems with -fno-pie also, because ebx is used, but not as much. But I haven't noticed any problems so far.
<rickmasters>I went through all his assembly in kernel/core386.S and preserved ebx and I have that available but I haven't had to use it yet...
<muurkha>if you're going to call assembly functions from C, you might save and restore ebx in them unless you can persuade GCC to invoke them with some other calling convention that doesn't require that
<muurkha>but hey, you've got it working, I haven't ;)
<rickmasters>Yes, I did the work to preserve ebx and will submit a PR for that at some point but it's working without it (surprisingly)
<rickmasters>muurkha: I did that in hopes that I could make it work with PIE but then I figured out he was copying code and gave up and finally figured out the compiler option I needed.
<rickmasters>For some reason the compiler option that broke everything eluded me for way too long but it's pretty obscure and gcc has like a thousand options
<muurkha>maybe two thousand
<rickmasters>Luckily I had an old CentOS machine that compiled Fiwix correctly. I chalked it up to newer gcc doing things differently but it was Ubuntu compiling it with a different default.
<rickmasters>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103398
<rickmasters>Fun fact: I used the "die-loop" to debug the Fiwix boot code before it could print.
<muurkha>die-loop?
<rickmasters>I also used that extensively in builder-hex0, along with "hlt" as an alternative.
<rickmasters>Thought I was clever, but later I read an account from Linus about using the same technique: search "die-loop" here: https://www.cs.cmu.edu/~awb/linux.history.html
<rickmasters>Just code in a die-loop and if qemu went to 100% it hit your code. Or if it was already going to 100%, code a halt and see if qemu goes idle
<muurkha>oh nice, so you get one bit of output
<muurkha>does that work on modern bare metal too? or is power management dependent on successful booting?
<rickmasters>You can code in a reboot as well - so that works on bare metal
<muurkha>aha, reboot vs. hang
<rickmasters>reboot, loop, or halt
<rickmasters>depending on the circumstances
<muurkha>on bare metal without power management looping and halting might look the same
<rickmasters>I was getting that "bit" from Fiwix fairly early so I knew it was getting loaded so that helped a lot with motivation. But it's a low-bandwidth signal for debugging :)
<muurkha>yeah, low-bandwidth debugging takes a lot of careful experiment planning and execution
<rickmasters>stikonas: how's the memory problem going?
<stikonas[m]>rickmasters: stopped looking and went to bed, maybe I will look this evening
<fossy>rickmasters: i am constantly impressed with your debugging abilities. that's a crazy debugging story
<stikonas[m]>Yeah, using halt vs loop to get 1 bit
<stikonas[m]>On uefi I was able to get 32 bits by exiting with return code
<rillian>is the knight isa in the stage0 repo something orians designed for the project, or does it come from somewhere else?
<muurkha>apparently it's a bankrupt computer company called "Knight" from the 01970s
<muurkha>oriansj claims he didn't make it up