IRC channel logs

2024-12-19.log

<ZhaoM>what's the use of Ctrl+L (^L) character in source?
<ZhaoM>It creates a large gap in my patch while viewing in thunderbird
<damo22>ZhaoM: it's a legacy page break character, used when printing the source
<damo22>i dont think we need to add them in new code but probably just leave the existing ones
<ZhaoM>not sure if it will cause any issue when ^L exists in a patch
<ZhaoM>can you see a big gap in the patch I just submitted?
<damo22>^L is treated as whitespace by gcc
<damo22>it really doesnt matter
<ZhaoM>ok
<ZhaoM>then I will ignore it
<damo22>solid_black: hi
<solid_black>hi
<damo22>where are we at with gnumach?
<solid_black>wdym
<damo22>i submitted smp patches for parallel init
<damo22>i dont know what to do next
<damo22>but it doesnt boot on my AMD machine
<damo22>(nor did it before the patches)
<solid_black>ah
<solid_black>doesn't boot why?
<damo22>ESR error 0x8
<damo22>something with the INIT/STARTUP IPIs is broken
<damo22>probably timings
<solid_black>that's clearly not something I would know anything about
<damo22>yeah ok
<solid_black>last time, me, Pellescours, and youpi were talking about pageout issues
<damo22>ok
<damo22>what needs to be done there?
<solid_black>which is what prevents rumpdisk, and so SMP, from being fully usable, IIUC
<solid_black>and my point was that we should write back dirty pages before there is memory pressure
<solid_black>and it looks like we currently don't
<damo22>i see
<solid_black>I should keep looking into how things around the VM subsystem actually work
<damo22>yes, please
<damo22>i dont know anything about that really
<damo22>i can try debugging the startup sequence on my AMD
<damo22>but it has no serial console
<solid_black>off topic, but I'm making *a lot* of progress on reworking/fixing GTK layout
<solid_black>the time and effort I invested into this is finally paying off
<damo22>gtk?
<solid_black>yes, the gui toolkit
<damo22>which version
<solid_black>uh, I'm naturally contributing fixes to the in-development version, which is going to become 4.18
<damo22>i see
<damo22>is this for hurd?
<solid_black>no, it's unrelated
<solid_black>it's for $dayjob if anything
<damo22>ah ok
<damo22>cool
<solid_black>but I'm just very satisfied with the progress I'm making
<damo22>what is wrong with the layout?
<damo22>that needs fixing
<solid_black>oh, there are many issues there
<solid_black>it only seems simple on the surface
<damo22>will it look better?
<solid_black>but there are many performance and correctness issues with how gtk does layout
<damo22>ok
<solid_black>somewhat better in some rare cases perhaps, but it should be a lot faster, and should explode in fewer cases, and apps should be able to remove various hacks they've put in place to prevent it from exploding
<damo22>i have a wip branch i can test for smp
<damo22>i fixed the ESR error previously but it still didnt boot
<damo22>i can rebase it on my latest code and check if it makes any difference
<damo22>something about timings, reading and writing the error_status register is weird in the hardware itself
<damo22>there are workarounds in linux for it
<damo22>its very strange, when the init/startup ipis are sent on AMD, the cpu number is 0 on all aps, but not in qemu
<damo22>it fails the assert (cpu > 0)
<damo22>4 times
<damo22>(there are 4 cores on the AMD)
<azert>damo22: why do you need serial console? Can't you debug it using the screen?
<azert>just curious
<damo22>i can i guess, but serial is nicer
<azert>did you try to increase the timings dramatically and see where it gives you error?
<damo22>the timings are subtle, the cpu gives up trying to receive ipis after a very short time
<damo22>or send im not sure
<damo22>you cant just increase them indefinitely
<azert>Ok, I think I saw a guide for timings on osdev at some point
<damo22>also the error_status register needs to be written before read
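[Editor's note] The "write before read" rule and the 0x8 value reported earlier can be sketched as below. This is only an illustration, not gnumach's code: `esr_read_latched`, `esr_has` and the enum names are hypothetical, the register is modelled as a plain pointer, and the bit layout is the one Intel documents for the xAPIC error status register (worth double-checking against the BKDG for the AMD part in question).

```c
#include <stdbool.h>
#include <stdint.h>

/* Error status register bit positions (Intel xAPIC layout; assumption:
   verify against the vendor manual for the CPU under test). */
enum esr_bit {
    ESR_SEND_CHECKSUM    = 0,
    ESR_RECV_CHECKSUM    = 1,
    ESR_SEND_ACCEPT      = 2,
    ESR_RECV_ACCEPT      = 3,  /* 0x8: a received message was accepted by no APIC */
    ESR_REDIRECTABLE_IPI = 4,
    ESR_SEND_ILLEGAL_VEC = 5,
    ESR_RECV_ILLEGAL_VEC = 6,
    ESR_ILLEGAL_REG_ADDR = 7
};

/* Hypothetical accessor: write the register first so the hardware latches
   the current error state, then read the latched value back. */
static uint32_t esr_read_latched(volatile uint32_t *esr_reg)
{
    *esr_reg = 0;
    return *esr_reg;
}

/* True if the given error bit is set in a value read from the ESR. */
static bool esr_has(uint32_t esr, enum esr_bit bit)
{
    return (esr >> bit) & 1u;
}
```

Under that layout, `esr_has(0x8, ESR_RECV_ACCEPT)` is true, i.e. the reported 0x8 would be a receive accept error.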
<damo22>i should try bypassing GS reads for cpu number for now
<damo22>see if the problem lies there
<damo22>i tested in qemu by inspecting the actual registers
<azert>Your code regarding that looked fine
<damo22>it seems to be correct
<azert>Do you know where exactly it breaks?
<damo22>yes
<damo22>mp_desc.c:277
<damo22>this is with my latest patchset from the mailing list applied
<damo22>i also tried with another patch on top, this seems to fix one warning but still throws the assert
<azert>Oh that’s because it is running on cpu 0
<damo22>no, i dont think so
<azert>It shouldn’t be running on that one
<azert>either that, or cpu_number is broken
<damo22>if you check the IPI code, it is sending to ALL_EXCLUDING_SELF
<damo22>so the bsp apic id is passed in to the startup/init
<damo22>indeed i think cpu number might be broken
<damo22>but the strange thing is it works fine in qemu
<azert>What if you replace cpu_number with cpu_number_slow in that function?
<damo22>i was going to try that next
<azert>I’d do that, chances are that you are running on cpu0 for whatever other reason
<damo22>it cant get into that function from anywhere except AP execution
<azert>I understood, but what if the starting cpu, for instance, wasn’t cpu0? Did you check that?
<damo22>yes, theres another assert for bsp
<azert>Then ALL_EXCLUDING_SELF might be broken
<damo22>before it sends the ipi sequence it asserts that itself is bsp
<damo22>hmm
<azert>Apparently ALL_EXCLUDING_SELF causes issues on certain machines
<azert>Since it sends the interrupt to CPUs that are disabled or broken
<azert>Might be your case
<damo22>11b All excluding self (This sends a message with a destination encoding of all 1s,
<damo22>so if lowest priority is used the message could end up being reflected back to
<damo22>this APIC.)
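[Editor's note] For readers unfamiliar with the shorthand field quoted above, here is a rough sketch of how the low word of the interrupt command register (ICR) is composed for an INIT or STARTUP IPI sent to "all excluding self". The macro and function names are made up for illustration and are not gnumach's; the field positions follow the usual xAPIC layout (delivery mode in bits 8-10, level in bit 14, trigger mode in bit 15, destination shorthand in bits 18-19).

```c
#include <stdint.h>

#define ICR_DELIVERY_INIT     (5u << 8)   /* delivery mode 101b */
#define ICR_DELIVERY_STARTUP  (6u << 8)   /* delivery mode 110b */
#define ICR_LEVEL_ASSERT      (1u << 14)
#define ICR_TRIGGER_LEVEL     (1u << 15)  /* left clear here: edge triggered */
#define ICR_DEST_ALL_BUT_SELF (3u << 18)  /* shorthand 11b */

/* INIT IPI, edge triggered, level asserted, to every APIC except the sender. */
static uint32_t icr_init_all_but_self(void)
{
    return ICR_DEST_ALL_BUT_SELF | ICR_LEVEL_ASSERT | ICR_DELIVERY_INIT;
}

/* STARTUP IPI: the vector field carries the 4 KiB page number of the
   real-mode trampoline, so the trampoline must sit below 1 MiB. */
static uint32_t icr_startup_all_but_self(uint32_t trampoline_phys)
{
    return ICR_DEST_ALL_BUT_SELF | ICR_LEVEL_ASSERT | ICR_DELIVERY_STARTUP
           | ((trampoline_phys >> 12) & 0xffu);
}
```

With the shorthand in use the destination field of the high ICR word is ignored, so the BSP is excluded only by virtue of being the sender.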
<damo22>no i am only getting 4 asserts, and there are 4 working cores
<youpi>4 asserts? You should be getting 3? (one BSP and 3 APs)
<azert>That’s one too many
<damo22>oh crap
<damo22>hahaha
<damo22>its also hard to tell because the messages are all scrambled as they all print out at once
<damo22>let me try again
<damo22>if this ALL_EXCLUDING_SELF thing is broken, i have no idea how to start cores on all cpus
<damo22>because it seems impossible to address all cpus
<damo22>via individual apic messages
<azert>Could you send to all and then bail out if on cpu0?
<damo22>its not supposed to actually wake bsp, because bsp must retain the original code path
<damo22>it orchestrates the rest
<damo22>that would be a super hack
<damo22>wake all cpus, then if cpu0, go back to orchestrating
<azert>Ok but check if the initialization is already done and long jump back doesn’t seem like an ugly hack to me
<azert>Or you can return from interrupt? Don’t know the details. I think you need a way to check if the cpu is already inited and just do nothing
<damo22>if ALL_EXCLUDING_SELF is broken, wouldnt ALL_INCLUDING_SELF be also broken?
<damo22>you would just have one extra in the set and still try to wake broken cores
<damo22>why should it matter if you wake a broken core?
<damo22>wont it just sit there and be useless?
<youpi>(and consume power)
<damo22>ah
<damo22>apparently INIT ipis should be level triggered not edge
<youpi>you changed it in c3a8722c4a131734395a2893f92e092ba441a844
<youpi>I don't know why
<youpi>(except the comments you added)
<damo22>(09:14:16 PM) damo22: apparently INIT ipis should be level triggered not edge <- i read this on a osdev forum
<damo22>when i committed that c3a8722, i was testing it on a different AMD processor than i have now
<damo22>it worked there
<damo22>i needed edge triggered then because when i tried level it was warm resetting to some weird state in coreboot and couldnt start properly
<damo22>according to the BKDG for that cpu, the ICR does not support INIT LEVEL DEASSERT
<damo22>which is consistent with the comment i wrote in the code
<damo22>and the BKDG for the cpu i am testing has the same spec
<damo22>it mentions all excluding self as being valid for INIT and STARTUP
<damo22>it still fails with cpu_number_slow()
<damo22>maybe the apic numbering is broken?
<damo22>when we parse the madt table?
<damo22>https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#local-apic-flags
<damo22>theres a second bit we are not checking
<damo22>if the online capable bit is set, but the enabled bit is not set, it means we can start the processor at runtime
<damo22>but currently we are skipping the detection of the cpu
<damo22>just on the enabled bit
<damo22>i fixed that, but it still fails
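[Editor's note] The two flag bits discussed above could be checked roughly as follows when walking the MADT local APIC entries. This is a sketch under assumptions, not the actual gnumach parser: the macro and function names are hypothetical, and only the bit positions (Enabled in bit 0, Online Capable in bit 1) are taken from the ACPI spec linked above.

```c
#include <stdbool.h>
#include <stdint.h>

#define MADT_LAPIC_ENABLED        (1u << 0)
#define MADT_LAPIC_ONLINE_CAPABLE (1u << 1)

/* Decide whether a MADT local APIC entry describes a CPU worth recording.
   Enabled: ready for use now. Not enabled but online capable: the platform
   allows the OS to bring it up at runtime. Neither: unusable, skip it. */
static bool madt_lapic_usable(uint32_t flags)
{
    if (flags & MADT_LAPIC_ENABLED)
        return true;
    return (flags & MADT_LAPIC_ONLINE_CAPABLE) != 0;
}
```

A parser that tests only the enabled bit would skip every entry with flags equal to 2, which is the case described in the conversation.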
<Pellescours>wow, I made my VM hang, and when it rebooted it asked for a manual fsck. When I entered it (to get a shell to run the fsck), it hit "ext2fs: ../../libdiskfs/node-drop.c:45: diskfs_drop_node: Assertion '!diskfs_readonly' failed."
<Pellescours>Before this line I have a line about "can’t create temp file for here document: read-only file system", which is normal because it needs an fsck
<Pellescours>Ah but the code does a call to diskfs_check_readonly() but does not use the result of it...
<Pellescours>why is the backtrace not printed?
<damo22>Pellescours: sounds like /tmp is readonly?
<Pellescours>possibly