IRC channel logs
2024-12-19.log
<ZhaoM>what's the use of Ctrl+L (^L) character in source?
<ZhaoM>It creates a large gap in my patch while viewing in thunderbird
<damo22>ZhaoM: its legacy for page break in source for printing
<damo22>i dont think we need to add them in new code but probably just leave the existing ones
<ZhaoM>not sure if it will cause any issue when ^L exists in a patch
<ZhaoM>can you see a big gap in the patch I just submitted?
<damo22>^L is treated as whitespace by gcc
<damo22>i submitted smp patches for parallel init
<damo22>but it doesnt boot on my AMD machine
<damo22>something with the INIT/STARTUP IPIs is broken
<solid_black>that's clearly not something I would know anything about
<solid_black>last time, me, Pellescours, and youpi were talking about pageout issues
<solid_black>which is what prevents rumpdisk, and so SMP, from being fully usable, IIUC
<solid_black>and my point was that we should write back dirty pages before there is memory pressure
<solid_black>I should keep looking into how things around the VM subsystem actually work
<damo22>i dont know anything about that really
<damo22>i can try debugging the startup sequence on my AMD
<solid_black>off topic, but I'm making *a lot* of progress on reworking/fixing GTK layout
<solid_black>the time and effort I invested into this is finally paying off
<solid_black>uh, I'm naturally contributing fixes to the in-development version, which is going to become 4.18
<solid_black>but I'm just very satisfied with the progress I'm making
<solid_black>but there are many performance and correctness issues with how gtk does layout
<solid_black>somewhat better in some rare cases perhaps, but it should be a lot faster, and should explode in less cases, and apps should be able to remove various hacks they've put in place to prevent it from exploding
<damo22>i have a wip branch i can test for smp
<damo22>i fixed the ESR error previously but it still didnt boot
<damo22>i can rebase it on my latest code and check if it makes any difference
<damo22>something about timings, reading and writing the error_status register is weird in the hardware itself
<damo22>there are workarounds in linux for it
<damo22>its very strange, when the init/startup ipis are sent on AMD, the cpu number is 0 on all aps, but not in qemu
<azert>damo22: why do you need serial console? Cannot you debug it using the screen?
<damo22>i can i guess, but serial is nicer
<azert>did you try to increase the timings dramatically and see where it gives you error?
<damo22>the timings are subtle, the cpu gives up trying to receive ipis after a very short time
<damo22>you cant just increase them indefinitely
<azert>Ok, I think I saw a guide for timings on osdev at some point
<damo22>also the error_status register needs to be written before read
<damo22>i should try bypassing GS reads for cpu number for now
<damo22>i tested in qemu by inspecting the actual registers
<azert>Your code regarding that looked fine
<azert>Do you know where it exactly break?
<damo22>this is with my latest patchset from the mailing list applied
<damo22>i also tried with another patch on top, this seems to fix one warning but still throw the assert
<azert>Oh that’s because it is running on cpu 0
<azert>It shouldn’t be running on that one
<azert>either that, or cpu_number is broken
<damo22>if you check the IPI code, it is sending to ALL_EXCLUDING_SELF
<damo22>so the bsp apic id is passed in to the startup/init
<damo22>indeed i think cpu number might be broken
<damo22>but the strange thing is it works fine in qemu
<azert>What if you replace cpu_number with cpu_number_slow in that function?
<azert>I’d do that, chances are that you are running on cpu0 for whatever other reason
<damo22>it cant get into that function from anywhere except AP execution
<azert>I understood, but what if the starting cpu, for instance, wasn’t cpu0? Did you check that?
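Annotation: damo22's remark that the error_status register "needs to be written before read" can be illustrated with a toy model. The sketch below is not gnumach code and touches no hardware. The struct and function names are hypothetical. It assumes the behaviour documented for the xAPIC Error Status Register, where a write latches the accumulated errors and a following read returns them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a local APIC (hypothetical, for illustration only).
   Errors accumulate internally and only become visible to a read
   after software writes to the ESR. */
struct fake_lapic {
    uint32_t pending_errors;  /* errors detected since the last latch */
    uint32_t esr_visible;     /* value a read of the ESR returns */
};

/* A write latches the pending errors into the readable value.
   The written value itself is ignored in this model. */
static void fake_esr_write(struct fake_lapic *apic, uint32_t value)
{
    assert(apic != NULL);
    (void)value;
    apic->esr_visible = apic->pending_errors;
    apic->pending_errors = 0;
}

/* A read returns whatever was latched by the most recent write. */
static uint32_t fake_esr_read(const struct fake_lapic *apic)
{
    assert(apic != NULL);
    return apic->esr_visible;
}

/* The write-then-read pattern discussed in the log. */
static uint32_t esr_latch_and_read(struct fake_lapic *apic)
{
    fake_esr_write(apic, 0);
    return fake_esr_read(apic);
}
```

In this model, a read without a preceding write returns stale data, which is one way a startup sequence could misjudge whether an IPI was delivered cleanly.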
<damo22>yes, theres another assert for bsp
<azert>Then ALL_EXCLUDING_SELF might be broken
<damo22>before it sends the ipi sequence it asserts that itself is bsp
<azert>Apparently ALL_EXCLUDING_SELF causes issues on certain machines
<azert>Since it sends the interrupt to CPUs that are disabled or broken
<damo22>11b All excluding self (This sends a message with a destination encoding of all 1s,
<damo22>so if lowest priority is used the message could end up being reflected back to
<damo22>no i am only getting 4 asserts, and there are 4 working cores
<youpi>4 asserts? You should be getting 3? (one BSP and 3 APs)
<damo22>its also hard to tell because the messages are all scrambled as they all print out at once
<damo22>if this ALL_EXCLUDING_SELF thing is broken, i have no idea how to start cores on all cpus
<damo22>because it seems impossible to address all cpus
<azert>Could you send to all and then bail out if on cpu0?
<damo22>its not supposed to actually wake bsp, because bsp must retain the original code path
<damo22>wake all cpus, then if cpu0, go back to orchestrating
<azert>Ok but check if the initialization is already done and long jump back doesn’t seem like an ugly hack to me
<azert>Or you can return from interrupt? Don’t know the details. I think you need a way to check if the cpu is already inited and just do nothing
<damo22>if ALL_EXCLUDING_SELF is broken, wouldnt ALL_INCLUDING_SELF be also broken?
<damo22>you would just have one extra in the set and still try to wake broken cores
<damo22>why should it matter if you wake a broken core?
<damo22>wont it just sit there and be useless?
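Annotation: the "11b All excluding self" quote refers to the destination shorthand field of the xAPIC Interrupt Command Register. As a rough sketch of how the low ICR dword for a broadcast INIT or STARTUP IPI is composed, the helper below packs the standard fields: vector in bits 0-7, delivery mode in bits 8-10, level in bit 14, trigger mode in bit 15, shorthand in bits 18-19. The enum and function names are made up for this example and are not gnumach's identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Delivery modes used during AP startup (ICR bits 8-10). */
enum icr_delivery_mode {
    ICR_DM_FIXED   = 0,
    ICR_DM_INIT    = 5,
    ICR_DM_STARTUP = 6
};

/* Destination shorthand (ICR bits 18-19); 3 is the "11b" encoding. */
enum icr_shorthand {
    ICR_SH_NONE               = 0,
    ICR_SH_SELF               = 1,
    ICR_SH_ALL_INCLUDING_SELF = 2,
    ICR_SH_ALL_EXCLUDING_SELF = 3
};

/* Compose the low 32 bits of the ICR from its fields. */
static uint32_t icr_low(uint8_t vector, enum icr_delivery_mode mode,
                        int level_assert, int level_triggered,
                        enum icr_shorthand shorthand)
{
    assert((unsigned)mode <= 7u);
    assert((unsigned)shorthand <= 3u);

    uint32_t value = vector;                 /* bits 0-7: vector */
    value |= (uint32_t)mode << 8;            /* bits 8-10: delivery mode */
    if (level_assert)
        value |= 1u << 14;                   /* bit 14: level (1 = assert) */
    if (level_triggered)
        value |= 1u << 15;                   /* bit 15: trigger (1 = level) */
    value |= (uint32_t)shorthand << 18;      /* bits 18-19: shorthand */
    return value;
}
```

With these fields, an edge-triggered, asserted INIT to all-excluding-self comes out as 0x000C4500, and a STARTUP IPI with vector 0x08 as 0x000C4608. Because a shorthand is used, the destination field in the high dword is not consulted for these IPIs.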
<damo22>apparently INIT ipis should be level triggered not edge
<youpi>you changed it in c3a8722c4a131734395a2893f92e092ba441a844
<youpi>(except the comments you added)
<damo22>(09:14:16 PM) damo22: apparently INIT ipis should be level triggered not edge <- i read this on a osdev forum
<damo22>when i committed that c3a8722, i was testing it on a different AMD processor than i have now
<damo22>i needed edge triggered then because when i tried level it was warm resetting to some weird state in coreboot and couldnt start properly
<damo22>according to the BKDG for that cpu, the ICR does not support INIT LEVEL DEASSERT
<damo22>which is consistent with the comment i wrote in the code
<damo22>and the BKDG for the cpu i am testing has the same spec
<damo22>it mentions all excluding self as being valid for INIT and STARTUP
<damo22>it still fails with cpu_number_slow()
<damo22>maybe the apic numbering is broken?
<damo22>theres a second bit we are not checking
<damo22>if the online capable bit is set, but the enabled bit is not set, it means we can start the processor at runtime
<damo22>but currently we are skipping the detection of the cpu
<damo22>i fixed that, but it still fails
<Pellescours>wow, I made my VM hang, and when it reboot it ask for a manual fsck. When I enter it (to have a shell to do the fsck), It hit a "ext2fs: ../../libdiskfs/node-drop.c:45: diskfs_drop_node: Assertion '!diskfs_readonly' failed."
<Pellescours>Before this line I have a line about "can’t create temp file for here document: read-only file system" which is normal because it need to a fsck
<Pellescours>Ah but the code does a call to diskfs_check_readonly() but does not use the result of it...
<damo22>Pellescours: sounds like /tmp is readonly?
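Annotation: the "second bit" damo22 mentions matches the ACPI MADT Processor Local APIC flags, where bit 0 is Enabled and bit 1 is Online Capable. A minimal sketch of the check he describes is shown below. The macro, enum, and function names are invented for this example, and the fix actually applied in gnumach is not shown in the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flags field of the MADT Processor Local APIC entry. */
#define MADT_LAPIC_ENABLED        (1u << 0)
#define MADT_LAPIC_ONLINE_CAPABLE (1u << 1)

enum lapic_state {
    LAPIC_UNUSABLE,             /* neither bit set: do not use */
    LAPIC_READY,                /* Enabled: processor is ready for use */
    LAPIC_STARTABLE_AT_RUNTIME  /* not Enabled, but Online Capable */
};

/* Classify one MADT local APIC entry by its flags. */
static enum lapic_state classify_lapic_flags(uint32_t flags)
{
    if (flags & MADT_LAPIC_ENABLED)
        return LAPIC_READY;
    if (flags & MADT_LAPIC_ONLINE_CAPABLE)
        return LAPIC_STARTABLE_AT_RUNTIME;
    return LAPIC_UNUSABLE;
}

/* Count entries that should be detected rather than skipped. */
static size_t count_detectable_cpus(const uint32_t *flags, size_t n)
{
    assert(flags != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (classify_lapic_flags(flags[i]) != LAPIC_UNUSABLE)
            count++;
    }
    return count;
}
```

Checking only the Enabled bit would treat the third case as absent, which is the skipped detection described in the log.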