IRC channel logs
2023-01-28.log
<damo22>Pellescours: im not sure what you mean by apic is not ready
<damo22>are you saying smp works with --disable-apic?
<Pellescours>i don't think it works without apic, but non smp with apic doesn't work either
<damo22>i was focusing on getting smp+apic working, then when its stable we can look at why it fails in other modes
<Pellescours>because of the loss of interrupts, i'm not able to have disk drivers working
<damo22>gnumach disk driver does not lose interrupts
<Pellescours>qemu-system-x86_64 -m 4096 -drive format=raw,cache=writeback,file=dev.img -nic user,hostfwd=tcp:127.0.0.1:2222-:22 -k bepo --enable-kvm -serial stdio -smp 6 -cpu host -usb
<damo22>qemu-system-i386 -M q35,accel=kvm -smp 1 -m 4096 -net user,hostfwd=tcp::8888-:22 -net nic -curses -hda /dev/sdd -chardev socket,id=net0,host=127.0.0.1,port=9999,ipv4=on,server=on,telnet=on -monitor chardev:net0 --no-reboot --no-shutdown
<damo22>your machine has emulation for an entire 64 bit machine, mine is strictly 32 bit
<Pellescours>for perf, even if I don’t think it will change anything because recent features are not really supported/used by the kernel
<damo22>AHCI SATA 00:1f.2 BAR 0xfebd5000 IRQ 10
<damo22>sd0: QEMU HARDDISK, 465GB w/256kB Cache
<Pellescours>for me it’s hd0, and when I add -M q35 it fails to find hd0
<Pellescours>ah this explains the commit that reduced this time to 10 secs
<Pellescours>So in the end, if we don’t use -M q35, IRQs are lost. But with q35 we can focus on smp before trying to fix the irq problem
<damo22>i dont know if the default machine even has an APIC
<Pellescours>"info pic" in qemu consoles shows 24 entries under "ioapic"
<damo22>../configure CFLAGS="-O2 -g" --enable-kdb --enable-ncpus=8 --enable-apic
<damo22>yes my branch is not rebased onto that
<Pellescours>I just tried to boot with -smp=6, 4 cpus were found (probably because I removed the -cpu host). And it’s sloooow. with smp 1 it’s normal.
<damo22>its slow and the cpus are fighting
<damo22>i think they are not idling properly
<damo22>it uses 100% cpu in idle with smp 1
<Pellescours>and the console timeout makes it unusable, only ssh is possible
<Pellescours>Idk, looking at htop, it says that procfs is taking a lot of cpu, ~46%
<damo22>does the cpu usage add up to the load?
<damo22>i was getting load 6 with about 100% usage using smp 4
<Pellescours>100% for one cpu == load of 1, if I understand correctly
<damo22>must be that the aps are spinning doing nothing
<Pellescours>with the normal kernel (no smp), proc and procfs usually take 0% of cpu
<damo22>that seems to tell it to sit in a tight loop
<damo22>maybe the cpus are sitting in machine_relax a fair bit
<damo22>every time i interrupt the cpu with kdb, its sitting in machine_idle
<damo22>hmmm it never seems to get out of the loop in kern/startup.c
<Pellescours>are you sure they are running the slave_main function?
<Pellescours>When APs are set up, the kernel is still booting, so other tasks are not created yet. So APs put themselves in idle automatically. But when more tasks are created, APs should start taking some. So imho the problem is in the scheduler
<damo22>APs are stuck in the middle of cpu_launch_first_thread
<damo22>i modified my code, its in cpu_launch_first_thread
<damo22>where the loop is that APs spin inside
<damo22>it seems to hang when i put a printf
<damo22>Thread 2 (Thread 1.2 (CPU#1 [running])):
<damo22>#0 _kret_popl_ds () at ../i386/i386/locore.S:533
<Pellescours>I just re-checked the paste you did yesterday: all APs were at thread_quantum_update (at ../kern/priority.c:152)
<Pellescours>And this line is a thread_lock call (a macro that contains a while loop)
<damo22>i need to see why this is not printing that the APs are passing this point
<damo22> asm volatile ("pause" : : : "memory");
<damo22>the two loops are happening simultaneously on BSP and AP
<damo22>so they are waiting for each other
<damo22>how do i do an interprocessor lock?
<Pellescours>that’s what is set at multiple places, I think it works
<damo22>for all cpus, and then wait for the number to be ready
<damo22>Pellescours: i just pushed to my branch, we need to figure out why APx DONE! is not being printed
<damo22>i dont understand why the code never reaches there
<damo22>maybe lapic_enable_timer needs to run just before load_context()
<damo22>where it usually calls startrtclock()
<damo22>timer interrupts could be causing havoc too early?
<damo22>it crashes because ioapic_configure is running on APs
<damo22>i need APs to wait until BSP has chosen a thread?
<damo22>kernel_stack is zero and APs switch to a zero stack
<damo22>how do i initialise kernel_stack?
<damo22>is there supposed to be only one kernel_stack?
<damo22>youpi1: when the TLB (cpu0 -> cpu1) shootdown happens, cpu1 ends up with ESP = 0
<damo22>this is before cpu1 has chosen a thread to run
<damo22>what order do the cpus need to launch threads in?
<damo22>im in a racy part of the code where all APs are eager to fire up
<damo22>so now BSP is launching, and sending shootdowns to all APs in a row, but interrupts are off on the APs because they havent chosen a thread to run yet, and it deadlocks
<damo22>and if i enable interrupts on APs at that point, kernel_stack = 0 and APs crash
<damo22>possibly also because there is no thread yet
<damo22>youpi1: ive got it to a point where it doesnt crash anymore, and all APs are enumerated and started, now everything is sitting in machine_idle
<damo22>something is fishy with TLB shootdowns, it seems to fire them correctly, but then doesnt continue
<damo22>f72eb9c2 (HEAD -> feat-smp2-hangs
<damo22>with smp 6 it sits idle just before starting acpi and has 32% usage in the host
<damo22>it seems one of the cores is sitting in HLT with interrupts disabled
<damo22>EIP=c1028491 EFL=00000046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=1
<damo22>does iret instruction enable interrupts?
<damo22>Pellescours: my branch no longer crashes, it just hangs at boot with low cpu usage and you can enter kdb with any number of cores running
<damo22>in that huge comment in pmap.c regarding the TLB shootdown code, it seems there is a requirement for 2 different spl levels
<damo22>maybe we need to set one of them to spl0
<damo22>i think they are both the same currently
<youpi>iret enables interrupts if the eflags on the stack has interrupts enabled
<youpi>normally APs would stay in machine_idle with interrupts enabled
<youpi>so that the scheduler can check from time to time whether there's a thread to run
<youpi>or the BSP sends an IPI to trigger a schedule
<softwar>is there any good graphical web browser for debian hurd besides firefox?
<damo22>youpi: it seems that the cpu that was targeted with the TLB shootdown ended up in machine_idle with interrupts off
<damo22>and the thread it was running died
<damo22>can i call "sti" just before running pmap_update_interrupt?
<youpi>splx is supposed to be doing that
<damo22>or perhaps call sti right before the machine_idle loop?
<youpi>again, there is no need for that
<damo22>but its the same interrupt code, how can it be only the TLB interrupt causing problems?
<youpi>does the timer interrupt actually work?
<youpi>"thinking" is not enough when dealing with bugs
<damo22>i remember printing in the timer interrupt and it printed for all cores
<damo22>i havent tested it again on my latest branch
<damo22>there is a lapic timer per cpu now
<damo22>in between calling pmap_update_interrupt and lapic_eoi, do i need to call splx_cli and cli?