IRC channel logs
2023-01-26.log
<damo22>Pellescours: feat-smp2-works is a branch that has AP with no periodic interrupts, so likely its sitting in a loop, thats why it mostly works
<damo22>im trying to fix feat-smp2-fault with apic and smp
<damo22>this one has a timer enabled on all cores
<damo22>-smp 1 or -smp 2 are the only things i am testing because more than 2 is more difficult to fix the AP bringup
<damo22>yes when i compile gnumach for smp i do --enable-kdb --enable-ncpus=8 --enable-apic
<damo22>the problem seems to be now not in the intstack but on a different stack, because its general protection faulting with the stack pointer not on an intstack
<Pellescours>with smp2faults apic and 2 cpus i have a protection exception when starting acpi task
<damo22>general protection faults are difficult to track down because there are many reasons why one can occur
<Pellescours>is this instruction the one ine CPU_NUMBER ? (movl lapic,%ebp)?
<damo22>not sure, i am going to try disabling interrupts during an interrupt
<damo22>start acpi: acpi Kernel General protection trap, eip 0xc103160c
<damo22>kernel: General protection (13), code=0
<damo22>Stopped at all_intrs+0x8: movl 0xc11e070c,%ebx
<damo22>that is inside CPU_NUMBER i think yes
<youpi1>you can at least use x in kdb to read it
<damo22>not sure how that instruction failed, with no interrupts enabled
<youpi1>that being said, I wouldn't be surprised if the faulty instruction was the instruction just before that
<damo22>c103160c: 8b 1d 0c 07 1e c1 mov 0xc11e070c,%ebx
<damo22>the only way that can fail is if esp = 0
<damo22>Kernel Page fault trap, eip 0xc1061dc5
<youpi1>is the memory location of esp writable?
<damo22>no bootstrap code loaded with the kernle
<damo22>Thread 2 (Thread 1.2 (CPU#1 [halted ])):
<damo22>#0 0xc10283b1 in machine_idle (cpu=1) at ../i386/i386at/model_dep.c:236
<damo22>#1 0xc100c019 in idle_thread_continue () at ../kern/sched_prim.c:1657
<damo22>Thread 1 (Thread 1.1 (CPU#0 [running])):
<damo22>#0 kdcnmaygetc () at ../i386/i386at/kd.c:2999
<damo22>is there anything useful i can probe at this point?
<damo22>it seems to only fault when you give it something to run
<damo22>i mean if you run the kernel in gdb by itself, it does not fault
<damo22>it sets up AP and is fine, but complains it has nothing to run
<damo22>i should put infinite loop just before the kernel tries to run something and check that the timer interrupts are being serviced by all cores?
<damo22>should hardclock ignore clock ticks coming from APs?
<youpi1>only one CPU should advance the time
<youpi1>but stats for processes etc. need to be updated
<youpi1>(notably counting what process the tick accounts for)
<youpi1>clock_interrupt already advances the time only on the master
<youpi1>so you shouldn't need to special-case more than what is already there
<damo22>it seems all timer interrupts are being serviced by cpu0
<damo22>oh i had it in a loop sorry, thats not right
<damo22>maybe linux_timer_intr() needs to ignore cpu_number != 0
<damo22>Sending IPI(0) to call TLB shootdown...done
<damo22>youpi1: i noticed the 30 second timeout lasted only a split second, could it be the timer is too short?
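
The rule youpi1 describes above (every CPU accounts for its own ticks, but only the master advances the global time) can be sketched in C roughly as follows. This is a minimal illustration and not gnumach's actual clock_interrupt: the names tick, system_ticks, cpu_ticks and MASTER_CPU are hypothetical, and CPU 0 being the master is an assumption. NCPUS is set to 8 only to mirror the --enable-ncpus=8 build option mentioned in the log.

```c
#include <assert.h>
#include <stdint.h>

#define NCPUS      8   /* mirrors --enable-ncpus=8 from the log */
#define MASTER_CPU 0   /* assumption: CPU 0 is the master */

/* Hypothetical global time, advanced by the master CPU only. */
uint64_t system_ticks;

/* Hypothetical per-CPU accounting, updated on every CPU's tick. */
uint64_t cpu_ticks[NCPUS];

/* Hypothetical handler, called from the timer interrupt of each CPU. */
void tick(int cpu)
{
    assert(cpu >= 0 && cpu < NCPUS);

    /* Every CPU charges the tick to whatever it is running. */
    cpu_ticks[cpu]++;

    /* Only the master moves the global clock forward, so time does
       not run N times too fast when N CPUs receive timer interrupts. */
    if (cpu == MASTER_CPU)
        system_ticks++;
}
```

With this shape, a timer firing on an AP still updates that CPU's statistics without touching the global time, which matches the point that nothing beyond the existing master check should need special-casing.
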
<damo22>causing a general fault with stack overflow of too many timer interrupts
<damo22>Sending IPI(0) to call TLB shootdown...done
<damo22>both cpus in HLT with interrupts enabled
<damo22>pmap_update_interrupt on cpu1Sending IPI(0) to call TLB shootdown...done
<damo22>hmm cpu1 sent an IPI to cpu0 but cpu1 serviced it
<damo22>start acpi: Sending IPI(0 -> 1) to call TLB shootdown...done
<damo22>acpi Kernel General protection trap, eip 0xc10315db
<damo22>kernel: General protection (13), code=0
<damo22>Stopped at all_intrs+0x7: movl 0xc11e070c,%ebx
<damo22>no memory is assigned to address 00040004
<damo22>but there is still a general fault
<damo22>i am trying to write a compact CPU_NUMBER that reads the kernel id
<Pellescours>I’m triggering the general protection trap with smp enabled and 1 cpu
<Pellescours>damo22: I tried something, I replaced the CPU_NUMBER macro to globaly set 0 to the ebx register, and boot with 1 cpu. And it works without page fault. It’s really seems to be the "movl lapic, %ebx" that makes the protection fault
<Pellescours>it’s definitively this instruction that trigger the protection fault
<Pellescours>damo22: can it be because the 1st CPU_NUMBER is called before the switch to kernel segments, so some kernel variables are not accessible yet?
<Pellescours>I think that’s the cause, acd3fa8f8ba9c093c426f83488b338088035f117 introduced a CPU_NUMBER call before the stack switch
<damo22>so how do we make an ASM macro to read cpu number?
<damo22>maybe we can use the hardcoded address of the lapic?
<damo22>or maybe we can make an early stack switch function that only gets used when cpu number will fail
<damo22>like before the switch to kernel segments it can use a hardcoded cpu number
<damo22>where can i store a flag that can be read even when not on kernel segs?
<damo22>i need to mark when the cpu bringup is done
<youpi>isn't it possible to just disable interrupts until the bringup is done?
<damo22>but the problem is cpu_number cannot read lapic
<damo22>because the first CPU_NUMBER is called before switch to kernel segs
<youpi>the one in all_intrs is after
<youpi>ah, the additional before the int_from_stack check
<damo22>(20:00:23) Pellescours: I think that’s the cause, acd3fa8f8ba9c093c426f83488b338088035f117 introduced a CPU_NUMBER call before the stack switch
<youpi>possibly setting the registers could be moved before that
<youpi>Mmm, actually, can't one just use the cs segment ?
<youpi>it doesn't allow writing, but it should be fine for reading
<youpi>which is already set by the interrupt mechanism
<youpi>or whatever variable you want to read
<damo22>../i386/i386/cswitch.S:42: Error: junk `:lapic' after expression
<youpi>see e.g. i386/i386/debug_trace.S: movl %ss:EXT(debug_trace_pos),%eax
<youpi>actually you could also just use ss: like below the cmpl instruction
<youpi>but cs: inside CPU_NUMBER should be safest
<youpi>that macros saves esi, edi, etc. that's useless
<damo22>err, how do you iterate through a list of kernel ids and choose the one that matches apic id with just one reg?
<damo22>im writing that function now in asm
<youpi>? I don't see CPU_NUMBER doing that
<youpi>if something like that is needed, I'd say just prepare a table
<youpi>that already gives you the result directly
<youpi>spending 256 byte on that is cheap
<damo22>for now i will just assume the apic id == the kernel id
<damo22>Stopped at all_intrs+0xe: movl 0x20(%ebx),%eax
<damo22>do i need cs here too?: movl %cs:APIC_ID(%ebx), %eax
<damo22>Sending IPI(1 -> 0) to call TLB shootd
<damo22>rumpdisk pmap_update_interrupt on cpu1
<damo22>no more general protection faults
<damo22>it seems like the IPIs are being delivered to the wrong cpu
<damo22>or there is some kind of cpu mapping problem
<damo22>-smp 1 boots with that %cs thing
<softwar>besides arch and debian are there any other hurd distros?
<softwar>I can't find it. Something with my searches not working. I am using debian
<youpi1>there's a big hurd icon in the middle
<youpi1>well, I don't know, ask a guix channel :)
<softwar>yeah debian hurd is good enough. managed to get something installed, thanks anyway
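
The lookup table youpi suggests earlier in the log (256 bytes, indexed by APIC ID, giving the kernel CPU number directly instead of searching a list) could look roughly like the C sketch below. It is only an illustration under stated assumptions, not gnumach code: apic_id_to_cpu, apic_map_init, apic_map_add, cpu_from_apic_id and the CPU_UNMAPPED marker are all hypothetical names, and the 256-entry size assumes 8-bit xAPIC IDs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NCPUS        8      /* mirrors --enable-ncpus=8 from the log */
#define APIC_ID_MAX  256    /* assumption: 8-bit xAPIC IDs */
#define CPU_UNMAPPED 0xff   /* hypothetical marker: no CPU has this APIC ID */

/* Hypothetical table: index is the APIC ID, value is the kernel CPU number. */
uint8_t apic_id_to_cpu[APIC_ID_MAX];

/* Mark every APIC ID as unmapped before CPUs are enumerated. */
void apic_map_init(void)
{
    memset(apic_id_to_cpu, CPU_UNMAPPED, sizeof apic_id_to_cpu);
}

/* Record one CPU found during enumeration. */
void apic_map_add(uint8_t apic_id, uint8_t cpu)
{
    assert(cpu < NCPUS);
    apic_id_to_cpu[apic_id] = cpu;
}

/* Single indexed load, no loop: this is what makes the table attractive
   for a macro that has only one scratch register to work with. */
uint8_t cpu_from_apic_id(uint8_t apic_id)
{
    return apic_id_to_cpu[apic_id];
}
```

Because the lookup is one indexed read, the same access pattern would translate to a single load in assembly, and it stays correct on machines where APIC IDs are not contiguous, unlike the temporary assumption that apic id == kernel id.
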