IRC channel logs

2022-10-14.log

<damo22>Hurd server bootstrap: ext2fs[part:2:device:sd0] exec startup proc auth.
<damo22>{cpu0} ../kern/slab.c:1020: kmem_cache_free_to_slab: Assertion `(unsigned long)buf >= (unsigned long)slab->addr' failed. Debugger invoked: assertion failure
<damo22>Kernel Breakpoint trap, eip 0xc1026224
<damo22>with -smp 1
<damo22>Pellescours: i protected CPU_NUMBER but i need to also protect cpu_number() from interrupts
<damo22>when i do this, i get a TLB shootdown trigger
<damo22>but AP does not call the interrupt vector
<damo22>even though its IRR is set to 251
<damo22>so we need to fix this too
<Pellescours>okay
<damo22>strange, i cannot replicate the IRR=251
<damo22>Pellescours: do you have any ideas about the IPI failing to cause an interrupt
<damo22>maybe we need to enable ioapic?
<damo22>unless the interrupt is happening and we are not seeing it
<damo22>i just pushed a commit that will get to the TLB message
<Pellescours>IOAPIC is here to redirect the interrupts to ALL cpus, without IOAPIC, the interrupts will always be redirected to BSP. That’s what I understood.
<Pellescours>So if IPI only involve the LocalAPICs, that should work even without IOAPIC
<damo22>no i think FIXED lapic interrupts can be directed to a destination apic_id
<Pellescours>if it’s lapic interrupts it use lapic only, not ioapic. So in this case, that should be fine in our case.
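[Editor's note] The point about FIXED interrupts being directed to a destination apic_id can be sketched as follows. This is a minimal sketch assuming xAPIC (memory-mapped) mode and physical destination mode; the macro and helper names are hypothetical and are not taken from gnumach. It only composes the two Interrupt Command Register values and does not touch hardware.

```c
#include <stdint.h>

/* xAPIC Interrupt Command Register fields. Names are hypothetical. */
#define ICR_DELIVERY_FIXED  (0u << 8)   /* bits 8-10: 000 = fixed delivery */
#define ICR_DEST_PHYSICAL   (0u << 11)  /* bit 11: 0 = physical destination mode */
#define ICR_LEVEL_ASSERT    (1u << 14)  /* bit 14: 1 = assert */
#define ICR_TRIGGER_EDGE    (0u << 15)  /* bit 15: 0 = edge triggered */
#define ICR_DEST_SHIFT      24          /* ICR high dword: bits 24-31 = target APIC ID */

/* Low dword of the ICR for a fixed IPI carrying the given vector. */
static uint32_t icr_low_fixed(uint8_t vector)
{
    return (uint32_t)vector | ICR_DELIVERY_FIXED | ICR_DEST_PHYSICAL
         | ICR_LEVEL_ASSERT | ICR_TRIGGER_EDGE;
}

/* High dword of the ICR selecting one destination local APIC. */
static uint32_t icr_high_dest(uint8_t apic_id)
{
    return (uint32_t)apic_id << ICR_DEST_SHIFT;
}
```

On real hardware the high dword would be written first, since writing the low dword is what sends the IPI. For vector 251 and APIC ID 1 the helpers give 0x000040fb and 0x01000000.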
<damo22>maybe we need to set the LAPIC_ENABLE flag on the APs?
<Pellescours>I don’t know, In the linux book, it says «During system bootstrap, moreover, all CPUs
<Pellescours>execute the setup_local_APIC() function, which takes care of initializing the local
<Pellescours>APICs»
<Pellescours>damo22: I checked in linux code and they do that, then set the LAPIC_ENABLE flag on the APs (in the setup_local_APIC function)
<damo22>its weird, the spurious register is not being set on the APs
<damo22>even though i toggle it
<damo22>maybe its invalid
<damo22>and not letting me set it
<Pellescours>Is the lapic struct correctly mapped even for APs?
<damo22>its at the same address on all cpus
<Pellescours>because we get the lapic_pointer on the BSP
<Pellescours>so I don’t know if it’s correct that APs use the same ptr
<damo22>when you dereference a pointer to the lapic, it reads from the current cpu
<damo22>you cant write to the lapic on a different cpu from bsp
<damo22>there is no mechanism for that
<Pellescours>okay, in linux to enable the lapic, they use the apic_write which is implemented with this https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/apic.h#L101
<Pellescours>in darwin they do the `(volatile uint32_t*) *ptr = value`
<damo22>thats pretty much what we are doing
<damo22>SPIV 0x000000ff APIC disabled, focus=off, spurious vec 255
<damo22>it wont toggle on lol
<Pellescours>damo22: you are talking about an ap cpu?
<damo22>yes
<damo22>something reset it back to 0xff
<damo22>youpi: is it possible the interrupt flag sets the spurious vector of the APs back to disabled when the interrupts are not routed through the ioapic?
<damo22>or sending an AP an IPI without properly configuring lapic could cause it to turn off the lapic?
<Pellescours>damo22: in linux the procedure to enable lapic contains more steps than just enabling SPIV, they set DFR, LDR and TPR, then they set the taskpri
<Pellescours>damo22: I tried to enable apic, the lapic of the second cpu stay disabled
<damo22>do we need to use cpuid to populate the apic_id.r register
<damo22>before(1): lapic=0xf9693000 spiv=0xff
<damo22>after (1): lapic=0xf9693000 spiv=0x1ff
<damo22>but when i go into qemu to check the SPIV its 0xff
<damo22>info lapic 1
<Pellescours>looks like the value you update is not reflected into the cpu
<damo22>hmm now after 60 seconds it worked
<damo22>waiting
<damo22>SPIV 0x000001ff APIC enabled, focus=off, spurious vec 255
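[Editor's note] The difference between the 0xff and 0x1ff readings above is bit 8 of the spurious interrupt vector register, the APIC software enable bit; the low 8 bits hold the spurious vector. A small decoding sketch, with hypothetical helper names that are not gnumach's:

```c
#include <stdbool.h>
#include <stdint.h>

#define SPIV_VECTOR_MASK  0xffu       /* bits 0-7: spurious vector */
#define SPIV_APIC_ENABLE  (1u << 8)   /* bit 8: APIC software enable */

/* True if the local APIC is software-enabled according to this SPIV value. */
static bool spiv_apic_enabled(uint32_t spiv)
{
    return (spiv & SPIV_APIC_ENABLE) != 0;
}

/* Spurious vector number stored in this SPIV value. */
static uint8_t spiv_vector(uint32_t spiv)
{
    return (uint8_t)(spiv & SPIV_VECTOR_MASK);
}

/* Value to write back in order to enable the APIC while keeping the vector. */
static uint32_t spiv_with_enable(uint32_t spiv)
{
    return spiv | SPIV_APIC_ENABLE;
}
```

With these, 0xff decodes as disabled with vector 255 and 0x1ff as enabled with vector 255, matching the two qemu readings in the log.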
<damo22>...
<damo22>IRR 251(level)
<youpi>damo22: the interrupt flag only changes the local processor behavior
<youpi>it doesn't touch anything else in the machine
<damo22>Sending IPI(1) to call TLB shootdown...done
<damo22>hang
<damo22>and the IRR on AP 1 is 251
<damo22>how do you set up the lapic to tell cpus to accept interrupts themselves
<damo22>does cli only affect a local processor not all processors?
<Pellescours>from what I understood, yes cli only affects the current cpu. you need locks if you want to prevent multiple cpus accessing the same resource
<damo22>if a cpu is interrupted and you call cli in the interrupt handler, that will stop further interrupts from occurring in the interrupt. When are interrupts reenabled?
<Pellescours>you need to call sti I think
<youpi>rather pushf/popf, so it's nestable
<damo22>if you use "cpu 1" in qemu
<damo22>then info lapic works
<damo22>but info lapic 1 shows bad info
<damo22>......
<damo22>so i think the problem is something called cli on the AP and never reenabled interrupts
<damo22>but i cant find it
<Pellescours>this cli happen before or after the cpu_slave_main()?
<damo22>i better push my latest commit so you can see what im doing
<Pellescours>yup
<damo22>ok
<damo22>(qemu) cpu 1
<damo22>(qemu) info lapic
<damo22>shows its getting interrupt 251
<damo22>but its not running the function
<Pellescours> https://linux-kernel-labs.github.io/refs/pull/183/merge/lectures/smp.html#disabling-preemption-interrupts in local_irq_restore what does "cc" mean?
<damo22>cpu1 is getting interrupt 251 but isnt running the handler
<damo22>its stuck
<Pellescours>idt misconfigured?
<damo22>i thought i fixed it
<Pellescours>I thought that too, what other reason would make the handler not be called? I see that or interrupts being disabled. But popf will enable the interrupt and you do it at every cpu_number call
<damo22>popf wont enable the interrupt if its previously disabled
<damo22>before the pushf
<Pellescours>right
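[Editor's note] The pushf/popf behaviour discussed above (restoring brings back whatever state was saved, so it will not enable interrupts that were already disabled) can be modelled in plain C. This is a user-space simulation only: sim_if stands in for the EFLAGS.IF bit of one CPU, and all names are hypothetical.

```c
#include <stdbool.h>

/* Simulated interrupt flag of one CPU; on real hardware this is EFLAGS.IF. */
static bool sim_if = true;

/* Models "pushf; cli": remember the previous state, then disable interrupts. */
static bool sim_irq_save(void)
{
    bool prev = sim_if;
    sim_if = false;
    return prev;
}

/* Models "popf": restore the saved state, whatever it was. */
static void sim_irq_restore(bool prev)
{
    sim_if = prev;
}
```

Nesting works because each level restores only what it saved: an inner restore leaves interrupts disabled if the outer section had already disabled them, and only the outermost restore re-enables them. A bare cli/sti pair, by contrast, would enable interrupts at the inner sti.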
<Pellescours>I can see an sti before the slave_main() so if slave_main() is called, interrupt should be enabled at that point
<damo22>if the IDT was misconfigured i think it would crash the processor or trap
<Pellescours>damo22: you can check if intr are enabled with cpu_intr_enabled
<Pellescours>If you print the value just before you expect to get the IPI
<damo22>i cant do that easily because it sits in idle waiting for thread, ipi happens on other processor
<damo22>i need to trace through the code all the way to idle loop and see if cli is called without preserving flags
<damo22>* Block all interrupts for choose_thread
<damo22>splhigh() is called without restoring flags!
<damo22>ok cpu crashed
<damo22>when i enabled interrupts
<damo22>IDT ?
<Pellescours>this, or the asm code that handles the interrupt does not handle it correctly when the interrupt happens on cpu != 0?
<damo22>?
<damo22>maybe
<Pellescours>in model_deps.c line 563 int_stack_top[0] is set to `int_stack_base[0] + KERNEL_STACK_SIZE - 4;` but in the interrupt_stack_alloc there is not the -4
<damo22>ok
<Pellescours>can the other cpus' interrupt stacks be not aligned correctly or something like that
<damo22>i manually aligned it in cpuboot.S
<Pellescours>it goes from top to base or from base to top for the stack?
<damo22>it starts at high address and fills lower i think
<Pellescours>so from top to base
<damo22>i think so
<youpi>on x86 it's growing down yes
<Pellescours>int_from_intstack compares esp to int_stack_base, shouldn’t it instead compare it to the int_stack_base of the current cpu?
<Pellescours>Because if I understand correctly now, a cpu's intr stack can overwrite the stack of another cpu if it overflows
<damo22>int_stack_base is not defined correctly for APs
<damo22>interrupt_stack is
<Pellescours>Indeed, int_stack_base is never initialized for APs
<damo22>we dont need interrupt_stack i think
<damo22>oh its also used in cswitch
<damo22>i think we should change cswitch.S to use int_stack_base instead of interrupt_stack
<damo22>and initialise int_stack_base
<damo22>for APs
<Pellescours>but interrupt_stack and int_stack_base represent the same thing, so yeah imo we should replace interrupt_stack by int_stack_base
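[Editor's note] The idea of initialising int_stack_base for every CPU and checking the stack pointer against the current CPU's own range can be sketched like this. It is a simplified user-space model, not the gnumach code: NCPUS, the KERNEL_STACK_SIZE value, and both helper functions are assumptions made for illustration; only the array names and the "- 4" top offset come from the discussion.

```c
#include <stdbool.h>
#include <stdint.h>

#define NCPUS              2      /* assumed CPU count for the sketch */
#define KERNEL_STACK_SIZE  4096   /* assumed size, for illustration only */

static uintptr_t int_stack_base[NCPUS];  /* lowest address of each CPU's stack */
static uintptr_t int_stack_top[NCPUS];   /* initial stack pointer of each CPU */

/* Record the interrupt stack of one CPU (BSP or AP). The stack grows down,
 * so the top is derived from the base, as in the BSP setup quoted above. */
static void int_stack_init(int cpu, uintptr_t base)
{
    int_stack_base[cpu] = base;
    int_stack_top[cpu] = base + KERNEL_STACK_SIZE - 4;
}

/* True if sp lies inside the interrupt stack of the given CPU. */
static bool on_int_stack(int cpu, uintptr_t sp)
{
    return sp >= int_stack_base[cpu] && sp <= int_stack_top[cpu];
}
```

Indexing both arrays by CPU number means a stack pointer inside CPU 1's stack is not mistaken for being on CPU 0's stack, which is the concern raised about int_from_intstack.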
<damo22>ok fixed, but the cpu is still crashing on irq 251
<damo22>ISR 251(level)
<damo22>IRR (none)
<Pellescours>is it cpu 1 that crash?
<damo22>hard to tell
<damo22>the vm stops
<damo22>but considering the ISR is set, probably 1
<damo22>as it happened just as the interrupt happened
<damo22>unless the call needs to be aligned
<Pellescours>I wanted to try your code with smp 2, and I mistyped my command so ran it with smp 1. Here it hangs in the middle of rumpdisk, but the cpu did not crash, it just seems to be a deadlock or something like that
<damo22>yes probably because the ioapic interrupts were enabled
<damo22>but not switched from the pic
<damo22>im tired
<damo22>goodnight!
<Pellescours>goodnight
<Pellescours>youpi: in locore.S:676 (http://git.zammit.org/gnumach-sv.git/tree/i386/i386/locore.S?h=feat-smp&id=332ae598d67476403ed7ecb2b897476aec6f98e9#n676) we save %ss to %dx, but then we call CPU_NUMBER(%edx). So %ds is overwritten and %ss is lost right?
<Pellescours>So %dx is overwritten*
<youpi>dx is overwritten yes but that's fine since that was only used to set the other segment registers