IRC channel logs
2022-10-14.log
<damo22>Hurd server bootstrap: ext2fs[part:2:device:sd0] exec startup proc auth.
<damo22>{cpu0} ../kern/slab.c:1020: kmem_cache_free_to_slab: Assertion `(unsigned long)b
<damo22>uf >= (unsigned long)slab->addr' failed.Debugger invoked: assertion failure
<damo22>Kernel Breakpoint trap, eip 0xc1026224
<damo22>Pellescours: i protected CPU_NUMBER but i need to also protect cpu_number() from interrupts
<damo22>when i do this, i get a TLB shootdown trigger
<damo22>but AP does not call the interrupt vector
<damo22>even though its IRR is set to 251
<damo22>strange, i cannot replicate the IRR=251
<damo22>Pellescours: do you have any ideas about the IPI failing to cause an interrupt
<damo22>unless the interrupt is happening and we are not seeing it
<damo22>i just pushed a commit that will get to the TLB message
<Pellescours>IOAPIC is here to redirect the interrupts to ALL cpus, without IOAPIC, the interrupts will always be redirected to BSP. That’s what I understood.
<Pellescours>So if IPI only involve the LocalAPICs, that should work even without IOAPIC
<damo22>no i think FIXED lapic interrupts can be directed to a destination apic_id
<Pellescours>if it’s lapic interrupts it use lapic only, not ioapic. So in this case, that should be fine in our case.
<damo22>maybe we need to set the LAPIC_ENABLE flag on the APs?
<Pellescours>I don’t know, In the linux book, it says «During system bootstrap, moreover, all CPUs
<Pellescours>execute the setup_local_APIC() function, which takes care of initializing the local
<Pellescours>damo22: I checked in linux code and they do that, then set the LAPIC_ENABLE flag on the APs (in the setup_local_APIC function)
<damo22>its wierd, the spurious register is not being set on the APs
<damo22>its at the same address on all cpus
<Pellescours>so I don’t know if it’s correct that APs use the same ptr
<damo22>when you dereference a pointer to the lapic, it reads from the current cpu
<damo22>you cant write to the lapic on a different cpu from bsp
<Pellescours>in darwin they do the `(volatile uint32_t*) *ptr = value`
<damo22>thats pretty much what we are doing
<damo22>SPIV 0x000000ff APIC disabled, focus=off, spurious vec 255
<damo22>youpi: is it possible the interrupt flag sets the spurious vector of the APs back to disabled when the interrupts are not routed through the ioapic?
<damo22>or sending an AP an IPI without properly configuring lapic could cause it to turn off the lapic?
<Pellescours>damo22: in linux the procedure to enable lapic contain more step than just enable SPIV, they set DFR, LDR and TPR, then they set the taskpri
<Pellescours>damo22: I tried to enable apic, the lapic of the second cpu stay disabled
<damo22>do we need to use cpuid to populate the apic_id.r register
<damo22>before(1): lapic=0xf9693000 spiv=0xff
<damo22>after (1): lapic=0xf9693000 spiv=0x1ff
<damo22>but when i go into qemu to check the SPIV its 0xff
<Pellescours>looks like the value you update is not reflected into the cpu
<damo22>hmm now after 60 seconds it worked
<damo22>SPIV 0x000001ff APIC enabled, focus=off, spurious vec 255
<youpi>damo22: the interrupt flag only changes the local processor behavior
<youpi>it doesn't touch anything else in the machine
<damo22>Sending IPI(1) to call TLB shootdown...done
<damo22>how do you set up the lapic to tell cpus to accept interrupts themselves
<damo22>does cli only affect a local processor not all processors?
<Pellescours>for what I understood, yes cli only affect the current cpu. you need locks if you want to prevent multiple cpus accessing the same ressource
<damo22>if a cpu is interrupted and you call cli in the interrupt handler, that will stop further interrupts from occurring in the interrupt. When are interrupts reenabled?
<youpi>rather pushf/popf, so it's nestable
<damo22>so i think the problem is something called cli on the AP and never reenabled interrupts
<Pellescours>this cli happen before or after the cpu_slave_main()?
<damo22>i better push my latest commit so you can see what im doing
<damo22>but its not running the function
<damo22>cpu1 is getting interrupt 251 but isnt running the handler
<Pellescours>I though that too, what other reason would it make the handler not being call? I see that or interrupt being disabled.
But popf will enable the interrupt and you do it at every cpu_number call
<damo22>popf wont enable the interrupt if its previously disabled
<Pellescours>I can see an sti before the slave_main() so if slave_main() is called, interrupt should be enabled at that point
<damo22>if the IDT was misconfigured i think it would crash the processor or trap
<Pellescours>damo22: you can check if intr are enabled with cpu_intr_enabled
<Pellescours>If you print the value just before you expect to get the IPI
<damo22>i cant do that easily because it sits in idle waiting for thread, ipi happens on other processor
<damo22>i need to trace through the code all the way to idle loop and see if cli is called without preserving flags
<damo22>* Block all interrupts for choose_thread
<damo22>splhigh() is called without restoring flags!
<Pellescours>this or asm code that handle the interrupt that does not handle it correctly in case the interrupt happens in cpu != 0?
<Pellescours>in model_deps.c line 563 int_stack_top[0] is set to `int_stack_base[0] + KERNEL_STACK_SIZE - 4;` but in the interrupt_stack_alloc there is not the -4
<Pellescours>can the others cpu interrupt stack being not aligned correctly or something like that
<damo22>i manually aligned it in cpuboot.S
<Pellescours>it goes from top to base or from base to top for the stack?
<damo22>it starts at high address and fills lower i think
<youpi>on x86 it's growing down yes
<Pellescours>int_from_intstack compare esp to int_stack_base, shouldn’t it do instead compare it to the int_stack_base of the current cpu?
<Pellescours>Because if I understand correctly now, a cpu intr stack can override the stack from another cpu if it overflow
<damo22>int_stack_base is not defined correctly for APs
<Pellescours>Effectively int_stack_base is never initialized for APs
<damo22>we dont need interrupt_stack i think
<damo22>i think we should change cswitch.S to use int_stack_base instead of interrupt_stack
<Pellescours>but interrupt_stack and int_stack_base represent the same thing, so yeah imo we should replace interrupt_stack by int_stack_base
<damo22>ok fixed, but the cpu is still crashing on irq 251
<damo22>but considering the ISR is set, probably 1
<damo22>as it happened just as the interrupt happened
<damo22>unless the call needs to be aligned
<Pellescours>I wanted to try your code with smp 2, and I missed my command so ran it with smp 1. Here it hang at the middle of rumpdisk, but the cpu did not crash, it just seems to be deadlock or something like that
<damo22>yes probably because the ioapic interrupts were enabled
<youpi>dx is overwritten yes but that's fine since that was only used to set the other segment registers