IRC channel logs

2024-02-05.log

<damo22>so it makes sense that the INIT SIPI sequence causes the AP to load code from the reset vector, unless you change the RTC cmos setting to jump through an alt address
<damo22>that would explain why my code seems to reboot
<damo22>but gets stuck
<damo22>because i basically kexec'd coreboot on the AP
<damo22>but when i change the CMOS setting, the ESR fails with error 4 or c
<damo22>i also updated the warm reset vector at 0x467
<damo22>linux$ git grep TRAMPOLINE_PHYS_HIGH
<damo22>arch/x86/include/asm/apic.h:#define TRAMPOLINE_PHYS_HIGH 0x469
<damo22>arch/x86/kernel/smpboot.c: *((volatile unsigned short *)phys_to_virt(TRAMPOLINE_PHYS_HIGH)) = start_eip >> 4;
<damo22>linux$ git grep TRAMPOLINE_PHYS_LOW
<damo22>arch/x86/include/asm/apic.h:#define TRAMPOLINE_PHYS_LOW 0x467
<damo22>arch/x86/kernel/smpboot.c: *((volatile unsigned short *)phys_to_virt(TRAMPOLINE_PHYS_LOW)) = start_eip & 0xf;
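The two addresses quoted above form a real-mode far pointer in the BIOS data area: a 16-bit offset at 0x467 and a 16-bit segment at 0x469. A minimal sketch of filling them in, assuming a hypothetical helper `set_warm_reset_vector` and a plain byte buffer `low_mem` standing in for low physical memory (neither name comes from Linux or gnumach). The CMOS side, conventionally shutdown code 0x0A in CMOS register 0x0F per the Intel MP specification, is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* BIOS data area addresses quoted in the log (Linux calls them
 * TRAMPOLINE_PHYS_LOW and TRAMPOLINE_PHYS_HIGH). */
#define WARM_RESET_OFFSET  0x467u  /* 16-bit offset half of the far pointer */
#define WARM_RESET_SEGMENT 0x469u  /* 16-bit segment half of the far pointer */

/* Hypothetical helper: `low_mem` stands in for the first megabyte of
 * physical memory; a kernel would write through its own mapping instead. */
static void set_warm_reset_vector(uint8_t *low_mem, uint32_t start_eip)
{
    /* The trampoline has to be reachable from real mode. */
    assert(start_eip < 0x100000u);

    uint16_t offset  = (uint16_t)(start_eip & 0xfu);
    uint16_t segment = (uint16_t)(start_eip >> 4);

    /* Store little-endian, byte by byte, so no unaligned 16-bit
     * accesses are needed (0x467 and 0x469 are odd addresses). */
    low_mem[WARM_RESET_OFFSET]      = (uint8_t)(offset & 0xffu);
    low_mem[WARM_RESET_OFFSET + 1]  = (uint8_t)(offset >> 8);
    low_mem[WARM_RESET_SEGMENT]     = (uint8_t)(segment & 0xffu);
    low_mem[WARM_RESET_SEGMENT + 1] = (uint8_t)(segment >> 8);
}
```

With a trampoline at physical 0x7000, this stores offset 0x0000 and segment 0x0700, so the AP resumes at 0700:0000.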
<damo22>i took out the printfs in the critical path and got:
<damo22>Sending IPIs to APIC ID 1...
<damo22>ESR error upon INIT 0xc
<damo22>ESR error upon STARTUP 0x2c
<damo22>ESR error upon STARTUP 0x8
<damo22>hmmm startup sent illegal vector
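The ESR values in the output above can be read bit by bit. A small sketch of a decoder, assuming the bit layout documented in the Intel SDM (AMD parts may reserve some of the low bits, so the BKDG should be checked for the board in question). The helper name `esr_decode` is made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* ESR bit names following the Intel SDM layout (assumption: xAPIC mode). */
static const char *const esr_bit_names[8] = {
    "send checksum error",
    "receive checksum error",
    "send accept error",
    "receive accept error",
    "redirectable IPI",
    "send illegal vector",
    "received illegal vector",
    "illegal register address",
};

/* Hypothetical helper: writes a comma-separated list of the set ESR bits
 * into buf and returns how many names were written. */
static int esr_decode(uint32_t esr, char *buf, size_t len)
{
    int count = 0;
    size_t used = 0;

    if (len == 0)
        return 0;
    buf[0] = '\0';

    for (int bit = 0; bit < 8; bit++) {
        if (!(esr & (1u << bit)))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         count ? ", " : "", esr_bit_names[bit]);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0';   /* drop the partial name and stop */
            break;
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

Under that layout, 0x2c has bits 2, 3 and 5 set (send accept error, receive accept error, send illegal vector), which matches the "sent illegal vector" reading, and 0x8 is a lone receive accept error.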
<damo22>it seems that INIT DEASSERT is not supported with LEVEL triggered on my board.... it says so in the BKDG. that is also the old way before SIPI was supported
<damo22>woot ext2fs: part:2:device:sd0: No such device or address on AMD
<damo22>youpi: https://paste.debian.net/plain/1306345
<damo22>thats with smp
<Pellescours>invalid fpu state. Is there anything to do for SMP?
<damo22>i think that happened with that recent patch for fpu
<damo22>Pellescours: that log above is the culmination of a week's work, debugging all that on hardware :)
<youpi>damo22: woot!
<youpi>(I wouldn't be surprised if some bugs are left in the fpu-on-smp case)
<damo22>in master we are doing bad things with APIC
<damo22>overwriting read only fields
<damo22>i will clean this up and mail in fixes
<damo22>hmm i should also test smp on qemu
<anatoly>oh yeah! noice!
<damo22> ../kern/slab.c:966: kmem_cache_alloc_from_slab: Assertion `bufctl != NULL' failed.panic {cpu1}
<damo22>i get that error from qemu with smp 2
<AlmuHS>damo22: the ICR struct includes some RO fields, because it must include all register fields. But the IPI function only sets the writable fields
<damo22>AlmuHS: yes, but you cannot write a 0 to some read only field
<damo22>actually the code is uninitialised
<damo22>so it could put anything in there
<AlmuHS>ok
<damo22>the best way is to read the current value and write the same value back for those fields when you write the register
<damo22>also, we forgot to use ".r" on the register write
<damo22>so it clobbered most of the block
<AlmuHS>what is .r?
<damo22>it refers to the first 32 bits of the icr only
<damo22>its part of that union
<damo22>i will mail patches shortly
<AlmuHS>i remember that i added this as a simple padding
<damo22>yes but the padding is mapped to hardware registers
<damo22>so if you write the whole block, you set all the registers i think
<AlmuHS>you could check this by reading the ICR in GDB
<AlmuHS>reading the ICR status after the IPI
<damo22>it could explain why the IPIs were lagging or doing something strange in smp
<damo22>the apic is not happy
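The read-modify-write idea from the exchange above can be sketched as follows. This assumes the xAPIC layout of the low ICR dword from the Intel SDM; the union `apic_reg32` and the helper `icr_low_write` are hypothetical stand-ins and not the actual gnumach definitions. Only the 32-bit `.r` member is touched, so the padding that follows the register is never written.

```c
#include <stdint.h>

/* Writable bits of the low ICR dword in xAPIC mode (Intel SDM layout):
 * vector 0-7, delivery mode 8-10, destination mode 11, level 14,
 * trigger mode 15, destination shorthand 18-19.
 * Bit 12 (delivery status) is read-only; the remaining bits are reserved. */
#define ICR_LOW_WRITABLE 0x000ccfffu

/* Hypothetical stand-in for a memory-mapped APIC register slot:
 * `.r` is the 32-bit register, and the slot is 16 bytes wide because
 * xAPIC registers are spaced 16 bytes apart. */
typedef union {
    volatile uint32_t r;
    uint32_t pad[4];
} apic_reg32;

/* Read the current value, keep read-only and reserved bits as they are,
 * and replace only the writable fields. Writes `.r` alone, never the
 * whole 16-byte slot. */
static void icr_low_write(apic_reg32 *icr_low, uint32_t wanted)
{
    uint32_t current = icr_low->r;

    icr_low->r = (current & ~ICR_LOW_WRITABLE) | (wanted & ICR_LOW_WRITABLE);
}
```

For example, requesting an INIT with level assert (delivery mode 5, bit 14 set, i.e. 0x4500) leaves bit 12 and the reserved bits exactly as they were read.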
<AlmuHS>on my T410, when i tested SMP last year, the machine rebooted after SIPI
<damo22>:D
<damo22>i may have a fix for that
<AlmuHS>but t410 is 64-bit, so it could be using x2APIC by default
<damo22>no, we set it to xAPIC
<damo22>its configurable
<damo22>i have patch for that
<AlmuHS>yeah, but i tried this machine many months ago
<damo22>ok
<AlmuHS>if you modify icr struct, be careful with longs and padding
<AlmuHS>it must match the specs exactly, including reserved spaces
<AlmuHS>it's not necessary to include an array for padding, that is already handled in the struct by the manual padding (the fields without a name)
<damo22>i did not modify the struct, i just added a name to a missing field and set it
<AlmuHS_>ok
<damo22>see mail
<damo22>AlmuHS_: ^
<AlmuHS_>in start_other_cpus(), why do you disable lapic?
<AlmuHS_>you need lapic to send IPI, don't you?
<damo22>not quite, the INIT and SIPI can still work with IOAPIC interrupts disabled
<damo22>clearing the LAPIC_ENABLE flag of the lapic is kind of equivalent to masking all the ioapic interrupts individually
<damo22>so i made a separate function just to toggle that bit
<AlmuHS_>ok, the function name is not clear about that
<damo22>ok
<damo22>its good to turn it off because it prevents ioapic interrupts from getting in the way of smp startup
<AlmuHS_>yeah
<damo22>so then i turn it on after the APs are up
<AlmuHS_>makes sense
<youpi>damo22: it would be very useful to put that in comments
<youpi>because I have seen this kind of function calls getting moved here and there according to the specific issue each and everyone was having, without taking into account all situations
<youpi>so we really need comments to know *why* some call is done here and not there
<AlmuHS_>btw, could you send me the patch in which you modified the scheduler to remove the cpu pause? The patch will prevent boot hangs
<damo22>ok
<youpi>otherwise you'll probably see some random guy some years later who will move them again and break things
<AlmuHS_>**the patch which prevents
<damo22>AlmuHS_: see zam/fixes
<AlmuHS_>ok
<damo22> https://git.zammit.org/gnumach-sv.git/commit/?h=fixes&id=0fe92b6b52726bcd2976863d344117dad8d19694
<AlmuHS_>this issue is not solved upstream yet, so i keep applying this patch to boot successfully
<damo22>yes, it's not ideal
<AlmuHS_>thanks
<damo22>AlmuHS_: but perhaps try master + my patch series from the ML
<damo22>you might not need this extra patch anymore
<AlmuHS_>yes, i am pulling master after your repository
<AlmuHS_>starting from this commit
<gnu_srs>youpi: dhcpcd uses a lot of entries and structs from linux/netlink.h and linux/rtnetlink.h. They are not defined in the header files in /usr/include.
<gnu_srs>They are available and used in libdde and pfinet though. Can I use the definitions in these files? In which additional library are they available, or not?
<AlmuHS_>but i want to check this patch to understand the problem
<youpi>gnu_srs: we don't expose a netlink interface, so the definitions will most probably not be useful
<gnu_srs>The most heavily used ones are RTM_NEWADDR and RTM_DELADDR
<youpi>that can be replaced with ioctls
<youpi>SIOCSIFADDR and SIOCDIFADDR
<gnu_srs>ok, I'll take a further look.
<damo22>AlmuHS_: i have a branch with master plus my patches, its zam/fix-ioapic
<AlmuHS_>thanks, i will test it