IRC channel logs

2024-12-07.log

<shmorg83>load average indicates the number of processes that are in the runnable state ; however it is not guaranteed to be the same number, for the same load, between different kernel versions or between different operating systems
<damo22>no its just wrong on idle machine
<damo22>it seems to show 1.0 on single processor gnumach most of the time
<damo22>shmorg83: maybe gnumach counts itself as a runnable process and so the load is always approx the number of cores
<damo22>youpi: i am getting APIC 0x280 (error status register) == 0x8 during startup IPI sequence on AMD : RcvAcceptError: receive accept error. Read-write. Reset: 0. This bit indicates that a message received by this APIC was not accepted by this or any other APIC.
<damo22>not sure why
<damo22>actually INIT ipi fails
<damo22>youpi: cpu_number() is called on an AP before cpu_setup() where init_percpu() is called, so the cpu number is wrong
<damo22>i think this commit broke smp for real hw bf1cd17a4
<damo22>but i still cant seem to fix mine
<damo22>i fixed the ESR error, but something is broken with cpu_number
<janneke>damo22: yeah, i didn't bisect yet, but on a year old childhurd, the load nicely shows
<janneke>offloading@childhurd ~$ cat /proc/loadavg
<janneke>0.00 0.08 0.05 1/0 0
<damo22>early percpu isnt working properly i think
<janneke>ah right, that makes sense, the smp work (which is great!)
<damo22>im trying to debug gnumach on AMD bare metal
<janneke>(y)
<janneke>guix' offload feature has a default overload-threshold of 0.8 so we'd have to override that otherwise: no offloading :)
<damo22>i fixed a bug for smp but i cant seem to fix this cpu_number bug
<damo22>AP thinks itself is cpu 0
<janneke>oh!
<damo22>its because someone changed the code to use percpu very early
<damo22>asm is hard to read
<damo22>janneke: can you take a look at i386/i386/cpuboot.S ?
<damo22>when cpu_ap_main is called from asm, it immediately tries to call cpu_number() and i think its getting 0 when it should be 1
<janneke>ACTION looks
<damo22>yep i put an assert(cpu > 0); in there and it fails on real hw but works in qemu
<janneke>ah, "lovely" :)
<damo22>this code looks fishy:
<damo22> addl $percpu_array - KERNELBASE, %eax
<damo22> /* Record our cpu number */
<damo22> movl %ecx, (PERCPU_CPU_ID + KERNELBASE)(%eax)
<damo22>why is it subtracting and then adding KERNELBASE?
<janneke>indeed, looks weird; a comment might have explained it
<damo22>youpi: should i submit a patch that exposes a bug in smp on real hardware with an assert?
<damo22>i cant seem to fix it though
<youpi>does it happen a lot during execution?
<damo22>its reproducible and stops booting smp on my AMD hw
<youpi>I mean does it happen many times during a single boot
<youpi>if not, you can leave just a warning
<damo22>it hangs anyway so i think an assert would be good
<damo22>it only happens once per PA
<damo22>AP
<youpi>would the assertion fail on unaffected systems?
<youpi>that's my point :)
<damo22>it shouldnt no
<youpi>putting an assert that breaks other people's boxes is a problem
<youpi>otherwise, sure, go ahead
<damo22>if the assert fails, it cant possibly boot an smp system correctly
<damo22>but without the assert its a hard to track down bug
<damo22>s/bug/condition
<damo22>ok
<janneke>damo22: thanks for looking into this!
<damo22>youpi: i think you added some code for early percpu area access to cpuboot.S, and im not sure why im getting a zero cpu_number for an AP
<damo22>but not on qemu, only on an AMD cpu
<damo22>the assert i just mailed in fails on AMD
<youpi>that assert makes full sense, thanks :)
<damo22>:)
<damo22>youpi: can you please explain i386/i386/cpuboot.S:172 , i dont understand why we subtract KERNELBASE and then add it again in the next line, i thought the GDT accounts for the KERNELBASE offset...
<youpi>I don't remember
<etno>Usually adding and subtracting a base is useful if there is a division in the middle, to compute an offset.
<youpi>(it doesn't really hurt since that's done at compile-time)
<youpi>the segments are already reloaded before that code
<youpi>so indeed KERNELBASE is supposed to be done by the movl
<youpi>and it probably shouldn't be added
<damo22>- movl %ecx, (PERCPU_CPU_ID + KERNELBASE)(%eax)
<damo22>+ movl %ecx, (PERCPU_CPU_ID)(%eax)
<damo22>i tried this and it broke
<damo22>hmm but PERCPU_GS selector doesnt have -KERNELBASE baked in
<damo22>so when it tries to access it, it will break?
<youpi>damo22: but that movl doesn't use gs:
<youpi>ah wait, yes we need the +KERNELBASE
<youpi>precisely because the segmentation adds -KERNELBASE :)
<youpi>because in the linear address space, the kernel is at low addresses
<azert>damo22: are you using x2APIC on your amd processor?
<azert>could it be that the 8-bit APIC IDs have been disabled on your cpu model for whatever reason? Then you’d need to use the 32-bit x2APIC IDs
<damo22>azert: my AMD processor does not have x2apic feature
<damo22>also, we are not using x2apic, but switching it to xapic
<damo22>yes i think i understand why now: the subtraction is because we want to amend the segmentation in the gs segment, the addition is because the current segmentation already adds -KERNELBASE, and then the subtracted version is put into the GDT
<damo22>can we rewrite it like this to make it more readable?
<damo22> addl $percpu_array, %eax
<damo22> /* Record our cpu number */
<damo22> movl %ecx, (PERCPU_CPU_ID)(%eax)
<damo22> /* Set up temporary percpu descriptor */
<damo22> addl $(-KERNELBASE), %eax