IRC channel logs
2024-12-07.log
<shmorg83>load average indicates the number of processes that are in the runnable state ; however it is not guaranteed to be the same number, for the same load, between different kernel versions or between different operating systems
<damo22>no its just wrong on idle machine
<damo22>it seems to show 1.0 on single processor gnumach most of the time
<damo22>shmorg83: maybe gnumach counts itself as a runnable process and so the load is always approx the number of cores
<damo22>youpi: i am getting APIC 0x280 (error status register) == 0x8 during startup IPI sequence on AMD : RcvAcceptError: receive accept error. Read-write. Reset: 0. This bit indicates that a message received by this APIC was not accepted by this or any other APIC.
<damo22>youpi: cpu_number() is called on an AP before cpu_setup() where init_percpu() is called, so the cpu number is wrong
<damo22>i think this commit broke smp for real hw bf1cd17a4
<damo22>but i still cant seem to fix mine
<damo22>i fixed the ESR error, but something is broken with cpu_number
<janneke>damo22: yeah, i didn't bisect yet, but on a year old childhurd, the load nicely shows
<janneke>offloading@childhurd ~$ cat /proc/loadavg
<damo22>early percpu isnt working properly i think
<janneke>ah right, that makes sense, the smp work (which is great!)
<damo22>im trying to debug gnumach on AMD bare metal
<janneke>guix' offload feature has a default overload-threshold of 0.8 so we'd have to override that otherwise: no offloading :)
<damo22>i fixed a bug for smp but i cant seem to fix this cpu_number bug
<damo22>its because someone changed the code to use percpu very early
<damo22>janneke: can you take a look at i386/i386/cpuboot.S ?
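The load-average remarks above can be made concrete with a small sketch: if an idle machine reports about 1.0 and the offload threshold is 0.8, the machine always looks overloaded. The function names, the sample input lines and the "load >= threshold" comparison are assumptions for illustration only; the log does not show how Guix normalizes or compares the value.

```c
#include <stdio.h>

/* Parse the 1-minute load average (first field) from a
 * /proc/loadavg-style line. Returns 0 on success, -1 on failure. */
static int parse_load1(const char *line, double *load1)
{
    return sscanf(line, "%lf", load1) == 1 ? 0 : -1;
}

/* Hypothetical check: report 1 when the 1-minute load is at or above
 * the threshold, 0 when it is below, and -1 when the line cannot be
 * parsed. The >= comparison is an assumption, not Guix's code. */
static int is_overloaded(const char *line, double threshold)
{
    double load1;

    if (parse_load1(line, &load1) != 0)
        return -1;
    return load1 >= threshold;
}
```

With a made-up line such as "1.00 1.00 1.00 1/50 1234" and a threshold of 0.8, is_overloaded() returns 1, which matches the "no offloading" outcome described above.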
<damo22>when cpu_ap_main is called from asm, it immediately tries to call cpu_number() and i think its getting 0 when it should be 1
<damo22>yep i put an assert(cpu > 0); in there and it fails on real hw but works in qemu
<damo22> addl $percpu_array - KERNELBASE, %eax
<damo22> movl %ecx, (PERCPU_CPU_ID + KERNELBASE)(%eax)
<damo22>why is it subtracting and then adding KERNELBASE?
<janneke>indeed, looks weird; a comment might have explained it
<damo22>youpi: should i submit a patch that exposes a bug in smp on real hardware with an assert?
<youpi>does it happen a lot during execution?
<damo22>its reproducible and stops booting smp on my AMD hw
<youpi>I mean does it happen many times during a single boot
<youpi>if not, you can leave just a warning
<damo22>it hangs anyway so i think an assert would be good
<youpi>would the assertion fail on unaffected systems?
<youpi>putting an assert that breaks other people's boxes is a problem
<damo22>if the assert fails, it cant possibly boot an smp system correctly
<damo22>but without the assert its a hard to track down bug
<janneke>damo22: thanks for looking into this!
<damo22>youpi: i think you added some code for early percpu area access to cpuboot.S, and im not sure why im getting a zero cpu_number for an AP
<damo22>but not on qemu, only on an AMD cpu
<damo22>the assert i just mailed in fails on AMD
<youpi>that assert makes full sense, thanks :)
<damo22>youpi: can you please explain i386/i386/cpuboot.S:172 , i dont understand why we subtract KERNELBASE and then add it again in the next line, i thought the GDT accounts for the KERNELBASE offset...
<etno>Usually adding and subtracting a base is useful if there is a division in the middle, to compute an offset.
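One way to read the two quoted instructions is to model x86 segmentation as "linear address = segment base + offset" in 32-bit arithmetic. The sketch below assumes, as the discussion later in this log concludes, that the data segment in use at that point has base -KERNELBASE, and that %eax is afterwards used as the base of the percpu descriptor. All numeric values, the PERCPU_SIZE name and the assumption that %eax initially holds cpu * PERCPU_SIZE are hypothetical and chosen only to make the arithmetic visible.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values, for illustration only. */
#define KERNELBASE     0xC0000000u  /* assumed kernel link base */
#define PERCPU_CPU_ID  0x10u        /* assumed offset of the cpu id field */
#define PERCPU_SIZE    0x100u       /* assumed size of one percpu entry */

/* Assumed link-time address of the percpu_array symbol. */
static const uint32_t percpu_array = KERNELBASE + 0x00123000u;

/* Segmentation model: base + offset, wrapping at 32 bits. */
static uint32_t linear(uint32_t seg_base, uint32_t offset)
{
    return seg_base + offset;
}

/* %eax after "addl $percpu_array - KERNELBASE, %eax", assuming %eax
 * held cpu * PERCPU_SIZE before: a linear address, not a symbol address. */
static uint32_t percpu_linear_base(uint32_t cpu)
{
    return cpu * PERCPU_SIZE + (percpu_array - KERNELBASE);
}

/* Linear address written by
 * "movl %ecx, (PERCPU_CPU_ID + KERNELBASE)(%eax)"
 * through a data segment whose base is -KERNELBASE. */
static uint32_t store_via_ds(uint32_t cpu)
{
    uint32_t ds_base = 0u - KERNELBASE;
    return linear(ds_base, percpu_linear_base(cpu) + PERCPU_CPU_ID + KERNELBASE);
}

/* Same store with the +KERNELBASE dropped: lands somewhere else. */
static uint32_t store_via_ds_without_add(uint32_t cpu)
{
    uint32_t ds_base = 0u - KERNELBASE;
    return linear(ds_base, percpu_linear_base(cpu) + PERCPU_CPU_ID);
}

/* Linear address read by a later %gs:PERCPU_CPU_ID access, once the
 * percpu descriptor base has been set to the same %eax value. */
static uint32_t load_via_gs(uint32_t cpu)
{
    return linear(percpu_linear_base(cpu), PERCPU_CPU_ID);
}
```

In this model the store through the data segment and the later %gs-relative load hit the same linear address only when +KERNELBASE is kept in the displacement, because it cancels the -KERNELBASE segment base.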
<youpi>(it doesn't really hurt since that's done at compile-time)
<youpi>the segments are already reloaded before that code
<youpi>so indeed KERNELBASE is supposed to be done by the movl
<youpi>and it probably shouldn't be added
<damo22>- movl %ecx, (PERCPU_CPU_ID + KERNELBASE)(%eax)
<damo22>+ movl %ecx, (PERCPU_CPU_ID)(%eax)
<damo22>hmm but PERCPU_GS selector doesnt have -KERNELBASE baked in
<damo22>so when it tries to access it, it will break?
<youpi>damo22: but that movl doesn't use gs:
<youpi>ah wait, yes we need the +KERNELBASE
<youpi>precisely because the segmentation adds -KERNELBASE :)
<youpi>because in the linear address space, the kernel is at low addresses
<azert>damo22: are you using x2APIC on your amd processor?
<azert>could it be that the 8-bit APIC IDs have been disabled on your cpu model for whatever reason? Then you’d need to use the 32-bit x2APIC IDs
<damo22>azert: my AMD processor does not have x2apic feature
<damo22>also, we are not using x2apic, but switching it to xapic
<damo22>yes i think i understand why now: the subtraction is because we want to amend the segmentation in the gs segment, the addition is because the current segmentation already adds -KERNELBASE, and then the subtracted version is put into the GDT
<damo22>can we rewrite it like this to make it more readable?
<damo22> movl %ecx, (PERCPU_CPU_ID)(%eax)
<damo22> /* Set up temporary percpu descriptor */