IRC channel logs

<azert>damo22: if the code already does the same thing, then I think it would be more useful to add a comment than to change it

<azert>if it’s already xapic, I don’t see why you get the apic id == 0? What is the bug?

<azert>Is it a problem with memory? Does it discard the value you are writing there in the per cpu region?

<damo22>azert: the bug is, cpu_number() calls percpu area to locate its cpu_id but seems something is broken with the gs segmentation

<damo22>i dont know yet

<damo22>it might just be mapped incorrectly so it reads zero

<youpi>damo22: since it's not on a critical path, I'm fine with making it more readable

<damo22>youpi: do we need to reload the gdt if we change values in the entries?

<youpi>yes

<damo22>im trying to figure out why im getting zero cpu_number

<youpi>you mean the entries in the segment descriptor, right?

<damo22>yes

<youpi>or do you mean the entries pointed by the segment?

<damo22>i mean if you patch the value in apboot_percpu_high for example

<youpi>then you have to reload gdt, yes

<damo22>i dont think we are doing this

<damo22>why do we nee percpu so early?

<damo22>need*

<youpi>there will be more and more code using percpu data

<youpi>not having it early means having to make sure that all code leading to setting it up does not use percpu data at all

<damo22>ok

<damo22>well i have execution on AP but it still thinks its cpu number is 0

<damo22>CS =0008 40000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]

<damo22>GS =0068 010a0260 ffffffff 00c09300 DPL=0 DS [-WA]

<damo22>something tells me that should be 410a0260

<damo22>or even 400a0260
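
[Editor's sketch] The GS base shown above (010a0260) and the one damo22 expects (410a0260) differ only in bits 24..31, which a legacy x86 segment descriptor stores in its last byte. A minimal C sketch of how a 32-bit base is scattered across a descriptor follows. The struct and function names are hypothetical, not gnumach's, and it is only an assumption that apboot_percpu_high refers to this high byte.

```c
#include <stdint.h>

/* Legacy 8-byte x86 segment descriptor, laid out as the hardware stores it.
 * Names are illustrative only (hypothetical, not taken from gnumach). */
struct seg_desc {
    uint16_t limit_low;    /* limit bits 0..15 */
    uint16_t base_low;     /* base bits 0..15 */
    uint8_t  base_med;     /* base bits 16..23 */
    uint8_t  access;       /* type, S, DPL, P */
    uint8_t  granularity;  /* limit bits 16..19 (low nibble), flags (high nibble) */
    uint8_t  base_high;    /* base bits 24..31 */
};

/* Split a 32-bit linear base address across the three base fields. */
void seg_desc_set_base(struct seg_desc *d, uint32_t base)
{
    d->base_low  = (uint16_t)(base & 0xffffu);
    d->base_med  = (uint8_t)((base >> 16) & 0xffu);
    d->base_high = (uint8_t)((base >> 24) & 0xffu);
}

/* Reassemble the base the way the CPU does when the segment is loaded. */
uint32_t seg_desc_get_base(const struct seg_desc *d)
{
    return (uint32_t)d->base_low
         | ((uint32_t)d->base_med << 16)
         | ((uint32_t)d->base_high << 24);
}
```

With this layout, a base of 0x410a0260 has base_high == 0x41, while 0x010a0260 has base_high == 0x01. Patching the bytes in memory does not by itself change the copy the CPU cached when GS was last loaded, which is why a reload is needed afterwards.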

<damo22>youpi: i thought that before the final gdt is set up, the AP is in a weird segmentation, so how is it supposed to look up the cpu number in C code?

<damo22>i think we should assume nothing will need percpu area until after cpu_setup() is called on an AP, theres nothing it needs that early

<damo22>it only needs its own cpu number

<AlmuHS>damo22: when starting an AP there are two GDTs: in apboot.S we load a temporary GDT to jump to protected mode, and after this, once in C code, we load the final GDT in cpu_setup()

<AlmuHS>also, if i remember correctly, there is an earlier GDT in apboot, used only to jump to 32-bit

<damo22>AlmuHS: yes, i realise, but youpi has added complexity to the boot asm for APs such that it configures the GS segment for early percpu access. I think this is unnecessary

<damo22>also, i am getting cpu number == 0 on APs currently with smp on AMD cpu

<damo22>so something is definitely broken

<AlmuHS>the first gdt which is loaded before jump to 32-bit is gdt_tmp, the second (temporary but already in 32-bit) is apboot_gdt, and the final is in cpu_setup()

<damo22>i tried removing the early percpu configuration but now it hangs at paging setup

<AlmuHS>then, maybe cpu_number() is being executed between these jumps

<AlmuHS>we had a strange issue with paging flags

<damo22>yes?

<AlmuHS>maybe it's necessary to change some paging flags?

<AlmuHS>you found this problem. Check old commits

<damo22>but why would it be any different to master

<damo22>it already works

<AlmuHS>I'm not sure

<damo22>my preference is to move out complexity from asm

<damo22>its too error prone

<AlmuHS>another reminder: the final gdt, idt... etc. are different on the BSP and the APs. You had to create a specific ap_gdt_init() and similar

<damo22>if it can be done in C code instead, we should do it there

<AlmuHS>we can only execute C code after the jump to protected mode

<damo22>we already solved these problems almu

<AlmuHS>then try it

<damo22>the bug arrived when asm was heavily modified

<AlmuHS>another thing that could simplify AP booting would be to remove gdt_tmp and jump to protected mode directly using apboot_gdt

<damo22>yeah, not really possible unless you can hotpatch the code segment

<AlmuHS>it's difficult. youpi explained to me many years ago how i could jump without gdt_tmp, but I don't remember it well

<damo22>since you dont know the address to jump to until runtime

<damo22>so we patch the realmode jump offsets

<damo22>again, we already solved this part

<damo22>we are seeing a new bug because the asm is now more complex

<AlmuHS>i think that we have to map the apboot_gdt. But the AP boots in 16-bit with limited segmentation, so it's difficult

<AlmuHS>oj

<AlmuHS>ok

<AlmuHS>but i don't know about percpu yet

<damo22>percpu is an array that has a gdt entry exclusively for it

<AlmuHS>but it could be a synchronizing problem

<damo22>no the array is flat in memory but only one cpu writes to each per cpu element
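
[Editor's sketch] The layout damo22 describes can be pictured as one flat array with one slot per CPU, where each CPU's GS segment base points at its own slot. The C below is a minimal model of that idea under stated assumptions: NCPUS, struct percpu, its fields and the helper names are all hypothetical, and the GS-relative load is modelled as a plain pointer read.

```c
#include <stddef.h>
#include <stdint.h>

#define NCPUS 8                     /* hypothetical CPU count */

/* One slot per CPU; field names are illustrative only. */
struct percpu {
    struct percpu *self;            /* pointer back to this slot */
    int cpu_id;                     /* what cpu_number() would read */
};

/* The array is flat in memory; only CPU n ever writes slot n. */
struct percpu percpu_array[NCPUS];

/* Address that CPU `cpu` would use as the base of its GS segment. */
uintptr_t percpu_segment_base(int cpu)
{
    return (uintptr_t)&percpu_array[cpu];
}

/* Fill in one slot, as each CPU would do for itself. */
void percpu_init_slot(int cpu)
{
    percpu_array[cpu].self = &percpu_array[cpu];
    percpu_array[cpu].cpu_id = cpu;
}

/* Model of a GS-relative load of cpu_id: base plus field offset. */
int percpu_read_cpu_id(uintptr_t gs_base)
{
    return *(const int *)(gs_base + offsetof(struct percpu, cpu_id));
}
```

In this model, reading through the wrong base does not fault; it quietly returns whatever lives at that address, which may well be 0.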

<AlmuHS>maybe the percpu is not ready when you call to cpu_number() in this step?

<damo22>yes, exactly, there was asm code added to make this possible

<damo22>but i think something is wrong with it

<damo22>but its hard to debug

<AlmuHS>you can try forcing the other cpu_number(), the one that does not use percpu, to be sure that the problem comes from percpu

<AlmuHS>you wrote an alternative version of this function that does not use percpu. Try calling that instead of the normal cpu_number()

<damo22>ok

<AlmuHS>if the code works with the alternative cpu_number(), then we can be sure that the problem comes from percpu. If the alternative fails too, then the problem is somewhere else
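
[Editor's sketch] One way a cpu_number() could work without percpu is to read the local APIC ID and look it up in a table filled during CPU enumeration. The C below is only a sketch of that idea and not the actual gnumach function: the table, the helper names and MAX_CPUS are hypothetical, and the LAPIC ID register value is passed in as a parameter so the logic can be run outside a kernel. The one hardware fact relied on is that in xAPIC mode the ID sits in bits 24..31 of the local APIC ID register.

```c
#include <stdint.h>

#define MAX_CPUS 16                 /* hypothetical table size */

/* APIC IDs in kernel CPU order, filled while enumerating CPUs (hypothetical). */
uint8_t apic_id_of_cpu[MAX_CPUS];
int ncpus_found;

/* Record one enumerated CPU; returns its kernel CPU number, or -1 if full. */
int register_cpu(uint8_t apic_id)
{
    if (ncpus_found >= MAX_CPUS)
        return -1;
    apic_id_of_cpu[ncpus_found] = apic_id;
    return ncpus_found++;
}

/* xAPIC keeps the ID in bits 24..31 of the local APIC ID register. */
uint8_t apic_id_from_register(uint32_t lapic_id_reg)
{
    return (uint8_t)(lapic_id_reg >> 24);
}

/* percpu-free lookup: map the running CPU's APIC ID to a CPU number. */
int cpu_number_from_apic_id(uint8_t apic_id)
{
    for (int i = 0; i < ncpus_found; i++)
        if (apic_id_of_cpu[i] == apic_id)
            return i;
    return -1;                      /* unknown APIC ID */
}
```

If a lookup of this kind returns the right number on an AP while the percpu-based one returns 0, that points at the GS setup rather than at the rest of the boot path.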

<AlmuHS>i go to sleep. Good luck

<damo22>night

<damo22>ok i figured it out, you cant send a PHYSICAL destination IPI to an APIC id > 0xf

<damo22>so it was restarting cpu 0

<damo22>hmm looks like you can only support up to 8 cpus to send unique ipips

<damo22>IPIs

<damo22>with logical destination

<damo22>i dont know how to set up the lapic on APs before i get execution on them

<damo22>seems like a chicken egg problem

<damo22>Upon receiving an IPI message that was sent using logical destination mode, a local APIC compares the MDA in the message with the values in its LDR and DFR to determine if it should accept and handle the IPI

<damo22>but how do you set the LDR and DFR on APs before they start?

<damo22>so you can interrupt them and wake them up

<damo22>i think we need to change the code to send a broadcast IPI

<damo22>and have them all start in parallel

<damo22>theres only 8 bits in the mask that you can use for identifying a cpu

<damo22>so you cant send unique IPIs to more than 8 cpus if they have APIC ids > 0xf

<damo22>youpi: it is actually impossible to uniquely address more than 8 processors with IPIs, (with the exception of some x86 hardware that allows 16 processors)

<damo22>with APIC

<damo22>i think the only way to start up a cpu with more than 8 cores is to do them in parallel

<damo22>so if we are going to fix smp, we may as well invest time in making it work on > 8 processors

<azert>damo22: out of curiosity, how many cores does your cpu have?

<librehawk>x64 WAS WHAT HURD USERS WANTED, You Delivered! Mob Love From Kenya, Lodwar.

<librehawk>The Next #Challenge for The Hurd Community Is Guides & Tooling for Buildings Hurd Native Drivers and Applications...

<librehawk>I Love You Guys

<damo22>azert: My cpu only has 8 cores, but i dont want to write code that only works on 8

<damo22>i have a patch for parallel smp init, but the synchronisation is broken, and it hangs sometimes and boots other times

<damo22>I have a branch of gnumach that almost has parallel smp init working for any number of cores https://git.zammit.org/gnumach-sv.git/log/?h=fix-smp-amd

<damo22>i need to review all functions in cpu_setup() to see if any are trampling on each other

<AlmuHS>xAPIC allows a max of 256 cpus. The APIC ID even has 8 bits (2^8 = 256)

<AlmuHS>so using more than 8 cpus should not be a problem

<AlmuHS>in Qemu I got the SMP kernel to boot with 16 cpus

<AlmuHS>with the scheduler patch, it booted and the pthread test showed that all cpus were working

<AlmuHS>i have a program which creates 16 threads using pthread, and each of them runs an infinite loop showing its APIC ID

<AlmuHS>I tested it in Qemu using 16 cpus, and it worked, showing alternating numbers in the range 0-15

<AlmuHS>what is the problem? the IPI routine's address?

<janneke>damo22: oh very nice!

<gnu_srs1>Hello. Which program/script is active after: start ext2fs: Hurd server bootstrap: ext2fs[device:hd0s1] exec startup proc auth.

<gnu_srs1>I'm trying to upgrade an old image with dpkg-deb -x and boot hangs after the above :(

<gnu_srs1>Where to find info about the boot sequence of Hurd?

<janneke>gnu_srs1: starting a debian hurd, i see

<janneke>Hurd server bootstrap: ext2fs[part:1:device:wd0] exec startup proc auth.

<janneke>INIT: version 3.08 booting

<janneke>looking at the guix boot sequence, there's also: daemons/runsystem.sh

* janneke had a writeup of this somewhere...

<janneke>hurd/startup.c is also a very early candidate

<damo22>AlmuHS: Only some x86 cpus support destination register with 8 bits of physical apic id, mostly only support 4 bits... but the apic id can be wider than 4 bits and so you cant address the lapic

<damo22>qemu doesnt have this limitation

<damo22>the way to work around it, is to use logical destination mode, but you still only have 8 unique mask bits to address 8 groups of cpus

<damo22>so i solved this in my branch using (cpu_number % 8)

<damo22>and ALL_EXCLUDING_SELF parallel startup
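
[Editor's sketch] The two ingredients damo22 mentions can be modelled in a few lines of C. It is an assumption that (cpu_number % 8) is used to pick one bit of the 8-bit logical ID in flat mode; the function names are hypothetical and not taken from his branch. The ICR bit positions (delivery mode in bits 8..10, destination mode in bit 11, level in bit 14, shorthand in bits 18..19) and the logical ID in bits 24..31 of the LDR follow the xAPIC register layout.

```c
#include <stdint.h>

/* xAPIC ICR low-word fields used here */
#define ICR_DELIVERY_STARTUP          (6u << 8)   /* delivery mode 110b */
#define ICR_DEST_MODE_LOGICAL         (1u << 11)
#define ICR_LEVEL_ASSERT              (1u << 14)
#define ICR_SHORTHAND_ALL_EXCL_SELF   (3u << 18)

/* Assumed mapping: CPU n gets bit (n % 8) of the 8-bit logical ID. */
uint8_t logical_id_for_cpu(unsigned cpu)
{
    return (uint8_t)(1u << (cpu % 8));
}

/* The logical ID lives in bits 24..31 of the LDR. */
uint32_t ldr_value(unsigned cpu)
{
    return (uint32_t)logical_id_for_cpu(cpu) << 24;
}

/* Flat model: a local APIC accepts the IPI if MDA and its logical ID share a bit. */
int flat_mode_accepts(uint8_t mda, uint32_t ldr)
{
    return (mda & (uint8_t)(ldr >> 24)) != 0;
}

/* ICR low word for a STARTUP IPI sent with the all-excluding-self shorthand.
 * `vector` is the 4 KiB page number of the real-mode entry point. */
uint32_t startup_ipi_all_excluding_self(uint8_t vector)
{
    return ICR_SHORTHAND_ALL_EXCL_SELF | ICR_LEVEL_ASSERT
         | ICR_DELIVERY_STARTUP | vector;
}
```

Under this mapping CPUs 1 and 9 share a logical bit, so a logical-mode IPI aimed at one also reaches the other. That is the "8 groups of cpus" limit, and it is why the shorthand broadcast is used to start the APs in parallel instead of addressing them one by one.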

<gnu_srs1>janneke: tks for your hints. No luck so far :(

<gnu_srs1>repeating myself:

<gnu_srs1>(18:05:33) gnu_srs1: Hello. Which program/script is active after: start ext2fs: Hurd server bootstrap: ext2fs[device:hd0s1] exec startup proc auth.

<gnu_srs1>(18:07:25) gnu_srs1: I'm trying to upgrade an old image with dpkg-deb -x and boot hangs after the above

<gnu_srs1>(20:02:56) gnu_srs1: Where to find info about the boot sequence of Hurd?

<damo22>gnu_srs1: probably "init"

<gnu_srs1>damo22: /usr/sbin/init from sysvinit-core??

<gnu_srs1>I have linked that file to /sbin too. No progress :(

<damo22>most likely /etc/init.d/rc*

<damo22>use sysv-rc-conf to configure it

<gnu_srs1>seems like there is no debug or verbose option for that file.

<gnu_srs1>sysv-rc-conf does not exist.

<damo22>might need to install it

<damo22>youpi: do you know of any reason why parallel smp init may break if all the APs are running cpu_setup() at the same time?

<damo22>my branch sometimes boots, sometimes hangs

<youpi>iirc there were some initialization things that were using global variables

<youpi>probably with per_cpu data we can fix that

<damo22>ah right nice

<youpi>also, possibly it's difficult to control the APs with ACPI etc. if they all start at the same time

<damo22>we dont have an option

<gnu_srs1>tks. Not installed either on an updated box or the failing box. I think it is something else, missing packages or too old/buggy grub.cfg.

<damo22>since there is no way to address them individually

<damo22>it works in qemu because qemu doesnt have 4 bit limitation on destination register

<damo22>so really its emulating a xeon

<damo22>s/register/field

2024-12-08.log