IRC channel logs
2024-12-08.log
<azert>damo22: if the code does already the same, then I think it would be more useful to add a comment then change it
<azert>if it’s already xapic, I don’t see why you get the apic id == 0? What is the bug?
<damo22>azert: the bug is, cpu_number() calls percpu area to locate its cpu_id but seems something is broken with the gs segmentation
<azert>m with memory? Does it discards the value you are writing there in the per cpu region?
<damo22>it might just be mapped incorrectly so it reads zero
<youpi>damo22: since it's not on a critical path, I'm fine with making it more readable
<damo22>youpi: do we need to reload the gdt if we change values in the entries?
<damo22>im trying to figure out why im getting zero cpu_number
<youpi>you mean the entries in the segment descriptor, right?
<youpi>or do you mean the entries pointed by the segment?
<damo22>i mean if you patch the value in apboot_percpu_high for example
<youpi>then you have to reload gdt, yes
<youpi>there will be more and more code using percpu data
<youpi>not having it early means having to make sure that all code leading to setting it up does not use percpu data at all
<damo22>well i have execution on AP but it still thinks its cpu number is 0
<damo22>CS =0008 40000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
<damo22>GS =0068 010a0260 ffffffff 00c09300 DPL=0 DS [-WA]
<damo22>something tells me that should be 410a0260
<damo22>youpi: i thought that before the final gdt is set up, the AP is in a weird segmentation, so how is it supposed to look up the cpu number in C code?
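Note on the GS dump above: the segment base is scattered across three fields of the 8-byte descriptor, so a fixup that misses the top byte leaves a base like 010a0260 where 410a0260 was wanted. The following is a minimal sketch in C of that layout, not gnumach's code. The names seg_desc, seg_desc_make, seg_desc_base and seg_desc_set_base_high are hypothetical, and it is only an assumption that apboot_percpu_high refers to the top base byte and that GS needs the same 0x40000000 offset as CS.

```c
#include <stdint.h>

/* Hypothetical view of an 8-byte x86 segment descriptor as two 32-bit
 * halves.  This is an illustration, not the structure gnumach uses. */
struct seg_desc {
    uint32_t lo;  /* limit[15:0] | base[15:0] << 16 */
    uint32_t hi;  /* base[23:16] | access << 8 | limit[19:16] << 16
                   * | flags << 20 | base[31:24] << 24 */
};

/* Build a descriptor from a 32-bit base, a 20-bit limit, the access
 * byte and the 4-bit flags nibble (G, D/B, L, AVL). */
static struct seg_desc seg_desc_make(uint32_t base, uint32_t limit20,
                                     uint8_t access, uint8_t flags)
{
    struct seg_desc d;

    d.lo = (limit20 & 0xffffu) | ((base & 0xffffu) << 16);
    d.hi = ((base >> 16) & 0xffu)
         | ((uint32_t)access << 8)
         | (limit20 & 0xf0000u)
         | ((uint32_t)(flags & 0xfu) << 20)
         | (base & 0xff000000u);
    return d;
}

/* Reassemble the base from its three pieces. */
static uint32_t seg_desc_base(struct seg_desc d)
{
    return (d.lo >> 16) | ((d.hi & 0xffu) << 16) | (d.hi & 0xff000000u);
}

/* Patch only base[31:24].  Assumption: this is the kind of fixup that a
 * symbol such as apboot_percpu_high would be used for. */
static struct seg_desc seg_desc_set_base_high(struct seg_desc d, uint8_t high)
{
    d.hi = (d.hi & 0x00ffffffu) | ((uint32_t)high << 24);
    return d;
}
```

With base 0x010a0260, patching the high byte to 0x41 yields 0x410a0260. As youpi says above, the CPU only sees such a change once the GDT and the segment register are reloaded, because the descriptor is cached when the selector is loaded.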
<damo22>i think we should assume nothing will need percpu area until after cpu_setup() is called on an AP, theres nothing it needs that early
<damo22>it only needs its own cpu number
<AlmuHS>damo22: in AP starting there are two GDT: in apboot.S we load a temporary GDT to jump to protected model, and after this, once in C code, we load the final GDT in cpu_setup()
<AlmuHS>even, if i remember well, there are a previous GDT in apboot, used only to jump to 32-bit
<damo22>AlmuHS: yes, i realise, but youpi has added complexity to the boot asm for APs such that it configures the GS segment for early percpu access. I think this is unnecessary
<damo22>also, i am getting cpu number == 0 on APs currently with smp on AMD cpu
<damo22>so something is definitely broken
<AlmuHS>the first gdt which is loaded before jump to 32-bit is gdt_tmp, the second (temporary but already in 32-bit) is apboot_gdt, and the final is in cpu_setup()
<damo22>i tried removing the early percpu configuration but now it hangs at paging setup
<AlmuHS>then, maybe cpu_number() is being executed between these jumps
<AlmuHS>we had a strange issue with paging flags
<AlmuHS>maybe it's necessary to change some paging flags?
<AlmuHS>you found this problem. Check old commits
<damo22>but why would it be any different to master
<damo22>my preference is to move out complexity from asm
<AlmuHS>other remember: the final gdt, idt... etc is different in BSP and AP. You had to create specific ap_gdt_init() and similar
<damo22>if it can be done in C code instead, we should do it there
<AlmuHS>we only can execute C code after jump to protected mode
<damo22>we already solved these problems almu
<damo22>the bug arrived when asm was heavily modified
<AlmuHS>other thing that simplify the AP booting could be remove the gdt_tmp and jump to protected mode directly using apboot_gdt
<damo22>yeah, not really possible unless you can hotpatch the code segment
<AlmuHS>it's difficult. youpi explained me many years ago how i could jump without gdt_tmp, but I'm not remember well
<damo22>since you dont know the address to jump to until runtime
<damo22>so we patch the realmode jump offsets
<damo22>again, we already solved this part
<damo22>we are seeing a new bug because the asm is now more complex
<AlmuHS>i think that we have to map the apboot_gdt. But the AP boots in 16-bit with limited segmentation, so it's difficult
<AlmuHS>but i don't know about percpu yet
<damo22>percpu is an array that has a gdt entry exclusively for it
<AlmuHS>but it could be a synchronizing problem
<damo22>no the array is flat in memory but only one cpu writes to each per cpu element
<AlmuHS>maybe the percpu is not ready when you call to cpu_number() in this step?
<damo22>yes, exactly, there was asm code added to make this possible
<damo22>but i think something is wrong with it
<AlmuHS>you can try to force the other cpu_number(), which not use percpu, to be sure that they problem is from percpu
<AlmuHS>you wrote a alternative version of this function which not use percpu. Try to call that instead normal cpu_number()
<AlmuHS>if the code works with alternative cpu_number(), then we can be sure that the problem is from percpu. If the alternative fails too, then the problem is in other site
<damo22>ok i figured it out, you cant send a PHYSICAL destination IPI to an APIC id > 0xf
<damo22>hmm looks like you can only support up to 8 cpus to send unique ipips
<damo22>i dont know how to set up the lapic on APs before i get execution on them
<damo22>seems like a chicken egg problem
<damo22>Upon receiving an IPI message that was sent using logical destination mode, a local APIC compares the MDA in the message with the values in its LDR and DFR to determine if it should accept and handle the IPI
<damo22>but how do you set the LDR and DFR on APs before they start?
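Note on the MDA/LDR/DFR comparison quoted above: in the xAPIC flat model the check is a bitwise AND between the 8-bit MDA and the logical APIC ID held in LDR bits 31:24, with DFR bits 31:28 selecting the model (1111b for flat). A minimal sketch in C, modelling only the flat case; the function and macro names are hypothetical, not taken from gnumach.

```c
#include <stdbool.h>
#include <stdint.h>

#define DFR_MODEL_SHIFT 28     /* DFR bits 31:28 select the model */
#define DFR_MODEL_FLAT  0xfu   /* 1111b = flat model */
#define LDR_ID_SHIFT    24     /* LDR bits 31:24 hold the logical APIC ID */

/* Return true if a local APIC with the given LDR and DFR would accept a
 * logical-destination IPI carrying this MDA.  Only the flat model is
 * modelled; the cluster model is treated as "no match" in this sketch. */
static bool lapic_accepts_logical_flat(uint8_t mda, uint32_t ldr, uint32_t dfr)
{
    if ((dfr >> DFR_MODEL_SHIFT) != DFR_MODEL_FLAT)
        return false;

    uint8_t logical_id = (uint8_t)(ldr >> LDR_ID_SHIFT);

    /* Accept when at least one bit is set in both the MDA and the ID. */
    return (mda & logical_id) != 0;
}
```

This also illustrates the chicken-and-egg problem: as far as I understand the register reset values, an AP that has not yet run any code has a logical ID of 0 in its LDR, so no MDA can match it and it cannot be woken through logical destination mode.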
<damo22>so you can interrupt them and wake them up
<damo22>i think we need to change the code to send a broadcast IPI
<damo22>and have them all start in parallel
<damo22>theres only 8 bits in the mask that you can use for identifying a cpu
<damo22>so you cant send unique IPIs to more than 8 cpus if they have APIC ids > 0xf
<damo22>youpi: it is actually impossible to uniquely address more than 8 processors with IPIs, (with the exception of some x86 hardware that allows 16 processors)
<damo22>i think the only way to start up a cpu with more than 8 cores is to do them in parallel
<damo22>so if we are going to fix smp, we may as well invest time in making it work on > 8 processors
<azert>damo22: out of curiosity, how many cores your cpu has?
<librehawk>x64 WAS WHAT HURD USERS WANTED, You Delivered! Mob Love From Kenya, Lodwar.
<librehawk>The Next #Challenge for The Hurd Community Is Guides & Tooling for Buildings Hurd Native Drivers and Applications...
<damo22>azert: My cpu only has 8 cores, but i dont want to write code that only works on 8
<damo22>i have a patch for parallel smp init, but the synchronisation is broken, and it hangs sometimes and boots other times
<damo22>i need to review all functions in cpu_setup() to see if any are trampling on each other
<AlmuHS>xAPIC allows a max of 256 cpus. Even APIC ID has 8-bit (2^8 = 256)
<AlmuHS>so must not be a problem using more than 8 cpu
<AlmuHS>in Qemu I got to boot the SMP kernel with 16 cpus
<AlmuHS>with the scheduler patch, it got to boot and the pthread test showed that all cpus was working
<AlmuHS>i have a program which creates 16 threads using pthread, and each of this runs a infinite loop showing its APIC ID
<AlmuHS>I tested it in Qemu using 16 cpus, and worked, showing alternative numbers in range 0-16
<AlmuHS>what is the problem? the IPI routine's address?
<gnu_srs1>Hello. Which program/script is active after: start ext2fs: Hurd server bootstrap: ext2fs[device:hd0s1] exec startup proc auth.
<gnu_srs1>I'm trying to upgrade an old image with dpkg-deb -x and boot hangs after the above :(
<gnu_srs1>Where to find info about the boot sequence of Hurd?
<janneke>gnu_srs1: starting a debian hurd, i see
<janneke>Hurd server bootstrap: ext2fs[part:1:device:wd0] exec startup proc auth.
<janneke>looking at the guix boot sequence, there's also: daemons/runsystem.sh
<janneke>ACTION had a writeup of this somewhere...
<janneke>hurd/startup.c is also a very early candidate
<damo22>AlmuHS: Only some x86 cpus support destination register with 8 bits of physical apic id, mostly only support 4 bits... but the apic id can be wider than 4 bits and so you cant address the lapic
<damo22>qemu doesnt have this limitation
<damo22>the way to work around it, is to use logical destination mode, but you still only have 8 unique mask bits to address 8 groups of cpus
<damo22>so i solved this in my branch using (cpu_number % 8)
<damo22>and ALL_EXCLUDING_SELF parallel startup
<gnu_srs1>janneke: tks for your hints. No luck so far :(
<gnu_srs1>(18:05:33) gnu_srs1: Hello. Which program/script is active after: start ext2fs: Hurd server bootstrap: ext2fs[device:hd0s1] exec startup proc auth.
<gnu_srs1>(18:07:25) gnu_srs1: I'm trying to upgrade an old image with dpkg-deb -x and boot hangs after the above
<gnu_srs1>(20:02:56) gnu_srs1: Where to find info about the boot sequence of Hurd?
<gnu_srs1>damo22: /usr/sbin/init from sysvinit-core??
<gnu_srs1>I have linked that file to /sbin too. No progress :(
<damo22>use sysv-rc-conf to configure it
<gnu_srs1>seems like there is no debug or verbose option for that file.
<damo22>youpi: do you know of any reason why parallel smp init may break if all the APs are running cpu_setup() at the same time?
<damo22>my branch sometimes boots, sometimes hangs
<youpi>iirc there were some initialization things that were using global variables
<youpi>probably with per_cpu data we can fix that
<youpi>also, possibly it's difficult to control the APs with ACPI etc. if they all start at the same time
<gnu_srs1>tks. Not installed either on an updated box or the failing box. I think it is something else, missing packages or too old/buggy grub.cfg.
<damo22>since there is no way to address them individually
<damo22>it works in qemu because qemu doesnt have 4 bit limitation on destination register
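Note on the workaround damo22 describes, (cpu_number % 8) plus an ALL_EXCLUDING_SELF startup: the following is a minimal sketch in C of how the two pieces could fit together. It is not the code from his branch. The helper names are hypothetical; the bit positions are the standard xAPIC ICR fields (delivery mode in bits 10:8, destination mode in bit 11, level in bit 14, destination shorthand in bits 19:18, destination in bits 31:24 of the high word).

```c
#include <stdint.h>

#define ICR_DELIVERY_STARTUP             (0x6u << 8)  /* SIPI */
#define ICR_DEST_LOGICAL                 (1u << 11)   /* logical dest mode */
#define ICR_LEVEL_ASSERT                 (1u << 14)
#define ICR_SHORTHAND_ALL_EXCLUDING_SELF (0x3u << 18)

/* One flat-model logical ID bit per CPU, wrapping every 8 CPUs.
 * CPUs whose numbers differ by a multiple of 8 share a bit, so an IPI
 * aimed at one of them reaches the whole group. */
static uint8_t logical_id_for_cpu(unsigned cpu)
{
    return (uint8_t)(1u << (cpu % 8));
}

/* High ICR word for a logical-destination IPI: MDA in bits 31:24. */
static uint32_t icr_high_logical(uint8_t mda)
{
    return (uint32_t)mda << 24;
}

/* Low ICR word for a startup IPI sent to every CPU except the sender.
 * With a shorthand the destination field is ignored, so no APIC ID or
 * logical ID is needed to wake the APs. */
static uint32_t icr_low_startup_broadcast(uint8_t vector)
{
    return ICR_SHORTHAND_ALL_EXCLUDING_SELF | ICR_LEVEL_ASSERT
         | ICR_DELIVERY_STARTUP | vector;
}
```

The trade-off is the one raised in the log: CPUs 1 and 9 both get logical ID 0x02, so they can no longer be addressed individually, and a broadcast startup makes all APs run their setup code at the same time.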