IRC channel logs

2022-05-28.log

***duds-_ is now known as duds-
<AlmuHS>hi. I have a problem with the cpu startup in my smp kernel
<AlmuHS>i noticed that the AP processor is not able to access the BSP data
<AlmuHS>this line doesn't work https://github.com/AlmuHS/GNUMach_SMP/blob/smp_stage2/i386/i386/mp_desc.c#L368
<AlmuHS>so this loop never ends https://github.com/AlmuHS/GNUMach_SMP/blob/smp_stage2/i386/i386/mp_desc.c#L513
<AlmuHS>also, I noticed that there is some problem in the GDT loading. The next printf is never shown https://github.com/AlmuHS/GNUMach_SMP/blob/smp_stage2/i386/i386/mp_desc.c#L403
<AlmuHS>could you help me to find the problem?
<AlmuHS>cpu_setup() is the first function which the AP processor executes after starting up and jumping to protected mode
<luckyluke>AlmuHS: did you check whether there is a fault on the secondary cpu? (e.g. if GDT is wrong)
<AlmuHS>I've checked that the secondary cpu reaches the cpu_setup() function, and stops just after the call to gdt_init()
<luckyluke>you should check exactly where it stops, i.e. on which instruction or statement
<luckyluke>also, how are you testing this? did you add some new RPC to enable the other cpus?
<AlmuHS>a startup IPI
<luckyluke>how do you trigger it?
<AlmuHS> https://github.com/AlmuHS/GNUMach_SMP/blob/smp_stage2/i386/i386/smp.c
<AlmuHS>using the Local APIC
<AlmuHS>in the smp_startup_cpu() function
<AlmuHS>writing the IPI data to the ICR register
<AlmuHS> https://github.com/AlmuHS/GNUMach_SMP/blob/smp_stage2/i386/i386/apic.c#L238-L258
<AlmuHS>the ICR is written by a bitfield structure https://github.com/AlmuHS/GNUMach_SMP/blob/smp_stage2/i386/i386/apic.h#L67-L108
<AlmuHS> https://github.com/AlmuHS/GNUMach_SMP/blob/smp_stage2/i386/i386/apic.h#L177-L178
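For readers following the ICR discussion, here is a minimal sketch of how the low and high dwords of the xAPIC Interrupt Command Register could be composed for the INIT and startup IPIs. It uses plain masks instead of the bitfield structure from apic.h; the macro and function names are hypothetical and are not taken from the GNUMach_SMP branch.

```c
#include <stdint.h>

/* ICR low dword fields (xAPIC layout). Names are illustrative only. */
#define ICR_DELIVERY_INIT    (0x5u << 8)   /* bits 8-10 = 101b: INIT */
#define ICR_DELIVERY_STARTUP (0x6u << 8)   /* bits 8-10 = 110b: Start-up */
#define ICR_LEVEL_ASSERT     (1u << 14)    /* bit 14: level = assert */

/* ICR high dword: destination APIC ID in bits 24-31 (physical mode). */
static uint32_t icr_high_for(uint8_t apic_id)
{
    return (uint32_t)apic_id << 24;
}

/* Low dword for an INIT IPI: the vector field stays zero. */
static uint32_t icr_low_init(void)
{
    return ICR_DELIVERY_INIT | ICR_LEVEL_ASSERT;
}

/* Low dword for a startup IPI: bits 0-7 carry the start page number. */
static uint32_t icr_low_startup(uint8_t vector)
{
    return ICR_DELIVERY_STARTUP | ICR_LEVEL_ASSERT | vector;
}
```

With a start page of 0x07, icr_low_startup(0x07) evaluates to 0x00004607. The high dword is written before the low one, because the write to the low dword is what sends the IPI.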
<luckyluke>ok so it seems you enable all cpu at boot
<luckyluke>are you sure the gdt_init function is reached?
<luckyluke>you could set a breakpoint on it and then step into the function
<AlmuHS>I put some prints in cpu_setup()
<AlmuHS>this print is shown " printf("Configuring GDT and IDT\n");"
<AlmuHS> https://github.com/AlmuHS/GNUMach_SMP/blob/smp_stage2/i386/i386/mp_desc.c#L398
<AlmuHS>but the next one is not
<AlmuHS>this is not reached
<AlmuHS> https://github.com/AlmuHS/GNUMach_SMP/blob/smp_stage2/i386/i386/mp_desc.c#L404
<AlmuHS>also, another interesting detail is that this line doesn't work properly
<AlmuHS>machine_slot[i].running = TRUE;
<AlmuHS>because intel_startCPU() keeps seeing it as false, and this while loop in start_other_cpus() never ends
<AlmuHS>while(machine_slot[cpu].running != TRUE);
<AlmuHS>this is the function which starts enabling the APs https://github.com/AlmuHS/GNUMach_SMP/blob/smp_stage2/i386/i386/mp_desc.c#L484-L515
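The log does not establish why the wait loop never ends. If the AP faults before it reaches the store, the flag is simply never set. A separate, possible cause is that a plain (non-volatile, non-atomic) flag lets the compiler read it once and spin forever. The sketch below shows a "running" flag with C11 atomics and a bounded wait, so the BSP can report a failure instead of hanging. NCPUS, cpu_running, mark_cpu_running and wait_for_cpu are hypothetical names, and the use of C11 atomics in GNU Mach is an assumption.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NCPUS 8  /* hypothetical maximum number of CPUs */

/* One flag per CPU; static storage starts out as false. */
static atomic_bool cpu_running[NCPUS];

/* Called by the AP once its setup is complete. */
static void mark_cpu_running(int cpu)
{
    /* Release: earlier setup writes become visible before the flag. */
    atomic_store_explicit(&cpu_running[cpu], true, memory_order_release);
}

/* Called by the BSP: spin at most max_spins times.
 * Returns true if the AP came up, false on timeout. */
static bool wait_for_cpu(int cpu, unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; i++) {
        /* Acquire: the load is repeated on every iteration. */
        if (atomic_load_explicit(&cpu_running[cpu], memory_order_acquire))
            return true;
    }
    return false;
}
```

A timeout in wait_for_cpu() points at the AP never reaching the store (for example a fault in gdt_init()), rather than at the flag itself.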
<luckyluke>I think it would be good to check where exactly in gdt_init() the execution stops, e.g. with gdb or adding some more printf() on each step
<AlmuHS>ok
<AlmuHS>I also suspect that the temporary GDT or the stack is badly configured
<luckyluke>also, I was looking at your branch, it seems in smpboot.S you re-define boot_gdt (the other one is in boothdr.S)
<luckyluke>should the GDT be different on the secondary cpu?
<AlmuHS>do you mean cpuboot.S?
<luckyluke>(if yes the name should probably be changed otherwise I suspect they clash)
<luckyluke>ah, yes, smpboot.S -> cpuboot.S
<AlmuHS>in cpuboot.S there is a little trick. There are two GDTs: "gdt_tmp", which is used only to jump to protected mode, and "boot_gdt", which is the real Mach temporary GDT
<AlmuHS>probably there are ways to avoid this trick, but I don't know yet
<AlmuHS>boot_gdt is a copy-paste from boothdr.S
<luckyluke>but in general, could you reuse the same code from boothdr.S? I'm not very familiar with APIC and multiple cpus, but unless there is a reason to have different configurations, maybe it's better to reuse boothdr.S
<AlmuHS>how?
<AlmuHS>how can I reuse it?
<AlmuHS>I need to jump to protected mode
<AlmuHS>and then start the configuration of paging and other things
<luckyluke>for example, you probably don't need to re-define boot_gdt, the linker will anyway keep only one
<luckyluke>and to jump to protected mode, you could move this part to a dedicated function, which is then called by the primary cpu and then later also by the secondary
<luckyluke>unless they need a different configuration
<AlmuHS>to jump to protected mode, I need assembly
<AlmuHS>because the cpu is not able to execute 32-bit instructions
<luckyluke>sure, but is the procedure different for secondary cpus?
<luckyluke>about paging, maybe you could make the bootstrap procedure become the idle thread for that cpu
<AlmuHS>i can't overwrite the main cpu structures
<AlmuHS>and the secondary cpus have a different stack
<AlmuHS>a stack for each
<luckyluke>I mean the bootstrap code could be reused, not the cpu-specific structures or stack
<luckyluke>and boot_gdt maybe doesn't need to be cpu-specific, or does it?
<AlmuHS>i'm not sure
<AlmuHS>each cpu has its own gdt
<AlmuHS>but with similar structure
<luckyluke>looking at gdt_tmp, it seems you don't use KERNELBASE as in boot_gdt
<luckyluke>also gdt_init() will load another set of segments, and this maybe causes a fault (you can enable exception tracing in qemu)
<luckyluke>I had similar issues with x86_64, as segmentation can't be used there
<AlmuHS>the boot_gdt is a preliminary step to be able to run C code, and then load the definitive GDT with gdt_init()
<AlmuHS>in the smp branch you can see the original code which I am taking as a base
<luckyluke>why do you need to first load gdt_tmp, then load boot_gdt?
<AlmuHS>to make the jump to 32 bits easier
<luckyluke>also, why do you use 0x7000 as AP_BOOT_ADDR? couldn't that be just the address of apboot?
<AlmuHS>i'm not sure
<AlmuHS>i remember it's a restriction of apic
<luckyluke>maybe some comments would be useful, to keep track of this restriction and also to explain why you need to load gdt twice :)
<luckyluke>also, if you copy apboot and gdt_tmp to 0x7000, it will probably still point to the "original" gdt_tmp, as it's not relocated
<luckyluke>maybe you could use some linker script magic if you need to execute from some special addresses
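The relocation concern raised here can be illustrated with a small sketch: when a trampoline blob is copied to a low address, any absolute pointer inside it (such as the base address in a GDT descriptor) still refers to the link-time location unless it is patched. The code below copies a blob and rewrites one 32-bit pointer field so it refers to the copy. All names and offsets are hypothetical; this is not the code from cpuboot.S, and a linker script could achieve the same result without run-time patching.

```c
#include <stdint.h>
#include <string.h>

/* Address a label inside the blob will have after the blob is copied
 * to physical address dest_base. label_off is the label's offset from
 * the start of the blob. */
static uint32_t relocated_addr(uint32_t dest_base, uint32_t label_off)
{
    return dest_base + label_off;
}

/* Copy the blob to dest (the kernel's mapping of dest_base), then patch
 * the 32-bit field at ptr_field_off so it points at the copied label
 * instead of the original one. */
static void install_trampoline(uint8_t *dest, uint32_t dest_base,
                               const uint8_t *blob, size_t blob_len,
                               uint32_t ptr_field_off, uint32_t label_off)
{
    uint32_t fixed = relocated_addr(dest_base, label_off);

    memcpy(dest, blob, blob_len);
    /* memcpy avoids an unaligned 32-bit store into the byte buffer. */
    memcpy(dest + ptr_field_off, &fixed, sizeof fixed);
}
```

For example, with dest_base 0x7000 and a table that sits 16 bytes into the blob, the patched field holds 0x7010.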
<AlmuHS>sorry, i am eating
<AlmuHS>wait some minutes
<AlmuHS>I need to load gdt twice, because i can't access boot_gdt directly
<luckyluke>ah, so initially the secondary cpu has a limited gdt pre-configured? does apic also define 0x7000?
<AlmuHS> https://wiki.osdev.org/APIC#Interrupt_Command_Register
<AlmuHS>Bits 0-7 The vector number, or starting page number for SIPIs
<AlmuHS> https://d2pgu9s4sfmw1s.cloudfront.net/UAM/Prod/Done/a062E00001UMCGVQA5/aa4c5fd2-65ff-40fa-8340-397cecc0d178?Expires=1653772609&Key-Pair-Id=APKAJKRNIMMSNYXST6UA&Signature=bkgTRFPV2k7pWMbvz0TGAXo60pfArYgXzZ07XNKvzJq~6pIIb3FNlPHScyvrwKoW~EfAVXjEuK~zMl0BGU6FDAg38lCzGpIsJnLWhtLcISTf9h4YeQuHdS3CFKZoV05eVAcyfkOP2KcsdOF2PIfMgfGGQSH9~hpf1NGQRA33lqKTxBPobrvjU6isQivgXaDBx2Yr4mtpfxXk2Hh5lYiY3ox3zyEQfpfR3mlW4SvFlts2uX7UK6KcWfEjUQJpRULzPk-ACGz1cUFs-uaIbwNmRAmiLASyyDR4aJTrQA3215ThtDvI8vCs6KCwtFIRHUCSB8haeK6pWxSZ2SsbgfkE6w__
<luckyluke>so it could be any page between 0x0000 and 0xF000 right?
<AlmuHS>It's possible
<luckyluke>maybe 0x3F000 as it has 7 bits
<AlmuHS>here a little example
<AlmuHS> https://riptutorial.com/x86/example/20472/wake-up-all-the-processors
<AlmuHS> ;Vector: 08h (Will make the CPU execute instruction at address 08000h)
<AlmuHS>A SIPI contains a vector, this is similar in meaning, but absolutely different in practice, to an interrupt vector (a.k.a. interrupt number).
<AlmuHS>The vector is an 8 bit number, of value V (represented as vv in base 16), that makes the CPU start executing instructions at the physical address 0vv000h.
<AlmuHS>We will call 0vv000h the Wake-up address (WA).
<AlmuHS>The WA is forced at a 4KiB (or page) boundary. We will use 08h as V, the WA is then 08000h, 400h bytes after the bootloader. This gives control to the APs.
<AlmuHS>this is the APIC restriction that I referred to before
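The vector-to-address rule quoted above can be written out as a short sketch. Since the vector is 8 bits wide, the wake-up address is the vector shifted left by 12, which gives page-aligned addresses from 0x00000 to 0xFF000. The function names are hypothetical.

```c
#include <stdint.h>

/* Wake-up address for a SIPI vector: 0vv000h, i.e. vector * 4 KiB. */
static uint32_t sipi_wakeup_address(uint8_t vector)
{
    return (uint32_t)vector << 12;
}

/* Inverse mapping: returns the vector for a wake-up address, or -1 if
 * the address is not 4 KiB aligned or does not fit in 8 bits of page
 * number (that is, it lies at or above 1 MiB). */
static int sipi_vector_for(uint32_t addr)
{
    if ((addr & 0xFFFu) != 0 || addr > 0xFF000u)
        return -1;
    return (int)(addr >> 12);
}
```

Under this rule an AP_BOOT_ADDR of 0x7000 corresponds to vector 0x07, and the 08h example from the tutorial corresponds to 0x08000. An arbitrary link-time address of the boot code would only be usable if it happened to be page-aligned and below 1 MiB.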