IRC channel logs

<Pellescours>damo22: _mp_desc is defined as an area of N_CPU*1024, but the sizeof(struct mp_desc_table *) = 10468

<Pellescours>so in mp_desc.c we may overwrite other data

<damo22>maybe its better to use init_alloc_aligned

<Pellescours>maybe, but init_alloc_aligned or statically allocated, this does not explain why lgdt doesn’t work

<damo22>im getting pdesc.limit = 3

<damo22>when it crashes

<damo22>sounds like sizeof a pointer is being used instead of the struct
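
A minimal sketch of the suspected mistake, assuming a descriptor-table limit computed as `sizeof(...) - 1`. All `demo_` names and `DEMO_GDTSZ` are hypothetical stand-ins, not gnumach's definitions. On i386 a pointer is 4 bytes, so measuring the pointer instead of the table would give a limit of 3, which would be consistent with the `pdesc.limit = 3` reading above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an 8-byte x86 segment descriptor. */
struct demo_descriptor {
    uint32_t word1;
    uint32_t word2;
};

#define DEMO_GDTSZ 16 /* hypothetical number of GDT entries */

/* Hypothetical stand-in for a per-CPU table that embeds a whole GDT. */
struct demo_desc_table {
    struct demo_descriptor gdt[DEMO_GDTSZ];
};

/* Wrong: measures the pointer itself, so the limit covers only a few bytes. */
static inline size_t limit_from_pointer(const struct demo_desc_table *t)
{
    return sizeof(t) - 1;
}

/* Intended: measures the GDT array that the pointer refers to. */
static inline size_t limit_from_table(const struct demo_desc_table *t)
{
    return sizeof(t->gdt) - 1;
}
```

The same trap applies to any area sized from a pointer type: the result depends on the pointer width of the target, not on the size of the structure.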

<Pellescours>damo22: isn't a cli missing in cpuboot.S

<Pellescours>I see one at the beginning but I don't know if we should not add one after, once in 32 bits

<Pellescours>in the smp branch of almu, there is a cli just after having enabled paging

<Pellescours> https://github.com/AlmuHS/GNUMach_SMP/blob/smp/i386/i386/cpuboot.S#L100

<damo22>i removed it yes, not sure if its needed

<Pellescours>I will try with it just to see

<Pellescours>doesn’t work

<damo22>i got it to not crash on gdt load but it hangs instead

<Pellescours>yeah same

<damo22>i think the problem is the structs

<damo22>struct real_descriptor *mp_gdt[NCPUS]

<damo22>but the real gdt is defined as struct real_descriptor gdt[GDTSZ];

<damo22>mp_gdt is just a collection of pointers

<damo22>kvtolin(pointer) and (unsigned long)(pointer) are the same value on the AP

<damo22>because of the segmentation i think

<damo22>Pellescours: what if the first gdt is not being set up correctly? the gdt address needs to be relative to the data segment to dereference it

<damo22>but we just set up the data segment to zero ?

<damo22>i need to pause the BSP before gdt_init() of itself and check the segmentation

<damo22>and after

<damo22> /*

<damo22> * We'll have to temporarily install a direct mapping

<damo22> * between physical memory and low linear memory,

<damo22> * until we start using our new kernel segment descriptors.

<damo22> */

<damo22>i guess we need to set the CR* registers

<damo22>before changing the gdt

<damo22>ive got a value in cr2 and the cpu crashed which means theres no trap but it hit a page fault i think

<damo22>ok so the problem is not actually loading the gdt, its making the kernel able to read its own memory as it executes i think before the gdt loads

<damo22>the AP has no paging and i just enabled it but its faulting

<damo22>i turned off CR0_WP and its working kinda
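
A small sketch of the CR0 bit manipulation being described, operating on a plain integer instead of the real control register. The bit positions are the architectural ones (PE is bit 0, WP is bit 16, PG is bit 31); the `DEMO_` names and the helper are hypothetical. With WP clear, supervisor-mode writes are no longer blocked by read-only page mappings, which may be why the fault went away; that points at a workaround, not necessarily at the root cause.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural CR0 bit masks (names are local to this sketch). */
#define DEMO_CR0_PE 0x00000001u /* bit 0: protected mode enable */
#define DEMO_CR0_WP 0x00010000u /* bit 16: write protect for supervisor writes */
#define DEMO_CR0_PG 0x80000000u /* bit 31: paging enable */

/*
 * Compute the CR0 value for an AP that turns paging on while leaving
 * write protection off. The caller would load the result into CR0.
 */
static inline uint32_t demo_cr0_paging_without_wp(uint32_t cr0)
{
    cr0 |= DEMO_CR0_PE | DEMO_CR0_PG; /* protected mode and paging on */
    cr0 &= ~DEMO_CR0_WP;              /* supervisor may write read-only pages */
    return cr0;
}
```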

<damo22>task loaded: acpi --host-priv-port=1 --device-master-port=2 --next-task=3

<damo22>panic {cpu0} ../i386/intel/pmap.c:1591: pmap_remove_range: pmap_remove: null pv_list!

<damo22>Debugger invoked: panic

<damo22>Kernel Breakpoint trap, eip 0xc1026034

<damo22>TODO: cpu_interrupt_to_db

<damo22>Stopped at Debugger+0x13: int $3

<damo22>Debugger(c116b7f5,0,f4839c40,c1026015,f52d4120)+0x13

<damo22>Panic(c115fd91,637,c1155fb4,c115fd76,ffffffff)+0xdb

<damo22>pmap_remove_range(f52d4124,0,f4839ce0,c100ca2a,ffffffff)+0x213

<damo22>pmap_enter(f4837fa8,8048000,7ff93000,3,0)+0x285

<damo22>vm_fault(f4835f20,8048000,3,0,0,0,f4839db0,c100a057)+0x582

<damo22>kernel_trap(f4839dc0)+0xbf

<damo22>>>>>> Page fault (14) at copyout+0x23 <<<<<

<damo22>copyout(f67fe010,0,185828,8048000,185828,305,f87fe4e0,0)+0x23

<damo22>exec_load(c1063f80,c1064030,f67fe010,f4839fb0)+0x132

<damo22>user_bootstrap(c1031680,f4844150,0,c1055175,f4844e70)+0x2f

<damo22>thread_continue(f4844e70,f4843430,c1031680,f4839fe4,0)+0x2e

<damo22>Thread_continue()

<damo22>i think its working!

<damo22>it boots with APs in tight loop

<damo22>but their gdt etc is all set up

<damo22>youpi: i got an AP to pass into slave_main and become idle

<damo22>but it hangs waiting for something to do

<damo22>BSP continues and then stops when the AP gets stuck

<damo22>are we missing something in the operating system for SMP?

<damo22>see * 0384ef5b (HEAD -> feat-smp, zammit/feat-smp) Let BSP continue to boot, don't wait for APs to reach idle

<damo22>Debugger invoked: panic

<damo22>Kernel Breakpoint trap, eip 0xc1026074

<damo22>TODO: cpu_interrupt_to_db

<damo22>Stopped at Debugger+0x13: int $3

<damo22>Debugger(c116b852,1,f4840f28,c1025fc8,3d5)+0x13

<damo22>Panic(c1161727,6e,c11570ac,c1161714,c11616e2)+0xdb

<damo22>Debugger(c116b852,1,f4840f78,c1026055,0)+0x2a

<damo22>Panic(c1161c43,6db,c1157a24,c1161cf4,c1171a5c)+0xdb

<damo22>idle_thread(c10316c0,0,0,c10551c5,f4845bd0)

<damo22>thread_continue(f4845bd0,f4844bf0,f4847ea0,f4843520,f4843558)+0x2e

<damo22>Thread_continue()

<damo22>db{1}> show all tasks

<damo22> ID TASK NAME [THREADS]

<damo22>Kernel Invalid opcode trap, eip 0x3

<damo22>Caught Invalid opcode (6), code = 0, pc = 10

<damo22>what is this TODO: cpu_interrupt_to_db ?

<damo22>i fixed kdb for multiprocessor

<civodul>neat :-)

<damo22>theres a problem with starting APs because they are sharing the pmap but need to hack it at the start to get gdt working

<damo22>we need to insert a mapping like we do on the BSP and then remove it before the BSP continues

<damo22>and flush the tlb

<damo22>youpi: it seems like the APs are not allowed to reach load_context() before the BSP because otherwise they dont have anything to run and it triggers a trap

<damo22>but otherwise everything looks good

<civodul>is ld.so supposed to be mapped at 0x1000?

<damo22>Debugger(c116b837,1,f4840f78,c1026055,0)+0x13

<damo22>Panic(c1161c43,6db,c1157a24,c1161cf4,c1171a44)+0xdb

<damo22>idle_thread(c10316c0,0,0,c10551c5,f4845bd0)

<damo22>thread_continue(f4845bd0,f4844bf0,f4847ea0,f4843520,f4843558)+0x2e

<damo22>Thread_continue()

<damo22>youpi: its hitting Bad processor state 1 (running)

<damo22>in idle_thread_continue

<damo22>should i just let it continue running if its in idle_thread_continue and in state PROCESSOR_RUNNING ?

<damo22>instead of panic

<damo22>i think im stuck at the first context switch

<damo22>i hit a failed assert() in thread_dispatch

<damo22> /*

<damo22> * Pretend it is already running, and resume it.

<damo22> * Since it looks as if it is running, thread_resume

<damo22> * will not try to put it on the run queues.

<damo22> *

<damo22> * We can do all of this without locking, because nothing

<damo22> * else is running yet.

<damo22> */

<damo22>aha, but the AP is running now

<youpi>damo22: you probably need one idle_thread per cpu

<youpi>civodul: I wouldn't be surprised that it is

<youpi>since it's PIE

<damo22>youpi: i dont think so, it looks like gnumach was designed to be multiprocessor

<youpi>ok but it doesn't make sense to have the same thread running concurrently on several processors

<youpi>the state of the thread will be shared, that can't work

<youpi>if in doubt, check how xnu does it

<youpi>that's most often a good source of inspiration

<youpi>also

<youpi>kern/processor.h: struct thread *idle_thread; /* this processor's idle thread. */

<youpi>see that

<youpi>it clearly shows that each processor has its idle thread

<damo22>kern/ast.c: if (self != current_processor()->idle_thread

<damo22>ok yes, they already do have more idle_threads

<damo22>kern/sched_prim.c: return myprocessor->idle_thread;

<damo22>the idle threads are created by the BSP before starting the APs and they are set ready to go
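
A sketch of the arrangement youpi points at, assuming one idle thread per processor that the BSP creates before any AP starts. The `demo_` types and functions are hypothetical and much simpler than the real `struct processor` and `struct thread`; the only point is that no two CPUs share an idle thread, and that running one thread on two CPUs is rejected.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NCPUS 2 /* hypothetical CPU count */

struct demo_thread {
    int bound_cpu; /* CPU this idle thread belongs to */
    int running;   /* nonzero while some CPU is executing it */
};

struct demo_processor {
    struct demo_thread *idle_thread; /* this processor's idle thread */
};

static struct demo_thread demo_idle_threads[DEMO_NCPUS];
static struct demo_processor demo_processors[DEMO_NCPUS];

/* BSP side: give every processor its own idle thread before APs start. */
static inline void demo_create_idle_threads(void)
{
    for (int cpu = 0; cpu < DEMO_NCPUS; cpu++) {
        demo_idle_threads[cpu].bound_cpu = cpu;
        demo_idle_threads[cpu].running = 0;
        demo_processors[cpu].idle_thread = &demo_idle_threads[cpu];
    }
}

/*
 * A CPU enters its own idle thread. Returns 0 on success, or -1 if the
 * thread is missing or already running, since thread state cannot be
 * shared between processors.
 */
static inline int demo_enter_idle(int cpu)
{
    struct demo_thread *t = demo_processors[cpu].idle_thread;
    if (t == NULL || t->running)
        return -1;
    t->running = 1;
    return 0;
}
```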

<damo22>youpi: http://paste.debian.net/plain/1256269

<damo22>hit an assert in thread_dispatch

<damo22>-smp 2

<damo22>i think something is not implemented yet, the "interrupt_processor"

<damo22>for TLB coherency

<damo22>its deadlocked waiting for the cpu_update_needed[]

<damo22>is it enough to call pmap_update_interrupt() ?

<damo22>but how do i get execution on the specific cpu

<damo22>i need to send an IPI to a specific processor to tell it to flush tlb
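
A single-threaded simulation of the handshake being described, assuming the initiator marks each remote CPU as needing an update, sends it an IPI, and spins until the flag is cleared. Every `sim_` name is hypothetical; `sim_send_ipi` simply calls the handler directly where real code would program the local APIC, and the flush is only counted. It is a sketch of the idea, not gnumach's pmap code.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_NCPUS 2 /* hypothetical CPU count */

static volatile bool sim_update_needed[SIM_NCPUS]; /* per-CPU "flush pending" flag */
static int sim_tlb_flushes[SIM_NCPUS];             /* counts simulated flushes */

/* Target side: what the IPI handler on `cpu` would do. */
static inline void sim_update_interrupt(int cpu)
{
    if (sim_update_needed[cpu]) {
        sim_tlb_flushes[cpu]++;         /* stands in for a real TLB flush */
        sim_update_needed[cpu] = false; /* acknowledge to the initiator */
    }
}

/* Stand-in for IPI delivery: runs the handler on behalf of the target. */
static inline void sim_send_ipi(int cpu)
{
    sim_update_interrupt(cpu);
}

/* Initiator side: request a flush on every other CPU and wait for it. */
static inline void sim_shootdown(int self)
{
    for (int cpu = 0; cpu < SIM_NCPUS; cpu++) {
        if (cpu == self)
            continue;
        sim_update_needed[cpu] = true;
        sim_send_ipi(cpu);
    }
    for (int cpu = 0; cpu < SIM_NCPUS; cpu++) {
        if (cpu == self)
            continue;
        while (sim_update_needed[cpu])
            ; /* spins forever if the IPI is never delivered */
    }
}
```

If `sim_send_ipi` did nothing, the wait loop would never exit, which mirrors the deadlock on `cpu_update_needed[]` mentioned above.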

<damo22>if anyone wants to try smp, its currently stuck here in my branch * cfbce13b Add debugging to see why the TLB shootdown fails http://git.zammit.org/gnumach-sv.git/log/?h=feat-smp

<damo22>it *almost* boots

***FragByte_ is now known as FragByte

***ks` is now known as idchoppers

2022-10-07.log