IRC channel logs

2022-10-07.log

<Pellescours>damo22: _mp_desc is defined as an area of N_CPU*1024, but sizeof(struct mp_desc_table) = 10468
<Pellescours>so in mp_desc.c we may overwrite other data
<damo22>maybe its better to use init_alloc_aligned
<Pellescours>maybe, but init_alloc_aligned or statically allocated, this does not explain why lgdt doesn’t work
<damo22>im getting pdesc.limit = 3
<damo22>when it crashes
<damo22>sounds like sizeof a pointer is being used instead of the struct
<Pellescours>damo22: isn't a cli missing in cpuboot.S
<Pellescours>I see one at the beginning but I don't know if we should not add one after, once in 32 bits
<Pellescours>in the smp branch of almu, there is a cli just after having enabled paging
<Pellescours> https://github.com/AlmuHS/GNUMach_SMP/blob/smp/i386/i386/cpuboot.S#L100
<damo22>i removed it yes, not sure if its needed
<Pellescours>I will try with it just to see
<Pellescours>doesn’t work
<damo22>i got it to not crash on gdt load but it hangs instead
<Pellescours>yeah same
<damo22>i think the problem is the structs
<damo22>struct real_descriptor *mp_gdt[NCPUS]
<damo22>but the real gdt is defined as struct real_descriptor gdt[GDTSZ];
<damo22>mp_gdt is just a collection of pointers
<damo22>kvtolin(pointer) and (unsigned long)(pointer) are the same value on the AP
<damo22>because of the segmentation i think
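To illustrate the difference between the two declarations quoted above, here is a rough sketch, assuming each pointer slot is filled in during init. An array of pointers holds no descriptors itself, so the base address for a given CPU has to be the value stored in the slot, not the address of the slot. All names ending in _sketch are hypothetical, and the sizes are placeholders.

```c
#include <assert.h>
#include <stdint.h>

#define NCPUS_SKETCH 2
#define GDTSZ_SKETCH 16

struct real_descriptor_sketch {
    uint8_t bytes[8];
};

/* Backing storage: one full GDT per CPU (static here for simplicity). */
static struct real_descriptor_sketch gdt_storage[NCPUS_SKETCH][GDTSZ_SKETCH];

/* Array of pointers, shaped like the mp_gdt declaration in the log. */
static struct real_descriptor_sketch *mp_gdt_sketch[NCPUS_SKETCH];

/* Point each slot at that CPU's table. */
static void mp_gdt_init_sketch(void)
{
    for (int cpu = 0; cpu < NCPUS_SKETCH; cpu++)
        mp_gdt_sketch[cpu] = gdt_storage[cpu];
}

/* Intended base: the table the slot points to. */
static uintptr_t gdt_base_good(int cpu)
{
    assert(cpu >= 0 && cpu < NCPUS_SKETCH);
    return (uintptr_t)mp_gdt_sketch[cpu];
}

/* Easy mistake: the address of the pointer slot, which holds no descriptors. */
static uintptr_t gdt_base_bad(int cpu)
{
    assert(cpu >= 0 && cpu < NCPUS_SKETCH);
    return (uintptr_t)&mp_gdt_sketch[cpu];
}
```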
<damo22>Pellescours: what if the first gdt is not being set up correctly? the gdt address needs to be relative to the data segment to dereference it
<damo22>but we just set up the data segment to zero ?
<damo22>i need to pause the BSP before gdt_init() of itself and check the segmentation
<damo22>and after
<damo22> /*
<damo22> * We'll have to temporarily install a direct mapping
<damo22> * between physical memory and low linear memory,
<damo22> * until we start using our new kernel segment descriptors.
<damo22> */
<damo22>i guess we need to set the CR* registers
<damo22>before changing the gdt
<damo22>ive got a value in cr2 and the cpu crashed which means theres no trap but it hit a page fault i think
<damo22>ok so the problem is not actually loading the gdt, its making the kernel able to read its own memory as it executes i think before the gdt loads
<damo22>the AP has no paging and i just enabled it but its faulting
<damo22>i turned off CR0_WP and its working kinda
<damo22>task loaded: acpi --host-priv-port=1 --device-master-port=2 --next-task=3
<damo22>panic {cpu0} ../i386/intel/pmap.c:1591: pmap_remove_range: pmap_remove: null pv_list!
<damo22>Debugger invoked: panic
<damo22>Kernel Breakpoint trap, eip 0xc1026034
<damo22>TODO: cpu_interrupt_to_db
<damo22>Stopped at Debugger+0x13: int $3
<damo22>Debugger(c116b7f5,0,f4839c40,c1026015,f52d4120)+0x13
<damo22>Panic(c115fd91,637,c1155fb4,c115fd76,ffffffff)+0xdb
<damo22>pmap_remove_range(f52d4124,0,f4839ce0,c100ca2a,ffffffff)+0x213
<damo22>pmap_enter(f4837fa8,8048000,7ff93000,3,0)+0x285
<damo22>vm_fault(f4835f20,8048000,3,0,0,0,f4839db0,c100a057)+0x582
<damo22>kernel_trap(f4839dc0)+0xbf
<damo22>>>>>> Page fault (14) at copyout+0x23 <<<<<
<damo22>copyout(f67fe010,0,185828,8048000,185828,305,f87fe4e0,0)+0x23
<damo22>exec_load(c1063f80,c1064030,f67fe010,f4839fb0)+0x132
<damo22>user_bootstrap(c1031680,f4844150,0,c1055175,f4844e70)+0x2f
<damo22>thread_continue(f4844e70,f4843430,c1031680,f4839fe4,0)+0x2e
<damo22>Thread_continue()
<damo22>db{0}>
<damo22>i think its working!
<damo22>it boots with APs in tight loop
<damo22>but their gdt etc is all set up
<damo22>youpi: i got an AP to pass into slave_main and become idle
<damo22>but it hangs waiting for something to do
<damo22>BSP continues and then stops when the AP gets stuck
<damo22>are we missing something in the operating system for SMP?
<damo22>see * 0384ef5b (HEAD -> feat-smp, zammit/feat-smp) Let BSP continue to boot, don't wait for APs to reach idle
<damo22>Debugger invoked: panic
<damo22>Kernel Breakpoint trap, eip 0xc1026074
<damo22>TODO: cpu_interrupt_to_db
<damo22>Stopped at Debugger+0x13: int $3
<damo22>Debugger(c116b852,1,f4840f28,c1025fc8,3d5)+0x13
<damo22>Panic(c1161727,6e,c11570ac,c1161714,c11616e2)+0xdb
<damo22>Debugger(c116b852,1,f4840f78,c1026055,0)+0x2a
<damo22>Panic(c1161c43,6db,c1157a24,c1161cf4,c1171a5c)+0xdb
<damo22>idle_thread(c10316c0,0,0,c10551c5,f4845bd0)
<damo22>thread_continue(f4845bd0,f4844bf0,f4847ea0,f4843520,f4843558)+0x2e
<damo22>Thread_continue()
<damo22>db{1}> show all tasks
<damo22> ID TASK NAME [THREADS]
<damo22>Kernel Invalid opcode trap, eip 0x3
<damo22>Caught Invalid opcode (6), code = 0, pc = 10
<damo22>db{1}>
<damo22>what is this TODO: cpu_interrupt_to_db ?
<damo22>i fixed kdb for multiprocessor
<civodul>neat :-)
<damo22>theres a problem with starting APs because they are sharing the pmap but need to hack it at the start to get gdt working
<damo22>we need to insert a mapping like we do on the BSP and then remove it before the BSP continues
<damo22>and flush the tlb
<damo22>youpi: it seems like the APs are not allowed to reach load_context() before the BSP because otherwise they dont have anything to run and it triggers a trap
<damo22>but otherwise everything looks good
<civodul>is ld.so supposed to be mapped at 0x1000?
<damo22>Debugger(c116b837,1,f4840f78,c1026055,0)+0x13
<damo22>Panic(c1161c43,6db,c1157a24,c1161cf4,c1171a44)+0xdb
<damo22>idle_thread(c10316c0,0,0,c10551c5,f4845bd0)
<damo22>thread_continue(f4845bd0,f4844bf0,f4847ea0,f4843520,f4843558)+0x2e
<damo22>Thread_continue()
<damo22>db{1}>
<damo22>youpi: its hitting Bad processor state 1 (running)
<damo22>in idle_thread_continue
<damo22>should i just let it continue running if its in idle_thread_continue and in state PROCESSOR_RUNNING ?
<damo22>instead of panic
<damo22>i think im stuck at the first context switch
<damo22>i hit a failed assert() in thread_dispatch
<damo22> /*
<damo22> * Pretend it is already running, and resume it.
<damo22> * Since it looks as if it is running, thread_resume
<damo22> * will not try to put it on the run queues.
<damo22> *
<damo22> * We can do all of this without locking, because nothing
<damo22> * else is running yet.
<damo22> */
<damo22>aha, but the AP is running now
<youpi>damo22: you probably need one idle_thread per cpu
<youpi>civodul: I wouldn't be surprised that it is
<youpi>since it's PIE
<damo22>youpi: i dont think so, it looks like gnumach was designed to be multiprocessor
<youpi>ok but it doesn't make sense to have the same thread running concurrently on several processors
<youpi>the state of the thread will be shared, that can't work
<youpi>if in doubt, check how xnu does it
<youpi>that's most often a good source of inspiration
<youpi>also
<youpi>kern/processor.h: struct thread *idle_thread; /* this processor's idle thread. */
<youpi>see that
<youpi>it clearly shows that each processor has its idle thread
<damo22>kern/ast.c: if (self != current_processor()->idle_thread
<damo22>ok yes, they already do have more idle_threads
<damo22>kern/sched_prim.c: return myprocessor->idle_thread;
<damo22>the idle threads are created by the BSP before starting the APs and they are set ready to go
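The arrangement discussed above can be sketched roughly as follows, assuming a fixed CPU count and greatly simplified structures: the boot processor creates one idle thread per CPU before any AP starts, and each processor structure points at its own thread so no thread state is shared. The types and functions here are hypothetical stand-ins, not the definitions from kern/processor.h.

```c
#include <assert.h>
#include <stddef.h>

#define NCPUS_SKETCH 4 /* hypothetical CPU count */

/* Greatly simplified stand-ins for the kernel's thread and processor. */
struct thread_sketch {
    int bound_cpu; /* CPU this idle thread belongs to */
    int runnable;  /* set once the thread is ready to go */
};

struct processor_sketch {
    struct thread_sketch *idle_thread; /* this processor's idle thread */
};

static struct thread_sketch idle_threads[NCPUS_SKETCH];
static struct processor_sketch processors[NCPUS_SKETCH];

/* Run once on the BSP, before any AP is started. */
static void create_idle_threads(void)
{
    for (int cpu = 0; cpu < NCPUS_SKETCH; cpu++) {
        idle_threads[cpu].bound_cpu = cpu;
        idle_threads[cpu].runnable = 1;
        processors[cpu].idle_thread = &idle_threads[cpu];
    }
}

/* What each CPU would pick up when it has nothing else to run. */
static struct thread_sketch *idle_thread_for(int cpu)
{
    assert(cpu >= 0 && cpu < NCPUS_SKETCH);
    return processors[cpu].idle_thread;
}
```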
<damo22>youpi: http://paste.debian.net/plain/1256269
<damo22>hit an assert in thread_dispatch
<damo22>-smp 2
<damo22>i think something is not implemented yet, the "interrupt_processor"
<damo22>for TLB coherency
<damo22>its deadlocked waiting for the cpu_update_needed[]
<damo22>is it enough to call pmap_update_interrupt() ?
<damo22>but how do i get execution on the specific cpu
<damo22>i need to send an IPI to a specific processor to tell it to flush tlb
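A single-threaded simulation of the handshake described above, as a sketch under stated assumptions: the initiator marks each remote CPU as needing an update, sends it an IPI, and then spins until the flag is cleared by that CPU's interrupt handler. If the IPI is never delivered the flag stays set, which is the deadlock reported. The function names, the callback type and the bounded spin are all hypothetical; a real kernel would need proper atomics, memory barriers and a real APIC write instead of a direct call.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPUS_SKETCH 2

/* Per-CPU "please flush your TLB" flags, loosely modelled on cpu_update_needed[]. */
static volatile bool update_needed[NCPUS_SKETCH];
static int tlb_flushes[NCPUS_SKETCH]; /* counts simulated flushes */

/* What the target CPU's IPI handler would do. */
static void update_interrupt_sketch(int cpu)
{
    if (update_needed[cpu]) {
        tlb_flushes[cpu]++;         /* stands in for the real TLB flush */
        update_needed[cpu] = false; /* acknowledge to the initiator */
    }
}

typedef void (*send_ipi_fn)(int cpu);

/* Simulated delivery: the handler runs immediately on the target. */
static void ipi_delivered(int cpu) { update_interrupt_sketch(cpu); }

/* Simulated missing implementation: nothing ever reaches the target. */
static void ipi_lost(int cpu) { (void)cpu; }

/* Returns true if every other CPU acknowledged within max_spins. */
static bool shootdown_sketch(int self, send_ipi_fn send_ipi, int max_spins)
{
    for (int cpu = 0; cpu < NCPUS_SKETCH; cpu++) {
        if (cpu == self)
            continue;
        update_needed[cpu] = true;
        send_ipi(cpu);
    }
    for (int cpu = 0; cpu < NCPUS_SKETCH; cpu++) {
        int spins = 0;
        if (cpu == self)
            continue;
        while (update_needed[cpu]) {
            if (++spins > max_spins)
                return false; /* would be an endless wait in the kernel */
        }
    }
    return true;
}
```

Calling shootdown_sketch(0, ipi_delivered, 1000) completes after one simulated flush on CPU 1, while shootdown_sketch(0, ipi_lost, 1000) gives up with the flag still set.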
<damo22>if anyone wants to try smp, its currently stuck here in my branch * cfbce13b Add debugging to see why the TLB shootdown fails http://git.zammit.org/gnumach-sv.git/log/?h=feat-smp
<damo22>it *almost* boots
***FragByte_ is now known as FragByte
***ks` is now known as idchoppers