IRC channel logs
2023-02-08.log
<damo22>youpi: i think the simple locks need to change to something different eg: pmap.c:1448 simple_lock(&p->lock);
<damo22>when i updated the curr_ipl to be an array per cpu i got a deadlock very early
<youpi>damo22: deadlock -> which backtraces?
<damo22>i cant find where simple_lock is defined when NCPUS >1
<youpi>really, use tools such as ctags to easily determine such a thing
<youpi>using a #error will tell what actually gets included
<damo22>dont we need extra things like in lock_mon.c
<youpi>that can be useful for debugging, but that's not necessary
<damo22>when i enabled per cpu curr_ipl[] it deadlocked when trying to take a lock in pmap.c:1448
<damo22>i changed all instances of curr_ipl and made it cpu number aware
<youpi>I'm talking about the "I can"t" part
<youpi>"doesn't work" is never precise enough for me to divine what could be going wrong
<youpi>aaah, sorry, it's even a "I can"
<youpi>really, I'm amazed when people have information, but don't realize that it could be useful they show it
<youpi>if it stops at an simple_lock, most probably some code missed unlocking it
<youpi>which would be really not surprising considered the pmap code has been changed over years without testing with smp
<youpi>so a complete review of the pmap lock / unlock calls would probably be very useful
<damo22>no it freezes before it prints anything
<damo22>Program received signal SIGINT, Interrupt.
<damo22>0xc1001328 in pmap_reference (p=0xc10a81c4 <kernel_pmap_store>) at ../i386/intel/pmap.c:1448
<damo22>1448 ../i386/intel/pmap.c: No such file or directory.
<damo22>#0 0xc1001328 in pmap_reference (p=0xc10a81c4 <kernel_pmap_store>) at ../i386/intel/pmap.c:1448
<damo22>#1 0xc101d033 in kmem_submap (map=0xc10be900 <ipc_kernel_map_store>,
<damo22> parent=0xc10b16a0 <kernel_map_store>, min=0xc10a0f84 <solid_intstack+3972>,
<damo22> max=0xc10a0f88 <solid_intstack+3976>, size=8388608) at ../vm/vm_kern.c:878
<damo22>#2 0xc10419e4 in ipc_init () at ../ipc/ipc_init.c:113
<damo22>#3 0xc1016e45 in setup_main () at ../kern/startup.c:118
<damo22>#4 0xc10049a2 in c_boot_entry (bi=38144) at ../i386/i386at/model_dep.c:600
<damo22>#5 0xc1000093 in iplt_done () at ../i386/i386at/boothdr.S:103
<youpi>so it's even before activating other cpus, so not a contention issue, so really most probably a missing simple_unlock somewhere
<damo22>Program received signal SIGINT, Interrupt.
<damo22>0xc1001328 in pmap_reference (p=0xc10a81c4 <kernel_pmap_store>) at ../i386/intel/pmap.c:1448
<damo22>1448 kmem_free(kernel_map, (vm_offset_t)p->user_pdpbase, INTEL_PGBYTES);
<damo22>0xc1001328 in pmap_reference (p=0xc10a81c4 <kernel_pmap_store>) at ../i386/intel/pmap.c:1448
<damo22>warning: Source file is more recent than executable.
<damo22>(19:35:42) damo22: 0xc1001328 in pmap_reference (p=0xc10a81c4 <kernel_pmap_store>) at ../i386/intel/pmap.c:1448
<damo22>(19:35:42) damo22: warning: Source file is more recent than executable.
<damo22>(19:35:42) damo22: 1448 simple_lock(&p->lock);
<damo22>i checked pmap.c there are no simple_lock() calls without matching simple_unlock() calls on the pmap
<youpi>damo22: is kernel_pmap_store.lock.lock_data indeed 1 ?
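[Editor's sketch] youpi's hypothesis is that a lock taken without a matching unlock stays marked as held, so the next attempt to take it spins forever even with a single CPU, and that this would show up as lock_data reading 1. The C below is a minimal illustration of that idea only. The names sketch_lock_t, sketch_try_lock, sketch_unlock and buggy_path are hypothetical, and this is not gnumach's actual simple_lock implementation.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a simple lock; NOT gnumach's real definition. */
typedef struct {
    atomic_int lock_data;   /* 0 = free, 1 = held */
} sketch_lock_t;

static void sketch_lock_init(sketch_lock_t *l)
{
    atomic_init(&l->lock_data, 0);
}

/* One acquisition attempt: returns 1 on success, 0 if the lock is already held.
 * A real spin lock would repeat this attempt in a loop until it succeeds. */
static int sketch_try_lock(sketch_lock_t *l)
{
    return atomic_exchange(&l->lock_data, 1) == 0;
}

static void sketch_unlock(sketch_lock_t *l)
{
    atomic_store(&l->lock_data, 0);
}

/* Simulates a code path that takes the lock and forgets to release it. */
static void buggy_path(sketch_lock_t *l)
{
    (void)sketch_try_lock(l);
    /* missing sketch_unlock(l): lock_data stays at 1 */
}
```

After buggy_path() runs, lock_data is 1 and every later sketch_try_lock() fails until someone calls sketch_unlock(), which is why inspecting lock_data in gdb is a quick way to test the missing-unlock theory.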
<youpi>at worse you can probably put a simple_lock(&kernel_pmap->lock); around in the boot code, to see where it starts getting stuck
<damo22>hmm NCPUS > 1 but i am running with -smp 1
<youpi>NCPUS is just the compiled-in number of cpus
<damo22>ok but it allows multiprocessor code paths to be compiled in
<youpi>that's a good way to exercise them in an easy case first
<damo22>$1 = {lock_data = 0, is_a_simple_lock = {<No data fields>}}
<luckyluke><damo22> "1448 kmem_free(kernel_map, (..." <- Are you compiling for xen?
<damo22>my code was on the wrong branch when i used gdb that time
<youpi>it will probably be useful to disas the function, to make sure what exactly the compiler understood it should be doing
<Pellescours>damo22: note, building your branch with ncpus=6 generate an error because of a table size but ncpus=4 build correctly
<Pellescours>../i386/i386/mp_desc.c:70:1: error: requested alignment ‘24576’ is not a positive power of 2 70 | uint8_t solid_intstack[NCPUS*INTSTACK_SIZE] __aligned(NCPUS*INTSTACK_SIZE);
<Pellescours>I just tried master branch with latest commit from damo22 branch, and the code is blocked in pmap_enter ../i386/intel/pmap.c:2001 it’s trying to get read lock on the pmap
<gnucode>hey hurd people. I am trying to install the Hurd in a T43. wish me luck!
<gnucode>Pellescours: I'm already worried. It just created a partition #1 and a partition #5. That seems odd.
<Pellescours>We can’t compile gnumach with -O0, it’s sad to debug efficiently with gdb.
<gnucode>Pellescours: what is -O0? super quick code?
<Pellescours>youpi: does it help if I say that when I add CFLAGS="-g" to the configure, it unlocks the boot?
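[Editor's sketch] The alignment error Pellescours reports above comes from using NCPUS*INTSTACK_SIZE as an alignment: the compiler only accepts powers of two, so 4 CPUs pass and 6 CPUs fail. The C below illustrates the arithmetic. INTSTACK_SIZE = 4096 is inferred from the error message (24576 / 6), not read from the gnumach headers, and IS_POW2 and round_up_pow2 are hypothetical helpers. Rounding the alignment up is only one possible direction, not necessarily what gnumach does.

```c
/* Assumption: inferred from the error message (24576 / 6), not from gnumach headers. */
#define INTSTACK_SIZE 4096u

/* True when x is a positive power of two, i.e. exactly one bit is set. */
#define IS_POW2(x) ((x) != 0 && (((x) & ((x) - 1)) == 0))

/* Hypothetical helper: smallest power of two >= x, for small x >= 1. */
static unsigned long round_up_pow2(unsigned long x)
{
    unsigned long p = 1;
    while (p < x)
        p <<= 1;
    return p;
}

/* ncpus=4 gives 16384, a valid alignment; ncpus=6 gives 24576, which is not. */
_Static_assert(IS_POW2(4 * INTSTACK_SIZE), "4 CPUs: alignment is a power of two");
_Static_assert(!IS_POW2(6 * INTSTACK_SIZE), "6 CPUs: alignment is not a power of two");
```

With these assumptions, round_up_pow2(6 * INTSTACK_SIZE) yields 32768, the smallest alignment value the compiler would accept for a 6-CPU stack array of that size.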
<Pellescours>I just disassemble the method pmap_enter but then I don’t really know where to loook
<Pellescours>I have the 2 versions disassembled, with -g (good), and normal (bad)
<Pellescours>One thing I see it’s that with -g it does not inline some functions (I see call instructions that I can’t see without, code is much more longer)
<DiffieHellman>You should probably combine -Og with -g if you are looking to debug without much optimisation.
<youpi>Pellescours: you can just pastebin them so people can get a look
<gnucode>are there any debian GNU/Hurd mirrors in the U.S. ?
<gnucode>the installer is only trying to download from a mirror in the netherlands, and it is saying that it failed.
<gnucode>well I guess I am going to continue installing without specifying a download mirror...
<gnucode>oh, that's right. I am using the netinstall, and that warned me that I would have issues downloading anything other than the base system.
<gnucode>well, I just rebooted the T43. and it works!
<gnucode>I am actually running the Hurd on real hardware! awesome!
<gnucode>I just did whatever the installer had me use. So probably the linux driver.
<gnucode>This hard drive only has 40GB on it. :)
<gnucode>and it has network connectivity. that's cool.
<Pellescours>youpi: with your last commit on gnumach I’m not able to build it with ncpus=1 it says
<Pellescours>../ipc/ipc_port.c:60:22: error: expected declaration specifiers or ‘...’ before ‘,’ token 60 | def_simple_lock_data(, ipc_port_multiple_lock_data) | ^ ../ipc/ipc_port.c:60:24: error: expected declaration specifiers or ‘...’ before ‘ipc_port_multiple_lock_data’ 60 | def_simple_lock_data(, ipc_port_multiple_lock_data) |