IRC channel logs

2023-02-08.log


<damo22>youpi: i think the simple locks need to change to something different eg: pmap.c:1448 simple_lock(&p->lock);
<damo22>when i updated the curr_ipl to be an array per cpu i got a deadlock very early
<youpi>damo22: deadlock -> which backtraces?
<damo22>i can't find where simple_lock is defined when NCPUS > 1
<youpi>well, kern/lock.h ?
<youpi>really, use tools such as ctags to easily determine such a thing
<youpi>or rather i386/i386/lock.h
<youpi>such tool reports both
<youpi>using a #error will tell what actually gets included
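A minimal sketch of the `#error` trick, written in C like gnumach itself. Everything in it is a hypothetical stand-in: the marker macro `MACHINE_SIMPLE_LOCK`, the switch `DEBUG_WHICH_LOCK`, and both lock implementations are invented for illustration and are not the contents of kern/lock.h or i386/i386/lock.h. Building once with `-DDEBUG_WHICH_LOCK` makes the compiler stop and name the definition that the translation unit actually sees. Without that flag, the file compiles normally. The spinning branch uses GCC's `__sync` builtins.

```c
#include <assert.h>

#define NCPUS 4   /* hypothetical: compiled-in maximum number of CPUs */

#if NCPUS > 1
/* Hypothetical stand-in for the machine-dependent header
 * (i386/i386/lock.h in gnumach). */
#define MACHINE_SIMPLE_LOCK 1
typedef struct { int lock_data; } simple_lock_data_t;
#define simple_lock(l) \
    do { while (__sync_lock_test_and_set(&(l)->lock_data, 1)) ; } while (0)
#define simple_unlock(l) __sync_lock_release(&(l)->lock_data)
#else
/* Hypothetical uniprocessor fallback: locks compile to nothing. */
typedef struct { int unused; } simple_lock_data_t;
#define simple_lock(l)   ((void)(l))
#define simple_unlock(l) ((void)(l))
#endif

/* Placed after the includes of the file under investigation:
 * build with -DDEBUG_WHICH_LOCK and read the error message. */
#ifdef DEBUG_WHICH_LOCK
# ifdef MACHINE_SIMPLE_LOCK
#  error "simple_lock: machine-dependent (NCPUS > 1) definition"
# else
#  error "simple_lock: uniprocessor fallback definition"
# endif
#endif

static simple_lock_data_t demo_lock;

/* Returns the lock word as seen while held (1 for the spinning version). */
int demo_lock_cycle(void)
{
    int held = 0;

    simple_lock(&demo_lock);
#ifdef MACHINE_SIMPLE_LOCK
    held = demo_lock.lock_data;
#endif
    simple_unlock(&demo_lock);
    return held;
}
```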
<damo22>don't we need extra things like in lock_mon.c?
<youpi>that can be useful for debugging, but that's not necessary
<damo22>when i enabled per cpu curr_ipl[] it deadlocked when trying to take a lock in pmap.c:1448
<damo22>i can get the backtrace
<youpi>as in ?
<damo22>i changed all instances of curr_ipl and made it cpu number aware
<youpi>I'm talking about the "I can't" part
<youpi>"doesn't work" is never precise enough for me to divine what could be going wrong
<youpi>aaah, sorry, it's even an "I can"
<youpi>then show it?
<youpi>really, I'm amazed when people have information but don't realize that it would be useful to show it
<damo22>i need to run it again with gdb
<youpi>if it stops at a simple_lock, most probably some code missed unlocking it
<youpi>which would not be surprising at all, considering the pmap code has been changed over the years without testing with smp
<youpi>so a complete review of the pmap lock / unlock calls would probably be very useful
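To make a missing `simple_unlock` show up as a message rather than a silent hang, one option is a debug lock that remembers which CPU took it and where. The sketch below assumes a made-up `struct dbg_lock`, a `current_cpu()` stub standing in for `cpu_number()`, and GCC `__sync` builtins; it is not gnumach code. Re-taking a lock that the same CPU already holds (the only way a single CPU can deadlock on a spinlock) returns an error that names both call sites.

```c
#include <assert.h>
#include <stdio.h>

#define NO_OWNER (-1)
#define MAX_CPUS 4

/* Hypothetical debug spinlock (not gnumach's simple_lock). */
struct dbg_lock {
    int lock_data;          /* 0 = free, 1 = held */
    int owner;              /* CPU holding it, or NO_OWNER */
    const char *where;      /* call site of the last successful acquire */
};

#define DBG_LOCK_INIT { 0, NO_OWNER, "(never locked)" }

/* Per-CPU count of debug locks currently held; should be 0 at quiet points. */
static int held_count[MAX_CPUS];

/* Stand-in for cpu_number(): only CPU 0 runs during early boot. */
static int current_cpu(void) { return 0; }

#define STR2(x) #x
#define STR(x) STR2(x)
#define DBG_LOCK(l) dbg_lock((l), __FILE__ ":" STR(__LINE__))

/* Returns 0 once acquired, or -1 without spinning if this CPU already
 * holds the lock (a plain spinlock would hang forever there).
 * The owner check is unsynchronized: good enough for debugging only. */
int dbg_lock(struct dbg_lock *l, const char *where)
{
    int cpu = current_cpu();

    if (l->lock_data && l->owner == cpu) {
        fprintf(stderr, "self-deadlock at %s: lock taken at %s was never released\n",
                where, l->where);
        return -1;
    }
    while (__sync_lock_test_and_set(&l->lock_data, 1))
        ;                   /* held by another CPU: spin */
    l->owner = cpu;
    l->where = where;
    held_count[cpu]++;
    return 0;
}

void dbg_unlock(struct dbg_lock *l)
{
    assert(l->lock_data && "unlocking a lock that is not held");
    l->owner = NO_OWNER;
    held_count[current_cpu()]--;
    __sync_lock_release(&l->lock_data);
}

/* Example instance, standing in for a pmap's lock. */
static struct dbg_lock demo_pmap_lock = DBG_LOCK_INIT;
```

Checking that `held_count[cpu]` is 0 at points where no lock should be held would also catch a leaked lock before anyone tries to take it again.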
<youpi>with gdb? can't kdb work?
<damo22>no it freezes before it prints anything
<damo22>(gdb) c
<damo22>Continuing.
<damo22>^C
<damo22>Program received signal SIGINT, Interrupt.
<damo22>0xc1001328 in pmap_reference (p=0xc10a81c4 <kernel_pmap_store>) at ../i386/intel/pmap.c:1448
<damo22>1448 ../i386/intel/pmap.c: No such file or directory.
<damo22>(gdb) bt
<damo22>#0 0xc1001328 in pmap_reference (p=0xc10a81c4 <kernel_pmap_store>) at ../i386/intel/pmap.c:1448
<damo22>#1 0xc101d033 in kmem_submap (map=0xc10be900 <ipc_kernel_map_store>,
<damo22> parent=0xc10b16a0 <kernel_map_store>, min=0xc10a0f84 <solid_intstack+3972>,
<damo22> max=0xc10a0f88 <solid_intstack+3976>, size=8388608) at ../vm/vm_kern.c:878
<damo22>#2 0xc10419e4 in ipc_init () at ../ipc/ipc_init.c:113
<damo22>#3 0xc1016e45 in setup_main () at ../kern/startup.c:118
<damo22>#4 0xc10049a2 in c_boot_entry (bi=38144) at ../i386/i386at/model_dep.c:600
<damo22>#5 0xc1000093 in iplt_done () at ../i386/i386at/boothdr.S:103
<youpi>so it's even before activating other cpus, so not a contention issue, so really most probably a missing simple_unlock somewhere
<damo22>ugh:
<damo22>Program received signal SIGINT, Interrupt.
<damo22>0xc1001328 in pmap_reference (p=0xc10a81c4 <kernel_pmap_store>) at ../i386/intel/pmap.c:1448
<damo22>1448 kmem_free(kernel_map, (vm_offset_t)p->user_pdpbase, INTEL_PGBYTES);
<damo22>why doesn't the code line up
<damo22>0xc1001328 in pmap_reference (p=0xc10a81c4 <kernel_pmap_store>) at ../i386/intel/pmap.c:1448
<damo22>warning: Source file is more recent than executable.
<damo22>1448 simple_lock(&p->lock);
<damo22>that's better
<damo22>i checked pmap.c: there are no simple_lock() calls without matching simple_unlock() calls on the pmap
<youpi>damo22: is kernel_pmap_store.lock.lock_data indeed 1 ?
<youpi>at worst you can probably put a simple_lock(&kernel_pmap->lock); at various points in the boot code, to see where it starts getting stuck
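The same bisection can be done without hanging: instead of taking the lock at each checkpoint, read `lock_data` and report the first checkpoint where it is already held. The leak then lies between that checkpoint and the previous one. The model below is hypothetical (a plain `int` lock word and a simulated boot in which one step forgets to unlock); in the kernel the check would read `kernel_pmap->lock.lock_data` instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of kernel_pmap->lock; only lock_data matters here. */
struct lock_model { int lock_data; };
static struct lock_model kernel_pmap_lock;

/* First checkpoint that found the lock already held (NULL if none yet). */
static const char *first_held_at;

/* Non-blocking checkpoint: report instead of spinning forever. */
void lock_checkpoint(const char *where)
{
    if (kernel_pmap_lock.lock_data && first_held_at == NULL) {
        first_held_at = where;
        printf("kernel_pmap lock already held at checkpoint \"%s\"\n", where);
    }
}

static void model_lock(void)   { kernel_pmap_lock.lock_data = 1; }
static void model_unlock(void) { kernel_pmap_lock.lock_data = 0; }

/* Simulated boot steps: step 2 forgets its unlock. */
static void boot_step1(void) { model_lock(); model_unlock(); }
static void boot_step2(void) { model_lock(); /* missing model_unlock() */ }

void simulated_boot(void)
{
    lock_checkpoint("start");
    boot_step1();
    lock_checkpoint("after step 1");
    boot_step2();
    lock_checkpoint("after step 2");   /* first report: leak is in step 2 */
}
```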
<damo22>ok
<damo22>hmm NCPUS > 1 but i am running with -smp 1
<youpi>? that doesn't matter
<youpi>NCPUS is just the compiled-in number of cpus
<youpi>+maximum
<damo22>ok but it allows multiprocessor code paths to be compiled in
<youpi>yes
<youpi>that's a good way to exercise them in an easy case first
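The distinction can be sketched as a per-CPU array sized by the compile-time `NCPUS` but indexed only by CPUs that were actually started. The names `curr_ipl_percpu`, `ncpus_online` and `cpu_number_model()` are hypothetical stand-ins for damo22's per-CPU `curr_ipl` and gnumach's `cpu_number()`. In this model, with `-smp 1` only slot 0 is ever touched, yet every code path that indexes the array is still compiled in and exercised.

```c
#include <assert.h>

/* NCPUS: compiled-in maximum, fixed at build time. */
#define NCPUS 4

/* Hypothetical per-CPU interrupt priority level, one slot per possible CPU. */
static int curr_ipl_percpu[NCPUS];

/* Number of CPUs actually started (1 under QEMU -smp 1); hypothetical. */
static int ncpus_online = 1;

/* Hypothetical stand-in for cpu_number(); always CPU 0 in this sketch. */
static int cpu_number_model(void) { return 0; }

/* Raise this CPU's IPL to at least new_ipl; return the previous level
 * so the caller can restore it (in the spirit of splhigh()/splx()). */
int raise_ipl(int new_ipl)
{
    int cpu = cpu_number_model();
    int old;

    assert(cpu >= 0 && cpu < ncpus_online && ncpus_online <= NCPUS);
    old = curr_ipl_percpu[cpu];
    if (new_ipl > old)
        curr_ipl_percpu[cpu] = new_ipl;
    return old;
}

/* Restore this CPU's IPL to a value previously returned by raise_ipl(). */
void restore_ipl(int old_ipl)
{
    curr_ipl_percpu[cpu_number_model()] = old_ipl;
}
```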
<damo22>1448 simple_lock(&p->lock);
<damo22>(gdb) p p->lock
<damo22>$1 = {lock_data = 0, is_a_simple_lock = {<No data fields>}}
<luckyluke><damo22> "1448 kmem_free(kernel_map, (..." <- Are you compiling for xen?
<damo22>no that was a gdb code mismatch
<damo22>my code was on the wrong branch when i used gdb that time
<youpi>it will probably be useful to disas the function, to make sure what exactly the compiler understood it should be doing
<Pellescours>damo22: note, building your branch with ncpus=6 generates an error because of a table size, but ncpus=4 builds correctly
<Pellescours>../i386/i386/mp_desc.c:70:1: error: requested alignment ‘24576’ is not a positive power of 2
<Pellescours> 70 | uint8_t solid_intstack[NCPUS*INTSTACK_SIZE] __aligned(NCPUS*INTSTACK_SIZE);
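The arithmetic behind that error, as a small C sketch. It assumes `INTSTACK_SIZE` is 4096 in this build, which follows from 24576 / 6; the helper names are hypothetical. The array at the end shows one possible fix (align to a single stack size, which is a power of two by itself); this is not necessarily what upstream did. The other option, rounding the alignment up with something like `next_pow2()` (32768 here), keeps the whole array aligned to more than its own size. Whether anything in mp_desc.c relies on the larger alignment cannot be told from the log.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: INTSTACK_SIZE is 4096 here, since 24576 / 6 = 4096. */
#define INTSTACK_SIZE 4096u
#define NCPUS 6u

/* 6 * 4096 = 24576 = 0x6000 (two bits set), so it cannot be an alignment;
 * 4 * 4096 = 16384 = 2^14 can, which is why ncpus=4 builds. */
_Static_assert((INTSTACK_SIZE & (INTSTACK_SIZE - 1)) == 0,
               "INTSTACK_SIZE must be a power of two");

int is_pow2(uint32_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Smallest power of two >= x, for 0 < x <= 2^31 (bit-smearing trick). */
uint32_t next_pow2(uint32_t x)
{
    x--;
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return x + 1;
}

/* Hypothetical fix: align the array to one stack size, so each CPU's
 * stack at offset cpu * INTSTACK_SIZE is still INTSTACK_SIZE-aligned. */
static uint8_t solid_intstack_demo[NCPUS * INTSTACK_SIZE]
    __attribute__((aligned(INTSTACK_SIZE)));
```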
<Pellescours>I just tried master branch with latest commit from damo22 branch, and the code is blocked in pmap_enter ../i386/intel/pmap.c:2001 it’s trying to get read lock on the pmap
<gnucode>hey hurd people. I am trying to install the Hurd in a T43. wish me luck!
<Pellescours>good luck, nice challenge
<gnucode>Pellescours: I'm already worried. It just created a partition #1 and a partition #5. That seems odd.
<Pellescours>We can’t compile gnumach with -O0, which makes it hard to debug efficiently with gdb.
<gnucode>Pellescours: what is -O0? super quick code?
<Pellescours>youpi: does it help if I say that when I add CFLAGS="-g" to the configure, it unlocks the boot?
<Pellescours>I just disassembled the method pmap_enter, but I don’t really know where to look
<Pellescours>I have the 2 versions disassembled, with -g (good), and normal (bad)
<Pellescours>One thing I see is that with -g it does not inline some functions (I see call instructions that are absent without it, and the code is much longer)
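One hypothesis consistent with the log, not a diagnosis: gdb shows `lock_data = 0` while the CPU sits in `simple_lock`, and the hang goes away when inlining changes. That pattern can occur when a spin loop reads the lock word through a plain, non-volatile access, so the optimizer loads it once and spins on a stale value. The sketch below is a generic C11 spinlock built on `atomic_flag` with acquire/release ordering, where every iteration performs a real atomic read-modify-write. gnumach's i386 locks are implemented differently, and whether they are affected is not established here; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Risky pattern with a plain int lock word:
 *     while (l->lock_data) ;
 * may be compiled as a single load followed by an endless loop. */

typedef struct {
    atomic_flag flag;
} spin_lock_t;

#define SPIN_LOCK_INIT { ATOMIC_FLAG_INIT }

void spin_lock(spin_lock_t *l)
{
    /* acquire: accesses inside the critical section cannot move above this */
    while (atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire))
        ;
}

bool spin_try_lock(spin_lock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire);
}

void spin_unlock(spin_lock_t *l)
{
    /* release: stores made while holding the lock are visible before it is free */
    atomic_flag_clear_explicit(&l->flag, memory_order_release);
}

static spin_lock_t demo_spin = SPIN_LOCK_INIT;
```

Comparing the two pastebinned disassemblies for a loop that never reloads the lock word from memory would be one way to confirm or rule this out.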
* DiffieHellman likes combining -O3 with -ggdb.
<DiffieHellman>All the optimisations, except with debug symbols :3
<DiffieHellman>You should probably combine -Og with -g if you are looking to debug without much optimisation.
<youpi>Pellescours: you can just pastebin them so people can get a look
<Pellescours>the good one: https://pastebin.com/jFMAs3ut
<Pellescours>the bad one: https://pastebin.com/0Ua8Rw9D
<gnucode>are there any debian GNU/Hurd mirrors in the U.S. ?
<gnucode>the installer is only trying to download from a mirror in the netherlands, and it is saying that it failed.
<gnucode>well I guess I am going to continue installing without specifying a download mirror...
<gnucode>oh, that's right. I am using the netinstall, and that warned me that I would have issues downloading anything other than the base system.
<gnucode>so that makes sense.
<gnucode>well, I just rebooted the T43. and it works!
<gnucode>I am actually running the Hurd on real hardware! awesome!
<Pellescours>gnucode: yaaayy, rumpdisk or linux driver?
<gnucode>I just did whatever the installer had me use. So probably the linux driver.
<gnucode>This hard drive only has 40GB on it. :)
<gnucode>max RAM is 2GB
<gnucode>and it has network connectivity. that's cool.
<Pellescours>Is it normal for an object to be unlocked without being locked first? https://git.savannah.gnu.org/cgit/hurd/gnumach.git/tree/vm/vm_page.c#n1038
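In Mach-derived code an unlock with no matching lock in the same function is not necessarily a bug: functions are often documented as "called with the object locked" and drop the lock before returning, so the matching lock is in the caller. Whether the vm_page.c line above is such a case depends on its callers. The sketch below only illustrates the convention; all names are hypothetical and a flag stands in for the object's simple lock.

```c
#include <assert.h>

/* Hypothetical object with a flag standing in for its simple lock. */
struct object_model {
    int lock_held;
    int ref_count;
};

static void obj_lock(struct object_model *o)
{
    assert(!o->lock_held);      /* model of a non-recursive lock */
    o->lock_held = 1;
}

static void obj_unlock(struct object_model *o)
{
    assert(o->lock_held);       /* unlocking a free lock is a real bug */
    o->lock_held = 0;
}

/* Conditions: object locked on entry; returns with it unlocked.
 * The unlock here has no matching lock in this function: it pairs
 * with the caller's obj_lock(). Returns the remaining reference count. */
static int obj_release_locked(struct object_model *o)
{
    int remaining;

    assert(o->lock_held);       /* enforce the documented entry condition */
    remaining = --o->ref_count;
    obj_unlock(o);
    return remaining;
}

int obj_drop_ref(struct object_model *o)
{
    obj_lock(o);
    return obj_release_locked(o);
}

static struct object_model demo_obj = { 0, 2 };
```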
<Pellescours>youpi: with your last commit on gnumach I’m not able to build it with ncpus=1 it says
<Pellescours>../ipc/ipc_port.c:60:22: error: expected declaration specifiers or ‘...’ before ‘,’ token
<Pellescours> 60 | def_simple_lock_data(, ipc_port_multiple_lock_data)
<Pellescours>    |                      ^
<Pellescours>../ipc/ipc_port.c:60:24: error: expected declaration specifiers or ‘...’ before ‘ipc_port_multiple_lock_data’
<Pellescours> 60 | def_simple_lock_data(, ipc_port_multiple_lock_data)
<Pellescours>    |                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~
<Pellescours>oh, bad paste, sorry
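Those errors are what GCC typically prints when `def_simple_lock_data(...)` is not a macro at all, so the line is parsed as a function declaration with a missing parameter type. That suggests, but does not prove, that the macro is only defined in the NCPUS > 1 branch. The sketch below, with a hypothetical lock type and helper names, shows the general rule: every macro that generic code uses must be defined in both branches, even when the uniprocessor version expands to nothing. An empty first macro argument, as in the ipc_port.c line, is valid since C99.

```c
#include <assert.h>

#ifndef NCPUS
#define NCPUS 1                 /* the failing configuration */
#endif

#if NCPUS > 1
/* Hypothetical multiprocessor definitions. */
typedef struct { int lock_data; } simple_lock_data_t;
#define def_simple_lock_data(class, name) class simple_lock_data_t name;
#define simple_lock_init(l) ((l)->lock_data = 0)
#else
/* Uniprocessor build: the locks compile away, but the macros still exist,
 * so generic code using them keeps compiling. */
#define def_simple_lock_data(class, name)
#define simple_lock_init(l) ((void)0)
#endif

/* Same shape as the ipc_port.c line: empty storage-class argument. */
def_simple_lock_data(, ipc_port_multiple_lock_data)

/* Returns 1 if real locks were compiled in, 0 for the uniprocessor build. */
int init_port_locks(void)
{
    simple_lock_init(&ipc_port_multiple_lock_data);
    return NCPUS > 1;
}
```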