IRC channel logs

2024-02-19.log

back to list of logs

<youpi>damo22: better use simple_lock_irq() rather than spl + simple_lock_nocheck
<youpi>(e.g. about the quantum adjustment)
<youpi>and concerning functions that assume running at splsched and then use simple_lock_nocheck, add a call to assert_splsched() so we are sure of this
<youpi>damo22: the db_spl variable looks bogus, you probably need one per cpu
<youpi>ah, no, it's actually protected by the lock itself
<youpi>I have applied part of your branch
<damo22>:)
<damo22>youpi: i have updated my branch
<solid_black>hi
<saravia>hi solid_black :D
<damo22>gday
<saravia>hi damo22 :D
<damo22>whats up?
<damo22>solid_black: any ideas why my test is locking up?
<damo22>i changed it so it only uses omp to spawn threads:
<damo22> 65 ./test.inf(817) (f6101100): 8 threads:
<damo22> 0 (f6472da8) .W..NF 0xf5f2c8f8
<damo22> 1 (f6495220) .W..N. 0xf5f2c8f8
<damo22> 2 (f64953a0) .W..N. 0xf5f2c8f8
<damo22> 3 (f64956a0) .W..N. 0xf5f2c8f8
<damo22> 4 (f5fc2dc0) .W..N. 0xf5f2c8f8
<damo22> 5 (f5fc2640) .W..N. 0xf5f2c8f8
<damo22> 6 (f5f68558) .W..N. 0xf5f2c8f8
<damo22> 7 (f5f68258) .W..N. 0xf5f2c8f8
<damo22>it still blocks on gsync_wait
<damo22>printf seems to call gsync_wait
<damo22>should i remove the print calls?
<solid_black>damo22: yes, I've tried that yesterday (#pragma omp parallel + no printf) on Samuel's advice, and it still locks up, but this time not inside gsync
<solid_black>it's inside the VM subsystem
<damo22>oh man
<solid_black>and the way it locks up makes me think the locking implementation is broken somehow
<damo22>the sleep locks have a comment there that says this works on UP
<solid_black>then I looked at kern/lock.c, but couldn't find what was wrong
<solid_black>specifically, what happens is, multiple threads get stuck inside mach_port_names() -> vm_map_pageable(ipc_kernel_map, ...)
<damo22> * Sleep locks. These use the same data structure and algorithm
<damo22> * as the spin locks, but the process sleeps while it is waiting
<damo22> * for the lock. These work on uniprocessor systems.
<solid_black>and as you can see, vm_map_pageable() starts by calling vm_map_lock(map); (if lock_map argument is true, which it is in this case)
<solid_black>...oh no no no, wait
<solid_black>it does take the other if branch inside vm_map_pageable_scan() of course!
<solid_black>let me thing what that means for a minute
<solid_black>yeah, that's fine
<solid_black>still, it locks up, with all the threads waiting for the lock
<solid_black>let me see if I can inspect this from GDB actually
<solid_black>how do I enable MACH_LDEBUG?
<damo22>you set it to 1 in configfrag.ac
<damo22>but it doesnt all work properly yet
<solid_black>so by just patching the .ac, there's no user-accessible --enable-ldebug option or something like that?
<damo22>it still triggers some in_interrupt assert
<solid_black>I'll patch that assert out
<damo22>ok
<damo22>yea theres no variable for it
<damo22>is the vm_map lock supposed to be a sleep lock?
<damo22>or just a spin lock
<damo22>with write/read etc
<damo22>we can set can_sleep = 0
<damo22>maybe that will fix it
<solid_black>it is a sleep lock, yes
<solid_black>the issue is not that the VM subsystem is using locks improperly, it seems to be somehing wrong with the locks themselves
<solid_black>if you make the VM subsystem not use sleeping locks, the issue will reproduce somewhere else that uses sleeping locks
<solid_black>we better find the root cause
<damo22>ok
<solid_black>ok, rebuilt with MACH_LDEBUG, let's see how it fares
<damo22>nothing else uses sleep locks
<solid_black>huh? from a quick glance, i386/intel/pmap.c and ipc/ipc_space.h also do
<damo22>ah yes
<damo22>ipc_space.h does
<solid_black>ok, repro'ed with MACH_LDEBUG, let's see
<solid_black>how do I enter KDB again?
<damo22>pmap does lock_init(&pmap_system_lock, FALSE); /* NOT a sleep lock */
<damo22>ctrl shift d
<solid_black>thx
<damo22>or is it alt
<damo22>ctrl alt d
<damo22>sorry yes
<solid_black>(gdb) p ipc_kernel_map->lock
<solid_black>$4 = {thread = 0xffffffff, read_count = 0, want_upgrade = 0, want_write = 1, waiting = 1, can_sleep = 1, recursion_depth = 0, writer = 0xf6484de8, interlock = {
<solid_black> lock_data = 0, is_a_simple_lock = {<No data fields>}}}
<solid_black>so nobody holds it for reading, and there is a single writer
<solid_black>which is one of the threads of this same task, sure
<solid_black>actually no, it's not
<solid_black>yes, yes it, is, ok
<solid_black>and this writer thread is stuck inside vm_fault_unwire
<solid_black>let me look at that
<solid_black>but yes, it makes sense that it would hold a write lock a this point
<solid_black>and that calls vm_fault_page, and that calls thread_block, with the VM map lock still held
<solid_black>let me try to understand why it does that
<damo22>im glad you can debug this, i cant get gdb to attach
<solid_black>huh? you just pass -s to qemu, and then "gdb ./gnumach", "tar ext :1234"
<damo22>uhhh, thats for debugging gnumach
<solid_black>(a tiny bit harder over here on aarch64, since we're trying to do PIC)
<solid_black>yes, it's gnumach we're interested in
<solid_black>userland is fine
<damo22>oh rihgt
<solid_black>you *can* debug userland the same way, but that's much less convenient
<solid_black>but that's how I brought up Hurd userland on x86_64, by debugging userland startup using gdb attached to qemu
<solid_black>so an interesting thing is,
<damo22>ok
<solid_black>(gdb) p ipc_kernel_map->lock->writer->event_key
<solid_black>$8 = (event_t) 0x20009
<solid_black>but that's not a valid kernel address, certainly not one of a vm_page_t
<damo22>it uses a hash not an address?
<solid_black>good point
<solid_black>no, it doesn't, look at kern/sched_prim.c:assert_wait(): index = wait_hash(event), but thread->wait_event = event
<solid_black>and it is an event_t, too
<damo22>#define wait_hash(event) \
<damo22> ((((long)(event) < 0) ? ~(long)(event) : (long)(event)) % NUMQUEUES)
<solid_black>ah, it's ->event_key, and I thought I was looking at ->wait_event
<solid_black>(gdb) p *(vm_page_t) ipc_kernel_map->lock->writer->wait_event
<solid_black>$13 = {node = {prev = 0xc109b110 <vm_page_segs+1936>, next = 0xf7f21208}, priv = 0x0, phys_addr = 1945075712, listq = {next = 0xf64f4168, prev = 0xf64f4168},
<solid_black> next = 0x0, vm_page_header = {<No data fields>}, object = 0xf64f4168, offset = 0, wire_count = 1, inactive = 0, active = 0, laundry = 0, external_laundry = 0,
<solid_black> free = 0, reference = 0, external = 0, busy = 1, wanted = 1, tabled = 1, fictitious = 0, private = 0, absent = 0, error = 0, dirty = 0, precious = 0,
<solid_black> overwriting = 0, page_lock = 0, unlock_request = 0, vm_page_footer = {<No data fields>}, type = 3, seg_index = 2, order = 12}
<solid_black>it does have busy = 1, which is why vm_fault_page decides to block
<damo22>solid_black: how many threads did you make it run
<solid_black>8
<damo22>maybe try with 3
<solid_black>+ the implicit msg thread
<solid_black>why? the point was to reproduce it, and we did
<damo22>sure, but now you need to follow up 8 threads
<damo22>if it deadlocks with just 3 or 4 thats less to debug
<solid_black>ok, let me try to describe my current understanding of the deadlock
<solid_black>multiple threads call vm_map_pageable on the same map (ipc_kernel_map) concurrently
<solid_black>since it's a kernel map, the "HACK HACK HACK HACK" logic in vm_map_pageable_scan() fully unlocks the map before calling into vm_fault_wire()
<solid_black>which lets the other threads grab the same lock and do the same thing, and so on
<solid_black>now, one of the threads is *unwiring* the pages, not wiring them in (evidently, it has already progressed further inside mach_port_names)
<solid_black>unwiring is done with the full map lock still held
<solid_black>so the other threads are waiting on this one
<solid_black>but, one of the other threads must have gotten to this same vm_page first, and set its busy bit
<solid_black>now it's sleeping in vm_fault() -> vm_map_verify(), waiting for the lock
<solid_black>and the writer thread holds the lock, and is sleeping, waiting for the busy bit on the page to go away
<solid_black>the question would be, why are two different threads trying to wire/unwire the same vm_page?
<damo22>probably the vm_deallocate in my test
<damo22>it requests memory and then frees it
<solid_black>no, I mean, whatever userland does, should not cause a kernel deadlock
<solid_black>also this is not vm_deallocate, all of these threads are inside mach_port_names
<damo22>ok
<solid_black>please stop questioning userspace, the userpsace is fine, it's a kernel bug
<damo22>im not questioning userspace, im trying to give you a reason why two threads are wiring/unwiring the same page
<solid_black>vm_deallocate doesn't wire/unwire anyway
<solid_black>can I view local variables with KDB in any way?
<damo22>globals you can
<youpi>you can inspect the stack by hand
<solid_black>globals, I see in GDB too
<youpi>if you know where the locals are :)
<solid_black>oh hi youpi
<solid_black>does the above ^^^ make sense to you?
<youpi>just passing by, I'm going to work
<youpi>(and I don't really know that code so I guess you now know more about it than I do)
<damo22>solid_black: is there any lock needed to set the busy bit?
<solid_black>yes, you must hold the lock for the VM object the page belongs to
<damo22>so are two or more threads referencing the same VM object?
<damo22>only one of them can get the lock for that
<solid_black>but the lock is not locked at the time of the deadlock
<solid_black>you lock the lock, set the busy bit, unlock the lock
<damo22>ok
<solid_black>why does unwiring even need the full write lock?
<solid_black>the comment in vm_map.c says "Note that unwiring faults can be performed while holding a write lock on the map. A wiring fault can only be done with a read lock."
<damo22>what is unwiring ? is it freeing memory from the vm subsystem?
<solid_black>the "We must do this in two passes" comment below describes precisely the situation we have, except that it's talking about wiring, and we have unwiring
<solid_black>wiring/unwiring refers to marking vm_pages as pageable or not
<solid_black>if a page is, it cannot be paged/swapped out
<solid_black>if a page is _wired_
<solid_black>if it's not wired, it can be paged out
<solid_black>or be absent in the first place, and created on first access
<solid_black>an example of where wiring is needed is for the code of bootstrap tasks, such as ext2fs
<solid_black>or rumpdisk I guess
<damo22>so if you are unwiring memory, you need the write lock because you need to be the only writer of that page to remove it
<damo22>otherwise someone else could be using it?
<solid_black>1. unwiring doesn't actually mean paging out the page here and now, it just means clearing the "can not be paged out" state
<solid_black>2. the issue is with locking *the vm map*, not an individual page
<damo22>i see
<solid_black>I mean, the page too, via the busy bit
<solid_black>but the talk about read/write locking referes to the map
<damo22>why do we need to lock the whole map?
<solid_black>that's the question exactly :)
<solid_black>open vm_map.c and look for the comment that starts with "We must do this in two passes:"
<solid_black>it describes a potential deadlock similar to ours
<solid_black>except it talks about wiring, not unwiring
<solid_black>hm, but the case it describes is actually different, I think
<solid_black>they talk about another thread faulting on the page that is to be wired (and is not present, otherwise it wouldn't fault)
<solid_black>but our page is still wired, and I don't see any thread faulting on it
<solid_black>it's busy for some other reason, as if another thread is trying to wire it back indeed
<damo22>what if kernel threads are not well behaved, due to ipc happening so fast
<damo22>so the HACK thing might be broken
<damo22>is ipc multithreaded?
<solid_black>everything is multithreaded, yes
<damo22>inside gnumach?
<solid_black>yes
<damo22>so its possible there are multiple threads doing ipc concurrently
<solid_black>the point that "HACK HACK HACK HACK" makes about the kernel threads being well behaved
<solid_black>is not that multiple threads wouldn't access the ipc_kernel_map concurrently; that's fine and they do that indeed (including in our case), because of the locking
<solid_black>it's the kernel code won't "do anything destructive to this region of the map"
<damo22>but what if one ipc thread tries to write to the same region of the vm map just as the lock is released in HACK?
<solid_black>why would it do that? that's the point of kernel code being well-behaved
<solid_black>, that it doesn't do that
<solid_black>each mach_port_names() invocation only accesses the region of virtual memory that it vm_allocate'd for itself in the ipc_kernel_map
<solid_black>and not others' regions
<damo22>aha
<damo22>that makes more sense
<solid_black>whereas you can't trust userland to do the same
<solid_black>userland could try to wire a region from one thread, and deallocate it from another
<damo22>yup
<damo22>so where is unwiring
<solid_black>hm, is it possible to wire pages using the non-priv host port? that's sounds like an easy DoS
<solid_black>not that Mach lacks ways to DoS the system...
<damo22>lets focus on the deadlock
<youpi>solid_black: yes but there's a limit per task
<solid_black>ok
<damo22>could it be an interaction with the two sleep lock invocations?
<damo22>ipc one and vm one
<damo22>ie, too much stuff goes to sleep and never wakes
<solid_black>so fwiw, the thread doing *un*wiring calls vm_map_pageable() for a single page
<solid_black>and it's that single page that has the busy bit set
<solid_black>(off to lunch)
<gnu_srs2>youpi: #ifdef _IOT_ifreq_int #error ok #endif; #ifdef _IOT_ifreq_short #error ok #endif: 954 | #error ok; 957 | #error ok
<gnu_srs2>Still: if-hurd.c:959:23: error: invalid application of ‘sizeof’ to incomplete type ‘struct ifreq_int’. 959 | if (ioctl(fd, SIOCGIFNAME, &ifr) == -1)
<gnu_srs2>struct ifreq ifr;
<youpi>you didn't define _IOT_ifreq_int with _IOTS or _IOT_SIMPLE, right?
<youpi>you could add
<youpi>#define _IOT_ifreq_int foobar
<youpi>to make the preprocessor complain that it's already defined, and tell you where it was defined
<youpi>to be sure exactly what definition it is taking
<gnu_srs2>/usr/include/net/if.h:168: note: this is the location of the previous definition; 168 | # define _IOT_ifreq_int _IOT(_IOTS(char),IFNAMSIZ,_IOTS(int),1,0,0)
<damo22> https://git.sceen.net/rbraun/x15.git/tree/kern/thread.c#n2561
<youpi>gnu_srs2: I'm puzzled. Maybe do the same with the other macros: #define SIOCGIFNAME foobar #define _IOWR(g,n,t) foobar #define _IOC_ENCODE_TYPE(t) foobar #define _IOTBASE_struct foobar
<damo22>i think i wrote that IOT
<youpi>which IOT ?
<damo22>(09:10:39 PM) gnu_srs2: /usr/include/net/if.h:168: note: this is the location of the previous definition; 168 | # define _IOT_ifreq_int _IOT(_IOTS(char),IFNAMSIZ,_IOTS(int),1,0,0)
<youpi>da0debaa44d (Ulrich Drepper 2001-06-26 04:59:41 +0000 169)# define _IOT_ifreq_int _IOT(_IOTS(char),IFNAMSIZ,_IOTS(int),1,0,0)
<youpi>(patch actually by Mark Kettenis <kettenis@wins.uva.nl>)
<damo22>i wrote something similar then
<damo22>for the routing
<youpi>for the ifrtreq probably
<damo22>ea
<damo22>yeah
<gnu_srs2>/usr/include/i386-gnu/bits/ioctls.h: #define _IOTS(type) (sizeof (type) == 8 ? IOC_64 : (enum __ioctl_datum) (sizeof (type) >> 1))
<youpi>gnu_srs2: yes, but that's not supposed to be called on struct ifreq_int, that's the point
<gnu_srs2>/usr/include/i386-gnu/device/input.h: 43 | #define _IOWR(g,n,t) _IOC(IOC_INOUT, (g), (n), sizeof(t))
<youpi>precisely because we have _IOT_ifreq_int defined
<youpi>aha
<youpi>that's the problem
<youpi>it's not using the right _IOWR
<youpi>how is this header included ?
<gnu_srs2>#include <device/input.h>
<youpi>ok, we need to scratch that
<youpi>(the spurious _IOWR declaration from input.h)
<damo22>sleep locks in hurd are recursive, and "braunr: if the recursion level is greater than 1, unlocking doesn't mean you can sleep"
<damo22>so theres a problem with our sleep locking code for smp
<youpi>gnu_srs2: so for now, just drop that _IOWR definition from input.h
<gnu_srs2>/usr/include/i386-gnu/bits/ioctls.h:#define _IOWR(g, n, t) _IOC (IOC_INOUT, (g), (n), _IOC_ENCODE_TYPE (t)) and
<gnu_srs2>/usr/include/i386-gnu/device/input.h:#define _IOWR(g,n,t) _IOC(IOC_INOUT, (g), (n), sizeof(t))
<solid_black>ACTION is back
<youpi>gnu_srs2: yes it's only the former that is correct
<youpi>the latter is spurious and we need to stop shipping it to userspace
<gnu_srs2>Should I remove inclusion of device/input.h or edit the header file?
<youpi>no, edit the header file
<gnu_srs2>OK
<youpi>as I said: so for now, just drop that _IOWR definition from input.h
<solid_black>damo22: you mean Mach, not Hurd; also sleeping locks are recursive only when you enable that explicitly (lock_set_recursive)
<youpi>gnu_srs2: though I'm wondering why you are including device/input.h ?
<damo22>solid_black: aha youre right
<damo22>solid_black: vm_map does use recursive sleep lokcs
<solid_black>it does, yes
<solid_black>but that doesn't seem to be the issue here
<solid_black>since due to the HACK HACK HACK, it just unlocks the map, instead of making the lock recursive
<solid_black>I really want to inspect another thread's local variables, how do I do that? :|
<damo22>thread 2
<damo22>frame x
<youpi>solid_black: as I said, you can inspect the stack by hand
<solid_black>no, that switch CPU cores
<youpi>in kdb
<solid_black>yes, if I knew the addresses
<youpi>you mean if you knew %sp ?
<youpi>or if you knew the offset from %sp ?
<solid_black>sp I could probably find
<solid_black>the offset from sp, yes
<youpi>(simpler through bp, actually)
<youpi>for that you can just look at the code
<youpi>ask gdb for the asm for an instruction that accesses the variable
<youpi>and you can see what offset it uses from bp
<solid_black>in any case this is not inside the bottommost frame (that one is switch_context)
<gnu_srs2>youpi: I don't really remember. Not including that header file any more!
<solid_black>could I, for instance, forcibly switch to another thread from gdb?
<youpi>solid_black: that'll be more involved, but you can go through the bp chain, to get the bp of the function you're interested in
<solid_black>I guess I could, I just need to restore %esp and %eip
<youpi>that's possible, I had done such a thing during my phd
<youpi>you'll need to restore %ebp as well
<youpi>otherwise stack unwinding will get lost
<solid_black>yes
<youpi>also, local variable may be in registers
<youpi>then you need to restore all registers
<solid_black>it can't possibly be in regsiters, exactly because we've context-switched to another thread, so it must be spilled somewhere
<youpi>there are callee-saved registers
<solid_black>either in the same frame, or inside switch_context if it's callee-saved
<youpi>that's what I mean
<solid_black>ok
<youpi>you need to restore the registers from the context switch
<youpi>ok, during my phd we were just restoring ebp, esp, eip
<youpi>that wasn't bringing correct values for local variables in registers
<youpi>but in my memory we were getting good backtraces
<solid_black>maybe it'd be easier to instrument the code to add printfs, and hope it still reproduces
<youpi>possibly
<youpi>what I sometimes do is writing at 0xb8000
<damo22>putchar uses gsync_wait
<youpi>damo22: I guess he meant kernel printf
<damo22>ok
<youpi>writing to 0xb8000 usually doesn't hide heisenbugs
<solid_black>it really is another thread, inside vm_fault_wire(), marking this page as busy
<solid_black>the same page that is being unwired by this thread who holds the lock on the map
<solid_black>the two threads use different virtual addresses to do that
<solid_black>and the two pages at these addresses appear to have different contents, so it's not the same physical page
<solid_black>and yet it seems to be the same vp_page
<solid_black>what gives?
<solid_black>the physical page appears to be the one that is accessible through vm_fault_unwire()'s address (good)
<solid_black>huh? now I seemingly have two threads actually doing vm_fault_wire() / vm_fault_unwire() on the exact same address range
<solid_black>that is certainly not "well-behaved"
<solid_black>one of these two threads is doing alternating pairs of vm_fault_wire/vm_faul_unwire, as expected
<solid_black>but the other is only doing vm_fault_wire()
<solid_black>???
<youpi>solid_black: could it be that the vm_allocate part isn't properly locked and thus the two threads end up allocating the same memory ?
<solid_black>youpi: yes, that's the only theory I have, and yet vm_map_enter() seems to be doing locking properly
<Pellescours>"