IRC channel logs
2024-02-19.log
back to list of logs
<youpi>damo22: better use simple_lock_irq() rather than spl + simple_lock_nocheck <youpi>(e.g. about the quantum adjustment) <youpi>and concerning functions that assume running at splsched and then use simple_lock_nocheck, add a call to assert_splsched() so we are sure of this <youpi>damo22: the db_spl variable looks bogus, you probably need one per cpu <youpi>ah, no, it's actually protected by the lock itself <youpi>I have applied part of your branch <damo22>solid_black: any ideas why my test is locking up? <damo22>i changed it so it only uses omp to spawn threads: <damo22> 65 ./test.inf(817) (f6101100): 8 threads: <damo22>should i remove the print calls? <solid_black>damo22: yes, I've tried that yesterday (#pragma omp parallel + no printf) on Samuel's advice, and it still locks up, but this time not inside gsync <solid_black>and the way it locks up makes me think the locking implementation is broken somehow <damo22>the sleep locks have a comment there that says this works on UP <solid_black>then I looked at kern/lock.c, but couldn't find what was wrong <solid_black>specifically, what happens is, multiple threads get stuck inside mach_port_names() -> vm_map_pageable(ipc_kernel_map, ...) <damo22> * Sleep locks. These use the same data structure and algorithm <damo22> * as the spin locks, but the process sleeps while it is waiting <damo22> * for the lock. These work on uniprocessor systems. <solid_black>and as you can see, vm_map_pageable() starts by calling vm_map_lock(map); (if lock_map argument is true, which it is in this case) <solid_black>it does take the other if branch inside vm_map_pageable_scan() of course! <solid_black>still, it locks up, with all the threads waiting for the lock <damo22>you set it to 1 in configfrag.ac <damo22>but it doesnt all work properly yet <solid_black>so by just patching the .ac, there's no user-accessible --enable-ldebug option or something like that? <damo22>it still triggers some in_interrupt assert <damo22>is the vm_map lock supposed to be a sleep lock? <solid_black>the issue is not that the VM subsystem is using locks improperly, it seems to be somehing wrong with the locks themselves <solid_black>if you make the VM subsystem not use sleeping locks, the issue will reproduce somewhere else that uses sleeping locks <solid_black>ok, rebuilt with MACH_LDEBUG, let's see how it fares <solid_black>huh? from a quick glance, i386/intel/pmap.c and ipc/ipc_space.h also do <damo22>pmap does lock_init(&pmap_system_lock, FALSE); /* NOT a sleep lock */ <solid_black>$4 = {thread = 0xffffffff, read_count = 0, want_upgrade = 0, want_write = 1, waiting = 1, can_sleep = 1, recursion_depth = 0, writer = 0xf6484de8, interlock = { <solid_black> lock_data = 0, is_a_simple_lock = {<No data fields>}}} <solid_black>so nobody holds it for reading, and there is a single writer <solid_black>and this writer thread is stuck inside vm_fault_unwire <solid_black>but yes, it makes sense that it would hold a write lock a this point <solid_black>and that calls vm_fault_page, and that calls thread_block, with the VM map lock still held <damo22>im glad you can debug this, i cant get gdb to attach <solid_black>huh? you just pass -s to qemu, and then "gdb ./gnumach", "tar ext :1234" <damo22>uhhh, thats for debugging gnumach <solid_black>(a tiny bit harder over here on aarch64, since we're trying to do PIC) <solid_black>you *can* debug userland the same way, but that's much less convenient <solid_black>but that's how I brought up Hurd userland on x86_64, by debugging userland startup using gdb attached to qemu <solid_black>but that's not a valid kernel address, certainly not one of a vm_page_t <solid_black>no, it doesn't, look at kern/sched_prim.c:assert_wait(): index = wait_hash(event), but thread->wait_event = event <damo22> ((((long)(event) < 0) ? ~(long)(event) : (long)(event)) % NUMQUEUES) <solid_black>ah, it's ->event_key, and I thought I was looking at ->wait_event <solid_black>(gdb) p *(vm_page_t) ipc_kernel_map->lock->writer->wait_event <solid_black>$13 = {node = {prev = 0xc109b110 <vm_page_segs+1936>, next = 0xf7f21208}, priv = 0x0, phys_addr = 1945075712, listq = {next = 0xf64f4168, prev = 0xf64f4168}, <solid_black> next = 0x0, vm_page_header = {<No data fields>}, object = 0xf64f4168, offset = 0, wire_count = 1, inactive = 0, active = 0, laundry = 0, external_laundry = 0, <solid_black> free = 0, reference = 0, external = 0, busy = 1, wanted = 1, tabled = 1, fictitious = 0, private = 0, absent = 0, error = 0, dirty = 0, precious = 0, <solid_black> overwriting = 0, page_lock = 0, unlock_request = 0, vm_page_footer = {<No data fields>}, type = 3, seg_index = 2, order = 12} <solid_black>it does have busy = 1, which is why vm_fault_page decides to block <damo22>solid_black: how many threads did you make it run <damo22>sure, but now you need to follow up 8 threads <damo22>if it deadlocks with just 3 or 4 thats less to debug <solid_black>ok, let me try to describe my current understanding of the deadlock <solid_black>multiple threads call vm_map_pageable on the same map (ipc_kernel_map) concurrently <solid_black>since it's a kernel map, the "HACK HACK HACK HACK" logic in vm_map_pageable_scan() fully unlocks the map before calling into vm_fault_wire() <solid_black>which lets the other threads grab the same lock and do the same thing, and so on <solid_black>now, one of the threads is *unwiring* the pages, not wiring them in (evidently, it has already progressed further inside mach_port_names) <solid_black>but, one of the other threads must have gotten to this same vm_page first, and set its busy bit <solid_black>now it's sleeping in vm_fault() -> vm_map_verify(), waiting for the lock <solid_black>and the writer thread holds the lock, and is sleeping, waiting for the busy bit on the page to go away <solid_black>the question would be, why are two different threads trying to wire/unwire the same vm_page? <damo22>probably the vm_deallocate in my test <damo22>it requests memory and then frees it <solid_black>no, I mean, whatever userland does, should not cause a kernel deadlock <solid_black>also this is not vm_deallocate, all of these threads are inside mach_port_names <solid_black>please stop questioning userspace, the userpsace is fine, it's a kernel bug <damo22>im not questioning userspace, im trying to give you a reason why two threads are wiring/unwiring the same page <youpi>you can inspect the stack by hand <youpi>if you know where the locals are :) <youpi>just passing by, I'm going to work <youpi>(and I don't really know that code so I guess you now know more about it than I do) <damo22>solid_black: is there any lock needed to set the busy bit? <solid_black>yes, you must hold the lock for the VM object the page belongs to <damo22>so are two or more threads referencing the same VM object? <damo22>only one of them can get the lock for that <solid_black>but the lock is not locked at the time of the deadlock <solid_black>you lock the lock, set the busy bit, unlock the lock <solid_black>the comment in vm_map.c says "Note that unwiring faults can be performed while holding a write lock on the map. A wiring fault can only be done with a read lock." <damo22>what is unwiring ? is it freeing memory from the vm subsystem? <solid_black>the "We must do this in two passes" comment below describes precisely the situation we have, except that it's talking about wiring, and we have unwiring <solid_black>wiring/unwiring refers to marking vm_pages as pageable or not <solid_black>or be absent in the first place, and created on first access <solid_black>an example of where wiring is needed is for the code of bootstrap tasks, such as ext2fs <damo22>so if you are unwiring memory, you need the write lock because you need to be the only writer of that page to remove it <damo22>otherwise someone else could be using it? <solid_black>1. unwiring doesn't actually mean paging out the page here and now, it just means clearing the "can not be paged out" state <solid_black>2. the issue is with locking *the vm map*, not an individual page <solid_black>but the talk about read/write locking referes to the map <damo22>why do we need to lock the whole map? <solid_black>open vm_map.c and look for the comment that starts with "We must do this in two passes:" <solid_black>hm, but the case it describes is actually different, I think <solid_black>they talk about another thread faulting on the page that is to be wired (and is not present, otherwise it wouldn't fault) <solid_black>but our page is still wired, and I don't see any thread faulting on it <solid_black>it's busy for some other reason, as if another thread is trying to wire it back indeed <damo22>what if kernel threads are not well behaved, due to ipc happening so fast <damo22>so the HACK thing might be broken <damo22>so its possible there are multiple threads doing ipc concurrently <solid_black>the point that "HACK HACK HACK HACK" makes about the kernel threads being well behaved <solid_black>is not that multiple threads wouldn't access the ipc_kernel_map concurrently; that's fine and they do that indeed (including in our case), because of the locking <solid_black>it's the kernel code won't "do anything destructive to this region of the map" <damo22>but what if one ipc thread tries to write to the same region of the vm map just as the lock is released in HACK? <solid_black>why would it do that? that's the point of kernel code being well-behaved <solid_black>each mach_port_names() invocation only accesses the region of virtual memory that it vm_allocate'd for itself in the ipc_kernel_map <solid_black>userland could try to wire a region from one thread, and deallocate it from another <solid_black>hm, is it possible to wire pages using the non-priv host port? that's sounds like an easy DoS <youpi>solid_black: yes but there's a limit per task <damo22>could it be an interaction with the two sleep lock invocations? <damo22>ie, too much stuff goes to sleep and never wakes <solid_black>so fwiw, the thread doing *un*wiring calls vm_map_pageable() for a single page <gnu_srs2>youpi: #ifdef _IOT_ifreq_int #error ok #endif; #ifdef _IOT_ifreq_short #error ok #endif: 954 | #error ok; 957 | #error ok <gnu_srs2>Still: if-hurd.c:959:23: error: invalid application of ‘sizeof’ to incomplete type ‘struct ifreq_int’. 959 | if (ioctl(fd, SIOCGIFNAME, &ifr) == -1) <youpi>you didn't define _IOT_ifreq_int with _IOTS or _IOT_SIMPLE, right? <youpi>#define _IOT_ifreq_int foobar <youpi>to make the preprocessor complain that it's already defined, and tell you where it was defined <youpi>to be sure exactly what definition it is taking <gnu_srs2>/usr/include/net/if.h:168: note: this is the location of the previous definition; 168 | # define _IOT_ifreq_int _IOT(_IOTS(char),IFNAMSIZ,_IOTS(int),1,0,0) <youpi>gnu_srs2: I'm puzzled. Maybe do the same with the other macros: #define SIOCGIFNAME foobar #define _IOWR(g,n,t) foobar #define _IOC_ENCODE_TYPE(t) foobar #define _IOTBASE_struct foobar <damo22>(09:10:39 PM) gnu_srs2: /usr/include/net/if.h:168: note: this is the location of the previous definition; 168 | # define _IOT_ifreq_int _IOT(_IOTS(char),IFNAMSIZ,_IOTS(int),1,0,0) <youpi>da0debaa44d (Ulrich Drepper 2001-06-26 04:59:41 +0000 169)# define _IOT_ifreq_int _IOT(_IOTS(char),IFNAMSIZ,_IOTS(int),1,0,0) <youpi>(patch actually by Mark Kettenis <kettenis@wins.uva.nl>) <gnu_srs2>/usr/include/i386-gnu/bits/ioctls.h: #define _IOTS(type) (sizeof (type) == 8 ? IOC_64 : (enum __ioctl_datum) (sizeof (type) >> 1)) <youpi>gnu_srs2: yes, but that's not supposed to be called on struct ifreq_int, that's the point <gnu_srs2>/usr/include/i386-gnu/device/input.h: 43 | #define _IOWR(g,n,t) _IOC(IOC_INOUT, (g), (n), sizeof(t)) <youpi>precisely because we have _IOT_ifreq_int defined <youpi>it's not using the right _IOWR <youpi>how is this header included ? <youpi>(the spurious _IOWR declaration from input.h) <damo22>sleep locks in hurd are recursive, and "braunr: if the recursion level is greater than 1, unlocking doesn't mean you can sleep" <damo22>so theres a problem with our sleep locking code for smp <youpi>gnu_srs2: so for now, just drop that _IOWR definition from input.h <gnu_srs2>/usr/include/i386-gnu/bits/ioctls.h:#define _IOWR(g, n, t) _IOC (IOC_INOUT, (g), (n), _IOC_ENCODE_TYPE (t)) and <gnu_srs2>/usr/include/i386-gnu/device/input.h:#define _IOWR(g,n,t) _IOC(IOC_INOUT, (g), (n), sizeof(t)) <youpi>gnu_srs2: yes it's only the former that is correct <youpi>the latter is spurious and we need to stop shipping it to userspace <gnu_srs2>Should I remove inclusion of device/input.h or edit the header file? <youpi>as I said: so for now, just drop that _IOWR definition from input.h <solid_black>damo22: you mean Mach, not Hurd; also sleeping locks are recursive only when you enable that explicitly (lock_set_recursive) <youpi>gnu_srs2: though I'm wondering why you are including device/input.h ? <damo22>solid_black: vm_map does use recursive sleep lokcs <solid_black>since due to the HACK HACK HACK, it just unlocks the map, instead of making the lock recursive <solid_black>I really want to inspect another thread's local variables, how do I do that? :| <youpi>solid_black: as I said, you can inspect the stack by hand <youpi>or if you knew the offset from %sp ? <youpi>(simpler through bp, actually) <youpi>for that you can just look at the code <youpi>ask gdb for the asm for an instruction that accesses the variable <youpi>and you can see what offset it uses from bp <solid_black>in any case this is not inside the bottommost frame (that one is switch_context) <gnu_srs2>youpi: I don't really remember. Not including that header file any more! <solid_black>could I, for instance, forcibly switch to another thread from gdb? <youpi>solid_black: that'll be more involved, but you can go through the bp chain, to get the bp of the function you're interested in <solid_black>I guess I could, I just need to restore %esp and %eip <youpi>that's possible, I had done such a thing during my phd <youpi>you'll need to restore %ebp as well <youpi>otherwise stack unwinding will get lost <youpi>also, local variable may be in registers <youpi>then you need to restore all registers <solid_black>it can't possibly be in regsiters, exactly because we've context-switched to another thread, so it must be spilled somewhere <youpi>there are callee-saved registers <solid_black>either in the same frame, or inside switch_context if it's callee-saved <youpi>you need to restore the registers from the context switch <youpi>ok, during my phd we were just restoring ebp, esp, eip <youpi>that wasn't bringing correct values for local variables in registers <youpi>but in my memory we were getting good backtraces <solid_black>maybe it'd be easier to instrument the code to add printfs, and hope it still reproduces <youpi>what I sometimes do is writing at 0xb8000 <youpi>damo22: I guess he meant kernel printf <youpi>writing to 0xb8000 usually doesn't hide heisenbugs <solid_black>it really is another thread, inside vm_fault_wire(), marking this page as busy <solid_black>the same page that is being unwired by this thread who holds the lock on the map <solid_black>the two threads use different virtual addresses to do that <solid_black>and the two pages at these addresses appear to have different contents, so it's not the same physical page <solid_black>the physical page appears to be the one that is accessible through vm_fault_unwire()'s address (good) <solid_black>huh? now I seemingly have two threads actually doing vm_fault_wire() / vm_fault_unwire() on the exact same address range <solid_black>one of these two threads is doing alternating pairs of vm_fault_wire/vm_faul_unwire, as expected <youpi>solid_black: could it be that the vm_allocate part isn't properly locked and thus the two threads end up allocating the same memory ? <solid_black>youpi: yes, that's the only theory I have, and yet vm_map_enter() seems to be doing locking properly