IRC channel logs

<youpi>damo22: better use simple_lock_irq() rather than spl + simple_lock_nocheck

<youpi>(e.g. about the quantum adjustment)

<youpi>and concerning functions that assume running at splsched and then use simple_lock_nocheck, add a call to assert_splsched() so we are sure of this

<youpi>damo22: the db_spl variable looks bogus, you probably need one per cpu

<youpi>ah, no, it's actually protected by the lock itself

<youpi>I have applied part of your branch

<damo22>:)

<damo22>youpi: i have updated my branch

<solid_black>hi

<saravia>hi solid_black :D

<damo22>gday

<saravia>hi damo22 :D

<damo22>whats up?

<damo22>solid_black: any ideas why my test is locking up?

<damo22>i changed it so it only uses omp to spawn threads:

<damo22> 65 ./test.inf(817) (f6101100): 8 threads:

<damo22> 0 (f6472da8) .W..NF 0xf5f2c8f8

<damo22> 1 (f6495220) .W..N. 0xf5f2c8f8

<damo22> 2 (f64953a0) .W..N. 0xf5f2c8f8

<damo22> 3 (f64956a0) .W..N. 0xf5f2c8f8

<damo22> 4 (f5fc2dc0) .W..N. 0xf5f2c8f8

<damo22> 5 (f5fc2640) .W..N. 0xf5f2c8f8

<damo22> 6 (f5f68558) .W..N. 0xf5f2c8f8

<damo22> 7 (f5f68258) .W..N. 0xf5f2c8f8

<damo22>it still blocks on gsync_wait

<damo22>printf seems to call gsync_wait

<damo22>should i remove the print calls?

<solid_black>damo22: yes, I've tried that yesterday (#pragma omp parallel + no printf) on Samuel's advice, and it still locks up, but this time not inside gsync

<solid_black>it's inside the VM subsystem

<damo22>oh man

<solid_black>and the way it locks up makes me think the locking implementation is broken somehow

<damo22>the sleep locks have a comment there that says this works on UP

<solid_black>then I looked at kern/lock.c, but couldn't find what was wrong

<solid_black>specifically, what happens is, multiple threads get stuck inside mach_port_names() -> vm_map_pageable(ipc_kernel_map, ...)

<damo22> * Sleep locks. These use the same data structure and algorithm

<damo22> * as the spin locks, but the process sleeps while it is waiting

<damo22> * for the lock. These work on uniprocessor systems.

<solid_black>and as you can see, vm_map_pageable() starts by calling vm_map_lock(map); (if lock_map argument is true, which it is in this case)

<solid_black>...oh no no no, wait

<solid_black>it does take the other if branch inside vm_map_pageable_scan() of course!

<solid_black>let me thing what that means for a minute

<solid_black>yeah, that's fine

<solid_black>still, it locks up, with all the threads waiting for the lock

<solid_black>let me see if I can inspect this from GDB actually

<solid_black>how do I enable MACH_LDEBUG?

<damo22>you set it to 1 in configfrag.ac

<damo22>but it doesnt all work properly yet

<solid_black>so by just patching the .ac, there's no user-accessible --enable-ldebug option or something like that?

<damo22>it still triggers some in_interrupt assert

<solid_black>I'll patch that assert out

<damo22>ok

<damo22>yea theres no variable for it

<damo22>is the vm_map lock supposed to be a sleep lock?

<damo22>or just a spin lock

<damo22>with write/read etc

<damo22>we can set can_sleep = 0

<damo22>maybe that will fix it

<solid_black>it is a sleep lock, yes

<solid_black>the issue is not that the VM subsystem is using locks improperly, it seems to be somehing wrong with the locks themselves

<solid_black>if you make the VM subsystem not use sleeping locks, the issue will reproduce somewhere else that uses sleeping locks

<solid_black>we better find the root cause

<damo22>ok

<solid_black>ok, rebuilt with MACH_LDEBUG, let's see how it fares

<damo22>nothing else uses sleep locks

<solid_black>huh? from a quick glance, i386/intel/pmap.c and ipc/ipc_space.h also do

<damo22>ah yes

<damo22>ipc_space.h does

<solid_black>ok, repro'ed with MACH_LDEBUG, let's see

<solid_black>how do I enter KDB again?

<damo22>pmap does lock_init(&pmap_system_lock, FALSE); /* NOT a sleep lock */

<damo22>ctrl shift d

<solid_black>thx

<damo22>or is it alt

<damo22>ctrl alt d

<damo22>sorry yes

<solid_black>(gdb) p ipc_kernel_map->lock

<solid_black>$4 = {thread = 0xffffffff, read_count = 0, want_upgrade = 0, want_write = 1, waiting = 1, can_sleep = 1, recursion_depth = 0, writer = 0xf6484de8, interlock = {

<solid_black> lock_data = 0, is_a_simple_lock = {<No data fields>}}}

<solid_black>so nobody holds it for reading, and there is a single writer

<solid_black>which is one of the threads of this same task, sure

<solid_black>actually no, it's not

<solid_black>yes, yes it, is, ok

<solid_black>and this writer thread is stuck inside vm_fault_unwire

<solid_black>let me look at that

<solid_black>but yes, it makes sense that it would hold a write lock a this point

<solid_black>and that calls vm_fault_page, and that calls thread_block, with the VM map lock still held

<solid_black>let me try to understand why it does that

<damo22>im glad you can debug this, i cant get gdb to attach

<solid_black>huh? you just pass -s to qemu, and then "gdb ./gnumach", "tar ext :1234"

<damo22>uhhh, thats for debugging gnumach

<solid_black>(a tiny bit harder over here on aarch64, since we're trying to do PIC)

<solid_black>yes, it's gnumach we're interested in

<solid_black>userland is fine

<damo22>oh rihgt

<solid_black>you *can* debug userland the same way, but that's much less convenient

<solid_black>but that's how I brought up Hurd userland on x86_64, by debugging userland startup using gdb attached to qemu

<solid_black>so an interesting thing is,

<damo22>ok

<solid_black>(gdb) p ipc_kernel_map->lock->writer->event_key

<solid_black>$8 = (event_t) 0x20009

<solid_black>but that's not a valid kernel address, certainly not one of a vm_page_t

<damo22>it uses a hash not an address?

<solid_black>good point

<solid_black>no, it doesn't, look at kern/sched_prim.c:assert_wait(): index = wait_hash(event), but thread->wait_event = event

<solid_black>and it is an event_t, too

<damo22>#define wait_hash(event) \

<damo22> ((((long)(event) < 0) ? ~(long)(event) : (long)(event)) % NUMQUEUES)

<solid_black>ah, it's ->event_key, and I thought I was looking at ->wait_event

<solid_black>(gdb) p *(vm_page_t) ipc_kernel_map->lock->writer->wait_event

<solid_black>$13 = {node = {prev = 0xc109b110 <vm_page_segs+1936>, next = 0xf7f21208}, priv = 0x0, phys_addr = 1945075712, listq = {next = 0xf64f4168, prev = 0xf64f4168},

<solid_black> next = 0x0, vm_page_header = {<No data fields>}, object = 0xf64f4168, offset = 0, wire_count = 1, inactive = 0, active = 0, laundry = 0, external_laundry = 0,

<solid_black> free = 0, reference = 0, external = 0, busy = 1, wanted = 1, tabled = 1, fictitious = 0, private = 0, absent = 0, error = 0, dirty = 0, precious = 0,

<solid_black> overwriting = 0, page_lock = 0, unlock_request = 0, vm_page_footer = {<No data fields>}, type = 3, seg_index = 2, order = 12}

<solid_black>it does have busy = 1, which is why vm_fault_page decides to block

<damo22>solid_black: how many threads did you make it run

<solid_black>8

<damo22>maybe try with 3

<solid_black>+ the implicit msg thread

<solid_black>why? the point was to reproduce it, and we did

<damo22>sure, but now you need to follow up 8 threads

<damo22>if it deadlocks with just 3 or 4 thats less to debug

<solid_black>ok, let me try to describe my current understanding of the deadlock

<solid_black>multiple threads call vm_map_pageable on the same map (ipc_kernel_map) concurrently

<solid_black>since it's a kernel map, the "HACK HACK HACK HACK" logic in vm_map_pageable_scan() fully unlocks the map before calling into vm_fault_wire()

<solid_black>which lets the other threads grab the same lock and do the same thing, and so on

<solid_black>now, one of the threads is *unwiring* the pages, not wiring them in (evidently, it has already progressed further inside mach_port_names)

<solid_black>unwiring is done with the full map lock still held

<solid_black>so the other threads are waiting on this one

<solid_black>but, one of the other threads must have gotten to this same vm_page first, and set its busy bit

<solid_black>now it's sleeping in vm_fault() -> vm_map_verify(), waiting for the lock

<solid_black>and the writer thread holds the lock, and is sleeping, waiting for the busy bit on the page to go away

<solid_black>the question would be, why are two different threads trying to wire/unwire the same vm_page?

<damo22>probably the vm_deallocate in my test

<damo22>it requests memory and then frees it

<solid_black>no, I mean, whatever userland does, should not cause a kernel deadlock

<solid_black>also this is not vm_deallocate, all of these threads are inside mach_port_names

<damo22>ok

<solid_black>please stop questioning userspace, the userpsace is fine, it's a kernel bug

<damo22>im not questioning userspace, im trying to give you a reason why two threads are wiring/unwiring the same page

<solid_black>vm_deallocate doesn't wire/unwire anyway

<solid_black>can I view local variables with KDB in any way?

<damo22>globals you can

<youpi>you can inspect the stack by hand

<solid_black>globals, I see in GDB too

<youpi>if you know where the locals are :)

<solid_black>oh hi youpi

<solid_black>does the above ^^^ make sense to you?

<youpi>just passing by, I'm going to work

<youpi>(and I don't really know that code so I guess you now know more about it than I do)

<damo22>solid_black: is there any lock needed to set the busy bit?

<solid_black>yes, you must hold the lock for the VM object the page belongs to

<damo22>so are two or more threads referencing the same VM object?

<damo22>only one of them can get the lock for that

<solid_black>but the lock is not locked at the time of the deadlock

<solid_black>you lock the lock, set the busy bit, unlock the lock

<damo22>ok

<solid_black>why does unwiring even need the full write lock?

<solid_black>the comment in vm_map.c says "Note that unwiring faults can be performed while holding a write lock on the map. A wiring fault can only be done with a read lock."

<damo22>what is unwiring ? is it freeing memory from the vm subsystem?

<solid_black>the "We must do this in two passes" comment below describes precisely the situation we have, except that it's talking about wiring, and we have unwiring

<solid_black>wiring/unwiring refers to marking vm_pages as pageable or not

<solid_black>if a page is, it cannot be paged/swapped out

<solid_black>if a page is _wired_

<solid_black>if it's not wired, it can be paged out

<solid_black>or be absent in the first place, and created on first access

<solid_black>an example of where wiring is needed is for the code of bootstrap tasks, such as ext2fs

<solid_black>or rumpdisk I guess

<damo22>so if you are unwiring memory, you need the write lock because you need to be the only writer of that page to remove it

<damo22>otherwise someone else could be using it?

<solid_black>1. unwiring doesn't actually mean paging out the page here and now, it just means clearing the "can not be paged out" state

<solid_black>2. the issue is with locking *the vm map*, not an individual page

<damo22>i see

<solid_black>I mean, the page too, via the busy bit

<solid_black>but the talk about read/write locking referes to the map

<damo22>why do we need to lock the whole map?

<solid_black>that's the question exactly :)

<solid_black>open vm_map.c and look for the comment that starts with "We must do this in two passes:"

<solid_black>it describes a potential deadlock similar to ours

<solid_black>except it talks about wiring, not unwiring

<solid_black>hm, but the case it describes is actually different, I think

<solid_black>they talk about another thread faulting on the page that is to be wired (and is not present, otherwise it wouldn't fault)

<solid_black>but our page is still wired, and I don't see any thread faulting on it

<solid_black>it's busy for some other reason, as if another thread is trying to wire it back indeed

<damo22>what if kernel threads are not well behaved, due to ipc happening so fast

<damo22>so the HACK thing might be broken

<damo22>is ipc multithreaded?

<solid_black>everything is multithreaded, yes

<damo22>inside gnumach?

<solid_black>yes

<damo22>so its possible there are multiple threads doing ipc concurrently

<solid_black>the point that "HACK HACK HACK HACK" makes about the kernel threads being well behaved

<solid_black>is not that multiple threads wouldn't access the ipc_kernel_map concurrently; that's fine and they do that indeed (including in our case), because of the locking

<solid_black>it's the kernel code won't "do anything destructive to this region of the map"

<damo22>but what if one ipc thread tries to write to the same region of the vm map just as the lock is released in HACK?

<solid_black>why would it do that? that's the point of kernel code being well-behaved

<solid_black>, that it doesn't do that

<solid_black>each mach_port_names() invocation only accesses the region of virtual memory that it vm_allocate'd for itself in the ipc_kernel_map

<solid_black>and not others' regions

<damo22>aha

<damo22>that makes more sense

<solid_black>whereas you can't trust userland to do the same

<solid_black>userland could try to wire a region from one thread, and deallocate it from another

<damo22>yup

<damo22>so where is unwiring

<solid_black>hm, is it possible to wire pages using the non-priv host port? that's sounds like an easy DoS

<solid_black>not that Mach lacks ways to DoS the system...

<damo22>lets focus on the deadlock

<youpi>solid_black: yes but there's a limit per task

<solid_black>ok

<damo22>could it be an interaction with the two sleep lock invocations?

<damo22>ipc one and vm one

<damo22>ie, too much stuff goes to sleep and never wakes

<solid_black>so fwiw, the thread doing *un*wiring calls vm_map_pageable() for a single page

<solid_black>and it's that single page that has the busy bit set

<solid_black>(off to lunch)

<gnu_srs2>youpi: #ifdef _IOT_ifreq_int #error ok #endif; #ifdef _IOT_ifreq_short #error ok #endif: 954 | #error ok; 957 | #error ok

<gnu_srs2>Still: if-hurd.c:959:23: error: invalid application of ‘sizeof’ to incomplete type ‘struct ifreq_int’. 959 | if (ioctl(fd, SIOCGIFNAME, &ifr) == -1)

<gnu_srs2>struct ifreq ifr;

<youpi>you didn't define _IOT_ifreq_int with _IOTS or _IOT_SIMPLE, right?

<youpi>you could add

<youpi>#define _IOT_ifreq_int foobar

<youpi>to make the preprocessor complain that it's already defined, and tell you where it was defined

<youpi>to be sure exactly what definition it is taking

<gnu_srs2>/usr/include/net/if.h:168: note: this is the location of the previous definition; 168 | # define _IOT_ifreq_int _IOT(_IOTS(char),IFNAMSIZ,_IOTS(int),1,0,0)

<damo22> https://git.sceen.net/rbraun/x15.git/tree/kern/thread.c#n2561

<youpi>gnu_srs2: I'm puzzled. Maybe do the same with the other macros: #define SIOCGIFNAME foobar #define _IOWR(g,n,t) foobar #define _IOC_ENCODE_TYPE(t) foobar #define _IOTBASE_struct foobar

<damo22>i think i wrote that IOT

<youpi>which IOT ?

<damo22>(09:10:39 PM) gnu_srs2: /usr/include/net/if.h:168: note: this is the location of the previous definition; 168 | # define _IOT_ifreq_int _IOT(_IOTS(char),IFNAMSIZ,_IOTS(int),1,0,0)

<youpi>da0debaa44d (Ulrich Drepper 2001-06-26 04:59:41 +0000 169)# define _IOT_ifreq_int _IOT(_IOTS(char),IFNAMSIZ,_IOTS(int),1,0,0)

<youpi>(patch actually by Mark Kettenis <kettenis@wins.uva.nl>)

<damo22>i wrote something similar then

<damo22>for the routing

<youpi>for the ifrtreq probably

<damo22>ea

<damo22>yeah

<gnu_srs2>/usr/include/i386-gnu/bits/ioctls.h: #define _IOTS(type) (sizeof (type) == 8 ? IOC_64 : (enum __ioctl_datum) (sizeof (type) >> 1))

<youpi>gnu_srs2: yes, but that's not supposed to be called on struct ifreq_int, that's the point

<gnu_srs2>/usr/include/i386-gnu/device/input.h: 43 | #define _IOWR(g,n,t) _IOC(IOC_INOUT, (g), (n), sizeof(t))

<youpi>precisely because we have _IOT_ifreq_int defined

<youpi>aha

<youpi>that's the problem

<youpi>it's not using the right _IOWR

<youpi>how is this header included ?

<gnu_srs2>#include <device/input.h>

<youpi>ok, we need to scratch that

<youpi>(the spurious _IOWR declaration from input.h)

<damo22>sleep locks in hurd are recursive, and "braunr: if the recursion level is greater than 1, unlocking doesn't mean you can sleep"

<damo22>so theres a problem with our sleep locking code for smp

<youpi>gnu_srs2: so for now, just drop that _IOWR definition from input.h

<gnu_srs2>/usr/include/i386-gnu/bits/ioctls.h:#define _IOWR(g, n, t) _IOC (IOC_INOUT, (g), (n), _IOC_ENCODE_TYPE (t)) and

<gnu_srs2>/usr/include/i386-gnu/device/input.h:#define _IOWR(g,n,t) _IOC(IOC_INOUT, (g), (n), sizeof(t))

<solid_black>ACTION is back

<youpi>gnu_srs2: yes it's only the former that is correct

<youpi>the latter is spurious and we need to stop shipping it to userspace

<gnu_srs2>Should I remove inclusion of device/input.h or edit the header file?

<youpi>no, edit the header file

<gnu_srs2>OK

<youpi>as I said: so for now, just drop that _IOWR definition from input.h

<solid_black>damo22: you mean Mach, not Hurd; also sleeping locks are recursive only when you enable that explicitly (lock_set_recursive)

<youpi>gnu_srs2: though I'm wondering why you are including device/input.h ?

<damo22>solid_black: aha youre right

<damo22>solid_black: vm_map does use recursive sleep lokcs

<solid_black>it does, yes

<solid_black>but that doesn't seem to be the issue here

<solid_black>since due to the HACK HACK HACK, it just unlocks the map, instead of making the lock recursive

<solid_black>I really want to inspect another thread's local variables, how do I do that? :|

<damo22>thread 2

<damo22>frame x

<youpi>solid_black: as I said, you can inspect the stack by hand

<solid_black>no, that switch CPU cores

<youpi>in kdb

<solid_black>yes, if I knew the addresses

<youpi>you mean if you knew %sp ?

<youpi>or if you knew the offset from %sp ?

<solid_black>sp I could probably find

<solid_black>the offset from sp, yes

<youpi>(simpler through bp, actually)

<youpi>for that you can just look at the code

<youpi>ask gdb for the asm for an instruction that accesses the variable

<youpi>and you can see what offset it uses from bp

<solid_black>in any case this is not inside the bottommost frame (that one is switch_context)

<gnu_srs2>youpi: I don't really remember. Not including that header file any more!

<solid_black>could I, for instance, forcibly switch to another thread from gdb?

<youpi>solid_black: that'll be more involved, but you can go through the bp chain, to get the bp of the function you're interested in

<solid_black>I guess I could, I just need to restore %esp and %eip

<youpi>that's possible, I had done such a thing during my phd

<youpi>you'll need to restore %ebp as well

<youpi>otherwise stack unwinding will get lost

<solid_black>yes

<youpi>also, local variable may be in registers

<youpi>then you need to restore all registers

<solid_black>it can't possibly be in regsiters, exactly because we've context-switched to another thread, so it must be spilled somewhere

<youpi>there are callee-saved registers

<solid_black>either in the same frame, or inside switch_context if it's callee-saved

<youpi>that's what I mean

<solid_black>ok

<youpi>you need to restore the registers from the context switch

<youpi>ok, during my phd we were just restoring ebp, esp, eip

<youpi>that wasn't bringing correct values for local variables in registers

<youpi>but in my memory we were getting good backtraces

<solid_black>maybe it'd be easier to instrument the code to add printfs, and hope it still reproduces

<youpi>possibly

<youpi>what I sometimes do is writing at 0xb8000

<damo22>putchar uses gsync_wait

<youpi>damo22: I guess he meant kernel printf

<damo22>ok

<youpi>writing to 0xb8000 usually doesn't hide heisenbugs

<solid_black>it really is another thread, inside vm_fault_wire(), marking this page as busy

<solid_black>the same page that is being unwired by this thread who holds the lock on the map

<solid_black>the two threads use different virtual addresses to do that

<solid_black>and the two pages at these addresses appear to have different contents, so it's not the same physical page

<solid_black>and yet it seems to be the same vp_page

<solid_black>what gives?

<solid_black>the physical page appears to be the one that is accessible through vm_fault_unwire()'s address (good)

<solid_black>huh? now I seemingly have two threads actually doing vm_fault_wire() / vm_fault_unwire() on the exact same address range

<solid_black>that is certainly not "well-behaved"

<solid_black>one of these two threads is doing alternating pairs of vm_fault_wire/vm_faul_unwire, as expected

<solid_black>but the other is only doing vm_fault_wire()

<solid_black>???

<youpi>solid_black: could it be that the vm_allocate part isn't properly locked and thus the two threads end up allocating the same memory ?

<solid_black>youpi: yes, that's the only theory I have, and yet vm_map_enter() seems to be doing locking properly

<Pellescours>"

IRC channel logs

2024-02-19.log