IRC channel logs
2024-02-20.log
<Gooberpatrol66>is the discussion here in 2013-09-18 about pci arbitration obsolete since pci-arbiter has been added?
<damo22>would be good if Avadis Tevanian Jr was lurking around to help with the vm map locking :D
<damo22>vm_allocate returns with the vm_map lock in a read lock
<damo22>ah no it unlocks the map in Bailout
<damo22>Gooberpatrol66: we have solved most of the problems that would otherwise prevent virtio from working
<damo22>rump has virtio disk, we could use it
<damo22>the only reason its disabled is i had a compilation failure and had to remove it temporarily
<damo22>otherwise it would actually work
<damo22>although its not clear to me why its any different from having a working AHCI
<damo22>solid_black: im starting to think there is no bug with the vm locking
<damo22>i am seeing all threads in W state with no vm lock calls
<damo22> 65 ./test.inf(786) (f61001e8): 8 threads:
<damo22> 1 (f61bfe08) .W.O..(mach_msg_continue) 0
<damo22>oh but the backtraces of sleeping threads do show vm stuff
<solid_black>the second thread is just trying to lock the map (for vm_allocate, but that doesn't matter)
<solid_black>the VM regions are disjoint: f8892000,f8893000 vs f8894000,f8895000
<damo22>its wiring and unwiring the same thing?
<damo22>the first arg is the ipc_kernel_map
<solid_black>can you see what that entry's actual start/end VM addresses are?
<solid_black>not sure what we'll learn from that though; it would match either one region or the other
<damo22>isnt the kernel supposed to not do this?
<damo22>ie, wire/unwire the same object while the map is unlocked
<solid_black>it's supposed to not do that, that's what "well behaved" refers to
<solid_black>see, it really thinks it's wiring/unwiring two different regions of virtual memory, but somehow that ends up being the same entry / vm_page
<damo22>but the parameters it passes to the wire/unwire faults are the same
<solid_black>and yet the address pairs passed to vm_map_pageable are different, can you see that?
<damo22>somehow the entry is being reused
<damo22>yes the pageable pairs are different
<solid_black>I really really wish qemu supported rr with smp, we'd have this figured out a long time ago :|
<damo22>does anything check that the entries are unique?
<solid_black>there must be something wrong with how vm_map_pageable_scan args are displayed
<damo22>kdb guesses how to print them from the stack
<solid_black>yes, there's not much to it on i386 (though I wonder how it works on x86_64, and what we could do on aarch64)
<solid_black>but vm_map_pageable_scan is a static func, it might be partly inlined or something
<solid_black>which explains why the stack doesn't contain what kdb expects
<damo22>yesterday i was getting a proper backtrace in gdb
<damo22>it happened to stop when the program hung
<solid_black>could this (the entry reuse) be due to vm_map_clip_start?
<solid_black>it should be a no-op if it doesn't have to do any clipping -- but what if it does?
<solid_black>the implementation of clipping is ok, under the assumption that things are properly locked
<solid_black>but in our case, they're not, because the other thread has released the map lock while still referencing an entry from the map
<solid_black>vm/vm_map.c:1791 here in my tree, might be different for you
<solid_black>next to a comment that says "Pass 1. Clip entries, check for holes and protection mismatches"
<solid_black>it's not just clipping entries though... *lots* of things must be broken in the same way
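A toy illustration of the clipping hazard described above may help. This is not gnumach code; toy_entry and toy_clip_start are invented names. The point is only that clipping splits a coalesced entry by mutating the original in place, so a pointer saved before the clip (while the saver no longer holds the map lock) silently changes meaning:

    #include <stdio.h>
    #include <stdlib.h>

    /* One map entry covering a contiguous address range. */
    struct toy_entry {
        unsigned long start, end;
    };

    /* Clip `e` so it starts at `addr`; the range before `addr` moves into
     * a freshly allocated entry (which real code would link just before
     * `e` in the map).  The original entry is mutated in place. */
    static struct toy_entry *toy_clip_start(struct toy_entry *e, unsigned long addr)
    {
        struct toy_entry *front = malloc(sizeof *front);
        front->start = e->start;
        front->end = addr;
        e->start = addr;
        return front;
    }

    int main(void)
    {
        /* Two adjacent one-page regions coalesced into a single entry. */
        struct toy_entry e = { 0xf8892000UL, 0xf8894000UL };

        /* "Thread A" saved this pointer and then dropped the map lock... */
        struct toy_entry *held = &e;

        /* ...while "thread B", under the write lock, clips the same entry. */
        struct toy_entry *front = toy_clip_start(&e, 0xf8893000UL);

        /* Thread A's pointer now describes a different range than the one
         * it looked up, so both threads end up on the same entry/page. */
        printf("held entry now covers [%#lx, %#lx)\n", held->start, held->end);
        free(front);
        return 0;
    }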
<solid_black>all these things (rightfully) assume that the map is locked and no one else is looking at the entries, so we can change them
<solid_black>that wouldn't take care of clipping, but clipping should not happen if things are not coalesced in the first place
<solid_black>and maybe try reverting dc0c9160173d2a5b40e9d6e29c1800102b17b1ec, ee65849bec5da261be90f565bee096abb4117bdd
<solid_black>a read lock on the map would be enough to prevent this
<solid_black>why does it do that? is that for performance reasons, or needed for correctness / avoiding some other deadlock?
<damo22>wiring in the kernel map or submap
<solid_black>guess something somewhere (in vm_fault) could try to just allocate memory, and that would deadlock
<solid_black>so for one thing, vm_fault.c doesn't actually care about the passed-in entry being a proper entry in the map
<solid_black>it just uses the address, end address, is_sub_map, object, offset
<damo22>can we remove the unlock hack, but do a read lock further down?
<solid_black>so we could, for instance, make a copy of the entry, and pass that copy
<solid_black>but this of course gets more complicated when there are multiple entries on the region
<solid_black>vm_map_copy_overwrite_nested? what does that have to do with this?
<damo22>can we copy the entry and overwrite it, then discard it?
<solid_black>that's kind of what I'm suggesting, make a temp on-stack copy of the entry, unlock the map, then pass the copy to vm_fault_wire
<solid_black>that should work as long as other stuff is indeed well-behaved
<solid_black>but then again, this only works for a single entry; how do you deal with multiple ones? alloca?
<solid_black> /root/damo22s-test/test: Cannot get mach_port_names: (os/kern) resource shortage
<damo22>did you really fill up kernel memory?
<solid_black>but, like, the fact that it no longer panics (I added an assertion that there is only a single entry), and no longer hangs, and all the 8 CPU cores are busy, is reassuring
<solid_black>well s/no longer panics/doesn't panic/, it didn't panic before either
<solid_black>is there some other way I could test it, with some other test?
<solid_black>ah, let me reboot into the full system on non-smp then
<solid_black>oh great, my (non-SMP) VM locked up after "stress --vm 8"
<damo22>root@zamhurd:/part3/test-ipc# ./smp ./test 7
<damo22>./test: Cannot get mach_port_names: (os/kern) resource shortage
<solid_black>it runs a lot of successful iterations before that error, doesn't it?
<solid_black>yes, but you should be able to tell by e.g. cpu load
<damo22>do you think gsync_wait is still broken?
<damo22>try running the infinite loop test
<damo22>rumpdisk: mach_vm_object_pages: Success
<damo22>/hurd/startup: Crashing system; essential task rumpdisk died
<damo22>startup: notifying ext2fs /dev/wd0s3 of reboot...done
<damo22>stress --vm hangs because it chews too much ram i think
<damo22>you need to dial down the amount
<damo22>solid_black: i removed the HACK block altogether
<damo22>now it does similar behaviour to your patch but still breaks on gsync_wait
<solid_black>damo22: but that would deadlock if the vm_fault_wire handler wants to allocate some memory
<solid_black>but at least we now have a fairly good understanding of why the VM wire/unwire deadlock happens
<solid_black>the gsync deadlock, we should also take a closer look at, but clearly it's a different one
<damo22>does vm_fault_wire allocate anything ?
<solid_black>vm_fault certainly does e.g. allocate a new object for a copy, though vm_map_pageable_scan tries to handle that before calling vm_fault_wire
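A minimal user-space sketch of the snapshot idea floated here (drop the map lock, but hand the wiring code a private copy of the entry rather than a live pointer into the map). Everything below is invented scaffolding (toy_map, toy_entry, toy_wire); only the copy/unlock/use ordering is the point, and, as noted in the chat, it covers only the single-entry case:

    #include <pthread.h>
    #include <stdio.h>

    struct toy_entry { unsigned long start, end; };

    struct toy_map {
        pthread_rwlock_t lock;      /* stands in for the vm_map lock */
        struct toy_entry entry;     /* a single entry, for simplicity */
    };

    /* Stands in for vm_fault_wire: it only reads the fields of the entry
     * it is handed, and may itself need the map lock, so the caller must
     * not hold that lock here. */
    static void toy_wire(struct toy_map *map, const struct toy_entry *e)
    {
        (void)map;
        printf("wiring [%#lx, %#lx)\n", e->start, e->end);
    }

    static void wire_region(struct toy_map *map)
    {
        struct toy_entry snapshot;

        pthread_rwlock_wrlock(&map->lock);
        snapshot = map->entry;          /* copy while the map is locked */
        pthread_rwlock_unlock(&map->lock);

        /* Another thread clipping the live entry can no longer change
         * what this thread wires. */
        toy_wire(map, &snapshot);
    }

    int main(void)
    {
        struct toy_map map = { .entry = { 0xf8892000UL, 0xf8893000UL } };
        pthread_rwlock_init(&map.lock, NULL);
        wire_region(&map);
        pthread_rwlock_destroy(&map.lock);
        return 0;
    }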
<damo22>how do i repair a git tree? i get bad object
<solid_black>...and faulting in the page from the object can allocate memory too, depending on what the object is
<damo22>why does gsync_wait lock the task->map but then remove objects from the kernel_map?
<solid_black>that's not our case, is it? we should not hit the remote branch
<damo22>i showed Richard Braun my traces about the deadlock earlier and he has interesting feedback:
<damo22>(08:30:38 PM) braunr: hm it doesn't look like a deadlock
<damo22>(08:30:55 PM) braunr: the unwire call is blocked inside the critical section, isn't it ?
<damo22>(08:33:59 PM) braunr: yeah that's what it looks like to me
<damo22>(08:34:29 PM) braunr: vm_fault_page blocks for some unknown reason, and the rest is just serialized after that operation
<damo22>(08:34:34 PM) braunr: it's not a locking issue
<damo22>(08:35:36 PM) braunr: it could be a missed event around thread_block
<damo22>(08:36:06 PM) braunr: which is the main reason i redesigned my version as thread_sleep, directly with the interlock as an argument
<damo22>(08:36:16 PM) braunr: that's the critical difference that allows not missing events
<damo22>(08:36:49 PM) braunr: it's the same idea as with condition variables, and why the mutex associated with a condvar must be locked when signalling
<solid_black>and the reason vm_fault_page blocks is because it waits for the page to lose its busy bit
<solid_black>and the thread that owns the busy bit is waiting to lock the map
<damo22>solid_black: but could it unblock if there was a missed event ? maybe there is something more to it
<damo22>solid_black: its possible the third thread is waiting for a missed event to unbusy the page that the others are locked on
<damo22>theyre not all waiting on a lock
<solid_black>damo22: I really doubt it's anything else but the deadlock on map/page that I'm describing
<solid_black>you can't tell this from the trace alone, it indeed looks unclear what the unwiring thread is blocking on
<solid_black>which is why I added all the printfs yesterday, and connected gdb, and inspected it
<solid_black>it really is the case that the unwiring thread (which holds the lock) is waiting on a busy page, and the page is marked busy just prior to that by another thread, and that other thread is waiting in vm_map_verify() to lock the map
<solid_black>that's all we knew yesterday; the question was why then would two threads both access the same page, if they supposedly deal each with their own VM region
<solid_black>and yesterday I actually observed two threads wiring and unwiring *the same* VM address range, which is super weird, and makes me think there could be more going on
<solid_black>but today, thanks to your observation, we know that in your trace, the two threads were accessing adjacent address ranges, but somehow the same entry
<solid_black>them accessing the same entry of course explains why they ended up with the same page
<solid_black>so then, why would they access the same entry? -- well, because of clipping/coalescing of entries, of course
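braunr's condition-variable analogy, spelled out as a small pthread sketch (hypothetical names, not Mach code): if the flag is cleared or the signal sent without holding the lock the waiter uses, the wakeup can fire between the waiter's check and its sleep and be lost; holding the same lock on both sides is what makes the event impossible to miss.

    #include <pthread.h>
    #include <stdbool.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
    static bool page_busy = true;   /* stands in for the page's busy bit */

    /* Waiter: the check and the sleep are atomic with respect to the
     * signaller, because both hold `lock`. */
    static void wait_for_unbusy(void)
    {
        pthread_mutex_lock(&lock);
        while (page_busy)
            pthread_cond_wait(&cond, &lock);
        pthread_mutex_unlock(&lock);
    }

    /* Signaller: clear the flag and signal under the same lock.  Clearing
     * the bit without a wakeup, or signalling outside the lock, is how
     * events get lost. */
    static void mark_unbusy(void)
    {
        pthread_mutex_lock(&lock);
        page_busy = false;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
    }

    static void *waiter(void *arg) { (void)arg; wait_for_unbusy(); return NULL; }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, waiter, NULL);
        mark_unbusy();
        pthread_join(t, NULL);
        return 0;
    }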
<solid_black>if gsync still deadlocks, that must be caused by something else
<damo22>its easy to reproduce with 4 threads
<damo22>the one with openmp and lots of printing
<damo22>i read the gsync.c code, i think the linked list and hashing should be ok
<damo22>i tried to assume it was broken and stepped through it to see if it did what i expected
<damo22>but i dont understand the locking side
<solid_black>"writing core file" is of course a lie, the fs is read-only :|
<solid_black>second attempt (with no change), still running & spitting out port info
<solid_black>it's seemingly not consuming multiple CPU cores though, judging by CPU usage
<damo22>thats because printing takes too long
<damo22>mine hangs at around 8000 iterations
<damo22>if you get back to a shell, it completed without hanging after 10000 iterations
<solid_black>for one thing, it doesn't iterate in order, so the current printed number is rather meaningless
<damo22>did you change anything, or same as before that you posted
<solid_black>just in case, yes nproc is 8, and I ran it as /root/damo22s-test/smp /root/damo22s-test/test-gsync 8
<damo22>hmm its probably easier if you reproduce
<damo22>(10:12:30 PM) braunr: if there is enough memory pressure, the swapper could attempt to balance segments
<damo22>(10:12:36 PM) braunr: calling vm_page_seg_balance_page
<damo22>(10:13:45 PM) braunr: this function does set the busy bit to false, but doesn't wake up any thread beyond that apparently
<damo22>(10:14:46 PM) braunr: unlike say vm_object_copy_slowly
<damo22>(10:14:53 PM) braunr: see PAGE_WAKEUP_DONE
<damo22>(10:15:26 PM) braunr: i'd check whether the busy bit really still is true while the thread waiting for it is blocking
<damo22>(10:15:33 PM) braunr: if it's true, it's a logic problem
<damo22>(10:15:54 PM) braunr: otherwise, the thread has simply not been awoken
<solid_black>good point about vm_page_seg_balance_page not doing a wakeup (unlike PAGE_WAKEUP_DONE), does it not do that intentionally?
<solid_black>I wouldn't expect it to steal a busy page anyway, that sounds disastrous?
<solid_black>Intel i7-8650U (8) @ 4.200GHz (no idea what that means though)
<damo22>youpi: is there some kind of problem we could be facing due to threads running completely concurrently that we would never have seen before, eg with system calls and glibc?
<youpi>damo22: with system calls themselves, probably not, since they're on the stack only & such
<youpi>the non-hurdish part of glibc, of course not since it's used widely on linux
<youpi>the hurdish part of glibc, possibly
<youpi>that's why I recommended testing progressively
<damo22>i gtg to sleep, its my birthday tomorrow :)
<solid_black>and if you're already gone, I'm going to have to debug this all by myself :|
<solid_black>ok, same as your case, all the threads are waiting for a lock on the map, it's unclear what holds this lock
<solid_black>I don't know how to debug this further; none of the threads look like they could be holding the lock
<youpi>adding __FILE__ & __LINE__ ?
<solid_black>lock_t doesn't let new readers in if there are waiting writers
<youpi>not the way we were discussing ;)
<youpi>I *very* often hear people barf at something, just because they haven't actually understood how it works
<solid_black>it still blocks the current thread, instead of just delaying the response message
<youpi>note that it currently works with a gsync_waiter on the stack
<youpi>so you cannot just get rid of the stack
<solid_black>really, this is breaking rpctracing any non-trivial program
<youpi>(13:03:27) solid_black: lock_t doesn't let new readers in if there are waiting writers
<youpi>(13:03:34) solid_black: so you cannot lock it for reading recursively
<youpi>which writers are you thinking of?
<solid_black>it certainly is more expensive than putting things on the stack, but also not that expensive compared to making a syscall, looking up ports, etc
<solid_black>wdym? in general, anyone trying to lock_write() the same lock
<youpi>then it might be useful to do indeed
<youpi>solid_black: sure, but they'll release it at some point
<youpi>solid_black: remember that this is already long ago
<solid_black>who, the writers? they won't release it if they never manage to get it
<youpi>I have *WAY* too much on my plate to remember about everything that goes on
<youpi>solid_black: why shouldn't they ever manage to get it ?
<youpi>I mean, if a read holds the lock for long, sure
<youpi>but that's not supposed to happen
<solid_black>sure, I don't expect you to *remember*, but it's only logical: rpctrace is the primary user of _make remote calls for something that a task/thread normally does to itself_
<youpi>it's logical when you think about it
<solid_black>so anything that breaks that, immediately breaks rpctrace
<youpi>in my daily work I'm mostly working on linux
<youpi>where strace is not affected by futex problems
<youpi>so, no, I wouldn't magically think about rpctrace
<youpi>which has indeed different concerns
<solid_black>yes, normally the outstanding readers should release the lock eventually, but not if one of the readers is the same thread who now waits to take the same lock *again*, recursively, and is blocked on the writer!
<solid_black>it does vm_map_lock_read (task->map); first, and that succeeds
<youpi>"write" doesn't appear in the file
<youpi>remember that I'm currently at work, I cannot spend time on checking things and stuff
<solid_black>then it calls gsync_prepare_key() -> probe_address() -> vm_map_lookup(), while holding the read lock
<youpi>sure, if you have a read lock you cannot expect to be able to acquire a write lock
<solid_black>what if another thread calls lock_write() in the middle?
<youpi>ah so another read, not a write
<youpi>ah, so a read then write then read
<youpi>(write being another thread)
<solid_black>now the nested lock_read() inside vm_map_lookup() is waiting for the writer to go away, but the writer is waiting for the original read lock to go away
<youpi>that was completely not obvious from what you wrote above
<solid_black>maybe short real-time messages are not the best medium for complex topics :|
<solid_black>so, one way to fix / work around this would be to make gsync get a *write* lock, and then allow recursion
<solid_black>it's a shame, because it doesn't need a write lock, though
<youpi>I'd say I would really *not* be surprised that various code in gnumach has such double-read-locks
<youpi>and thus we might want to have some debugging tool to trace such errors
<youpi>I wonder if we really need to keep a read lock on the map
<youpi>and whether we could rather keep some lock on the object instead
<youpi>as the vm_map_lookup() way of working suggests, actually
<solid_black>sneek: later tell damo22: I reproduced and investigated your gsync deadlock, it's a completely unrelated (to the VM wire/unwire thing) issue, purely related to locking (readers vs writers) inside gsync
<youpi>vm_map_enter takes the object and not the map, after all
<solid_black>why does gsync_wake need to take the explicit read lock on the map at all?
<youpi>it was probably to make sure that the user doesn't release the memory in the meanwhile
<solid_black>it has to synchronize with gsync_wait, sure, but the locking inside vm_map_lookup would accomplish that
<youpi>but we can keep some lock on the object itself instead
<youpi>to be sure that even if the user releases its mapping, the underlying memory is still there and we can still synchronize on it
<youpi>it's the memory object that matters in the end anyway
<solid_black>I don't think it matters, gsync_wake only uses the memory address as a key to find waiters to wake, it doesn't care about the underlying memory cell at all, unlike gsync_wait
<youpi>I was looking at gsync_wait indeed
<youpi>which we have to fix too anyway
<youpi>wait will have the same issue
<youpi>wake can need to mutate the value
<youpi>so it does care about the underlying memory
<youpi>but there as well it's essentially the memory object that matters
<solid_black>well, if the user is doing "serious" operations on the VM region concurrently (such as unmapping this object and mapping another one in its place), I wouldn't expect futex/gsync to behave in any nice manner (other than not crashing) anyway
<youpi>it does make sense to see unmap concurrently with futex / gsync operations
<solid_black>but we need to convince ourselves that whatever new locking scheme we devise, it won't let a wake() slip in between wait() checking the number and going to sleep
<youpi>it poses a lot of problems in userland implementations of e.g. semaphores
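The deadlock solid_black describes, reduced to a toy model: a lock_t-style rwlock with writer preference refuses new readers once a writer is queued, so a thread that already holds a read lock and read-locks again behind a waiting writer blocks forever. The names (toy_rwlock, read_lock_would_succeed) are invented; the admission rule is the point.

    #include <stdio.h>
    #include <stdbool.h>

    /* Toy writer-preference rwlock: counters are enough to show the
     * admission rule, no actual blocking is implemented. */
    struct toy_rwlock {
        int readers;            /* readers currently holding the lock */
        int waiting_writers;    /* writers queued behind them */
    };

    /* A new read lock is granted only if no writer is waiting; this is
     * what keeps writers from starving, and also what forbids taking the
     * read lock recursively. */
    static bool read_lock_would_succeed(const struct toy_rwlock *l)
    {
        return l->waiting_writers == 0;
    }

    int main(void)
    {
        struct toy_rwlock map_lock = { 0, 0 };

        map_lock.readers++;          /* gsync_wait: vm_map_lock_read(task->map) */
        map_lock.waiting_writers++;  /* some other thread: waiting to write-lock */

        /* gsync_wait now re-enters the lock via vm_map_lookup() -> lock_read() */
        printf("nested read admitted? %s\n",
               read_lock_would_succeed(&map_lock) ? "yes" : "no");

        /* Prints "no": the nested read waits for the writer, the writer
         * waits for the outer read to be released -- by the very thread
         * that is now blocked.  That is the gsync deadlock. */
        return 0;
    }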
<youpi>if you do sem_post(&struct->sem); free(struct);
<youpi>yes, it looks bogus, it kinda is
<youpi>but there are various cases where userland wants to do such things
<youpi>another would be on sem_wait(&struct->sem); yes
<youpi>and not accessing anything else in it
<youpi>again, yes it looks bogus, I'm just taking the simple example to show my point
<solid_black>you need a happens-before between the other threads stopping to access the semaphore, and you releasing it
<youpi>you have the same kind of concerns with pthread_mutex_lock/unlock vs pthread_mutex_destroy
<youpi>it's *hard* to properly implement pthread_mutex_destroy, making sure that the last thread doing pthread_mutex_unlock really has finished working on the mutex
<youpi>and then you'll be able to free it
<youpi>solid_black: again, I was just using a small dumb example to explain the kind of thing I'm talking about
<youpi>actual cases are much more involved
<solid_black>I don't understand why it's supposedly not actually bogus and something that we should support
<youpi>if you don't want to take that, there is a point to make but it's long to explain
<youpi>I'm not sure I'll take the time to explain
<solid_black>but here, you know for sure that thread1 is completely done with it, because you've witnessed blah(&x->state) being true *after* you grabbed the mutex, meaning thread1 has already released the mutex
<youpi>you don't know whether pthread_mutex_unlock() is really finished with accessing the memory
<solid_black>so your gsync_wake() could still be in the process of completing when the vm region is already deallocated
<youpi>that kind of stuff happens, yes
<youpi>but if we keep a ref or lock on the object, we can still continue fine
<youpi>that's also important for cross-process semaphores and such
<youpi>sem_post(&shared_mem->p); munmap(shared_mem); , I believe posix probably says it's fine
<youpi>and again, that's just a simple example
<youpi>you might have whatnot odd examples
<solid_black>so in other words, [you believe] that pthread primitives do establish a happens-before between unlock/post *being done accessing the mutex/semaphore* in a thread that held the mutex/semaphore first, and unlock/post *returning* in a thread that held it next
<youpi>not unlock/post for the latter, lock/wait for the latter
<youpi>and I'm not actually talking about that
<youpi>I'm just talking about the fact that memory can disappear under your feet
<youpi>especially where you're talking about another task's memory
<youpi>that's the whole point of synchronization primitives
<youpi>sem_post() in one thread, sem_wait() in the other
<youpi>you have a happens-before between the two
<solid_black>I have a happens-before between thread1 *entering* post and thread2 *exiting* wait, yes
<youpi>ah, you talk about enter/exit
<solid_black>and you're kind of saying that I also have a happens-before between thread1 *exiting* post and something in thread2, which (the latter) should at least be exiting the *next* post
<youpi>I'm not saying that it is there, I'm saying that in various situations people need it :)
<youpi>IIRC glibc implements that in pthread_mutex/cond_destroy
<youpi>something like it makes sure that there is no unlock left
<solid_black>pthread_mutex_destroy in nptl really looks like it's doing nothing to me
<youpi>I don't remember the details
<youpi>but I do remember that it's tricky
<youpi>and glibc does some stuff about it
<solid_black>what we really care about is that a call to gsync_wake() can not happen in between gsync_wait() reading the value and going to sleep
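The kind of userland pattern youpi is pointing at, as a minimal POSIX sketch (struct and field names are hypothetical): the waiter frees the block as soon as sem_wait returns, so the poster must not touch the semaphore memory after its wakeup becomes observable. Whether this pattern is actually guaranteed is exactly what is being debated above; the cross-process variant replaces free() with munmap() of a shared mapping, which is where gsync's need to keep the underlying memory alive comes from.

    #include <pthread.h>
    #include <semaphore.h>
    #include <stdlib.h>

    /* Hypothetical shared state: the semaphore lives inside the heap
     * block that will be freed once the other side is done with it. */
    struct job {
        sem_t done;
        int   result;
    };

    /* Worker: fills in the result and posts.  After sem_post() it must
     * not touch *j again, because the waiter may free it the moment the
     * wakeup is observable. */
    static void *worker(void *arg)
    {
        struct job *j = arg;
        j->result = 42;
        sem_post(&j->done);
        return NULL;
    }

    int main(void)
    {
        struct job *j = malloc(sizeof *j);
        pthread_t t;

        sem_init(&j->done, 0, 0);
        pthread_create(&t, NULL, worker, j);
        pthread_detach(t);

        sem_wait(&j->done);     /* happens-after the worker's sem_post() */
        sem_destroy(&j->done);  /* the debated step: only sound if sem_post()
                                   is done with the memory once its wakeup
                                   is observable */
        free(j);
        return 0;
    }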
<youpi>sure, but that's already handled by the hashed mutex
<youpi>no need to have interferences with map & object
<youpi>ok, see __pthread_cond_destroy instead
<youpi>it's even more complex than the example I showed above
<youpi>but again, I was just showing the kind of problem
<youpi>not going to take the time to write a reproducer
<solid_black>ok, so we lock the kmutex (/me wonders how that differs from locks, but whatever), then check, if it matches we insert ourselves as a waiter
<solid_black>the locking on the map doesn't really serve any purpose then, does it?
<youpi>it's meant to keep the memory alive
<youpi>but I'd say we can probably replace that with keeping a ref or lock on the memory object instead
<youpi>since that's what we use to do the mem map
<youpi>ah but there is also the direct access
<youpi>so we indeed do not want the address to disappear
<youpi>(and keeping a ref/lock on the object will not be enough)
<solid_black>vm_map_lookup() mutates the map if it has to CoW or make a new VM object
<solid_black>so, vm_allocate a new page, call gsync on it with GSYNC_MUTATE, guaranteed deadlock
<solid_black>let me actually test this, sounds too ~good~ bad to be true
<youpi>I wouldn't be surprised at all that such a case exists
<youpi>simple_locks are no-ops indeed
<youpi>but read/write locks as used by vm are not no-ops
<solid_black>there only are a handful of users of vm_map_lookup()
<solid_black>we could try to add a boolean_t map_locked argument like some other map APIs have
<youpi>problem is: there are submaps
<youpi>see the loop inside vm_map_lookup
<youpi>possibly that can be handled somehow
<youpi>ah, userland memory doesn't have it?
<youpi>but we have to make sure of that
<solid_black>maybe what we really want is the (sub)map being *left* locked by vm_map_lookup
<youpi>I was wondering about that possibility indeed
<youpi>there could be a parameter for this
<solid_black>or -- do you think maybe we should use the existing vm_map_version_t/vm_map_verify/vm_map_verify_done mechanism?
<youpi>it looks more complex to me than just making the lookup leave the lock held
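For reference, the futex-style discipline described above ("lock the kmutex, check the word, insert ourselves as a waiter") in a small user-space sketch. All names are invented and this is not the gsync code; the point is the ordering: because both sides take the same (hashed) lock, a wake cannot slip in between the waiter's value check and its going to sleep.

    #include <pthread.h>
    #include <stdio.h>

    /* One hash bucket: in gsync the lock would be picked by hashing the
     * (task, address) key; one bucket is enough for the sketch. */
    static pthread_mutex_t bucket_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  bucket_cond = PTHREAD_COND_INITIALIZER;

    static volatile int word = 0;

    /* Wait until *addr no longer holds `expected` (cf. gsync_wait).  The
     * value check and the sleep happen under the same lock that the wake
     * side takes, so no wakeup can be lost in between. */
    static void toy_wait(volatile int *addr, int expected)
    {
        pthread_mutex_lock(&bucket_lock);
        while (*addr == expected)
            pthread_cond_wait(&bucket_cond, &bucket_lock);
        pthread_mutex_unlock(&bucket_lock);
    }

    /* Wake everyone queued on the bucket (cf. gsync_wake). */
    static void toy_wake(void)
    {
        pthread_mutex_lock(&bucket_lock);
        pthread_cond_broadcast(&bucket_cond);
        pthread_mutex_unlock(&bucket_lock);
    }

    static void *waiter_thread(void *arg)
    {
        (void)arg;
        toy_wait(&word, 0);
        printf("woken, word = %d\n", word);
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, waiter_thread, NULL);

        pthread_mutex_lock(&bucket_lock);
        word = 1;               /* change the value under the bucket lock... */
        pthread_mutex_unlock(&bucket_lock);
        toy_wake();             /* ...then wake whoever is queued on it */

        pthread_join(t, NULL);
        return 0;
    }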