IRC channel logs
2024-02-21.log
<sneek>damo22, solid_black says: I reproduced and investigated your gsync deadlock, it's a completely unrelated (to the VM wire/unwire thing) issue, purely related to locking (readers vs writers) inside gsync
<damo22>cool solid_black, hopefully it's not too hard to fix
<damo22>it's difficult to see why this AMD board's AHCI controller is not probed
<damo22>it has a PCI id defined in rump so it should work
<damo22>maybe there is some unaligned access and it fails
<damo22> { PCI_VENDOR_AMD, PCI_PRODUCT_AMD_HUDSON_SATA_AHCI,
<damo22>it's mentioned in the probe table as having a pmp quirk
<youpi>you can put prints in the probe loop
<damo22>we need to make it easier to compile parts of librump
<damo22>I tried to read the scrollback but couldn't really understand the gsync issue
<solid_black>a first thread calls either gsync_wait() or gsync_wake(), which start by taking a read lock on the map
<solid_black>then a second thread tries to lock the same map for writing
<solid_black>at this point, once there is a waiting writer, the locks (as implemented in Mach; there can be different strategies here, but this is a common one) don't let any new readers in
<solid_black>so we wait for existing readers to go away, then we'd let the writer in, and then all the new readers
<solid_black>so what happens then is that the first thread, the one that is doing gsync and has the map read-locked, proceeds to call vm_map_lookup()
<solid_black>this would normally be fine, the lock wouldn't conflict with the other read lock already held by this same thread, since both are read locks
<solid_black>but since there is a waiting writer, this lock_read() blocks
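To recap the deadlock described above as code, here is a rough sketch of the two threads involved. This is not the actual gsync.c code: the function names thread_a()/thread_b() are invented, the vm_map_lookup() arguments are simplified, and error handling is omitted.

/* Thread A: gsync_wait()/gsync_wake() on the task's map.  */
void thread_a (vm_map_t map, vm_offset_t addr)
{
  vm_map_version_t version;
  vm_object_t object;
  vm_offset_t offset;
  vm_prot_t prot;
  boolean_t wired;

  vm_map_lock_read (map);   /* first read lock: succeeds */

  /* ... compute the gsync key for addr ... */

  /* vm_map_lookup() internally takes lock_read() on the same map again.
     Two read locks held by one thread are normally fine, but once a
     writer has queued up (thread B below), new readers are held back,
     so this nested lock_read() blocks forever.  */
  vm_map_lookup (&map, addr, VM_PROT_READ,
                 &version, &object, &offset, &prot, &wired);
}

/* Thread B: any operation that write-locks the same map,
   e.g. vm_deallocate() -> vm_map_remove().  */
void thread_b (vm_map_t map)
{
  vm_map_lock (map);        /* queues as a writer, waits for A's read lock */
  /* ... never reached while A is stuck inside vm_map_lookup() ... */
  vm_map_unlock (map);
}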
<damo22>* The map should not be locked; it will not be
<damo22>does our case violate the assumption of vm_map_lookup?
<solid_black>yes, if you read it as "must not be locked, or deadlocks could occur", then gsync is violating that
<damo22>can we just unlock it before calling vm_map_lookup?
<damo22>why does the lock need to be held?
<solid_black>we kind of want to have it locked while we access the address below
<solid_black>otherwise, somebody could remove the mapping at that address, for instance
<solid_black>so we have two ideas: either we make it possible for vm_map_lookup() to leave the map locked on exit
<solid_black>or we try to make use of its existing mechanism for ensuring consistency, map versioning
<solid_black>so I'm going to study the existing callers of vm_map_lookup() (in vm_fault.c) to get a better understanding, and then decide which way to go
<damo22>but the call to vm_map_lookup is only done at the beginning
<damo22>we could just not lock it before calling prepare_key
<solid_black>the point is, we want to keep it read-locked (unless we do versioning checks) from the lookup up to after we access the user's value
<solid_black>if we don't lock it and vm_map_lookup also returns it unlocked, there's nothing preventing e.g. a vm_deallocate() call from deallocating the region right after vm_map_lookup() returns
<damo22>so there's no mechanism to look up a region while keeping the map locked?
<solid_black>there is, you wouldn't normally use vm_map_lookup(), you'd use vm_map_lookup_entry()
<solid_black>the difference is, if I understand it correctly, that vm_map_lookup() behaves a lot like actually accessing the address would
<solid_black>i.e. it does CoW of the VM object if it needs that, and creates an internal object if there's none
<solid_black>and again, in addition to that, vm_map_lookup() has the versioning mechanism
<solid_black>where instead of keeping the map locked, you get back a "version", and you can later compare this version with the map's current version to see whether the map has changed since you did the lookup
<solid_black>if it hasn't, your lookup result is still valid (just as if you had held a read lock the whole time); if it has, you retry
<solid_black>did Apple really have to build their gaming service into XNU / Mach?
<youpi>solid_black: I don't think the versioning approach would be much better than keeping the lock held, there is not much to gain from freeing the lock between making the lookup and making the access, we don't do much in between
<biblio>damo22: I will try to find time to test the issue you are facing. Any clue where I can get more info on how to reproduce it?
<damo22>biblio: we are already working on it, I think solid_black almost has a solution
<damo22>if you read the chat logs from just above here you will see a good explanation of it
<biblio>damo22: youpi: solid_black: it would be great to have a bugzilla, in my personal view. It would help speed things up.
<biblio>damo22: thanks. I was not aware of it.
<damo22>although it seems quite old, I haven't created any bugs there
<biblio>damo22: I have an account on savannah. Yes.
<damo22>solid_black: if the object is returned locked from vm_map_lookup, isn't that enough to ensure the memory backing it won't disappear?
<youpi>the memory, yes; the mapping, no
<youpi>in the current-task-map case, we use the address, so the mapping needs to stay
<youpi>we already discussed that yesterday with solid_black
<damo22>so we need to make vm_map_lookup able to assume the map is already locked and not unlock it on return
<damo22>so we can extend its lock outside
<damo22>or why not remove the lock entirely and make all existing callers wrap it in a lock
<damo22>and make it assume the map is always locked on entry
<youpi>or make vm_map_lookup possibly leave the map locked, that's what we actually need and what can easily be done, and is what we discussed yesterday
<youpi>making callers have to take the lock is not necessarily the simplest for the code in the end
<youpi>also it changes practices, and thus git branches that were worked on before the change would get the locking wrong
<youpi>while adding a parameter to decide whether to leave the lock taken will catch such changes
<damo22>but the first thing vm_map_lookup does is take a read lock on the map
<damo22>isn't gsync taking that lock already, earlier?
<youpi>that's what solid_black explained above
<youpi>and that's what we could replace by making vm_map_lookup leave the map locked
<Pellescours>if I understand correctly, the lock is taken by a reader, then a writer waits, then the reader tries to re-lock the lock. Right? Or is it a second lock?
<Pellescours>if it's the same lock, can't we change the implementation to allow re-locking when we already hold it, even if a writer is waiting (since we already hold it, we have higher priority than the writer)?
<youpi>Pellescours: that'd make the rwlock significantly more expensive
<youpi>because you'd need to record who got what read lock
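The parameter youpi describes above could look roughly like this. It is only a sketch of the idea being discussed, not the actual patch; the name keep_map_locked and its position in the argument list are made up for illustration.

/* Hypothetical extension of vm_map_lookup(): one extra flag saying
   whether to return with the map still read-locked.  */
kern_return_t vm_map_lookup (
    vm_map_t *var_map,
    vm_offset_t vaddr,
    vm_prot_t fault_type,
    boolean_t keep_map_locked,    /* new: don't drop the read lock on return */
    vm_map_version_t *out_version,
    vm_object_t *object,
    vm_offset_t *offset,
    vm_prot_t *out_prot,
    boolean_t *wired);

/* gsync would then no longer read-lock the map itself before the lookup;
   it would let vm_map_lookup() take the read lock and keep it:  */
kr = vm_map_lookup (&map, addr, VM_PROT_READ, TRUE /* keep locked */,
                    &version, &object, &offset, &prot, &wired);
/* ... access the user's value while the map stays read-locked ... */
vm_map_unlock_read (map);

/* Existing callers in vm_fault.c would pass FALSE and keep the current
   unlock-on-return behaviour; an out-of-tree caller that isn't updated
   then fails to compile instead of silently getting the locking wrong.  */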
<Pellescours>we may want to test some translators separately first, no?
<damo22>solid_black: it's in the fix-smp branch
<damo22>it shows a new bug without pset pinning
<damo22>it's possibly related to netdde, and then when I cancel networking it throws a page fault just before giving me a login shell
<damo22>Pellescours: we can keep the pset pinning for now and test more things for sure, but the ultimate test is removing the pinning and seeing what breaks
<damo22>btw, I'm happy to put your name as co-author on the patches, solid_black, if you think they're correct
<damo22>I translated your ideas into code
<solid_black>if you do get to it first, sure, put me as co-authored-by, thanks
<solid_black>but I was going to implement it myself in a similar way (or, potentially, with versioning instead of locking), and I'm looking at making more changes to gsync anyway
<damo22>what I did is very minimal, I didn't want to touch the vm code but it was required
<solid_black>you identified the issues, wrote a reproducer, posted traces & stuff, helped me reproduce things, and looked into it with me
<damo22>I thought youpi said the versioning method is less attractive than the locking way
<damo22>what would be good is if you could figure out a solid way to handle patch 3 (the one in my tree with your name on it), then we can send up all three as a set
<solid_black>it is still attractive to me, because this is the already-existing mechanism, this is how the designers of Mach intended this function to be used
<solid_black>and we clearly don't understand the VM subsystem nearly as well as the original designers (Avie Tevanian)
<solid_black>so we'd better use the API as it was intended to be used
<solid_black>(they were clearly aware that locking existed :) and yet they went with versioning for this one API)
<damo22>yes, so it's not harmful to add a parameter to an existing vm api
<youpi>solid_black: it's intended to be used when you have to e.g. sleep in between
<solid_black>but, on the other hand, I suspect that the actual reason for the versioning mechanism, as opposed to locking, is that they meant for this to be usable with the kernel maps
<youpi>but for our case, there's really no point in unlocking the map
<solid_black>...usable for kernel maps, where you might have to allocate things in the kernel map *while* serving the vm fault
<solid_black>which hopefully would not disturb the mappings at the address you're faulting in, so all the work you've done would still be useful, but your verify() would fail once, and then you'd look it up again, and wouldn't have to do the work the second time
<solid_black>which is not relevant for us, since gsync is never about a kernel map
<solid_black>(which it should verify btw, and return an error upfront if someone calls it with a kernel map)
<damo22>solid_black: if you're happy, I will submit the vm_map_lookup and gsync patches, but not the one you hacked together.... I'll write a proper commit message etc
<solid_black>you know what, actually yes, please go ahead and do that
<damo22>I think it would be a more practical use of your time if you work on the third patch, the one with a remaining assert() that assumes there is only one entry
<solid_black>I'd leave it in, assuming it never gets hit on a full (non-SMP) system
<solid_black>does the kernel ever want to wire/unwire a multi-entry chunk of kernel memory?
<azert>Kudos for the smp milestone to damo22, youpi and solid_black!
<azert>When a message is received, the thread acting as receiver checks whether any other thread is also waiting for requests. If there is none, a new thread is spawned. Thus, the current thread continues processing the message while the newly created thread starts listening for new ones.
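An invented, minimal sketch of the receiver-spawning pattern azert describes; this is not the actual Hurd libports code, and the counter, the helper names and the use of plain pthreads are assumptions made purely for illustration.

#include <pthread.h>

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static int idle_receivers;      /* threads currently blocked waiting for a request */

static void handle_one_request (void)
{
  /* ... demux the message and run the matching RPC handler ... */
}

static void *receiver_thread (void *arg)
{
  (void) arg;
  for (;;)
    {
      pthread_mutex_lock (&pool_lock);
      idle_receivers++;
      pthread_mutex_unlock (&pool_lock);

      /* ... block in mach_msg() until a request arrives ... */

      pthread_mutex_lock (&pool_lock);
      idle_receivers--;
      if (idle_receivers == 0)
        {
          /* nobody left listening: spawn a replacement before we go
             off and process the message we just received */
          pthread_t t;
          pthread_create (&t, NULL, receiver_thread, NULL);
          pthread_detach (t);
        }
      pthread_mutex_unlock (&pool_lock);

      handle_one_request ();
    }
  return NULL;
}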
<azert>I wonder if gnumach already starts the receiving thread on the same cpu as the sender? Or whether this would be a good thing to implement to improve data locality
<azert>Of course right now that's impossible since ext2 is confined to cpu0. Also it might be that this wouldn't be an optimization at all
<youpi>not sure that happens in practice
<youpi>in principle, for a send+recv machmsg call, the client thread would sleep immediately and leave the cpu available for the server thread
<youpi>but possibly, deep inside the code, the wakeup function doesn't know that the client thread will sleep right after that, and thus believes the cpu is busy, and sends an IPI to another cpu to run the server thread
<youpi>unfortunate, but that's the problem with layering software, some precious information may be lost
<azert>Is the send+recv machmsg call a single syscall?
<azert>If that's the case then the kernel has the information
<youpi>it is, the "kernel" has the information
<youpi>"as all good software should be"
<youpi>and keeping information across layers is hard
<youpi>(iirc that's where migrating threads were meant to improve performance)
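For reference, the send+recv call in question is a single mach_msg() invocation with both the send and receive options set, so the kernel sees within one trap that the calling thread is about to block for the reply. A minimal client-side sketch: do_rpc() is an invented name, message setup is mostly elided, and the exact header to include may differ on the Hurd.

#include <mach.h>   /* Hurd userland Mach header; may differ per system */

/* One trap: send the request and immediately wait for the reply.  */
mach_msg_return_t
do_rpc (mach_port_t server, mach_port_t reply,
        mach_msg_header_t *msg, mach_msg_size_t send_size,
        mach_msg_size_t rcv_limit)
{
  msg->msgh_bits = MACH_MSGH_BITS (MACH_MSG_TYPE_COPY_SEND,
                                   MACH_MSG_TYPE_MAKE_SEND_ONCE);
  msg->msgh_remote_port = server;   /* request goes to the server port */
  msg->msgh_local_port = reply;     /* reply comes back on this port   */

  return mach_msg (msg,
                   MACH_SEND_MSG | MACH_RCV_MSG,   /* combined send + receive */
                   send_size,
                   rcv_limit,
                   reply,                          /* receive on the reply port */
                   MACH_MSG_TIMEOUT_NONE,
                   MACH_PORT_NULL);
}

This is what MIG-generated stubs boil down to for a synchronous RPC, which is why, in principle, the information youpi mentions is available inside that one call.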