IRC channel logs

2019-11-16.log


<damo22>youpi: how do i do thread_wakeup and assert_wait() in userspace?
<damo22>i tried using a pair of semaphores for synchronising open/close
<damo22>but i think they are running in the same thread
<damo22>so everything locks up
<damo22>its definitely a synchronisation issue, i get a log like this:
<damo22>device open ALL GOOD
<damo22>device open
<damo22>rump_sys_open failure
<damo22>device open
<damo22>rump_sys_open failure
<damo22>it should not be calling rump_sys_open the second time because i set a bool flag to stop it, but it looks like it hasnt been set by the time it calls device_open the second time
<damo22>i cant see if there are multiple threads with device_open
<damo22>dammit 100GB is too big for ide driver
<damo22>arghh the offset LBA 316M is too big to start the partition for ide
<damo22>even though its small enough
<damo22>in total size
<damo22>once i rearrange my partitions so i can boot off ide and have an AHCI controller as well i'll be set
<damo22>youpi: what is the maximum offset i can have for an ide disk?
<damo22>using the ide driver
<damo22>106811391 is my last sector for / i think its too large
<youpi>damo22: lba28 support should get you 2^28 * 512 bytes, so 128GB
<youpi>I'd be surprised if the driver doesn't support lba28
<damo22>i think i corrupted my symlinks when i chowned them
<youpi>damo22: thread_wakeup and assert_wait can be implemented with condition variables
<damo22>using linux
<youpi>chown shouldn't be corrupting a symlink
<youpi>and not even a translator entry
<damo22>when i rsynced my home dir and chowned -R everything back to demo user on linux it broke all symlinks when booted into hurd
<youpi>I don't think rsync knows about translators (i.e. xattr nowadays), but a symlink wouldn't use a translator
<damo22>well i deleted all symlinks and restored them somehow and everything is working on a 50GB /
<youpi>damo22: for thread_wakeup and assert_wait, you'd need a pthread_mutex_t, a pthread_cond_t, and an int variable. assert_wait would take the mutex, set the variable to 0, release the mutex ; thread_wakeup would lock the mutex, set the variable to 1, signal the condition, unlock the mutex ; thread_block would lock the mutex, and while the variable is 0, call cond_wait, then unlock the mutex
<damo22>thanks, not sure if i need this though, i can't tell if device_open is running again in a new thread or not
<youpi>you can add pthread_self() to your printfs to know which thread does what
<damo22>rump is hanging in qemu on wd0 at atabus0 drive 0
<damo22>but it works on real hw
<damo22>i'll have to add the timeout again
<damo22>k_handle...irq handler 10: release dead delivery 1 unacked irqs
<damo22>k_done
<damo22>do you have a timeout somewhere?
<damo22>disks might take a while to handle the irq
<youpi>usually drivers have a timeout when it expects an irq to signal the end of the operation, yes
<youpi>but it's usually quite long, like a second
<damo22>but youre doing thread_set_timeout(hz)
<damo22>in intr_thread ?
<damo22>so it will think the irq is stuck and clear it?
<damo22>every 10ms
<youpi>damo22: no, in intr_thread I only look at userland processes which died
<youpi>since we are not yet using a send-once port that could immediately notify of such a death
<youpi>and instead we probe for it
<damo22>if ((!e->dest || e->dest->ip_references == 1) && e->unacked_interrupts)
<youpi>that's the probe
<damo22>how does it know its an aborted process
<youpi>by the references being 1,
<youpi>which is the reference gnumach acquired when it created the port
<youpi>i.e. the process reference doesn't exist any more
<damo22>it couldnt be a just opened one?
<youpi>no, because the process reference is already there when we create the port
<youpi>the gnumach ref is a second one, not the first one
<damo22>is there a chance i am not setting e->dest
<youpi>no it's allocated when the port is created
<damo22>so then something died in my rump driver?
<youpi>oh, I completely misunderstood what you said above
<youpi>when I said drivers usually have a timeout, I mean inside the rump disk driver
<youpi>there is no timeout inside the user-intr support for irqs
<youpi>the message above definitely means that userland somehow dropped the port
<damo22>k_handle...irq handler 10: release dead delivery 1 unacked irqs k_done
<youpi>either by dying, or by closing the port
<damo22>in the handler, it closed the port somehow
<youpi>only one thread dying in a process wouldn't close the port
<youpi>since it's the process which owns port
<damo22>i dont know how its possible i got this message in the handler
<damo22> /* For now netdde calls device_intr_enable once after registration. Assume
<damo22> * it does so for now. When we move to IRQ acknowledgment convention we will
<damo22> * change this. */
<damo22> new->unacked_interrupts = 1;
<damo22>the first time the interrupt handler is called it is considered unacked?
<youpi>that will be the idea yes
<youpi>userland will have to ack each interrupt
<youpi>by re-enabling it
<damo22>do i need to change my pci-userspace code for the new gnumach?
<youpi>currently netdde also explicitly enable the interrupt after registering it
<youpi>I was thinking about dropping that, but perhaps we want to keep it
<youpi>no, the current gnumach only keeps its existing behavior
<youpi>I just added the comment to make explicit what netdde is currently doing
<damo22> https://github.com/zamaudio/pci-userspace/blob/upstreaming-rump-hurd/src-gnu/pci_user-gnu.c#L305
<damo22>im calling device_intr_enable() twice
<youpi>no, just one before running the server loop, and after each interrupt, which is what netdde does
<damo22>ok
<damo22> https://imgur.com/a/cjZ5dr8
<damo22>i dont understand where this could be failing
<youpi>is it perhaps just talking about a previous instance of your translator?
<youpi>Mmm, no, you wouldn't have had irq fired if it wasn't cleaned up yet
<damo22>im running this with no translator, just opening the block device
<youpi>perhaps other code is deallocating a random port
<youpi>and at some point by bad luck it's the delivery port
<youpi>I remember you posting warnings about deallocating bogus ports
<youpi>perhaps you could print the port number of the delivery port, and print it along every port deallocation in the code
<damo22>okay
<youpi>by translator I just meant your rump process
<damo22>wd0: <QEMU HARDDISK>
<damo22>wd0: drive supports 16-sector PIO transfers, LBA48 addressing
<damo22>wd0: 100 MB, 203 cyl, 16 head, 63 sec, 512 bytes/sect x 204800 sectors
<damo22>wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
<damo22>stupid translators were still running
<damo22>i need to force it to go away twice
<damo22>ok now im getting a different problem
<damo22>returning from establish
<damo22>irq handler 10: release a dead delivery port
<damo22>irq handler 10: removed
<damo22>irq fired: 10
<damo22>k_handle... irq handler 10: release dead delivery 1 unacked irqs
<damo22>k_done
<damo22><hang>
<damo22>its reproducible in that order every time
<youpi>damo22: just to make sure what is really happening, I have pushed more debugging to the master-user_level_drivers gnumach branch
<youpi>in case the kernel messages you are seeing are not actually about the process you are running, but a previous one
<damo22>ok
<damo22>settrans -fga /dev/rump /bin/sh -c 'exec >> /root/rump.log 2>&1 && /usr/bin/env RUMP_VERBOSE=1 RUMP_NCPU=1 /hurd/rumpdisk'
<damo22>wd0(ahcisata0:0:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100) (using DMA)
<damo22>/hurd/rumpdisk: must be started as a translator
<damo22>its getting that far
<damo22>but somehow when i run it with exec it gets killed
<damo22>oh crap !!!! my libstore.so is out of date
<damo22>i have two
<damo22>ok why is block.opened_count reset to 0 the next time it opens?
<damo22>the static struct is not holding state
<youpi>that you can watch with gdb
<damo22>i did
<damo22>its not holding the state
<damo22>it resets every time device_open gets called
<damo22>if it spawns a new thread when it opens the device how do you synchronise the state?
<youpi>damo22: by "watch", I mean the "watch" command
<youpi>which catches changes in the variable
<youpi>so you get to know *what* is changing it
<youpi>about synchronization, the question is too vague for me to provide any useful answer beyond "use mutex, cond, etc."
<damo22>in block.c i have static struct block_data block;
<damo22>as a global
<damo22>why would the state be resetting to zero for that global when device_open is called again
<damo22>the pid is getting updated
<damo22>wth
<damo22>its dying and the translator is resetting it?
<damo22>gdb isnt reattaching so its not a different pid
<damo22> at ../../libports/manage-one-thread.c:122
<damo22>122 while (err != MACH_RCV_TIMED_OUT);
<damo22>(gdb) p err
<damo22>$6 = EMACH_SEND_INVALID_RIGHT
<damo22>(gdb) p rumpblock
<damo22>$8 = {port = {class = 0x0, refcounts = {references = {hard = 0, weak = 0}, value = 0}, mscount = 0, cancel_threshold = 0, flags = 0, port_right = 0, current_rpcs = 0x0, bucket = 0x0, hentry = 0x0, ports_htable_entry = 0x0}, device = { emul_ops = 0x1218a60 <rump_block_emulation_ops>, emul_data = 0x121f9e0 <rumpblock>}, mode = 3, rump_fd = 3, media_size = 104857600, block_size = 512, opened_count = 6, opening = true, closing = false}
<damo22>seems to be losing the port info
<damo22>do i create a new port for it every open?
<damo22>how do i tell mach that the device succeeded to open?
<damo22> *devp = (device_t)&bd->device;
<damo22> *devicePoly = MACH_MSG_TYPE_MOVE_SEND;
<damo22> return D_SUCCESS;
<damo22>is that it?
<damo22>i dont understand ports
<damo22>root@zamhurd:~# mount /dev/wd0s1 /mnt -t ext2fs
<damo22>root@zamhurd:~# ls /mnt
<damo22>hello lost+found
<gnu_srs>damo22: Does that mean your rump disk driver works?
<damo22>pretty much, it needs a lot of cleaning up
<gnu_srs>Nice and congrats :) ;)
<gnu_srs>VM or real HW?
<damo22>this is running in a VM at the moment
<gnu_srs>k!
<damo22>when i figure out what the bogus ports are from i will try on real hw
<damo22>the problem is, i made it work by reverting block.c to an older version but now i have a lot of cleanup to do
<damo22>i still dont know how my latest block.c is broken