IRC channel logs

2019-11-16.log


<damo22>youpi: how do i do thread_wakeup and assert_wait() in userspace?
<damo22>i tried using a pair of semaphores for synchronising open/close
<damo22>but i think they are running in the same thread
<damo22>so everything locks up
<damo22>its definitely a synchronisation issue, i get a log like this:
<damo22>device open ALL GOOD
<damo22>device open
<damo22>rump_sys_open failure
<damo22>device open
<damo22>rump_sys_open failure
<damo22>it should not be calling rump_sys_open the second time because i set a bool flag to stop it, but it looks like it hasnt been set by the time it calls device_open the second time
<damo22>i cant see if there are multiple threads with device_open
<damo22>dammit 100GB is too big for ide driver
<damo22>arghh the offset LBA 316M is too big to start the partition for ide
<damo22>even though its small enough
<damo22>in total size
<damo22>once i rearrange my partitions so i can boot off ide and have an AHCI controller as well i'll be set
<damo22>youpi: what is the maximum offset i can have for an ide disk?
<damo22>using the ide driver
<damo22>106811391 is my last sector for / i think its too large
<youpi>damo22: lba28 support should get you 2^28 * 512 bytes, so 128GB
<youpi>I'd be surprised if the driver doesn't support lba28
<damo22>i think i corrupted my symlinks when i chowned them
<youpi>damo22: thread_wakeup and assert_wait can be implemented with condition variables
<damo22>using linux
<youpi>chown shouldn't be corrupting a symlink
<youpi>and not even a translator entry
<damo22>when i rsynced my home dir and chowned -R everything back to demo user on linux it broke all symlinks when booted into hurd
<youpi>I don't think rsync knows about translators (i.e. xattr nowadays), but a symlink wouldn't use a translator
<damo22>well i deleted all symlinks and restored them somehow and everything is working on a 50GB /
<youpi>damo22: for thread_wakeup and assert_wait, you'd need a pthread_mutex_t, a pthread_cond_t, and an int variable. assert_wait would take the mutex, set the variable to 0, release the mutex ; thread_wakeup would lock the mutex, set the variable to 1, signal the condition, unlock the mutex ; thread_block would lock the mutex, and while the variable is 0, call cond_wait, then unlock the mutex
<damo22>thanks, not sure if i need this though, i can't tell if device_open is running again in a new thread or not
<youpi>you can add pthread_self() to your printfs to know which thread does what
<damo22>rump is hanging in qemu on wd0 at atabus0 drive 0
<damo22>but it works on real hw
<damo22>i'll have to add the timeout again
<damo22>k_handle...irq handler 10: release dead delivery 1 unacked irqs
<damo22>k_done
<damo22>do you have a timeout somewhere?
<damo22>disks might take a while to handle the irq
<youpi>usually drivers have a timeout when it expects an irq to signal the end of the operation, yes
<youpi>but it's usually quite long, like a second
<damo22>but youre doing thread_set_timeout(hz)
<damo22>in intr_thread ?
<damo22>so it will think the irq is stuck and clear it?
<damo22>every 10ms
<youpi>damo22: no, in intr_thread I only look at userland processes which died
<youpi>since we are not yet using a send-once port that could immediately notify of such a death
<youpi>and instead we probe for it
<damo22>if ((!e->dest || e->dest->ip_references == 1) && e->unacked_interrupts)
<youpi>that's the probe
<damo22>how does it know its an aborted process
<youpi>by the references being 1,
<youpi>which is the reference gnumach acquired when it created the port
<youpi>i.e. the process reference doesn't exist any more
<damo22>it couldnt be a just opened one?
<youpi>no, because the process reference is already there when we create the port
<youpi>the gnumach ref is a second one, not the first one
<damo22>is there a chance i am not setting e->dest
<youpi>no it's allocated when the port is created
<damo22>so then something died in my rump driver?
<youpi>oh, I completely misunderstood what you said above
<youpi>when I said drivers usually have a timeout, I mean inside the rump disk driver
<youpi>there is no timeout inside the user-intr support for irqs
<youpi>the message above definitely means that userland somehow dropped the port
<damo22>k_handle...irq handler 10: release dead delivery 1 unacked irqs k_done
<youpi>either by dying, or by closing the port
<damo22>in the handler, it closed the port somehow
<youpi>only one thread dying in a process wouldn't close the port
<youpi>since it's the process which owns port
<damo22>i dont know how its possible i got this message in the handler
<damo22> /* For now netdde calls device_intr_enable once after registration. Assume
<damo22> * it does so for now. When we move to IRQ acknowledgment convention we will
<damo22> * change this. */
<damo22> new->unacked_interrupts = 1;
<damo22>the first time the interrupt handler is called it is considered unacked?
<youpi>that will be the idea yes
<youpi>userland will have to ack each interrupt
<youpi>by re-enabling it
<damo22>do i need to change my pci-userspace code for the new gnumach?
<youpi>currently netdde also explicitly enable the interrupt after registering it
<youpi>I was thinking about dropping that, but perhaps we want to keep it
<youpi>no, the current gnumach only keeps its existing behavior
<youpi>I just added the comment to make explicit what netdde is currently doing
<damo22> https://github.com/zamaudio/pci-userspace/blob/upstreaming-rump-hurd/src-gnu/pci_user-gnu.c#L305
<damo22>im calling device_intr_enable() twice
<youpi>no, just one before running the server loop, and after each interrupt, which is what netdde does
<damo22>ok
<damo22> https://imgur.com/a/cjZ5dr8
<damo22>i dont understand where this could be failing
<youpi>is it perhaps just talking about a previous instance of your translator?
<youpi>Mmm, no, you wouldn't have had irq fired if it wasn't cleaned up yet
<damo22>im running this with no translator, just opening the block device
<youpi>perhaps other code is deallocating a random port
<youpi>and at some point by bad luck it's the delivery port
<youpi>I remember you posting warnings about deallocating bogus ports
<youpi>perhaps you could print the port number of the delivery port, and print it along every port deallocation in the code
<damo22>okay
<youpi>by translator I just meant your rump process
<damo22>wd0: <QEMU HARDDISK>
<damo22>wd0: drive supports 16-sector PIO transfers, LBA48 addressing
<damo22>wd0: 100 MB, 203 cyl, 16 head, 63 sec, 512 bytes/sect x 204800 sectors
<damo22>wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
<damo22>stupid translators were still running
<damo22>i need to force it to go away twice
<damo22>ok now im getting a different problem
<damo22>returning from establish
<damo22>irq handler 10: release a dead delivery port
<damo22>irq handler 10: removed
<damo22>irq fired: 10
<damo22>k_handle... irq handler 10: release dead delivery 1 unacked irqs
<damo22>k_done
<damo22><hang>
<damo22>its reproducible in that order every time
<youpi>damo22: just to make sure what is really happening, I have pushed more debugging to the master-user_level_drivers gnumach branch
<youpi>in case the kernel messages you are seeing are not actually about the process you are running, but a previous one
<damo22>ok
<damo22>settrans -fga /dev/rump /bin/sh -c 'exec >> /root/rump.log 2>&1 && /usr/bin/env RUMP_VERBOSE=1 RUMP_NCPU=1 /hurd/rumpdisk'
<damo22>wd0(ahcisata0:0:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100) (using DMA)
<damo22>/hurd/rumpdisk: must be started as a translator
<damo22>its getting that far
<damo22>but somehow when i run it with exec it gets killed
<damo22>oh crap !!!! my libstore.so is out of date
<damo22>i have two
<damo22>ok why is block.opened_count reset to 0 the next time it opens?
<damo22>the static struct is not holding state
<youpi>that you can watch with gdb
<damo22>i did
<damo22>its not holding the state
<damo22>it resets every time device_open gets called
<damo22>if it spawns a new thread when it opens the device how do you synchronise the state?
<youpi>damo22: by "watch", I mean the "watch" command
<youpi>which catches changes in the variable
<youpi>so you get to know *what* is changing it
<youpi>about synchronization, the question is too vague for me to provide any useful answer beyond "use mutex, cond, etc."
<damo22>in block.c i have static struct block_data block;
<damo22>as a global
<damo22>why would the state be resetting to zero for that global when device_open is called again
<damo22>the pid is getting updated
<damo22>wth
<damo22>its dying and the translator is resetting it?
<damo22>gdb isnt reattaching so its not a different pid
<damo22> at ../../libports/manage-one-thread.c:122
<damo22>122 while (err != MACH_RCV_TIMED_OUT);
<damo22>(gdb) p err
<damo22>$6 = EMACH_SEND_INVALID_RIGHT
<damo22>(gdb) p rumpblock
<damo22>$8 = {port = {class = 0x0, refcounts = {references = {hard = 0, weak = 0}, value = 0}, mscount = 0, cancel_threshold = 0, flags = 0, port_right = 0, current_rpcs = 0x0, bucket = 0x0, hentry = 0x0, ports_htable_entry = 0x0}, device = { emul_ops = 0x1218a60 <rump_block_emulation_ops>, emul_data = 0x121f9e0 <rumpblock>}, mode = 3, rump_fd = 3, media_size = 104857600, block_size = 512, opened_count = 6, opening = true, closing = false}
<damo22>seems to be losing the port info
<damo22>do i create a new port for it every open?
<damo22>how do i tell mach that the device succeeded to open?
<damo22> *devp = (device_t)&bd->device;
<damo22> *devicePoly = MACH_MSG_TYPE_MOVE_SEND;
<damo22> return D_SUCCESS;
<damo22>is that it?
<damo22>i dont understand ports
<damo22>root@zamhurd:~# mount /dev/wd0s1 /mnt -t ext2fs
<damo22>root@zamhurd:~# ls /mnt
<damo22>hello lost+found
<gnu_srs>damo22: Does that mean your rump disk driver works?
<damo22>pretty much, it needs a lot of cleaning up
<gnu_srs>Nice and congrats :) ;)
<gnu_srs>VM or real HW?
<damo22>this is running in a VM at the moment
<gnu_srs>k!
<damo22>when i figure out what the bogus ports are from i will try on real hw
<damo22>the problem is, i made it work by reverting block.c to an older version but now i have a lot of cleanup to do
<damo22>i still dont know how my latest block.c is broken