***Server sets mode: +nt
***justan0theruser is now known as justanotheruser
<damo22>youpi: when i call "umount /mnt" i get one deallocate bug
<youpi>does that happen without rump?
<damo22>if ext2fs is being passed an invalid port to play with then
<damo22>it actually unmounts the rumpdisk
<youpi>yes, I've already noticed that
<youpi>that happens with xen's block device too
<damo22>im using the devp setting from libmachdev/block.c
<damo22>225 *devp = ports_get_right (bd);
<damo22>ds_device_open (open_port=127, reply_port=115, reply_port_type=18, mode=3,
<damo22> name=0x1fffd5c "/dev/wd0", devp=0x2001d64, devicePoly=0x1fffc7c)
<youpi>yes, what do you put in *devicePoly ?
<damo22>i'll have to find the source im currently in gdb
<damo22>225 *devp = ports_get_right (bd);
<damo22>226 *devicePoly = MACH_MSG_TYPE_MAKE_SEND;
<damo22>ahh the reply port is sometimes the same as the normal port
<damo22>Thread 1 hit Breakpoint 1, device_open (reply_port=78, reply_port_type=18,
<damo22>Thread 1 hit Breakpoint 1, device_open (reply_port=115, reply_port_type=18,
<damo22>Thread 1 hit Breakpoint 1, device_open (reply_port=78, reply_port_type=18,
<damo22>its difficult to debug because the deallocation occurs zillion times
<damo22>it seems to happen after processing the mach msg from device_open
<damo22>Thread 1 hit Breakpoint 1, ds_device_open (open_port=127, reply_port=110,
<damo22>Thread 1 hit Breakpoint 1, ds_device_open (open_port=108, reply_port=125,
<damo22>Thread 1 hit Breakpoint 1, ds_device_open (open_port=127, reply_port=108,
<damo22>and in between its a bogus port 110
<damo22>if ds_device_open is calling the emulation device_open, why does it already know the port?
<damo22>i am allocating a port for the device in device_open
<youpi>that's not supposed to happen that the open_port (the device master port) would be the same as the port you are allocating
<youpi>do you know what the device master port is?
<youpi>that's what openers use to show they are allowed to access the resource
<youpi>processes running with uid=0 have it
<youpi>when calling device_open(), they pass it along
<youpi>so device_open can check that it's the master port
<youpi>that's what is_master_device() tests
<youpi>by checking that it's a port known in the mach port_bucket
<youpi>and if so, it calls ports_port_deref()
<youpi>since it doesn't need the port any more
<youpi>so that will deallocate the port
<damo22>so where do i call is_master_device?
<youpi>and thus it's normal that allocating a port for the device itself will return the same port
<youpi>it is already called in ds_device_open()
<youpi>there however seems to be one issue: if for some other reason the function fails, one should not have released the port
<youpi>because on error mig does the cleanup
<damo22>it does not have is_master_device()
<youpi>I'm really lost in what source code you are actually using
<damo22>i havent quite pushed what i am running now
<youpi>ok, then in the long run you will want to call is_master_device()
<youpi>otherwise you'll let any process open it
<youpi>now, that means that it's not being deallocated there
<youpi>and thus it's surprising that you'd get the same port for the device port
<damo22>> if (!is_master_device (open_port))
<damo22>these two lines are missing from my src
<damo22>i cant remember why i removed it
<damo22>i thought it was unnecessary because i am not a mach device
<youpi>more precisely, the unix permissions should be fine enough
<youpi>but deallocating the open_port on success will be needed, otherwise that'll be a port leak
<youpi>no, mach_port_deallocate(open_port)
<youpi>bd has nothing to do with bd
<youpi>open_port has nothing to do with bd
<damo22>it seems to be complaining about the reply_port
<damo22>it seems that the reply_port from a previous call to device_open is not being deallocated and it tries to deallocate the wrong port the next time but the reply port has changed
<damo22>Thread 1 hit Breakpoint 1, ds_device_open (open_port=115, reply_port=114,
<damo22>Thread 1 hit Breakpoint 1, ds_device_open (open_port=112, reply_port=108,
<youpi>when is "bogus port" printed exactly?
<youpi>are you sure it happens after or before ds_device_open is getting called?
<damo22>let me repeat it and i will document
<youpi>really, tracing from the kernel directly provides the answer
<youpi>switch the bogus allocation variable
<youpi>and trace/u will show you the backtrace
<damo22>the backtrace is useless because its in mach port loop
<youpi>yes but I'm telling you that trace/u ALSO prints the user part
<youpi>inside your userland process
<damo22>0x81b8c1c <syscall_mach_port_deallocate+12>: 0x909066c3
<youpi>see, that's the userland part of the syscall
<youpi>and below you have the callers
<damo22>i need to paste here so i dont lose it 0x814e755
<youpi>it could be useful to build your program with -fno-omit-frame-pointer, so it'll be easier for kdb to unroll the stack
<damo22>0x814e755 <ports_manage_port_operations_one_thread+117>
<youpi>ok, so it must be on cleanup after processing the message
<youpi>i.e. your message management did one deallocation that it shouldn't have done
<youpi>and when cleanup came it did it again
<youpi>you could use gdb to go step by step there, to see which port exactly is cleaned up a second time
<damo22>could it be that im doing the rpc with too many elements in the message?
<damo22>because i overloaded the device struct
<damo22>i dont know where to set the size of the message
<youpi>that's not passed in messages, that's only allocated on your side
<damo22>device_read was not replying correctly
<damo22>pushed working disk driver to incubator
<youpi>damo22: don't always deallocate open_port, only deallocate on success
<youpi>on error mig will do the cleanup
<youpi>so you mustn't do it yourself before that
<youpi>it was indeed bogus to both call ds_device_read_reply and return D_SUCCESS
<youpi>either you return MIG_NO_REPLY and call ds_device_read_reply later
<youpi>currently your code is correct for the success case, but not for EIO
<youpi>don't call ds_device_read_reply in addition to returning EIO
<youpi>but even in the success case, instead of calling ds_device_read_reply and return MIG_NO_REPLY, you can just return D_SUCCESS
<youpi>since you'll have already set *bytes_read
<youpi>using ds_device_read_reply is only needed when you work asynchronously
<youpi>here you are synchronous so you can just reply immediately
<damo22>somehow your suggestions broke device_read
<damo22>after get_status i get EIEIO on the ext2fs mount
<damo22>i left the reply with D_SUCCESS and then return MIG_NO_REPLY and it works
<damo22>i cant mount a partition past the 128GB boundary
<damo22>i think its because storeio partition thing doesnt work
<damo22>because it goes only 50GB past the start of the disk
<damo22>it has same behaviour on native hw
<damo22>pushed everything that ive done now
<youpi>if it doesn't work past 128GB perhaps the rump ide driver only supports lba28
<youpi>well, somehow there's a point where it doesn't :)
<youpi>the device interface itself supports 32bit block numbers, so that's 2T
<youpi>libstore uses 64bit off_t and the device interface
<youpi>so I don't see how a 128GB limit could be there
<youpi>128GB definitely is a 2^28 thing
<damo22>yeah but the log from rumpdisk shows LBA48
<youpi>please show me the change you've made to make the device_read return D_SUCCESS
<youpi>it's really supposed to work
<youpi>probably there's a detail you are doing wrongly
<youpi>showing LBA48 perhaps only means the device supports it, and not the driver
<damo22>i dont think netbsd only supports 128GB drives in 2019
<youpi>possibly it's not netbsd itself, but some glue at some point
<damo22>its not really glue, it compiles the actual source
<youpi>yes, but there's glue around it to make it work with the rest
<damo22>i'll have to investigate further but not tonight
<youpi>I'd say try with dd bs=1M skip=150000 to see where the offset gets wrong
<damo22>- ds_device_read_reply (reply_port, reply_port_type, D_SUCCESS, buf, *bytes_read);
<damo22>doesn't it need the "buf" to send back through the message?
<youpi>you need to set *data = buf indeed
<youpi>(and drop the ds_device_read_reply for the EIO case, they are really not useful, just returning EIO will do it)
<youpi>(no need to set *bytes_read on errors, that won't be transmitted anyway)
<damo22>if this works on native hw i will be very happy
<damo22>ive just compiled a static binary with the latest changes
<damo22>could there be a lba28 restriction there?
<damo22>i get EIEIO reading some of the files in /mnt
<youpi>I don't think it'd be in libpartesd
<youpi>really, just putting printf along the path will tell you where it gets wrong
<damo22>its very hard to debug when i need to repack initrd every time
<youpi>I thought that irq sharing fix would allow to run it live
<youpi>where you can make qemu expose several ahci controllers
<damo22>it does, but i have no errors on qemu
<youpi>so you can let one driven by gnumach, and the other by rump
<youpi>but did you try >128G access in qemu ?
<damo22>i cant select which ahci controller rump can control
<youpi>to e.g. exclude some pci devices by hand
<youpi>but I guess there is a probe function which can return an error?
<youpi>which you could do when the pci device is to be excluded
<youpi>you can also make qemu expose both an ide device and an ahci device
<damo22>yeah that is what i have been doing
<youpi>sure but that's enough for your /
<youpi>you have a / bigger than 128G ?
<damo22>problem is i spend most of my time in e2fsck
<damo22>because it sets the dirty bit when i try mounting real disk
<damo22>and if it fails it cant unset it
<youpi>you can use a small partition for your tests
<damo22>less space, but enough for dev environment
<youpi>I don't understand: only rump mount does bring missing umounts, no?
<youpi>so that your / always correctly umounts
<damo22>it doesnt even need to be cleaned
<damo22>but i have real disk with another /
<damo22>i need to test mounting the real disk
<youpi>so here you are talking about tests on real hw
<youpi>I can understand that that case is hard to debug
<youpi>but most often you can debug with qemu
<youpi>(and my students wonder why we are making them work on a purely simulated system, not even qemu, to do their OS assignments...)
<damo22>but i seem to only encounter problems on real hw, not in qemu
<damo22>so i think its all good, then transfer disk to real system and boom it explodes
***Emulatorman_ is now known as Emulatorman
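For reference, the size limits mentioned in the conversation ("128GB definitely is a 2^28 thing", "32bit block numbers, so that's 2T") can be checked with a little arithmetic. This is only a sketch and assumes 512-byte sectors, which the log never states; with a different sector size the figures scale accordingly.

```latex
\begin{align*}
\text{LBA28:}\quad & 2^{28}\ \text{sectors} \times 2^{9}\ \text{B/sector} = 2^{37}\ \text{B} = 128\ \text{GiB} \\
\text{32-bit block numbers:}\quad & 2^{32} \times 2^{9}\ \text{B} = 2^{41}\ \text{B} = 2\ \text{TiB} \\
\text{LBA48:}\quad & 2^{48} \times 2^{9}\ \text{B} = 2^{57}\ \text{B} = 128\ \text{PiB}
\end{align*}
```

Under the same assumption, the suggested `dd bs=1M skip=150000` starts reading at 150000 MiB, roughly 146.5 GiB, which lies past the 128 GiB (131072 MiB) boundary and so would exercise the region where a 28-bit limit would show up.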