IRC channel logs

2021-08-22.log


<youpi>so rumpdisk doesn't seem to be leaking any more
<youpi>but it's eating a terrible lot of memory
<youpi>and since we have to keep it calling mlockall() that's really a concern
<youpi>(its vmsize is 656M, that's really too much)
<youpi>damo22: it seems that rumpdisk calls pci_conf_read quite a lot, we'd want to fix that since that's costly
<damo22>in which layer does the recnum get converted into sectors?
<damo22>or is it always in sectors
<damo22>regarding memory usage, we are anonymous mmaping every page in rumpdisk_device_read, does that get freed?
<youpi>recnum: that depends on the bd->block_size
<youpi>callers of the device interface are supposed to use that as the base
<youpi>and thus convert into that base before calling device_*
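The conversion youpi describes, where the caller turns a byte offset into a record number in units of the device block size before calling device_*, could be sketched as follows. The helper name and the error convention are hypothetical and not taken from libstore or gnumach.

```c
#include <stdint.h>

/* Hypothetical helper: convert a byte offset into a record number
   expressed in units of the device block size (bd->block_size in the
   discussion above).  Returns -1 if block_size is 0 or the offset is
   not a multiple of the block size.  */
static int64_t
byte_offset_to_recnum (uint64_t byte_offset, uint32_t block_size)
{
  if (block_size == 0 || byte_offset % block_size != 0)
    return -1;
  return (int64_t) (byte_offset / block_size);
}
```

With a 512-byte block size, byte offset 4096 becomes recnum 8; with a 2048-byte block size the same offset becomes recnum 2.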
<youpi>device_read: that's the dealloc flag I have just added to gnumach
<youpi>as I said above, it's not leaking any more
<damo22>oh ok
<damo22>i am struggling to find out where the bug is, is it possible my ext2fs is corrupt even though it fscks fine?
<youpi>again, there is no point in looking somewhere else than: for the same block number, one driver returns something, another driver reports something
<youpi>else
<youpi>the filesystem doesn't matter, it's at the block level that the problem is
<damo22>right
<damo22>i can try creating a small dummy disk with 4 partitions
<youpi>why creating something new?
<youpi>you already have your reproducer
<damo22>because i want to inject a block into the disk and see where it ends up
<youpi>what for?
<youpi>you already have a case that doesn't work properly
<youpi>no need to err out somewhere else
<youpi>fix that case already
<damo22>it will tell me the offset, then i can have an idea where it is going wrong
<youpi>no it won't
<damo22>minimal test cases are easier to debug
<youpi>it will just tell you by how much it's going wrong
<damo22>not a 320GB disk
<youpi>but not tell you *what* source code is going wrong
<youpi>the disk size doesn't matter
<youpi>the amount of work done by the code you run does
<youpi>here you have boiled it down to one block which is read erroneously
<youpi>that's way plenty small enough to work on it
<youpi>really, I don't understand your thinking
<youpi>you have a *SMALL* testcase already
<youpi>the path from dd to the rump disk driver is not small however, sure
<youpi>but that's precisely my point: put prints along the way to check how well it's going
<youpi>and then you'll notice exactly *where* it goes wrong
<youpi>and then you can debug from there
<damo22>i cant because it prints a zillion lines just to mount the partition
<youpi>you don't need to mount the partition!!!
<youpi>s dgfml$
<damo22>oh yea
<youpi>that was the whole point of using debugfs to get a block number
<youpi>boil it down to a mere dd
<damo22>no it still prints a zillion lines to dd from partition3
<youpi>??
<youpi>how so?
<youpi>dd will make only one open, one io_read, basically
<youpi>what else would happen?
<youpi>using a smaller disk won't change that
<youpi>ah, there are the reads from the part store layer possibly, but you can as well just read directly from the whole device, with the complete offset taking the partition into account
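The "complete offset taking the partition into account" can be computed by hand. A minimal sketch, assuming the partition start is known in sectors and the filesystem block number comes from debugfs; the function name and all parameters are hypothetical:

```c
#include <stdint.h>

/* Hypothetical helper: translate a filesystem block number inside a
   partition into an absolute sector on the whole device, so that the
   partition store layer can be bypassed when reading.  */
static uint64_t
whole_device_sector (uint64_t part_start_sector, uint64_t fs_block,
                     uint32_t fs_block_size, uint32_t sector_size)
{
  uint64_t sectors_per_fs_block = fs_block_size / sector_size;
  return part_start_sector + fs_block * sectors_per_fs_block;
}
```

For illustration only: a partition starting at sector 2048 with 4096-byte filesystem blocks puts filesystem block 10 at absolute sector 2128, which could then be passed to dd as skip= with bs=512 on the whole device.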
<damo22> http://paste.debian.net/plain/1208508
<damo22>i dont think its the part layer
<damo22>since the data is still different between drivers at the same offset from sd0
<damo22>XXXX libstore: dev_read recnum=291360792
<damo22>wdstrategy (wd0)
<damo22>wdstrategy (wd0) sizeof(daddr_t)=8 blkno=291360792 bp->b_blkno=0
<damo22>ok looks like the driver subtracts 3 blocks
<damo22> https://salsa.debian.org/hurd-team/rumpkernel/-/blob/master/buildrump.sh/src/sys/dev/ata/wd.c#L560 from here onward i dont really understand but the block changes
<damo22>XXXX libstore: dev_read recnum=291360787
<damo22>wdstrategy (wd0)
<damo22>wdstrategy (wd0) sizeof(daddr_t)=8 blkno=291360784 bp->b_blkno=0
<damo22>XXXX libstore: dev_read recnum=291360789
<damo22>wdstrategy (wd0)
<damo22>wdstrategy (wd0) sizeof(daddr_t)=8 blkno=291360788 bp->b_blkno=0
<damo22>off by one or 3
<damo22>??
<damo22> http://paste.debian.net/plain/1208544
<damo22>seems to be failing here https://salsa.debian.org/hurd-team/rumpkernel/-/blob/master/buildrump.sh/src/sys/miscfs/specfs/spec_vnops.c#L724
<damo22>youpi: its in the driver somewhere http://paste.debian.net/plain/1208547
<damo22>i asked for a block and it read a different one
<damo22>i think its because we are missing an ioctl?
<damo22> if (bdev_ioctl(vp->v_rdev, DIOCGPARTINFO, &pi, FREAD, l) == 0) bsize = pi.pi_bsize; else bsize = BLKDEV_IOSIZE;
<damo22>since that ioctl is not implemented by rumpdisk, bsize defaults to 2048
<damo22>?
<damo22>hmm i think because its a raw block device, there is no partition, therefore the ioctl inside librump fails, thus defaults to 2048
<damo22>it looks like the driver reads blocks of 2048 regardless, and finds the sector of 512 i asked for and tries to return that, but gets it wrong
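The three recnum/blkno pairs printed above fit damo22's hypothesis of reads aligned to 2048-byte blocks: each blkno is the requested recnum rounded down to a multiple of 4 sectors (2048 / 512). A small sketch of that arithmetic, with hypothetical function names; it models only the observed numbers, not the actual code path in spec_vnops.c or wd.c:

```c
#include <stdint.h>

/* Hypothetical helper: round a sector number down to the start of the
   I/O block that contains it, where io_size is the I/O block size in
   bytes (2048 here, the assumed BLKDEV_IOSIZE fallback).  */
static uint64_t
round_down_to_io_block (uint64_t sector, uint32_t sector_size,
                        uint32_t io_size)
{
  uint64_t sectors_per_io = io_size / sector_size;
  return sector - sector % sectors_per_io;
}

/* Hypothetical helper: index of the requested sector inside that I/O
   block, i.e. the "off by one or 3" seen in the log.  */
static uint64_t
sector_index_in_io_block (uint64_t sector, uint32_t sector_size,
                          uint32_t io_size)
{
  return sector % (io_size / sector_size);
}
```

Applied to the log: 291360787 rounds down to 291360784 (off by 3), 291360789 rounds down to 291360788 (off by 1), and 291360792 is already aligned.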
<damo22>(20:35:03) mlelstv: if you want to pass I/O through to the device, you need the character device. I.e. rwd0 instead of wd0.
<damo22>(20:37:12) mlelstv: wd0d is passed through the buffer cache. rwd0d is not.
<damo22>maybe that is why we are getting so much memory usage too?
<youpi>that's possible yes
<youpi>and we don't want the buffer cache, we already cache data
<damo22>hmm wdread crashes rumpdisk
<damo22>in cdev_read