IRC channel logs

<damo22>i got a page fault

<damo22>i tried to write 0 to the first byte of each page of the buffer on read, and to read the first byte of each page of the buffer on write

<damo22>eg:

<damo22> for (i = 0; i < npages; i++)

<damo22> ((uint8_t *)buf)[i * pagesize] = 0;
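
The fault-in loop above can be sketched as a pair of standalone helpers. This is only an illustration under assumptions: the names fault_in_for_write and fault_in_for_read are hypothetical and do not come from rumpdisk, and the caller is assumed to pass a buffer of at least npages * pagesize bytes. The volatile qualifier is there so the compiler cannot drop the accesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write-fault every page of buf by storing 0 into
   its first byte.  This clobbers that byte, so it only suits a buffer
   that is about to be overwritten anyway (the destination of a read).
   Returns the number of pages touched. */
size_t
fault_in_for_write (void *buf, size_t npages, size_t pagesize)
{
  volatile uint8_t *p = buf;
  size_t i;

  assert (pagesize > 0);
  for (i = 0; i < npages; i++)
    p[i * pagesize] = 0;
  return i;
}

/* Hypothetical helper: read-fault every page of buf by loading its
   first byte.  The contents are left untouched, so it suits a buffer
   holding data to be written out.  Returns the number of pages touched. */
size_t
fault_in_for_read (const void *buf, size_t npages, size_t pagesize)
{
  const volatile uint8_t *p = buf;
  size_t i;

  assert (pagesize > 0);
  for (i = 0; i < npages; i++)
    (void) p[i * pagesize];   /* volatile load, result discarded */
  return i;
}
```

Touching the pages this way only populates them at the moment of the call; it does not by itself keep them resident afterwards.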

<youpi>a page fault? from the kernel itself?

<damo22>yes

<youpi>with which backtrace in the kernel?

<damo22>let me reproduce

<damo22>it was last night and then i gave up

<damo22>hmm i get different results when mounting

<damo22> http://paste.debian.net/plain/1208925

<damo22>sometimes a crash of the process, sometimes a page fault, sometimes this fault

<youpi>oh you do have a reproducer for this?

<youpi>I've seen it sometimes

<damo22>mount /dev/wd0s3 /part3

<damo22>without the revert, but with the raw patch

<damo22>and the mlockall

<youpi>what revert?

<damo22>you reverted my patch

<youpi>ah, so with the raw patch

<damo22>yes

<youpi>you wrote "but" above, so that was contradictory :)

<damo22>i also added something, a fault-in thing

<damo22>to read/write one byte of each page in rumpdisk_device_read/write

<youpi>in principle that shouldn't be needed with the mlockall call

<damo22>ok

<damo22>i can remove and retest

<youpi>does adding it or not change the behavior?

<youpi>that can be useful to check yes

<damo22>ah wait, the crash i got was *without* mlockall but with my fault-in

<damo22>and with the raw patch

<damo22>what would you like me to test?

<damo22>+raw +mlockall, +raw -mlockall ?

<youpi>I don't know off-hand, but I guess it's the page fault or the pmap_remove bug that is the easiest to debug

<youpi>because they point at something specific that is wrong

<youpi>I'm wondering about the backtrace of the page fault

<damo22>i got that last night, i am trying to reproduce

<youpi>the pmap_remove issue could look like a race between deallocations

<youpi>normally that wouldn't happen thanks to reference counting

<damo22>it hung and when i step into kernel debugger, there is a pending page fault

<damo22> http://paste.debian.net/plain/1208927

<youpi>uh that one is really messed up, the kernel keeps stepping on itself until it fills the stack

<damo22>ah ok

<youpi>so I'd say concentrate on the pmap_remove

<youpi>for a start, print the name of the task

<youpi>possibly it's mount or ext2fs, but not really sure

<youpi>at that point "show task 0xed919310" may work, but not sure

<youpi>possibly better look it up by hand, by computing the address of the name field

<youpi>I don't remember if ddb can print strings with e.g. p/s

<damo22>+ for (i = 0; i < npages; i++)

<damo22>+ dummy_read = ((volatile uint8_t *)data)[i * pagesize];

<youpi>Mmm, the va parameter of pmap_remove_range is 0, that's odd

<youpi>nothing is supposed to be mapped at 0 in principle

<youpi>but possibly it's just compiler optimizations

<damo22>ok i will look more into it after work, gtg

<youpi>the entry parameter of vm_map_entry_delete could help with its vme_start field, to check whether it's indeed address 0

<damo22>ok

<youpi>basically I'd say try to pinpoint what change does switch from triggering the issues and not triggering the issues

<youpi>s/from/between

<youpi>and possibly we can try to vary from that to get an idea of what kind of change triggers it

<damo22>hey youpi i have a small test case program that seems to misbehave: http://paste.debian.net/plain/1208936

<damo22>the value of dummy should be 0xab right

<damo22>also what happened to gdb?

<damo22>lol uninitialised value

<youpi>damo22: always build with -Wall

<youpi>and -O2

<damo22>i did build with -Wall

<youpi>yes but you also need -O2 for the compiler to avoid being dumb

<youpi>and thus track the liveness of values etc.
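
A hypothetical illustration of the point about -Wall and -O2 (the pasted test case is not reproduced in this log, so this is not damo22's program). The helper name last_touched_byte is an assumption. If the initialiser on dummy is removed, the loop may run zero times and the function would return an indeterminate value; GCC generally needs the data-flow analysis done under optimisation to report that kind of "may be used uninitialized" case.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: touch the first byte of each page and return
   the last byte read. */
uint8_t
last_touched_byte (const void *buf, size_t npages, size_t pagesize)
{
  /* Initialised on purpose: with npages == 0 the loop body never runs,
     and an uninitialised dummy would be returned as garbage. */
  uint8_t dummy = 0;

  for (size_t i = 0; i < npages; i++)
    dummy = ((const volatile uint8_t *) buf)[i * pagesize];
  return dummy;
}
```

Building with something like gcc -Wall -O2 is the combination suggested above for catching the uninitialised variant.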

<damo22>ok

<damo22>im building netbsd trunk, at least the tools, and then try to build rump

<throwaway>damo22: regarding memset (buf, 0, npages * pagesize) in your last patch. Shouldn't mmap provide a flag to ensure this happens?

<youpi>it does, but it's not implemented yet

<throwaway>well it seems easy to implement "the crappy way"

<throwaway>what I mean is that this is a hack, but I would assume the best place for this hack is the libc

<youpi>yet it is still waiting for somebody to implement it :)

<youpi>but the thing is: populating is not enough here, we also need wiring

<youpi>since we plan to do a dma, we have to make sure the page doesn't move, no swapping

<throwaway>in that case it needs kernel support

<youpi>there is kernel support for that

<youpi>that's the mlockall I mentioned

<youpi>we could also mlock, but we *also* need the rest of the process not to get swapped out
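
A minimal sketch of the wiring step discussed here, using the POSIX mlockall call. The wrapper name wire_whole_process is hypothetical and this is not the code in rumpdisk. MCL_CURRENT wires everything mapped now and MCL_FUTURE wires everything mapped later, which covers both the DMA buffers and the rest of the process.

```c
#include <errno.h>
#include <sys/mman.h>

/* Hypothetical wrapper: ask the kernel to keep every current and future
   page of this process resident.  Returns 0 on success, or the errno
   value reported by mlockall (for example EPERM or ENOMEM when the
   process lacks the privilege or exceeds its locked-memory limit). */
int
wire_whole_process (void)
{
  if (mlockall (MCL_CURRENT | MCL_FUTURE) != 0)
    return errno;
  return 0;
}
```

A driver would call this once at startup, before allocating any buffer that will be handed to a device, and treat a non-zero return as fatal.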

<throwaway>what is the issue with mlockall? That the pages are locked also in the kernel virtual address space?

<youpi>that rumpdisk currently allocates 600M

<throwaway>oh that's a lot! Maybe mlockall should be used with MCL_CURRENT just at the beginning, and then one needs a smart use of mlock and munlock to allow swapping of the buffers after DMA

<youpi>we cannot let rumpdisk swap at all

<youpi>it wouldn't be able to swap itself in

<throwaway>not swap itself, just the buffers

<youpi>the 600M are not buffers

<throwaway>what can be so big?

<youpi>please read the backlog

<youpi>I'm sorry I cannot take the time to re-explain everything

<youpi>basically, we don't know

<youpi>(yet)

<throwaway>ok sorry thanks for the explanations

<damo22>it must be a missing deallocation somewhere

<youpi>no, it's not leaking

<damo22>it does not use 600M all the time, it grows and then chews all my memory

<youpi>oh? in my tests the rss without mlockall caps at 100M

<youpi>did you install the latest gnumach-dev to have the dealloc fix?

<damo22>i almost built rumpkernel but it destroyed my filesystem with 0 byte writes

<damo22>and then chewed all memory

<damo22>no i dont think i have gnumach-dev

<youpi>? how do you have device.defs then?

<damo22>i mean not latest

<youpi>that's why

<youpi>the device_read() calls are not deallocating the buffer then

<damo22>aha

<damo22>it started with about 25M

<damo22>and grew

<youpi>yes, it grows as much as the data you read from the disk

<youpi>so that's quick :)

<damo22>yes

<damo22>rump pretty much builds well on netbsd

<youpi>good :)

<damo22>but the pci-userspace libs are not in upstream source code

<damo22>lib*

<damo22>so there are missing librumpdev_pci*

<damo22>im not sure how to do a full build of rump with upstream netbsd src

<damo22>im building libs now

<damo22>i have a patch for upstream netbsd that fixes the "rumptest" target, but im still working on the "rump" target

<damo22>i had to build all of netbsd libs to make the rump target work on netbsd because it needed crti.o

<damo22>im guessing that if we build rump on hurd, hurd will provide the crti.o?

<youpi>€ dpkg -S crti.o

<youpi>libc0.3-dev:hurd-i386: /usr/lib/i386-gnu/crti.o

<damo22> 6 root 18 -2 400268 268280 0 S 0.0 12.8 0:11.04 rumpdisk

<damo22>its not growing

<damo22>i have mlockall enabled and raw char dev patch

<damo22>as well as dealloc gnumach-dev

<damo22>but blkid /dev/wd0 is broken

<damo22>demo@zamhurd:~$ sudo blkid /dev/wd0

<damo22>malloc(): corrupted top size

<damo22>REBOOT

<damo22>also i think when i boot off rumpdisk, i cant mount another disk on the same controller, it tries to spawn another rumpdisk

<damo22>actually another partition on the same disk

***john__ is now known as gaqwas

2021-08-25.log