IRC channel logs

2021-08-25.log


<damo22>i got a page fault
<damo22>i tried to write 0 to the first byte of each page of the read buffer, and to read the first byte of each page of the buffer to write
<damo22>eg:
<damo22> for (i = 0; i < npages; i++)
<damo22> ((uint8_t *)buf)[i * pagesize] = 0;
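A self-contained sketch of the fault-in loop described above, covering both directions: a write of 0 for the buffer that will receive disk data, and a volatile read (like the dummy_read variant pasted further down) for the buffer about to be written out, so its contents are left intact. The function names are hypothetical, and the page size is assumed to come from sysconf(_SC_PAGESIZE). With mlockall in place this should not be needed at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: fault in a buffer that the device will fill.
 * Writing 0 is harmless because the read will overwrite it anyway. */
void fault_in_for_read(void *buf, size_t npages)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    assert(pagesize > 0);
    for (size_t i = 0; i < npages; i++)
        ((volatile uint8_t *)buf)[i * (size_t)pagesize] = 0;
}

/* Hypothetical helper: fault in a buffer whose contents are about to be
 * written to the device.  The volatile read keeps the compiler from
 * dropping the access; XOR-ing the bytes gives the loop a result. */
uint8_t fault_in_for_write(const void *data, size_t npages)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    uint8_t acc = 0;
    assert(pagesize > 0);
    for (size_t i = 0; i < npages; i++)
        acc ^= ((const volatile uint8_t *)data)[i * (size_t)pagesize];
    return acc;
}
```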
<youpi>a page fault? from the kernel itself?
<damo22>yes
<youpi>with which backtrace in the kernel?
<damo22>let me reproduce
<damo22>it was last night and then i gave up
<damo22>hmm i get different results when mounting
<damo22> http://paste.debian.net/plain/1208925
<damo22>sometimes a crash of the process, sometimes a page fault, sometimes this fault
<youpi>oh you do have a reproducer for this?
<youpi>I've seen it sometimes
<damo22>mount /dev/wd0s3 /part3
<damo22>without the revert, but with the raw patch
<damo22>and the mlockall
<youpi>what revert?
<damo22>you reverted my patch
<youpi>ah, so with the raw patch
<damo22>yes
<youpi>you wrote "but" above, so that was contradictory :)
<damo22>i also added something, a fault-in thing
<damo22>to read/write one byte of each page in rumpdisk_device_read/write
<youpi>in principle that shouldn't be needed with the mlockall call
<damo22>ok
<damo22>i can remove and retest
<youpi>does adding it or not change the behavior?
<youpi>that can be useful to check yes
<damo22>ah wait, the crash i got was *without* mlockall but with my fault-in
<damo22>and with the raw patch
<damo22>what would you like me to test?
<damo22>+raw +mlockall, +raw -mlockall ?
<youpi>I don't know off-hand, but I guess it's the page fault or the pmap_remove bug that is the easiest to debug
<youpi>because they point at something specific that is wrong
<youpi>I'm wondering about the backtrace of the page fault
<damo22>i got that last night, i am trying to reproduce
<youpi>the pmap_remove issue could look like a race between deallocations
<youpi>normally that wouldn't happen thanks to reference counting
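A generic C11 sketch of why reference counting normally prevents two deallocations from racing: every holder drops its own reference, and only the caller that drops the last one frees the object. struct obj and its helpers are hypothetical; this is not gnumach's actual vm_map reference code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted object, for illustration only. */
struct obj {
    atomic_int refs;
};

struct obj *obj_create(void)
{
    struct obj *o = malloc(sizeof *o);
    if (o != NULL)
        atomic_init(&o->refs, 1);   /* the creator holds one reference */
    return o;
}

void obj_ref(struct obj *o)
{
    atomic_fetch_add(&o->refs, 1);
}

/* Drop one reference.  atomic_fetch_sub returns the previous value, so
 * exactly one caller sees 1 and performs the free.
 * Returns true if this call freed the object. */
bool obj_release(struct obj *o)
{
    int old = atomic_fetch_sub(&o->refs, 1);
    assert(old > 0);
    if (old == 1) {
        free(o);
        return true;
    }
    return false;
}
```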
<damo22>it hung and when i step into kernel debugger, there is a pending page fault
<damo22> http://paste.debian.net/plain/1208927
<youpi>uh that one is really messed, the kernel keeps stepping on itself, until filling the stack
<damo22>ah ok
<youpi>so I'd say concentrate on the pmap_remove
<youpi>for a start, print the name of the task
<youpi>possibly it's mount or ext2fs, but not really sure
<youpi>at that point "show task 0xed919310" may work, but not sure
<youpi>possibly better look it up by hand, by computing the address of the name field
<youpi>I don't remember if ddb can print strings with e.g. p/s
<damo22>+ for (i = 0; i < npages; i++)
<damo22>+ dummy_read = ((volatile uint8_t *)data)[i * pagesize];
<youpi>Mmm, the va parameter of pmap_remove_range is 0, that's odd
<youpi>nothing is supposed to be mapped at 0 in principle
<youpi>but possibly it's just compiler optimizations
<damo22>ok i will look more into it after work, gtg
<youpi>the entry parameter of vm_map_entry_delete could help with its vme_start field, to check whether it's indeed address 0
<damo22>ok
<youpi>basically I'd say try to pinpoint what change does switch from triggering the issues and not triggering the issues
<youpi>s/from/between
<youpi>and possibly we can try to vary from there to get an idea of what kind of change triggers it
<damo22>hey youpi i have a small test case program that seems to misbehave: http://paste.debian.net/plain/1208936
<damo22>the value of dummy should be 0xab right
<damo22>also what happened to gdb?
<damo22>lol uninitialised value
<youpi>damo22: always build with -Wall
<youpi>and -O2
<damo22>i did build with -Wall
<youpi>yes but you also need -O2 for the compiler to avoid being dumb
<youpi>and thus track the liveness of values etc.
<damo22>ok
<damo22>im building netbsd trunk, at least the tools, and then try to build rump
<throwaway>damo22: regarding memset (buf, 0, npages * pagesize) in your last patch. Shouldn't mmap provide a flag to ensure this happens?
<youpi>it does, but it's not implemented yet
<throwaway>well it seems easy to implement "the crappy way"
<throwaway>what I mean is that this is a hack, but I would assume the best place for this hack is the libc
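A sketch of what emulating MAP_POPULATE "the crappy way" could amount to: map anonymous memory, then write one byte per page so every page is allocated up front instead of on first use. mmap_populate_anon is a hypothetical name, not an existing libc function. This only populates the pages; it does not wire them, so they could still be paged out later.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: anonymous mmap followed by touching each page.
 * Anonymous memory is zero-filled, so writing 0 leaves the contents
 * unchanged but forces each page to be allocated now.
 * Returns NULL on failure. */
void *mmap_populate_anon(size_t len)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    assert(pagesize > 0);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    for (size_t off = 0; off < len; off += (size_t)pagesize)
        ((volatile uint8_t *)p)[off] = 0;
    return p;
}
```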
<youpi>yet it is still waiting for somebody to implement it :)
<youpi>but the thing is: populating is not enough here, we also need wiring
<youpi>since we plan to do a dma, we have to make sure the page doesn't move, no swapping
<throwaway>in that case it needs kernel support
<youpi>there is kernel support for that
<youpi>that's the mlockall I mentioned
<youpi>we could also mlock, but we *also* need the rest of the process not to get swapped out
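A sketch of the wiring side, assuming the rumpdisk call is mlockall(MCL_CURRENT | MCL_FUTURE); the exact flags are not stated in this log. The helper names are hypothetical. wire_buffer shows the narrower mlock alternative, which pins a single buffer and leaves the rest of the process swappable. Both helpers return an errno value, since mlockall and mlock can fail with EPERM or ENOMEM when RLIMIT_MEMLOCK is low.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Wire every page currently mapped and every page mapped later
 * (MCL_FUTURE), so neither DMA buffers nor the server's own code and
 * data can be paged out.  Returns 0 on success or an errno value. */
int wire_whole_process(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        return errno;
    return 0;
}

/* Narrower alternative: wire only one buffer.  Not enough for a disk
 * server, because the rest of the process could still be swapped out,
 * and the server itself would be needed to swap it back in. */
int wire_buffer(void *buf, size_t len)
{
    if (mlock(buf, len) != 0)
        return errno;
    return 0;
}
```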
<throwaway>what is the issue with mlockall? That the pages are locked also in the kernel virtual address space?
<youpi>that rumpdisk currently allocates 600M
<throwaway>oh that's a lot! Maybe mlockall should be used with MCL_CURRENT just at the beginning, and then one needs a smart use of mlock and munlock to allow swapping of the buffers after DMA
<youpi>we cannot let rumpdisk swap at all
<youpi>it wouldn't be able to swap itself in
<throwaway>not swap itself, just the buffers
<youpi>the 600M are not buffers
<throwaway>what can be so big?
<youpi>please read the backlog
<youpi>I'm sorry I cannot take the time to re-explain everything
<youpi>basically, we don't know
<youpi>(yet)
<throwaway>ok sorry thanks for the explanations
<damo22>it must be a missing deallocation somewhere
<youpi>no, it's not leaking
<damo22>it does not use 600M all the time, it grows and then chews all my memory
<youpi>oh? in my tests the rss without mlockall caps at 100M
<youpi>did you install the latest gnumach-dev to have the dealloc fix?
<damo22>i almost built rumpkernel but it destroyed my filesystem with 0 byte writes
<damo22>and then chewed all memory
<damo22>no i dont think i have gnumach-dev
<youpi>? how do you have device.defs then?
<damo22>i mean not latest
<youpi>that's why
<youpi>device_read() are not deallocating the buffer then
<damo22>aha
<damo22>it started with about 25M
<damo22>and grew
<youpi>yes, it grows as much as the data you read from the disk
<youpi>so that's quick :)
<damo22>yes
<damo22>rump pretty much builds well on netbsd
<youpi>good :)
<damo22>but the pci-userspace libs are not in upstream source code
<damo22>lib*
<damo22>so there are missing librumpdev_pci*
<damo22>im not sure how to do a full build of rump with upstream netbsd src
<damo22>im building libs now
<damo22>i have a patch for upstream netbsd that fixes the "rumptest" target, but im still working on the "rump" target
<damo22>i had to build all of netbsd libs to make the rump target work on netbsd because it needed crti.o
<damo22>im guessing that if we build rump on hurd, hurd will provide the crti.o?
<youpi>€ dpkg -S crti.o
<youpi>libc0.3-dev:hurd-i386: /usr/lib/i386-gnu/crti.o
<damo22> 6 root 18 -2 400268 268280 0 S 0.0 12.8 0:11.04 rumpdisk
<damo22>its not growing
<damo22>i have mlockall enabled and raw char dev patch
<damo22>as well as dealloc gnumach-dev
<damo22>but blkid /dev/wd0 is broken
<damo22>demo@zamhurd:~$ sudo blkid /dev/wd0
<damo22>malloc(): corrupted top size
<damo22>REBOOT
<damo22>also i think when i boot off rumpdisk, i cant mount another disk on the same controller, it tries to spawn another rumpdisk
<damo22>actually another partition on the same disk
***john__ is now known as gaqwas