<damo22> i tried to write 0 to the first byte of each page on the read, and read the first byte of each page of the buffer on the write
<damo22> ((uint8_t *)buf)[i * pagesize] = 0;
<youpi> a page fault? from the kernel itself?
<youpi> with which backtrace in the kernel?
<damo22> it was last night and then i gave up
<damo22> hmm i get different results when mounting
<damo22> sometimes a crash of the process, sometimes a page fault, sometimes this fault
<youpi> oh you do have a reproducer for this?
<damo22> without the revert, but with the raw patch
<youpi> you wrote "but" above, so that was contradictory :)
<damo22> i also added something, a fault-in thing
<damo22> to read/write one byte of each page in rumpdisk_device_read/write
<youpi> in principle that shouldn't be needed with the mlockall call
<youpi> does adding it or not change the behavior?
<youpi> that can be useful to check yes
<damo22> ah wait, the crash i got was *without* mlockall but with my fault-in
<damo22> +raw +mlockall, +raw -mlockall ?
<youpi> I don't know off-hand, but I guess it's the page fault or pmap_remove bug that is the easiest to debug
<youpi> because they point at something specific that is wrong
<youpi> I'm wondering about the backtrace of the page fault
<damo22> i got that last night, i am trying to reproduce
<youpi> the pmap_remove issue could look like a race between deallocations
<youpi> normally that wouldn't happen thanks to reference counting
<damo22> it hung and when i step into the kernel debugger, there is a pending page fault
<youpi> uh that one is really messed, the kernel keeps stepping on itself, until filling the stack
<youpi> so I'd say concentrate on the pmap_remove
<youpi> for a start, print the name of the task
<youpi> possibly it's mount or ext2fs, but not really sure
<youpi> at that point "show task 0xed919310" may work, but not sure
<youpi> possibly better look it up by hand, by computing the address of the name field
<youpi> I don't remember if ddb can print strings with e.g. p/s
<damo22> + dummy_read = ((volatile uint8_t *)data)[i * pagesize];
<youpi> Mmm, the va parameter of pmap_remove_range is 0, that's odd
<youpi> nothing is supposed to be mapped at 0 in principle
<youpi> but possibly it's just compiler optimizations
<damo22> ok i will look more into it after work, gtg
<youpi> the entry parameter of vm_map_entry_delete could help with its vme_start field, to check whether it's indeed address 0
<youpi> basically I'd say try to pinpoint which change switches between triggering the issues and not triggering them
<youpi> and possibly we can try variations from that to get an idea of what kind of change triggers it
<damo22> the value of dummy should be 0xab right
<youpi> damo22: always build with -Wall
<youpi> yes but you also need -O2 for the compiler to avoid being dumb
<youpi> and thus track the liveness of values etc.
<damo22> im building netbsd trunk, at least the tools, and then try to build rump
<throwaway> damo22: regarding memset (buf, 0, npages * pagesize) in your last patch. Shouldn't mmap provide a flag to ensure this happens?
<youpi> it does, but it's not implemented yet
<throwaway> well it seems easy to implement "the crappy way"
<throwaway> what I mean is that this is a hack, but I would assume the best place for this hack is the libc
<youpi> yet it is still waiting for somebody to implement it :)
<youpi> but the thing is: populating is not enough here, we also need wiring
<youpi> since we plan to do a dma, we have to make sure the page doesn't move, no swapping
<youpi> there is kernel support for that
<youpi> that's the mlockall I mentioned
<youpi> we could also mlock, but we *also* need the rest of the process not to get swapped out
<throwaway> what is the issue with mlockall? That the pages are locked also in the kernel virtual address space?
<youpi> that rumpdisk currently allocates 600M
<throwaway> oh that's a lot! Maybe mlockall should be used with MCL_CURRENT just at the beginning, and then one needs a smart use of mlock and munlock to allow swapping of the buffers after DMA
<youpi> we cannot let rumpdisk swap at all
<youpi> it wouldn't be able to swap itself in
<youpi> I'm sorry I cannot take the time to re-explain everything
<damo22> it must be a missing deallocation somewhere
<damo22> it does not use 600M all the time, it grows and then chews all my memory
<youpi> oh? in my tests the rss without mlockall caps at 100M
<youpi> did you install the latest gnumach-dev to have the dealloc fix?
<damo22> i almost built rumpkernel but it destroyed my filesystem with 0 byte writes
<damo22> no i dont think i have gnumach-dev
<youpi> ? how do you have device.defs then?
<youpi> device_read() calls are not deallocating the buffer then
<youpi> yes, it grows as much as the data you read from the disk
<damo22> rump pretty much builds well on netbsd
<damo22> but the pci-userspace libs are not in upstream source code
<damo22> so there are missing librumpdev_pci*
<damo22> im not sure how to do a full build of rump with upstream netbsd src
<damo22> i have a patch for upstream netbsd that fixes the "rumptest" target, but im still working on the "rump" target
<damo22> i had to build all of netbsd libs to make the rump target work on netbsd because it needed crti.o
<damo22> im guessing that if we build rump on hurd, hurd will provide the crti.o?
<youpi> libc0.3-dev:hurd-i386: /usr/lib/i386-gnu/crti.o
<damo22> 6 root 18 -2 400268 268280 0 S 0.0 12.8 0:11.04 rumpdisk
<damo22> i have mlockall enabled and raw char dev patch
<damo22> demo@zamhurd:~$ sudo blkid /dev/wd0
<damo22> also i think when i boot off rumpdisk, i cant mount another disk on the same controller, it tries to spawn another rumpdisk
<damo22> actually another partition on the same disk
***john__ is now known as gaqwas
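
Note on the log above: the "fault-in" workaround and the mlockall wiring discussed in the conversation can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual rumpdisk patch: the function names fault_in_for_read, fault_in_for_write and wire_process are hypothetical, the buffer is assumed to be page-aligned, and the MCL_CURRENT | MCL_FUTURE flag combination is an assumption about how the whole process would be wired.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical helper: read one byte from each page of a buffer that is
   about to be written to disk, so that every page is resident.
   Assumes buf is page-aligned. Returns the number of pages touched. */
size_t fault_in_for_read(const void *buf, size_t len, size_t pagesize)
{
    const volatile uint8_t *p = buf;
    size_t touched = 0;

    assert(pagesize > 0);
    for (size_t off = 0; off < len; off += pagesize) {
        /* volatile access, so the compiler cannot drop the read */
        uint8_t dummy_read = p[off];
        (void)dummy_read;
        touched++;
    }
    return touched;
}

/* Hypothetical helper: write 0 to the first byte of each page of a buffer
   that is about to receive data from disk, so that every page is populated.
   This clobbers one byte per page, which is acceptable only because the
   buffer is going to be overwritten anyway. Assumes buf is page-aligned. */
size_t fault_in_for_write(void *buf, size_t len, size_t pagesize)
{
    volatile uint8_t *p = buf;
    size_t touched = 0;

    assert(pagesize > 0);
    for (size_t off = 0; off < len; off += pagesize) {
        p[off] = 0;
        touched++;
    }
    return touched;
}

/* Populating is not enough for DMA: the pages must also be wired so they
   cannot be swapped out. Wiring the whole process is sketched here with
   both flags (an assumption). Returns 0 on success, -1 with errno set. */
int wire_process(void)
{
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}
```

As noted in the conversation, the fault-in loops should in principle be unnecessary once mlockall succeeds; they are mostly useful as a debugging aid to compare behavior with and without wiring.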