IRC channel logs
2022-02-17.log
<youpi> damo22: ah, but I'm not using a pae kernel
<youpi> so physical addresses can't be > 4giB
<youpi> that being said, better pave the road for pae
<youpi> when rump_dma thing called it i got an address above 0x100000000
<youpi> the interface doesn't even make it possible
<damo22> i did mach_print of a string representation of the paddr just before vm_allocate_contiguous returns, and i got a huge address
<damo22> i compiled gnumach with defaults i think
<damo22> my biosmem shows a region above 4GB as available
<youpi> yes but that doesn't mean that gnumach uses it
<youpi> so checking what we currently have, a pmax of 4GiB would be translted to the directmap limit anyway
<youpi> so it's not supposed to give you a >4GiB anyway
<damo22> yes that is what i think is wrong
<youpi> since the pmax is above the direct map it goes larger than 1GiB
<youpi> but still without pae it shouldn't be providing a >4GiB
<damo22> so the dma buffer is at a physical address that the kernel cannot see?
<damo22> i assume that is a bug in vm_allocate_contiguous?
<youpi> well, it just picks up what's there
<youpi> if there's a bug in non-PAE it'd be the bootup that looks at available memory
<youpi> biosmem_free_usable properly stops at 4GiB
<youpi> just to be sure: how did you actually print that physical address?
<damo22> in non-PAE, the kernel cannot address more than 4GB ram is that correct?
<youpi> there's no way to access beyond 4GiB
<damo22> i printed it with sprintf(wtf, "paddr=0x%llx\n", paddr); and then mach_print(wtf); iirc
<damo22> but it wasnt paddr it was the correct name of the variable that i cannot remember
<youpi> I can't seem to be able to build rumpkernel any more: BSDOBJDIR /usr/obj does not exist, bailing...
<youpi> but you can directly use printf, inside the kernel :)
<youpi> the false case is uncommented
<damo22> how do i override dh_strip?
i just want to make it a noop
<youpi> (that's described in man dh_strip)
<damo22> oh i didnt know it had a man page
<youpi> on debian, all commands are supposed to have manpages
<damo22> ok i will assume everything has a man page then
<damo22> hopefully we find some memory address is bogus in a backtrace with debug symbols, maybe the leading 0x1... is missing
<damo22> i think one of the important addresses is stored as a (void *) which would be only 32 bits wide?
<damo22> one of the transfers in the middle
<damo22> i need to find an offset that fails
<damo22> dd if=/dev/wd0 of=testblock.dd count=1 bs=8192 skip=$((4*1024*1024*1024/512)) shell segfaulted
<damo22> so skipping 8*1024*1024 blocks of 8192 bytes (64GB) and reading one block of 8192 bytes made bash crash
<damo22> is that the correct lba offset? it seems large
<youpi> damo22: that lba offset is way too large indeed, but that shouldn't be a real problem, as in: it'd just read bogus data, but not copy it bogusly into memory
<youpi> concerning the backtrace, use thread apply all bt
<youpi> to have local variables etc.
<damo22> something is bogus. thats for sure
<youpi> no I really meant only the full part
<youpi> my figures just typed by themselves
<youpi> see how fingers type by themselves ;)
<damo22> i found a trick, tab complete on /dev/rumpdis<tab> forces the translator to work
<damo22> so i can get the open without a read
<damo22> then i can attach gdb to rumpdisk.static
<damo22> root@zamhurd:~# dd if=/dev/wd0 of=testblock.dd count=1 bs=8192 skip=$((8*1024*1024))
<damo22> what can i break on after the ahci interrupt?
<damo22> ahci_bio_complete is inside the handler?
<damo22> i read the same block 6 times and the 6th time it crashed the shell...
but it would have cached it right
<damo22> it did not read from the disk driver after i queried the same block a few times
<damo22> /hurd/crash: gdb --pid 734 /hurd/rumpdisk.static(794) crashed, signal {no:6, code:6, error:0}, exception {0, code:32609809, subcode:23}, PCs: {0x1ecbb0c, 0x1ecbb0c, 0x1ecbb0c}, writing core file.
<damo22> it looks like gdb is executing in reverse
<damo22> libmachdev implements device_S, is it ok for the kernel to deallocate the data buffer?
<damo22> youpi: some kind of caching is happening when i read the same block repeatedly it does not always call ahci_cmd_start, but it does call ahci_bio_start every time
<Pellescours> when you compare the 2 stacktrace (1 with ahci_cmd_start and 1 without) where is the divergent call ?
<damo22> here is an example of rump assigning the dma for channel 5:
<damo22> XXX vm_allocate_contiguous size=0x4000 pmax=0x100000000 paddr=169318000
<damo22> [ 1.5200050] atabus11 at ahcisata1 channel 5
<damo22> paddr is hex too i forgot the 0x
<damo22> dd segfaulted when i set a breakpoint on ata_bio_start and continued
<Pellescours> having tools like valgrind or sanitizers would help to debug i think
<damo22> how did paddr get set to >4GB on non-PAE kernel?
<damo22> maybe the address mapping is wrapping around into the middle of address space
<damo22> meaning, 0x169318000 gets interpreted in the kernel as 0x69318000 somewhere
<damo22> biosmem: 000000000000000000:00000000000009fc00, available
<damo22> biosmem: 000000000000100000:00000000007ffdf000, available
<damo22> biosmem: 000000000100000000:000000000180000000, available
<damo22> that is the type of return value of rumpcomp_pci_virt_to_mach
<damo22> therefore the api does not allow physical addresses larger than 4GB to be used
<damo22> therefore we must restrict vm_allocate_contiguous to return a physical address lower than 4GB
<Pellescours> damo22: I did a git pull on gnumach and youpi pushed some fixes for memory, are you running on a kernel with them?
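Editorial note: the wrap-around that damo22 suspects (0x169318000 being seen as 0x69318000) is what happens when a 64-bit physical address passes through a 32-bit type. The sketch below only illustrates that arithmetic. `paddr32_t`, `narrow_paddr`, `fits_in_32_bits` and `checked_narrow` are hypothetical names made up for this note, and the 32-bit width is an assumption based on the remark that the API does not allow addresses above 4GB; none of this is code from gnumach or rumpkernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit-wide physical address type. */
typedef uint32_t paddr32_t;

/* Narrow a 64-bit physical address the way an implicit conversion
 * to a 32-bit type would: bits 32 and above are silently dropped. */
static paddr32_t narrow_paddr(uint64_t paddr)
{
    return (paddr32_t)paddr;
}

/* Return 1 if the address is representable in 32 bits, 0 otherwise. */
static int fits_in_32_bits(uint64_t paddr)
{
    return paddr <= UINT32_MAX;
}

/* Narrow, but abort loudly instead of wrapping around silently. */
static paddr32_t checked_narrow(uint64_t paddr)
{
    assert(fits_in_32_bits(paddr));
    return (paddr32_t)paddr;
}
```

With the values from the log, `narrow_paddr(0x169318000ULL)` yields `0x69318000`, while the later address `0x690d8000` fits and passes through `checked_narrow` unchanged.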
<damo22> XXX vm_allocate_contiguous size=0x4000 pmax=0x100000000 paddr=690d8000
<damo22> [ 1.5200050] atabus11 at ahcisata1 channel 5
<damo22> how do i continue 112 times exactly in gdb
<damo22> i think thats how many it takes to fail fdisk
<Pellescours> continue has an optional parameter to ignore the next n times it stop
<damo22> heres the last time this function gets called before it crashes fdisk
<damo22> with these memory fixes in gnumach, i havent been able to crash anything except fdisk
<Pellescours> yeah, and is fdisk crashing because of an assertion ?
<damo22> fdisk: libblkid/src/probe.c:687: blkid_probe_get_buffer: Assertion `bf->off <= real_off' failed.
<damo22> then next time: Illegal instruction
<damo22> Thread 4 received signal SIGILL, Illegal instruction.
<damo22> 0x0142d470 in ?? () from /usr/lib/i386-gnu/libblkid.so.1
<Pellescours> damo22: on my side fsck is failing at boot (with rump) error message is
<Pellescours> malloc.c:2537: sysmalloc: Assertion `(old_top ** initial_top (av) && old_size == 0 || ((unsigned long) (old_size) >= MIN_SIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)` failed
<Pellescours> malloc.c:2537: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0 || ((unsigned long) (old_size) >= MIN_SIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)` failed
<youpi> requested reading 65536 bytes at 0 into 1764000
<youpi> for virt 0x1764000 got region start 1764000 offset 0 and 16 pages
<youpi> page 0 has physical 6d252000 with offset 0, so 6d252000 (vaddr_obj 0)
<youpi> it seems that for my read of 65536 bytes, rumpcomp_pci_virt_to_mach was called only once
<youpi> i.e. it assumed that all the pages vm_allocate()d by rumpdisk_device_read were contiguous
<youpi> replacing with a contiguous allocation would probably fix things up but it's perhaps odd that rumpcomp_pci_virt_to_mach assumes that they're necessarily contiguous?
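Editorial note: youpi's observation is that one virtual-to-physical translation was used for a 16-page (65536-byte) buffer, which is only valid if the pages are physically contiguous. The sketch below shows what that property means for a list of per-page physical addresses. `pages_are_contiguous` and `SKETCH_PAGE_SIZE` are hypothetical names, and the 4096-byte page size is an assumption that matches 65536 bytes being reported as 16 pages; this is not how rumpdisk or rumpcomp_pci_virt_to_mach is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size: 65536 bytes / 16 pages = 4096 bytes per page. */
#define SKETCH_PAGE_SIZE 4096u

/* Return 1 if the n pages whose physical addresses are listed in
 * phys[] form a single contiguous physical run, 0 otherwise.
 * A buffer of zero or one page is trivially contiguous. */
static int pages_are_contiguous(const uint64_t *phys, size_t n)
{
    assert(phys != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        /* Each page must start exactly one page after the previous one. */
        if (phys[i] != phys[i - 1] + SKETCH_PAGE_SIZE)
            return 0;
    }
    return 1;
}
```

Translating only page 0 and extrapolating from it is safe when this check holds for the whole buffer. Pages obtained from an ordinary vm_allocate() carry no such guarantee, which is the concern raised in the log.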