IRC channel logs

2022-02-17.log

<youpi>damo22: ah, but I'm not using a pae kernel
<youpi>so physical addresses can't be > 4giB
<youpi>that being said, better pave the road for pae
<youpi>but then you wrote
<youpi>when rump_dma thing called it i got an address above 0x100000000
<youpi>how is that possible?
<youpi>the interface doesn't even make it possible
<damo22>i did mach_print of a string representation of the paddr just before vm_allocate_contiguous returns, and i got a huge address
<youpi>do you have pae enabled?
<damo22>i compiled gnumach with defaults i think
<damo22>configure --enable-kdb
<damo22>my biosmem shows a region above 4GB as available
<youpi>yes but that doesn't mean that gnumach uses it
<damo22>how do you enable pae?
<damo22>--enable-pae ?
<Pellescours>yep
<damo22>ive never used that flag
<youpi>so checking what we currently have, a pmax of 4GiB would be translated to the directmap limit anyway
<youpi>so it's not supposed to give you a >4GiB anyway
<damo22>yes that is what i think is wrong
<youpi>ah no sorry wrong reading
<youpi>since the pmax is above the direct map it goes larger than 1GiB
<youpi>but still without pae it shouldn't be providing a >4GiB
<youpi>that can't be mapped anyway
<damo22>so the dma buffer is at a physical address that the kernel cannot see?
<youpi>not "see"
<youpi>map
<damo22>i assume that is a bug in vm_allocate_contiguous?
<youpi>well, it just picks up what's there
<youpi>if there's a bug in non-PAE it'd be the bootup that looks at available memory
<youpi>biosmem_free_usable properly stops at 4GiB
<youpi>just to be sure: how did you actually print that physical address?
<damo22>in non-PAE, the kernel cannot address more than 4GB ram is that correct?
<youpi>there's no way to access beyond 4GiB
<youpi>except 64bit dmas
<damo22>i printed it with sprintf(wtf, "paddr=0x%llx\n", paddr); and then mach_print(wtf); iirc
<damo22>but it wasnt paddr it was the correct name of the variable that i cannot remember
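A note on the print above: `%llx` is only correct when the argument really is a 64-bit integer. Passing a 32-bit variable to `%llx` is undefined behaviour, and on i386 it can pull unrelated stack contents into the upper half of the printed value. The log does not say which type the variable had, so the following is only a sketch of a safer way to print it. The helper name `format_paddr` is hypothetical and not part of gnumach or rumpdisk.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not from gnumach): format a physical address.
   The parameter type widens the value to uint64_t before it reaches
   the variadic call, so the PRIx64 conversion always matches the
   argument that is actually passed. */
static int format_paddr(char *buf, size_t len, uint64_t paddr)
{
    return snprintf(buf, len, "paddr=0x%" PRIx64 "\n", paddr);
}
```

Called with a 32-bit or a 64-bit variable, the conversion to `uint64_t` happens at the call site, so the output cannot gain spurious high bits from a format mismatch.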
<damo22> http://paste.debian.net/plain/1231177
<youpi>I can't seem to be able to build rumpkernel any more: BSDOBJDIR /usr/obj does not exist, bailing...
<damo22>huh
<youpi>mach_print ?
<youpi>well, that works
<youpi>but you can directly use printf, inside the kernel :)
<damo22>oh yeah
<damo22>woops
<damo22>build$ cat config.log |grep pae
<damo22>enable_pae_FALSE=''
<damo22>enable_pae_TRUE='#'
<damo22>what does that mean?
<youpi>that it's disabled
<youpi>the false case is uncommented
<damo22>ok
<damo22>how do i override dh_strip? i just want to make it a noop
<youpi>DEB_BUILD_OPTIONS=nostrip
<damo22>ty
<youpi>(that's described in man dh_strip)
<damo22>oh i didnt know it had a man page
<youpi>on debian, all commands are supposed to have manpages
<damo22>ok i will assume everything has a man page then
<damo22>hopefully we find some memory address is bogus in a backtrace with debug symbols, maybe the leading 0x1... is missing
<damo22>eg a truncated address?
<damo22>i think one of the important addresses is stored as a (void *) which would be only 32 bits wide?
<damo22> http://paste.debian.net/plain/1231183
<damo22>one of the transfers in the middle
<damo22>i need to find an offset that fails
<damo22>dd if=/dev/wd0 of=testblock.dd count=1 bs=8192 skip=$((4*1024*1024*1024/512)) shell segfaulted
<damo22>so skipping 8*1024*1024 blocks of 8192 bytes (64GB) and reading one block of 8192 bytes made bash crash
<damo22>r_lba = 576460752303423664
<damo22>0x8000000000000B0
<damo22>same block, pfinet crashed
<damo22>term crashed
<damo22>is that the correct lba offset? it seems large
<youpi>damo22: that lba offset is way too large indeed, but that shouldn't be a real problem, as in: it'd just read bogus data, but not copy it bogusly into memory
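For comparison with the r_lba value quoted above, the LBA one would expect from the dd command can be worked out from the skip count and block size. This is a minimal sketch that assumes 512-byte sectors and that LBA is counted from the start of the disk; `expected_lba` and `SECTOR_SIZE` are illustrative names, not identifiers from rump or the AHCI driver.

```c
#include <stdint.h>

#define SECTOR_SIZE 512u /* assumption: 512-byte sectors */

/* Illustrative helper: LBA of the first sector read by
   "dd bs=block_size skip=skip_blocks", counted from sector 0. */
static uint64_t expected_lba(uint64_t skip_blocks, uint64_t block_size)
{
    return skip_blocks * block_size / SECTOR_SIZE;
}
```

With skip = 8*1024*1024 and bs = 8192 this gives 134217728, that is 0x8000000, which is far smaller than the 0x8000000000000B0 seen in the debugger.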
<youpi>concerning the backtrace, use thread apply all bt
<youpi>err
<youpi>I meant "bt full"
<youpi>to have local variables etc.
<damo22>do you want all threads?
<damo22>something is bogus. thats for sure
<youpi>no I really meant only the full part
<youpi>my figures just typed by themselves
<damo22>pl
<damo22>ok
<youpi>see how fingers type by themselves ;)
<damo22>yes
<damo22>i found a trick, tab complete on /dev/rumpdis<tab> forces the translator to work
<damo22>so i can get the open without a read
<damo22>then i can attach gdb to rumpdisk.static
<damo22>root@zamhurd:~# dd if=/dev/wd0 of=testblock.dd count=1 bs=8192 skip=$((8*1024*1024))
<damo22>malloc(): corrupted top size
<damo22>Aborted
<damo22>after the 4th ahci_bio_complete
<damo22>what can i break on after the ahci interrupt?
<damo22>ahci_bio_complete is inside the handler?
<damo22>i read the same block 6 times and the 6th time it crashed the shell... but it would have cached it right
<damo22>does gnumach cache reads?
<damo22>it did not read from the disk driver after i queried the same block a few times
<damo22>/hurd/crash: gdb --pid 734 /hurd/rumpdisk.static(794) crashed, signal {no:6, code:6, error:0}, exception {0, code:32609809, subcode:23}, PCs: {0x1ecbb0c, 0x1ecbb0c, 0x1ecbb0c}, writing core file.
<damo22>ugh
<damo22>it looks like gdb is executing in reverse
<damo22>ok not quite
<damo22>libmachdev implements device_S, is it ok for the kernel to deallocate the data buffer?
<damo22>936947864ee0
<damo22>gnumach sha1
<damo22>youpi: some kind of caching is happening when i read the same block repeatedly it does not always call ahci_cmd_start, but it does call ahci_bio_start every time
<Pellescours>when you compare the 2 stacktrace (1 with ahci_cmd_start and 1 without) where is the divergent call ?
<damo22>here is an example of rump assigning the dma for channel 5:
<damo22> XXX vm_allocate_contiguous size=0x4000 pmax=0x100000000 paddr=169318000
<damo22> [ 1.5200050] atabus11 at ahcisata1 channel 5
<damo22>paddr is hex too i forgot the 0x
<Pellescours>paddr is greater than pmax
<damo22>yes that is troubling
<damo22>dd segfaulted when i set a breakpoint on ata_bio_start and continued
<damo22>but its randomly occurring
<damo22>it seems like a race condition
<Pellescours>having tools like valgrind or sanitizers would help to debug i think
<Pellescours>is rump multithreaded ?
<damo22>yes
<damo22>how did paddr get set to >4GB on non-PAE kernel?
<damo22>maybe the address mapping is wrapping around into the middle of address space
<damo22>causing all kinds of corruption
<damo22>meaning, 0x169318000 gets interpreted in the kernel as 0x69318000 somewhere
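The wrap-around described here is what C does when a 64-bit value is stored into a 32-bit variable: the upper bits are dropped. A small sketch, with the hypothetical helper name `truncate_to_32`:

```c
#include <stdint.h>

/* Keep only the low 32 bits, as happens when a 64-bit physical
   address is assigned to a 32-bit integer or pointer-sized variable
   on i386. */
static uint32_t truncate_to_32(uint64_t paddr)
{
    return (uint32_t)paddr;
}
```

Applied to 0x169318000 this yields 0x69318000, so a device programmed with the truncated value would DMA into a different, unrelated physical page.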
<Pellescours>mig ?
<damo22>not at compile time
<damo22>its running
<damo22> biosmem: 000000000000000000:00000000000009fc00, available
<damo22> biosmem: 000000000000100000:00000000007ffdf000, available
<damo22> biosmem: 000000000100000000:000000000180000000, available
<damo22>sizeof(unsigned long) == 4
<damo22>that is the type of return value of rumpcomp_pci_virt_to_mach
<damo22>therefore the api does not allow physical addresses larger than 4GB to be used
<damo22>therefore we must restrict vm_allocate_contiguous to return a physical address lower than 4GB
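The two conditions discussed here can be written as explicit checks. This is only a sketch under stated assumptions: `pmax` is treated as an exclusive upper bound on the whole allocation, and `uint32_t` stands in for the 4-byte `unsigned long` of the i386 build. Both function names are hypothetical and do not appear in gnumach.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: is [paddr, paddr + size) entirely below pmax?
   Assumes pmax is an exclusive upper bound. */
static bool paddr_range_below_pmax(uint64_t paddr, uint64_t size,
                                   uint64_t pmax)
{
    if (size == 0 || paddr > UINT64_MAX - size)
        return false; /* empty range or arithmetic overflow */
    return paddr + size <= pmax;
}

/* Hypothetical check: can paddr be returned through an API whose
   return type is 32 bits wide (unsigned long on i386)? */
static bool paddr_fits_32bit(uint64_t paddr)
{
    return paddr <= UINT32_MAX;
}
```

For the values in the log, the allocation at 0x169318000 with size 0x4000 fails both checks against pmax=0x100000000, while the later one at 0x690d8000 passes both.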
<Pellescours>damo22: I did a git pull on gnumach and youpi pushed some fixes for memory, are you running on a kernel with them?
<damo22>ok let me see
<damo22>netdde now uses paddr under 4GB
<damo22>rumpdisk uses:
<damo22> XXX vm_allocate_contiguous size=0x4000 pmax=0x100000000 paddr=690d8000
<damo22> [ 1.5200050] atabus11 at ahcisata1 channel 5
<damo22>but fdisk still crashed
<Pellescours>so it’s better, right?
<damo22>yes
<damo22>how do i continue 112 times exactly in gdb
<damo22>i think thats how many it takes to fail fdisk
<Pellescours> https://dept-info.labri.fr/~thibault/gdb.html.en
<Pellescours>damo22: do you have an index in your loop ?
<damo22>no
<Pellescours>put a break where you want and do a "c 112"
<damo22>WOOT
<Pellescours>continue has an optional parameter to ignore the next n times it stops
<damo22> http://paste.debian.net/plain/1231212
<damo22>heres the last time this function gets called before it crashes fdisk
<Pellescours>I have a database error with your link
<Pellescours>paste.debian.net has an error :/
<gnu_srs1>same here :(
<damo22>refresh
<Pellescours>it’s back
<gnu_srs1>:)
<damo22>with these memory fixes in gnumach, i havent been able to crash anything except fdisk
<damo22>that is a massive improvement
<Pellescours>yeah, and is fdisk crashing because of an assertion ?
<damo22>no, trace/breakpoint trap now
<damo22>maybe i broke it with gdb
<damo22>fdisk: libblkid/src/probe.c:687: blkid_probe_get_buffer: Assertion `bf->off <= real_off' failed.
<damo22>then next time: Illegal instruction
<damo22>Thread 4 received signal SIGILL, Illegal instruction.
<damo22>0x0142d470 in ?? () from /usr/lib/i386-gnu/libblkid.so.1
<damo22>fdisk -l /dev/wd0 works
<damo22>but not fdisk /dev/wd0
<Mete->join #freebsd
<Mete->sorry
<Pellescours>damo22: on my side fsck is failing at boot (with rump) error message is
<Pellescours>malloc.c:2537: sysmalloc: Assertion `(old_top ** initial_top (av) && old_size == 0 || ((unsigned long) (old_size) >= MIN_SIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)` failed
<Pellescours>malloc.c:2537: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0 || ((unsigned long) (old_size) >= MIN_SIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)` failed
<Pellescours>ignore first line, it contains an error
<youpi>damo22:
<youpi>requested reading 65536 bytes at 0 into 1764000
<youpi>for virt 0x1764000 got region start 1764000 offset 0 and 16 pages
<youpi>page 0 has physical 6d252000 with offset 0, so 6d252000 (vaddr_obj 0)
<youpi>it seems that for my read of 65536 bytes, rumpcomp_pci_virt_to_mach was called only once
<youpi>i.e. it assumed that all the pages vm_allocate()d by rumpdisk_device_read were contiguous
<youpi>replacing with a contiguous allocation would probably fix things up but it's perhaps odd that rumpcomp_pci_virt_to_mach assumes that they're necessarily contiguous?
<Pellescours>youpi: rumpcomp_pci_virt_to_mach is only called in this method https://salsa.debian.org/hurd-team/rumpkernel/-/blob/master/buildrump.sh/src/sys/rump/dev/lib/libpci/rumpdev_bus_dma.c#L156
<youpi>grep told me so, yes :)
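The contiguity assumption youpi describes can be made concrete. A buffer obtained with vm_allocate() is contiguous in virtual memory only, so translating just its first page is valid only if every following page happens to sit at the next physical page. The sketch below checks that property for a list of per-page physical addresses. It assumes 4096-byte pages (consistent with 65536 bytes being reported as 16 pages), and `pages_contiguous` is a hypothetical helper, not a function from rumpdisk or gnumach.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 4096u /* assumption: 4 KiB pages */

/* Hypothetical helper: phys[i] is the physical address backing the
   i-th virtual page of a buffer. Returns true only if each page
   directly follows the previous one in physical memory. */
static bool pages_contiguous(const uint64_t *phys, size_t npages)
{
    for (size_t i = 1; i < npages; i++) {
        if (phys[i] != phys[i - 1] + PAGE_SIZE_ASSUMED)
            return false;
    }
    return true;
}
```

If such a check fails, a single virtual-to-physical lookup for the whole transfer is not enough: either the buffer has to come from a contiguous allocation, or each page needs its own translation and its own DMA segment.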