IRC channel logs

<youpi>damo22: ah, but I'm not using a pae kernel

<youpi>so physical addresses can't be > 4giB

<youpi>that being said, better pave the road for pae

<youpi>but then you wrote

<youpi>when rump_dma thing called it i got an address above 0x100000000

<youpi>how is that possible?

<youpi>the interface doesn't even make it possible

<damo22>i did mach_print of a string representation of the paddr just before vm_allocate_contiguous returns, and i got a huge address

<youpi>do you have pae enabled?

<damo22>i compiled gnumach with defaults i think

<damo22>configure --enable-kdb

<damo22>my biosmem shows a region above 4GB as available

<youpi>yes but that doesn't mean that gnumach uses it

<damo22>how do you enable pae?

<damo22>--enable-pae ?

<Pellescours>yep

<damo22>ive never used that flag

<youpi>so checking what we currently have, a pmax of 4GiB would be translated to the directmap limit anyway

<youpi>so it's not supposed to give you a >4GiB anyway

<damo22>yes that is what i think is wrong

<youpi>ah no sorry wrong reading

<youpi>since the pmax is above the direct map it goes larger than 1GiB

<youpi>but still without pae it shouldn't be providing a >4GiB

<youpi>that can't be mapped anyway

<damo22>so the dma buffer is at a physical address that the kernel cannot see?

<youpi>not "see"

<youpi>map

<damo22>i assume that is a bug in vm_allocate_contiguous?

<youpi>well, it just picks up what's there

<youpi>if there's a bug in non-PAE it'd be the bootup that looks at available memory

<youpi>biosmem_free_usable properly stops at 4GiB

<youpi>just to be sure: how did you actually print that physical address?

<damo22>in non-PAE, the kernel cannot address more than 4GB ram is that correct?

<youpi>there's no way to access beyond 4GiB

<youpi>except 64bit dmas

<damo22>i printed it with sprintf(wtf, "paddr=0x%llx\n", paddr); and then mach_print(wtf); iirc

<damo22>but it wasnt paddr it was the correct name of the variable that i cannot remember
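
Note on the print above: `%llx` expects an `unsigned long long` argument, so passing a narrower integer type through varargs is undefined behaviour and can show stray upper bits. The log does not say what type the variable had, so the following is only a minimal sketch, with a hypothetical helper name `format_paddr`, of a way to print a physical address that stays correct whatever the caller's integer width is:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not from gnumach): the parameter type widens the
 * caller's value to uint64_t, so the PRIx64 conversion always matches the
 * argument that is actually passed to snprintf. */
int format_paddr(char *buf, size_t len, uint64_t paddr)
{
    assert(buf != NULL && len > 0);
    /* Returns the number of characters written (excluding the NUL). */
    return snprintf(buf, len, "paddr=0x%" PRIx64 "\n", paddr);
}
```

The resulting buffer can then be handed to mach_print or printf as in the original debugging code.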

<damo22> http://paste.debian.net/plain/1231177

<youpi>I can't seem to be able to build rumpkernel any more: BSDOBJDIR /usr/obj does not exist, bailing...

<damo22>huh

<youpi>mach_print ?

<youpi>well, that works

<youpi>but you can directly use printf, inside the kernel :)

<damo22>oh yeah

<damo22>woops

<damo22>build$ cat config.log |grep pae

<damo22>enable_pae_FALSE=''

<damo22>enable_pae_TRUE='#'

<damo22>what does that mean?

<youpi>that it's disabled

<youpi>the false case is uncommented

<damo22>ok

<damo22>how do i override dh_strip? i just want to make it a noop

<youpi>DEB_BUILD_OPTIONS=nostrip

<damo22>ty

<youpi>(that's described in man dh_strip)

<damo22>oh i didnt know it had a man page

<youpi>on debian, all commands are supposed to have manpages

<damo22>ok i will assume everything has a man page then

<damo22>hopefully we find some memory address is bogus in a backtrace with debug symbols, maybe the leading 0x1... is missing

<damo22>eg a truncated address?

<damo22>i think one of the important addresses is stored as a (void *) which would be only 32 bits wide?

<damo22> http://paste.debian.net/plain/1231183

<damo22>one of the transfers in the middle

<damo22>i need to find an offset that fails

<damo22>dd if=/dev/wd0 of=testblock.dd count=1 bs=8192 skip=$((4*1024*1024*1024/512)) shell segfaulted

<damo22>so skipping 8*1024*1024 blocks of 8192 bytes (64GB) and reading one block of 8192 bytes made bash crash

<damo22>r_lba = 576460752303423664

<damo22>0x8000000000000B0

<damo22>same block, pfinet crashed

<damo22>term crashed

<damo22>is that the correct lba offset? it seems large

<youpi>damo22: that lba offset is way too large indeed, but that shouldn't be a real problem, as in: it'd just read bogus data, but not copy it bogusly into memory
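
For reference, the LBA one would expect from that dd command can be worked out by hand. The sketch below assumes 512-byte sectors (an assumption, the log does not state the sector size) and uses a hypothetical helper name. With skip=8*1024*1024 and bs=8192 it gives 0x8000000, while the observed r_lba of 0x8000000000000B0 equals (0x8000000 << 32) + 0xB0. That is only an arithmetic observation; the log does not establish where the large value comes from.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a dd "skip" count (in units of bs bytes)
 * into a sector number, assuming bs is a multiple of the sector size. */
uint64_t dd_skip_to_lba(uint64_t skip_blocks, uint64_t block_size,
                        uint64_t sector_size)
{
    assert(sector_size != 0 && block_size % sector_size == 0);
    /* Each dd block covers block_size / sector_size sectors. */
    return skip_blocks * (block_size / sector_size);
}
```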

<youpi>concerning the backtrace, use thread apply all bt

<youpi>err

<youpi>I meant "bt full"

<youpi>to have local variables etc.

<damo22>do you want all threads?

<damo22>something is bogus. thats for sure

<youpi>no I really meant only the full part

<youpi>my figures just typed by themselves

<damo22>pl

<damo22>ok

<youpi>see how fingers type by themselves ;)

<damo22>yes

<damo22>i found a trick, tab complete on /dev/rumpdis<tab> forces the translator to work

<damo22>so i can get the open without a read

<damo22>then i can attach gdb to rumpdisk.static

<damo22>root@zamhurd:~# dd if=/dev/wd0 of=testblock.dd count=1 bs=8192 skip=$((8*1024*1024))

<damo22>malloc(): corrupted top size

<damo22>Aborted

<damo22>after the 4th ahci_bio_complete

<damo22>what can i break on after the ahci interrupt?

<damo22>ahci_bio_complete is inside the handler?

<damo22>i read the same block 6 times and the 6th time it crashed the shell... but it would have cached it right

<damo22>does gnumach cache reads?

<damo22>it did not read from the disk driver after i queried the same block a few times

<damo22>/hurd/crash: gdb --pid 734 /hurd/rumpdisk.static(794) crashed, signal {no:6, code:6, error:0}, exception {0, code:32609809, subcode:23}, PCs: {0x1ecbb0c, 0x1ecbb0c, 0x1ecbb0c}, writing core file.

<damo22>ugh

<damo22>it looks like gdb is executing in reverse

<damo22>ok not quite

<damo22>libmachdev implements device_S, is it ok for the kernel to deallocate the data buffer?

<damo22>936947864ee0

<damo22>gnumach sha1

<damo22>youpi: some kind of caching is happening when i read the same block repeatedly it does not always call ahci_cmd_start, but it does call ahci_bio_start every time

<Pellescours>when you compare the 2 stacktrace (1 with ahci_cmd_start and 1 without) where is the divergent call ?

<damo22>here is an example of rump assigning the dma for channel 5:

<damo22> XXX vm_allocate_contiguous size=0x4000 pmax=0x100000000 paddr=169318000

<damo22> [ 1.5200050] atabus11 at ahcisata1 channel 5

<damo22>paddr is hex too i forgot the 0x

<Pellescours>paddr is greater than pmax

<damo22>yes that is troubling

<damo22>dd segfaulted when i set a breakpoint on ata_bio_start and continued

<damo22>but its randomly occurring

<damo22>it seems like a race condition

<Pellescours>having tools like valgrind or sanitizers would help to debug i think

<Pellescours>is rump multithreaded ?

<damo22>yes

<damo22>how did paddr get set to >4GB on non-PAE kernel?

<damo22>maybe the address mapping is wrapping around into the middle of address space

<damo22>causing all kinds of corruption

<damo22>meaning, 0x169318000 gets interpreted in the kernel as 0x69318000 somewhere
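
The wrap-around suspected here is what C does when a 64-bit value is converted to a 32-bit unsigned type: the upper bits are dropped. A minimal sketch with hypothetical function names (this is a model of the conversion, not code from gnumach or rump):

```c
#include <assert.h>
#include <stdint.h>

/* Model of a 64-bit physical address passing through a 32-bit type:
 * only the low 32 bits survive the conversion. */
uint32_t truncate_paddr(uint64_t paddr)
{
    return (uint32_t)paddr;
}

/* Returns 1 if the address round-trips through 32 bits unchanged. */
int paddr_fits_32(uint64_t paddr)
{
    return paddr <= UINT32_MAX;
}
```

With these, truncate_paddr(0x169318000) yields 0x69318000, the value mentioned above, and paddr_fits_32 reports that the original address does not fit.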

<Pellescours>mig ?

<damo22>not at compile time

<damo22>its running

<damo22> biosmem: 000000000000000000:00000000000009fc00, available

<damo22> biosmem: 000000000000100000:00000000007ffdf000, available

<damo22> biosmem: 000000000100000000:000000000180000000, available

<damo22>sizeof(unsigned long) == 4

<damo22>that is the type of return value of rumpcomp_pci_virt_to_mach

<damo22>therefore the api does not allow physical addresses larger than 4GB to be used

<damo22>therefore we must restrict vm_allocate_contiguous to return a physical address lower than 4GB
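
The restriction proposed here amounts to checking that the whole allocated range ends at or below the limit. The following is a rough sketch only, with a hypothetical helper name; it is not the check gnumach performs:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical check: does [paddr, paddr + size) lie entirely below
 * limit (exclusive upper bound)? Written as a subtraction so that
 * paddr + size cannot overflow. */
int range_below_limit(uint64_t paddr, uint64_t size, uint64_t limit)
{
    if (size == 0 || size > limit)
        return 0;
    return paddr <= limit - size;
}
```

With limit 0x100000000 and size 0x4000, the address 0x169318000 from the log is rejected, while an address such as 0x690d8000 is accepted.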

<Pellescours>damo22: I did a git pull on gnumach and youpi pushed some fixes for memory, are you running on a kernel with them?

<damo22>ok let me see

<damo22>netdde now uses paddr under 4GB

<damo22>rumpdisk uses:

<damo22> XXX vm_allocate_contiguous size=0x4000 pmax=0x100000000 paddr=690d8000

<damo22> [ 1.5200050] atabus11 at ahcisata1 channel 5

<damo22>but fdisk still crashed

<Pellescours>so it’s better, right?

<damo22>yes

<damo22>how do i continue 112 times exactly in gdb

<damo22>i think thats how many it takes to fail fdisk

<Pellescours> https://dept-info.labri.fr/~thibault/gdb.html.en

<Pellescours>damo22: do you have an index to your loop ?

<damo22>no

<Pellescours>put a break where you want and do a "c 112"

<damo22>WOOT

<Pellescours>continue has an optional parameter to ignore the next n times it stop

<damo22> http://paste.debian.net/plain/1231212

<damo22>heres the last time this function gets called before it crashes fdisk

<Pellescours>I have a database error with your link

<Pellescours>paste.debian.net has an error :/

<gnu_srs1>same here :(

<damo22>refresh

<Pellescours>it’s back

<gnu_srs1>:)

<damo22>with these memory fixes in gnumach, i havent been able to crash anything except fdisk

<damo22>that is a massive improvement

<Pellescours>yeah, and is fdisk crashing because of an assertion ?

<damo22>no, trace/breakpoint trap now

<damo22>maybe i broke it with gdb

<damo22>fdisk: libblkid/src/probe.c:687: blkid_probe_get_buffer: Assertion `bf->off <= real_off' failed.

<damo22>then next time: Illegal instruction

<damo22>Thread 4 received signal SIGILL, Illegal instruction.

<damo22>0x0142d470 in ?? () from /usr/lib/i386-gnu/libblkid.so.1

<damo22>fdisk -l /dev/wd0 works

<damo22>but not fdisk /dev/wd0

<Mete->join #freebsd

<Mete->sorry

<Pellescours>damo22: on my side fsck is failing at boot (with rump) error message is

<Pellescours>malloc.c:2537: sysmalloc: Assertion `(old_top ** initial_top (av) && old_size == 0 || ((unsigned long) (old_size) >= MIN_SIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)` failed

<Pellescours>malloc.c:2537: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0 || ((unsigned long) (old_size) >= MIN_SIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)` failed

<Pellescours>ignore first line, it contains an error

<youpi>damo22:

<youpi>requested reading 65536 bytes at 0 into 1764000

<youpi>for virt 0x1764000 got region start 1764000 offset 0 and 16 pages

<youpi>page 0 has physical 6d252000 with offset 0, so 6d252000 (vaddr_obj 0)

<youpi>it seems that for my read of 65536 bytes, rumpcomp_pci_virt_to_mach was called only once

<youpi>i.e. it assumed that all the pages vm_allocate()d by rumpdisk_device_read were contiguous

<youpi>replacing with a contiguous allocation would probably fix things up but it's perhaps odd that rumpcomp_pci_virt_to_mach assumes that they're necessarily contiguous?

<Pellescours>youpi: rumpcomp_pci_virt_to_mach is only called in this method https://salsa.debian.org/hurd-team/rumpkernel/-/blob/master/buildrump.sh/src/sys/rump/dev/lib/libpci/rumpdev_bus_dma.c#L156

<youpi>grep told me so, yes :)
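
One alternative to assuming contiguity is to translate each page separately and merge only the pages that really are physically adjacent. The sketch below illustrates that idea under stated assumptions: the callback type, the struct, the 4096-byte page size and the two mock translators are all hypothetical, and none of this is the rumpkernel or rumpdisk code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size */

/* Hypothetical callback: physical address backing one page-aligned vaddr. */
typedef uint64_t (*virt_to_phys_fn)(uintptr_t vaddr);

struct dma_segment {
    uint64_t paddr;  /* physical start of the segment */
    size_t   len;    /* length in bytes */
};

/* Walk a page-aligned buffer page by page and merge physically adjacent
 * pages. Returns the number of segments, or -1 if max_segs is too small. */
int build_dma_segments(uintptr_t vaddr, size_t len, virt_to_phys_fn translate,
                       struct dma_segment *segs, int max_segs)
{
    int n = 0;
    assert(vaddr % SKETCH_PAGE_SIZE == 0 && len % SKETCH_PAGE_SIZE == 0);
    for (size_t off = 0; off < len; off += SKETCH_PAGE_SIZE) {
        uint64_t pa = translate(vaddr + off);
        if (n > 0 && segs[n - 1].paddr + segs[n - 1].len == pa) {
            segs[n - 1].len += SKETCH_PAGE_SIZE;  /* extends previous run */
        } else {
            if (n == max_segs)
                return -1;
            segs[n].paddr = pa;
            segs[n].len = SKETCH_PAGE_SIZE;
            n++;
        }
    }
    return n;
}

/* Mock translators, for illustration only. */
uint64_t mock_contiguous(uintptr_t vaddr)
{
    return 0x6d252000u + (uint64_t)(vaddr - 0x1764000u);
}

uint64_t mock_scattered(uintptr_t vaddr)
{
    /* Pages land two pages apart physically, so none are adjacent. */
    return 0x6d252000u + 2 * (uint64_t)(vaddr - 0x1764000u);
}
```

For the 65536-byte read at 0x1764000 from the trace, the contiguous mock produces a single segment starting at 0x6d252000, while the scattered mock produces 16 segments, one per page. A single translation call is only sufficient in the first case.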

2022-02-17.log