IRC channel logs

<Pellescours>I'm keeping my debugger open; if someone can tell me what to inspect, I will be the hands

<Pellescours>and if you want to reproduce it, compile python3.12 with (--enable-optimizations --with-lto --enable-shared). It doesn't hang every time, but often enough. It hangs at an LTO step

<white-wolf>hello

<white-wolf>how many lines of code is the Mach kernel?

<white-wolf>does the Hurd have a foundation for donations?

<solid_black>hi!

<solid_black>Pellescours: rumpdisk in exception_raise_continue is the culprit

<solid_black>it must be crashing

<solid_black>set a breakpoint from GDB on i386_exception, then reproduce

<solid_black>white-wolf: GNU Mach master is 84819 lines of code (not counting comments/blanks/etc), according to tokei

<Pellescours>solid_black: ok

<Pellescours>an exception is something a program should not generate, while a trap is normal, right?

<solid_black>depending on the exact meaning of those terms... :)

<solid_black>Mach calls "traps" what AArch64 calls "exceptions"

<solid_black>i386 also calls it "traps" I *think*, but I'm not sure

<solid_black>and Mach exceptions (what *Mach* calls "exceptions") are another thing entirely, and they're the bad kind, yes

<solid_black>and then there are also traps as in SIGTRAP / int3, and that is fine if you're doing some debugging

<gnu_srs1>Hello, when I boot hurd in 'recovery mode' I used to get the prompt: enter root password to continue.

<gnu_srs1>That no longer happens; boot just continues. Which program/script is responsible here?

<solid_black>gnu_srs1: that's something in the sysv init boot flow, I believe

<solid_black>that happens when the rootfs fsck fails

<gnu_srs1>tks, will search the initscripts for sysvinit.

<solid_black>Pellescours: if you mean traps as in user_trap()/kernel_trap() in i386/i386/trap.c, that's normal, right

<solid_black>and i386_exception()/exception() is bad

<solid_black>but a trap can directly lead to an exception, if it turns out to be a missing page or a bad instruction or something

<solid_black>so are you able to reproduce the exception?

<solid_black>also I hope you have rumpdisk symbols

<Pellescours>ok my debugger stopped at i386_exception

<solid_black>great

<solid_black>what's percpu_array[0]->active_thread->task->name?

<solid_black>is it rumpdisk?

<Pellescours>yes

<solid_black>you should have a structure of type i386_saved_state passed as an argument

<Pellescours>it’s rumpdisk

<solid_black>oh also what's the backtrace? is it from user_trap(), or is it from user_page_fault_continue()?

<solid_black>also which trap is it? / what's the exception/code/subcode values?

<Pellescours>vm_fault_continue

<Pellescours>user_page_fault_continue

<solid_black>ok, so it's accessing a page that it's not supposed to

<solid_black>is vm_fault_continue the topmost call frame? not trap_from_user?

<solid_black>or what's it called, user_trap

<Pellescours>yes vm_fault_continue is the topmost

<solid_black>that means it waited for a pager

<solid_black>I wonder, is this the device pager?

<solid_black>ACTION looks at vm_fault.c

<Pellescours>vm_fault_continue calls vm_fault that calls user_page_fault_continue that calls user_page_fault_continue then i386_exception

<solid_black>in vm_fault() stack frame, do you still have locals' values?

<solid_black>can you see what 'object' is?

<Pellescours>args are optimized out

<solid_black>ok, let's at least find the userland stack trace

<solid_black>oh, also what's kr / exception code & subcode?

<Pellescours>kr=10

<solid_black>that's KERN_MEMORY_ERROR
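For decoding `kr` values during a debugging session like this one, a small lookup table helps. This is a sketch: the numbers below are the low `kern_return_t` codes as I recall them from GNU Mach's `<mach/kern_return.h>` (KERN_MEMORY_ERROR = 10 agrees with what was said above), so check the header before relying on them. `kr_name()` and `kr_value()` are hypothetical helpers, not part of any Mach API.

```c
#include <stddef.h>
#include <string.h>

/* Low kern_return_t codes, assumed to match GNU Mach's
 * <mach/kern_return.h>; verify against the header you build with. */
static const char *const kr_names[] = {
    "KERN_SUCCESS",            /* 0 */
    "KERN_INVALID_ADDRESS",    /* 1 */
    "KERN_PROTECTION_FAILURE", /* 2 */
    "KERN_NO_SPACE",           /* 3: no room left in the vm_map */
    "KERN_INVALID_ARGUMENT",   /* 4 */
    "KERN_FAILURE",            /* 5 */
    "KERN_RESOURCE_SHORTAGE",  /* 6 */
    "KERN_NOT_RECEIVER",       /* 7 */
    "KERN_NO_ACCESS",          /* 8 */
    "KERN_MEMORY_FAILURE",     /* 9 */
    "KERN_MEMORY_ERROR",       /* 10: pager died or signalled an error */
};

#define KR_COUNT (sizeof kr_names / sizeof kr_names[0])

/* Hypothetical helper: numeric code -> name, or NULL if not in the table. */
const char *kr_name(int kr)
{
    if (kr < 0 || (size_t)kr >= KR_COUNT)
        return NULL;
    return kr_names[kr];
}

/* Hypothetical helper: name -> numeric code, or -1 if unknown. */
int kr_value(const char *name)
{
    for (size_t i = 0; i < KR_COUNT; i++)
        if (strcmp(kr_names[i], name) == 0)
            return (int)i;
    return -1;
}
```

In GDB one can of course just `p (int) kr` and look the number up; the table only saves a trip to the header.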

<solid_black>either the pager died, or signalled an error

<solid_black>could it be that your system ran out of memory so it was paging out?

<Pellescours>I don’t know, I have 2G of memory and very little is running on it

<solid_black>but weren't you running a highly concurrent build?

<Pellescours>yes

<solid_black>I'm just trying to imagine what kind of pager this could be

<solid_black>rumpdisk's own image is wired, so it can't be that

<Pellescours>I’m on non-SMP, running make on Python, and the program running is lto1

<solid_black>for the device pager, I think it only accesses /dev/mem?

<solid_black>which can't really die on you

<solid_black>so it could be the default pager, if things (internal objects) started to get paged out

<Pellescours>I think I have an idea

<solid_black>lto is indeed very memory-hungry, fwiw

<Pellescours>If my disk becomes full during the compilation

<solid_black>things should just get ENOSPC then?

<Pellescours>I thought I had a 10G image, but it’s only a 3G image

<Pellescours>normally yes, but maybe not in this case

<solid_black>I don't think rumpdisk should be mapping stuff from ext2fs anyway

<solid_black>let's see, there have to be some VM stats in the kernel

<Pellescours>ah you’re right

<solid_black>p vm_stat

<solid_black>wdym I'm right?

<Pellescours>if it’s a disk size issue, it should not be rumpdisk that hit the limit but ext2fs

<Pellescours>p vm_stat

<Pellescours>$2 = {pagesize = 0, free_count = 0, active_count = 0, inactive_count = 0, wire_count = 0, zero_fill_count = 14213373, reactivations = 398, pageins = 458741, pageouts = 427018, faults = 42581367, cow_faults = 2263564, lookups = 4497257, hits = 2389822}

<solid_black>quite a number of pageouts
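To put "quite a number" in perspective, a rough conversion of the counters above into bytes. The dump reports pagesize = 0, so the 4 KiB page size here is an assumption (the usual i386 page size), and the counters are presumably cumulative since boot:

$$
\begin{aligned}
\text{pageouts:}\quad 427018 \times 4096\ \text{B} &= 1\,749\,065\,728\ \text{B} \approx 1.63\ \text{GiB} \\
\text{pageins:}\quad 458741 \times 4096\ \text{B} &= 1\,879\,003\,136\ \text{B} \approx 1.75\ \text{GiB}
\end{aligned}
$$

That is on the order of the VM's whole 2G of RAM, presumably all pushed through the default pager to the swap partition and therefore through rumpdisk.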

<solid_black>I have very little idea about the inner workings of the default pager

<solid_black>but it's certainly going to be problematic if rumpdisk blocks waiting for defpager, and defpager tries to access the disk

<solid_black>it does device_read() / device_write(), yes

<solid_black>and will indeed return an error (via memory_object_data_error) if it fails to read from backing storage
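The hazard described here is a dependency cycle: rumpdisk faults on a page backed by the default pager, and the default pager needs rumpdisk to read that page back. As an illustration only (a toy model, not Hurd or Mach code; the task names and the one-edge-per-task representation are assumptions for the sketch), a wait-for graph makes the cycle explicit:

```c
#include <stdbool.h>

/* Toy wait-for graph: each task waits on at most one other task.
 * Purely illustrative; real Hurd/Mach dependencies are RPCs on ports. */
enum task { RUMPDISK, DEFPAGER, EXT2FS, NTASKS };

#define NOBODY (-1)

/* Follow "waits for" edges from start; returns true if the chain leads
 * back to start, i.e. start itself is part of a deadlock cycle. */
bool waits_in_cycle(const int waits_for[NTASKS], enum task start)
{
    int cur = waits_for[start];
    /* NTASKS steps are enough to either return to start or give up. */
    for (int steps = 0; cur != NOBODY && steps < NTASKS; steps++) {
        if (cur == (int)start)
            return true;
        cur = waits_for[cur];
    }
    return false;
}
```

With rumpdisk waiting on the defpager and the defpager waiting on rumpdisk, `waits_in_cycle()` reports rumpdisk as deadlocked, while ext2fs, which merely waits on rumpdisk, is stuck behind the cycle without being part of it.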

<solid_black>basically yeah, see if you can still repro this if you give the VM tons more RAM

<Pellescours>I already give it all I can (2G): I provide 4096MiB to the VM, and PAE is not enabled

<solid_black>2G and 4096MiB is... not the same?

<Pellescours>Yes, I could bump it to 3GiB if I enable PAE

<Pellescours>but that’s all, it’s a 32bit VM

<solid_black>not that I know much about this, but one would think that even without PAE, 32-bit physical addresses are enough to access up to 4GB

<solid_black>alternatively, you could try to see if you can reproduce this by just allocating a lot of memory

<solid_black>there might even be an allocate(1) that will do that, though I might be confusing this with Serenity

<solid_black>but it should be very easy to write your own
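A minimal sketch of such a program, assuming plain C with POSIX `sysconf()` is enough: it allocates memory in 1 MiB chunks and writes to every page, so the kernel has to back each one with a physical page and, once RAM runs out, page it out. The names `hog_memory()` and `release_memory()` and the chunk size are arbitrary choices for this sketch; `stress -m`, used later in the log, does a similar job.

```c
#include <stdlib.h>
#include <unistd.h>

#define CHUNK_SIZE ((size_t)1 << 20) /* 1 MiB per allocation */

/* Allocate up to total_mib MiB in 1 MiB chunks, writing one byte per page
 * so that every page must really be backed (and eventually paged out).
 * Chunk pointers go into `chunks`, which must hold total_mib entries.
 * Returns how many chunks were allocated before malloc() failed. */
size_t hog_memory(char **chunks, size_t total_mib)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        page = 4096; /* fallback assumption if sysconf() fails */

    size_t n;
    for (n = 0; n < total_mib; n++) {
        char *p = malloc(CHUNK_SIZE);
        if (p == NULL)
            break;
        /* Untouched memory may never get a physical page; touch each one. */
        for (size_t off = 0; off < CHUNK_SIZE; off += (size_t)page)
            p[off] = (char)n;
        chunks[n] = p;
    }
    return n;
}

/* Free everything hog_memory() allocated. */
void release_memory(char **chunks, size_t count)
{
    for (size_t i = 0; i < count; i++)
        free(chunks[i]);
}
```

To push a 2G VM into paging, one would call it from a small `main()` and keep the process alive (e.g. with `pause()`); since a single 32-bit process has limited address space, running several instances whose sizes add up to more than the RAM is probably easier than one huge allocation.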

<youpi>by switching between kernel- and user-land page table, yes

<youpi>but that's expensive

<youpi>but that's only for the virtual addressing

<solid_black>why would it need to switch?

<youpi>the physical addressing is another question

<solid_black>physical memory is what we're talking about

<youpi>most often various hardware stuff is placed below the 4G limit

<youpi>which thus prevents putting RAM there

<youpi>with the q35 model for instance, qemu only puts 2G below 4G

<youpi>without q35, one can use as much as -m 3550M in my tests

<youpi>the missing piece up to 4G is used by pci/acpi/whatnot
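The arithmetic behind this, as a rough sketch: 32-bit physical addresses cover $2^{32}$ bytes, but whatever part of that range is claimed by PCI/ACPI and other devices cannot hold RAM. Taking youpi's -m 3550M figure as the usable maximum without q35 (a measurement from his tests, not a fixed constant) and his 2G figure for q35:

$$
\begin{aligned}
2^{32}\ \text{B} &= 4096\ \text{MiB} \\
4096\ \text{MiB} - 3550\ \text{MiB} &= 546\ \text{MiB} \quad \text{(not usable as RAM, without q35)} \\
4096\ \text{MiB} - 2048\ \text{MiB} &= 2048\ \text{MiB} \quad \text{(not used as RAM, with q35)}
\end{aligned}
$$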

<solid_black>do we ever return ENOMEM/KERN_RESOURCE_SHORTAGE for vm_allocate?

<solid_black>can the defpager even refuse accepting more pages?

<solid_black>memory_object_create() is surely asynchronous

<Pellescours>I was able to reproduce it by running the stress program

<Pellescours>really easy to make it hang with stress. either stressing the memory (-m) or the disk (-d) works

<Pellescours>I tested with the Linux driver (the one in gnumach) and stressing doesn’t make the system hang. It’s specifically stressing the system while running on rumpdisk that causes the hang

<Pellescours>solid_black: vm_allocate returns KERN_NO_SPACE when allocation fails

<solid_black>does it? when there are no physical pages, and the pager doesn't have any space either?

<solid_black>I don't think there's even an API for the pager to say, hey, I don't have any more space, stop sending me pages and start returning ENOMEM to people who want space

<Pellescours>For this case I don’t know, I just looked quickly at the implementation of vm_allocate

<solid_black>it returns KERN_NO_SPACE when there's no more address space in your vm_map, yes

<solid_black>but that's a different thing

<Pellescours>ah possibly. When you say the pager, are you talking about the kernel or the Hurd (libdefpager?)?

<Pellescours>mach-defpager*

<solid_black>I meant pager in the kernel sense of the word, as in whatever implements a memory object

<solid_black>but it is (mach-?)defpager in this case, yes

<Pellescours>there is a mach-defpager in hurd, that’s why I was asking

<solid_black>yes, and also just defpager

<solid_black>I don't know what the difference is, and how they are related

<Pellescours>I think that mach-defpager is about swapping

<solid_black>the default pager is always about swapping, yes

<solid_black>(called paging out in Mach)

<solid_black>when there are not enough usable physical pages, Mach forcibly creates memory objects in the default pager

<solid_black>otherwise, they're much like any other memory object

<solid_black>and the default pager writes them out to... wherever it wants to, really; it could send them over the network to some cloud storage, or ask the user to memorize them

<solid_black>but in practice it's just going to write them back onto a disk partition -> rumpdisk

<solid_black>linux does zram these days btw, that's also a cool option that's allegedly fast and doesn't involve writing things to disk

<Pellescours>defpager seems to be like a lib while mach-defpager a whole program

<Pellescours>zram would be cool yeah

<solid_black>of course, this is something that you can implement entirely in userland :)

<Pellescours>yes, the power of hurd

<Pellescours>I see panic calls in pager_alloc_page

<Pellescours>If the pager had panicked, I should have seen some message on the console, so probably not that

<Pellescours>youpi: do you know what is defpager (folder) in hurd?

<youpi>Pellescours: no

<Pellescours>seems to be an attempt to write a defpager or a defpager helper. I’m not sure, but it seems unused

<solid_black>it appears to be an implementation of defpager based on Hurd APIs (libstore...) rather than Mach ones, yes

<almuhs>hi. I've just compiled rumpkernel with my modified files (adding many prints) and installed the deb files in the VM. But I don't see my prints during the boot

<almuhs>this is my ahcisata_pci.c https://pastebin.com/QF8uAZeR . I modified the ahci_pci_match() function

<almuhs>and the pci_map.c https://pastebin.com/vjaafMTj I modified pci_mapreg_submap()

<youpi>almuhs: you need to rebuild hurd with it, as the linking is static, not dynamic

<almuhs>ok

<almuhs>but only rumpdisk, not full hurd

<Pellescours>yes

<Pellescours>you can build from the git repo directly

<almuhs>yes

<Pellescours>configure --enable-static-progs=rumpdisk && make rumpdisk

<Pellescours>then you can copy the file rumpdisk.static to the /hurd directory

<almuhs>thanks, i will try it

<almuhs>compiling

<almuhs>ok, now it shows the prints

<almuhs>but now it doesn't find the disk :(

<almuhs>i don't know if this is an error in my code, or some incompatibility between upstream and debian's hurd

<almuhs>ok, I think it's an error in how I split this if:

<almuhs>if ((PCI_CLASS(pa->pa_class) != PCI_CLASS_MASS_STORAGE ||

<almuhs> ((PCI_SUBCLASS(pa->pa_class) != PCI_SUBCLASS_MASS_STORAGE_SATA ||

<almuhs> PCI_INTERFACE(pa->pa_class) != PCI_INTERFACE_SATA_AHCI) &&

<almuhs> PCI_SUBCLASS(pa->pa_class) != PCI_SUBCLASS_MASS_STORAGE_RAID)) &&

<almuhs> (force == false))

<almuhs> return 0;
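Since the trouble was in splitting this condition, here is one way to write it as separate early returns, checked against the original expression. It is a sketch: the constants are the standard PCI class codes (mass storage class 0x01, RAID subclass 0x04, SATA subclass 0x06, AHCI interface 0x01) defined locally under hypothetical names instead of NetBSD's PCI_* macros, and the functions take already-extracted class fields rather than `pa->pa_class`.

```c
#include <stdbool.h>

/* Standard PCI class codes, defined locally for this sketch
 * (NetBSD's pcireg.h provides the real PCI_* macros). */
#define CLASS_MASS_STORAGE 0x01
#define SUBCLASS_RAID      0x04
#define SUBCLASS_SATA      0x06
#define INTERFACE_AHCI     0x01

/* The original condition: true means "return 0" (do not match). */
static bool reject_original(int cls, int sub, int iface, bool force)
{
    return (cls != CLASS_MASS_STORAGE ||
            ((sub != SUBCLASS_SATA || iface != INTERFACE_AHCI) &&
             sub != SUBCLASS_RAID)) &&
           (force == false);
}

/* The same logic split into separate checks. */
static bool reject_split(int cls, int sub, int iface, bool force)
{
    if (force)
        return false;          /* forced: never reject */
    if (cls != CLASS_MASS_STORAGE)
        return true;           /* not a mass-storage device */
    if (sub == SUBCLASS_RAID)
        return false;          /* RAID-mode controllers are accepted */
    if (sub == SUBCLASS_SATA && iface == INTERFACE_AHCI)
        return false;          /* SATA controller in AHCI mode */
    return true;               /* any other mass-storage device */
}

/* Compare both versions over a small grid of representative values. */
bool split_matches_original(void)
{
    const int classes[] = { 0x00, CLASS_MASS_STORAGE, 0x02 };
    const int subs[] = { 0x00, SUBCLASS_RAID, SUBCLASS_SATA, 0x07 };
    const int ifaces[] = { 0x00, INTERFACE_AHCI, 0x02 };

    for (int c = 0; c < 3; c++)
        for (int s = 0; s < 4; s++)
            for (int i = 0; i < 3; i++)
                for (int f = 0; f < 2; f++)
                    if (reject_original(classes[c], subs[s], ifaces[i], f) !=
                        reject_split(classes[c], subs[s], ifaces[i], f))
                        return false;
    return true;
}
```

The key point when splitting is that `force` overrides everything, and the RAID subclass is accepted regardless of the interface byte, while the SATA subclass additionally requires the AHCI interface.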

<almuhs>for some reason, pfinet doesn't work in rescue mode. I don't know how to fix this without reinstalling

<Pellescours>extract the original compiled rumpdisk from the debian pkg and replace the one in your disk maybe

<almuhs>i don't have the original rumpdisk package, i think

<almuhs>i will try to enable cdrom repository

<Pellescours>when I replace files installed by packages, I sometimes back them up first so I can restore them in case of problems

<almuhs>yes, i forgot it this time

<Pellescours>you can download the package

<almuhs>where can I download it on the web?

<Pellescours>it’s a real disk right?

<almuhs>nope, but i can't mount it with a loop

<almuhs>i can

<almuhs>but, if i want to download the package from the host, i need a URL

<Pellescours> https://deb.debian.org/debian-ports/pool-hurd-i386/main/h/hurd/

<Pellescours>you’ll find the package you want in it

<almuhs>but i need librump package

<Pellescours>no you just want to override the static program

<almuhs>ok

<Pellescours>static programs are self-sufficient

<almuhs>ok, solved. Thanks

<almuhs>now I'm fixing the code. I split the if the wrong way

<almuhs>compiling again with the new version of the file

<gnu_srs1>youpi: mahler crashed building babeltrace2_2.0.6-1 Rebooted now.


2024-04-02.log