IRC channel logs

2024-04-02.log


<Pellescours>i keep my debugger open, if someone can tell me what to do to inspect, i will be the hands
<Pellescours>and if you want to reproduce, compile python3.12 with (--enable-optimizations --with-lto --enable-shared). it does not hang every time but it's often enough. it's hanging at an lto step
<white-wolf>hello
<white-wolf>how many lines of code is the mach kernel?
<white-wolf>does hurd have a foundation for donations?
<solid_black>hi!
<solid_black>Pellescours: rumpdisk in exception_raise_continue is the culprit
<solid_black>it must be crashing
<solid_black>set a breakpoint from GDB on i386_exception, then reproduce
<solid_black>white-wolf: GNU Mach master is 84819 lines of code (not counting comments/blanks/etc), according to tokei
<Pellescours>solid_black: ok
<Pellescours>exception is something a program should not generate while a trap is normal, right?
<solid_black>depending on the exact meaning of those terms... :)
<solid_black>Mach calls "traps" what AArch64 calls "exceptions"
<solid_black>i386 also calls it "traps" I *think*, but I'm not sure
<solid_black>and Mach exceptions (what *Mach* calls "exceptions") are another thing entirely, and they're the bad kind, yes
<solid_black>and then there are also traps as in SIGTRAP / int3, and that is fine if you're doing some debugging
<gnu_srs1>Hello, when I boot hurd in 'recovery mode' I was earlier getting the prompt: enter root password to continue.
<gnu_srs1>That is not happening recently, boot continues. Which program/script is responsible here?
<solid_black>gnu_srs1: that's something in the sysv init boot flow, I believe
<solid_black>that happens when the rootfs fsck fails
<gnu_srs1>tks, will search the initscripts for sysvinit.
<solid_black>Pellescours: if you mean traps as in user_trap()/kernel_trap() in i386/i386/trap.c, that's normal, right
<solid_black>and i386_exception()/exception() is bad
<solid_black>but a trap can directly lead to an exception, if it turns out to be a missing page or a bad instruction or something
<solid_black>so are you able to reproduce the exception?
<solid_black>also I hope you have rumpdisk symbols
<Pellescours>ok my debugger stopped at i386_exception
<solid_black>great
<solid_black>what's percpu_array[0]->active_thread->task->name?
<solid_black>is it rumpdisk?
<Pellescours>yes
<solid_black>you should have a structure of type i386_saved_state passed as an argument
<Pellescours>it’s rumpdisk
<solid_black>oh also what's the backtrace? is it from user_trap(), or is it from user_page_fault_continue()?
<solid_black>also which trap is it? / what's the exception/code/subcode values?
<Pellescours>vm_fault_continue
<Pellescours>user_page_fault_continue
<solid_black>ok, so it's accessing a page that it's not supposed to
<solid_black>is vm_fault_continue the topmost call frame? not trap_from_user?
<solid_black>or what's it called, user_trap
<Pellescours>yes vm_fault_continue is the topmost
<solid_black>that means it waited for a pager
<solid_black>I wonder, is this the device pager?
* solid_black looks at vm_fault.c
<Pellescours>vm_fault_continue calls vm_fault that calls user_page_fault_continue that calls user_page_fault_continue then i386_exception
<solid_black>in vm_fault() stack frame, do you still have locals' values?
<solid_black>can you see what 'object' is?
<Pellescours>args are optimized
<solid_black>ok, let's at least find the userland stack trace
<solid_black>oh, also what's kr / exception code & subcode?
<Pellescours>kr=10
<solid_black>that's KERN_MEMORY_ERROR
<solid_black>either the pager died, or signalled an error
<solid_black>could it be that your system ran out of memory so it was paging out?
<Pellescours>I don’t know, I have 2G of memory and very little is running on it
<solid_black>but weren't you running a highly concurrent build?
<Pellescours>yes
<solid_black>I'm just trying to imagine what kind of pager this could be
<solid_black>rumpdisk's own image is wired, so it can't be that
<Pellescours>I’m in non smp, running make on python. And lto1 is the program that's running
<solid_black>for the device pager, I think it only accesses /dev/mem?
<solid_black>which can't really die on you
<solid_black>so it could be the default pager, if things (internal objects) started to get paged out
<Pellescours>I think I have an idea
<solid_black>lto is indeed very memory-hungry, fwiw
<Pellescours>If my disk becomes full during the compilation
<solid_black>things should just get ENOSPC then?
<Pellescours>I thought I had a 10G image but it’s a 3G image
<Pellescours>normally yes, but maybe not in this case
<solid_black>I don't think rumpdisk should be mapping stuff from ext2fs anyway
<solid_black>let's see, there have to be some VM stats in the kernel
<Pellescours>ah you’re right
<solid_black>p vm_stat
<solid_black>wdym I'm right?
<Pellescours>if it’s a disk size issue, it should not be rumpdisk that hit the limit but ext2fs
<Pellescours>p vm_stat
<Pellescours>$2 = {pagesize = 0, free_count = 0, active_count = 0, inactive_count = 0, wire_count = 0, zero_fill_count = 14213373, reactivations = 398, pageins = 458741, pageouts = 427018, faults = 42581367, cow_faults = 2263564, lookups = 4497257, hits = 2389822}
<solid_black>quite a number of pageouts
<solid_black>I have very little idea about the inner workings of the default pager
<solid_black>but it's certainly going to be problematic if rumpdisk blocks waiting for defpager, and defpager tries to access the disk
<solid_black>it does device_read() / device_write(), yes
<solid_black>and will indeed return an error (via memory_object_data_error) if it fails to read from backing storage
<solid_black>basically yeah, see if you can still repro this if you give the VM tons more RAM
<Pellescours>I already give it all I can (2G) by providing 4096MiB to the VM and the PAE is not enabled
<solid_black>2G and 4096MiB is... not the same?
<Pellescours>Yes, I could bump to 3GiB if I enable the PAE
<Pellescours>but that’s all, it’s a 32bit VM
<solid_black>not that I know much about this, but one would think that even without PAE, 32-bit physical addresses are enough to access up to 4GB
<solid_black>alternatively, you could try to see if you can reproduce this by just allocating a lot of memory
<solid_black>there might even be an allocate(1) that will do that, though I might be confusing this with Serenity
<solid_black>but it should be very easy to write your own one
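[Editor's sketch, not part of the conversation] A minimal illustration of the "write your own" memory allocator suggested above. The function names `hog_memory` and `release_memory` are hypothetical. The key point is that each chunk is written to after `malloc`, so the pages are actually backed by memory (and can be paged out) rather than just reserved as address space.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Allocate up to max_chunks chunks of chunk_size bytes each and write to
   every byte, so that the pages are really touched and not merely
   reserved.  The pointers are stored in chunks[].  Returns the number of
   chunks that were successfully allocated. */
size_t hog_memory(void **chunks, size_t max_chunks, size_t chunk_size)
{
    size_t n = 0;

    while (n < max_chunks) {
        void *p = malloc(chunk_size);
        if (p == NULL)
            break;                      /* allocator refused: stop here */
        memset(p, 0xA5, chunk_size);    /* touch every page in the chunk */
        chunks[n++] = p;
    }
    return n;
}

/* Free the chunks obtained from hog_memory(). */
void release_memory(void **chunks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        free(chunks[i]);
}
```

To create memory pressure, a small `main` could call `hog_memory` with large chunks (say 64 MiB each) up to a total above the VM's RAM, keep the chunks held for a while, then call `release_memory`. The sizes are a choice for the experiment, not something stated in the log.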
<youpi>by switching between kernel- and user-land page table, yes
<youpi>but that's expensive
<youpi>but that's only for the virtual addressing
<solid_black>why would it need to switch?
<youpi>the physical addressing is another question
<solid_black>physical memory is what we're talking about
<youpi>most often various hardware stuff is placed below the 4G limit
<youpi>which thus prevents from putting ram there
<youpi>with the q35 model for instance, qemu only puts 2G below 4G
<youpi>without q35, one can use as much as -m 3550M in my tests
<youpi>the missing piece up to 4G is used by pci/acpi/whatnot
<solid_black>do we ever return ENOMEM/KERN_RESOURCE_SHORTAGE for vm_allocate?
<solid_black>can the defpager even refuse accepting more pages?
<solid_black>memory_object_create() is surely asynchronous
<Pellescours>I was able to reproduce it by running the stress program
<Pellescours>really easy to make it hang with stress. either stressing the memory (-m) or the disk (-d) works
<Pellescours>I tested with linux driver (the one in gnumach) and stressing doesn’t make the system hang. It’s really stressing the system while being on rumpdisk that causes the hang
<Pellescours>solid_black: vm_allocate returns KERN_NO_SPACE when it fails allocation
<solid_black>does it? when there are no physical pages, and the pager doesn't have any space either?
<solid_black>I don't think there's even an API for the pager to say, hey, I don't have any more space, stop sending me pages and start returning ENOMEM to people who want space
<Pellescours>For this case I don’t know, I just looked quickly at the implementation of vm_allocate
<solid_black>it returns KERN_NO_SPACE when there's no more address space in your vm_map, yes
<solid_black>but that's a different thing
<Pellescours>ah possibly, when you say the pager, do you mean in the kernel or in hurd (libdefpager?)?
<Pellescours>mach-defpager*
<solid_black>I meant pager in the kernel sense of the word, as in whatever implements a memory object
<solid_black>but it is (mach-?)defpager in this case, yes
<Pellescours>there is a mach-defpager in hurd, that’s why I was asking
<solid_black>yes, and also just defpager
<solid_black>I don't know what the difference is, and how they are related
<Pellescours>I think that mach-defpager is about swapping
<solid_black>the default pager is always about swapping, yes
<solid_black>(called paging out in Mach)
<solid_black>when there are not enough usable physical pages, Mach forcibly creates memory objects in the default pager
<solid_black>otherwise, they're much like any other memory object
<solid_black>and the default pager writes them out to... wherever it wants to, really, it could send them over network to some cloud storage, or ask the user to memorize them
<solid_black>but in practice it's just going to write them back onto a disk partition -> rumpdisk
<solid_black>linux does zram these days btw, that's also a cool option that's allegedly fast and doesn't involve writing things to disk
<Pellescours>defpager seems to be like a lib while mach-defpager a whole program
<Pellescours>zram would be cool yeah
<solid_black>of course, this is something that you can implement entirely in userland :)
<Pellescours>yes, the power of hurd
<Pellescours>I see panic calls in pager_alloc_page
<Pellescours>If pager panicked, I should have seen some message in the console, probably not that
<Pellescours>youpi: do you know what is defpager (folder) in hurd?
<youpi>Pellescours: no
<Pellescours>seems to be an attempt to write a defpager or a defpager helper. I’m not sure but it seems not used
<solid_black>it appears to be an implementation of defpager based on Hurd APIs (libstore...) rather than Mach ones, yes
<almuhs>hi. I've just compiled rumpkernel with my modified files (adding many prints) and installed the deb files in the VM. But I don't see my prints during the boot
<almuhs>this is my ahcisata_pci.c https://pastebin.com/QF8uAZeR . I modified the ahci_pci_match() function
<almuhs>and the pci_map.c https://pastebin.com/vjaafMTj I modified pci_mapreg_submap()
<youpi>almuhs: you need to rebuild hurd with it, as the linking is static, not dynamic
<almuhs>ok
<almuhs>but only rumpdisk, not full hurd
<Pellescours>yes
<Pellescours>you can build from the git repo directly
<almuhs>yes
<Pellescours>configure --enable-static-progs=rumpdisk && make rumpdisk
<Pellescours>then you can copy the file rumpdisk.static to the /hurd directory
<almuhs>thanks, i will try it
<almuhs>compiling
<almuhs>ok, now it shows the prints
<almuhs>but now it doesn't find the disk :(
<almuhs>i don't know if this is an error in my code, or some incompatibility between upstream and debian's hurd
<almuhs>ok, i think that it's an error splitting this if \
<almuhs>if ((PCI_CLASS(pa->pa_class) != PCI_CLASS_MASS_STORAGE ||
<almuhs> ((PCI_SUBCLASS(pa->pa_class) != PCI_SUBCLASS_MASS_STORAGE_SATA ||
<almuhs> PCI_INTERFACE(pa->pa_class) != PCI_INTERFACE_SATA_AHCI) &&
<almuhs> PCI_SUBCLASS(pa->pa_class) != PCI_SUBCLASS_MASS_STORAGE_RAID)) &&
<almuhs> (force == false))
<almuhs> return 0;
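[Editor's sketch, not part of the conversation] One way to split a compound condition like the one pasted above into separate checks without changing its meaning. This is not almuhs's actual patch. The `EX_` constants are illustrative stand-ins with assumed values, and plain integers replace the `PCI_CLASS(pa->pa_class)` style accessors. `reject_original` mirrors the pasted expression, `reject_split` is the step-by-step form, and `forms_agree` checks the two against each other exhaustively over a small range.

```c
#include <stdbool.h>

/* Illustrative stand-ins for the PCI class codes; the values are
   assumptions for this sketch, not copied from the NetBSD headers. */
enum {
    EX_CLASS_MASS_STORAGE = 0x01,
    EX_SUBCLASS_SATA      = 0x06,
    EX_SUBCLASS_RAID      = 0x04,
    EX_INTERFACE_AHCI     = 0x01
};

/* The pasted condition kept as one expression: nonzero means "return 0",
   i.e. the device is rejected. */
int reject_original(int cls, int sub, int iface, bool force)
{
    return (cls != EX_CLASS_MASS_STORAGE ||
            ((sub != EX_SUBCLASS_SATA || iface != EX_INTERFACE_AHCI) &&
             sub != EX_SUBCLASS_RAID)) &&
           (force == false);
}

/* The same decision written as a chain of early returns. */
int reject_split(int cls, int sub, int iface, bool force)
{
    if (force)
        return 0;   /* forced match: never reject */
    if (cls != EX_CLASS_MASS_STORAGE)
        return 1;   /* not a mass storage controller */
    if (sub == EX_SUBCLASS_RAID)
        return 0;   /* RAID is accepted whatever the interface */
    if (sub == EX_SUBCLASS_SATA && iface == EX_INTERFACE_AHCI)
        return 0;   /* SATA in AHCI mode is accepted */
    return 1;       /* any other mass storage device is rejected */
}

/* Compare both forms over a small range of inputs; returns 1 if they
   agree everywhere, 0 on the first mismatch. */
int forms_agree(void)
{
    for (int cls = 0; cls < 8; cls++)
        for (int sub = 0; sub < 8; sub++)
            for (int iface = 0; iface < 4; iface++)
                for (int force = 0; force <= 1; force++)
                    if (reject_original(cls, sub, iface, force != 0) !=
                        reject_split(cls, sub, iface, force != 0))
                        return 0;
    return 1;
}
```

Keeping the original expression around as a reference and comparing it against the split version is a cheap way to catch an inverted `&&`/`||` before rebuilding rumpdisk.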
<almuhs>for some reason, pfinet doesn't work in rescue mode. I don't know how to fix this without reinstalling
<Pellescours>extract the original compiled rumpdisk from the debian pkg and replace the one in your disk maybe
<almuhs>i don't have the original rumpdisk package, i think
<almuhs>i will try to enable cdrom repository
<Pellescours>when I replace files installed by packages, I sometimes back them up first in order to be able to restore them in case of problem
<almuhs>yes, i forgot it this time
<Pellescours>you can download the package
<almuhs>where can i download it on the web?
<Pellescours>it’s a real disk right?
<almuhs>nope, but i can't mount it with a loop
<almuhs>i can
<almuhs>but, if i want to download the package from the host, i need a URL
<Pellescours> https://deb.debian.org/debian-ports/pool-hurd-i386/main/h/hurd/
<Pellescours>you’ll find the package you want in it
<almuhs>but i need librump package
<Pellescours>no you just want to override the static program
<almuhs>ok
<Pellescours>static programs are self-sufficient
<almuhs>ok, solved. Thanks
<almuhs>now i'm fixing the code. I split the if the wrong way
<almuhs>compiling again with the new version of the file
<gnu_srs1>youpi: mahler crashed building babeltrace2_2.0.6-1 Rebooted now.