IRC channel logs
2024-04-02.log
<Pellescours>i keep my debugger open, if someone can tell me what to do to inspect, i will be the hands
<Pellescours>and if you want to reproduce, compile python3.12 with (--enable-optimizations --with-lto --enable-shared). it does not hang every time but it's often enough. it's hanging at an lto step
<solid_black>Pellescours: rumpdisk in exception_raise_continue is the culprit
<solid_black>set a breakpoint from GDB on i386_exception, then reproduce
<solid_black>white-wolf: GNU Mach master is 84819 lines of code (not counting comments/blanks/etc), according to tokei
<Pellescours>exception is something a program should not generate while a trap is normal, right?
<solid_black>i386 also calls it "traps" I *think*, but I'm not sure
<solid_black>and Mach exceptions (what *Mach* calls "exceptions") are another thing entirely, and they're the bad kind, yes
<solid_black>and then there are also traps as in SIGTRAP / int3, and that is fine if you're doing some debugging
<gnu_srs1>Hello, when I boot hurd in 'recovery mode' I was earlier getting the prompt: enter root password to continue.
<gnu_srs1>That is not happening recently, boot continues. Which program/script is responsible here?
<solid_black>gnu_srs1: that's something in the sysv init boot flow, I believe
<gnu_srs1>tks, will search the initscripts for sysvinit.
<solid_black>Pellescours: if you mean traps as in user_trap()/kernel_trap() in i386/i386/trap.c, that's normal, right
<solid_black>but a trap can directly lead to an exception, if it turns out to be a missing page or a bad instruction or something
<solid_black>you should have a structure of type i386_saved_state passed as an argument
<solid_black>oh also what's the backtrace? is it from user_trap(), or is it from user_page_fault_continue()?
<solid_black>also which trap is it? / what's the exception/code/subcode values?
<solid_black>ok, so it's accessing a page that it's not supposed to
<solid_black>is vm_fault_continue the topmost call frame?
not trap_from_user?
<Pellescours>vm_fault_continue calls vm_fault that calls user_page_fault_continue that calls user_page_fault_continue then i386_exception
<solid_black>in vm_fault() stack frame, do you still have locals' values?
<solid_black>could it be that your system ran out of memory so it was paging out?
<Pellescours>I don’t know, I have 2G of memory and very little stuff is running on it
<solid_black>I'm just trying to imagine what kind of pager this could be
<Pellescours>I’m in non smp, running make on python. And that’s the lto1 running program
<solid_black>for the device pager, I think it only accesses /dev/mem?
<solid_black>so it could be the default pager, if things (internal objects) started to get paged out
<Pellescours>I thought I had a 10G image but it’s a 3G image
<solid_black>I don't think rumpdisk should be mapping stuff from ext2fs anyway
<solid_black>let's see, there have to be some VM stats in the kernel
<Pellescours>if it’s a disk size issue, it should not be rumpdisk that hits the limit but ext2fs
<Pellescours>$2 = {pagesize = 0, free_count = 0, active_count = 0, inactive_count = 0, wire_count = 0, zero_fill_count = 14213373, reactivations = 398, pageins = 458741, pageouts = 427018, faults = 42581367, cow_faults = 2263564, lookups = 4497257, hits = 2389822}
<solid_black>I have very little idea about the inner workings of the default pager
<solid_black>but it's certainly going to be problematic if rumpdisk blocks waiting for defpager, and defpager tries to access the disk
<solid_black>and will indeed return an error (via memory_object_data_error) if it fails to read from backing storage
<solid_black>basically yeah, see if you can still repro this if you give the VM tons more RAM
<Pellescours>I already give it all I can (2G) by providing 4096MiB to the VM, and PAE is not enabled
<solid_black>not that I know much about this, but one would think that even without PAE, 32-bit physical addresses are enough to access up to 4GB
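The counters in the GDB dump above can be turned into a couple of derived figures to help judge whether paging was involved. The following is only a rough sketch: `vm_stats_snapshot`, `cache_hit_ratio`, `has_paged_out` and `print_vm_summary` are hypothetical names made up for illustration, the struct is a local mirror of the printed fields rather than the kernel's own definition, and it assumes `lookups`/`hits` count object cache lookups and hits.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical local mirror of some counters printed in the log;
 * this is NOT the kernel's own struct definition. */
struct vm_stats_snapshot {
    unsigned long pageins;
    unsigned long pageouts;
    unsigned long faults;
    unsigned long cow_faults;
    unsigned long lookups;   /* assumed: object cache lookups */
    unsigned long hits;      /* assumed: object cache hits */
};

/* Values copied from the GDB dump pasted in the channel. */
const struct vm_stats_snapshot logged_stats = {
    .pageins    = 458741,
    .pageouts   = 427018,
    .faults     = 42581367,
    .cow_faults = 2263564,
    .lookups    = 4497257,
    .hits       = 2389822,
};

/* Fraction of lookups that were hits; 0.0 when there were no lookups. */
double cache_hit_ratio(const struct vm_stats_snapshot *s)
{
    if (s->lookups == 0)
        return 0.0;
    return (double)s->hits / (double)s->lookups;
}

/* True if any page was ever written out to a pager. The counters are
 * cumulative, so this does not say *when* the paging happened. */
bool has_paged_out(const struct vm_stats_snapshot *s)
{
    return s->pageouts > 0;
}

/* Print the derived figures for one snapshot. */
void print_vm_summary(const struct vm_stats_snapshot *s)
{
    printf("paged out at some point: %s\n", has_paged_out(s) ? "yes" : "no");
    printf("cache hit ratio: %.1f%%\n", 100.0 * cache_hit_ratio(s));
}
```

Calling `print_vm_summary(&logged_stats)` reports that pageouts did occur and a hit ratio of a little over half, which fits the suspicion that internal objects were being paged out, though it does not prove that this caused the hang.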
<solid_black>alternatively, you could try to see if you can reproduce this by just allocating a lot of memory
<solid_black>there might even be allocate(1) that will do that, though I might be confusing this with Serenity
<youpi>by switching between kernel- and user-land page table, yes
<youpi>but that's only for the virtual addressing
<youpi>the physical addressing is another question
<youpi>most often various hardware stuff is placed below the 4G limit
<youpi>which thus prevents from putting ram there
<youpi>with the q35 model for instance, qemu only puts 2G below 4G
<youpi>without q35, one can use as much as -m 3550M in my tests
<youpi>the missing piece up to 4G is used by pci/acpi/whatnot
<solid_black>do we ever return ENOMEM/KERN_RESOURCE_SHORTAGE for vm_allocate?
<Pellescours>I was able to reproduce it by running the stress program
<Pellescours>really easy to make it hang with stress. either stressing the memory (-m) or the disk (-d) works
<Pellescours>I tested with linux driver (the one in gnumach) and stressing doesn’t make the system hang. It’s really stressing the system while being on rumpdisk that causes the hang
<Pellescours>solid_black: vm_allocate returns KERN_NO_SPACE when it fails allocation
<solid_black>does it? when there are no physical pages, and the pager doesn't have any space either?
<solid_black>I don't think there's even an API for the pager to say, hey, I don't have any more space, stop sending me pages and start returning ENOMEM to people who want space
<Pellescours>For this case I don’t know, I just looked quickly at the implementation of vm_allocate
<solid_black>it returns KERN_NO_SPACE when there's no more address space in your vm_map, yes
<Pellescours>ah possibly, when you say the pager, do you mean in the kernel or in hurd (libdefpager?)?
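youpi's point about RAM below the 4G boundary can be put into a small arithmetic sketch. It assumes that without PAE only the RAM the machine model places below 4G is usable; the two limits are the figures quoted in the channel (2G with q35, 3550M without q35 "in my tests"), and `usable_ram_mib` and both constants are hypothetical names used only for illustration.

```c
/* Figures quoted in the channel, in MiB. They come from youpi's remarks,
 * not from QEMU documentation, so treat them as approximate. */
#define Q35_BELOW_4G_MIB     2048u  /* "qemu only puts 2G below 4G" */
#define NON_Q35_BELOW_4G_MIB 3550u  /* "as much as -m 3550M in my tests" */

/* Hypothetical helper: RAM (in MiB) reachable with 32-bit physical
 * addressing, given the amount requested for the VM and the amount the
 * machine model places below the 4 GiB boundary. */
unsigned usable_ram_mib(unsigned requested_mib, unsigned below_4g_mib)
{
    return requested_mib < below_4g_mib ? requested_mib : below_4g_mib;
}
```

With these assumptions, `usable_ram_mib(4096, Q35_BELOW_4G_MIB)` gives 2048, which matches Pellescours seeing 2G after providing 4096MiB, if his VM does use the q35 model.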
<solid_black>I meant pager in the kernel sense of the word, as in whatever implements a memory object
<Pellescours>there is a mach-defpager in hurd, that’s why I was asking
<solid_black>I don't know what the difference is, and how they are related
<solid_black>when there are not enough usable physical pages, Mach forcibly creates memory objects in the default pager
<solid_black>otherwise, they're much like any other memory object
<solid_black>and the default pager writes them out to... wherever it wants to, really, it could send them over network to some cloud storage, or ask the user to memorize them
<solid_black>but in practice it's just going to write them back onto a disk partition -> rumpdisk
<solid_black>linux does zram these days btw, that's also a cool option that's allegedly fast and doesn't involve writing things to disk
<Pellescours>defpager seems to be like a lib while mach-defpager a whole program
<solid_black>of course, this is something that you can implement entirely in userland :)
<Pellescours>If pager panicked, I should have seen some message in the console, so probably not that
<Pellescours>youpi: do you know what the defpager folder in hurd is?
<Pellescours>seems to be an attempt to write a defpager or a defpager helper. I’m not sure but it seems not used
<solid_black>it appears to be an implementation of defpager based on Hurd APIs (libstore...) rather than Mach ones, yes
<almuhs>hi. I've just compiled rumpkernel with my modified files (adding many prints) and installed the deb files in the VM.
But I don't see my prints during the boot
<youpi>almuhs: you need to rebuild hurd with it, as the linking is static, not dynamic
<almuhs>but only rumpdisk, not full hurd
<Pellescours>configure --enable-static-progs=rumpdisk && make rumpdisk
<Pellescours>then you can copy the file rumpdisk.static to the /hurd directory
<almuhs>but now it doesn't find the disk :(
<almuhs>i don't know if this is an error in my code, or an incompatibility between upstream and debian's hurd
<almuhs>ok, i think that it's an error splitting this if \
<almuhs>if ((PCI_CLASS(pa->pa_class) != PCI_CLASS_MASS_STORAGE ||
<almuhs> ((PCI_SUBCLASS(pa->pa_class) != PCI_SUBCLASS_MASS_STORAGE_SATA ||
<almuhs> PCI_INTERFACE(pa->pa_class) != PCI_INTERFACE_SATA_AHCI) &&
<almuhs> PCI_SUBCLASS(pa->pa_class) != PCI_SUBCLASS_MASS_STORAGE_RAID)) &&
<almuhs>for some reason, pfinet doesn't work in rescue mode. I don't know how to fix this without reinstalling
<Pellescours>extract the original compiled rumpdisk from the debian pkg and replace the one in your disk maybe
<almuhs>i don't have the original rumpdisk package, i think
<almuhs>i will try to enable cdrom repository
<Pellescours>when I replace files compiled by packages, I sometimes back them up first in order to be able to restore them in case of problem
<almuhs>where can i download it on the web?
<almuhs>nope, but i can't mount it with a loop
<almuhs>but, if i want to download the package from the host, i need a URL
<almuhs>now i'm fixing the code. I split the if in a wrong way
<almuhs>compiling again with the new version of the file
<gnu_srs1>youpi: mahler crashed building babeltrace2_2.0.6-1. Rebooted now.
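On almuhs's mis-split `if`: one way to break up a compound condition like the one quoted above without changing its meaning is to name the sub-conditions and then check the split version against the original for every input. The sketch below models only the visible part of the condition; the operand after the trailing `&&` is cut off in the log and is deliberately left out. Each PCI macro comparison is replaced by a plain boolean parameter, and all function names are hypothetical.

```c
#include <stdbool.h>

/* Visible part of the condition from the log. Each parameter stands for
 * one comparison being TRUE as an equality, e.g. mass_storage means
 * PCI_CLASS(...) == PCI_CLASS_MASS_STORAGE. */
bool rejects_original(bool mass_storage, bool sata, bool ahci, bool raid)
{
    return !mass_storage || ((!sata || !ahci) && !raid);
}

/* The same logic split into named steps.
 * De Morgan: (!sata || !ahci) is the same as !(sata && ahci). */
bool rejects_split(bool mass_storage, bool sata, bool ahci, bool raid)
{
    bool sata_ahci = sata && ahci;
    bool supported_subclass = sata_ahci || raid;

    if (!mass_storage)
        return true;            /* not a mass storage device at all */
    if (!supported_subclass)
        return true;            /* neither SATA/AHCI nor RAID */
    return false;
}

/* Compare both versions over all 16 input combinations and return the
 * number of combinations on which they disagree. */
int count_mismatches(void)
{
    int mismatches = 0;

    for (int bits = 0; bits < 16; bits++) {
        bool ms   = (bits & 1) != 0;
        bool sata = (bits & 2) != 0;
        bool ahci = (bits & 4) != 0;
        bool raid = (bits & 8) != 0;

        if (rejects_original(ms, sata, ahci, raid) !=
            rejects_split(ms, sata, ahci, raid))
            mismatches++;
    }
    return mismatches;
}
```

A split that accidentally flips an `||` into an `&&`, or drops a negation, shows up as a non-zero result from `count_mismatches()`, which is cheaper to find than a disk that no longer gets detected at boot.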