IRC channel logs
2024-12-04.log
<Pellescours>I was doing something, and I fell into a case where none of my userspace programs are running (my hurd term does not respond to keyboard, only kdb works), while I was doing some stuff (it was doing a cp), I tried to inspect the thread traces, but all I have is a gnumach thread starting another gnumach thread from time to time and that’s all
<Pellescours>my userspace programs never get back to running state (and my hurd term does not respond, nor my ssh)
<damo22>if not, possibly there is something fishy with swap to disk...
<Pellescours>nope, it’s from a hurd cross-compiled with flavio scripts. I was doing some compilation stuff
<damo22>i find when i run sync quite often, i avoid hang
<Pellescours>can it be something with rump and write heavy action? Like write action is waiting for rump to actually write to disk and for some reason everyone waits for one another?
<youpi>possibly, and possibly there is a loop if any page of rump gets swapped out
<youpi>(doesn't matter if it's swapped out to swap, or just evicted from memory for read-only data)
<Pellescours>when a thread is in exception_raise_continue (a rumpdisk thread) what does it mean?
<Pellescours>is there a debugger command to check the interrupt status?
<Pellescours>I attached a gdb to my gnumach and checked a bit in the clear_wait function, the only thread that wakes up (it’s always the same) is a kernel thread that doesn’t wait for an event and that was woken up by the softclock
<Pellescours>I have a gnumach thread (thread 0) with Stat: ".WS.N." (so it’s suspended) that is never started by the kernel
<solid_black>agreed that fs as a namespace/API is more plan 9-ish than hurdish; the Hurd uses fs as a *name directory* for servers' ports (in /servers) in place of classic Mach's (net)name server, or Darwin's bootstrap server (that later got folded into launchd)
<solid_black>it makes sense to still expose things as an fs w/ directories and nodes when they are naturally hierarchical, such as the device tree perhaps
<solid_black>but generally there'd be just a single node + special rpcs on it
<solid_black>a thread in exception_raise_continue means there was an exception (that the exception server didn't yet reply to)
<solid_black>so the origin of the exception needs to be investigated
<solid_black>generally I found it useful to break in gdb on i386_exception()
<solid_black>from there, you could find a register dump and try to reconstruct what happened
<solid_black>I have this cool dwarf thing written for aarch64 gnumach which enabled gdb to unwind through syscalls, interrupts, and exceptions
<solid_black>which means I see where the userland crashed (or was interrupted) right in 'bt', and I can go up the stack and inspect userspace registers/locals etc
<solid_black>would be cool to have a similar thing for x86 too, it was a lot harder to debug userland crashes when bringing up x86_64
<solid_black>pellescours: it's probably the documentation that needs corrections
<solid_black>is there really a deadlock inside gnumach in your case, or is it just that rumpdisk crashes and that made everything stop?
<Pellescours>solid_black: it was youpi effectively, but your opinion is welcome
<solid_black>my opinion as voiced above is that it was apparently intended that mqueue is locked when resume=FALSE and unlocked when resume=TRUE, and it's the doc that needs fixing
<Pellescours>i would have moved the lock to the caller to keep coherency, particularly since in the resume=TRUE case, locking is the 1st thing done
<solid_black>hm, so I can allocate physical pages w/ vm_allocate_contiguous, and it returns both virtual and physical addresses
<solid_black>but if I'm given a page (received in a message), there's no way to learn its physical address?
<Pellescours>solid_black: in userspace (the need to know physical address)?
<solid_black>cool, thanks, vm_pages_phys is exactly what I was needing
<solid_black>because I don't want it paged out from under me the next instant, or CoW-ed away
<gnu_srs1>Hi. In file bootclean.sh from initscripts in sysvinit the function clean_all() starts with the following: find >/dev/null 2>&1 || return 0
<gnu_srs1>This takes a long time since it seems to find all files starting at /. It also seems to return 1, so the subsequent commands are not executed: log_begin_msg "Cleaning up temporary files..." ...
<gnu_srs1>A bug? What sets the return value of find?
<youpi>no, that line is which find >/dev/null 2>&1 || return 0
<youpi>which doesn't start find, but just checks whether it is available
<Pellescours>and everytime the scheduler tries to wake a thread, it is this thread that is taken (so an
<Pellescours>also note that the thread.wait_event=0x0 (so no wait event)
<Pellescours>so basically my idle_thread_continue() is trying to start thread "rumpclk0" but because its wait queue is empty, it does not start the thread and does not update the processor->next_thread. So the next time idle_thread_continue runs, it will do the exact same thing again and again
<Pellescours>Or is it just a problem in rump that makes the rumpclk0 wake condition never met?
<janneke>ACTION finally has all website dependencies (buildable) in guix and sends patch