IRC channel logs

<Pellescours>I was doing something, and I fell into a case where none of my userspace programs are running (my hurd term does not respond to the keyboard, only kdb works). It happened while I was doing some stuff (it was doing a cp). I tried to inspect the thread traces, but all I have is a gnumach thread starting another gnumach thread from time to time, and that’s all

<Pellescours>my userspace programs never get back to the running state (and my hurd term does not respond, nor my ssh)

<damo22>Pellescours: is this with smp?

<damo22>if not, possibly there is something fishy with swap to disk...

<Pellescours>nope, it’s from a hurd cross-compiled with flavio scripts. I was doing some compilation stuff

<damo22>i find when i run sync quite often, i avoid hang

<Pellescours>can it be something with rump and write-heavy activity? Like a write is waiting for rump to actually write to disk, and for some reason everyone waits for one another?

<youpi>possibly, and possibly there is a loop if any page of rump gets swapped out

<youpi>(doesn't matter if it's swapped out to swap, or just evicted from memory for read-only data)

<Pellescours>when a thread is in exception_raise_continue (a rumpdisk thread) what does it mean?

<Pellescours>is there a debugger command to check the interrupt status?

<Pellescours>I attached gdb to my gnumach and checked a bit in the clear_wait function; the only thread that wakes up (it’s always the same one) is a kernel thread that doesn’t wait for an event and that was woken up by the softclock

<Pellescours>I have a gnumach thread (thread 0) with Stat: ".WS.N." (so it’s suspended) that is never started by the kernel

<Pellescours>AH I think I found another deadlock: in ipc_mqueue_receive https://git.zammit.org/gnumach-sv.git/tree/ipc/ipc_mqueue.c?h=master#n518 it is said in Conditions "The message queue is locked; it will be returned unlocked", but if you look at line 536, there is a goto after_thread_block. And right at that label, there is an imq_lock(mqueue)... which relocks something already locked

<Pellescours>youpi ^

<damo22>Pellescours: nice!!

<solid_black>morning

<solid_black>agreed that fs as a namespace/API is more plan 9-ish than hurdish; the Hurd uses fs as a *name directory* for servers' ports (in /servers) in place of classic Mach's (net)name server, or Darwin's bootstrap server (that later got folded into launchd)

<solid_black>it makes sense to still expose things as an fs w/ directories and nodes when they are naturally hierarchical, such as the device tree perhaps

<solid_black>but generally there'd be just a single node + special rpcs on it

<solid_black>a thread in exception_raise_continue means there was an exception (that the exception server didn't yet reply to)

<solid_black>so the origin of the exception needs to be investigated

<solid_black>generally I found it useful to break in gdb on i386_exception()

<solid_black>from there, you could find a register dump and try to reconstruct what happened

<solid_black>I have this cool dwarf thing written for aarch64 gnumach which enabled gdb to unwind through syscalls, interrupts, and exceptions

<solid_black>which means I see where the userland crashed (or was interrupted) right in 'bt', and I can go up the stack and inspect userspace registers/locals etc

<solid_black>would be cool to have a similar thing for x86 too, it was a lot harder to debug userland crashes when bringing up x86_64

<solid_black>pellescours: it's probably the documentation that needs corrections

<solid_black>if resume is true, mqueue is not locked

<solid_black>on entry

<solid_black>see mach_msg_receive_continue

<solid_black>and exception_raise_continue

<solid_black>is there really a deadlock inside gnumach in your case, or is it just that rumpdisk crashes and that made everything stop?

<Pellescours>youpi: either this or the callers need to do the lock instead (example https://git.zammit.org/gnumach-sv.git/tree/kern/exception.c?h=master#n851 )

<solid_black>you mean me probably not youpi?

<Pellescours>to keep behavior coherent

<Pellescours>solid_black: it was indeed meant for youpi, but your opinion is welcome

<solid_black>my opinion as voiced above is that it was apparently intended that mqueue is locked when resume=FALSE and unlocked when resume=TRUE, and it's the doc that needs fixing

<Pellescours>i would have moved the lick to the caller to keep coherency, particularly since in the resume=TRUE, locking is the 1st thing done

<Pellescours>s/lick/lock/
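
The locking convention discussed above (locked on entry when resume=FALSE, unlocked on entry when resume=TRUE, returned unlocked either way) can be sketched as a toy model in C. This is not gnumach code: toy_mqueue, toy_lock, toy_unlock and toy_receive are hypothetical names, and a plain flag stands in for the real queue lock, so the example only illustrates why the lock taken at the after_thread_block label is not a relock on the resume path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a message queue with a non-recursive lock. */
struct toy_mqueue {
    bool locked;   /* true while the lock is held */
    int  pending;  /* number of queued messages */
};

static void toy_lock(struct toy_mqueue *mq)
{
    assert(!mq->locked);  /* taking a held non-recursive lock would deadlock */
    mq->locked = true;
}

static void toy_unlock(struct toy_mqueue *mq)
{
    assert(mq->locked);
    mq->locked = false;
}

/*
 * Toy receive, following the convention voiced in the discussion:
 *   resume == false: mq is locked on entry
 *   resume == true:  mq is unlocked on entry
 * In both cases mq is returned unlocked.
 * Returns 1 if a message was dequeued, 0 otherwise.
 */
static int toy_receive(struct toy_mqueue *mq, bool resume)
{
    if (resume)
        goto after_thread_block;

    assert(mq->locked);
    if (mq->pending > 0) {
        mq->pending--;
        toy_unlock(mq);
        return 1;
    }

    /* Nothing queued: drop the lock before "blocking". A real kernel
     * would block here and might restart later through a continuation. */
    toy_unlock(mq);

after_thread_block:
    /* Reached with the lock dropped on both paths, so this is a first
     * acquisition, not a relock. */
    toy_lock(mq);
    if (mq->pending > 0) {
        mq->pending--;
        toy_unlock(mq);
        return 1;
    }
    toy_unlock(mq);
    return 0;
}
```

Calling toy_receive(&mq, true) with the lock already held trips the assertion in toy_lock, which is the situation the Conditions comment would wrongly suggest for the resume path.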

<solid_black>hm, so I can allocate physical pages w/ vm_allocate_contiguous, and it returns both virtual and physical addresses

<solid_black>but if I'm given a page (received in a message), there's no way to learn its physical address?

<Pellescours>solid_black: in userspace (the need to know physical address)?

<Pellescours>vm_pages_pys ?

<Pellescours>vm_pages_phys ?

<solid_black>yes, in userspace

<solid_black>cool, thanks, vm_pages_phys is exactly what I was needing

<solid_black>I still need to wire it or something first

<solid_black>let me think

<solid_black>because I don't want it paged out from under me the next instant, or CoW-ed away

<gnu_srs1>Hi. In file bootclean.sh from initscripts in sysvinit the function clean_all() starts with the following: find >/dev/null 2>&1 || return 0

<gnu_srs1>This takes a long time since it seems to find all files starting at /. It also seems to return 1, so the subsequent commands are not executed: log_begin_msg "Cleaning up temporary files..." ...

<gnu_srs1>A bug? What sets the return value of find?

<gnu_srs1>i.e. find with no arguments!

<youpi>no, that line is which find >/dev/null 2>&1 || return 0

<youpi>which doesn't start find, but just checks whether it is available

<Pellescours>for my "freeze" that I’m able to reproduce easily (by doing the same steps), I have a scheduler that tries to wake a thread, but at this line https://github.com/etienne02/gnumach/blob/master/kern/sched_prim.c#L408 the queue is empty and so the thread never starts. Is there a possibility that a collision in wait_hash causes the wait_queue to be empty (due to another thread picking it earlier)?

<Pellescours>and every time the scheduler tries to wake a thread, it is this thread that is taken (so an infinite wait)

<gnu_srs1>youpi: Of course, my bad :(

<Pellescours>also note that the thread.wait_event=0x0 (so no wait event)

<Pellescours>are we sure this macro (TH_EV_STATE) is valid https://github.com/etienne02/gnumach/blob/master/kern/thread.h#L90 this looks odd to me

<Pellescours>so basically my idle_thread_continue() is trying to start thread "rumpclk0", but because its wait queue is empty, it does not start the thread and does not update processor->next_thread. So the next time idle_thread_continue runs, it will do the exact same thing again and again

<Pellescours>shouldn’t a check on the return value of thread_wakeup https://github.com/etienne02/gnumach/blob/master/kern/sched_prim.c#L661 be added, and something be done in the case where the thread did not start? (Like assigning another new_thread to the processor?)

<Pellescours>Or is it just a problem in rump that makes the rumpclk0 wake condition never be met?
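
To make the wait_hash collision question above concrete, here is a toy model of hashed wait queues in C. It assumes, as in classic Mach's sched_prim.c design, that a wakeup only removes threads whose wait event matches, and that a thread with no wait event is not queued in any bucket. All names (toy_thread, toy_hash, toy_assert_wait, toy_wakeup) are hypothetical, and the code is not taken from gnumach.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BUCKETS 4

struct toy_thread {
    const void *wait_event;   /* NULL means "not waiting on any event" */
    int runnable;
    struct toy_thread *next;  /* link within one bucket */
};

static struct toy_thread *toy_bucket[TOY_BUCKETS];

/* Deliberately small hash so that collisions are easy to provoke. */
static size_t toy_hash(const void *event)
{
    return (size_t)(((uintptr_t)event >> 4) % TOY_BUCKETS);
}

/* Record that t waits on event. A NULL event is not queued anywhere. */
static void toy_assert_wait(struct toy_thread *t, const void *event)
{
    t->wait_event = event;
    t->runnable = 0;
    t->next = NULL;
    if (event != NULL) {
        size_t b = toy_hash(event);
        t->next = toy_bucket[b];
        toy_bucket[b] = t;
    }
}

/* Wake every thread in the bucket whose event matches; return the count. */
static int toy_wakeup(const void *event)
{
    int woken = 0;
    struct toy_thread **link = &toy_bucket[toy_hash(event)];

    while (*link != NULL) {
        struct toy_thread *t = *link;
        if (t->wait_event == event) {
            *link = t->next;      /* unlink the matching thread */
            t->next = NULL;
            t->wait_event = NULL;
            t->runnable = 1;
            woken++;
        } else {
            link = &t->next;      /* colliding thread stays queued */
        }
    }
    return woken;
}
```

In this model, two events that hash to the same bucket do not steal each other's waiters, while a thread whose wait_event is NULL can never be found by any wakeup, which matches the symptom of a thread with wait_event=0x0 and an empty queue.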

<janneke>ACTION finally has all website dependencies (buildable) in guix and sends patch

2024-12-04.log