<damo22>youpi: when it hangs and i cant enter kdb, all cpus are in HLT state with 0x246 eflags
<youpi>damo22: is the timer interrupt not working any more?
<youpi>does 0x246 include the interrupt flag?
<damo22>i think 0x200 is interrupt flag?
<damo22>so yes
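The 0x246 question can be checked by decoding the value bit by bit. This is a minimal sketch using the architectural x86 EFLAGS bit positions; the helper name decode_eflags is made up for illustration and is not part of gnumach or kdb.

```python
# Low x86 EFLAGS bits (bit 1 is reserved and always reads as 1).
EFLAGS_BITS = {
    0x001: "CF",
    0x002: "reserved(1)",
    0x004: "PF",
    0x010: "AF",
    0x040: "ZF",
    0x080: "SF",
    0x100: "TF",
    0x200: "IF",   # interrupt enable flag
    0x400: "DF",
    0x800: "OF",
}

def decode_eflags(value):
    """Return the names of the flags set in value, lowest bit first."""
    return [name for bit, name in sorted(EFLAGS_BITS.items()) if value & bit]

print(decode_eflags(0x246))  # ['reserved(1)', 'PF', 'ZF', 'IF']
```

So 0x246 = 0x200 + 0x40 + 0x4 + 0x2: interrupts are enabled on the halted cpus, which matches the "so yes" above.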
<damo22>icrh is being set with apic ids for ipis
<damo22>during the hang
<damo22>and yes timer interrupt is working
<damo22>pit timer on cpu0 and lapic timers on each AP
<damo22>although, how does cpu0 receive timer interrupts?
<damo22>does it route through the lapic?
<damo22>i verified all the timer interrupts are being called
<youpi>ok so interrupts basically work, the question is then why the keyboard interrupt doesn't work to trigger kdb
<damo22>let me check if i compiled with --enable-kdb
<damo22> pin 1 0x0000000000010031 dest=0 vec=49 active-hi edge masked fixed physical
<damo22>why is pin 1 masked?
<damo22>must be that kbdopen is not called
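The pin 1 line above can be reproduced by decoding the raw 64-bit redirection entry. This is a rough sketch assuming the standard I/O APIC redirection table layout (vector in bits 0-7, delivery mode in bits 8-10, mask in bit 16, destination in bits 56-63); the function name and field labels are chosen here for illustration and do not come from gnumach.

```python
# Delivery mode encodings from the I/O APIC redirection table entry.
DELIVERY_MODES = {0: "fixed", 1: "lowest-priority", 2: "SMI",
                  4: "NMI", 5: "INIT", 7: "ExtINT"}

def decode_redirection_entry(entry):
    """Split a 64-bit I/O APIC redirection entry into its main fields."""
    return {
        "vector": entry & 0xFF,
        "delivery": DELIVERY_MODES.get((entry >> 8) & 0x7, "reserved"),
        "dest_mode": "logical" if entry & (1 << 11) else "physical",
        "polarity": "active-lo" if entry & (1 << 13) else "active-hi",
        "trigger": "level" if entry & (1 << 15) else "edge",
        "masked": bool(entry & (1 << 16)),
        "dest": (entry >> 56) & 0xFF,
    }

fields = decode_redirection_entry(0x0000000000010031)
print(fields["vector"], fields["masked"])  # 49 True
```

Bit 16 (0x10000) is the mask bit, so the keyboard pin stays silent until something clears it; with that bit cleared the same entry would read 0x31.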
<damo22>how can kd represent the screen if it refers to keyboard events?
<damo22>i fixed it
<damo22>db{0}> show all runqs
<damo22>Processor set runq: count(0) low(4)
<damo22>Processor #0 runq: count(0) low(32)
<damo22>Processor #1 runq: count(0) low(32)
<damo22>Processor #2 runq: count(0) low(32)
<damo22>Processor #3 runq: count(0) low(32)
<damo22>Processor #4 runq: count(0) low(32)
<damo22>Processor #5 runq: count(0) low(32)
<damo22>Stuck threads: 0
<damo22>so nothing in the runqs
<youpi>so it's not a scheduler bug, but a missing wakeup somewhere
<damo22>e.g. in thread_run and friends, it reads the cpu number before disabling interrupts. what if it gets interrupted between reading the cpu number and turning off interrupts?
<damo22>i guess it will return here eventually with the right stack?
<youpi>it looks odd to be calling current_processor() outside splsched() indeed
<damo22>yeah can it be possibly getting the wrong value for cpu number?
<damo22>or caching it, and then returning on a different cpu
<youpi>I don't remember if the gnumach kernel is preemptible
<damo22>how can it run non-preemptible?
<youpi>err, by just not preempting
<youpi>I mean
<youpi>preempting *the kernel*
<youpi>userland is preemptible of course
<damo22>does that mean interrupts cannot interrupt gnumach process?
<youpi>they can interrupt them
<youpi>they just can't preempt them
<damo22>i need to read what that means
<youpi>what what means?
<youpi>preempt = run another thread
<youpi>i.e. when you're in kernel mode, you're sure to stay running, until you call something that might block
<damo22>how do you hand off to another thread?
<damo22>if youre blocking
<youpi>well, the blocking primitive will block
<youpi>e.g. thread_block
<youpi>i.e. tell the scheduler to run something else
<damo22>but i mean if a timer interrupt happens before splsched() will it guarantee to return with the same cpu and kernel stack?
<damo22>in kernel mode
<youpi>yes since it can't preempt
<youpi>threads don't magically change cpu :)
<damo22>what sort of missing wakeup would i be looking for
<youpi>no idea, but you probably rather want to look for what is actually supposed to be running
<damo22>the cpus all end up in machine_idle
<youpi>sure, if there is no thread to run, cpus will be idle
<youpi>question is: what is your system actually doing?
<youpi>what did you expect it to be doing?
<youpi>are you having a shell, something?
<youpi>very possibly you just have a userland interlock
<damo22>its in the bootup process running INIT
<youpi>and thus the kernel is not at fault at all
<youpi>and it's just userland being faulty
<youpi>so check what userland is doing, which point it is at
<youpi>you can also check the state of ext2fs
<youpi>possibly it's stuck for whatever reason
<youpi>but you can see in the backtrace what it's doing
<youpi>in a word: investigate
<youpi>it's like agatha christie novels
<youpi>you need to collect info
<youpi>making hypotheses is premature until you actually have an idea where you're aiming at
<damo22>i have 27 tasks
<damo22>its trying to boot into a shell
<damo22>running the init scripts
<damo22>is there any way to make all the init tasks bound on cpu0 but let some tasks like gcc run on APs?
<damo22>or does everything inherit from init task
<youpi>everything inherits from the init task
<youpi>but you can probably change the binding later when you want
<damo22>that might be easier
<damo22>make everything run on cpu0 and select some things to run on APs only
<damo22>ok when i do that, it runs, reallllly slow
<damo22>i think we need psets
<damo22>i changed it so APs are in a separate pset, but i think the scheduler is putting threads into the alternate pset as well but they are never run
<damo22>youpi: i think ive solved it for now
<damo22>i put APs into a separate processor set and they are disabled by default
<damo22>so smp boots with all APs but only executes on BSP
<damo22>i think we can enable them using processor_set RPCs
<damo22>root@zamhurd:~# nproc
<damo22>i compiled gnumach inside this
<damo22>ive mailed in a few small patches that enable this
<damo22>we might be able to spawn a shell that runs only on APs
<damo22>azert's offer may be desirable
<damo22>solid_black: # nproc
<damo22>but theyre disabled
<damo22>idling in a processor set that is unused
<janneke>could it be that the latest hurd release (v0.9.git20231217) needs an unreleased gnumach?
<janneke>it seems that i need gnumach commit
<janneke>x86_64: Support 8 byte inlined port rights to avoid message resizing.
<janneke>i'm getting
<janneke>start-translator-long.c:42:3: error: unknown type name ‘mach_port_name_inlined_t’
<janneke> 42 | mach_port_name_inlined_t control_port;
<youpi>which "release" of gnumach do you have?
<youpi>possibly I just forgot to push latest tags
<youpi>it seems there was no 2023 tag indeed
<janneke>i'm using v1.8+git20230410, which is my latest afaics
<youpi>that's too old indeed
<youpi>see the mig changes that Flavio introduced lately
<youpi>which indeed make incompatible API changes
<youpi>anyway, I have pushed the latest 2023 tags
<janneke>thanks, and good to know latest hurd needs something closer to gnumach master