IRC channel logs

<damo22>youpi: someone random mentioned this line was added to mach in a different project (xnu?) https://github.com/slp/osfmk-mklinux/blob/master/osfmk/src/mach_kernel/intel/pmap.c#L3328 i tried adding something like that and it calls the pmap_update_interrupt more often

<damo22>but other than repeatedly calling the interrupt routines it doesnt help it boot

<damo22>with my current branch, PMAP(x) gets printed a few times only, and then it hangs

<damo22>it seems like the TLB coherency code relies on the cpus being not idle at least some of the time

<damo22>is the timer interrupt supposed to mark the cpu active?

<youpi>note the test

<youpi>it's not testing that the cpu is idle

<youpi>it's testing that the cpu is *not* idle

<youpi>i.e. if it's idle there's no use interrupting it

<youpi>but if it's the kernel pmap that gets updated, one *has* to interrupt it

<youpi>(and we don't have the real_pmap optimization)

<youpi>so yes, the first additional test is needed

<damo22>(pmap == kernel_pmap)

<youpi>but there's no requirement that cpus be idle, any thread will get interrupted

<youpi>it's only interrupt code that cannot be interrupted

<damo22>so the current code is missing the first test

<youpi>yes
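
The test discussed above can be sketched in C as follows. This is a minimal model built only from the conversation, not the gnumach or osfmk source: `NCPUS`, `cpu_idle`, `ipi_sent`, `needs_interrupt` and `signal_cpus` are hypothetical names, and the `real_pmap` optimization is left out because youpi says gnumach does not have it.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPUS 4                       /* hypothetical CPU count for this sketch */

struct pmap { int dummy; };           /* stand-in; the real structure is much larger */

static struct pmap kernel_pmap_store; /* stand-in for the kernel's own pmap */
static struct pmap *const kernel_pmap = &kernel_pmap_store;

static bool cpu_idle[NCPUS];          /* true while a CPU sits in its idle loop */
static int  ipi_sent[NCPUS];          /* counts simulated TLB-update interrupts */

/* Decide whether `cpu` must be interrupted after `pmap` was changed.
 * A non-idle CPU is always interrupted. An idle CPU is normally skipped,
 * but a kernel pmap update has to reach it anyway. */
static bool needs_interrupt(const struct pmap *pmap, int cpu)
{
    assert(cpu >= 0 && cpu < NCPUS);
    return !cpu_idle[cpu] || pmap == kernel_pmap;
}

/* Signal every CPU other than `self` that needs it; return how many were hit. */
static int signal_cpus(const struct pmap *pmap, int self)
{
    int sent = 0;
    for (int cpu = 0; cpu < NCPUS; cpu++) {
        if (cpu == self)
            continue;
        if (needs_interrupt(pmap, cpu)) {
            ipi_sent[cpu]++;
            sent++;
        }
    }
    return sent;
}
```

With CPUs 1 and 2 marked idle, a user pmap update from CPU 0 only reaches CPU 3 in this model, while a kernel pmap update reaches all three other CPUs. That matches the jump in update interrupts damo22 reports after adding the test.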

<damo22>that changes things, i get constant stream of updates

<youpi>well, depends what the os is doing

<youpi>if it's making virtual allocations, that's expected

<damo22>what is current_pmap() is it the same as real_pmap[which_cpu] ?

<youpi>we don't have real_pmap

<youpi>current_pmap is the map of the current thread's task

<damo22>ok

<damo22>cpu usage of my vm goes up significantly when i add that test

<damo22>probably because lots of tlb interrupts

<damo22>but the os is idle waiting for tasks

<damo22>its strange, it never gets any

<youpi>put prints in the scheduler to see what it's seeing

<damo22>is it possible to toggle eflags in (qemu) per cpu?

<youpi>I don't know if the qemu console permits to tinker with that

<damo22>how can machine_idle be entered with interrupts off?

<youpi>? because the scheduler restores ipl state on switching to another thread

<damo22>isnt that going to block the cpu from ever waking?

<youpi>??

<youpi>it *restores* it

<youpi>i.e. it puts it back into interrupts-allowed state

<damo22>in qemu i have some cpus HLT=1 with EFLAGS & 0x200 == 0

<youpi>then there's a missing splx somewhere
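
What "a missing splx" means can be shown with a small simulation. This is only a sketch: `sim_ipl`, `sim_splhigh`, `sim_splx` and the level values are hypothetical stand-ins, not the kernel's spl implementation. The point is that every raise must be paired with a restore on every exit path.

```c
#include <assert.h>

typedef int spl_t;

#define SPL0 0   /* all interrupts allowed */
#define SPL7 7   /* hypothetical "all interrupts blocked" level */

static spl_t sim_ipl = SPL0;   /* simulated interrupt priority level */

/* Raise to the highest level and return the previous one. */
static spl_t sim_splhigh(void)
{
    spl_t old = sim_ipl;
    sim_ipl = SPL7;
    return old;
}

/* Restore a level previously returned by sim_splhigh(). */
static void sim_splx(spl_t s)
{
    assert(s >= SPL0 && s <= SPL7);
    sim_ipl = s;
}

/* Correct pattern: the saved level is restored before returning. */
static spl_t balanced_path(void)
{
    spl_t s = sim_splhigh();
    /* ... critical section ... */
    sim_splx(s);
    return sim_ipl;
}

/* Buggy pattern: the early return skips sim_splx(), so the level stays raised. */
static spl_t leaky_path(int fail)
{
    spl_t s = sim_splhigh();
    if (fail)
        return sim_ipl;        /* BUG: missing sim_splx(s) */
    sim_splx(s);
    return sim_ipl;
}
```

A CPU that reaches its idle loop after taking a path like `leaky_path(1)` would halt with interrupts still masked, which is the state damo22 sees in qemu.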

<damo22>yes

<damo22>i wish i could find it

<youpi>what actually boots?

<damo22>starting acpi:

<youpi>follow how well the boot progresses

<youpi>look in the code

<youpi>the normal printfs are way not fine-grain

<damo22>it loads all the grub modules

<damo22>or says its doing something

<damo22>acpi task died somehow

<youpi>how can you tell that it dies somehow?

<damo22>because sometimes it appears in the show all threads sometimes its missing but the rest are present

<damo22>after it hangs

<damo22>i can get into kdb

<damo22>db{0}> show all threads

<damo22> TASK THREADS

<damo22> 0 gnumach (f59a8ea0): 10 threads:

<damo22> 0 (f59a6e70) .W..N. 0xc10b2488

<damo22> 1 (f59a6d20) R.....

<damo22> 2 (f59a6bd0) R.....

<damo22> 3 (f59a6a80) .W.ON.(reaper_thread_continue) 0xc10b1454

<damo22> 4 (f59a6930) .W.ON.(swapin_thread_continue) 0xc10b15c4

<damo22> 5 (f59a67e0) .W.ON.(sched_thread_continue) 0

<damo22> 6 (f59a6690) .W..N. intr_thread

<damo22> 7 (f59a6540) .W.ON.(action_thread_continue) 0xc10ac284

<damo22> 8 (f59a63f0) .W.ON.(io_done_thread_continue) 0xc10b2c84

<damo22> 9 (f59a62a0) .W.ON.(net_thread_continue) 0xc10b449c

<damo22> 1 pci-arbiter (f59a8d00): (f59a6000) ..SO..(thread_bootstrap_return)

<damo22> 2 rumpdisk (f59a8c30): (f598fe78) ..SO..(thread_bootstrap_return)

<damo22> 3 ext2fs (f59a8b60): (f598fd28) ..SO..(thread_bootstrap_return)

<damo22> 4 exec (f59a8a90): (f598fbd8) ..SO..(thread_bootstrap_return)

<damo22>thats with -smp 2 and the two idle threads are "R"

<youpi>how does acpi show up? with only thread_bootstrap_return ?

<damo22>sometimes its a "walking_zombie"

<damo22>im pretty sure ive seen it with thread_bootstrap_return as well

<youpi>did the user_bootstrap function actually run ?

<youpi>put another way: as I wrote, follow the source code in bootstrap.c

<youpi>to see what actually runs, what doesn't

<damo22>ok

<youpi>relying on prints and kdb is really not precise

<damo22>ok, this time some of the modules call user_bootstrap -> thread_bootstrap_return, but the rest are stuck because the machine hangs

<damo22>task loaded: acpi --host-priv-port=1 --device-master-port=2 --next-task=3 XXXDZXXX

<damo22>task loaded: pci-arbiter --next-task=1 XXXDZXXX

<damo22>task loaded: rumpdisk --next-task=1 XXXDZXXX

<damo22>...

<damo22> 1 acpi (f59a8dd0): (f5998bd8) ..SO..(thread_bootstrap_return)

<damo22> 2 pci-arbiter (f59a8d00): (f5998a88) ..SO..(thread_bootstrap_return)

<damo22> 3 rumpdisk (f59a8c30): (f5998938) ..SO..(thread_bootstrap_return)

<damo22> 4 ext2fs (f59a8b60): (f59987e8) R..O..(user_bootstrap)

<damo22> 5 exec (f59a8a90): no threads

<damo22>EIP=c1004521 EFL=00000046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=1

<damo22>on 3 of the cores

<damo22>it seems there is a missing splx
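
The register dump above can be decoded with a few lines of C. The bit positions are the architectural x86 EFLAGS bits (PF is bit 2, ZF is bit 6, IF is bit 9); the helper names are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_PF 0x004u   /* parity flag, bit 2 */
#define EFLAGS_ZF 0x040u   /* zero flag, bit 6 */
#define EFLAGS_IF 0x200u   /* interrupt enable flag, bit 9 */

/* True when maskable interrupts are enabled.
 * The parentheses matter in C: `eflags & 0x200 == 0` would parse as
 * `eflags & (0x200 == 0)`, which is always 0. */
static bool interrupts_enabled(uint32_t eflags)
{
    return (eflags & EFLAGS_IF) != 0;
}

/* True when a CPU is halted with maskable interrupts off. Such a CPU
 * cannot be woken by a timer or TLB-update interrupt; only events like
 * NMI, SMI or a reset would resume it. */
static bool halted_with_interrupts_masked(uint32_t eflags, bool hlt)
{
    return hlt && !interrupts_enabled(eflags);
}
```

For `EFL=00000046` only PF, ZF and the always-set reserved bit 1 are on, so IF is clear. Together with `HLT=1` this is the stuck state described in the log.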

<youpi>err, it seems that curr_ipl is not per-cpu ?

<youpi>it definitely should

<damo22>really??

<youpi>i386/i386/ipl.h:extern spl_t curr_ipl;

<youpi>i386/i386/pic.c:spl_t curr_ipl;

<youpi>kern/lock_mon.c:extern spl_t curr_ipl[];

<youpi>kern/lock_mon.c: if (curr_ipl[my_cpu])

<damo22>oh geez

<damo22>that will fix it for sure

<youpi>perhaps not fix it all, but that was definitely to be fixed
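
Why a single global `curr_ipl` is a problem on SMP can be illustrated with a toy model. This is a sketch under stated assumptions, not the change that went into gnumach: `NCPUS`, `shared_ipl`, `percpu_ipl` and the helper functions are hypothetical, and the CPU number is passed explicitly instead of being read from the hardware.

```c
#include <assert.h>

#define NCPUS 2                    /* hypothetical CPU count, as with -smp 2 */

typedef int spl_t;

static spl_t shared_ipl;           /* one variable for all CPUs (the problem) */
static spl_t percpu_ipl[NCPUS];    /* one slot per CPU (the direction of the fix) */

/* Shared variant: the CPU number is ignored, so every CPU sees one level. */
static void set_shared(int cpu, spl_t level)
{
    (void)cpu;
    shared_ipl = level;
}

static spl_t get_shared(int cpu)
{
    (void)cpu;
    return shared_ipl;
}

/* Per-CPU variant: each CPU reads and writes only its own slot. */
static void set_percpu(int cpu, spl_t level)
{
    assert(cpu >= 0 && cpu < NCPUS);
    percpu_ipl[cpu] = level;
}

static spl_t get_percpu(int cpu)
{
    assert(cpu >= 0 && cpu < NCPUS);
    return percpu_ipl[cpu];
}
```

In the shared variant, CPU 0 raising its level to 7 makes CPU 1 read 7 as well, so a later save and restore on CPU 1 can put back a level that was never its own. In the per-CPU variant CPU 1 still reads 0.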

2023-02-07.log