IRC channel logs

2023-02-07.log


<damo22>youpi: someone random mentioned this line was added to mach in a different project (xnu?) https://github.com/slp/osfmk-mklinux/blob/master/osfmk/src/mach_kernel/intel/pmap.c#L3328 i tried adding something like that and it calls the pmap_update_interrupt more often
<damo22>but other than repeatedly calling the interrupt routines it doesn't help it boot
<damo22>with my current branch, PMAP(x) gets printed a few times only, and then it hangs
<damo22>it seems like the TLB coherency code relies on the cpus not being idle at least some of the time
<damo22>is the timer interrupt supposed to mark the cpu active?
<youpi>note the test
<youpi>it's not testing that the cpu is idle
<youpi>it's testing that the cpu is *not* idle
<youpi>i.e. if it's idle there's no use interrupting it
<youpi>but if it's the kernel pmap that gets updated, one *has* to interrupt it
<youpi>(and we don't have the real_pmap optimization)
<youpi>so yes, the first additional test is needed
<damo22>(pmap == kernel_pmap)
<youpi>but there's no requirement that cpus be idle, any thread will get interrupted
<youpi>it's only interrupt code that cannot be interrupted
<damo22>so the current code is missing the first test
<youpi>yes
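The per-CPU decision youpi describes ("interrupt it if it is not idle, but always interrupt it for the kernel pmap") can be sketched in C. This is a minimal model under stated assumptions, not gnumach's or osfmk's pmap.c: `needs_tlb_interrupt`, `shootdown_mask`, `cpu_is_idle` and `cpu_pmap` are hypothetical names. The model assumes an idle CPU can skip a user-pmap flush because it will catch up before it runs a thread again, and that a CPU only needs a user-pmap flush if that pmap is the one it has loaded. Kernel mappings are live on every CPU, so a `kernel_pmap` update always interrupts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model; these are not gnumach's definitions. */
#define NCPUS 4

struct pmap { int id; };

static struct pmap kernel_pmap_store = { 0 };
static struct pmap *const kernel_pmap = &kernel_pmap_store;

/* Per-CPU state: is the CPU idle, and which pmap does it have loaded? */
static bool cpu_is_idle[NCPUS];
static struct pmap *cpu_pmap[NCPUS];

/*
 * Should `cpu` receive a TLB-flush interrupt after `pmap` was modified?
 *  - kernel_pmap: always (the "first additional test" from the log).
 *  - user pmap: only if the CPU is not idle and has that pmap loaded
 *    (assumption: an idle CPU flushes before it runs a thread again).
 */
static bool needs_tlb_interrupt(int cpu, const struct pmap *pmap)
{
    assert(cpu >= 0 && cpu < NCPUS);
    if (pmap == kernel_pmap)
        return true;
    return !cpu_is_idle[cpu] && cpu_pmap[cpu] == pmap;
}

/* Bitmask of the other CPUs (not `self`) that must be signalled. */
static unsigned shootdown_mask(int self, const struct pmap *pmap)
{
    unsigned mask = 0;
    for (int cpu = 0; cpu < NCPUS; cpu++)
        if (cpu != self && needs_tlb_interrupt(cpu, pmap))
            mask |= 1u << cpu;
    return mask;
}
```

Dropping the `pmap == kernel_pmap` branch reproduces the bug discussed above: an idle CPU would never be told about kernel mapping changes. Keeping it also explains the "constant stream of updates": every kernel virtual allocation now interrupts every other CPU.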
<damo22>that changes things, i get a constant stream of updates
<youpi>well, depends what the os is doing
<youpi>if it's making virtual allocations, that's expected
<damo22>what is current_pmap()? is it the same as real_pmap[which_cpu]?
<youpi>we don't have real_pmap
<youpi>current_pmap is the map of the current thread's task
<damo22>ok
<damo22>cpu usage of my vm goes up significantly when i add that test
<damo22>probably because lots of tlb interrupts
<damo22>but the os is idle waiting for tasks
<damo22>it's strange, it never gets any
<youpi>put prints in the scheduler to see what it's seeing
<damo22>is it possible to toggle eflags in (qemu) per cpu?
<youpi>I don't know if the qemu console lets you tinker with that
<damo22>how can machine_idle be entered with interrupts off?
<youpi>? because the scheduler restores ipl state on switching to another thread
<damo22>isn't that going to block the cpu from ever waking?
<youpi>??
<youpi>it *restores* it
<youpi>i.e. it puts it back into interrupts-allowed state
<damo22>in qemu i have some cpus HLT=1 with EFLAGS & 0x200 == 0
<youpi>then there's a missing splx somewhere
<damo22>yes
<damo22>i wish i could find it
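Why a single missing `splx` leaves a CPU halted with IF clear can be shown with a toy model of the spl convention: every `splhigh()` returns the previous level, and the matching `splx()` must put it back. This is a sketch, not gnumach's spl code: `spl_set`, `cur_level`, `if_flag` and the two critical-section functions are hypothetical, and the model simplifies by treating IF as set whenever the level is below `SPLHIGH`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-CPU model of spl levels; not gnumach code. */
typedef int spl_t;
#define SPL0    0   /* all interrupts allowed */
#define SPLHIGH 7   /* all interrupts blocked */

static spl_t cur_level = SPL0;
static bool  if_flag = true;   /* models EFLAGS.IF (bit 9, mask 0x200) */

/* Set a new level and return the old one, like splhigh()/splx(). */
static spl_t spl_set(spl_t level)
{
    spl_t old = cur_level;
    cur_level = level;
    if_flag = (level < SPLHIGH);   /* simplification: only SPLHIGH clears IF */
    return old;
}

static spl_t splhigh(void) { return spl_set(SPLHIGH); }

static void splx(spl_t s)
{
    assert(s >= SPL0 && s <= SPLHIGH);
    (void)spl_set(s);
}

/* Correct pattern: every raise is paired with a restore. */
static void good_critical_section(void)
{
    spl_t s = splhigh();
    /* ... touch data shared with interrupt handlers ... */
    splx(s);
}

/* Buggy pattern: the early return skips the splx(). */
static int buggy_critical_section(bool fail)
{
    spl_t s = splhigh();
    if (fail)
        return -1;        /* BUG: returns with interrupts still blocked */
    splx(s);
    return 0;
}

/* A CPU that executes HLT only resumes on an interrupt it accepts. */
static bool idle_cpu_can_wake(void) { return if_flag; }
```

If the buggy path runs just before the scheduler decides to idle, the CPU enters HLT with IF clear, which matches the `HLT=1` and `EFLAGS & 0x200 == 0` state seen in qemu.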
<youpi>what actually boots?
<damo22>starting acpi:
<youpi>follow how well the boot progresses
<youpi>look in the code
<youpi>the normal printfs are nowhere near fine-grained enough
<damo22>it loads all the grub modules
<damo22>or says it's doing something
<damo22>acpi task died somehow
<youpi>how can you tell that it dies somehow?
<damo22>because sometimes it appears in "show all threads", sometimes it's missing, but the rest are present
<damo22>after it hangs
<damo22>i can get into kdb
<damo22>db{0}> show all threads
<damo22> TASK THREADS
<damo22> 0 gnumach (f59a8ea0): 10 threads:
<damo22> 0 (f59a6e70) .W..N. 0xc10b2488
<damo22> 1 (f59a6d20) R.....
<damo22> 2 (f59a6bd0) R.....
<damo22> 3 (f59a6a80) .W.ON.(reaper_thread_continue) 0xc10b1454
<damo22> 4 (f59a6930) .W.ON.(swapin_thread_continue) 0xc10b15c4
<damo22> 5 (f59a67e0) .W.ON.(sched_thread_continue) 0
<damo22> 6 (f59a6690) .W..N. intr_thread
<damo22> 7 (f59a6540) .W.ON.(action_thread_continue) 0xc10ac284
<damo22> 8 (f59a63f0) .W.ON.(io_done_thread_continue) 0xc10b2c84
<damo22> 9 (f59a62a0) .W.ON.(net_thread_continue) 0xc10b449c
<damo22> 1 pci-arbiter (f59a8d00): (f59a6000) ..SO..(thread_bootstrap_return)
<damo22> 2 rumpdisk (f59a8c30): (f598fe78) ..SO..(thread_bootstrap_return)
<damo22> 3 ext2fs (f59a8b60): (f598fd28) ..SO..(thread_bootstrap_return)
<damo22> 4 exec (f59a8a90): (f598fbd8) ..SO..(thread_bootstrap_return)
<damo22>that's with -smp 2, and the two idle threads are "R"
<youpi>how does acpi show up? with only thread_bootstrap_return ?
<damo22>sometimes it's a "walking_zombie"
<damo22>i'm pretty sure i've seen it with thread_bootstrap_return as well
<youpi>did the user_bootstrap function actually run ?
<youpi>put another way: as I wrote, follow the source code in bootstrap.c
<youpi>to see what actually runs, what doesn't
<damo22>ok
<youpi>relying on prints and kdb is really not precise
<damo22>ok, this time some of the modules call user_bootstrap -> thread_bootstrap_return, but the rest are stuck because the machine hangs
<damo22>task loaded: acpi --host-priv-port=1 --device-master-port=2 --next-task=3 XXXDZXXX
<damo22>task loaded: pci-arbiter --next-task=1 XXXDZXXX
<damo22>task loaded: rumpdisk --next-task=1 XXXDZXXX
<damo22>...
<damo22> 1 acpi (f59a8dd0): (f5998bd8) ..SO..(thread_bootstrap_return)
<damo22> 2 pci-arbiter (f59a8d00): (f5998a88) ..SO..(thread_bootstrap_return)
<damo22> 3 rumpdisk (f59a8c30): (f5998938) ..SO..(thread_bootstrap_return)
<damo22> 4 ext2fs (f59a8b60): (f59987e8) R..O..(user_bootstrap)
<damo22> 5 exec (f59a8a90): no threads
<damo22>EIP=c1004521 EFL=00000046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=1
<damo22>on 3 of the cures
<damo22>cores*
<damo22>it seems there is a missing splx
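The qemu dump line can be decoded bit by bit. `EFL=00000046` has bits 1 (reserved, always 1), 2 (PF) and 6 (ZF) set, which is why qemu prints `[---Z-P-]`. Bit 9 (IF, mask 0x200) is clear, so with `HLT=1` the core is parked with maskable interrupts disabled: no timer tick or IPI can wake it. Only events like NMI, SMI, INIT or reset can resume it. The helper names below are hypothetical. The seven-character format (D O S Z A P C) is inferred from the log line itself, not taken from qemu's source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* x86 EFLAGS bit positions (Intel SDM, "EFLAGS Register"). */
#define EFLAGS_CF (1u << 0)
#define EFLAGS_PF (1u << 2)
#define EFLAGS_AF (1u << 4)
#define EFLAGS_ZF (1u << 6)
#define EFLAGS_SF (1u << 7)
#define EFLAGS_IF (1u << 9)    /* 0x200: maskable interrupts enabled */
#define EFLAGS_DF (1u << 10)
#define EFLAGS_OF (1u << 11)

/* Render the flags as "[DOSZAPC]", with '-' for clear bits. */
static void flags_summary(uint32_t eflags, char out[10])
{
    out[0] = '[';
    out[1] = (eflags & EFLAGS_DF) ? 'D' : '-';
    out[2] = (eflags & EFLAGS_OF) ? 'O' : '-';
    out[3] = (eflags & EFLAGS_SF) ? 'S' : '-';
    out[4] = (eflags & EFLAGS_ZF) ? 'Z' : '-';
    out[5] = (eflags & EFLAGS_AF) ? 'A' : '-';
    out[6] = (eflags & EFLAGS_PF) ? 'P' : '-';
    out[7] = (eflags & EFLAGS_CF) ? 'C' : '-';
    out[8] = ']';
    out[9] = '\0';
}

/* Halted with IF clear: ordinary interrupts can never wake this CPU. */
static bool halted_for_good(uint32_t eflags, bool hlt)
{
    return hlt && (eflags & EFLAGS_IF) == 0;
}
```

For example, `flags_summary(0x46, buf)` yields the same `[---Z-P-]` shown in the dump, and `halted_for_good(0x46, true)` is true for the three stuck cores.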
<youpi>err, it seems that curr_ipl is not per-cpu ?
<youpi>it definitely should be
<damo22>really??
<youpi>i386/i386/ipl.h:extern spl_t curr_ipl;
<youpi>i386/i386/pic.c:spl_t curr_ipl;
<youpi>kern/lock_mon.c:extern spl_t curr_ipl[];
<youpi>kern/lock_mon.c: if (curr_ipl[my_cpu])
<damo22>oh geez
<damo22>that will fix it for sure
<youpi>perhaps not fix it all, but that was definitely to be fixed
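A shared `curr_ipl` breaks the spl save/restore pairing as soon as two CPUs interleave. When CPU 1 raises its level while CPU 0 is already at splhigh, CPU 1 "saves" CPU 0's level, and its later `splx` restores the blocked level after CPU 0 has already lowered it. The toy model below contrasts that with a per-CPU array indexed by CPU number, which is what `kern/lock_mon.c`'s `curr_ipl[my_cpu]` expects. This is a hypothetical sketch and not the actual gnumach fix. A real change would also have to update every user of `curr_ipl` (including any assembly) to index the current CPU's slot.

```c
#include <assert.h>

/* Hypothetical sketch; not gnumach code. */
#define NCPUS   4
#define SPL0    0
#define SPLHIGH 7
typedef int spl_t;

/* Buggy layout: one level shared by every CPU (like `spl_t curr_ipl;`). */
static spl_t curr_ipl_global = SPL0;

/* Fixed layout: one slot per CPU (like `curr_ipl[my_cpu]`). */
static spl_t curr_ipl_percpu[NCPUS];

static spl_t set_ipl_global(int cpu, spl_t level)
{
    (void)cpu;                         /* CPU number ignored: that is the bug */
    spl_t old = curr_ipl_global;
    curr_ipl_global = level;
    return old;
}

static spl_t set_ipl_percpu(int cpu, spl_t level)
{
    assert(cpu >= 0 && cpu < NCPUS);
    spl_t old = curr_ipl_percpu[cpu];
    curr_ipl_percpu[cpu] = level;
    return old;
}

/* CPU 0 and CPU 1 each run splhigh() ... splx(), interleaved. */
static spl_t run_interleaving_global(void)
{
    spl_t s0 = set_ipl_global(0, SPLHIGH);  /* CPU 0 saves SPL0 */
    spl_t s1 = set_ipl_global(1, SPLHIGH);  /* CPU 1 saves CPU 0's SPLHIGH */
    set_ipl_global(0, s0);                  /* CPU 0 restores SPL0 */
    set_ipl_global(1, s1);                  /* CPU 1 "restores" SPLHIGH */
    return curr_ipl_global;                 /* left blocked */
}

static spl_t run_interleaving_percpu(void)
{
    spl_t s0 = set_ipl_percpu(0, SPLHIGH);
    spl_t s1 = set_ipl_percpu(1, SPLHIGH);
    set_ipl_percpu(0, s0);
    set_ipl_percpu(1, s1);
    return curr_ipl_percpu[1];              /* back to SPL0 */
}
```

In the shared case both critical sections are balanced, yet the level ends at `SPLHIGH`. This is consistent with cores reaching the idle loop with interrupts off even though no single code path forgot its `splx`.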