IRC channel logs

<Pellescours>And what is weird is that with ncpus=4 it builds

<youpi>Pellescours: rather than pmap_enter, disassemble a much smaller function, such as pmap_reference

<Pellescours>gnumach pmap_reference good: https://pastebin.com/q5YftLgP

<Pellescours>gnumach pmap_reference bad: https://pastebin.com/iLk1ceM7

<youpi>I'd need the content of the spl1 function

<youpi>I guess it overwrites a callee-saved register

<youpi>thus completely perturbing pmap_reference, or any spl* caller

<Pellescours>in good and/or bad?

<youpi>in both

<youpi>good just happens to be lucky not to store anything precious in e.g. ebx

<Pellescours>I have difficulty getting those values: it’s too early to have printf, I’m not able to activate the debugger at this point, and gdb doesn’t stop at my breakpoint :/

<youpi>what values?

<Pellescours>the spl

<youpi>I don't understand what you are aiming at

<youpi>I'm saying that the spl.S code is bogus, it needs to be fixed into using only caller-saved registers

<Pellescours>wow sorry

<Pellescours>they are identical: https://pastebin.com/BLQQiAuC

<youpi>and what I meant above is that they are bogus

<youpi>they modify ebx, which is a callee-saved register

<Pellescours>the shr instruction, right?

<youpi>rdi, rsi, rdx, rcx, r8, r9

<youpi>(actually e* variants, and not r8 / r9)

<youpi>the movl, already

<youpi>it writes to ebx which is callee-saved

<Pellescours>so damo22, your problem is in your changes to the spl functions: don’t use ebx, or you’ll need to restore it at the end

<youpi>better just use edi, esi, edx, ecx, eax
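For reference, the i386 System V ABI that GCC uses for 32-bit code treats only eax, ecx and edx as caller-saved (scratch) registers; ebx, esi, edi and ebp are callee-saved, so a hand-written routine such as those in spl.S must push and pop any of them it writes. (The rdi, rsi, rdx, rcx, r8, r9 list above is the x86-64 argument-register order.) The C sketch below encodes that table; `reg_class_of` and `regs_to_preserve` are hypothetical helpers for illustration, not gnumach code.

```c
#include <stddef.h>
#include <string.h>

/* Preservation classes for i386 (32-bit) System V general-purpose registers. */
enum reg_class { CALLER_SAVED, CALLEE_SAVED, UNKNOWN_REG };

/* Classifies a 32-bit register name such as "ebx". */
static enum reg_class reg_class_of(const char *reg)
{
    /* Scratch registers: a callee may clobber these freely. */
    static const char *const scratch[] = { "eax", "ecx", "edx" };
    /* Preserved registers: a callee that writes them must restore them. */
    static const char *const preserved[] = { "ebx", "esi", "edi", "ebp", "esp" };
    size_t i;

    for (i = 0; i < sizeof scratch / sizeof scratch[0]; i++)
        if (strcmp(reg, scratch[i]) == 0)
            return CALLER_SAVED;
    for (i = 0; i < sizeof preserved / sizeof preserved[0]; i++)
        if (strcmp(reg, preserved[i]) == 0)
            return CALLEE_SAVED;
    return UNKNOWN_REG;
}

/* Given the registers a hand-written routine writes, counts how many it
 * must push on entry and pop before returning. */
static size_t regs_to_preserve(const char *const written[], size_t n)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++)
        if (reg_class_of(written[i]) == CALLEE_SAVED)
            count++;
    return count;
}
```

A routine that writes eax, ebx and edx needs exactly one push/pop pair, for ebx; that is the register the bogus spl code clobbered.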

<Pellescours>in my local code I just replace ebx by ecx

<Pellescours>With this problem fixed, the code hangs at vm_page_seg_balance_page, and I think it’s when locking the queues or in double_lock (the first instructions of the function)

<Pellescours>actually now I have a panic. My local fix is wrong.

<Pellescours>time for me to sleep, good night

<damo22>you can't use ecx in spl.S, it says it will break the interrupt code

<damo22>...

<damo22>#11 0xc100e0bd in Panic (file=0xc1076cf6 "../i386/i386/trap.c", line=226,

<damo22> fun=0xc106e710 <__FUNCTION__.1> "kernel_trap", s=0xc1072cc4 "kernel thread accessed user space!\n")

<damo22> at ../kern/debug.c:161

<damo22>#12 0xc1036a28 in kernel_trap (regs=0xc10a0e64 <solid_intstack+3684>) at ../i386/i386/trap.c:226

<damo22>#13 0xc100a90d in trap_from_kernel () at ../i386/i386/locore.S:563

<damo22>#14 0xc10a0e64 in solid_intstack ()

<damo22>#15 0xc10a0e64 in solid_intstack ()

<damo22>#16 0xc10a0000 in ?? ()

<damo22>#17 0xc102a599 in pmap_steal_memory (size=4194304) at ../vm/vm_resident.c:278

<damo22>#18 0xc102a648 in vm_page_bootstrap (startp=0xc10a0f84 <solid_intstack+3972>,

<damo22> endp=0xc10a0f88 <solid_intstack+3976>) at ../vm/vm_resident.c:207

<damo22>#19 0xc101c243 in vm_mem_bootstrap () at ../vm/vm_init.c:65

<damo22>#20 0xc1016ec1 in setup_main () at ../kern/startup.c:114

<damo22>#21 0xc1004a02 in c_boot_entry (bi=38144) at ../i386/i386at/model_dep.c:600

<damo22>#22 0xc1000093 in iplt_done () at ../i386/i386at/boothdr.S:103

<DiffieHellman>Based posting debug output straight to the channel.

<damo22>i missed one of the %ebx uses, hang on

<damo22>same backtrace

<DiffieHellman>I reckon AMD64 would solve the register starvation problems.

<damo22>we're not out of regs, just using the wrong ones

<damo22> https://paste.debian.net/plain/1270101

<damo22>ok my cpu_number.h jmps were causing a bug, now i am getting legit errors

<youpi>I was thinking btw: possibly some code is erroneously using a simple lock without raising the spl level

<youpi>that cannot work

<youpi>if an interrupt also wants to take the same simple lock
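The failure youpi describes can be modelled on a single CPU: if thread code takes a simple lock at spl0 and an interrupt whose handler wants the same lock arrives, the handler spins forever, because the holder cannot run again until the handler returns. Raising the spl first masks the interrupt until splx(). This is a toy model with hypothetical names (`cpu_model`, `deliver_interrupt`, `lock_then_interrupt`) and illustrative level numbers, not gnumach's spl implementation.

```c
#include <stdbool.h>

/* Illustrative interrupt priority levels. */
#define SPL0    0
#define SPLHIGH 7

enum outcome { OK_DEFERRED, OK_NO_CONTENTION, DEADLOCK };

struct cpu_model {
    int  ipl;        /* current interrupt priority level */
    bool lock_held;  /* the one simple lock in this model */
};

/* An interrupt at level irq_level arrives. If it is masked it is deferred
 * until the level is lowered; otherwise its handler tries the same lock. */
static enum outcome deliver_interrupt(struct cpu_model *cpu, int irq_level)
{
    if (irq_level <= cpu->ipl)
        return OK_DEFERRED;      /* masked: runs after splx() */
    if (cpu->lock_held)
        return DEADLOCK;         /* handler spins on a lock whose holder
                                    it has just preempted on this CPU */
    return OK_NO_CONTENTION;
}

/* Thread code takes the lock, optionally raising the spl first, and an
 * interrupt arrives while the lock is held. */
static enum outcome lock_then_interrupt(bool raise_spl, int irq_level)
{
    struct cpu_model cpu = { SPL0, false };

    if (raise_spl)
        cpu.ipl = SPLHIGH;
    cpu.lock_held = true;
    return deliver_interrupt(&cpu, irq_level);
}
```

`lock_then_interrupt(false, 5)` yields `DEADLOCK`, while `lock_then_interrupt(true, 5)` yields `OK_DEFERRED`.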

<damo22>#21 0xc102a599 in vm_page_bootstrap (startp=0x400000, endp=0x9) at ../vm/vm_resident.c:224

<damo22>is that a correct endp?

<damo22>can we wrap "s= splhigh(); simple_lock " and "simple_unlock(); splx(s)" into a macro that takes the flag?

<damo22>two macros*
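One possible shape for the two macros damo22 asks about, as a user-space sketch: the caller passes its own `spl_t` variable, the lock macro raises the level before taking the lock, and the unlock macro releases the lock before restoring the saved level. `SPL_SIMPLE_LOCK`, `SPL_SIMPLE_UNLOCK` and `struct toy_lock` are hypothetical names, and `splhigh`, `splx`, `simple_lock` and `simple_unlock` here are simplified stand-ins for the kernel primitives, not their real implementations.

```c
#include <assert.h>

typedef int spl_t;

/* User-space stand-ins for the kernel primitives (illustrative only). */
static spl_t current_ipl = 0;

static spl_t splhigh(void)
{
    spl_t old = current_ipl;
    current_ipl = 7;             /* illustrative "high" level */
    return old;
}

static void splx(spl_t s)
{
    current_ipl = s;
}

struct toy_lock { int locked; };

static void simple_lock(struct toy_lock *l)
{
    assert(!l->locked);          /* on one CPU a held lock would never be released */
    l->locked = 1;
}

static void simple_unlock(struct toy_lock *l)
{
    assert(l->locked);
    l->locked = 0;
}

/* Hypothetical macro pair: raise the spl before taking the lock, drop the
 * lock before restoring the saved level. `s` must be an spl_t lvalue. */
#define SPL_SIMPLE_LOCK(l, s)   do { (s) = splhigh(); simple_lock(l); } while (0)
#define SPL_SIMPLE_UNLOCK(l, s) do { simple_unlock(l); splx(s); } while (0)
```

Because each caller keeps its own saved level, nested use restores the correct level at each step; the `do { } while (0)` wrapper lets each macro be used as a single statement after an `if`.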

<damo22> * No multiprocessor locking is necessary.

<damo22>#define MACH_SLOCKS ((NCPUS > 1) || MACH_LDEBUG)

<damo22>why are spinlocks only enabled when NCPUS > 1 but it says in the code that they're not needed for multiprocessor?

<damo22>lets see what happens if i turn them off :D

<damo22>youpi1: why are spinlocks only enabled when NCPUS > 1 but it says in the code that they're not needed for multiprocessor?

<youpi1>where do you see that ?

<damo22>in kern/lock.h :

<damo22>(19:36:29) damo22: * No multiprocessor locking is necessary.

<damo22>(19:41:48) damo22: #define MACH_SLOCKS ((NCPUS > 1) || MACH_LDEBUG)

<youpi1>no, it's not a correct endp, but possibly the value was just dropped in the meanwhile, and gdb cannot get it, that's not a problem

<youpi1>the "No multiprocessor" part is in the #else

<damo22>If NCPUS==1, we are not exercising the same lock functions?

<youpi1>not if we don't define MACH_LDEBUG

<youpi1>the !MACH_SLOCKS part only checks that it builds fine

<damo22>configfrag.ac:AC_DEFINE([MACH_LDEBUG], [0], [MACH_LDEBUG])

<damo22>so a different set of lock definitions apply for NCPUS==1 and NCPUS > 1

<damo22>should i try building single core gnumach with MACH_LDEBUG=1 ?

<youpi1>you can try that, that'll give you slocks without multiple spl etc. so probably simpler to debug indeed
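The conditional quoted from kern/lock.h can be sketched like this: with `NCPUS == 1` and `MACH_LDEBUG == 0`, `MACH_SLOCKS` is 0 and the lock calls compile away to expressions that only type-check their argument, which is why the uniprocessor build does not exercise the real lock code. The `toy_*` names and the spinning implementation (a GCC/Clang `__atomic` builtin) are assumptions for illustration; the real kern/lock.h differs in detail.

```c
/* Defaults mirroring the values discussed (override with -DNCPUS=... or
 * -DMACH_LDEBUG=1 to get real spinning locks on one CPU). */
#ifndef NCPUS
#define NCPUS 1
#endif
#ifndef MACH_LDEBUG
#define MACH_LDEBUG 0
#endif

#define MACH_SLOCKS ((NCPUS > 1) || MACH_LDEBUG)

struct toy_slock { int lock_data; };

#if MACH_SLOCKS
/* Real spinlock: atomically swap in 1 until the old value was 0. */
static void toy_simple_lock(struct toy_slock *l)
{
    while (__atomic_exchange_n(&l->lock_data, 1, __ATOMIC_ACQUIRE))
        ;  /* spin */
}

static void toy_simple_unlock(struct toy_slock *l)
{
    __atomic_store_n(&l->lock_data, 0, __ATOMIC_RELEASE);
}
#else
/* Uniprocessor, no lock debugging: no multiprocessor locking is necessary.
 * The argument is still evaluated, so callers are only checked to build. */
#define toy_simple_lock(l)   ((void)(l))
#define toy_simple_unlock(l) ((void)(l))
#endif

static int slocks_enabled(void)
{
    return MACH_SLOCKS;
}
```

Building the same file with `-DMACH_LDEBUG=1` selects the spinning branch even with one CPU, which is the configuration youpi1 suggests for debugging.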

<gnu_srs>Hi, I just saw that plenty of golang packages FTBFS recently: https://buildd.debian.org/status/recent.php?pkg=&a=hurd-i386&suite=sid&limit=30

<gnu_srs>Is there a common denominator somewhere? We all know that the go port to Hurd is not yet complete, mainly due to remaining Hurd bugs?

<youpi1>I haven't had a look so far

<youpi1>getting a backtrace would be useful

<gnu_srs1>Part of a backtrace: https://paste.debian.net/1270154/ Pasting the whole output of thread apply all bt full complains of "Do not send spam" :(

<youpi1>gnu_srs1: at least a few more lines would let us know what it's about. __GI__hurd_intr_rpc_mach_msg can be *any* RPC

<youpi1>possibly at least the complete backtrace of the thread itself

<gnu_srs1>Here is bt full from the crashing thread: https://paste.debian.net/1270158/

<gnu_srs1>But /hurd/crash triggers, so I have to reboot to get that bt again.

<gnu_srs1>Console output: (if copied correctly): /hurd/crash: go install -trimpath -v -p 1 github.com/xo/terminfo github.com/xo/terminfo/cmd/infocmp(13828) crashed, signal {no:11, code:2, error:2}, exception {1, code:2, subcode:4}, PCs: {0x25a1279, 0x257a7ac, 0x25a1279, 0x28e324f}, writing core file.

<youpi1>gnu_srs1: was that also thread 4 that received SIGSEGV in the second paste?

<youpi1>the second paste head is at __GI___mach_msg_trap, not __GI__hurd_intr_rpc_mach_msg

<gnucode>hey hurd people! I am officially running the Hurd on real hardware. On a T43. It's pretty rad!

<gnucode>doom emacs seems to work fairly well.

<gnu_srs1>youpi1: The first short paste was after the crash: Backtrace stopped: previous frame inner to this frame (corrupt stack?)

<gnu_srs1>And no more crash on the console. Rebooting

<gnu_srs1>Another package: https://paste.debian.net/1270170/ No visible crash on console, but core file dumped.

<gnucode>what's the best method to start the openssh server at boot in debian GNU/Hurd?

<youpi1>apt install openssh-server

<youpi1>like all debian systems

<gnucode>youpi1: yup. I guess i figured that out. :)

<youpi1>gnu_srs1: ok, I got a look, it's the unwinding over __GI__hurd_intr_rpc_mach_msg which makes libunwind crash

<youpi1>because of -fno-omit-frame-pointer introduced in glibc 2.36-9~0

<youpi1>I'll fix the asm code
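With frame pointers, a backtrace can be recovered by following the chain of saved %ebp values, each stored next to a return address; hand-written asm that does not lay out or describe its frame the way compiled code does can send an unwinder down a bogus link. The log does not detail how libunwind fails on __GI__hurd_intr_rpc_mach_msg, so this is only a simulated sketch of the general frame-pointer walk plus the sanity check behind gdb's "previous frame inner to this frame (corrupt stack?)" message: on a downward-growing stack, each caller's frame must sit at a higher address. `struct frame_rec`, `find_frame` and `walk_frames` are hypothetical, and addresses are plain integers rather than real memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Simulated i386-style frame record: the caller's saved %ebp followed by
 * the return address. */
struct frame_rec {
    uintptr_t addr;      /* where this record lives (value of %ebp) */
    uintptr_t saved_fp;  /* caller's %ebp, 0 at the outermost frame */
    uintptr_t ret_addr;  /* return address into the caller */
};

/* Looks up the record stored at addr, or returns NULL. */
static const struct frame_rec *find_frame(const struct frame_rec *recs, size_t n,
                                          uintptr_t addr)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (recs[i].addr == addr)
            return &recs[i];
    return NULL;
}

/* Walks the chain from fp, storing return addresses in out[]. Stops at
 * saved_fp == 0, or flags *corrupt when a link is unreadable or does not
 * point strictly outward (to a higher address). Returns the frame count. */
static size_t walk_frames(const struct frame_rec *recs, size_t n, uintptr_t fp,
                          uintptr_t *out, size_t max, int *corrupt)
{
    size_t depth = 0;

    *corrupt = 0;
    while (fp != 0 && depth < max) {
        const struct frame_rec *f = find_frame(recs, n, fp);

        if (f == NULL) {
            *corrupt = 1;
            break;
        }
        out[depth++] = f->ret_addr;
        if (f->saved_fp != 0 && f->saved_fp <= fp) {
            *corrupt = 1;  /* caller frame "inner" to this one */
            break;
        }
        fp = f->saved_fp;
    }
    return depth;
}
```

A chain 0x1000 → 0x1100 → 0x1200 → 0 yields three return addresses; a record whose saved pointer goes back down to 0x0F00 stops the walk after one frame with `*corrupt` set.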

<Pellescours>right now with smp it hangs at pmap_extract, at simple_lock

<Guest56> https://git.savannah.gnu.org/cgit/hurd/gnumach.git/tree/i386/i386/spl.S Seems largely unimplemented to me

<youpi1>"unimplemented" ?

<Pellescours>damo22: I think your last commit for smp (ipl per cpu) needs more polishing: it doesn’t build with ncpus=1 because cpu_number is in baddef (a simple fix), and then it hangs at "LAPIC timer configured"

<Guest56>youpi: curr_ipl was an array in Mach 3

<Guest56>deimplemented

<Pellescours>I don’t have more time for today, but great job again, we are close to having SMP :) ping me if you need me

<youpi1>Guest56: so what?

<Guest56>Pellescours: where is damo22's implementation?

<Guest56>youpi1: it’s not in that file

<Pellescours> http://git.zammit.org/gnumach-sv.git/log/?h=feat-smp2-hangs

<Guest56>Ah look damo22 is fixing it

<Pellescours>Guest56: that's normal, it’s required for SMP, but for a single CPU it’s perfectly implemented
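A per-CPU interrupt level, as in the Mach 3 curr_ipl array Guest56 mentions, replaces the single global level with one slot per processor indexed by the current CPU number. In this sketch `NCPUS`, `fake_cpu`, the `cpu_number()` stub and `spl_set` are hypothetical; a real kernel derives the CPU number from hardware (for example the local APIC ID) and must perform the read-modify-write with interrupts disabled so the thread cannot be interrupted or migrated in between.

```c
/* Illustrative processor count. */
#define NCPUS 4

typedef int spl_t;

/* One current interrupt priority level per processor. */
static spl_t curr_ipl[NCPUS];

/* Hypothetical stand-in for cpu_number(): settable for testing. */
static int fake_cpu;

static int cpu_number(void)
{
    return fake_cpu;
}

/* Sets the calling CPU's level and returns the previous one. In a kernel
 * this must run with interrupts disabled. */
static spl_t spl_set(spl_t new_ipl)
{
    int cpu = cpu_number();
    spl_t old = curr_ipl[cpu];

    curr_ipl[cpu] = new_ipl;
    return old;
}
```

Raising the level on CPU 1 leaves CPU 0's level untouched, which a single global `curr_ipl` cannot express; on a uniprocessor both designs behave the same.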

2023-02-09.log