IRC channel logs

2023-02-09.log


<Pellescours>And what is weird is that with ncpus=4 it builds
<youpi>Pellescours: rather than pmap_enter, rather disassemble a way smaller function, such as pmap_reference
<Pellescours>gnumach pmap_reference good: https://pastebin.com/q5YftLgP
<Pellescours>gnumach pmap_reference bad: https://pastebin.com/iLk1ceM7
<youpi>I'd need the content of the spl1 function
<youpi>I guess it overwrites a callee-saved register
<youpi>thus completely perturbating pmap_reference, or any spl* caller
<Pellescours>in good and/or bad?
<youpi>in both
<youpi>good just happens to be lucky not to store anything precious in e.g. ebx
<Pellescours>I have difficulty getting those values: it’s too early to have printf, I’m not able to activate the debugger at this point, and gdb doesn’t stop at my breakpoint :/
<youpi>what values?
<Pellescours>the spl
<youpi>I don't understand what you are aiming
<youpi>I'm saying that the spl.S code is bogus, it needs to be fixed into using only caller-saved registers
<Pellescours>wow sorry
<Pellescours>they are identical: https://pastebin.com/BLQQiAuC
<youpi>and what I meant above is that they are bogus
<youpi>they modify ebx, which is a callee-saved register
<Pellescours>the shr instruction, right?
<youpi>rdi, rsi, rdx, rcx, r8, r9
<youpi>(actually e* variants, and not r8 / r9)
<youpi>the movl, already
<youpi>it writes to ebx which is callee-saved
<Pellescours>so damo22 your problem, is your changes with spl functions, don’t use ebx, or you’ll need to restore it at the end
<youpi>better just use edi, esi, edx, ecx, eax
<Pellescours>in my local code I just replace ebx by ecx
<Pellescours>With this problem fixed, the code hangs at vm_page_seg_balance_page and I think it’s when locking queues or the double_lock (the first instructions of the method)
<Pellescours>actually now I have a panic. My local fix is wrong.
<Pellescours>time for me to sleep, good night
<damo22>you cant use ecx in spl.S it says it will break the interrupt code
<damo22>...
<damo22>#11 0xc100e0bd in Panic (file=0xc1076cf6 "../i386/i386/trap.c", line=226,
<damo22> fun=0xc106e710 <__FUNCTION__.1> "kernel_trap", s=0xc1072cc4 "kernel thread accessed user space!\n")
<damo22> at ../kern/debug.c:161
<damo22>#12 0xc1036a28 in kernel_trap (regs=0xc10a0e64 <solid_intstack+3684>) at ../i386/i386/trap.c:226
<damo22>#13 0xc100a90d in trap_from_kernel () at ../i386/i386/locore.S:563
<damo22>#14 0xc10a0e64 in solid_intstack ()
<damo22>#15 0xc10a0e64 in solid_intstack ()
<damo22>#16 0xc10a0000 in ?? ()
<damo22>#17 0xc102a599 in pmap_steal_memory (size=4194304) at ../vm/vm_resident.c:278
<damo22>#18 0xc102a648 in vm_page_bootstrap (startp=0xc10a0f84 <solid_intstack+3972>,
<damo22> endp=0xc10a0f88 <solid_intstack+3976>) at ../vm/vm_resident.c:207
<damo22>#19 0xc101c243 in vm_mem_bootstrap () at ../vm/vm_init.c:65
<damo22>#20 0xc1016ec1 in setup_main () at ../kern/startup.c:114
<damo22>#21 0xc1004a02 in c_boot_entry (bi=38144) at ../i386/i386at/model_dep.c:600
<damo22>#22 0xc1000093 in iplt_done () at ../i386/i386at/boothdr.S:103
<DiffieHellman>Based posting debug output straight to the channel.
<damo22>i missed one of the %ebx hang on
<damo22>same backtrace
<DiffieHellman>I reckon AMD64 would solve the register starvation problems.
<damo22>we're not out of regs, just using the wrong ones
<damo22>https://paste.debian.net/plain/1270101
<damo22>ok my cpu_number.h jmps were causing a bug, now i am getting legit errors
<youpi>I was thinking btw: possibly some code is erroneously using a simple lock without raising the spl level
<youpi>that cannot work
<youpi>if an interrupt also wants to take the same simple lock
<damo22>#21 0xc102a599 in vm_page_bootstrap (startp=0x400000, endp=0x9) at ../vm/vm_resident.c:224
<damo22>is that a correct endp?
<damo22>can we wrap "s= splhigh(); simple_lock " and "simple_unlock(); splx(s)" into a macro that takes the flag?
<damo22>two macros*
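A minimal sketch of the two macros damo22 asks about, pairing `splhigh()` with `simple_lock()` and `simple_unlock()` with `splx()`. The macro names `simple_lock_irq` and `simple_unlock_irq` are hypothetical, and `spl_t`, `splhigh`, `splx`, `simple_lock`, `simple_unlock` and `simple_lock_data_t` are redefined here as user-space stand-ins so the block compiles on its own; they are not gnumach's implementations.

```c
#include <assert.h>

/* Stand-ins for the kernel primitives (assumptions, NOT gnumach's code). */
typedef int spl_t;
static spl_t curr_ipl = 0;      /* current interrupt priority level */
#define DEMO_SPL_HIGH 7         /* hypothetical "everything masked" level */

static spl_t splhigh(void) { spl_t old = curr_ipl; curr_ipl = DEMO_SPL_HIGH; return old; }
static void splx(spl_t s) { curr_ipl = s; }

typedef struct { int held; } simple_lock_data_t;
static void simple_lock(simple_lock_data_t *l) { assert(!l->held); l->held = 1; }
static void simple_unlock(simple_lock_data_t *l) { assert(l->held); l->held = 0; }

/* Hypothetical wrappers: raise the spl level first, then take the lock,
 * and undo both in the reverse order. `s` is the saved-level "flag". */
#define simple_lock_irq(s, l)   do { (s) = splhigh(); simple_lock(l); } while (0)
#define simple_unlock_irq(s, l) do { simple_unlock(l); splx(s); } while (0)

/* Example critical section using the pair. */
static int demo_counter_increment(simple_lock_data_t *l, int *counter)
{
    spl_t s;
    int value;

    simple_lock_irq(s, l);
    value = ++*counter;         /* runs with the level raised and the lock held */
    simple_unlock_irq(s, l);
    return value;
}
```

Raising the level before taking the lock matters for the case youpi describes: if an interrupt handler wants the same simple lock, it must not be able to run on this CPU while the lock is held.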
<damo22> * No multiprocessor locking is necessary.
<damo22>#define MACH_SLOCKS ((NCPUS > 1) || MACH_LDEBUG)
<damo22>why are spinlocks only enabled when NCPUS > 1 but it says in the code that theyre not needed for multiprocessor?
<damo22>lets see what happens if i turn them off :D
<damo22>youpi1: why are spinlocks only enabled when NCPUS > 1 but it says in the code that theyre not needed for multiprocessor?
<youpi1>where do you see that ?
<damo22>in kern/lock.h :
<damo22>(19:36:29) damo22: * No multiprocessor locking is necessary.
<damo22>(19:41:48) damo22: #define MACH_SLOCKS ((NCPUS > 1) || MACH_LDEBUG)
<youpi1>no, it's not a correct endp, but possibly the value was just dropped in the meanwhile, and gdb cannot get it, that's not a problem
<youpi1>the "No multiprocessor" part is in the #else
<damo22>If NCPUS==1, we are not exercising the same lock functions?
<youpi1>not if we don't define MACH_LDEBUG
<youpi1>the !MACH_SLOCK part only checks that it builds fine
<damo22>configfrag.ac:AC_DEFINE([MACH_LDEBUG], [0], [MACH_LDEBUG])
<damo22>so a different set of lock definitions apply for NCPUS==1 and NCPUS > 1
<damo22>should i try building single core gnumach with MACH_LDEBUG=1 ?
<youpi1>you can try that, that'll give you slocks without multiple spl etc. so probably simpler to debug indeed
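A sketch of how a `MACH_SLOCKS`-style switch can select between two sets of lock definitions, which is the pattern discussed above. Only the `MACH_SLOCKS` line is quoted from the conversation; the `demo_slock_*` names, both bodies and the C11 `atomic_flag` spinlock are assumptions made for illustration and are not what kern/lock.h contains.

```c
#include <assert.h>
#include <string.h>
#include <stdatomic.h>

#ifndef NCPUS
#define NCPUS 1                 /* single-CPU build by default */
#endif
#ifndef MACH_LDEBUG
#define MACH_LDEBUG 0           /* lock debugging off by default */
#endif

#define MACH_SLOCKS ((NCPUS > 1) || MACH_LDEBUG)

#if MACH_SLOCKS
/* Real spinning lock: used for NCPUS > 1 or when MACH_LDEBUG is set. */
typedef struct { atomic_flag locked; } demo_slock_t;
#define DEMO_SLOCK_INITIALIZER { ATOMIC_FLAG_INIT }
static void demo_slock_lock(demo_slock_t *l)
{
    while (atomic_flag_test_and_set(&l->locked))
        ;                       /* spin until the holder clears the flag */
}
static void demo_slock_unlock(demo_slock_t *l) { atomic_flag_clear(&l->locked); }
static const char *demo_slock_flavor(void) { return "spinning"; }
#else
/* "No multiprocessor locking is necessary": the operations do nothing. */
typedef struct { int unused; } demo_slock_t;
#define DEMO_SLOCK_INITIALIZER { 0 }
static void demo_slock_lock(demo_slock_t *l) { (void)l; }
static void demo_slock_unlock(demo_slock_t *l) { (void)l; }
static const char *demo_slock_flavor(void) { return "no-op"; }
#endif
```

Building this with `-DMACH_LDEBUG=1` and `NCPUS` left at 1 selects the spinning variant on a single CPU, which is the configuration damo22 proposes to try.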
<gnu_srs>Hi, I just saw that plenty of golang packages FTBFS recently: https://buildd.debian.org/status/recent.php?pkg=&a=hurd-i386&suite=sid&limit=30
<gnu_srs>Is there a common denominator somewhere? We all know that the go port to Hurd is not yet complete, mainly due to remaining Hurd bugs?
<youpi1>I haven't had a look so far
<youpi1>getting a backtrace would be useful
<gnu_srs1>Part of a backtrace: https://paste.debian.net/1270154/ Pasting the whole output of thread apply all bt full complains of "Do not send spam" :(
<youpi1>gnu_srs1: at least a few lines more would allow to know what it's about. __GI__hurd_intr_rpc_mach_msg can be *any* RPC
<youpi1>possibly at least the complete backtrace of the thread itself
<gnu_srs1>Here is bt full from the crashing thread: https://paste.debian.net/1270158/
<gnu_srs1>But /hurd/crash triggers, so I have to reboot to get that bt again.
<gnu_srs1>Console output: (if copied correctly): /hurd/crash: go install -trimpath -v -p 1 github.com/xo/terminfo github.com/xo/terminfo/cmd/infocmp(13828) crashed, signal {no:11, code:2, error:2}, exception [1,code:2,subcode:4}, PCs: {0x25a1279, 0x257a7ac, 0x25a1279, 0x28e324f}, writing core file.
<youpi1>gnu_srs1: was that also thread 4 that received SIGSEGV in the second paste?
<youpi1>the second paste head is at __GI___mach_msg_trap, not __GI__hurd_intr_rpc_mach_msg
<gnucode>hey hurd people! I am officially running the Hurd in real hardware. On a T43. It's pretty rad!
<gnucode>doom emacs seems to work fairly well.
<gnu_srs1>youpi1: The first short paste was after the crash: Backtrace stopped: previous frame inner to this frame (corrupt stack?)
<gnu_srs1>And no more crash on the console. Rebooting
<gnu_srs1>Another package: https://paste.debian.net/1270170/ No visible crash on console, but core file dumped.
<gnucode>what's the best method to start the openssh server at boot in debian GNU/Hurd?
<youpi1>apt install openssh-server
<youpi1>like all debian systems
<gnucode>youpi1: yup. I guess i figured that out. :)
<youpi1>gnu_srs1: ok, I got a look, it's the unwinding over __GI__hurd_intr_rpc_mach_msg which makes libunwind crash
<youpi1>because of -fno-omit-frame-pointer introduced in glibc 2.36-9~0
<youpi1>I'll fix the asm code
<Pellescours>right now with smp it hang at pmap_extract, at simple_lock
<Guest56>https://git.savannah.gnu.org/cgit/hurd/gnumach.git/tree/i386/i386/spl.S Seems largely unimplemented to me
<youpi1>"unimplemented" ?
<Pellescours>damo22: I think your last commit for smp (ipl per cpu) needs more polishing: it doesn’t build with ncpus=1 because cpu_number is in baddef (a simple fix). And then it hangs at LAPIC timer configured
<Guest56>youpi: curr_ipl was an array in Mach 3
<Guest56>deimplemented
<Pellescours>I don’t have more time for today but great job again, we are close to have SMP :) ping me if you need me
<youpi1>Guest56: so what?
<Guest56>Pellescours: where is damo22 implementation?
<Guest56>youpi1: it’s not in that file
<Pellescours>http://git.zammit.org/gnumach-sv.git/log/?h=feat-smp2-hangs
<Guest56>Ah look damo22 is fixing it
<Pellescours>Guest56: normal, it’s required for SMP, but for mono-cpu, it’s perfectly implemented