IRC channel logs

<sobkas>So is there x86_64-gnu-gcc-15 for linux?

<sobkas>I was trying to use distcc

<nexussfan>IIRC you can cross-compile hurd programs on linux

<azert>sobkas: are you sure that defining those structures for the ioctl makes rendering faster?

<azert>I don’t think anything in the Hurd implements those synchronization primitives

<sobkas>I don't think it was caused by this, but it was quite funny, so I mentioned it

<azert>It’s funny how probably they claimed the same on Linux

<azert>like someone put lots of thought to make better something that was already working

<azert>quite decently

<sobkas>?

<azert>Slide 5: https://events.static.linuxfound.org/sites/events/files/slides/Explicit-Fencing_Talk.pdf

<azert>But it can freeze the whole desktop!

<azert>Except it never happened

<sobkas>nexussfan: I hope it involves sbuild?

<sobkas>I want to use qxl but there is a memory leak in it when used with opengl, I hope to have some time to look into it

<sobkas>I was thinking about wayland, doesn't it depend on libegl which depends on libgbm?

<nexussfan>Yes i think

<sobkas>So adding hurd console support to libgbm could make a short term "solution"?

<sobkas>Because wayland is essentially a buffer mover?

<sobkas>It won't be fast but it will be something...

<sobkas>good night

<damo22>0x43468c(...)

<damo22>0x50b80f(...)

<damo22>repeat

<damo22>(gdb) p *regs

<damo22>$4 = {r15 = 0x0, r14 = 0x0, r13 = 0x8, r12 = 0x30, r11 = 0x246, r10 = 0xffffffff, r9 = 0x0, r8 = 0x8, edi = 0x7ffffffe0040,

<damo22> esi = 0x3, ebp = 0x7ffffffe0020, cr2 = 0x7ffffffdfff8, ebx = 0x7, edx = 0x50, ecx = 0x30, eax = 0xc8800000000, trapno = 0xe,

<damo22> err = 0x6, eip = 0x4344a5, cs = 0x1f, efl = 0x10246, uesp = 0x7ffffffe0000, ss = 0x17}

<damo22>i think those addresses are in pci-arbiter

<youpi>damo22: you can use `show task` to see what task that is

<youpi>and then you can addrline the binary

<damo22>ok i recompiled pci-arbiter.static with debugging symbols

<damo22>mach_msg 0x004356a5 <+21>: push %rbx

<damo22>>>>>> user space <<<<<

<damo22>0x4356a5(...)

<damo22>0x50cdec(...)

<damo22>0x43588c(...)

<damo22>0x50ce3f(...)

<damo22>0x43588c(...)

<damo22> 1 pci-arbiter (ffffffffdc2eac18): (ffffffffdc2de908) R....F.

<damo22>the stack trace goes on for pages and pages

<damo22>its a recursion

<damo22>0x43588c(...)

<damo22>0x50ce3f(...)

<damo22>but i think its also because the stack is overflowed so it cant push %rbx

<damo22>mach_port_mod_refs calls mig_dealloc_reply_port which calls mach_port_mod_refs...

<damo22>and it recurses

<damo22> https://paste.debian.net/plainh/d00a9598

<youpi>damo22: it can happen if the fs-based tls access to get the reply_port is broken, making the reply port bogus and thus the mach_port_mod_refs rpc error out, and the stub calls __mig_dealloc_reply_port
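
A toy model of the loop youpi describes, in C. This is not glibc code: every name below is hypothetical, and a depth cap stands in for the stack running out. It only illustrates the shape of the problem, assuming the stub's error path releases the reply port through the same failing stub.

```c
/* Toy model only; none of these names exist in glibc or gnumach. */
#define TOY_MAX_DEPTH 1000

static int toy_reply_port_valid;   /* 0 models a bogus reply port (broken TLS) */
static int toy_depth;              /* number of stub calls made so far */

void toy_dealloc_reply_port(void);

/* Stand-in for the RPC stub: with a bogus reply port the call fails,
   and the error path releases the reply port. */
int toy_mod_refs(void)
{
    if (++toy_depth >= TOY_MAX_DEPTH)
        return -1;                 /* stand-in for overflowing the stack */
    if (!toy_reply_port_valid) {
        toy_dealloc_reply_port();
        return -1;
    }
    return 0;
}

/* Releasing the reply port goes through the same stub again. */
void toy_dealloc_reply_port(void)
{
    toy_mod_refs();
}

/* Run one top-level call and report how many stub calls it took. */
int toy_run(int reply_port_valid)
{
    toy_reply_port_valid = reply_port_valid;
    toy_depth = 0;
    toy_mod_refs();
    return toy_depth;
}
```

With a valid reply port the model makes a single call; with a bogus one it keeps calling itself until the cap, which matches a backtrace that alternates between the same two addresses for pages.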


<youpi>you have the C code of mach_port_mod_refs in a glibc build tree, in build-tree/mach/RPC*mach_port_mod_refs.c

<damo22>i think you must be right

<damo22>how is FS supposed to work with amd64 smp?

<damo22>if theres no swapfs, i guess we just dont use fs in kernel?

<youpi>just as usual, switch_ktss should load the fs base on task switch

<youpi>we just don't use fs in kernel

<damo22>ok

<damo22>but it should be doing that

<youpi>it already does

<damo22>yes

<damo22>unless kernel is touching fs somewhere

<damo22>other than loading in user fs during task switch

<damo22>kernel uses FS for copyin copyout

<youpi>but that's not new with smp

<damo22>right

<damo22>inst_fetch uses fs

<damo22>could taking a trap be breaking fs?

<aculnaig>settrans -fac /tmp/site/ ./httpfs -D www.cloudfare.com

<aculnaig>./httpfs: Url must have a /, e.g., www.gnu.org/

<aculnaig>ok, then I have to append the /

<aculnaig>settrans -fac /tmp/site/ ./httpfs -D www.cloudfare.com/

<aculnaig>In the HTML parser for parsing tmp

<aculnaig>trying to open www.cloudfare.com:80/http://www.cloudfare.com/

<aculnaig>Error Page not Accesible

<aculnaig>400 Bad Request

<aculnaig>./httpfs: Error in Parsing.: Bad file descriptor

<aculnaig>settrans: ./httpfs: Translator died

<aculnaig>is it just me or is the www.gnu.org site down?

<damo22>settrans -fac /tmp/site/ ./httpfs -D https://www.gnu.org/

<aculnaig>yeah i usually try with the gnu site but at the moment it is down

<damo22>its working here

<damo22>but you forgot the protocol://

<aculnaig>i think the code does have some sort of check that prepends the protocol if missing

<youpi>damo22: inst_fetch is only used for emulated int80 system call, which we don't use

<damo22>yeah im not sure what is going on

<damo22>during a syscall64, do we need to touch fs?

<youpi>we should be able to just leave it as it is

<damo22>gnumach-sv$ git grep PCB_ISS

<damo22>x86_64/locore.S: addq $ PCB_ISS,%r11 /* point to saved state */

<damo22>how does PCB_ISS resolve to anything?

<youpi>it's generated in i386asm.h from i386asm.sym

<youpi>as the offset of iss in the structure pcb
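
A minimal sketch in C of what "the offset of iss in the structure pcb" amounts to. The structures here are hypothetical stand-ins with made-up fields, not the real gnumach definitions, and the TOY_ names are invented for illustration. The point is only that the generated symbol is a plain integer constant that assembly can add to a pcb pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real gnumach structures differ. */
struct toy_saved_state {
    uint64_t r15, r14, r13, r12;
};

struct toy_pcb {
    uint64_t other_state[4];      /* placeholder for whatever precedes iss */
    struct toy_saved_state iss;   /* saved user register state */
};

/* What a generated header entry boils down to: a byte offset. */
#define TOY_PCB_ISS offsetof(struct toy_pcb, iss)

/* C equivalent of "addq $PCB_ISS, %r11": move from the pcb to its iss. */
struct toy_saved_state *toy_point_to_iss(struct toy_pcb *pcb)
{
    return (struct toy_saved_state *)((char *)pcb + TOY_PCB_ISS);
}
```

In this toy layout the offset is 32 bytes, because four 8-byte words precede iss.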

<damo22>ah yes

<damo22>can a syscall be negative?

<damo22>syscall number

<youpi>iirc they are all negative :)

<damo22>because we sign extend the eax register holding the syscall number, if its negative it will set all the upper bits of rax

<damo22>not "mask" it

<youpi>I'd say that's expected
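
The difference between sign-extending and masking a negative 32-bit value can be shown in a few lines of C. This is a sketch of the arithmetic only, not of the actual locore.S code; the function names are made up, and the example value -25 is arbitrary.

```c
#include <stdint.h>

/* Sign-extend a 32-bit register value to 64 bits: a negative value
   gets all of its upper 32 bits set. The uint32_t -> int32_t cast
   assumes the usual two's complement behaviour. */
uint64_t sign_extend_32_to_64(uint32_t eax)
{
    return (uint64_t)(int64_t)(int32_t)eax;
}

/* Zero-extend ("mask") for comparison: the upper 32 bits stay clear. */
uint64_t zero_extend_32_to_64(uint32_t eax)
{
    return (uint64_t)eax;
}
```

For the 32-bit pattern of -25 (0xffffffe7), sign extension gives 0xffffffffffffffe7 while zero extension gives 0x00000000ffffffe7. Non-negative values come out the same either way.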

<youpi>and again, it does work in non-smp

<damo22>aha

<damo22>CPU_NUMBER_NO_STACK uses %cs:

<damo22>so that needs to be kernel segment

<damo22>but in the syscall we are in user segs?

<youpi>cs is necessarily the kernel segment

<youpi>since that's what is used to fetch instructions

<damo22>how does the cs change to kernel when syscall64 enters?

<youpi>the processor does it

<youpi>it's set in the syscall trap

<damo22>thanks

<damo22>booted with -smp 1

<damo22>--enable-ncpus=8

<damo22>-smp 2:

<damo22>Cannot load user executable module (error code 6000): pci-arbiter

<damo22>Panic(...)+0x10f

<damo22>Kernel Breakpoint trap, eip 0xffffffff81017a26, code 0, cr2 ffffffffdc2e5e

<damo22>Bad frame pointer: 0xffffffff81072895

<damo22>hmm lets see if this works

<damo22>when the syscall_call takes place, it needs to be in kernel gs as well i think

<damo22>well thats interesting, i get invalid opcode

<damo22>when about to call the actual syscall i am not allowed to swapgs

<youpi>what do you call syscall_call?

<damo22>t_debug(...)+0xb

<damo22>switch_ktss(...)+0xa5

<damo22>stack_handoff(...)+0x165

<damo22>thread_handoff(...)+0x13f

<damo22>mach_msg_trap(...)+0x18ed

<damo22>syscall64(...)+0xf2

<youpi>within switch_ktss, gsbase is still the kernel gs yes

<youpi>it's kgsbase which has the user gs

<damo22>interesting inside a syscall it can switch contexts

<youpi>sure, if it's a blocking system call, you have to switch to something else

<damo22>aha

<damo22>thats why my code is broken

<youpi>or if you wake some thread that is more prioritized

<damo22>syscall64 function is tricky

<damo22>you cant use many registers

<damo22>only r11

<youpi>but there you only need to swapgs

<youpi>unconditionally

<youpi>since you know you are doing user-to-kernel and kernel-to-user

<youpi>note that if you switch to another thread, you *won't* be returning through that syscall64 code immediately

<youpi>it's only when you get back to your thread that syscall64 will take control back, and swapgs

<youpi>switch_ktss having put back the user gs base into kgsbase for swapgs to install it into gsbase

<damo22>kernel: Invalid opcode (6), code=0

<damo22>Stopped at t_debug+0xb: TODO

<damo22>t_debug(...)+0xb

<damo22>switch_ktss(...)+0xa5

<damo22>stack_handoff(...)+0x165

<damo22>thread_handoff(...)+0x13f

<damo22>mach_msg_trap(...)+0x18ed

<damo22>syscall64(...)+0xee

<youpi>where/how are you stopping?

<youpi>(what line is switch_ktss+0xa5? you can ask gdb gnumach)

<damo22>245 fpu_load_context(pcb);

<damo22>line before is db_load_context(pcb)

<youpi>what instruction is there? the fpu_load_context macro is empty

<damo22> 0xffffffff810718e0 <+160>: call 0xffffffff8106f060 <db_load_context>

<damo22> 0xffffffff810718e5 <+165>: mov -0x8(%rbp),%rbx <----

<damo22>leave; ret;

<youpi>is there something non-zero in pcb->ims.ids.dr ?

<youpi>it looks as if something overflows into them

<youpi>they are normally only used by gdb for hardware breakpoints

<youpi>if something bogus is in there that could trigger the debug trap which doesn't understand why it's called

<damo22>(gdb) p percpu_array[0]->active_thread->pcb.ims.ids

<damo22>$11 = {dr = {0x0, 0x0, 0x0, 0x0, 0x810bb000, 0xffffffff, 0x810bb000, 0xffffffff}}

<damo22>thats the gs base address twice

<youpi>so you overflowed somehow somewhere

<damo22>wow

<youpi>that makes me realize that we need to fix i386_debug_state for 64bit

<damo22>sbs = {fsbase = 0x200000000480, gsbase = 0x0}}

<youpi>Mmm, but 0xffffffff810bb000 would be the kernel gs base
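
How the pair 0x810bb000, 0xffffffff in the dr dump lines up with 0xffffffff810bb000 can be sketched in C. This assumes the dr slots are 32 bits wide here, which the log does not state outright, so that one 64-bit store covers two adjacent slots with the low half first. The function name is made up.

```c
#include <stdint.h>

/* Join two consecutive 32-bit words, low word first (little-endian
   layout), into the 64-bit value they would represent. */
uint64_t join_halves(uint32_t low, uint32_t high)
{
    return ((uint64_t)high << 32) | low;
}
```

Under that assumption, join_halves(0x810bb000, 0xffffffff) yields 0xffffffff810bb000, the value youpi identifies as the kernel gs base.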

<youpi>we don't want that in the user thread state

<youpi>it's the user gs base that should be in the thread state

<youpi>did you add accessing the thread state by hand in assembly?

<youpi>I don't see why you would need such a thing

<youpi>switch_ktss handles updating kgsbase with the user thread's gs base

<youpi>and in assembly you just need to swapgs

<youpi>on return to user

<damo22>syscall64 had all that already

<youpi>had what?

<damo22>accessing the thread state

<youpi>yes, it does so for the general-purpose registers, since these are clobbered by C

<youpi>but for gs you don't need that, switch_ktss does it

<damo22>yeah i just use swapgs

<damo22>to allow MY() to work

<damo22>let me check if i pushed my latest

<damo22>just pushed now

<damo22>* 5909d070 x86_64: Implement swapgs logic

<youpi>+#if NCPUS > 1

<youpi>+ wrmsr(MSR_REG_KGSBASE, pcb->ims.sbs.gsbase);

<youpi>I don't think we want to have a different logic in smp and non-smp

<damo22>ok

<youpi>that'd avoid swapgs, sure, but hairy code means buggy code :)

<damo22>so you want swapgs in UP as well?

<youpi>that'll make the code simpler yes

<damo22>ok

<youpi>and it shouldn't be very expensive

<youpi>SWAPGS_ENTRY_IF_NEEDED: I don't see why re-writing to KGSBASE, it should already be there, that's the whole point of that msr

<youpi>SWAPGS_EXIT_IF_NEEDED: you are rewriting the percpu_array into KGSBASE before swapgs, but before swapgs, it's the user gs base that is there, that we don't want to overwrite either

<youpi>in all_intrs, you removed SET_KERNEL_SEGMENTS and CPU_NUMBER, in the USER32 case we want them

<damo22>oh

<youpi>also SET_KERNEL_SEGMENTS in ast_from_interrupts

<damo22>ok i will preserve SET_KERNEL_SEGMENTS as before

<damo22>i changed it because i had an iteration of my code where SWAPGS_* was calling it

<youpi>I wonder if syscall64 is not already without interrupts enabled

<youpi>I have no idea if that's the case or not

<youpi>it'd be good to check, because that can cause trouble more generally, and the tiny window just before calling cli could already be a problem

<damo22>aha

<damo22>thanks i have a lot to work on now

<youpi>I don't see why swapgs in _syscall64_check_for_ast, aiui it already has kernel gs base there (set by syscall64)

<youpi> swapgs /* switch to user gs */ in syscall64: no we don't want that

<youpi>we're about to call the kernel code for the trap

<youpi>and in _syscall64_restore_state we are still with kernel gs base

<youpi>really, syscall64 only needs two swapgs, at very beginning and very end

<damo22>but if the syscall blocks it may switch context and never return to the syscall code to swap back?

<youpi>again, that's not a problem

<youpi>if it blocks we get into another thread, another context, which will restore its own state

<youpi>whenever we unblock, we'll get back to syscall and restore the user gs base

<youpi>which will have been put in kgsbase by switch_ktss

<youpi>(performed by the thread just before ours)
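
A toy model in C of the flow youpi describes: swapgs once at syscall entry, once at exit, and switch_ktss parking the next thread's user gs base in kgsbase in between. It is a simulation with plain variables, not gnumach code. The toy_ names are hypothetical, and it assumes the blocked thread's own user gs base is kept in its thread state.

```c
#include <stdint.h>

/* Toy model of the two per-CPU base values; not real MSR access. */
struct toy_cpu {
    uint64_t gsbase;   /* base currently used for %gs accesses */
    uint64_t kgsbase;  /* the other base, exchanged by swapgs */
};

/* swapgs exchanges the two values. */
void toy_swapgs(struct toy_cpu *cpu)
{
    uint64_t tmp = cpu->gsbase;
    cpu->gsbase = cpu->kgsbase;
    cpu->kgsbase = tmp;
}

/* Stand-in for the relevant part of switch_ktss: while running in the
   kernel, put the incoming thread's user gs base into kgsbase. */
void toy_switch_ktss(struct toy_cpu *cpu, uint64_t next_user_gsbase)
{
    cpu->kgsbase = next_user_gsbase;
}

/* Thread A enters a blocking syscall, the kernel switches to thread B,
   and B returns to user mode. Returns the gs base B runs with. */
uint64_t toy_block_and_resume(struct toy_cpu *cpu, uint64_t user_gs_b)
{
    toy_swapgs(cpu);                  /* A: syscall entry, kernel gs active */
    toy_switch_ktss(cpu, user_gs_b);  /* A blocks, B is switched in */
    toy_swapgs(cpu);                  /* B: syscall exit, user gs active */
    return cpu->gsbase;
}
```

In this model the kernel base ends up back in kgsbase after B's exit, so the number of swapgs calls stays balanced even though A never reached its own exit path.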

<damo22>gs is set to percpu value at entry to syscall64

<youpi1>that's swapgs yes

<youpi1>and it'll remain that way until we swapgs again within another thread, to load that user thread's gs base

<damo22>i mean before swapgs is executed, at the entrypoint into syscall64: gs is already set to kernel value

<damo22>when i run swapgs it sets to 0

<damo22>something is backwards

<youpi1>? I don't see how it can already be the kernel value

<youpi1>except if we got it wrong *before*

<youpi1>in the previous syscall

<damo22>no i set breakpoint on syscall64 this was the first syscall

<damo22>is the trap gate calling a different function that already swaps gs?

<youpi1>no, syscall64 is the first instruction executed after userland did syscall

<youpi1>but possibly it's the initial user context load that didn't put the user base gs but the kernel base gs

<damo22>maybe we dont load real kernel one at the bootstrap we can just load zeroes

<damo22>boothdr

<damo22>bedtime

<youpi1>damo22: thinking about it: maybe you'd want to test the gnumach testsuite, before running a full distribution

<youpi1>that'll give you way simpler userland programs to check

<youpi>and you can make them print debugging stuff with mach_print and such

<youpi>without having to care about tls and whatnot

<azeem>if I am trying to debug Postgres processes, and attach gdb to them, all I ever see are 2-3 threads in _start () from /lib/ld-x86-64.so.1 with no meaningful stack, am I doing something wrong?

<youpi>azeem: no, external attach should be just working, even on amd64

<sobkas>What is the best option to find out what happened if I have some random crashes?

<sobkas>code=139

<sobkas>and such

<youpi>you can run through gdb

<youpi>if you are using 32b, you can look at the core file

<youpi>on 64b you can symlink /servers/crash to crash-suspend so the process stays there, and you can attach to it with gdb

<sobkas>youpi: thanks

<azeem>youpi: hrm

<azeem>so the main postgres process as a reasonable stack trace, but all its children are just in _start () from /lib/ld-x86-64.so.1 (same on 32bit, with ld.so.1)

<azeem>s/as a/has a/

<azeem>I'm trying to debug the frequent test suite hang, but without gdb that's a bit hard


2026-01-14.log