IRC channel logs
2026-01-17.log
<damo22> youpi: what about when calling the actual interrupt handler, do we need to wrap the call in swapgs?
<sobkas> I'm thinking about switching hurd console to evdev
<sobkas> With long term plan to make libinput work
<sobkas> Splitting input from hurd console and using libinput there
<damo22> can a syscall64 interrupt an interrupt context?
<youpi> syscall64 is only called from user land
<youpi> so it can't be within an interrupt since userland cannot have interrupt handlers
<youpi> when an interrupt interrupts user land, we need swapgs yes
<damo22> start pci-arbiter: Kernel Page fault trap, eip 0xffffffff8103e243, code 0, cr2 4a0
<damo22> i think it got out of sync because there may have been an interrupt interrupting user land but it didnt swap?
<damo22> or an interrupt interrupting kernel and it swapped by mistake
<damo22> i am testing if gs base == percpu_array[cpu] to know whether to swapgs
<damo22> that doesnt really care if we are interrupting from user
<damo22> and then r12 is saved on the stack for every interrupt context containing the return mode
<damo22> youpi: this git branch reproducibly fails on syscall64 with the wrong value of gsbase on entry * dbf86634 (HEAD -> smp64-test, zammit/smp64-test) x86_64: Implement swapgs logic
<damo22> are you sure user interrupts cant call a syscall64?
<rrq> isn't syscall what userland does to call kernel code?
<rrq> and then isn't interrupt handlers already kernel code?
<damo22> when syscall64 enters, the value of gsbase is already kernel value
<rrq> isn't it part of the instruction itself to swap gs ?
<damo22> that would be cool but news to me
<rrq> I don't know x86 in that level of detail (too much arm in my head :)
<damo22> my searches of LSTAR msr indicates syscall instruction does not swapgs builtin
<damo22> if i only swap once at the end of syscall64, after the 3rd syscall entry it is out of sync
<damo22> (01:50:04 PM) rrq: isn't it part of the instruction itself to swap gs ?
No
<damo22> unless i am getting interrupted from a syscall
<rrq> (sorry, yes, don't mind me. I've much to learn about x86)
<damo22> no problem, everyone is learning
<damo22> ah my bug is in SWAPGS_ENTRY_IF_NEEDED
<damo22> er, SYSRET calls swapgs internally
<damo22> so i dont need to swap at the end
<damo22> what if there are nested syscalls??
<damo22> can a syscall call another syscall during itself?
<damo22> => 0xffffffff8103e1dd <syscall64+1>: swapgs
<damo22> #0 syscall64 () at ../x86_64/locore.S:1427
<damo22> i think im getting an interrupt from a syscall
<damo22> and then sysretq swaps gs regardless
<azert> sysret doesn’t call swapgs automatically
<azert> but it enables interrupts I think, so you can disable interrupts before doing swapgs before sysret
<damo22> and entering the syscall64 entrypoint also calls it
<azert> because you added a swapgs instruction there?
<damo22> i think its because FMASK msr set to EFL_IF masks the interrupts and also swaps gs
<azert> then I am afraid you are leaking the kernel gsbase to userspace
<damo22> i need to find the manual, but i tried everything
<azert> If you add swapgs unconditionally after syscall entry and before sysret it should also work
<damo22> smp64 boots but gets stuck due to high precision clock not being smp safe
<damo22> maybe the STAR msr is set wrong?
<damo22> hey, when we launch the first userspace thread, dont we need to swapgs to enter userspace first time?
<damo22> ah i think my swap logic was wrong
<youpi> damo22: I don't see where you saw that syscall/sysret implicitly perform swapgs. They are not even doing a stack change. fmask is only for the rflags
<damo22> i also cant find evidence of that
<youpi> damo22: do you not have the intel manual?
<youpi> I can't see how to do assembly without it :)
<damo22> "cmpq -> je" is that equivalent to if (expr) goto
<youpi> it's equivalent to if (a ==b ) goto
<youpi> (the result of the difference is zero)
<damo22> gs base == percpu_array[cpu] ?
then return to kernel without swap
<damo22> no, i think i need to not swap, but swap on exit
<youpi> no, you need to avoid swapping on exit too
<youpi> if gs base was that at entry, it means the code you are interrupting will swap before returning to userland
<youpi> so you have to leave things as they are
<damo22> i dont follow, where will it "swap before returning to userland" ? isnt that what SWAPGS_EXIT_IF_NEEDED_R12 is supposed to do?
<youpi> we are talking about an interrupt, right?
<youpi> if gsbase is the kernel gs base, whatever code that we have interrupted has done the swapgs, so when we return to it, it will do the swapgs before returning
<youpi> and in the case of nested interrupts, it's r12 which remembers if ourself has done the swapgs or not
<youpi> so the first interrupt does the swapgs and sets r12 to remember that
<youpi> nested interrupts see it's already swapped, so don't swap and remember that
<damo22> but the first interrupt sees gs base already with kernel value
<damo22> because it has to be set to make the kernel work
<youpi> if the interrupt happens during a syscall, that's expected
<youpi> and that's fine, since we'll swapgs back before returning to userland
<damo22> no we wont because the first r12 wont be set
<youpi> what r12?
syscall doesn't change r12, and that's fine
<damo22> we get an interrupt when calibrating lapic timer
<damo22> before the first user thread exists
<youpi> ok but we are in kernel land
<youpi> and remember in r12 that we haven't
<damo22> i have just pushed a commit that fails reproducibly with gs already set to kernel on entry to syscall64
<damo22> the commit before seems to mostly work
<youpi> I'd say for now make the return code check for both values explicitly, in case r12 is clobbered for reasons yet unknown to us
<youpi> and make the codes really random
<youpi> damo22: in case you hadn't seen, the first "return" to user is thread_bootstrap_return
<youpi> you'd probably need to set r12 in set_user_regs
<youpi> ah no it's not setting an interrupt state, only the normal user state
<youpi> ah, it fakes a return from trap, not return from interrupt
<youpi> but then it's the kernel part of the state in which you want to set r12
<youpi> so the fake return from trap knows it has to switch
<youpi> I would say it's in thread_bootstrap_return that you want to set r12 to "go back to user"
<youpi> thread_exception_return also needs it indeed
<damo22> so something must be clobbering r12?
<damo22> i did a breakpoint on thread_exception_return and thread_bootstrap_return
<youpi> that's not surprising since for new threads that bootup userland, nothing said what r12 should be
<damo22> GS =0000 ffffffff810bb000 000fffff 00000000
<damo22> i continued after thread_exception_return and it hit the next breakpoint at ud2
<damo22> but i dont know what happened in between
<damo22> somehow a syscall64 then thread_exception_return?
<youpi> ah no that's not the same as isr
<youpi> but somehow there's probably something like that
<youpi> did you make thread_exception_return set r12 as returning to user?
<youpi> so syscall64->thread_exception_return will be fine
<youpi> syscall64 has no way to set r12 otherwise, and that's expected
<youpi> if code is not returning into syscall64, it has to cope with it
<damo22> thread exception return (ter) x6 then syscall64, ter, syscall syscall ter ter syscall64, syscall64 then ud2
<youpi> there are also callers of _take_trap that need to set r12
<damo22> -smp 6 hangs at [ 1.0000050] ahcisata0: 64-bit DMA
<damo22> i think its because the hpc clock is not smp safe
<damo22> #0 0xffffffff8101787d in hpclock_read_counter () at ../i386/i386/apic.c:493
<damo22> #1 0xffffffff81056458 in time_value64_add_hpc (value=value@entry=0xfffffffff0a4a058, last_hpc=35443913) at ../kern/mach_clock.c:525
<damo22> #2 0xffffffff81057608 in host_get_uptime64 (host=<optimized out>, uptime=uptime@entry=0xfffffffff0a4a058) at ../kern/mach_clock.c:718
<damo22> #3 0xffffffff8107aa8d in _Xhost_get_uptime64 (InHeadP=<optimized out>, OutHeadP=0xfffffffff0a4a020) at kern/mach_host.server.c:2950
<damo22> #4 0xffffffff8105295c in ipc_kobject_server (request=request@entry=0xffffffff81013000) at ../kern/ipc_kobject.c:179
<damo22> #5 0xffffffff810852f3 in mach_msg_trap (msg=0x7ffffffff3c0, option=<optimized out>, send_size=32, rcv_size=72, rcv_name=5, time_out=<optimized out>, notify=0) at ../ipc/mach_msg.c:1310
<damo22> #6 0xffffffff8103e306 in syscall64 () at ../x86_64/locore.S:1521
<youpi> btw, instead of difficultly checking for being equal to the percpu array, you can as well just test if it's negative: testl %rdx,%rdx ; js its_kernel
<damo22> oh, nothing can set bit63 in userspace ?
<youpi> without the fsgsbase feature, userspace can't set gsbase
<youpi> testing for negative values is not really less secureless than testing for an exact value
<damo22> negative values are equivalent to bit63 being set, but if the userspace memory mapping doesnt allow writing to memory addresses that high, it might be some protection?
<damo22> even if they had fsgsbase feature
<damo22> you cant offset gs with 32 bit immediate lower than about (1<<63) - (1<<32)
<damo22> if kernel max address was lower than that, it would be inaccessible?
<youpi> it doesn't say how we prevent userland from setting a negative gsbase
<youpi> we apparently really can do it
<youpi> there is a comment about it in the fsgsbase support
<youpi> This enablement requires careful handling of the exception entries which go through the paranoid entry path as they can no longer rely on the assumption that user GSBASE is positive (as enforced via prctl() on non FSGSBASE enabled systemn)
<youpi> ok, the bit of linux code that tests for the value being negative is only used on systems without fsgsbase support
<youpi> « If we are at an interrupt or user-trap/gate-alike boundary then we can use the faster check »
<youpi> which is just to test for %cs to be the kernel cs
<damo22> ive force pushed a semi working smp64-upstream branch
<damo22> -smp 1 boots, -smp 6 doesnt quite
<youpi> I'd say simplify it into testing for gsbase being negative, and we can probably commit that for a start
<youpi> we don't enable fsgsbase support yet, so it's safe for now
<youpi> add the check for negative value in the i386_FSGS_BASE_STATE load code, though :)
<youpi> and you can leave a todo note about simplifying it into checking for the saved %cs in the cases where this is safe (maskable interrupts and traps)
<youpi> the non-maskable ones would be double-fault and machine check exception,
<youpi> double-fault doesn't have an exit path anyway, so we can as well just force-load gsbase
<youpi> and we don't seem to be enabling MCE
<youpi> but it's trap 0x12, we'd probably want to make that safe
<youpi> but again, we'll probably not have an exit path for that
<sobkas> So pathconf with _PC_PATH_MAX hangs, or crashhangs
<youpi> note that it can return -1, which is expected when there is no limitation
<sobkas> Right now it doesn't return anything because it (crash)hangs
<sobkas> \/hurd/tmpfs --noexec --nosuid --size=5242880 --mode=1777 tmpfs
<youpi> mount doesn't necessarily show everything
<sobkas> \/hurd/tmpfs --nosuid --size=10% --mode=755 tmpfs
<youpi> it'd then be a bug of tmpfs, which wouldn't surprise me
<youpi> gdb-ing tmpfs shouldn't be too hard to at least see where it hangs
<sobkas> It also hangs when I send NULL instead of string
<youpi> that's more surprising, but also worth debugging
<sobkas> Ok reboot, because ps aux also hangs
<youpi> you can use option -M of ps to avoid hangs
<sobkas> df shows /run and /run/lock and their sizes are in line with tmpfs processes I see
<youpi> so not /tmp, so it's surprising that pathconf hangs
<youpi> but again, that would often just report -1, so not usable a "reasonable path length" that people insist on getting, but is deemed not to exist
<sobkas> when I did pathconf for /home it returned -1
<sobkas> [1207: 2 (255)] tcsetattr: Interrupted system call
<sobkas> So it sometimes hangs when used with "/tmp", NULL hangs always
<sobkas> "/tmp" hangs when also ps aux hangs
<youpi> ps aux hanging is rather a consequence, so that's not what you want to investigate
<youpi> the question is which rpc hangs exactly
<youpi> probably simpler to investigate the null case first
<youpi> getting a segfault is expected in that case
<sobkas> it probably should check for that
<youpi> is that an i386 or x86_64 system?
<youpi> no, libc function never check for pointers, that's only expected
<sobkas> but it doesn't segfaults outside of gdb
<youpi> yes, that's the real problem
<youpi> is that an i386 or x86_64 system?
<youpi> does /servers/crash point to crash-kill ?
<youpi> you really want that, as core dumps are still todo (see contributing page)
<youpi> (the blabla above seems to be problematic for some)
<sobkas> Apparently connecting with my microphone crashes my desktop
<jab> sobkas you can still chat with us in the jitsi chat room!
<jab> and/or ask your questions here