IRC channel logs

<damo22>youpi: what about when calling the actual interrupt handler, do we need to wrap the call in swapgs?

<damo22>just in case the user uses gs?

<sobkas>I'm thinking about switching hurd console to evdev

<sobkas>With long term plan to make libinput work

<sobkas>Splitting input from hurd console and using libinput there

<damo22>can a syscall64 interrupt an interrupt context?

<damo22>i think it might be able to

<youpi>syscall64 is only called from user land

<youpi>so it can't be within an interrupt since userland cannot have interrupt handlers

<youpi>when an interrupt interrupts user land, we need swapgs yes

<damo22>good point

<damo22>start pci-arbiter: Kernel Page fault trap, eip 0xffffffff8103e243, code 0, cr2 4a0

<damo22>kernel: Page fault (14), code=0

<damo22>Stopped at syscall64+0x13: TODO

<damo22>syscall64(...)+0x13

<damo22>>>>>> user space <<<<<

<damo22>i think it got out of sync because there may have been an interrupt interrupting user land but it didnt swap?

<damo22>then it did a syscall

<damo22>or an interrupt interrupting kernel and it swapped by mistake

<damo22>then userspace did a syscall

<damo22>i am testing if gs base == percpu_array[cpu] to know whether to swapgs

<damo22>that doesnt really care if we are interrupting from user

<damo22>and then r12 is saved on the stack for every interrupt context containing the return mode

<damo22>i think the design should work

<damo22>youpi: this git branch reproducibly fails on syscall64 with the wrong value of gsbase on entry * dbf86634 (HEAD -> smp64-test, zammit/smp64-test) x86_64: Implement swapgs logic

<damo22>are you sure user interrupts cant call a syscall64?

<rrq>isn't syscall what userland does to call kernel code?

<rrq>and then isn't interrupt handlers already kernel code?

<damo22>when syscall64 enters, the value of gsbase is already kernel value

<damo22>but it should not be

<rrq>isn't it part of the instruction itself to swap gs ?

<damo22>uh, really?

<damo22>that would be cool but news to me

<rrq>I don't know x86 in that level of detail (too much arm in my head :)

<damo22>my searches of LSTAR msr indicates syscall instruction does not swapgs builtin

<damo22>if i only swap once at the end of syscall64, after the 3rd syscall entry it is out of sync

<damo22>(01:50:04 PM) rrq: isn't it part of the instruction itself to swap gs ? No

<damo22>unless i am getting interrupted from a syscall

<damo22>no no, its wrong before that

<rrq>(sorry, yes, don't mind me. I've much to learn about x86)

<damo22>no problem, everyone is learning

<damo22>ah my bug is in SWAPGS_ENTRY_IF_NEEDED

<damo22>mul clobbers rdx ??

<damo22>oh man yes
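
For context on the rdx clobber: the one-operand x86 mul multiplies rax by its operand and writes the full 128-bit product to rdx:rax, so rdx is overwritten even when the product fits in 64 bits. A minimal C model of that behaviour follows. The struct and function names are made up for illustration, and `unsigned __int128` is a GCC/Clang extension, not standard C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical holder for the two output registers of a 64-bit mul. */
struct mul_result {
    uint64_t rax;   /* low 64 bits of the product */
    uint64_t rdx;   /* high 64 bits of the product, always written */
};

/* Model of "mul operand": rdx:rax = rax * operand (unsigned). */
static struct mul_result mul64(uint64_t rax, uint64_t operand)
{
    unsigned __int128 product = (unsigned __int128)rax * operand;
    struct mul_result r = { (uint64_t)product, (uint64_t)(product >> 64) };
    return r;
}
```

Any value kept in rdx across such a multiplication has to be saved first, or the multiplication replaced by something that leaves rdx alone, such as a shift or a two-operand imul.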

<damo22>er, SYSRET calls swapgs internally

<damo22>so i dont need to swap at the end

<damo22>so confusing

<damo22>what if there are nested syscalls??

<damo22>can a syscall call another syscall during itself?

<damo22>=> 0xffffffff8103e1dd <syscall64+1>: swapgs

<damo22>(gdb) bt

<damo22>#0 syscall64 () at ../x86_64/locore.S:1427

<damo22>#1 0x0000000000435629 in ?? ()

<damo22>#2 0x00007fffffffff70 in ?? ()

<damo22>#3 0x0000000000448b61 in ?? ()

<damo22>#4 0x0000000000000000 in ?? ()

<damo22>i think im getting an interrupt from a syscall

<damo22>and then sysretq swaps gs regardless

<damo22>boots!!

<azert>damo22: great

<azert>sysret doesn’t call swapgs automatically

<azert>but it enables interrupts I think, so you can disable interrupts before doing swapgs before sysret

<damo22>it does

<damo22>and entering the syscall64 entrypoint also calls it

<azert>because you added a swapgs instruction there?

<damo22>nope

<damo22>i think its because FMASK msr set to EFL_IF masks the interrupts and also swaps gs

<damo22>on syscall entry

<azert>then I am afraid you are leaking the kernel gsbase to userspace

<damo22>nope

<damo22>i need to find the manual, but i tried everything

<azert>I just looked at your code

<azert>If you add swapgs unconditionally after syscall entry and before sysret it should also work

<damo22>it absolutely fails

<azert>Im puzzled

<damo22>wait my code is old

<damo22>let me push again

<damo22>pushed

<damo22>smp64 boots but gets stuck due to high precision clock not being smp safe

<damo22>maybe the STAR msr is set wrong?

<damo22>(unrelated to clock)

<damo22>hey, when we launch the first userspace thread, dont we need to swapgs to enter userspace first time?

<damo22>ah i think my swap logic was wrong

<youpi>damo22: I don't see where you saw that syscall/sysret implicitly perform swapgs. They are not even doing a stack change. fmask is only for the rflags

<damo22>ok

<damo22>i also cant find evidence of that

<damo22>its not mentioned in the manual

<youpi>damo22: do you not have the intel manual?

<youpi>I can't see how to do assembly without it :)

<damo22>i do

<damo22>it mentions SYSCALL_FLAGS_MASK

<youpi>yes, that's for rflags

<damo22>but nothing about swapgs flags

<damo22>"cmpq -> je" is that equivalent to if (expr) goto

<youpi>it's equivalent to if (a ==b ) goto

<youpi>(the result of the difference is zero)
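
As a rough illustration of the answer above: cmpq subtracts one operand from the other without storing the result and sets the flags, and je branches when the zero flag is set, that is, when the two operands were equal. In C this is an equality test followed by a goto. The function name below is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* C shape of "cmpq b, a ; je taken": jump when a - b == 0. */
static int cmp_then_je(uint64_t a, uint64_t b)
{
    if (a == b)
        goto taken;
    return 0;       /* fall-through path: operands differ */
taken:
    return 1;       /* branch target: operands are equal */
}
```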

<damo22>okay thanks

<damo22>gs base == percpu_array[cpu] ? then return to kernel without swap

<damo22>is that correct?

<damo22>no, i think i need to not swap, but swap on exit

<youpi>no, you need to avoid swapping on exit too

<youpi>if gs base was that at entry, it means the code you are interrupting will swap before returning to userland

<youpi>so you have to leave things as they are

<damo22>i dont follow, where will it "swap before returning to userland" ? isnt that what SWAPGS_EXIT_IF_NEEDED_R12 is supposed to do?

<youpi>we are talking about an interrupt, right?

<youpi>if gsbase is the kernel gs base, whatever code that we have interrupted has done the swapgs, so when we return to it, it will do the swapgs before returning

<youpi>and in the case of nested interrupts, it's r12 which remembers if ourself has done the swapgs or not

<youpi>so the first interrupt does the swapgs and sets r12 to remember that

<youpi>nested interrupts see it's already swapped, so don't swap and remember that
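
The scheme youpi describes can be modelled in plain C to check the nesting logic. This is only a sketch under assumptions: the struct, the function names and the boolean standing in for r12 are invented here and are not the gnumach code. The one hardware fact relied on is that swapgs exchanges the active GS base with the value held in the IA32_KERNEL_GS_BASE MSR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the GS state of one CPU. */
struct cpu_model {
    uint64_t gs_base;         /* active GS base */
    uint64_t kernel_gs_base;  /* inactive base (IA32_KERNEL_GS_BASE) */
    uint64_t percpu;          /* address of this CPU's per-cpu area */
};

/* What swapgs does: exchange the two bases. */
static void swapgs(struct cpu_model *c)
{
    uint64_t tmp = c->gs_base;
    c->gs_base = c->kernel_gs_base;
    c->kernel_gs_base = tmp;
}

/* Interrupt entry: swap only if GS is not already the kernel's.
   The return value plays the role of r12 saved in the interrupt frame. */
static bool interrupt_entry(struct cpu_model *c)
{
    if (c->gs_base == c->percpu)
        return false;   /* interrupted code already runs on kernel GS */
    swapgs(c);
    return true;        /* this level swapped, so it must swap back */
}

/* Interrupt exit: undo the swap only if this level performed it. */
static void interrupt_exit(struct cpu_model *c, bool swapped)
{
    if (swapped)
        swapgs(c);
}
```

With a user GS base active, the first interrupt_entry swaps and returns true, a nested one returns false, and unwinding both in reverse order restores the user base exactly once.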

<damo22>aha

<damo22>rigt

<damo22>right*

<damo22>but the first interrupt sees gs base already with kernel value

<damo22>because it has to be set to make the kernel work

<youpi>if the interrupt happens during a syscall, that's expected

<youpi>and that's fine, since we'll swapgs back before returning to userland

<damo22>no we wont because the first r12 wont be set

<youpi>what r12? syscall doesn't change r12, and that's fine

<damo22>we get an interrupt when calibrating lapic timer

<damo22>before the first user thread exists

<youpi>ok but we are in kernel land

<youpi>so we don't swap

<youpi>and remember in r12 that we haven't

<damo22>i have just pushed a commit that fails reproducibly with gs already set to kernel on entry to syscall64

<damo22>but it looks correct to me

<damo22>the commit before seems to mostly work

<youpi>I'd say for now make the return code check for both values explicitly, in case r12 is clobbered for reasons yet unknown to us

<youpi>and make the codes really random

<youpi>(the r12 codes)

<damo22>ok

<youpi>(and ud2 if it's neither)
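
A minimal C sketch of that debugging aid, assuming two arbitrary marker values (the constants and the function name below are invented for illustration). The exit path accepts only those two values and traps on anything else, so a clobbered or never-initialised r12 is caught immediately instead of causing a wrong swap. Here abort() stands in for ud2.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical markers, chosen to be unlikely to appear by accident. */
#define R12_DID_SWAP  UINT64_C(0x5ca1ab1e0ddba11d)
#define R12_NO_SWAP   UINT64_C(0xdecafbadfeedc0de)

/* Returns 1 if the exit path must swapgs, 0 if it must not.
   Any other saved value means r12 was clobbered or never set. */
static int exit_needs_swapgs(uint64_t saved_r12)
{
    if (saved_r12 == R12_DID_SWAP)
        return 1;
    if (saved_r12 == R12_NO_SWAP)
        return 0;
    abort();    /* stand-in for ud2 */
}
```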

<youpi>damo22: in case you hadn't seen, the first "return" to user is thread_bootstrap_return

<youpi>you'd probably need to set r12 in set_user_regs

<youpi>ah no it's not setting an interrupt state, only the normal user state

<youpi>ah, it fakes a return from trap, not return from interrupt

<youpi>but then it's the kernel part of the state in which you want to set r12

<youpi>so the fake return from trap knows it has to switch

<damo22>that would be it

<damo22>i put hlt not ud2

<damo22>and it wakes up periodically

<damo22>so boots, but slooww

<damo22>i am putting ud2 now

<youpi>I would say it's in thread_bootstrap_return that you want to set r12 to "go back to user"

<youpi>thread_exception_return also needs it indeed

<damo22>it hits ud2

<damo22>so something must be clobbering r12?

<youpi>or not setting it

<damo22>or not setting it initially

<damo22>i did a breakpoint on thread_exception_return and thread_bootstrap_return

<damo22>r12 has something else in it

<damo22>looks important

<damo22>R12=ffffffffdc2dace0

<youpi>that's not surprising since for new threads that bootup userland, nothing said what r12 should be

<damo22>hehe i set a breakpoint on ud2

<damo22>R12=0000000000000000

<damo22>GS =0000 ffffffff810bb000 000fffff 00000000

<damo22>i continued after thread_exception_return and it hit the next breakpoint at ud2

<damo22>but i dont know what happened in between

<damo22>somehow a syscall64 then thread_exception_return?

<youpi>mach_msg_interrupt looks so

<youpi>ah, syscall64 fills iss

<youpi>it has to fill r12 as well

<youpi>to tell it's from userland

<youpi>ah no that's not the same as isr

<youpi>but somehow there's probably something like that

<youpi>did you make thread_exception_return set r12 as returning to user?

<youpi>really you want that

<damo22>yes

<damo22>i havent pushed my test code

<youpi>so syscall64->thread_exception_return will be fine

<youpi>syscall64 has no way to set r12 otherwise, and that's expected

<youpi>if code is not returning into syscall64, it has to cope with it

<damo22>thread exception return (ter) x6 then syscall64, ter, syscall syscall ter ter syscall64, syscall64 then ud2

<damo22>i will push my test code

<damo22>pushed

<youpi>there are also callers of _take_trap that need to set r12

<youpi>also, thread_syscall_return

<damo22>holy crap it booted

<damo22>-smp 1 works

<damo22>-smp 6 hangs at [ 1.0000050] ahcisata0: 64-bit DMA

<damo22>i think its because the hpc clock is not smp safe

<damo22>(gdb) bt

<damo22>#0 0xffffffff8101787d in hpclock_read_counter () at ../i386/i386/apic.c:493

<damo22>#1 0xffffffff81056458 in time_value64_add_hpc (value=value@entry=0xfffffffff0a4a058, last_hpc=35443913) at ../kern/mach_clock.c:525

<damo22>#2 0xffffffff81057608 in host_get_uptime64 (host=<optimized out>, uptime=uptime@entry=0xfffffffff0a4a058) at ../kern/mach_clock.c:718

<damo22>#3 0xffffffff8107aa8d in _Xhost_get_uptime64 (InHeadP=<optimized out>, OutHeadP=0xfffffffff0a4a020) at kern/mach_host.server.c:2950

<damo22>#4 0xffffffff8105295c in ipc_kobject_server (request=request@entry=0xffffffff81013000) at ../kern/ipc_kobject.c:179

<damo22>#5 0xffffffff810852f3 in mach_msg_trap (msg=0x7ffffffff3c0, option=<optimized out>, send_size=32, rcv_size=72, rcv_name=5, time_out=<optimized out>, notify=0) at ../ipc/mach_msg.c:1310

<damo22>#6 0xffffffff8103e306 in syscall64 () at ../x86_64/locore.S:1521

<damo22>#7 0x0000000000000000 in ?? ()

<youpi>btw, instead of difficultly checking for being equal to the percpu array, you can as well just test if it's negative: testq %rdx,%rdx ; js its_kernel

<damo22>oh, nothing can set bit63 in userspace ?

<youpi>without the fsgsbase feature, userspace can't set gsbase

<youpi>testing for negative values is not really less secure than testing for an exact value
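
In C terms, the sign test amounts to looking at bit 63 of the GS base. A small sketch, with a hypothetical function name, assuming as in the discussion that kernel addresses sit in the upper half of the address space and that userland cannot install such a base:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when the GS base is "negative", i.e. bit 63 is set, which under
   the assumptions above means it is already the kernel's base. */
static bool gs_base_is_kernel(uint64_t gs_base)
{
    return (gs_base >> 63) != 0;
}
```

For example, the base 0xffffffff810bb000 seen in the register dump earlier in the log tests as kernel, while a low user address such as 0x7fffffffff70 does not.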

<damo22>right

<damo22>negative values are equivalent to bit63 being set, but if the userspace memory mapping doesnt allow writing to memory addresses that high, it might be some protection?

<damo22>even if they had fsgsbase feature

<damo22>you cant offset gs with 32 bit immediate lower than about (1<<63) - (1<<32)

<damo22>if kernel max address was lower than that, it would be inaccessible?

<youpi>I'm reading https://www.kernel.org/doc/Documentation/x86/entry_64.txt about swapgs

<youpi>it doesn't say how we prevent userland from setting a negative gsbase

<youpi>we apparently really can do it

<youpi>there is a comment about it in the fsgsbase support

<youpi>This enablement requires careful handling of the exception entries which go through the paranoid entry path as they can no longer rely on the assumption that user GSBASE is positive (as enforced via prctl() on non FSGSBASE enabled systems)

<youpi>ok, the bit of linux code that tests for the value being negative is only used on systems without fsgsbase support

<damo22>yeah

<youpi> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c82965f9e53005c1c62632c468968293262056cb

<youpi>« If we are at an interrupt or user-trap/gate-alike boundary then we can use the faster check »

<youpi>which is just to test for %cs to be the kernel cs
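
The faster check looks at the %cs value the CPU saved in the interrupt frame: the low two bits of that selector give the privilege level of the interrupted code, 0 for kernel and 3 for user. A rough C sketch, with a hypothetical function name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when the saved code segment selector has a non-zero privilege
   level in its low two bits, i.e. the interrupt arrived from userland. */
static bool interrupted_user_mode(uint16_t saved_cs)
{
    return (saved_cs & 3) != 0;
}
```

A selector such as 0x08 (privilege level 0) reads as kernel and one such as 0x33 (privilege level 3) reads as user. Those two values are illustrative only and are not taken from the gnumach GDT layout.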

<damo22>ive force pushed a semi working smp64-upstream branch

<damo22>-smp 1 boots, -smp 6 doesnt quite

<damo22>nn

<damo22>hurd party in ~10 hours?

<youpi>I'd say simplify it into testing for gsbase being negative, and we can probably commit that for a start

<youpi>yes for the party

<youpi>we don't enable fsgsbase support yet, so it's safe for now

<damo22>ok

<youpi>add the check for negative value in the i386_FSGS_BASE_STATE load code, though :)

<youpi>and you can leave a todo note about simplifying it into checking for the saved %cs in the cases where this is safe (maskable interrupts and traps)

<youpi>the non-maskable ones would be double-fault and machine check exception,

<youpi>double-fault doesn't have an exit path anyway, so we can as well just force-load gsbase

<youpi>and we don't seem to be enabling MCE

<youpi>but it's trap 0x12, we'd probably want to make that safe

<youpi>and there is nmi

<youpi>(trap 0x02)

<youpi>but again, we'll probably not have an exit path for that

<sobkas>So pathconf with _PC_PATH_MAX hangs, or crashhangs

<sobkas> https://paste.debian.net/hidden/c65eab74

<youpi>note that it can return -1, which is expected when there is no limitation

<sobkas>Right now it doesn't return anything because it (crash)hangs

<youpi>is your /tmp tmpfs-based ?