IRC channel logs

2023-08-16.log

<apteryx>awesome! well done to everyone involved
<solid_black>hello :)
<damo22>hi
<damo22>solid_black: do you know if gnumach uses %fs or %gs
<damo22>i want to claim one of these for storing a pointer to per-cpu stuff
<solid_black>damo22: afaik it doesn't use either currently for its own purposes, it only tracks the userspace's values
<solid_black>but yeah, it really should
<solid_black>use gs
<solid_black>and the swapgs instruction
<solid_black>make gs point at some per-CPU struct
<damo22>then we can store the apic id in memory and read it very quickly
<damo22>reading it from lapic is 8 microseconds and the cpuid way is 60 nanoseconds
<solid_black>the thing is, you kind of want %gs to point at something, it's inconvenient to make it store an int
<damo22>yes
<damo22>make it point at a segment
<solid_black>so you'd allocate some per-cpu data block there, like userland's tls, and make it point there
<damo22>with struct percpu {
<damo22>int apic_id
<damo22>i guess it can be allocated with kalloc?
<solid_black>you could just store an integer, but my point is instead of using an integer and then indexing into some global arrays with that integer, it's better to store everything in per-cpu blocks in the first place
<solid_black>for instance currently current_thread() is defined to active_threads[cpu_number()]
<damo22>yes
<damo22>it could be a percpu value
<solid_black>but instead you'd do struct percpu { ...; thread active_thread; }
<solid_black>yes
<damo22>that would speed up a lot wouldnt it?
<damo22>not having to call cpu_number
<solid_black>maybe? i don't know, you're the one who's done the profiling :)
<solid_black>so perhaps in the end you wouldn't need the number at all
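A minimal C sketch of the two layouts being compared here, not gnumach code. The names `sketch_cpu_number`, `percpu_blocks` and `percpu_self` are hypothetical, and `percpu_self` merely stands in for whatever %gs would point at on a real CPU.

```c
#include <stddef.h>

#define SKETCH_NCPUS 4

struct thread { int id; };                /* placeholder for Mach's thread */

/* Layout 1: global arrays indexed by the CPU number. */
static struct thread *active_threads[SKETCH_NCPUS];
static int sketch_cpu;                    /* which CPU we pretend to run on */

static int sketch_cpu_number(void)        /* stand-in for cpu_number() */
{
    return sketch_cpu;
}

static struct thread *current_thread_via_array(void)
{
    return active_threads[sketch_cpu_number()];
}

/* Layout 2: everything belonging to one CPU lives in one block. */
struct percpu {
    int apic_id;
    struct thread *active_thread;
};

static struct percpu percpu_blocks[SKETCH_NCPUS];

/* Assumption: on real hardware this pointer would be the %gs base. */
static struct percpu *percpu_self = &percpu_blocks[0];

static struct thread *current_thread_via_percpu(void)
{
    return percpu_self->active_thread;    /* no CPU number needed */
}
```

With the second layout the CPU number is only needed once, when the block pointer is set up for each CPU.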
<solid_black>and actually, Mach already has that struct percpu, and it's called struct processor (see kern/processor.h)
<solid_black>so make %gs point at struct processor for the current cpu
<damo22>but it doesnt have active_threads[]
<damo22>does it?
<solid_black>no
<solid_black>there's struct thread next_thread, not sure if that's the same thing
<damo22>no
<damo22>i can put processor_t processor in the percpu thing
<damo22>how do you use %gs
<damo22>is it a pointer?
<solid_black>why don't you just use processor_t as percpu?
<damo22>because there are things i want to store that are not in processor_t
<solid_black>add them to processor_t
<solid_black>no?
<damo22>not sure
<solid_black>so %gs is a segment register, you can't quite use it directly, especially not on x86_64
<solid_black>we should add a bunch of macros/helpers for this like glibc does, but basically gcc supports a separate __seg_gs address space
<damo22>do i need to use a fixed address in the ld script?
<solid_black>for what?
<damo22>to allocate the percpu stuff
<solid_black>i don't think so
<solid_black>you should be able to kalloc it at runtime
<solid_black>and mach already does 'struct processor processor_array[NCPUS];'
<solid_black>as in not even kalloc'ed, just as static variables
<damo22>all those arrays can go away
<damo22>i think the most expensive part is continually looking up the cpuid
<damo22>to find the right index into all the arrays
<damo22>and it doesnt cache the value
<damo22>theyre all macros that expand to cpu_number()
<solid_black>it cannot cache the value because there's nowhere for it to cache it in
<damo22>yes, but i mean if it reuses the same cpu_number in multiple calls in the same function
<solid_black>ah in that case maybe
<damo22>within spl blocks
<solid_black>but still I'd expect cpu_number to be called many times during a syscall, across different functions
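A small C sketch of the caching point, with hypothetical names throughout (`sketch_cpu_number`, `SKETCH_MY_TICKS`, and so on). It counts lookups to show that macros expanding to a CPU-number call repeat the lookup on every use, while a local variable does it once. Caching like this is only sound while the code cannot move to another CPU, which is presumably why spl blocks are mentioned.

```c
#define SKETCH_NCPUS 4

static int sketch_lookups;                 /* counts CPU-number lookups */

static int sketch_cpu_number(void)         /* stand-in for cpu_number() */
{
    sketch_lookups++;
    return 1;                              /* pretend we run on CPU 1 */
}

static int sketch_runq_len[SKETCH_NCPUS];  /* hypothetical per-CPU arrays */
static int sketch_ticks[SKETCH_NCPUS];

/* Macros in the style described: every use repeats the lookup. */
#define SKETCH_MY_RUNQ_LEN (sketch_runq_len[sketch_cpu_number()])
#define SKETCH_MY_TICKS    (sketch_ticks[sketch_cpu_number()])

static void sketch_tick_uncached(void)
{
    SKETCH_MY_TICKS++;                     /* lookup 1 */
    SKETCH_MY_RUNQ_LEN = 0;                /* lookup 2 */
}

/* Cached variant: one lookup, reused for both arrays. */
static void sketch_tick_cached(void)
{
    int cpu = sketch_cpu_number();

    sketch_ticks[cpu]++;
    sketch_runq_len[cpu] = 0;
}
```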
<damo22>yes
<solid_black>but yeah really gs and percpu data would be much better
<damo22>how do you make it arch agnostic
<damo22>do all cpus provide a reg you can use for that?
<solid_black>just declare a function like processor_t current_processor()
<solid_black>and implement it differently for different archs
<solid_black>not that we have anything but x86 anyway
<solid_black>but yes, all archs have to support something like that
<solid_black>tls in userspace is the same thing
<damo22>i dont want to implement code that has to be totally refactored just to support a new arch
<damo22>i dont even care much for x86
<solid_black>well the specific mechanism how per-cpu data is found is surely arch-specific, there's no way around that
<damo22>yes but we need some kind of interface that allows arch-specific code to override the method
<solid_black>I'm saying, a current_processor() function that is implemented in arch-specific code that returns this pointer
<damo22>i guess a macro can do
<solid_black>this is like THREAD_SELF() in glibc, but Mach-y
<solid_black>so on x86 (both i386 and x86_64) it'd just be processor_t current_processor(void) { return *(__seg_gs processor_t *) 0; }
<damo22>are you sure? in the kernel dont we have to implement %gs stuff in asm
<solid_black>or we could do it in asm, yeah, but __seg_gs also works
<solid_black>as long as you don't assign through it, 'cause then gcc miscompiles
<solid_black>so we should probably use asm
<damo22> https://git.sceen.net/rbraun/x15.git/tree/arch/x86/machine/cpu.h#n404
<solid_black>no, that won't work on x86_64 afaik
<solid_black>you can't really store a value there
<solid_black>you can only load a pointer into gs_base, and then you can do %gs:offset
<solid_black>really, see tls.h in glibc
<damo22>why not
<damo22>i think youre reading the asm wrong
<solid_black>he's storing the address itself in %fs, and then adding struct offset to the value of %fs, no?
<damo22>just replace fs with gs
<solid_black>no, I am reading it wrong
<solid_black>wait what
<solid_black>are all these per-cpu variables set to low addresses in the linker script?
<solid_black>in x15
<damo22>could be i dont know