IRC channel logs
2025-03-20.log
<matrix_bridge><Andrius Štikonas> gtker: your PR seems fine at a quick glance but I want to spend a bit more time reviewing it (didn't have enough time today...)
<fossy>I can't say I've personally ever seen *++a off the top of my head
<matrix_bridge><cosinusoidally> and of course on x86 aligned/unaligned have identical performance.
<matrix_bridge><cosinusoidally> hmm, I'm not sure how much I trust that benchmark. It seems to read the same memory location over and over. I imagine on x86 something clever may be going on to mask the overhead. Having said that, on my PC I do get a factor of 10 overhead for an unaligned read crossing a page boundary.
<oriansj>well unaligned access at worst in x86 requires 2 loads, 2 shifts and an or. So 6 clock cycles more than an aligned access (assuming no multi-page spanning fun)
<oriansj>but RISC-V does the following insane thing: trap to the operating system (so a literal syscall amount of overhead); have the kernel do the 2 loads, 2 shifts and the or; set the memory address which corresponds to the register into which the unaligned load is to be put. Do a syscall return and boom, there is your massive delay (of which 180x is quite heavily optimized in many ways)
<matrix_bridge><Andrius Štikonas> Well, this is why compilers insert padding into structs unless you instruct otherwise with the packed attribute
<matrix_bridge><Andrius Štikonas> gtker: given that non-packed struct seems the norm, can we switch to that and that will also solve initialiser issues
<matrix_bridge><Andrius Štikonas> Alignment should be to word size, I guess 32 bits is good enough for all supported arches?
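The "2 loads, 2 shifts and an or" sequence oriansj describes can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the helper name `load_u32_unaligned` is hypothetical, memory is modelled as an array of aligned 32-bit words representing a little-endian byte stream, and none of it is taken from a real kernel trap handler or from M2libc.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from any kernel or M2libc): read a 32-bit value
 * that starts at byte offset `off` in a little-endian byte stream, using
 * only aligned 32-bit word loads from `words`. */
uint32_t load_u32_unaligned(const uint32_t *words, size_t off)
{
    size_t idx = off / 4;                     /* word holding the first byte */
    unsigned shift = (unsigned)(off % 4) * 8; /* bit offset inside that word */

    uint32_t lo = words[idx];                 /* first aligned load */
    if (shift == 0) {
        return lo;                            /* access was already aligned */
    }

    uint32_t hi = words[idx + 1];             /* second aligned load */

    /* two shifts and an or stitch the halves back together */
    return (lo >> shift) | (hi << (32 - shift));
}
```

With `words = {0x44332211, 0x88776655}`, reading at byte offset 1 combines the upper three bytes of the first word with the lowest byte of the second. The arithmetic itself is cheap; the cost discussed above comes from where it runs (inline versus behind a trap into the kernel).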
<matrix_bridge><Andrius Štikonas> Oh we actually do need 16-bit for x86/x86_64
<matrix_bridge><Andrius Štikonas> At the very least UEFI support in M2libc relies on that
<matrix_bridge><gtker> Stikonas: I have deliberately avoided adding padding since all platforms seemed to support unaligned access and I wasn't sure how big of a deal it was in the code that is actually compiled with M2. I'm not against it in the future but I would rather focus on features right now and take on performance when we can compile more complex software
<matrix_bridge><gtker> If it's an issue for people they can manually add padding
<matrix_bridge><Andrius Štikonas> I'm just thinking from the point of view of adding support for (u)int16_t initialiser
<matrix_bridge><Andrius Štikonas> (But as always we might want to prioritise other features first)
<matrix_bridge><Andrius Štikonas> Or add all logic to M2-Planet and emit single bytes
<matrix_bridge><Andrius Štikonas> But I think oriansj prefers we add support to M1
<matrix_bridge><Andrius Štikonas> I could take a look at M1 but it might be next week...
<stikonas>gtker: merged your increment/decrement PR. Thanks!
<stikonas>it's really helping that you factored out all those emits into separate functions that abstract support for various architectures...
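For the "emit single bytes" option mentioned above, splitting a 16-bit initialiser value into individual bytes might look something like the sketch below. The helper name `split_u16_le` is hypothetical and the little-endian byte order is an assumption; this is not M2-Planet's or M1's actual code.

```c
#include <stdint.h>

/* Hypothetical helper (not M2-Planet or M1 code): split a 16-bit
 * initialiser value into two bytes, least significant byte first,
 * so that each byte could be emitted on its own. */
void split_u16_le(uint16_t value, uint8_t out[2])
{
    out[0] = (uint8_t)(value & 0xFF); /* low byte goes first (little-endian) */
    out[1] = (uint8_t)(value >> 8);   /* high byte follows */
}
```

Calling `split_u16_le(0x1234, out)` leaves `0x34` in `out[0]` and `0x12` in `out[1]`. Doing the split in the compiler keeps the later toolchain stage limited to byte-sized data, at the cost of putting the byte-order logic into the compiler.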