IRC channel logs

2025-03-20.log

<matrix_bridge><Andrius Štikonas> gtker: your PR seems fine at quick glance but I want to spend a bit more time reviewing it (didn't have enough time today...)
<fossy>I can't say I've personally ever seen *++a off the top of my head
<matrix_bridge><cosinusoidally> wow I didn't realise the penalty for unaligned memory access on some riscv chips could be so high. If https://old.reddit.com/r/RISCV/comments/1frrai9/opinionrant_riscv_prioritizes_hardware_developers/lplemdh/ is to be believed it can be a 180x overhead for unaligned access (on a Visionfive 2). Apparently other riscv cores may have much less overhead (the example they give is Milk-V Duo)
<matrix_bridge><cosinusoidally> and of course on x86 aligned/unaligned have identical performance.
<matrix_bridge><cosinusoidally> hmm, I'm not sure how much I trust that benchmark. It seems to read the same memory location over and over. I imagine on x86 something clever may be going on to mask the overhead. Having said that, on my PC I do get a factor of 10 overhead for an unaligned read crossing a page boundary.
<matrix_bridge><cosinusoidally> (my PC is an Ivy Bridge i5)
<oriansj>well unaligned access at worst in x86 requires 2 loads, 2 shifts and an or. So 6 clock cycles more than an aligned access (assuming no multi-page spanning fun)
<oriansj>but RISC-V does the following insane thing: trap to the Operating system (so a literal syscall amount of overhead) have the kernel do the 2 loads, 2 shifts and the or; set the memory address which corresponds to the register for which the unaligned load is to be put. Do a syscall return and boom there is your massive delay (of which 180x is quite heavily optimized in many ways)
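The "2 loads, 2 shifts and an or" sequence described above can be sketched in C roughly as follows. This is only an illustration, not code from any kernel or firmware: the function name `load_u32_unaligned` is hypothetical, memory is modelled as an array of aligned 32-bit words, and little-endian byte numbering is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a 32-bit little-endian value starting at an
 * arbitrary byte offset, using only aligned 32-bit word reads. */
uint32_t load_u32_unaligned(const uint32_t *words, size_t byte_offset)
{
    size_t index = byte_offset / 4;                   /* aligned word holding the first byte */
    unsigned shift = (unsigned)(byte_offset % 4) * 8; /* misalignment in bits */
    uint32_t lo = words[index];                       /* first aligned load */

    if (shift == 0) {
        return lo;                                    /* already aligned, one load is enough */
    }

    uint32_t hi = words[index + 1];                   /* second aligned load */

    /* two shifts and an or stitch the halves back together */
    return (lo >> shift) | (hi << (32 - shift));
}
```

The aligned case returns early so the code never shifts a 32-bit value by 32, which would be undefined behaviour in C.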
<matrix_bridge><Andrius Štikonas> Well, this is why compilers insert paddings into structs unless you instruct otherwise with packed attribute
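As a rough illustration of the padding point above, the two hypothetical structs below differ only in the packed attribute. `__attribute__((packed))` is a GCC/Clang extension, and the padded layout noted in the comments assumes a typical ABI where `uint32_t` is 4-byte aligned; other compilers or targets may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Default layout: the compiler may insert padding after `tag` so that
 * `value` is naturally aligned (typically offset 4, total size 8). */
struct padded_example {
    uint8_t  tag;
    uint32_t value;
};

/* Packed layout (GCC/Clang extension): no padding, so `value` sits at
 * offset 1 and every access to it is potentially unaligned. */
struct packed_example {
    uint8_t  tag;
    uint32_t value;
} __attribute__((packed));
```

Checking `offsetof(struct packed_example, value)` and `sizeof(struct packed_example)` against the padded variant shows the difference on a given target.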
<matrix_bridge><Andrius Štikonas> gtker: given that non-packed struct seems the norm, can we switch to that and that will also solve initialiser issues
<matrix_bridge><Andrius Štikonas> Alignment should be to word size, I guess 32-bits is good enough for all supported arches?
<matrix_bridge><Andrius Štikonas> Oh we actually do need 16-bit for x86/x86_64
<matrix_bridge><Andrius Štikonas> At the very least UEFI support in M2libc relies on that
<matrix_bridge><Andrius Štikonas> E.g. these structs should be packed: https://github.com/oriansj/M2libc/blob/b8a98b77b076a24457a6a9feb816a13850ff5269/uefi/uefi.c#L208
<matrix_bridge><gtker> Stikonas: I have deliberately avoided adding padding since all platforms seemed to support unaligned access and I wasn't sure how big of a deal it was in the code that is actually compiled with M2. I'm not against it in the future but I would rather focus on features right now and take on performance when we can compile more complex software
<matrix_bridge><gtker> If it's an issue for people they can manually add padding
<matrix_bridge><Andrius Štikonas> OK, uefi spec says structs are naturally aligned: https://tianocore-docs.github.io/edk2-CCodingStandardsSpecification/draft/5_source_files/56_declarations_and_types.html#56-declarations-and-types
<matrix_bridge><Andrius Štikonas> So we do need 16 bit initialiser...
<matrix_bridge><Andrius Štikonas> Yeah, I'm not worried about performance
<matrix_bridge><Andrius Štikonas> I'm just thinking from the point of view of adding support for (u)int16_t initialiser
<matrix_bridge><Andrius Štikonas> Which is a useful feature
<matrix_bridge><Andrius Štikonas> (But as always we might want to prioritise other features first)
<matrix_bridge><gtker> Won't we just need the fix in M1 for it to work?
<matrix_bridge><Andrius Štikonas> I think so
<matrix_bridge><Andrius Štikonas> Or add all logic to M2-Planet and emit single bytes
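The "emit bytes in the compiler" alternative could look something like the sketch below. This is not M2-Planet code: `emit_u16_le` is a hypothetical helper, and little-endian byte order is an assumption (it matches x86 and RISC-V).

```c
#include <stdint.h>

/* Hypothetical helper: split a 16-bit initialiser value into two bytes,
 * least significant byte first (little-endian order assumed). */
void emit_u16_le(uint16_t value, uint8_t out[2])
{
    out[0] = (uint8_t)(value & 0xFF); /* low byte is emitted first */
    out[1] = (uint8_t)(value >> 8);   /* high byte follows */
}
```

A caller would then write `out[0]` and `out[1]` as two separate 8-bit values, so the assembler stage would not need a native 16-bit immediate.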
<matrix_bridge><Andrius Štikonas> But I think oriansj prefers we add support to M1
<matrix_bridge><Andrius Štikonas> I could take a look at M1 but it might be next week...
<matrix_bridge><Andrius Štikonas> We'll see...
<stikonas>gtker: merged your increment/decrement PR. Thanks!
<stikonas>it's really helping that you factored out all those emits into separate functions that abstract support for various architectures...
<stikonas>definitely much easier to read