IRC channel logs

2023-07-07.log

<oriansj>well, the vast majority of the complexity in GCC and Clang is dedicated to exploiting everything in an architecture that would result in better performance; I don't think as much effort went into making -Os produce optimally tiny binaries.
<muurkha>yeah, one of the disappointments in the VAX was that some of the more complex instructions were actually slower than just generating the code to do it step-by-step
<muurkha>even if it was VAX code and not ARM code
<muurkha>if your code is fluffy it hurts both your cache hit rate on big machines and your ability to have functionality on small ones
<muurkha>so there has been a distinctly nonzero amount of effort devoted to code density
<muurkha>I spent a lot of last night studying the Cortex-M0 instruction set
<muurkha>which only implements [most of] Thumb plus a little of Thumb2
<muurkha>it's a lot denser than regular ARM code but also pretty limited
<oriansj>muurkha: I don't need any of thumb or thumb2; just 28 instructions (it would be fewer if they had divide and modulus instructions)
<muurkha>28 instructions?
<oriansj>yes, the processor would only need to support 28 instructions to run the stage0 steps
<oriansj>such as bl label; push {r14}; ldr r0, [r8]; etc
<muurkha>oh, I see
<muurkha>the Cortex-M0 doesn't support any ARM instructions
<muurkha>only thumb
<oriansj>oh, then I guess I would need to do a thumb-only port to support it
<muurkha>yeah
<muurkha>heh, I wanted to see how GCC implements division on ARM, so I wrote a decimal print function called decout and compiled it. it implemented division by 10 with multiplication by 0x66666667 and a bit of cleanup
<muurkha>fine, so I'll rename it to basebout and pass a base parameter b, so the divisor is only known at run time and GCC can't do that
<muurkha>the assembly has a call to basebout.constprop.0 with the same code as before
<muurkha>GCC produced a version of the function specialized for that constant parameter so it wouldn't have to divide at runtime
<muurkha>there we go, bl __aeabi_idivmod
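[editor's sketch] The divide-by-10 trick described above can be written out by hand. This is a minimal sketch, not muurkha's actual code and not GCC's literal output: the constant 0x66666667 comes from the log, while the function bodies, the buffer-based signature of decout, and the name div10 are assumptions made for illustration. It also assumes that right-shifting a negative signed integer is an arithmetic shift, which holds on GCC and Clang but is implementation-defined in C.

```c
#include <stddef.h>
#include <stdint.h>

/* Signed 32-bit division by 10 without a divide instruction.
 * 0x66666667 is roughly 2^34 / 10, so the top bits of the 64-bit
 * product approximate n / 10 rounded toward minus infinity.
 * Assumes >> on negative values is an arithmetic shift. */
static int32_t div10(int32_t n)
{
    int64_t prod = (int64_t)n * 0x66666667LL; /* full 64-bit product */
    int32_t q = (int32_t)(prod >> 34);        /* floor(n / 10) */
    q -= n >> 31;                             /* cleanup: add 1 when n < 0 */
    return q;                                 /* now truncates toward zero */
}

/* Hypothetical decout: formats n in decimal into buf, which must hold
 * at least 12 bytes. Returns the length, excluding the terminator. */
static size_t decout(int32_t n, char *buf)
{
    char tmp[11];
    size_t len = 0, out = 0;
    int32_t v = n;

    do {
        int32_t q = div10(v);
        int32_t r = v - q * 10;               /* remainder, same sign as v */
        tmp[len++] = (char)('0' + (r < 0 ? -r : r));
        v = q;
    } while (v != 0);

    if (n < 0)
        buf[out++] = '-';
    while (len > 0)                           /* digits were produced in reverse */
        buf[out++] = tmp[--len];
    buf[out] = '\0';
    return out;
}
```

Taking the magnitude of each remainder, rather than negating n up front, keeps the most negative 32-bit value from overflowing.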
<stikonas>https://matrix.org/blog/2023/07/deportalling-libera-chat/
<oriansj>muurkha: to be fair M2libc has a function for division for armv7l (making use of a couple conditional instructions along the way for performance reasons) https://git.sr.ht/~oriansj/M2libc/tree/main/item/armv7l/libc-full.M1 and it probably isn't as optimal as what GCC has
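[editor's sketch] On a core with no hardware divide, a division routine has to be built from shifts, compares and subtracts. The following is a minimal sketch of the textbook restoring (shift-and-subtract) method, assuming unsigned 32-bit operands. It is neither the M2libc routine linked above nor GCC's __aeabi_idivmod; the names udivmod32 and udivmod_t and the divide-by-zero convention are hypothetical.

```c
#include <stdint.h>

/* Hypothetical result type: quotient and remainder together. */
typedef struct {
    uint32_t quot;
    uint32_t rem;
} udivmod_t;

/* Restoring division: bring down one bit of n at a time, from the
 * most significant, and subtract d whenever the partial remainder
 * reaches it. Always runs 32 iterations. */
static udivmod_t udivmod32(uint32_t n, uint32_t d)
{
    udivmod_t r = {0, 0};
    uint64_t rem = 0;   /* 64-bit so the shift cannot drop a bit when d > 2^31 */
    int i;

    if (d == 0)
        return r;       /* assumed convention: report 0 and 0 for divide by zero */

    for (i = 31; i >= 0; i--) {
        rem = (rem << 1) | ((n >> i) & 1u);
        if (rem >= d) {
            rem -= d;
            r.quot |= (uint32_t)1 << i;
        }
    }
    r.rem = (uint32_t)rem;
    return r;
}
```

A tuned routine would typically skip leading zero bits or unroll the loop, which is one way the extra bulk mentioned in this exchange can buy speed.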
<oriansj>stikonas: so we need to plumb bootstrappable? ok, how does one do that?
<stikonas>no idea...
<stikonas>but we do have quite a lot of people connecting from matrix here
<stikonas>looks like there will be a more detailed guide in the future
<stikonas>(if we even decide to convert)
<jcmdln>that will make it much harder to lurk here and pretend I understand everything
<jcmdln>s/everything/anything
<muurkha>GCC's is bulky
<muurkha>but yeah probably several times faster
<oriansj>jcmdln: no one here understands everything, but by working together, collectively everything can be understood and achieved. (we are about where I expected it would take me about 30 years of work to get to)