IRC channel logs

<stikonas>oriansj: when you have some free time (but no rush) could you please review https://github.com/oriansj/M2-Planet/pull/44

<oriansj>stikonas: not all stacks grow in the same direction

<oriansj>and the reason some grow positive and some grow negative is because that is the difference between the architectures.

<muurkha>which architectures have positive-growing stacks? maybe S/360?

<sam_>hppa, ia64 sometimes iirc

<sam_>not sure about others

<muurkha>oh neat, thanks

<muurkha>I've actually used hppa and never realized that

<muurkha>I don't think I ever used Itanic

<sam_>https://stackoverflow.com/questions/664744/what-is-the-direction-of-stack-growth-in-most-modern-systems?noredirect=1 apparently sparc can do either but convention is down (TIL)

<sam_>i swear ia64 has a weird bi-directional thing

<sam_>but yeah hppa is the big one

<muurkha>yeah on sparc it's up to the register window wraparound fault handler

<sam_>tangential but i'm interested if anyone else feels the same way

<sam_>why does nobody talk about arm be?

<sam_>like if you google it, you would think it doesn't exist

<sam_>(or barely exists, or most hw isn't capable of it, despite the fact they can)

<oriansj>not to mention a bunch of 8/16 bit processors grow their stack in the correct direction

<oriansj>68000, knight, etc

<muurkha>:)

<oriansj>and risc-v if you choose the correct instructions

<muurkha>hey, don't be dissing my bro 68000

<muurkha>he's got 32 full, thick, rich bits in every register

<oriansj>muurkha: well the split address/data registers was a bad idea but the cold-fire follow-on did fix it

<oriansj>stikonas: merged

<stikonas>oriansj: thanks

<stikonas>oriansj: well, for stacks that grow the other direction there is no need to adjust struct offsets

<muurkha>it's at least a debatable idea. a lot of processors did that, like the CDC 6600, both because commonly your addresses are an inconvenient size for data, and because it saves you an operand bit in every operand field

<stikonas>for normal stacks that grow downwards, I had to put struct at the bottom of the allocated space, so that positive struct offsets end up in the right location

<muurkha>the disadvantage is that it makes your instruction set less orthogonal and sometimes requires ugly workarounds, though I don't remember any of those on the 68k

<stikonas>but if I understand, everything that M2-Planet supports had downwards stacks
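
A minimal sketch of the placement stikonas describes, using plain offsets into an imaginary stack rather than a real one. The names `struct pair`, `reserve_down`, `reserve_up` and `field_address` are hypothetical and are not taken from M2-Planet; the point is only that on a downward-growing stack the struct base has to be the lowest address of the reserved block for positive field offsets to stay inside it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example struct; not taken from M2-Planet. */
struct pair {
    int first;
    int second;
};

/* Downward-growing stack: reserving space lowers sp, and the struct base is
 * the NEW sp, i.e. the lowest address of the reserved block. */
static size_t reserve_down(size_t *sp, size_t size)
{
    assert(*sp >= size);   /* refuse to run off the bottom of the stack */
    *sp -= size;
    return *sp;
}

/* Upward-growing stack: the struct base is the OLD sp, and sp then moves up
 * past the block, so no adjustment is needed. */
static size_t reserve_up(size_t *sp, size_t size)
{
    size_t base = *sp;
    *sp += size;
    return base;
}

/* In both cases a field lives at base plus its non-negative offset. */
static size_t field_address(size_t base, size_t offset)
{
    return base + offset;
}
```

With `sp` starting at 256, `reserve_down(&sp, sizeof(struct pair))` returns `256 - sizeof(struct pair)`, and `field_address(base, offsetof(struct pair, second))` stays below 256, inside the reserved block.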

<stikonas>anyway, struct support in M2-Planet is now much more complete

<oriansj>stikonas: I appreciate you helping to make M2-Planet a more complete C compiler ^_^

<muurkha>:)

<oriansj>the problem with stacks growing down is you either need to put your code at some address above the stack area (and below your heap area) or you need to set up guard pages (assuming your architecture supports it) to prevent your program from being altered by stack variables

<oriansj>of course if your architecture doesn't treat wrapping around the top to the bottom address (and/or bottom to top address) as an exception, it doesn't save you much either.

<muurkha>you know, it'd be nice to have hardware that traps when your stack pointer exceeds a limit

<muurkha>it's the kind of thing that's easy and cheap in hardware (assuming you have traps already and that they don't use the stack) and a pain in software

<muurkha>you could use the same kind of thing for fast generational GC

<muurkha>I mean I know currently fashionable CPUs with OoO and speculative execution don't benefit much

<oriansj>muurkha: you mean like x86 segments, well yes bounds pointers work well if your address space is too small or you lack proper virtual memory support

<muurkha>not like x86 segments, because those don't have bounds and because in the GC case you don't want to apply the same bounds to object pointers that you apply to the allocation pointer

<oriansj>so more like the Burroughs Large Systems architecture

<oriansj>'s segments?

<muurkha>well, those do at least have the bounds, but they still don't handle the other consideration I mentioned

<muurkha>let me elucidate

<muurkha>what i mean is that an open-coded pointer-bumping nursery allocator in a generational GC is typically 3-5 instructions: copy the bump-pointer register into some other register, add some compile-time constant to it, compare the new value against some bound (possibly in another register or possibly a compile-time constant) and conditionally call the garbage collector

<muurkha>(if you have exceeded the bound)
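
The allocation sequence muurkha describes can be sketched in C roughly as follows. This is an illustration under stated assumptions, not code from any real runtime: `NURSERY_BYTES`, `nursery_top`, `minor_gc` and `nursery_alloc` are hypothetical names, the 16-byte rounding is an arbitrary choice, and `minor_gc` is a stand-in that simply resets the nursery instead of copying survivors and rewriting pointers.

```c
#include <assert.h>
#include <stddef.h>

#define NURSERY_BYTES 4096   /* hypothetical nursery size */

static _Alignas(16) unsigned char nursery[NURSERY_BYTES];
static size_t nursery_top = 0;    /* the bump "pointer", kept as an offset */
static unsigned collections = 0;  /* how many times the stand-in GC ran */

/* Stand-in for a minor collection. A real collector would first copy the
 * surviving objects to the next generation and rewrite pointers to them;
 * this sketch only empties the nursery. */
static void minor_gc(void)
{
    collections++;
    nursery_top = 0;
}

static void *nursery_alloc(size_t size)
{
    assert(size <= NURSERY_BYTES - 15u);   /* keep the rounding below in range */
    size = (size + 15u) & ~(size_t)15u;    /* round up to 16 bytes */

    size_t start = nursery_top;            /* 1. copy the bump pointer */
    size_t end = start + size;             /* 2. add the object size */
    if (end > NURSERY_BYTES) {             /* 3. compare against the bound */
        minor_gc();                        /* 4. conditionally call the GC */
        start = 0;
        end = size;
    }
    nursery_top = end;
    return &nursery[start];
}
```

In compiled code the size is usually a compile-time constant, so steps 1 to 4 are the whole fast path. The compare and the conditional call are the part that a hardware bound on the allocation pointer would make unnecessary.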

<muurkha>this is the main CPU hog in hot loops in a lot of pure-functional code, because it needs to allocate new memory for every new larger-than-a-register value it computes

<oriansj>so per segment exception pointers?

<oriansj>sounds like tagged memory would be a better match

<muurkha>when the GC gets invoked it copies all the (hopefully few) surviving objects in the nursery out into the next generation, and it rewrites all the pointers to them, and possibly the pointer that was just allocated as well

<muurkha>the crucial part here is that when user code follows a pointer, it doesn't normally know whether that pointer points into the nursery or not. in fact it might change from one moment to the next if the garbage collector got invoked in between

<muurkha>now, if you had the CPU trap whenever you incremented the allocation pointer past the bound, the 3-5 instruction allocation sequence would become 2 instructions, which makes a lot of difference on an in-order processor

<oriansj>hmmm, I am not a fan of hardware garbage collectors. I see their advantages and recognize the benefits that they bring but I don't feel they are the right abstraction in hardware.

<muurkha>but you don't want to trap whenever *any* pointer into the nursery goes out of the bounds of the nursery

<muurkha>yeah, I'm not suggesting a hardware garbage collector or Burroughs-style segmentation or tagged memory

<muurkha>I'm saying that a CPU that invoked a trap handler when you bumped the allocation pointer out of the nursery would speed up the allocation fast path on in-order CPUs, which would speed up a lot of things

<muurkha>I guess you could do this with Burroughs-style segmentation if your ordinary object pointers were (segment, offset) pairs. when they were nursery objects the segment would be the nursery segment, and when the GC came around it would rewrite them to be pointers into some other segment

<oriansj>muurkha: well that is the thing, not many things should need to be reallocated

<muurkha>you mean copying is bad?

<muurkha>(but Burroughs-style segmentation is definitely not what I had in mind, even if it wouldn't be actually incompatible as I thought at first)

<oriansj>when you don't have to copy, why waste the cycles copying

<oriansj>copy-on-write is perfectly reasonable

<muurkha>well, because it turns out that programs written using generational copying collectors are usually faster than programs that do the same task written using explicit allocation and freeing

<muurkha>I mean, heap allocation. programs that do the same task with only stack allocation are even faster when they are possible

<muurkha>also, programs written using garbage collection (of whatever kind) are usually simpler than programs that use explicit heap allocation and freeing, even now that we have Rust to automate the freeing

<oriansj>well if the freeing of memory can be calculated at compile time, is it really garbage collection?

<muurkha>sometimes people call it that, but I'm talking about the cases where it can't

<muurkha>I think tagged memory and stuff is interesting too, it's just a different thing

<muurkha>the same kind of bounding thing is useful for stack segmentation and thus reducing the cost of multithreading and call/cc

<stikonas>oriansj: I suspect the next low hanging fruit for M2-Planet are local arrays, though I won't be working on them (at least for now)

<oriansj>stikonas: or array initialization

<oriansj>but M2-Planet is already well past the minimal C feature set to write clean C code and do real useful work; and now it is just a matter of slowly expanding to the level needed for any code that wants to be M2-Planet bootstrapped.

<muurkha>do you think that's a good idea? maybe it would be better to compile that other code with tcc

<muurkha>I mean every new feature is a potential backdoor, right?

<oriansj>muurkha: well at this point it isn't about what I think is a good idea or not but rather what C features other people *need*

<oriansj>and as long as it remains buildable by cc_*; then any backdoor would have to be in the very easy to reason about assembly (in M2libc) or in C code (Which is missing most of the easy to abuse C features) so it would be relatively easy to reason about.

<oriansj>and any constructs that look abusive (or at least more complex than any of the code in M2-Mesoplanet or blynn-compiler) should immediately be suspect.

2022-11-08.log