IRC channel logs

<fossy>stikonas: hmmm, --external-sources makes a lot of sense for development, but for users, we want the least system dependence by default...

<fossy>i wonder about implementing the "alternative server" thing, so it downloads sources from some kind of cache/archive, and then making --external-sources serve the sources/ directory to the bootstrap

<fossy>it would simplify quite a few things a bit if we did that because then we don't need to copy in any files

<stikonas>well ok, I can just add it manually

<stikonas>fossy: I'll have a new review for you in a few minutes

<fossy>cool

<fossy>another idea, we could have a config file for rootfs.py, where all the options we commonly use in development can go

<fossy> https://github.com/fosslinux/live-bootstrap/pull/199 Does make re-run configure for subdirectories or something? very odd if it re-runs top level configure

<stikonas>fossy: https://github.com/fosslinux/live-bootstrap/pull/200

<stikonas>fossy: I suspect it's subdirectories for 199

<stikonas>hmm

<stikonas>fossy: although there is just 1 configure there

<stikonas>only top-level

<fossy>of course autogen scripts use which

<stikonas>yeah...

<stikonas>I could do command -v wrapper and overcome it

<stikonas>but we can as well build which

<fossy>yeah, why not

<stikonas>fossy: the other thing we need is newer grep

<fossy>can you also null out/regen which.1 (although I have no idea how to make a manpage from texinfo) and makeinfo which.info? they're autogened although i don't really care about them

<stikonas>oh sure

<stikonas>yeah, I don't care super much about manpages but yes, let's rebuild it

<stikonas>or remove...

<stikonas>argh, that is not super trivial...

<stikonas>either I have to remove it pre-install

<stikonas>or fix make rule

<stikonas>make[2]: *** No rule to make target 'which.1', needed by 'all-am'. Stop.

<stikonas>oh, maybe I need maintainer mode..

<stikonas>but that one doesn't work with tarball

<stikonas>so I think I'll just rm and touch it

<stikonas>fossy: updated

<stikonas>I should probably re-run live-bootstrap from scratch to double check the hash

<stikonas>fossy: also if you have some time, it might be useful if you can also double check autogen bootstrap for pregened files

<stikonas>(I've started it myself but haven't finished it)

<fossy>stikonas: do you have a link for that? i know its in irc history but i can't find it easily

<stikonas> https://github.com/schierlm/gnu-autogen-bootstrapping/

<fossy>ah yes, that's it

<stikonas>there are two options, bootstrap.sh and bootstrap-tarball.sh

<stikonas>though for most part they do similar things

<fossy>does it start with the latest version of autogen? it seems to

<stikonas>yes

<stikonas>and replaces a bit of autogened stuff with handwritten C

<fossy>yep

<stikonas>first it builds columns binary

<stikonas>that one just uses autogen for its command line options

<stikonas>then getdefs

<stikonas>those two I checked

<stikonas>went over C files, checked what is included

<stikonas>didn't find anything more pregened

<stikonas>so that's where I stopped yesterday

<stikonas>will probably resume tomorrow to check autogen itself

<muurkha>that's great

<fossy>i see

<stikonas>fossy: if you want to run it in live-bootstrap you need which, newer grep and edit pkg-config option to include --static

<stikonas>I'll later add some env variable to those scripts

<stikonas>to let static builds

<stikonas>although we now have dynamic libguile.so too

<stikonas>other than that I think it should be fairly compatible with live-bootstrap now

<stikonas>I've fixed some musl issues two days ago

<oriansj>I may have an absolutely batshit crazy idea but I need to play more with it first

<muurkha>yay

<oriansj>this might actually end up being faster

<oriansj>say 2-3 months of effort

<fossy>what is it?

<oriansj>strip down TCC into something that can still self-host TCC but be buildable by M2-Planet

<oriansj>because, I for the life of me can't build mes.c with GCC+glibc

<fossy>it is very doable

<fossy>i made moderate progress on that about 12 months ago, however my approach wasn't particularly sound

<fossy>be warned that tcc code isn't written with the nicest constructs and there's overuse/abuse of macros

<fossy>should be a lot easier now that m2-planet is more stable

<oriansj>yeah, hence why I am going to have to do some ugly things for a few weeks before it'll be in any shape to move forward on.

<oriansj>the idea: break out the C preprocessor into a separate program. Rip out the assembler and linker into separate programs as well

<oriansj>Then convert those 4 simpler programs to M2-Planet's subset

<muurkha>running it as a separate thread/process might simplify the control flow

<muurkha>and text pipes ease debugging somewhat

<muurkha>the downside is that serializing and deserializing adds some complexity to passing data around

<oriansj>yep

<muurkha>probably not saying anything you haven't thought a hundred times already

<oriansj>and if I do specializers the changing out of assembly back ends becomes trivial

<muurkha>that sounds optimistic

<oriansj>not really, just insanely inefficient

<oriansj>I also have the advantage of being willing to throw all cross-platform functionality in the trash

<oriansj>and willing to ditch optimizations

<muurkha>yeah, that frees you up a lot

<oriansj>and if I output platform-neutral ISA instructions, then I can write a specializer in M2-Planet to convert that to M1 instructions, and then I can write a second specializer that outputs a more advanced assembly after we bootstrap that

<oriansj>then all of this ELF crap can be ripped out of the C compiler

<oriansj>because that definitely never belonged in there

<oriansj>(in one's linker, sure) (in one's assembler, maybe) (but compiler? nope)

<oriansj>and take out dynamic linking support while I am at it

<oriansj>and remove the need for floating point support to compile C code

<oriansj>and tear out all non-deterministic bits

<oriansj>and just ditch the error recovery logic as fail fast works better for bootstrapping
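
The specializer idea above (platform-neutral instructions in, target assembly out) can be sketched as one template table per backend. This is only an illustration: the neutral opcodes, the register names, and both sets of output mnemonics are made up for the example and are not actual M1 or M2-Planet syntax, and `specialize` is a hypothetical name.

```python
# Hypothetical platform-neutral program: (op, dst, a, b) tuples.
# None of these opcode spellings come from M1 or M2-Planet.
NEUTRAL_PROGRAM = [
    ("LOADI", "r0", 2, None),
    ("LOADI", "r1", 3, None),
    ("ADD", "r0", "r0", "r1"),
    ("RET", "r0", None, None),
]

# One template table per backend (both invented for illustration).
# Changing the target means passing a different table.
BACKEND_A = {
    "LOADI": "LOAD_IMMEDIATE {dst} {a}",
    "ADD": "ADD {dst} {a} {b}",
    "RET": "RETURN {dst}",
}
BACKEND_B = {
    "LOADI": "li {dst}, {a}",
    "ADD": "add {dst}, {a}, {b}",
    "RET": "ret {dst}",
}


def specialize(program, templates):
    """Turn neutral instructions into target assembly lines."""
    lines = []
    for op, dst, a, b in program:
        if op not in templates:
            # Fail fast instead of attempting error recovery.
            raise ValueError(f"no template for opcode {op!r}")
        lines.append(templates[op].format(dst=dst, a=a, b=b))
    return lines


if __name__ == "__main__":
    for line in specialize(NEUTRAL_PROGRAM, BACKEND_A):
        print(line)
```

The table lookup is slow and produces naive output, which matches the "not really, just insanely inefficient" trade-off mentioned in the discussion.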

<muurkha>sounds like a good direction to go in

<muurkha>if doing it all at once turns out to bog you down, you might try making the changes more incrementally

<muurkha>it might turn out that some aspects of the plan are good while others are not

<oriansj>well right now I am just cleaning up the TCC build to a minimal macro form

<muurkha>(though they all sound good to me at this level of detail)

<oriansj>and honestly I expect to fail a shitload

<oriansj>the biggest problem is finding time to write tests

<oriansj>as introducing bugs is very easy to do and a full and proper test suite doesn't quite exist in TCC

<muurkha>have you looked at Hypothesis?

<muurkha>I found it improved my testing cost/benefit ratio a lot

<muurkha>DRMaciver wrote a pedagogical version that's easier to clone in other languages called Minithesis if you don't like Python

<muurkha>Minithesis is 472 lines of Python

<oriansj>muurkha: I haven't seen it yet, nor heard of it as a program

<oriansj>but how would it integrate with C programming?

<muurkha>well, for example, in https://github.com/kragen/dumpulse, I wrote my tests in Hypothesis

<muurkha>but the implementation was in C

<muurkha>so I compiled the implementation to a shared library and wrote https://github.com/kragen/dumpulse/blob/master/server.py to test the shared library using Python's cffi

<muurkha>but a different alternative would be to use Minithesis as the model for a property-based testing library that's actually written in C

<muurkha>and then write the tests in C
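
As a rough illustration of the property-based style that Hypothesis and Minithesis support, here is a stdlib-only sketch. It does not use either library's API; `check_property`, `shrink_list` and `insertion_sort` are hypothetical names, and in the setup muurkha describes the function under test would instead be a C function loaded from a shared library.

```python
import random


def insertion_sort(xs):
    """Stand-in for the implementation under test."""
    out = []
    for x in xs:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out


def shrink_list(xs):
    """Yield smaller candidates by dropping one element at a time."""
    for i in range(len(xs)):
        yield xs[:i] + xs[i + 1:]


def check_property(prop, runs=200, seed=0):
    """Return None if prop held on every generated list,
    otherwise a shrunk counterexample."""
    rng = random.Random(seed)
    for _ in range(runs):
        xs = [rng.randint(-50, 50) for _ in range(rng.randint(0, 8))]
        if prop(xs):
            continue
        # Greedy shrinking: keep any smaller input that still fails.
        shrunk = True
        while shrunk:
            shrunk = False
            for candidate in shrink_list(xs):
                if not prop(candidate):
                    xs = candidate
                    shrunk = True
                    break
        return xs
    return None


def sort_matches_builtin(xs):
    return insertion_sort(xs) == sorted(xs)


def never_has_duplicates(xs):
    # Deliberately false property, to show what shrinking reports.
    return len(set(xs)) == len(xs)


if __name__ == "__main__":
    print(check_property(sort_matches_builtin))
    print(check_property(never_has_duplicates))
```

The second property is wrong on purpose: random generation finds a list with a repeated value, and the shrinker reduces it to a two-element list, which is the kind of small counterexample that makes failures easy to read.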

<oriansj>I'm fine with python, I just would need to get up to speed before I could be productive with it

<muurkha>yeah, I meant that you might feel that the rigmarole with cffi and shared libraries was a bit much (though it sure paid off in the case of Dumpulse)

<muurkha>the Hypothesis docs are pretty good: https://hypothesis.readthedocs.io/en/latest/quickstart.html

<oriansj>I don't deny that. I'm just thinking of all the C language lawyer style tests

<muurkha>yeah, there's definitely a real difficulty in knowing when a C compiler's output is correct

<muurkha>amusingly I just ran across a slide in one of DRMaciver's talks about that: https://drmaciver.github.io/hypothesis-talks/hypothesis-ipr0gram.html#/9

<muurkha>where someone found a bug in the CompCert proven-correct C compiler

<oriansj>well looking at the 23,619 lines of C that make up TCC. I must say; minus a few things done for linking and binary generation. everything else (minus a couple switch statements) are supported in M2-Planet

<oriansj>So if I can just break that bit off, then TCC would be rather quickly ported to M2-Planet

<muurkha>nice

<muurkha>what's the hair with linking and binary generation?

<oriansj>bitslices are a big chunk of it

<oriansj>and manually clearing out the macros is gonna eat a shitload of time

<muurkha>oh, like struct foo { int bar: 7; int baz: 3; };

<muurkha>?

<oriansj>mostly for assembly to bits
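
"Assembly to bits" means packing small fields into instruction bytes. Below is a sketch of the shift-and-mask style that can stand in for C bitfields, using the x86 ModRM byte layout (2-bit mod, 3-bit reg, 3-bit r/m) as the example. The helper names are hypothetical and this is not code taken from TCC.

```python
def pack_modrm(mod, reg, rm):
    """Pack the three ModRM fields into one byte: mod[7:6] reg[5:3] rm[2:0]."""
    if not (0 <= mod < 4 and 0 <= reg < 8 and 0 <= rm < 8):
        raise ValueError("field out of range")
    return (mod << 6) | (reg << 3) | rm


def unpack_modrm(byte):
    """Split a ModRM byte back into (mod, reg, rm)."""
    return (byte >> 6) & 0x3, (byte >> 3) & 0x7, byte & 0x7


if __name__ == "__main__":
    encoded = pack_modrm(3, 0, 3)
    print(hex(encoded), unpack_modrm(encoded))
```

Explicit shifts and masks need no bitfield support from the compiler building the assembler, which is the point when the target is a restricted C subset.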

<stikonas>fossy: so are you happy now with "which" PR?

<stikonas>should I merge it?

<stikonas>or maybe I should also remove an empty file which.1 post-install

<doras>stikonas, fossy, the `i386-linux-musl` PR is up: https://github.com/fosslinux/live-bootstrap/pull/201

<stikonas>doras: I'm looking at your previous PR, in particular https://github.com/fosslinux/live-bootstrap/pull/197/commits/1918b12c614d1a931e31d6c802e6283a3004c1ee

<stikonas>I did rely a bit on those variables for development (to re-run stuff)

<stikonas>maybe we can save them to some file

<stikonas>so then one can source them easily

<stikonas>I'll mention it on PR...

<doras>stikonas: I also wasn't sure how to handle this best. The issue is that we can be dropped to bash at any point due to a build failure.

<stikonas>well, we have that trap, you can also add some handling there if you want

<doras>stikonas: maybe we could print them as part of the trap?

<doras>I mean, having them in the environment is a bad idea regardless, even for build re-runs.

<stikonas>yeah, printing them makes sense too

<stikonas>although we can do both

<stikonas>both print and save it into some file

<stikonas>ok, just some minor comments for https://github.com/fosslinux/live-bootstrap/pull/197 looks good in general

<doras>stikonas: it would have been best to have each package build script load them individually from a file in its own shell execution.

<stikonas>doras: but how can we achieve this?

<stikonas>some variables are used outside build script

<stikonas>and build script does not run in its own shell

<stikonas>because subshells are initially a bit broken

<stikonas>I think subshell itself works

<stikonas>but traps don't work

<stikonas>possibly due to meslibc bugs

<doras>Why are traps important for us?

<doras>Can't we simply read status code of the bash execution?

<stikonas>hmm, I might be misremembering things a bit

<stikonas>but something was not working properly

<stikonas>you can try to edit things and see if first bash still works

<stikonas>but that's probably another big PR anyway

<doras>Hmmm...

<stikonas>when I was writing helpers.sh, my original plan was to have subshells for each package

<stikonas>but something was not working

<stikonas>I suspect it's due to bash using meslibc

<stikonas>which might go away at some point

<stikonas>(i.e. if we get gash working)

<stikonas>oriansj: can you pull in https://github.com/oriansj/bootstrap-seeds/pull/32 ?

<oriansj>stikonas: merged

<oriansj>and now I see why Fabrice Bellard stopped working on TCC

<oriansj>one would have to do some serious feature regressions to break it up into something easy to maintain.

<stikonas>hmm, that's a bit unfortunate...

<stikonas>well, M2-Planet -> tcc bootstrapping steps were a bit rough for some time now

<oriansj>yeah, that is my fault. I didn't estimate correctly the effort required for a spawned process.

<stikonas[m]>Not really your fault, it's just that gap between simple C compiler and real world compiler is fairly big

<oriansj>it is just a 5x between TCC and M2-Planet in terms of lines of code

<oriansj>and if I can break the pieces out; each might only be 2x

<stikonas[m]>Well, you only need C to M2 compiler

<stikonas[m]>Can even drop assembly support

<oriansj>M2 is C's core so it is possible to do (like a C99 to C89 compiler)

<stikonas[m]>If that's simpler...

<stikonas[m]>Which it might be

<oriansj>I clearly always go the harder route

<oriansj>so this will be a big fiasco but at least it'll be fun and educational

<oriansj>especially given that TCC seems to break real hard when you comment out dlopen

<stikonas>strange, why would tcc depend on dlopen...

<stikonas>I thought it can work completely statically

***Andrew is now known as WaxCPU

<oriansj>maybe a build flag for TCC I needed to set

<oriansj>but it looks like every time you hit a #include, it dlopen's it and uses dlsym to extract names

<oriansj>it is inside of a #ifdef TCC_IS_NATIVE block in libtcc.c

<oriansj>and the dlsym is in tccelf.c inside of a #if defined TCC_IS_NATIVE && !defined TCC_TARGET_PE

<oriansj>which is why TCC needs compiled libraries and doesn't use pure source libraries

<muurkha>it dlopens what, the .h file?

<muurkha>I don't see how that could work

<muurkha>dlsym is a pretty good way to extract names from a library, too bad libdl can't read stabs

<oriansj>muurkha: feel free to explain: https://paste.debian.net/1254199/

<stikonas>oriansj: does it still fail if you don't use libraries?

<stikonas>we only need to build tcc itself

<stikonas>or is it also the problem with libc

<muurkha>stddef.h seems to be tcc-$version/include/stddef.h

<muurkha>which mostly defines types like size_t and int64_t, although it does also declare alloca()

<oriansj>stikonas: well -E will be successful but doing tcc blah.c -o blah will error out with: tcc: error: file 'crt1.o' not found

<stikonas>that is a strange design choice...

<oriansj>but now we know even more about TCC

<oriansj>and the logic for compiling seems to be mixed in with the parsing.

<muurkha>yeah, that's one of the main design features of TCC

<muurkha>and also, say, Wirth's one-pass compiler for Oberon

<muurkha>or Crenshaw's series

<oriansj>hmmm.

<muurkha>I think it really hurts readability but it does require less code, especially in an environment without GC

<muurkha>though if you're running a compiler on a machine with 4 gibibytes of RAM I don't know why you need to free things. they'll get freed anyway when the compile finishes

<oriansj>gcc is broken up into a preprocessor, c compiler, assembler and separate linker right?

<muurkha>I don't think the preprocessor is actually separate from the compiler since GCC 2

<muurkha>previously it was called cccp

<stikonas>but assembler and linker are in binutils...

<oriansj>can GCC 2 build GCC 4?

<muurkha>though you can get the preprocessor output with gcc -E, I think it's actually the same executable doing the work (cc1? I forget)

<stikonas>well, combining preprocessor and compiler lets you avoid doing tokenization twice...

<muurkha>I haven't tried but I wouldn't be surprised. but I think maybe you meant "can GCC 1 build GCC 4?" and I think the answer is probably not

<stikonas>oriansj: guix starts with gcc 2.95, so you can check what is the next step there

<stikonas>how hard would it be to upgrade M2-Mesoplanet to have full C99 preprocessor?

<oriansj>muurkha: no, I am thinking of IF GCC is 4 independent programs (or could with reasonable effort be made so); there may be a route to building GCC directly from M2-Planet

<stikonas>I think we already support quite a bit

<aggi>minor note, TCC is missing one feature to compile recent linux kernel: an equivalent to gcc -S (to emit pre-generated assembly)

<oriansj>stikonas: well a handful of tweaks in the parser and then we would need to add a combine stage to fix the breakup between - and = into -=
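
A sketch of what such a combine stage could look like, assuming a tokenizer that emits every punctuation character as its own token and records whether it directly followed the previous token with no whitespace in between. The token representation and the `combine` name are assumptions for illustration, not M2-Mesoplanet's actual data structures.

```python
# Multi-character operators to rebuild. Three-character operators are
# reached by first merging two characters and then merging the third.
COMPOUND = {
    "+=", "-=", "*=", "/=", "%=", "&=", "|=", "^=",
    "==", "!=", "<=", ">=", "&&", "||", "++", "--",
    "->", "<<", ">>", "<<=", ">>=",
}


def combine(tokens):
    """Merge adjacent punctuation tokens into compound operators.

    tokens is a list of (text, glued) pairs, where glued is True when
    there was no whitespace between this token and the previous one.
    """
    out = []
    for text, glued in tokens:
        if out and glued and out[-1][0] + text in COMPOUND:
            prev_text, prev_glued = out.pop()
            out.append((prev_text + text, prev_glued))
        else:
            out.append((text, glued))
    return out


if __name__ == "__main__":
    # a -= b
    print(combine([("a", False), ("-", False), ("=", True), ("b", False)]))
    # a - -b  (must stay as two separate minus tokens)
    print(combine([("a", False), ("-", False), ("-", False), ("b", True)]))
```

Tracking the glued flag matters because `a - -b` and `a--b` contain the same characters but must tokenize differently.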

<stikonas>gcc 4.0.4 already combines everything into cc1

<stikonas>but I think C99 preprocessor is an easier task anyway

<oriansj>and then we can tweak the output to be more trivial to compile

<stikonas>at the very least it should be trivial to retokenize

<muurkha>oriansj: I think from the point of view of being 4 independent programs or not, GCC 2 is the same as GCC 4 or GCC 10

<oriansj>but does GCC combine tokenization and the compiling

<stikonas>I think tokenization is done by preprocessor

<stikonas>7a35239a2ad2f39220daa888650dbc44ba4a5664856997d2afb6165d305a8f82

<stikonas>argh, wrong paste

<stikonas> https://gcc.gnu.org/onlinedocs/cpp/Tokenization.html#Tokenization

<aggi>and they're talking about a separation into compiler-backends (for different architectures), and frontends (for different languages to parse/tokenize)

<aggi>TCC doesn't need different language frontends, it does C-lang only

<stikonas>well, it's more efficient to have different backends and frontends

<aggi>in between those language-frontends and backends, an intermediate representation exists, not sure how it was called

<ekaitz>oriansj: maybe this helps: https://ekaitz.elenq.tech/bootstrapGcc1.html

<stikonas>that way you only have to write the really hard bit (optimizer) once

<ekaitz>there are several ir-s in GCC

<ekaitz>one is called GIMPLE and other is called RTL

<ekaitz>RTL is target specific and GIMPLE is not

<ekaitz> https://www.cse.iitb.ac.in/grc/index.php?page=videos <-- you can learn about those here

<oriansj>ekaitz: oh, I am skipping absolutely ALL optimizations

<muurkha>stikonas: some optimizations are target-dependent

<aggi>ekaitz: interesting, however, my plan is to avoid GNU-toolchain (gcc,binutils) entirely, and use TCC only

<oriansj>so we only need the bit that reads the C code and outputs the IR

<ekaitz>it's a little bit hard to understand that there are GIMPLE and RTL IRs in GCC because normally we think about the IR as an AST only, but GCC uses a different compiler architecture than we are used to

<stikonas>still, it's only some of them. Otherwise if you have n languages and m arches, you would have to write n * m compilers if a compiler could only target one arch
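
The counting argument in concrete numbers, with arbitrarily chosen values for the number of languages and architectures. The function name is hypothetical, and the model ignores the target-dependent optimizations mentioned earlier.

```python
def compilers_needed(languages, arches, shared_ir):
    """Count the pieces to write for full language/arch coverage."""
    if shared_ir:
        # One frontend per language plus one backend per architecture.
        return languages + arches
    # One complete compiler per (language, architecture) pair.
    return languages * arches


if __name__ == "__main__":
    print(compilers_needed(5, 7, shared_ir=False))
    print(compilers_needed(5, 7, shared_ir=True))
```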

<ekaitz>in GCC the IR you are talking about is RTL, and that's not useful for you because it's target specific

<stikonas>anyway, we don't do optimizations in bootstrap at all

<muurkha>ekaitz: I think there are a lot of IRs that aren't ASTs

<muurkha>also didn't GCC add a third IR a few years ago?

<ekaitz>muurkha: yes, of course, but it's what many people have in mind when they think about compilers

<ekaitz>muurkha: I'm not sure about that, what I know is they started to focus more on the GIMPLE and added GIMPLE optimizations (tree-level optimizations)

<ekaitz>GCC is weird because they optimize the tree and later optimize the RTL

<ekaitz>the conversion between the GIMPLE and the RTL happens in a very simple level, but then the RTL is matched against some templates that then generate the assembly

<ekaitz>(it's superhard to explain btw, please don't kill me)

*muurkha hugs ekaitz

<ekaitz>(the link I posted is the best I can do)

<ekaitz>muurkha: thanks for the understanding :))

*oriansj hugs ekaitz

<ekaitz>so, about the language frontends: they generate a GIMPLE or a similar representation that is later converted to GIMPLE by the next step

<ekaitz>oriansj: :D

<ekaitz>if you want to see how gimple works, the gcc internals video series I shared spends a lot of time on that

<ekaitz>i'm not the best to explain that step, because I only worked on a backend, so I was focused on the RTL only

<oriansj>ok anyone know a way to extract the bit in C that just does the C code to gimple output

<ekaitz>oriansj: that should be the language frontend, which is pretty much independent

<ekaitz>oriansj: see gcc/c-parser.c

<ekaitz>or better gcc/c-*

<ekaitz>if you want to see how is that process made maybe the fortran frontend is easier to read: it's in the gcc/fortran/ folder

<muurkha>is there a command line option to GCC to dump out the GIMPLE representation?

<ekaitz>muurkha: yes

<ekaitz>muurkha: there are many but I can't remember

<ekaitz>muurkha: you can dump all the GIMPLE optimization steps

<ekaitz>muurkha: -fdump-tree*

<ekaitz>or maybe I'm wrong :S

<stikonas>oriansj: I wonder if this supports C99... https://github.com/logological/gpp/

<ekaitz>it was correct, the format is the following `-fdump-<ir>-<passname>` with that you can choose

<ekaitz>if you do `-fdump-tree-all` you'll see the GIMPLE is kinda complex and has many optimization passes

<ekaitz>also oriansj scrolling back... yes, the preprocessor I think it's an independent program: cpp

<ekaitz>but all this is kind of dangerous to extract directly from the codebase because some files are generated by the build process so be careful with that

<muurkha>hmm, I do have a separate cpp executable!

<muurkha>so maybe I was just totally wrong about that?

<muurkha>is it used in normal compilation, the way cc1 and ld are?

<ekaitz>muurkha: yes, it's a separate program, you can even run it separately and see its output

<ekaitz>also I don't think cpp does really a tokenization... it only works around the preprocessor directives I think

<ekaitz>muurkha: yes, that's what is used during the normal compilation

<ekaitz>I think you can `gcc -v ` and see all the internal calls to other programs

<muurkha>strace says it's running gcc, cc1, as, collect2, and ld, but not cpp

<muurkha>strace -ff -o hellocompile gcc hello.c

<muurkha>grep exec hellocompile.*

<ekaitz>try with `gcc -v hello.c`

<muurkha>I did but it didn't tell me anything it was running

<ekaitz>hmmm

<ekaitz>let me try

<muurkha>oh it did mention as

<muurkha>and cc1

<muurkha>I just missed it amid all the noise

<muurkha>oh, and it mentioned collect2 as well

<unmatched-paren>there's https://github.com/h8liu/mcpp

<muurkha>gcc -v doesn't seem to mention ld (or the gcc driver executable itself)

<muurkha>this is on GCC 4.7 FWIW

<ekaitz>muurkha: try with -no-integrated-cpp

<ekaitz>that way it will call a separate cpp

<ekaitz>if you read that option in the man page it says by default gcc does the preprocessor in the same tokenization step, but you can make it call it independently

<ekaitz>I didn't know this but, hey! here we are to learn together

<muurkha>that's awesome!

<ekaitz>muurkha: ld is called by collect2

<ekaitz>that's why you can see it in the strace but not from the -v

<muurkha>oddly enough in that case instead of invoking the cpp executable it invokes cc1 -E to do the preprocessing

<muurkha>yeah, I thought the ld thing might be something like that

<ekaitz>the collect2 i think also calls the LTO and stuff like that, that's why we need a different program not just the ld call

<ekaitz>muurkha: cc1 -E... interesting

<ekaitz>muurkha: i think cpp also calls cc1

<ekaitz>give it a try with strace

<muurkha>oh hey, you're right, cpp is a wrapper around cc1

<muurkha>why does it have to be 600K then!?

<oriansj>oh my https://gcc.gnu.org/onlinedocs/gccint/GIMPLE-instruction-set.html#GIMPLE-instruction-set GIMPLE isn't that far off from what M2-Planet has internally

<muurkha>GIMPLE_PHI is presumably the operation for unifying two different assignments on different control paths so you can have SSA?

<muurkha>there's an overview in https://gcc.gnu.org/onlinedocs/gccint/GIMPLE.html

<ekaitz>muurkha: I shared a link before from the university of Bombay, they explain it really well too

<muurkha>yeah but it was videos, right? that's what the URL said

<muurkha>so I didn't follow it

<muurkha>by contrast that doc overview page is 500 words, you can read it in a minute and a half

<oriansj>(OP R_out R_in1 R_in2)

<oriansj>I can imagine looking at GIMPLE and GENERIC is C to S-expressions

<ekaitz>muurkha: the difference is the docs are... not really clarifying :)

<ekaitz>oriansj: kinda, but not really... it's more like some kind of assembly with three-way instructions

<oriansj>ekaitz: (jne :label R0 R1)

<ekaitz>oriansj: haha i see what you mean

<ekaitz>if you output the gimple to a file it's still looking like a C subset or something

<ekaitz>but yeah, it's a three-way thingie
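
To make the "three-way" shape concrete, here is a sketch that flattens a nested expression into (OP R_out R_in1 R_in2) tuples with fresh temporaries. It only illustrates the form; it is not GIMPLE's real syntax or API, and the `lower` function and the temporary naming are invented for the example.

```python
import itertools


def lower(expr, code, temps):
    """Flatten a nested expression into three-address tuples.

    expr is either a variable name (str) or a tuple (op, left, right).
    Appends (op, out, in1, in2) tuples to code and returns the name
    that holds the result.
    """
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    a = lower(left, code, temps)
    b = lower(right, code, temps)
    out = f"t{next(temps)}"
    code.append((op, out, a, b))
    return out


if __name__ == "__main__":
    code = []
    # Corresponds to a*b + c*d
    result = lower(("add", ("mul", "a", "b"), ("mul", "c", "d")),
                   code, itertools.count())
    for instruction in code:
        print(instruction)
    print("result in", result)
```

Each tuple has one operator, one output and two inputs, so a later stage can translate it line by line without looking at the surrounding expression tree.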

<oriansj>see lisp originally mapped straight to assembly; (add a b) => (RETURN (load R0 :a) (load R1 :b)(add R0 R1))

<muurkha>that's sort of a decompilation of the gimple though

<oriansj>this is all giving me ideas

<muurkha>:)

<ekaitz>oriansj: if I can give you more ideas... just squeeze me

<oriansj>I am starting to think we have been doing C compilers the way that makes things harder than needed.

<ekaitz>oriansj: specifically GCC is REALLY complex

<ekaitz>need for speedTM you know

<ekaitz>C itself is not that complex to need this amount of code... GCC has millions of lines split in thousands of files

<oriansj>well GCC is overly engineered to enable chasing every possible optimization.

<oriansj>some of that engineering is a really good idea.

<ekaitz>yeah but the basics are overcomplicated

<ekaitz>i think a C nanopass compiler is possible

<oriansj>slightly but not in a way that can't be cribbed off in a productive fashion.

<ekaitz>and it could be enough

<oriansj>ekaitz: my thoughts exactly

<ekaitz>:)

<oriansj>so C-orchestrator spawns c-preprocessor (reads all sources and dumps a single source file in a form easiest to parse), c-compiler (reads dump and produces an IR dump), c-optimizer (skipped but easy to add), c-specializer (converts IR to target assembly), machine-optimizer (skipped but easy to add), assembler (converts assembly to object code), link-time-optimizer (skipped but easy to add), linker (converts object code to final binary)

<oriansj>everything after the c-compiler becomes sharable by different programming languages
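
The staged layout above can be sketched as a chain of text-in, text-out steps with the optional optimizers skipped by default. Every stage body here is a placeholder that just wraps its input in the stage name. The stage names follow oriansj's list, while `orchestrate`, `make_stage` and the placeholder behaviour are assumptions for illustration. In the design described, each stage would be a separately spawned program rather than a function.

```python
# Stage order as listed in the discussion.
ORDER = [
    "c_preprocessor", "c_compiler", "c_optimizer", "c_specializer",
    "machine_optimizer", "assembler", "link_time_optimizer", "linker",
]
# Stages described as "skipped but easy to add".
OPTIONAL = {"c_optimizer", "machine_optimizer", "link_time_optimizer"}


def make_stage(name):
    """Build a placeholder stage that tags its input with the stage name."""
    def stage(text):
        return f"{name}({text})"
    return stage


STAGES = {name: make_stage(name) for name in ORDER}


def orchestrate(source, enabled_optional=()):
    """Run the stages in order, returning the final text and the stages run."""
    data = source
    trace = []
    for name in ORDER:
        if name in OPTIONAL and name not in enabled_optional:
            continue
        data = STAGES[name](data)
        trace.append(name)
    return data, trace


if __name__ == "__main__":
    output, trace = orchestrate("src")
    print(output)
    print(trace)
```

Because every stage has the same text-to-text shape, adding an optimizer later is a matter of enabling one more entry in the chain.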

<oriansj>hmmmm

<oriansj>and we know all the C keywords and tokens that TCC supported, so once we support those correctly we should be able to Compile GCC too

<ekaitz>oriansj: the c-specializer is done in two steps that are pretty separate actually

<ekaitz>the gimple opcodes are matched in a hardcoded table to a generic RTL template

<ekaitz>then the RTL is matched to the machine specific RTLs

<ekaitz>so the RTL is not just a target related thing, it serves two purposes

<ekaitz>when you write a new backend you don't need to write how is the GIMPLE matched to the RTL, but you write how the RTL is converted to assembly

<ekaitz>and also you give some info about the machine so the RTL is generated with that in mind

<ekaitz>it's hard to explain

<ekaitz>but here the real IR is the RTL, but the GIMPLE started to gain some importance in the process some years ago

<ekaitz>in the videos of the university of Bombay they explain it correctly

<ekaitz>GCC follows an architecture that is not very common

<ekaitz>there are two ways to make compilers: one that separates the backend from the frontend in a target independent IR and optimizes on top of that one

<ekaitz>and the other that has a target dependent IR and optimizes on that

<ekaitz>the first is the one that is usually taught in books like the dragon book and stuff like that

<ekaitz>but GCC follows the second

<ekaitz>so GIMPLE is not really the main IR in GCC (but it has become more important last years)

<oriansj>if you can find a link to those videos you mentioned, please share so I can take a look when I get a few free minutes

<ekaitz>the technical word i was looking for: GCC follows the Davidson Fraser Model

<ekaitz>in contrast to the Aho Ullman model

<oriansj>as I am planning on using as many good ideas as I can

<ekaitz> https://www.cse.iitb.ac.in/grc/index.php?page=videos

<ekaitz>those are the videos

<oriansj>thank you

<ekaitz>so the thing is GCC uses an expander that takes the GIMPLE and expands the code to register transfers, then optimizes that and uses a pattern recognizer to generate the assembly

<ekaitz>while the aho ullman style compilers generate an AST, optimize it and finally generate the target code

<ekaitz>of course, the aho ullman compilers can have some peephole optimizations in the target code too, but they don't normally have many

<ekaitz>in the case of gcc, they started adding tree-level optimizations later

<ekaitz>so now it's kind of a mixed approach

<ekaitz>(sorry for the supermegalong explanations, I got excited, I normally don't have the chance to talk about these things...)

*ekaitz feels alone sometimes

<oriansj>me hugs ekaitz

*oriansj hugs ekaitz

<oriansj>you are with friends here

<oriansj>and if you ever need to talk to someone feel free to give me a call on Signal

<ekaitz>:)) thank you

<oriansj>and you should have that number as you are a friend after all

<ekaitz>oriansj: don't worry mate I don't use signal either, i'm fine with this IRC chat :)

<oriansj>I'm also on matrix for people who prefer different communication methods

<muurkha>oriansj: part of GCC being overcomplicated is a result of GCC being 35 years old

<ekaitz>muurkha: I don't think that justifies everything

<muurkha>I'm pretty sure it contains decisions that we've known were a bad idea for 25 years but have been too much work to change

<ekaitz>I'm almost 35 and I'm not that complex after all LOL

<ekaitz>the Davidson Fraser Model was one of those weird decisions they are regretting now

<ekaitz>there were some efforts to remove RTL too

<ekaitz>and use other kind of representation...

<muurkha>oriansj: you might want to read one of the early papers on Steve Johnson's pcc compiler, which was split into a front end and back end that communicated through a byte stream

<muurkha>more recent versions of pcc don't work that way

<ekaitz>but that's not the only reason... GCC is also a Compiler Generation Framework, and that makes the thing really complex

<muurkha>I think it's pretty common for compilers to have both a target-independent IR and a target-dependent one. some even have a much larger number of IRs

<muurkha>ekaitz: I'm not saying it justifies things; there are probably also mistakes in GCC's design that someone who wasn't writing their first compiler would have avoided

<muurkha>although I'm not sure what they are

<ekaitz>muurkha: GCC was built by people doing research so I don't think it has novice mistakes

<ekaitz>It's a really good piece of software

<ekaitz>it supports hundreds of architectures

<ekaitz>it's easy to port

<ekaitz>it's powerful...

<ekaitz>but in the end it's impossible to read by an individual

<oriansj>the only weakness is that it isn't easy to bootstrap

<stikonas>well, M2-Planet is even easier to port, supports a few (7 or so) architectures, fairly easy to read by an individual but not that powerful...

<oriansj>stikonas: actually it could have been even easier to port if I was smarter about it and did IR output and architecture specializers but it would have been harder to do cc_* support

<oriansj>so I guess I have a top down plan for M3 instead of just the bottom up one I was previously planning

<muurkha>ekaitz: no, Stallman wasn't doing research and he was a novice at compilers, though he had a quarter of a century of experience programming

<muurkha>Cygnus wasn't a research venture either

<muurkha>though it did have people with compiler experience

<aggi>binutils and assembly, are a headache, to detangle from GNU too

<oriansj>aggi: I think I can solve that

<oriansj>and give you a better than TCC compiler

<aggi>oriansj: my focus is the _entire_ system integration, and for this i use tcc-toolchain simply because it is possible to do it with

<aggi>and this reveals problems, which will be hit by any other approach, when detangling GNU-toolchain and GNU-buildsystem

<aggi>hence, even when you prefer another compiler (cproc), assembler whatever, it will be much easier, if any distro removed _all_ c++ dependencies already

<aggi>and if such a distro did pass with tcc-toolchain already too

<muurkha>cproc?

<aggi>that's another alternative compiler, didn't test this one yet

<aggi>anyhow, the rationale with the system integration approach of mine, covers different aspects than bootstrappable ones too

<muurkha>ah, hadn't heard of it

<aggi>including full removal of GNU-toolchain/buildsystem, removal of bashism, no-c++, etc.

<ekaitz>muurkha: cproc is really interesting, it uses qbe backend

<aggi>suckless or toybox userspace integration, removal of GNU make replaced with POSIX make

<aggi>POSIX-shell only, no bashism

<ekaitz>aggi: but posix is not a really well defined standard either

<ekaitz>what to do with the kernel api?

<muurkha>BSD make?

<ekaitz>posix is old-ish

<muurkha>or heirloom-tools make?

<aggi>if possible, not GNU make, i'll try this one https://frippery.org/make/

<aggi>with kernel, the plan is ROLL-BACK to linux-2.4, for various reasons

<aggi>this won't be POSIX-complete, however, with GNU, it is their extensions beyond POSIX which are a trouble-source

<muurkha>dunno, when I spent all day hacking makefiles I considered the lack of extensions beyond POSIX to be a trouble-source!

<muurkha>ekaitz: qbe is interesting but I haven't tried using it

<aggi>muurkha: suckless.org Makefiles tend to be simpler than the typical GNU makefiles (and even worse than what autoconf/automake excrete)

<aggi>toybox passed with tcc-toolchain already (except wget widget), not sure yet what trouble awaits when switching to POSIX make

<aggi>that's why, i want to do it

<aggi>and i won't shed that many tears, if various software is sacrificed due to this type of problems

2022-09-17.log