IRC channel logs

2022-09-17.log

<fossy>stikonas: hmmm, --external-sources makes a lot of sense for development, but for users, we want the least system dependence by default...
<fossy>i wonder whether to implement the "alternative server" thing, so it downloads sources from some kind of cache/archive, and then make --external-sources serve the sources/ directory to the bootstrap
<fossy>it would simplify quite a few things if we did that, because then we don't need to copy in any files
<stikonas>well ok, I can just add it manually
<stikonas>fossy: I'll have a new review for you in a few minutes
<fossy>cool
<fossy>another idea, we could have a config file for rootfs.py, where all the options we commonly use in development can go
<fossy> https://github.com/fosslinux/live-bootstrap/pull/199 Does make re-run configure for subdirectories or something? very odd if it re-runs top level configure
<stikonas>fossy: https://github.com/fosslinux/live-bootstrap/pull/200
<stikonas>fossy: I suspect it's subdirectories for 199
<stikonas>hmm
<stikonas>fossy: although there is just 1 configure there
<stikonas>only top-level
<fossy>of course autogen scripts use which
<stikonas>yeah...
<stikonas>I could do command -v wrapper and overcome it
<stikonas>but we might as well build which
<fossy>yeah, why not
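
A minimal sketch of the `command -v` wrapper stikonas mentions, purely illustrative, since the thread ends up building GNU which instead; the script name and behaviour details are assumptions.

    #!/bin/sh
    # Hypothetical stand-in for which(1) built on the POSIX builtin `command -v`.
    # Note: for shell builtins, `command -v` prints the name rather than a path.
    rc=0
    for prog in "$@"; do
        command -v "$prog" || rc=1
    done
    exit $rc
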
<stikonas>fossy: the other thing we need is newer grep
<fossy>can you also null out/regen which.1 (although I have no idea how to make a manpage from texinfo) and makeinfo which.info? they're autogened although i don't really care about them
<stikonas>oh sure
<stikonas>yeah, I don't care super much about manpages but yes, let's rebuild it
<stikonas>or remove...
<stikonas>argh, that is not super trivial...
<stikonas>either I have to remove it pre-install
<stikonas>or fix make rule
<stikonas>make[2]: *** No rule to make target 'which.1', needed by 'all-am'. Stop.
<stikonas>oh, maybe I need maintainer mode..
<stikonas>but that one doesn't work with tarball
<stikonas>so I think I'll just rm and touch it
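
A sketch of that rm-and-touch workaround, assuming it runs in the unpacked which source tree before make; the which.info handling is an assumption based on fossy's makeinfo suggestion, and the texinfo source file name may differ.

    # Hypothetical pre-build step in the which source tree:
    rm -f which.1 && touch which.1   # null out the pre-generated man page;
                                     # an empty file keeps the 'all-am' rule happy
    rm -f which.info
    makeinfo which.texi              # regenerate the info file, if texinfo is available
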
<stikonas>fossy: updated
<stikonas>I should probably re-run live-bootstrap from scratch to double check the hash
<stikonas>fossy: also if you have some time, it might be useful if you can also double check autogen bootstrap for pregened files
<stikonas>(I've started it myself but haven't finished it)
<fossy>stikonas: do you have a link for that? i know its in irc history but i can't find it easily
<stikonas> https://github.com/schierlm/gnu-autogen-bootstrapping/
<fossy>ah yes, that's it
<stikonas>there are two options, bootstrap.sh and bootstrap-tarball.sh
<stikonas>though for the most part they do similar things
<fossy>does it start with the latest version of autogen? it seems to
<stikonas>yes
<stikonas>and replaces a bit of autogened stuff with handwritten C
<fossy>yep
<stikonas>first it builds columns binary
<stikonas>that one just uses autogen for its command line options
<stikonas>then getdefs
<stikonas>those two I checked
<stikonas>went over C files, checked what is included
<stikonas>didn't find anything more pregened
<stikonas>so that's where I stopped yesterday
<stikonas>will probably resume tomorrow to check autogen itself
<muurkha>that's great
<fossy>i see
<stikonas>fossy: if you want to run it in live-bootstrap you need which, newer grep and edit pkg-config option to include --static
<stikonas>I'll later add some env variable to those scripts
<stikonas>to allow static builds
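
A hedged sketch of what that opt-in could look like in the bootstrap scripts; the STATIC_BUILD variable name is made up here, not something the scripts define.

    # Hypothetical opt-in for fully static builds:
    if [ -n "${STATIC_BUILD:-}" ]; then
        LDFLAGS="-static ${LDFLAGS:-}"
        PKG_CONFIG="pkg-config --static"
        export LDFLAGS PKG_CONFIG
    fi
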
<stikonas>although we now have dynamic libguile.so too
<stikonas>other than that I think it should be fairly compatible with live-bootstrap now
<stikonas>I've fixed some musl issues two days ago
<oriansj>I may have an absolutely batshit crazy idea but I need to play more with it first
<muurkha>yay
<oriansj>this might actually end up being faster
<oriansj>say 2-3 months of effort
<fossy>what is it?
<oriansj>strip down TCC into something that can still self-host TCC but be buildable by M2-Planet
<oriansj>because, I for the life of me can't build mes.c with GCC+glibc
<fossy>it is very doable
<fossy>i made moderate progress on that about 12 months ago, however my approach wasn't particularly sound
<fossy>be warned that tcc code isn't written with the nicest constructs and there's overuse/abuse of macros
<fossy>should be a lot easier now that m2-planet is more stable
<oriansj>yeah, hence why I am going to have to do some ugly things for a few weeks before it'll be in any shape to move forward on.
<oriansj>the idea: break out the C preprocessor into a separate program. Rip out the assembler and linker into separate programs as well
<oriansj>Then convert those 4 simpler programs to M2-Planet's subset
<muurkha>running it as a separate thread/process might simplify the control flow
<muurkha>and text pipes ease debugging somewhat
<muurkha>the downside is that serializing and deserializing adds some complexity to passing data around
<oriansj>yep
<muurkha>probably not saying anything you haven't thought a hundred times already
<oriansj>and if I do specializers, swapping out assembly back ends becomes trivial
<muurkha>that sounds optimistic
<oriansj>not really, just insanely inefficient
<oriansj>I also have the advantage of being willing to throw all cross-platform functionality in the trash
<oriansj>and willing to ditch optimizations
<muurkha>yeah, that frees you up a lot
<oriansj>and if I output platform-neutral ISA instructions, then I can write a specializer in M2-Planet to convert that to M1 instructions, and then I can write a second specializer that outputs more advanced assembly after we bootstrap that
<oriansj>then all of this ELF crap can be ripped out of the C compiler
<oriansj>because that definitely never belonged in there
<oriansj>(in one's linker, sure) (in one's assembler, maybe) (but compiler? nope)
<oriansj>and take out dynamic linking support while I am at it
<oriansj>and remove the need for floating point support to compile C code
<oriansj>and tear out all non-deterministic bits
<oriansj>and just ditch the error recovery logic as fail fast works better for bootstrapping
<muurkha>sounds like a good direction to go in
<muurkha>if doing it all at once turns out to bog you down, you might try making the changes more incrementally
<muurkha>it might turn out that some aspects of the plan are good while others are not
<oriansj>well right now I am just cleaning up the TCC build to a minimal macro form
<muurkha>(though they all sound good to me at this level of detail)
<oriansj>and honestly I expect to fail a shitload
<oriansj>the biggest problem is finding time to write tests
<oriansj>as introducing bugs is very easy to do and a full and proper test suite doesn't quite exist in TCC
<muurkha>have you looked at Hypothesis?
<muurkha>I found it improved my testing cost/benefit ratio a lot
<muurkha>DRMaciver wrote a pedagogical version that's easier to clone in other languages called Minithesis if you don't like Python
<muurkha>Minithesis is 472 lines of Python
<oriansj>muurkha: I haven't seen it yet, nor heard of it as a program
<oriansj>but how would it integrate with C programming?
<muurkha>well, for example, in https://github.com/kragen/dumpulse, I wrote my tests in Hypothesis
<muurkha>but the implementation was in C
<muurkha>so I compiled the implementation to a shared library and wrote https://github.com/kragen/dumpulse/blob/master/server.py to test the shared library using Python's cffi
<muurkha>but a different alternative would be to use Minithesis as the model for a property-based testing library that's actually written in C
<muurkha>and then write the tests in C
<oriansj>I'm fine with python, I just would need to get up to speed before I could be productive with it
<muurkha>yeah, I meant that you might feel that the rigmarole with cffi and shared libraries was a bit much (though it sure paid off in the case of Dumpulse)
<muurkha>the Hypothesis docs are pretty good: https://hypothesis.readthedocs.io/en/latest/quickstart.html
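
The cffi workflow muurkha describes boils down to compiling the C code under test as a shared library and driving it from the Python-side Hypothesis tests; a rough sketch, where the flags and file names are assumptions and the real driver is the linked server.py:

    # Build the C implementation as a shared library for the Python tests
    cc -shared -fPIC -o libdumpulse.so dumpulse.c
    # Then run the property-based tests, which load the library through cffi
    python3 server.py   # or whichever test module imports it
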
<oriansj>I don't deny that. I'm just thinking of all the C language lawyer style tests
<muurkha>yeah, there's definitely a real difficulty in knowing when a C compiler's output is correct
<muurkha>amusingly I just ran across a slide in one of DRMaciver's talks about that: https://drmaciver.github.io/hypothesis-talks/hypothesis-ipr0gram.html#/9
<muurkha>where someone found a bug in the CompCert proven-correct C compiler
<oriansj>well looking at the 23,619 lines of C that make up TCC, I must say, minus a few things done for linking and binary generation, everything else (minus a couple of switch statements) is supported in M2-Planet
<oriansj>So if I can just break that bit off, then TCC would be rather quickly ported to M2-Planet
<muurkha>nice
<muurkha>what's the hair with linking and binary generation?
<oriansj>bitslices are a big chunk of it
<oriansj>and manually clearing out the macros is gonna eat a shitload of time
<muurkha>oh, like struct foo { int bar: 7; int baz: 3; };
<muurkha>?
<oriansj>mostly for assembly to bits
<stikonas>fossy: so are you happy now with "which" PR?
<stikonas>should I merge it?
<stikonas>or maybe I should also remove an empty file which.1 post-install
<doras>stikonas, fossy, the `i386-linux-musl` PR is up: https://github.com/fosslinux/live-bootstrap/pull/201
<stikonas>doras: I'm looking at your previous PR, in particular https://github.com/fosslinux/live-bootstrap/pull/197/commits/1918b12c614d1a931e31d6c802e6283a3004c1ee
<stikonas>I did rely a bit on those variables for development (to re-run stuff)
<stikonas>maybe we can save them to some file
<stikonas>so then one can source them easily
<stikonas>I'll mention it on PR...
<doras>stikonas: I also wasn't sure how to handle this best. The issue is that we can be dropped to bash at any point due to a build failure.
<stikonas>well, we have that trap, you can also add some handling there if you want
<doras>stikonas: maybe we could print them as part of the trap?
<doras>I mean, having them in the environment is a bad idea regardless, even for build re-runs.
<stikonas>yeah, printing them makes sense too
<stikonas>although we can do both
<stikonas>both print and save it into some file
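
A minimal sketch of that print-and-save idea, assuming the existing failure trap in helpers.sh can be extended; the /steps/env path and the variable names shown are assumptions.

    # Hypothetical extension of the failure trap:
    on_failure() {
        # save the build variables so a developer can source them after a failure...
        printf 'export pkg=%s\nexport prefix=%s\n' "$pkg" "$prefix" > /steps/env
        # ...and print them right where the build stopped
        cat /steps/env
    }
    trap on_failure ERR
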
<stikonas>ok, just some minor comments for https://github.com/fosslinux/live-bootstrap/pull/197, looks good in general
<doras>stikonas: it would have been best to have each package build script load them individually from a file in its own shell execution.
<stikonas>doras: but how can we achieve this?
<stikonas>some variables are used outside build script
<stikonas>and build script does not run in its own shell
<stikonas>because subshells are initially a bit broken
<stikonas>I think subshell itself works
<stikonas>but traps don't work
<stikonas>possibly due to meslibc bugs
<doras>Why are traps important for us?
<doras>Can't we simply read status code of the bash execution?
<stikonas>hmm, I might be misremembering things a bit
<stikonas>but something was not working properly
<stikonas>you can try to edit things and see if first bash still works
<stikonas>but that's probably another big PR anyway
<doras>Hmmm...
<stikonas>when I was writing helpers.sh, my original plan was to have subshells for each package
<stikonas>but something was not working
<stikonas>I suspect it's due to bash using meslibc
<stikonas>which might go away at some point
<stikonas>(i.e. if we get gash working)
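
What doras suggests, running each package script in its own bash and just checking its exit status instead of relying on traps, could look roughly like this; the script layout and names are assumptions.

    # Hypothetical per-package driver:
    for script in /steps/*/pass1.sh; do
        bash "$script"
        status=$?
        if [ "$status" -ne 0 ]; then
            echo "build script $script failed with status $status" >&2
            exec bash    # drop to an interactive shell for debugging
        fi
    done
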
<stikonas>oriansj: can you pull in https://github.com/oriansj/bootstrap-seeds/pull/32 ?
<oriansj>stikonas: merged
<oriansj>and now I see why Fabrice Bellard stopped working on TCC
<oriansj>one would have to do some serious feature regressions to break it up into something easy to maintain.
<stikonas>hmm, that's a bit unfortunate...
<stikonas>well, M2-Planet -> tcc bootstrapping steps have been a bit rough for some time now
<oriansj>yeah, that is my fault. I didn't estimate correctly the effort required for a spawned process.
<stikonas[m]>Not really your fault, it's just that the gap between a simple C compiler and a real-world compiler is fairly big
<oriansj>it is just a 5x between TCC and M2-Planet in terms of lines of code
<oriansj>and if I can break the pieces out; each might only be 2x
<stikonas[m]>Well, you only need C to M2 compiler
<stikonas[m]>Can even drop assembly support
<oriansj>M2 is C's core so it is possible to do (like a C99 to C89 compiler)
<stikonas[m]>If that's simpler...
<stikonas[m]>Which it might be
<oriansj>I clearly always go the harder route
<oriansj>so this will be a big fiasco but at least it'll be fun and educational
<oriansj>especially given that TCC seems to break real hard when you comment out dlopen
<stikonas>strange, why would tcc depend on dlopen...
<stikonas>I thought it can work completely statically
***Andrew is now known as WaxCPU
<oriansj>maybe a build flag for TCC I needed to set
<oriansj>but it looks like every time you hit a #include, it dlopen's it and uses dlsym to extract names
<oriansj>it is inside of a #ifdef TCC_IS_NATIVE block in libtcc.c
<oriansj>and the dlsym is in tccelf.c inside of a #if defined TCC_IS_NATIVE && !defined TCC_TARGET_PE
<oriansj>which is why TCC needs compiled libraries and doesn't use pure source libraries
<muurkha>it dlopens what, the .h file?
<muurkha>I don't see how that could work
<muurkha>dlsym is a pretty good way to extract names from a library, too bad libdl can't read stabs
<oriansj>muurkha: feel free to explain: https://paste.debian.net/1254199/
<stikonas>oriansj: does it still fail if you don't use libraries?
<stikonas>we only need to build tcc itself
<stikonas>or is it also the problem with libc
<muurkha>stddef.h seems to be tcc-$version/include/stddef.h
<muurkha>which mostly defines types like size_t and int64_t, although it does also declare alloca()
<oriansj>stikonas: well -E will be successful but doing tcc blah.c -o blah will error out with: tcc: error: file 'crt1.o' not found
<stikonas>that is a strange design choice...
<oriansj>but now we know even more about TCC
<oriansj>and the logic for compiling seems to be mixed in with the parsing.
<muurkha>yeah, that's one of the main design features of TCC
<muurkha>and also, say, Wirth's one-pass compiler for Oberon
<muurkha>or Crenshaw's series
<oriansj>hmmm.
<muurkha>I think it really hurts readability but it does require less code, especially in an environment without GC
<muurkha>though if you're running a compiler on a machine with 4 gibibytes of RAM I don't know why you need to free things. they'll get freed anyway when the compile finishes
<oriansj>gcc is broken up into a preprocessor, c compiler, assembler and separate linker right?
<muurkha>I don't think the preprocessor is actually separate from the compiler since GCC 2
<muurkha>previously it was called cccp
<stikonas>but assembler and linker are in binutils...
<oriansj>can GCC 2 build GCC 4?
<muurkha>though you can get the preprocessor output with gcc -E, I think it's actually the same executable doing the work (cc1? I forget)
<stikonas>well, combining preprocessor and compiler lets you avoid doing tokenization twice...
<muurkha>I haven't tried but I wouldn't be surprised. but I think maybe you meant "can GCC 1 build GCC 4?" and I think the answer is probably not
<stikonas>oriansj: guix starts with gcc 2.95, so you can check what is the next step there
<stikonas>how hard would it be to upgrade M2-Mesoplanet to have full C99 preprocessor?
<oriansj>muurkha: no, I am thinking of IF GCC is 4 independent programs (or could with reasonable effort be made so); there may be a route to building GCC directly from M2-Planet
<stikonas>I think we already support quite a bit
<aggi>minor note, TCC is missing one feature to compile recent linux kernel: an equivalent to gcc -S (to emit pre-generated assembly)
<oriansj>stikonas: well a handful of tweaks in the parser and then we would need to add a combine stage to fix the breakup between - and = into -=
<stikonas>gcc 4.0.4 already combines everything into cc1
<stikonas>but I think C99 preprocessor is an easier task anyway
<oriansj>and then we can tweak the output to be more trivial to compile
<stikonas>at the very least it should be trivial to retokenize
<muurkha>oriansj: I think from the point of view of being 4 independent programs or not, GCC 2 is the same as GCC 4 or GCC 10
<oriansj>but does GCC combine tokenization and compiling?
<stikonas>I think tokenization is done by preprocessor
<stikonas>7a35239a2ad2f39220daa888650dbc44ba4a5664856997d2afb6165d305a8f82
<stikonas>argh, wrong paste
<stikonas> https://gcc.gnu.org/onlinedocs/cpp/Tokenization.html#Tokenization
<aggi>and they're talking about a separation into compiler-backends (for different architectures), and frontends (for different languages to parse/tokenize)
<aggi>TCC doesn't need different language frontends, it does C-lang only
<stikonas>well, it's more efficient to have different backends and frontends
<aggi>in between those language-frontends and backends, an intermediate representation exists, not sure what it was called
<ekaitz>oriansj: maybe this helps: https://ekaitz.elenq.tech/bootstrapGcc1.html
<stikonas>that way you only have to write the really hard bit (optimizer) once
<ekaitz>there are several ir-s in GCC
<ekaitz>one is called GIMPLE and other is called RTL
<ekaitz>RTL is target specific and GIMPLE is not
<ekaitz> https://www.cse.iitb.ac.in/grc/index.php?page=videos <-- you can learn about those here
<oriansj>ekaitz: oh, I am skipping absolutely ALL optimizations
<muurkha>stikonas: some optimizations are target-dependent
<aggi>ekaitz: interesting, however, my plan is to avoid GNU-toolchain (gcc,binutils) entirely, and use TCC only
<oriansj>so we only need the bit that reads the C code and outputs the IR
<ekaitz>it's a little bit hard to understand that there are GIMPLE and RTL IRs in GCC because normally we think about the IR as an AST only, but GCC uses a different compiler architecture than we are used to
<stikonas>still, it's only some of them. Otherwise if you have n languages and m arches, you would have to write n * m compilers if the compiler could only target one arch
<ekaitz>in GCC the IR you are talking about is RTL, and that's not useful for you because it's target specific
<stikonas>anyway, we don't do optimizations in bootstrap at all
<muurkha>ekaitz: I think there are a lot of IRs that aren't ASTs
<muurkha>also didn't GCC add a third IR a few years ago?
<ekaitz>muurkha: yes, of course, but it's what many people have in mind when they think about compilers
<ekaitz>muurkha: I'm not sure about that, what I know is they started to focus more on the GIMPLE and added GIMPLE optimizations (tree-level optimizations)
<ekaitz>GCC is weird because they optimize the tree and later optimize the RTL
<ekaitz>the conversion between the GIMPLE and the RTL happens in a very simple level, but then the RTL is matched against some templates that then generate the assembly
<ekaitz>(it's superhard to explain btw, please don't kill me)
*muurkha hugs ekaitz
<ekaitz>(the link I posted is the best I can do)
<ekaitz>muurkha: thanks for the understanding :))
*oriansj hugs ekaitz
<ekaitz>so, about the language frontends: they generate GIMPLE or a similar representation that is later converted to GIMPLE by the next step
<ekaitz>oriansj: :D
<ekaitz>if you want to see how GIMPLE works, the gcc internals video series I shared spends a lot of time on that
<ekaitz>i'm not the best to explain that step, because I only worked on a backend, so I was focused on the RTL only
<oriansj>ok anyone know a way to extract the bit in C that just does the C code to gimple output
<ekaitz>oriansj: that should be the language frontend, which is pretty much independent
<ekaitz>oriansj: see gcc/c-parser.c
<ekaitz>or better gcc/c-*
<ekaitz>if you want to see how that process is done, maybe the fortran frontend is easier to read: it's in the gcc/fortran/ folder
<muurkha>is there a command line option to GCC to dump out the GIMPLE representation?
<ekaitz>muurkha: yes
<ekaitz>muurkha: there are many but I can't remember
<ekaitz>muurkha: you can dump all the GIMPLE optimization steps
<ekaitz>muurkha: -fdump-tree*
<ekaitz>or maybe I'm wrong :S
<stikonas>oriansj: I wonder if this supports C99... https://github.com/logological/gpp/
<ekaitz>it was correct: the format is `-fdump-<ir>-<passname>`, with that you can choose
<ekaitz>if you do `-fdump-tree-all` you'll see the GIMPLE is kinda complex and has many optimization passes
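
For example, dumping the GIMPLE for a tiny function already shows the three-address style that comes up later in this log; the dump shown here is from memory, so treat the exact temporaries and file name as approximate.

    $ cat add.c
    int add(int a, int b, int c) { return a + b * c; }
    $ gcc -c -fdump-tree-gimple add.c
    $ cat add.c.*.gimple        # dump file name varies across GCC versions
    add (int a, int b, int c)
    {
      int D.1984;

      _1 = b * c;
      D.1984 = a + _1;
      return D.1984;
    }
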
<ekaitz>also oriansj, scrolling back... yes, I think the preprocessor is an independent program: cpp
<ekaitz>but all this is kind of dangerous to extract directly from the codebase because some files are generated by the build process so be careful with that
<muurkha>hmm, I do have a separate cpp executable!
<muurkha>so maybe I was just totally wrong about that?
<muurkha>is it used in normal compilation, the way cc1 and ld are?
<ekaitz>muurkha: yes, it's a separate program, you can even run it separately and see its output
<ekaitz>also I don't think cpp really does tokenization... it only handles the preprocessor directives I think
<ekaitz>muurkha: yes, that's what is used during the normal compilation
<ekaitz>I think you can `gcc -v ` and see all the internal calls to other programs
<muurkha>strace says it's running gcc, cc1, as, collect2, and ld, but not cpp
<muurkha>strace -ff -o hellocompile gcc hello.c
<muurkha>grep exec hellocompile.*
<ekaitz>try with `gcc -v hello.c`
<muurkha>I did but it didn't tell me anything it was running
<ekaitz>hmmm
<ekaitz>let me try
<muurkha>oh it did mention as
<muurkha>and cc1
<muurkha>I just missed it amid all the noise
<muurkha>oh, and it mentioned collect2 as well
<unmatched-paren>there's https://github.com/h8liu/mcpp
<muurkha>gcc -v doesn't seem to mention ld (or the gcc driver executable itself)
<muurkha>this is on GCC 4.7 FWIW
<ekaitz>muurkha: try with -no-integrated-cpp
<ekaitz>that way it will call a separate cpp
<ekaitz>if you read that option in the man page it says by default gcc does the preprocessing in the same step as tokenization, but you can make it call it independently
<ekaitz>I didn't know this but, hey! here we are to learn together
<muurkha>that's awesome!
<ekaitz>muurkha: ld is called by collect2
<ekaitz>that's why you can see it in the strace but not from the -v
<muurkha>oddly enough in that case instead of invoking the cpp executable it invokes cc1 -E to do the preprocessing
<muurkha>yeah, I thought the ld thing might be something like that
<ekaitz>the collect2 i think also calls the LTO and stuff like that, that's why we need a different program not just the ld call
<ekaitz>muurkha: cc1 -E... interesting
<ekaitz>muurkha: i think cpp also calls cc1
<ekaitz>give it a try with strace
<muurkha>oh hey, you're right, cpp is a wrapper around cc1
<muurkha>why does it have to be 600K then!?
<oriansj>oh my https://gcc.gnu.org/onlinedocs/gccint/GIMPLE-instruction-set.html#GIMPLE-instruction-set GIMPLE isn't that far off from what M2-Planet has internally
<muurkha>GIMPLE_PHI is presumably the operation for unifying two different assignments on different control paths so you can have SSA?
<muurkha>there's an overview in https://gcc.gnu.org/onlinedocs/gccint/GIMPLE.html
<ekaitz>muurkha: I shared a link before from the university of Bombay, they explain it really well too
<muurkha>yeah but it was videos, right? that's what the URL said
<muurkha>so I didn't follow it
<muurkha>by contrast that doc overview page is 500 words, you can read it in a minute and a half
<oriansj>(OP R_out R_in1 R_in2)
<oriansj>I can imagine looking at GIMPLE and GENERIC as C to S-expressions
<ekaitz>muurkha: the difference is the docs are... not really clarifying :)
<ekaitz>oriansj: kinda, but not really... it's more like some kind of assembly with three-way instructions
<oriansj>ekaitz: (jne :label R0 R1)
<ekaitz>oriansj: haha i see what you mean
<ekaitz>if you output the gimple to a file it's still looking like a C subset or something
<ekaitz>but yeah, it's a three-way thingie
<oriansj>see lisp originally mapped straight to assembly; (add a b) => (RETURN (load R0 :a) (load R1 :b)(add R0 R1))
<muurkha>that's sort of a decompilation of the gimple though
<oriansj>this is all is giving me ideas
<muurkha>:)
<ekaitz>oriansj: if I can give you more ideas... just squeeze me
<oriansj>I am starting to think we have been doing C compilers the way that makes things harder than needed.
<ekaitz>oriansj: specifically GCC is REALLY complex
<ekaitz>need for speedTM you know
<ekaitz>C itself is not that complex to need this amount of code... GCC has millions of lines split in thousands of files
<oriansj>well GCC is overly engineered to enable chasing every possible optimization.
<oriansj>some of that engineering is a really good idea.
<ekaitz>yeah but the basics are overcomplicated
<ekaitz>i think a C nanopass compiler is possible
<oriansj>slightly but not in a way that can't be cribbed off in a productive fashion.
<ekaitz>and it could be enough
<oriansj>ekaitz: my thoughts exactly
<ekaitz>:)
<oriansj>so C-orchestrator spawns c-preprocessor (reads all sources and dumps a single source file in a form easiest to parse), c-compiler (reads dump and produces an IR dump), c-optimizer (skipped but easy to add), c-specializer (converts IR to target assembly), machine-optimizer (skipped but easy to add), assembler (converts assembly to object code), link-time-optimizer (skipped but easy to add), linker (converts object code to final binary)
<oriansj>everything after the c-compiler becomes sharable by different programming languages
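
A rough picture of that orchestration, with every program name below hypothetical and the skipped optimizers simply left out:

    # Hypothetical pipeline driven by the C-orchestrator:
    c-preprocessor main.c  > main.i     # all sources flattened into one easy-to-parse file
    c-compiler     main.i  > main.ir    # C to IR dump, no optimizer
    c-specializer  main.ir > main.s     # IR to target assembly
    assembler      main.s  > main.o     # assembly to object code
    linker         main.o  -o main      # object code to final binary
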
<oriansj>hmmmm
<oriansj>and we know all the C keywords and tokens that TCC supported, so once we support those correctly we should be able to compile GCC too
<ekaitz>oriansj: the c-specializer is done in two steps that are pretty separate actually
<ekaitz>the gimple opcodes are matched in a hardcoded table to a generic RTL template
<ekaitz>then the RTL is matched to the machine specific RTLs
<ekaitz>so the RTL is not just a target related thing, it serves two purposes
<ekaitz>when you write a new backend you don't need to write how the GIMPLE is matched to the RTL; you write how the RTL is converted to assembly
<ekaitz>and also you give some info about the machine so the RTL is generated with that in mind
<ekaitz>it's hard to explain
<ekaitz>but here the real IR is the RTL, but the GIMPLE started to gain some importance in the process some years ago
<ekaitz>in the videos of the university of Bombay they explain it correctly
<ekaitz>GCC follows an architecture that is not very common
<ekaitz>there are two ways to make compilers: one that separates the backend from the frontend in a target independent IR and optimizes on top of that one
<ekaitz>and the other that has a target dependent IR and optimizes on that
<ekaitz>the first is the one that is usually taught in books like the dragon book and stuff like that
<ekaitz>but GCC follows the second
<ekaitz>so GIMPLE is not really the main IR in GCC (but it has become more important last years)
<oriansj>if you can find a link to those videos you mentioned, please share so I can take a look when I get a few free minutes
<ekaitz>the technical word i was looking for: GCC follows the Davidson Fraser Model
<ekaitz>in contrast to the Aho Ullman model
<oriansj>as I am planning on using as many good ideas as I can
<ekaitz> https://www.cse.iitb.ac.in/grc/index.php?page=videos
<ekaitz>those are the videos
<oriansj>thank you
<ekaitz>so the thing is GCC uses an expander that takes the GIMPLE and expands the code to register transfers, then optimizes that and uses a pattern recognizer to generate the assembly
<ekaitz>while the aho ullman style compilers generate an AST, optimize it and finally generate the target code
<ekaitz>of course, the aho ullman compilers can have some peephole optimizations in the target code too, but they don't normally have many
<ekaitz>in the case of gcc, they started adding tree-level optimizations later
<ekaitz>so now it's kind of a mixed approach
<ekaitz>(sorry for the supermegalong explanations, I got excited, I normally don't have the chance to talk about these things...)
*ekaitz feels alone sometimes
*oriansj hugs ekaitz
<oriansj>you are with friends here
<oriansj>and if you ever need to talk to someone feel free to give me a call on Signal
<ekaitz>:)) thank you
<oriansj>and you should have that number as you are a friend after all
<ekaitz>oriansj: don't worry mate I don't use signal either, i'm fine with this IRC chat :)
<oriansj>I'm also on matrix for people who prefer different communication methods
<muurkha>oriansj: part of GCC being overcomplicated is a result of GCC being 35 years old
<ekaitz>muurkha: I don't think that justifies everything
<muurkha>I'm pretty sure it contains decisions that we've known were a bad idea for 25 years but have been too much work to change
<ekaitz>I'm almost 35 and I'm not that complex after all LOL
<ekaitz>the Davidson Fraser Model was one of those weird decisions they are regretting now
<ekaitz>there were some efforts to remove RTL too
<ekaitz>and use other kind of representation...
<muurkha>oriansj: you might want to read one of the early papers on Steve Johnson's pcc compiler, which was split into a front end and back end that communicated through a byte stream
<muurkha>more recent versions of pcc don't work that way
<ekaitz>but that's not the only reason... GCC is also a Compiler Generation Framework, and that makes the thing really complex
<muurkha>I think it's pretty common for compilers to have both a target-independent IR and a target-dependent one. some even have a much larger number of IRs
<muurkha>ekaitz: I'm not saying it justifies things; there are probably also mistakes in GCC's design that someone who wasn't writing their first compiler would have avoided
<muurkha>although I'm not sure what they are
<ekaitz>muurkha: GCC was built by people doing research so I don't think it has novice mistakes
<ekaitz>It's a really good piece of software
<ekaitz>it supports hundreds of architectures
<ekaitz>it's easy to port
<ekaitz>it's powerful...
<ekaitz>but in the end it's impossible to read by an individual
<oriansj>the only weakness is that it isn't easy to bootstrap
<stikonas>well, M2-Planet is even easier to port, supports a few (7 or so) architectures, fairly easy to read by an individual but not that powerful...
<oriansj>stikonas: actually it could have been even easier to port if I was smarter about it and did IR output and architecture specializers but it would have been harder to do cc_* support
<oriansj>so I guess I have a top down plan for M3 instead of just the bottom up one I was previously planning
<muurkha>ekaitz: no, Stallman wasn't doing research and he was a novice at compilers, though he had a quarter of century of experience programming
<muurkha>Cygnus wasn't a research venture either
<muurkha>though it did have people with compiler experience
<aggi>binutils and assembly are a headache to detangle from GNU too
<oriansj>aggi: I think I can solve that
<oriansj>and give you a better than TCC compiler
<aggi>oriansj: my focus is the _entire_ system integration, and for this i use the tcc-toolchain simply because it is possible to do it with that
<aggi>and this reveals problems, which will be hit by any other approach, when detangling GNU-toolchain and GNU-buildsystem
<aggi>hence, even if you prefer another compiler (cproc), assembler, whatever, it will be much easier if a distro has already removed _all_ c++ dependencies
<aggi>and if such a distro did pass with tcc-toolchain already too
<muurkha>cproc?
<aggi>that's another alternative compiler, didn't test this one yet
<aggi>anyhow, the rationale of my system integration approach covers different aspects than the bootstrappable ones too
<muurkha>ah, hadn't heard of it
<aggi>including full removal of GNU-toolchain/buildsystem, removal of bashism, no-c++, etc.
<ekaitz>muurkha: cproc is really interesting, it uses qbe backend
<aggi>suckless or toybox userspace integration, removal of GNU make replaced with POSIX make
<aggi>POSIX-shell only, no bashism
<ekaitz>aggi: but posix is not a really well defined standard either
<ekaitz>what to do with the kernel api?
<muurkha>BSD make?
<ekaitz>posix is old-ish
<muurkha>or heirloom-tools make?
<aggi>if possible, not GNU make, i'll try this one https://frippery.org/make/
<aggi>with the kernel, the plan is to ROLL BACK to linux-2.4, for various reasons
<aggi>this won't be POSIX-complete, however, with GNU, it is their extensions beyond POSIX which are a trouble-source
<muurkha>dunno, when I spent all day hacking makefiles I considered the lack of extensions beyond POSIX to be a trouble-source!
<muurkha>ekaitz: qbe is interesting but I haven't tried using it
<aggi>muurkha: suckless.org Makefiles tend to be simpler than the typical GNU makefiles (and what autoconf/automake excrete is even worse)
<aggi>toybox passed with tcc-toolchain already (except wget widget), not sure yet what trouble awaits when switching to POSIX make
<aggi>that's why, i want to do it
<aggi>and i won't shed that many tears, if various software is sacrificed due to this type of problems
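
As a concrete example of the kind of GNU-ism that has to go when moving to POSIX make: GNU pattern rules become suffix (inference) rules, roughly like this, with the file names purely illustrative and recipe lines needing a leading tab.

    # GNU make only:
    #   %.o: %.c
    #           $(CC) $(CFLAGS) -c -o $@ $<
    # POSIX make equivalent:
    .SUFFIXES: .c .o
    .c.o:
    	$(CC) $(CFLAGS) -c -o $@ $<
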