IRC channel logs

2022-02-19.log

<stikonas>yeah, it would have been nice to have a tag but it's not such a big deal
<stikonas>definitely much better state than autogen is in...
<muurkha>heh
<fossy>stikonas: i'm doing the very last part of my PR now, which is different checksums (bc. checksums in script isn't great). for checksumming tarballs, do you think we should put them in a global file or keep the sys/pkg/checksums file but with a single checksum?
<stikonas>hmm, can we easily extract a single checksum that we want to check?
<stikonas>if we put everything in one file?
<stikonas>in principle one file might make things easier
<fossy>yes
<fossy>just grep the single tarball filename out
<stikonas>ok, if we already have grep
<fossy>oh uh
<stikonas>i.e. later we can add command line flag for development to create that file rather than read
<stikonas>so that one can obtain checksums for the whole bootstrap chain without failing every single step
<fossy>yes, and can be trivially done manually now anyways
<fossy>with this PR
<fossy>if you just comment out the checksum check
<stikonas>good
<stikonas>because checksumming was getting a bit unscalable
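A minimal sketch of the single-file scheme discussed above, assuming one global checksum file in `sha256sum` output format (`<hex digest>  <tarball filename>` per line). The file name, the function names and the `update` development mode (the "create that file rather than read" flag stikonas mentions) are hypothetical illustrations, not live-bootstrap's actual layout or options.

```python
import hashlib
from pathlib import Path
from typing import Optional


def sha256_of(path: Path) -> str:
    """Hash a file in 64 KiB chunks so large tarballs need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def lookup(checksum_file: Path, tarball_name: str) -> Optional[str]:
    """Equivalent of grepping one tarball's line out of the global file."""
    for line in checksum_file.read_text().splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == tarball_name:
            return parts[0]
    return None


def check_or_record(checksum_file: Path, tarball: Path, update: bool = False) -> bool:
    """Verify `tarball` against the global file.

    With the hypothetical `update` mode, append the tarball's checksum instead
    of checking it, so a whole chain can be collected without failing each step.
    """
    actual = sha256_of(tarball)
    if update:
        with checksum_file.open("a") as f:
            f.write(f"{actual}  {tarball.name}\n")
        return True
    return lookup(checksum_file, tarball.name) == actual
```

Keeping everything in one file means the lookup is a single linear scan, which matches the "just grep it" approach and keeps per-package checksum files out of the tree.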
*fossy nods
<stikonas>also, I haven't looked at the latest pushes, but at some point the link search stuff was printing a lot of warnings
<stikonas>is that still the case
<stikonas>?
<fossy>oh, that is something else i need to fix, a number of command not found warnings are not properly nulled, thanks
<stikonas>hmm, actually only a few packages create symlinks
<stikonas>anyway, you already wrote the code to deal with them, so I guess let's keep it
<fossy>yes, there are only 75 symlinks, across 16 packages
<stikonas>btw, at some point I commented that some tarballs are empty
<stikonas>"You might already be aware but autoconf-2.12_0.tar.gz, autoconf-2.13_0.tar.gz, musl-1.1.24_2.tar.gz packages are empty. Not sure what's going on."
<stikonas>is that still the case?
<fossy>musl is fixed, those two autoconf ones are not; oddly, i thought i fixed autoconf
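A quick way to spot empty package tarballs like the ones reported above is to check whether each archive is zero bytes or contains no members. This is a hedged sketch using Python's standard `tarfile` module; `is_empty_tarball` is a hypothetical helper, not part of live-bootstrap.

```python
import os
import tarfile


def is_empty_tarball(path: str) -> bool:
    """True if the file is zero bytes long or is a valid tar archive with no members."""
    if os.path.getsize(path) == 0:
        return True
    # "r:*" lets tarfile detect gzip/bzip2/xz compression on its own
    with tarfile.open(path, "r:*") as tar:
        return not tar.getmembers()
```

Running it over every generated package (e.g. `autoconf-2.12_0.tar.gz`) would flag the broken ones without unpacking anything to disk.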
<stikonas>I also had some problems with qemu
<stikonas>memory-wise I think it was fine
<fossy>qemu should? be fixed now, although i need to rerun the latest changes on qemu -- there could be some unsolved reproducibility issues there
<stikonas>ok, good
<stikonas>anyway, sounds like you are almost there
<fossy>yes
<stikonas>fossy: after that should we try to minimize rootfs.py a bit?
<stikonas>as doras requested
<stikonas>I thought a bit about how to do that
<fossy>yes, i think we should try to do that next
<stikonas>and I think we can make it much smaller
<stikonas>first step is to rename /after to sysa
<stikonas>then it will have the same path in bootstrap as in the repo
<fossy>we can get the structure into a place where rootfs.py does close to no preparation
<stikonas>yeah
<stikonas>then after that we can try to deal with sources
<fossy>how do you mean?
<stikonas>i.e. copy from /sources rather than /sysa/${pkg}/src
<fossy>oh, right
<stikonas>that's another big thing rootfs.py does
<stikonas>but we can just do it in kaem/bash
<stikonas>in the end I think one should be able to do something like this: take the stage0-posix dir, copy live-bootstrap over it (the only thing that will be overwritten would be the after hook) and run kaem-optional-seed
<stikonas>oh, and put tarballs into sources
<stikonas>anyway, let's merge your PR first
<stikonas>and then do /after -> /sysa rename
<stikonas>I briefly looked at it; it should be quite easy, but it changes almost all checksums
<fossy>but that will be quite easy, now
<stikonas>so I postponed it till your PR is done
<stikonas>yes. I was actually initially surprised that /after->/sysa affects all checksums
<stikonas>but it turns out tcc puts its build path into binary somewhere
<muurkha>huh, I wonder if that's fixable
<stikonas>well, it's a minor inconvenience
<fossy>hmm, i reckon i know where it does that
<stikonas>we don't change build path often
<muurkha>I mean obviously it's possible to fix, but I wonder if the cost is low enough to be worth the benefit
<muurkha>we don't, it's true
<stikonas>probably not...
<muurkha>but it offends me aesthetically ;)
<fossy>not really, i reckon it's in --help, but i'm not too worried
<muurkha>also it could be a privacy leak if someone uses tcc for something else
<stikonas>gcc might do that too
<stikonas>if I remember correctly
<fossy>well, not in live-bootstrap, but generally for tcc, yes
<stikonas>normally it just leaks that you use /usr prefix
<muurkha>right
<stikonas>so that's not a big privacy leak
<muurkha>but sometimes /home/stikonas/.local/bin
<muurkha>also if you validate a compiler in /home/stikonas/.local/bin and then install the same compiler systemwide it's kind of offensive that that will cause your build artifact checksums to change
<stikonas>that would be if I used stikonas as my username, which I don't :D
<stikonas>though you can probably find what I use in some of my pastes here
<muurkha>and you could imagine people mapping snapshots into the filesystem and getting different checksums when building from /vob/toolchain/version/0/1/1/tcc/bin/tcc than /vob/toolchain/tag/stable/tcc/bin/tcc
<muurkha>but it's kind of out of scope maybe
<fossy>still, that's not a problem for live-bootstrap, because paths are deterministic
<muurkha>right
<muurkha>except when we change them anyway
<fossy>yeah lol
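One way to confirm that a compiler such as tcc bakes its build path into the binary (and so changes every downstream checksum when /after becomes /sysa) is to search the binary for the path bytes, much like `strings tcc | grep /after`. This is a minimal sketch; `embedded_paths` and the candidate paths are illustrative assumptions.

```python
from pathlib import Path


def embedded_paths(binary: Path, candidates):
    """Return the candidate build paths whose bytes occur verbatim in the binary."""
    data = binary.read_bytes()
    return [p for p in candidates if p.encode() in data]
```

A hit only shows the string is present somewhere (e.g. in `--help` text or a default search path); finding where it comes from still means reading the compiler source.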
<oriansj>I swear the more I look at C macros, the more I wonder how they ever got added into the standard.
<oriansj>I can even understand #ifdef, #ifndef, #else and #endif being useful for flag, architecture and compiler specific code paths.
<oriansj>(the lack of #elifdef and #elifndef seems like a minor mistake that looks like it is going to be fixed in the next version of C)
<oriansj>as the second I try to do #if support in the cc_reader.c of M2-Mesoplanet, boom, in drops the *ENTIRE* C macro support
<stikonas>do we actually support everything now?
<stikonas>probably not, at least we don't support stuff that's used to concatenate strings, etc...
<oriansj>stikonas: well yes we don't support things yet but I am referring to the user-facing space of what we would need to support
<oriansj>aka, if I make #if $thing #include $file behavior decide to load the file or not
<oriansj>I would have to evaluate $thing
<oriansj>which could be anything
<oriansj>C preprocessing is strangely a turing complete interpreted language
<oriansj>sorry almost turing complete
<oriansj>maybe I'm just frustrated from trying to shoehorn that logic into cc_reader.c and it is basically forcing me to rewrite cc_macro.c in a very very complex and easy to get wrong fashion.
<oriansj>however because most compilers don't support #elifdef and #elifndef yet, I can't take the simple path out of restricting to __$arch__, __$OS__ and __$flag__
<oriansj>which would be a nothing function to support and provide the desired "don't #include things inside a block that isn't going to be used"
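The restricted approach described above (only `#ifdef`/`#ifndef`/`#else`/`#endif`, no general `#if` expressions) fits in a small stack-based filter. This is a hedged Python sketch of the idea, not M2-Mesoplanet's cc_reader.c; `active_lines` is a hypothetical name, and it assumes directives start the line with no space after `#`.

```python
def active_lines(lines, defined):
    """Yield only the lines that survive #ifdef/#ifndef/#else/#endif.

    `defined` is the set of macro names considered defined. General #if
    expressions are deliberately not evaluated.
    """
    # One entry per open conditional:
    # [enclosing block active?, condition true?, current branch active?]
    stack = []
    for line in lines:
        tokens = line.split()
        directive = tokens[0] if tokens else ""
        current = stack[-1][2] if stack else True
        if directive in ("#ifdef", "#ifndef"):
            cond = len(tokens) > 1 and tokens[1] in defined
            if directive == "#ifndef":
                cond = not cond
            stack.append([current, cond, current and cond])
        elif directive == "#else":
            outer, cond, _ = stack[-1]
            stack[-1][2] = outer and not cond
        elif directive == "#endif":
            stack.pop()
        elif current:
            yield line  # includes inside inactive blocks are never seen
```

Because inactive lines are dropped before anything else looks at them, an `#include` inside an unused block is never opened, which is the behaviour oriansj wants.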
<stikonas[m]>I see...
<oriansj>maybe I am just cursed to always hit walls when dealing with interpreted languages
<stikonas[m]>C macros are at least not Turing complete, even if they are "close". But C++ templates are, and they come with their own halting problem
<stikonas[m]> https://medium.com/@mujjingun_23509/full-proof-that-c-grammar-is-undecidable-34e22dd8b664
<oriansj>You know the C core is clean and simple
<oriansj>but because you need libraries and people don't want to specify the source files they depend upon, we get #include and that is fine until now you are sharing code and then you need different includes for different architectures but hey that preprocessing logic can easily be expanded to do....
<oriansj>I know that some of the features are helpful for debugging (why can't you just use gdb?)
<oriansj>or various other useful things (like adding support for NEW keywords in the language where you shouldn't do that)
<oriansj>but at some point, can't people just admit they don't want to write C code and just have a different programming language generate the code they are compiling and stop stuffing crap into the C spec to make their mess bigger
<oriansj>sure __FILE__ and __LINE__ as magic defines; no problem they already have that info stored in them internally for debugging of cc_reader.c and basic error messages from the C compiler
<stikonas>I'm not too familiar with old C but I wonder when all these preprocessor features appeared, some might be from before C was standardized
<oriansj>well they certainly were there before GNU
<oriansj>the historical fact that the C preprocessor has been used as a preprocessor for other languages (like FORTRAN) suggests a cause for many of the features added.
<nimaje>of course your interpreter should be written similar to the language you are implementing https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/sh/mac.h
<oriansj>nimaje: yeah...
<oriansj>the "let's make the C compiler compile our non-C code" approach, because we can't get a language X compiler for arbitrary reasons
<oriansj>I swear half of all programming sins are the result of desperation
<oriansj>and the other half is hubris mixed with sloth
***aisha[m] is now known as epsilonknot
<stikonas>at least you don't have to write any of that macro stuff in assembly. By the time we need to implement it we can use core C constructs.
<unmatched-paren>is it possible to implement inline assembly in an interpreter? fpc uses it in a number of places, so if it's not possible my entire Pascal bootstrap plan won't work...
<unmatched-paren>could you write all inline asm to a tmpfile, then gas/nasm it, write the resulting machine code to memory, then jump execution to it somehow?
<bauen1>unmatched-paren: i don't see why not? you might have to interpret the assembly used too, but that is possible
<bauen1>unmatched-paren: or is it too many instructions / performance critical so interpreting isn't an option
*unmatched-paren remembers that libyasm exists, but it only supports two archs...
<unmatched-paren>there's nothing performance critical, this isn't an integral part of the bootstrap (if it was, someone else would be doing it :P)
<unmatched-paren> <https://codeberg.org/unmatched-paren/rascal-boot/> is the interpreter i'm trying to extend so i can boot fpc
<unmatched-paren>i'd like to not have to worry about different archs, and just support everything that rust supports
<unmatched-paren>i'll probably need to use https://lib.rs/memmap2: https://stackoverflow.com/questions/66242567/how-to-jump-to-call-arbitrary-memory-in-rust
<unmatched-paren>anyway, this isn't something i really need to worry about now that i know it's possible
<unmatched-paren>i wouldn't be able to use `asm!()`, since that's a macro that just compiles asm into the binary, so you can't dynamically execute arbitrary asm
<oriansj>unmatched-paren: well inline assembly is generally more of a JIT'd Interpreted construct and if you are *REALLY* careful you could do it. Common Lisp is a classic example of an interpreted language that supports inline assembly. So it is entirely possible, it just is much simpler to implement in a compiled language.
<oriansj>in a compiled language like C or pascal, you can literally just let the strings pass through the compiler with little or no changes. With an interpreted language, you will need to support a full assembly syntax (a full assembler honestly, minus the elf bits) and glue logic to wire in your blob in memory and deal with stack problems that might occur.
<oriansj>So very possible to do if you need to do it that way.
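The "write machine code to memory, then jump to it" route from the discussion above can be sketched in a hosted language by mapping an executable page and calling into it. This Python sketch is the analogue of the memmap2 approach unmatched-paren linked, not rascal-boot's code; it assumes x86-64 Linux, a system that permits writable+executable mappings (W^X policies may refuse), and bytes already assembled by hand (`B8 2A 00 00 00 C3` is `mov eax, 42; ret`). `run_machine_code` is a hypothetical name.

```python
import ctypes
import mmap
import platform
import sys


def run_machine_code(code: bytes) -> int:
    """Copy raw x86-64 machine code into executable memory and call it as `int f(void)`."""
    if sys.platform != "linux" or platform.machine() != "x86_64":
        raise OSError("this sketch only handles x86-64 Linux")
    # Anonymous read/write/execute mapping; hardened systems may forbid PROT_EXEC here.
    buf = mmap.mmap(-1, mmap.PAGESIZE,
                    prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
    buf.write(code)
    # Address of the first byte of the mapping.
    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    # Treat that address as a C function taking no arguments and returning int.
    func = ctypes.CFUNCTYPE(ctypes.c_int)(addr)
    return func()
```

As oriansj points out, the hard part is everything before this: an interpreter still needs a real assembler to turn inline asm text into those bytes, plus glue for arguments and the stack.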
<oriansj>I am however extremely biased towards compilers over interpreters.
<unmatched-paren>oriansj: hm, i'm leaning towards dropping the interpreter and rewriting it as a compiler then
<unmatched-paren>i dislike interpreters too, it's just that one already existed
<unmatched-paren> https://github.com/tylerlaberge/rascal
<unmatched-paren>maybe i should look into llvm
<unmatched-paren>i was doing that with ocaml-llvm before i discovered that ocaml wasn't completely bootstrapped
<oriansj>unmatched-paren: well it is more of a question of what seems like more fun for you
<oriansj>see bootstrapping is a long term sort of activity and the more enjoyable you make it for yourself, the better the odds you'll achieve more of your goals.
<unmatched-paren>from what i've learned today, i think i'm setting myself up for pain when i try to implement advanced features (like inline asm)
<unmatched-paren>i was in the process of rewriting the parser to use the rust-nom parser combinator library, to make it easier to add new rules
<unmatched-paren>so i'm throwing out a sizable portion of the code anyway
<oriansj>also don't feel limited to only already bootstrapped languages
<oriansj>as someone else might come along and bootstrap the missing piece to the language you wish to use
<unmatched-paren>to be honest, i'd like to do it in haskell, but... yeah...
<oriansj>haskell is fine
<oriansj>it'll get bootstrapped
<oriansj>yeah it isn't perfect now but then again GCC wasn't bootstrapped 2 years ago
<unmatched-paren>i'd prefer to do it in something already bootstrapped
<oriansj>if you wish, that is entirely your choice
<unmatched-paren>afk for a bit, sorry
<oriansj>no worries
<oriansj>but I do suggest you don't limit yourself to only already bootstrapped languages. Especially ones that are already in the process of being bootstrapped. As a partial bootstrap chain for a language set we don't have yet will be useful in the future.
<oriansj>M2-Planet and mescc-tools development would have been considerably delayed if I restricted myself to only the M0 assembly that I already bootstrapped.
<unmatched-paren>oriansj: are there any resources on learning x86_64 asm that you recommend? i can't find any books
<unmatched-paren>i'll need to figure out which compiler backend to use, preferably one that minimizes the amount of time i have to spend thinking about architecture-specific stuff
<unmatched-paren>llvm is apparently quite manual, but its rust bindings are better
<unmatched-paren>(than gcc)
<unmatched-paren> https://github.com/bytecodealliance/wasmtime/tree/main/cranelift is a pure-Rust compiler backend, but architecture support is very poor rn
<unmatched-paren>could i compile to the jvm maybe?
<unmatched-paren>hm...
<unmatched-paren>or i could compile directly to C, but that's basically what GCC does
<unmatched-paren>i could write my own gccjit bindings, but i'm lazy :)
<unmatched-paren>ugh, jvm has no inline asm: https://stackoverflow.com/questions/67592482/assembly-are-there-any-languages-other-than-c-and-c-that-allow-for-interacti
<unmatched-paren>it might be less painful doing gccjit bindings if i write the compiler in D instead of Rust, since D is compatible with the C ABI
<unmatched-paren>so bindings are easy
<Avichi>What I wonder about is why languages like Haskell still use Cpp, even though a functional alternative exists (m4)
<unmatched-paren>or i could just use good old C :)
<unmatched-paren>which would basically be writing a modern version of GPC
<unmatched-paren>oriansj: do you know if https://ziglang.org is bootstrapped?
<unmatched-paren>it is, actually, interesting
<unmatched-paren>buuut very immature
<stikonas>unmatched-paren: good way to learn x86_64 asm is to go over stage0-posix and review it
<stikonas>writing it would be better but we already have amd64 port of stage0-posix
<stikonas>stage0-posix is how I learned assembly (well, risc-v at least but I picked up a few bits of amd64 too)
<unmatched-paren>ah, there's a lot of comments in there, yeah
<unmatched-paren>there are so many magic numbers in there
<stikonas>unmatched-paren: you can look at prototypes
<stikonas>if you look at hex0.hex0 you are learning machine code, not just assembly
<stikonas>if you don't care about machine code, then just look at https://github.com/oriansj/stage0-posix/blob/master/AMD64/NASM/hex0_AMD64.S
<unmatched-paren>that's what i'm looking at right now, actually :)
<stikonas>and we also have high level prototypes https://github.com/oriansj/stage0-posix/tree/master/High%20Level%20Prototypes
<unmatched-paren>there's still an uncomfortable amount of magic numbers
<stikonas>unmatched-paren: https://github.com/oriansj/stage0-posix/blob/master/AMD64/NASM/hex0_AMD64.S ?
<stikonas>like what?
<unmatched-paren>like the syscall opcodes
<stikonas>577 or 448 ?
<stikonas>ok, those yes
<unmatched-paren>yes, exactly that page
<stikonas>there is no way around that
<stikonas>syscalls are always some numbers
<stikonas>yes, in high level language it will be some define
<stikonas>but in the end syscall number is just some constant, there is no way around that
<unmatched-paren>i remember seeing #defines in assembly code once
<stikonas>they are not even the same across different arches
<stikonas>the good thing is that number of syscall does not grow that much in later programs
<stikonas>in cc_amd64 we add some more, but not that much more
<stikonas>there are only 8 syscalls in cc_amd64 and not all of them different
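The 577 and 448 mentioned above appear to be the `flags` and `mode` arguments of the `open` call in hex0_AMD64.S rather than syscall numbers (on x86-64 Linux `open` itself is syscall 2). Assuming the Linux x86-64 `O_*` values, they decode as below; `decode_open_flags` is a hypothetical helper and the table covers only a few common flags.

```python
# Linux x86-64 open(2) flag values, written in octal as in the C headers.
OPEN_FLAGS = {
    "O_WRONLY": 0o1,
    "O_RDWR": 0o2,
    "O_CREAT": 0o100,
    "O_EXCL": 0o200,
    "O_TRUNC": 0o1000,
    "O_APPEND": 0o2000,
}


def decode_open_flags(value: int):
    """Names of the known flags set in `value` (O_RDONLY is 0 and cannot be detected)."""
    return [name for name, bit in OPEN_FLAGS.items() if value & bit]
```

So 577 is 1 + 64 + 512, i.e. `O_WRONLY | O_CREAT | O_TRUNC`, and 448 is octal 0700, read/write/execute for the owner only.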
<unmatched-paren>is there anywhere i can look up a (brief) description of each instruction? it's pretty obvious what each of them do (like cmp is for doing comparison), but not how to use them (how do i extract the result of the cmp? what number represents less/greater/equal? etc)
<stikonas>hmm, for risc-v I used risc-v isa manual
<stikonas>there should be something for amd64 too
<stikonas>those manuals are large, but you can use search and it's usually just a few pages that you are interested in anyway
<muurkha>oriansj: C macros got added into the standard because they were in wide use in existing C codebases, where they provided limited forms of separate compilation, manifest constants, inlined functions, parametric polymorphism, and as you point out also compile-time metaprogramming
<stikonas>there is https://www.amd.com/system/files/TechDocs/40332.pdf
<stikonas>(which I have never looked at)
<unmatched-paren>if the manual for risc-v is big, the manual for amd64 is probably gargantuan
<stikonas>unmatched-paren: but out of 120 pages of risc-v manual I was mostly using 2 pages or so
<muurkha>also, if you compare the C preprocessor to its predecessors m4, m6, and GPM, it's small, clean, and simple
<stikonas>most of it I have not read
<muurkha>the fact that the C preprocessor was implemented as a separate program meant that on the PDP-11 you could get all kinds of powerful programming features that would have strained the 64K memory space of the compiler
<muurkha>yes, stikonas, all these preprocessor features are indeed from before ANSI
<stikonas>well, that's why I thought it makes sense that it grew a lot
<unmatched-paren>i wish dlang didn't have a garbage collector, i'd use it all the time if it didn't
<stikonas>before standards it was probably: this is useful for me, let's add it
<unmatched-paren>if you disregard the gc, it's just slightly more powerful and less warty c
<muurkha>there are smaller introductions to amd64
<unmatched-paren>the @nogc attribute basically bars you from using the stdlib entirely, since it uses the gc extensively
<muurkha>stikonas: really C was distinguished from its predecessors by *not* adding lots of features
<muurkha>unmatched-paren: yeah, I've used #define in assembly code
<stikonas>muurkha: probably, I've never read any amd64 ISA introductions, so I'm not the best person to judge
<unmatched-paren>could i learn asm by writing small c programs and disassembling them?
<unmatched-paren>with gdb
<stikonas>unmatched-paren: that's hard for a few reasons
<stikonas>you can learn a bit
<stikonas>but first of all you need to turn off all optimizations
<unmatched-paren>yes, that makes sense
<stikonas>and sometimes the asm code that a compiler generates looks quite a bit different from what a person would write
<unmatched-paren>right
<stikonas>especially for amd64 it would use many more instructions
<stikonas>less of a problem for RISC architectures, as they don't have many spare instructions
<unmatched-paren>for syscalls, they wouldn't be documented in the isa since they're os-specific, so i guess there's some linux documentation that lists them?
<stikonas>unmatched-paren: yeah, there are some websites
<unmatched-paren>the manpages only seem to show you how to use them from c
<stikonas>unmatched-paren: https://marcin.juszkiewicz.com.pl/download/tables/syscalls.html
<stikonas>unmatched-paren: I also used musl C library to get those numbers
<stikonas>although glibc would also have them
<stikonas>unmatched-paren: see https://git.musl-libc.org/cgit/musl/tree/arch/x86_64/bits/syscall.h.in
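Instead of copying numbers by hand, the `#define __NR_<name> <number>` lines in a musl header like the one linked above can be turned into a lookup table. A hedged sketch: `parse_syscall_numbers` is hypothetical, lines whose value is an expression rather than a plain number are skipped, and the sample lines mirror the x86-64 values (read 0, write 1, open 2, openat 257).

```python
import re

# Matches lines such as "#define __NR_write\t\t\t\t1" (musl pads with tabs).
_NR_RE = re.compile(r"^#define\s+__NR_(\w+)\s+(\d+)\s*$")


def parse_syscall_numbers(header_text: str) -> dict:
    """Map syscall names to numbers from a musl-style bits/syscall.h.in."""
    table = {}
    for line in header_text.splitlines():
        m = _NR_RE.match(line)
        if m:
            table[m.group(1)] = int(m.group(2))
    return table
```

Since the numbers differ between architectures, the table has to be built from the header for the arch you are targeting.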
<unmatched-paren>stikonas: thanks!
<unmatched-paren>i'll try writing a hello world
<stikonas>yeah, the steep learning curve of assembly is that it forces you to use syscalls from the very beginning
<stikonas>as you start writing more complicated programs, you actually start doing more normal coding
<unmatched-paren>the musl header looks pretty useful, but i'm still not sure about things like which registers to put input into
<stikonas>obviously assembly itself doesn't have built-in functions but you should still implement them
<stikonas>unmatched-paren: search for syscall calling convention
<stikonas>it's always the same for all syscalls
<stikonas>if I remember correctly, rax would be the syscall number, rbx is the first argument, then rcx, rdx
<stikonas>you'll rarely use syscall with more than 3 arguments
<stikonas>unmatched-paren: Hello world will be about 60 lines of assembly
<stikonas>if you try to write reusable code
<unmatched-paren>to get a stdout, i use open on /dev/stdout, right?
<stikonas>stdout has file descriptor 1
<unmatched-paren>ah, it's automatically opened?
<stikonas>so you just call open with argument 1
<stikonas>no, you need to open it
<unmatched-paren>ok... i thought open was for creating new file descriptors mapped to a file on the filesystem
<stikonas>hmm, it's probably openat actually
<unmatched-paren>i found a syscall reference https://faculty.nps.edu/cseagle/assembly/sys_call.html
<stikonas>that you need to use
<stikonas>oh, sorry, I misled you a bit
<stikonas>so you don't need to open stdout
<stikonas>but you need to pass 1 when you call sys_write
<unmatched-paren>ok, so it's automatically opened
<unmatched-paren>as 1
<unmatched-paren>right?
<stikonas>yes, it's already there
<stikonas>stdin is 0, stdout is 1, stderr is 2
<stikonas>yes
<stikonas>I confused it with opening some files passed as an argument in stage0-posix (and if file is missing we used stdout for output)
<stikonas>so yes, those standard ones are open
<stikonas>anyway, try printing some char first
<stikonas>with sys_write
<stikonas>instead of Hello world, start with 'H'
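A hedged sketch of that first step, assuming x86-64 Linux. For the 64-bit `syscall` instruction, Linux takes the syscall number in rax and the first arguments in rdi, rsi and rdx (further ones in r10, r8, r9), returning the result in rax; the ebx/ecx/edx order belongs to the 32-bit `int 0x80` interface. Python cannot emit the instruction itself, so this goes through libc's generic `syscall()` function via ctypes; `raw_write` is a hypothetical helper name.

```python
import ctypes
import platform
import sys

SYS_write = 1   # x86-64 Linux number for write (differs on other architectures)
STDOUT = 1      # already open at process start: 0 stdin, 1 stdout, 2 stderr


def raw_write(fd: int, data: bytes) -> int:
    """Call write(2) through libc's syscall() wrapper and return bytes written.

    The equivalent x86-64 assembly: rax = 1 (SYS_write), rdi = fd,
    rsi = buffer address, rdx = length, then `syscall`; the result is in rax.
    """
    if sys.platform != "linux" or platform.machine() != "x86_64":
        raise OSError("SYS_write = 1 is only correct for x86-64 Linux")
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    n = libc.syscall(ctypes.c_long(SYS_write), ctypes.c_long(fd),
                     ctypes.c_char_p(data), ctypes.c_size_t(len(data)))
    if n < 0:
        raise OSError(ctypes.get_errno(), "write failed")
    return n
```

`raw_write(STDOUT, b"H")` writes a single H without opening anything first, since fd 1 is inherited from the parent process.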
<unmatched-paren>the cause of my confusion was that i thought that that was something handled by the standard library or runtime of a programming language
<unmatched-paren>so i'd assumed that it wouldn't be handled in raw assembly
<unmatched-paren>stikonas: ok, thanks for the tips :)
<stikonas>well, when you have file path, you still need to call sys_openat
<unmatched-paren>yes
<stikonas>to go from syscall to file descriptor
<stikonas>well, at least in this POSIX assembly
<stikonas>on baremetal of course there are no syscalls or files...
<stikonas>well, I guess x86 has bios calls
<stikonas>so you might still be able to write hello world there
<stikonas>but e.g. on arm you can't...
<unmatched-paren>and then the resulting file descriptor would be written to ebx, according to the table i found
<unmatched-paren>actually, this table doesn't mention openat, only open...
<stikonas>openat is newer syscall if I remember correctly
<stikonas>amd64 probably has open (historical) and newer openat
<stikonas>newer arches on Linux do not implement those retired syscalls
<stikonas>well, open vs openat just differs in first argument
<unmatched-paren>ah "First, openat() allows an application to avoid race conditions that could occur when using open(2) to open files in directories other than the current working directory."
<stikonas>so I was just passing AT_FDCWD as first argument to openat
<stikonas>which basically does what open would do
<stikonas>AT_FDCWD is -100 if I remember correctly
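To see that `openat(AT_FDCWD, path, ...)` behaves like the older `open(path, ...)`, here is a hedged sketch that issues the raw syscall via libc's `syscall()` through ctypes. It assumes x86-64 Linux (where `openat` is syscall 257 and AT_FDCWD is -100); `open_like_open` is a hypothetical helper name.

```python
import ctypes
import os
import platform
import sys

SYS_openat = 257   # x86-64 Linux
AT_FDCWD = -100    # resolve relative paths against the current working directory
O_WRONLY, O_CREAT, O_TRUNC = 0o1, 0o100, 0o1000   # Linux x86-64 values


def open_like_open(path: str, flags: int, mode: int = 0o600) -> int:
    """openat(AT_FDCWD, path, flags, mode): same effect as open(path, flags, mode)."""
    if sys.platform != "linux" or platform.machine() != "x86_64":
        raise OSError("syscall numbers here are for x86-64 Linux only")
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    fd = libc.syscall(ctypes.c_long(SYS_openat), ctypes.c_long(AT_FDCWD),
                      ctypes.c_char_p(os.fsencode(path)),
                      ctypes.c_long(flags), ctypes.c_long(mode))
    if fd < 0:
        raise OSError(ctypes.get_errno(), "openat failed", path)
    return fd
```

Newer architectures such as riscv64 have no plain `open` syscall at all, so passing AT_FDCWD as the first argument is the portable way to get the old behaviour.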
<unmatched-paren>well, now i know why openat isn't listed: "The following table lists the system calls for the Linux 2.2 kernel."
<unmatched-paren> https://eds000n.github.io/syscalls-x86_64.html is more up to date
<unmatched-paren>oh, no, that's v3.10.17