IRC channel logs

2021-10-14.log

<oriansj>stikonas: merged
<stikonas>fossy: shall I just push stage0-posix update in live-bootstrap?
<stikonas>nothing else changes, no checksum change, etc...
<fossy>yeah
<stikonas>(since those early checksums are done as part of stage0-posix)
<fossy>go ahead
<stikonas>that should bring kaem conditionals and match in
<civodul>hello!
<civodul>i've automated rebuilds of https://bootstrappable.org/
<civodul> https://git.savannah.gnu.org/cgit/guix/maintenance.git/commit/?id=b8d25fc43a51e0d8259e88c808ebd68c729fffa8
<civodul>so if you push something to the repo, it should show up on-line within an hour
<oriansj>nice
<Hagfish>yeah, that's smart
<gbrlwck>i just tried to bootstrap stage0-posix on my HiFive Unmatched (riscv64) and failed executing kaem.riscv64 (Subprocess error; ABORTING HARD). compiled hex0 manually, but the binaries differ (except in length)
<stikonas>gbrlwck: and you pulled in all submodules?
<stikonas>well, riscv64 was only tested on qemu, so if something is wrong we'll have to rely on your help
<gbrlwck>yes (git clone ... --recurse-submodules)
<stikonas>can you paste the whole log?
<gbrlwck>stikonas: that's exactly why i'm here! so it all works fine on qemu?
<stikonas>gbrlwck: yes
<gbrlwck>i can also paste you the hexdumps ;)
<stikonas>Subprocess error just means that child process failed
<gbrlwck>yeah, i figured
<stikonas>either hexdumps or, if you have diffoscope, a diff
<stikonas>between two binaries
<gbrlwck>oh wow, diffoscope is a *huge* package? let's see how long that takes
<stikonas>oh, is it huge?
<stikonas>I thought it's some python script
<gbrlwck>it has *lots* of inputs
<stikonas>anyway, maybe paste run log first
<gbrlwck>on my HiFive-ready Ubuntu it asks to install 931 packages and up to 5GiB space ;)
<stikonas>I guess it's the command after hex0 that fails (if hex0 differs from seed)
<stikonas>doesn't matter, I can run diffoscope locally
<stikonas>I see this in my qemu run: https://paste.debian.net/1215410/
<gbrlwck> https://termbin.com/jagv
<gbrlwck> https://termbin.com/wv97
<gbrlwck>the error makes sense: output of the very first hex0 artifact has no valid header so it doesn't execute
<stikonas>hmm, hex0 is completely different...
<gbrlwck>yes
<gbrlwck>at first i thought it was a big-/little-endian issue, but it is not
<stikonas>it slightly resembles that
<stikonas>but in a bit more complicated way
<stikonas>somehow some bits are swapped
<stikonas>but not in a big/little endian way
<gbrlwck>though the second row: the quartets seem the same (it's the same hex characters) but in a scrambled order
<stikonas>I think yes, the bits are the same but somehow scrambled
<stikonas>which is very strange...
<stikonas>gbrlwck: and they are always scrambled in the same way
<stikonas>can you run hex0 on 12345678 ?
<stikonas>just so that we see order
<gbrlwck>sure
<stikonas>too lazy to manually work it out...
<gbrlwck> https://termbin.com/hh69t
<stikonas>hmm, this one is not even the same length...
<gbrlwck>yeah, i need some coding tasks anyhow, so this might be the adequate point to dive in :)
<stikonas>somehow the numbers come out ascii encoded...
<gbrlwck>not sure if i understand
<stikonas>I mean I get just this from my run on teststring: "0000000 2301 6745 ab89 efcd" (this is hexdump)
<stikonas>your paste is somehow much longer
<stikonas>which seemed to me strange given that your miscompiled hex0 had the same length
<stikonas>and just scrambled characters
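[Editor's note: the hexdump line quoted above groups the output into 16-bit words, so the bytes 01 23 45 67 ... show up pairwise swapped as 2301 6745 ... on a little-endian machine. A minimal Python sketch of that display convention, assuming the default `hexdump` format; the variable names are illustrative only.]

```python
import struct

# Bytes that a correct hex0 should write for the input "01 23 45 67 89 AB CD EF".
data = bytes.fromhex("0123456789abcdef")

# Default hexdump prints 16-bit words in host byte order; on a little-endian
# machine each pair of bytes therefore appears swapped.
words = struct.unpack("<4H", data)
line = "0000000 " + " ".join(f"{w:04x}" for w in words)
print(line)
```

[So the expected dump is a byte-for-byte copy of the input digits, and the swapped look is only an artifact of how hexdump groups them.]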
<gbrlwck>well, it was not "just" scrambled characters. the first line of both are really different
<gbrlwck>but yeah, i know what you mean
<gbrlwck>what image are you running in qemu?
<stikonas>gbrlwck: it's qemu-user
<stikonas>so I'm running it directly on amd64 machine
<stikonas>hmm, yeah, I think some later lines are also not simply scrambled in octets...
<stikonas>and oriansj was getting the same hashes on his qemu-user instance
<stikonas>well, in the worst case, we'll have to fire up gdb and see what happens
<stikonas>although, debugging this early code in gdb is not super easy...
<stikonas>gbrlwck: one test you can try, is to try compiling GAS prototype in GAS/hex0_riscv64.S and see if you get the same issue
<stikonas>riscv64-unknown-linux-gnu-as hex0_riscv64.S -o hex0.o; riscv64-unknown-linux-gnu-ld hex0.o -o hex0
<gbrlwck>now we get a 1.3K binary file with really similar first bytes! it starts to differ at e_ident[EI_OSABI]
<stikonas>when you use GAS version of hex0?
<gbrlwck>yes
<gbrlwck>no
<gbrlwck>wait
<gbrlwck>i compiled the GAS version, this results in a 1.3K binary
<stikonas>well, the built binary is different (GAS compiled version will be bigger)
<stikonas>that's expected
<stikonas>it has larger header with section tables
<stikonas>but I mean use this binary on
<stikonas>hex0_riscv64.hex0
<gbrlwck>i did
<stikonas>and see if output is still garbled
<stikonas>oh ok
<stikonas>hmm
<gbrlwck>it is still scrambled and identical to the first bootstrapped version (riscv64/artifact/hex0)
<stikonas>ok, so if we need to debug it it will be easier than debugging that smaller hex0
<stikonas>although for debug info you need to build it with riscv64-unknown-linux-gnu-as -g hex0_riscv64.S -o hex0.o; riscv64-unknown-linux-gnu-ld hex0.o -o hex0
<gbrlwck>i did just `as` and `ld` (because i wasn't cross compiling)
<stikonas>yeah, that's fine
<stikonas>I just copy/pasted what I had
<stikonas>well, the full triplet would work for you too, but the short version is fine
<stikonas>anyway as -g can produce bigger file with some debug info
<stikonas>so far I have no other ideas besides looking what happens in gdb
<stikonas>oriansj: any thoughts?
<stikonas>for gdb I guess one can start looking at what happens on this line: https://github.com/oriansj/stage0-posix/blob/master/riscv64/GAS/hex0_riscv64.S#L130
<stikonas>what is the value of $a0 for each write
<gbrlwck>installing gdb will take a while. i'll be back :)
<stikonas>gbrlwck: if you need help using gdb also feel free to ask
<stikonas>I've also never used gdb on assembly programs until 3 months ago...
<xentrac>gdb sort of wants you to be using a high-level language
<stikonas>sort of but there are some useful tips
<xentrac>yeah, it definitely copes okay with assembly
<stikonas>first of all "layout asm" followed by "layout regs" can help you see assembly code and cpu registers
<xentrac>although I haven't been able to figure out how to get `finish` to work
<xentrac>I don't know how to use things like radare2 which are designed for debugging machine code
<stikonas>oh, I've never used it either
<xentrac>I tend to do `display/i $pc` and `info registers` in GDB rather than the TUI
<stikonas>there is also some trick to display memory contents
<stikonas>something like "x/8x memory address"
<stikonas>although we don't need that for hex0
<xentrac>yeah, I use that a lot
<xentrac>also p/x *$@8
<stikonas>oh, what does $@8 do?
<stikonas>I've used p/x *$register a lot
<stikonas>but it's a bit hard when you work with 64bit pointers
<stikonas>as gdb only shows 32-bits
<xentrac>well, $ is the last output
<xentrac>so if the register pointed to memory, p *(void**)$ would follow that pointer
<xentrac>and @8 means "an array of 8 things"
<gbrlwck>$a0 is the content of register 0?
<stikonas>no, it's content of register a0
<stikonas>risc-v has two names for registers
<stikonas>it's either x0 to x31
<gbrlwck>yeah, right :)
<stikonas>or there are some more semantic names like zero, a0, a1, a2,... (for function calls), t0, t1, ... (for temporaries), and similar
<gbrlwck>the content at the first break is: $1 = 4395898842368
<stikonas>there is a list of registers here https://web.eecs.utk.edu/~smarz1/courses/ece356/notes/assembly/
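[Editor's note: the two naming schemes mentioned above can be tabulated. This is a small Python sketch of the standard RISC-V integer register ABI names; `ABI_NAMES` and `ABI_TO_X` are illustrative names, not anything from stage0-posix.]

```python
# ABI names for the RISC-V integer registers x0..x31, in numeric order.
ABI_NAMES = (
    ["zero", "ra", "sp", "gp", "tp"]      # x0..x4
    + ["t0", "t1", "t2"]                  # x5..x7   temporaries
    + ["s0", "s1"]                        # x8..x9   saved registers
    + [f"a{i}" for i in range(8)]         # x10..x17 function arguments
    + [f"s{i}" for i in range(2, 12)]     # x18..x27 saved registers
    + [f"t{i}" for i in range(3, 7)]      # x28..x31 temporaries
)

# Map each ABI name to its numeric xN name.
ABI_TO_X = {name: f"x{n}" for n, name in enumerate(ABI_NAMES)}

print(ABI_TO_X["a0"], ABI_TO_X["s4"], ABI_TO_X["s5"])
```

[In this table a0 is x10, which is why "$a0" is not "register 0".]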
<stikonas>this is value of a0?
<stikonas>strange, that looks like memory address to me
<stikonas>hmm
<stikonas>let me fire up my gdb
<gbrlwck>that's the output of: p $a0
<stikonas>ok, let me check mine
<xentrac>may be more useful to p/x $a0
<stikonas>and is this before that line or after?
<stikonas>I have $1 = 0x7f
<gbrlwck>how do i check that?
<gbrlwck>it might be before, since i added the breakpoint at line 130
<stikonas>actually it shouldn't matter, that line does not change a0
<stikonas>ok, same here
<stikonas>anyway, that's fine
<stikonas>but we see some difference already
<stikonas>hmm
<gbrlwck>so, when i continue, the next value is 0x12 (18)
<gbrlwck>which seems to be more like it
<stikonas>still, 18 is a bit strange number
<stikonas>it's not a readable letter or number in ascii encoding
<stikonas>hmm
<stikonas>oh, you mean continue to 2nd breakpoint
<gbrlwck>yes
<stikonas>I get 0x45
<stikonas>actually, instead of p/x it might be good to run p/c $a0
<stikonas>it decodes it into letter
<xentrac>yeah, /x is useful for values like 4395898842368
<oriansj>my first question is what happens when given 01 23 45 67 89 AB CD EF and single step in gdb with si
<xentrac>but 7f is delete
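[Editor's note: a quick Python check of the two values being compared here, as a sketch only. It shows why the decimal number reads more naturally in hex, where it looks like an address rather than a byte, and why 0x7f is not a printable character.]

```python
# Decimal output of `p $a0` quoted earlier in the log.
value = 4395898842368
as_hex = hex(value)
print(as_hex)

# 0x7f, the value seen in the qemu run, is the ASCII DEL control character,
# so it is not printable either.
is_printable = chr(0x7F).isprintable()
print(is_printable)
```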
<stikonas>oh, hmm
<gbrlwck>oriansj: i can do that ;)
<stikonas>ok, let's try that 0123... file
<oriansj>and put break points on the reading and writing of bytes
<gbrlwck>ah
<gbrlwck>i'm already doing that file
<gbrlwck>sorry
<stikonas>oh, that's why
<stikonas>ok
<stikonas>I was doing hex0_riscv64.hex1
<oriansj>So you should see exactly 2 reads that are correct followed by one write that is correct
<stikonas>so I have file "01 23 45 67 89 AB CD EF" (with newline at the end)
<stikonas>and sha256sum c45792734f2045a48f4db7f86189009be6824055b9f139f2d4d80b831303218e
<stikonas>gbrlwck: on risc-v read will be stored in a0 after ecall
<stikonas>sorry, that's totally wrong
<stikonas>a0 will have number of bytes read
<stikonas>value will be in the address pointed out by stack pointer, but the next line loads it into a0
<stikonas>lb a0, (sp)
<stikonas>so you can put breakpoint there
<stikonas>that's line 66
<gbrlwck>so, i added a newline to my teststring file but the sums dont check out (no idea why)
<oriansj>and if the gas version has the same error, we can safely assume it isn't our instruction encoding (as gas shouldn't be encoding the wrong instructions)
<stikonas>yes, that's why this is strange, one would think kernel bug... but that's such a basic functionality
<stikonas>reads and writes shouldn't be buggy
<oriansj>using gdb does it read the correct values?
<gbrlwck>(gdb) p/c $a0
<gbrlwck>$1 = 1 '\001'
<stikonas>oh, do it after lb line
<stikonas>you still have number of bytes read in a0
<gbrlwck>this is when breaking on line 66
<stikonas>can you do line 67
<stikonas>or do si
<stikonas>to step to it
<stikonas>but 1 on line 66 is good
<gbrlwck>(gdb) p/c $a0
<gbrlwck>$2 = 48 '0'
<stikonas>that's what I expect
<gbrlwck>also good, no?
<stikonas>that's good
<stikonas>yes, that's the first byte
<stikonas>next should be 1
<gbrlwck>it is
<gbrlwck>then comes a space
<stikonas>oh, maybe restart and also add breakpoint on 130
<stikonas>so that you can check writes
<stikonas>so you should get reads 0 then 1, and then 1 at the write part
<stikonas>then indeed space
<stikonas>then '2', '3' for reads (p/c $a0)
<gbrlwck>the first iteration has 48 '0' on line 66 and 0 '\000' on line 130
<stikonas>and 0x23 for write (p/x $a0)
<gbrlwck>second iteration has 49 '1' on line 66 and 32 ' ' on line 130
<stikonas>oh, you got to write before read?
<stikonas>that's strange
<stikonas>it should do 2 reads before write
<gbrlwck>wait
<gbrlwck>sorry
<stikonas>so for writes do p/x rather than p/c
<stikonas>as it is binary stuff that we are writing
<oriansj>it should always do at least 2 reads before write (more if whitespace or comments)
<gbrlwck> https://termbin.com/3jshv
<gbrlwck>it doesn't
<oriansj>the contents of the register a0 at combine should be what is written out
<stikonas>ok, something is messed up if this is what happens
<oriansj>break point at hex_read
<stikonas>also the value of s4 might be good to watch
<oriansj>what happens at the: bnez s4, combine
<stikonas>that is a boolean toggle to decide whether we write the combined byte or not
<stikonas>oh, what we should check
<stikonas>is what are the values of all registers when we start
<gbrlwck>should i stream this live via jitsi or something? might be easier?
<stikonas>I thought kernel should initialize all registers to 0
<stikonas>(except special purpose ones like stack pointer)
<oriansj>does s4 start out as zero?
<stikonas>hmm, let's check initial values of registers
<stikonas>when you start
<gbrlwck>how do i do that?
<gbrlwck>i'm not that fluent in gdb
<oriansj>b _start
<oriansj>p $s4
<stikonas>there is also starti command
<stikonas>I think...
<gbrlwck>183253003312
<stikonas>but b _start would put breakpoint
<stikonas>oh, that's the problem then
<gbrlwck>0x2aaabaec30
<stikonas>I have $s4 = 0
<stikonas>we need to initialize them then
<stikonas>I thought kernel does that when loading binaries
<stikonas>strange...
<gbrlwck>li s4,0 ?
<stikonas>yes
<stikonas>and maybe s5 too
<xentrac>yeah, the kernel needs to do that when loading binaries; the alternative is a covert information leak through execve()
<gbrlwck>so that fixed it :)
<stikonas>just s4?
<stikonas>or s5 too?
<gbrlwck>0000000 2301 6745 ab89 efcd
<gbrlwck>i initialized both
<stikonas>ok, so at least we understand what was going wrong
<stikonas>but not why
<stikonas>can you try just s4?
<gbrlwck>sure
<stikonas>the fewer bytes we have in seed the better
<gbrlwck>works with only initializing s4
<stikonas>ok, that's good
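[Editor's note: the failure diagnosed above can be reproduced with a simplified Python model of the hex0 loop. This is a sketch, not the actual assembly: `hex0_model`, `toggle` and `hold` are illustrative names standing in for the s4 toggle and the held nibble, and starting `hold` at 0 is an assumption.]

```python
def hex0_model(source: bytes, toggle: int = 0, hold: int = 0) -> bytes:
    """Combine pairs of hex digits into bytes, skipping everything else."""
    out = bytearray()
    in_comment = False
    for ch in source:
        if in_comment:
            # Line comments run until the next newline.
            in_comment = ch != ord("\n")
            continue
        if ch in b"#;":
            in_comment = True
            continue
        c = chr(ch)
        if c not in "0123456789abcdefABCDEF":
            continue  # whitespace and other characters are ignored
        nibble = int(c, 16)
        if toggle:
            # Second digit of a pair: combine with the held nibble and write.
            out.append(((hold << 4) | nibble) & 0xFF)
            toggle = 0
        else:
            # First digit of a pair: hold it until the next digit arrives.
            hold = nibble
            toggle = 1
    return bytes(out)


src = b"01 23 45 67 89 AB CD EF\n"
good = hex0_model(src)            # toggle starts at zero, as on qemu-user
bad = hex0_model(src, toggle=1)   # toggle starts non-zero, as on the Unmatched
print(good.hex(" "))
print(bad.hex(" "))
```

[With a non-zero starting toggle every byte is assembled one nibble late: the model writes 0x00 and then 0x12 first, in line with the writes reported in the gdb session, and the output keeps the same length while its contents look scrambled.]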
<stikonas>do you want to make PR?
<gbrlwck>i'd love to
<stikonas>well, I can help you with changes
<gbrlwck>do i need a github account for that?
<stikonas>oriansj: ?
<stikonas>that's one option
<stikonas>might be the easier one, but not necessarily the only way
<gbrlwck>ok, i'll set up an account, then
<stikonas>luckily, it wouldn't be too hard to fix this up, as this is at the beginning of the file
<stikonas>so don't need to redo any calculations of where to jump
<oriansj>gbrlwck: I can pull from any git repo that I can git fetch from
<oriansj>So notabug, savannah, gitlab, etc are all fine
<stikonas>but in the meantime I'm confused
<stikonas>why s4 is not zeroed
<stikonas>oh, it probably depends on ABI
<stikonas>maybe s registers are not zeroed...
<oriansj>stikonas: for relative jumps yes but fortunately hex0 doesn't need absolute addresses
<stikonas>well, we just need to copy paste the line from the end of the file where we also initialize s4 to 0
<stikonas>and in hex0 file we also need to recalculate file size
<stikonas>anyway, let's wait for gbrlwck to be ready
<oriansj>first fixing bootstrap-seeds and then updating stage0-posix
<stikonas>well, for pull yes
<oriansj>and there are probably similar bugs in hex1, hex2, M0 and cc_riscv64
<stikonas>but for fixing files it might be easier to update GAS file, then M1 file then hex2 prototype and then we can update hex0 source and commit to bootstrap-seeds
<stikonas>yes, there might be
<stikonas>fortunately, they are not too hard to fix
<stikonas>now that we found what causes them
<oriansj>best not to assume the state of a register before we set it with RISC-V
<oriansj>as there may be register side-effect differences from ecalls as well
<stikonas>hmm, risc-v elf.h file in kernel source does not have ELF_PLAT_INIT
<stikonas>so I think we can't assume that they are set to 0
<stikonas>on x86 they are
<stikonas>e.g. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/include/asm/elf.h?h=v5.15-rc5#n107
<stikonas>but yes, at the very least we'll see this bug in hex1 and hex2
<stikonas>maybe not in M0
<oriansj>actually every other architecture, including AArch64 and PowerPC64LE, zeroes registers on exec
<oriansj>but not RISC-V for some reason
<stikonas>yes, I can see e.g. aarch64 here https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/include/asm/elf.h?h=v5.15-rc5#n159
<stikonas>hmm
<stikonas>that is a bit surprising
<stikonas>we'll need a new release of stage0-posix at some point
<oriansj>failing to zero would result in a data leak from the kernel
<stikonas>at least with minor number update...
<oriansj>it is a potential security problem
<stikonas>maybe should ask on #riscv
<oriansj>so until they fix that, let's assume it will not be fixed and update our binaries accordingly
<oriansj>afk
<stikonas>we should update anyway
<stikonas>as there are some kernels out that don't do zeroing
<gbrlwck>should i add my chmod'ed kaem.riscv64 too, or is there a reason why it's not executable?
<stikonas[m]>It doesn't have to be executable
<stikonas[m]>Kaem doesn't care
<stikonas[m]>But you'll have to update prototypes in riscv64/Development
<gbrlwck> https://github.com/oriansj/stage0-posix/pull/58
<stikonas>gbrlwck: yeah, .S change looks alright. But can you do others too?
<gbrlwck>i'll go for it
<stikonas>for M1 file you need to copy this line https://github.com/oriansj/stage0-posix/blob/master/riscv64/Development/hex0_riscv64.M1#L135
<stikonas>RD means destination register there
<gbrlwck>copy this where to?
<stikonas>well to the beginning
<stikonas>where you inserted li in .S file
<stikonas>li s4, 0 translates to RD_S4 MV in .M1 language
<gbrlwck>:) thanks!
<gbrlwck>comment it with "initialize register"?
<gbrlwck>are there any more left? sorry, lost track
<stikonas>yeah, that comment is fine
<stikonas>yes, then there is hex0_riscv64.hex2 prototype
<stikonas>and finally hex0_riscv64.hex0 file itself
<stikonas>for hex2 you again have the same place to copy from https://github.com/oriansj/stage0-posix/blob/master/riscv64/Development/hex0_riscv64.hex2#L244
<stikonas>first line is a comment (M1 code) and second line is the actual hex encoding
<stikonas>so copy both
<gbrlwck>i got that :)
<stikonas>and then for hex0 file
<stikonas>all the same stuff plus
<stikonas>update file size in https://github.com/oriansj/stage0-posix/blob/master/riscv64/hex0_riscv64.hex0#L66
<stikonas>right now it is 0x188
<stikonas>88 01 00 00 00 00 00 00 ## p_filesz
<stikonas>and same in next line
<gbrlwck> https://github.com/gbrlwck/stage0-posix/commit/14792da03ad46ada370986671f95b50c4f00ed41
<stikonas>ok, looks good
<stikonas>do hex0 file and update the seed too...
<gbrlwck>p_filesz is the file-size of the compiled file in bytes?
<stikonas>yes
<stikonas>so you are adding 1 instruction
<stikonas>that's +4 bytes
<gbrlwck>this will be 8b 01 00 00 00 00 00 00 ## p_filesz ?
<gbrlwck>and also adjust p_memsz?
<stikonas>yes, that's right
<stikonas>sorry I'm on and off here, so might need to wait a bit for my answers
<gbrlwck>ok, so now my 1.3K assembly hex0 produces a 396 Byte hex0_bootstrapped; but this in turn produces a 392 Byte hex0_bootstrapped2
<stikonas>396 is the correct size
<stikonas>are you running it on new source?
<stikonas>when it produces 392 byte binary
<gbrlwck>all run against the new source
<stikonas>oh sorry, also
<stikonas>it's not 88 01
<stikonas>not 8b 01
<stikonas>but 8c 01
<gbrlwck>yeah, right
<stikonas>c is 12
<stikonas>but that shouldn't really affect the size
<stikonas>against new source it should be producing 396 byte binary
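[Editor's note: the size arithmetic discussed above, as a short Python sketch. It assumes, as stated in the log, that the old size is 0x188 and that the added instruction is 4 bytes; the variable names are illustrative.]

```python
old_size = 0x188          # p_filesz before the fix (392 bytes)
new_size = old_size + 4   # one extra 4-byte instruction

# ELF64 stores p_filesz and p_memsz as 8-byte little-endian integers.
field = new_size.to_bytes(8, "little").hex(" ")
print(new_size, hex(new_size), field)
```

[The same value goes into both p_filesz and p_memsz in the hex0 source.]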
<stikonas>maybe show the commit?
<gbrlwck>this looks promising: https://termbin.com/nrcm
<gbrlwck>i fixed the size(s) and now it seems to work
<stikonas>ok, good
<gbrlwck>so should this (hex0_b or hex0_b2) become the new seed?
<stikonas>yes
<stikonas>well, they are identical
<stikonas>so first push this seed to bootstrap-seeds repository
<stikonas>along with a copy of hex0 source
<stikonas>to here https://github.com/oriansj/bootstrap-seeds/tree/master/POSIX/riscv64
<stikonas>then once oriansj merges it
<stikonas>you can also update bootstrap-seeds submodule in stage0-posix
<stikonas>(after bootstrap seeds is merged just go to bootstrap-seeds subdirectory, git switch master; git pull)
<stikonas>then it will show up in git status
<stikonas>and you can do git add bootstrap-seeds
<stikonas>after that you should be able to proceed 1 step further until things break...
<stikonas>I think hex1 binary and hex2 will also need fixing
<gbrlwck>oriansj: https://github.com/oriansj/bootstrap-seeds/pull/11
<stikonas>ok looks good
<stikonas>I can't merge myself though
<stikonas>gbrlwck: does it build (broken) hex1 now?
<gbrlwck>lemme check
<stikonas>oh, actually hex1 has initialization at the beginning
<stikonas>so everything else might just work
<gbrlwck>hex1 compiles fine
<stikonas>and later?
<gbrlwck>kaem-0 built successfully
<gbrlwck>./riscv64/artifact/kaem-0 stops at
<gbrlwck> +> ./riscv64/artifact/M0 ./riscv64/artifact/cc_riscv64.M1 ./riscv64/artifact/cc_riscv64.hex2
<gbrlwck>Subprocess error
<gbrlwck>i'll probably continue my work tomorrow; need to eat some dinner now
<stikonas>sure
<stikonas>thanks for notifying us about this
<gbrlwck>you're welcome!
<stikonas>so I guess next is M0 that is broken
<gbrlwck>i guess so, too ;)
<stikonas>and it's coincidentally also s4
<stikonas>although it does completely different thing there
<stikonas>it's a pointer to linked list
<stikonas>fossy: oh, so actually we don't even have M2-Planet -> mes step on amd64
<stikonas>I think nobody tried building it yet, and there is no lib/linux/x86_64-mes-m2 directory
<stikonas>I have some initial adjustment for live-bootstrap that allows using stage0-posix on amd64 https://github.com/stikonas/live-bootstrap/commit/8c0694bcdd8ce0f01e77b8e55d5cb73ea230b85e
<stikonas>although, of course it later builds 32-bit mes
<fossy>yes, correct
<oriansj>bootstrap-seeds has been merged
<stikonas>oriansj: rest is https://github.com/gbrlwck/stage0-posix/commits/master but if you want to merge you probably want to squash and also pull in bootstrap seeds
<oriansj>yep, all merged in now
<stikonas>so hex1 and hex2 are fine
<stikonas>but M0 has similar issue
<oriansj>fossy: I still haven't had time to incorporate the meslibc functions from mes-m2 into M2libc but once I do, then all of the architectures should be able to build and run MesCC (even if MesCC doesn't support that arch yet)
<oriansj>plus I still need to fix the security issue with untar with untrusted inputs.
<oriansj>and get back to doing the armv7l port of stage0-posix
<stikonas>oriansj: can we actually take those functions? M2libc is LGPL and meslibc is GPL
<stikonas>although, M2libc does need some improvements
<stikonas>it's a bit annoying to write each function in assembly
<stikonas>I think usually libc's have a few syscall functions (one per syscall number of arguments)
<oriansj>stikonas: true; however debugging M2libc syscalls is easier than meslibc syscalls (you can just do b FUNCTION_name and boom done)
<stikonas>yeah, that's true
<stikonas>we had to do quite a bit of debugging for risc-v just before stage0-posix release
<oriansj>So in many ways meslibc gets to benefit from what is learned by stage0-posix and M2-Planet; it is why MesCC always gets architectures AFTER M2-Planet is up and working
<oriansj>hence why despite starting months earlier on RISC-V support, M2-Planet+stage0-posix finished getting it first
<stikonas>well, laanwj got stuck with old hex2 syntax...
<oriansj>until I had to fix it for M2-Planet+stage0-posix
<stikonas>or maybe too many bitcoin PRs to merge...
<oriansj>hard to say honestly
<oriansj>but now the work is done and it should be much easier for MesCC to gain RISC-V support