IRC channel logs

2021-10-26.log

<oriansj>well supporting envp is far cheaper and simpler to implement than getenv()
<oriansj>to add support for getenv out of the gate would require a larger and more complex libc.M1 than is currently in M2libc
<oriansj>And only 2 programs actually use envp: kaem and mes.c/MesCC
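For context, here is a rough sketch of what getenv() has to do on top of a bare envp: a linear scan over NAME=value strings. It is a Python model for illustration only. `lookup_env` and the sample entries are hypothetical and are not taken from M2libc or meslibc.

```python
def lookup_env(envp, name):
    """Return the value for name from a list of 'NAME=value' strings, or None."""
    prefix = name + "="
    for entry in envp:
        # Require the full name followed by '=' so "MES" does not match "MES_DEBUG=2".
        if entry.startswith(prefix):
            return entry[len(prefix):]
    return None


if __name__ == "__main__":
    sample_envp = ["MES_DEBUG=2", "MES_PREFIX=/tmp/mes"]  # hypothetical values
    print(lookup_env(sample_envp, "MES_DEBUG"))
```

A program that receives envp directly can do this scan itself where it needs it, which is one way to read the point that envp support is the cheaper half.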
<gbrlwck>i fail building mes-m2 on riscv64, because it can't find lib/m2/riscv64/ELF-riscv64-debug.hex2; when i copy it from stage0-posix hex2 complains: "Target label _start is not valid"
<gbrlwck>stikonas: how did you build it, exactly?
<stikonas>gbrlwck: kaem --verbose --file kaem.riscv64
<stikonas>let me check if I forgot to commit it
<gbrlwck>no, kaem.riscv64 is there :)
<stikonas>gbrlwck: ok, ELF pushed
<stikonas>it got ignored because .hex2 files are in .gitignore
<stikonas>it should be there
<stikonas> https://github.com/stikonas/mes-m2/tree/riscv64
<stikonas>gbrlwck: are you on my fork (and not on oriansj's or upstream mes)
<gbrlwck>yes i am, but i was on the wrong branch :)
<gbrlwck>ok, built!
<gbrlwck>does `./bin/mes-m2 -c (display 'Hello,M2-mes!) (newline)` actually produce output in your setup?
<gbrlwck>i'll try to compile with gcc and see what's happening there
<gbrlwck>no, that obviously won't work (since we output M1 in asm calls)
<gbrlwck>what do these _common_recursion comments mean/indicate in m2/mes.M1 (there's 5262 of them)
<stikonas>gbrlwck: no, it does not produce output
<stikonas>it crashes
<gbrlwck>i get output like this when running the binary through strace:
<gbrlwck>write(214, NULL, 274877297592) = -1 EBADF (Bad file descriptor)
<gbrlwck>does 214 mean SYS_brk? is this NULL the reason for segfault?
<gbrlwck>does strace document a write sys-call here?
<qyliss>yes
<qyliss>try strace -yy to see what it's writing to
<qyliss>although I suppose if it's writing to a file descriptor that doesn't exist that isn't actually going to help you
<gbrlwck>that's actually -yy output here
<gbrlwck>not sure if i understand correctly, but do we want to write to file-descriptor 214? with content in NULL? and length of 274873537464274873537464?
<gbrlwck>seems like we want to execute a syscall (214/SYS_brk) to get some more memory (?)
<nimaje>as it says EBADF the NULL doesn't matter, but it doesn't seem like that should be a write(…)
<gbrlwck>wdym "the NULL doesn't matter"?
<nimaje>does linux have (s)brk on riscv64? my brk manpage on freebsd says "They are deprecated and not present on the arm64 or riscv architectures."
<gbrlwck>nimaje: the 214 had to come from somewhere ;)
<gbrlwck>i find references to SYS_brk i.e. https://github.com/westerndigitalcorporation/RISC-V-Linux/blob/master/riscv-pk/pk/syscall.h
<nimaje>that was an answer to the "is this NULL the reason for segfault?" question; as it is a bad file descriptor write will just fail, but shouldn't cause a segfault
<gbrlwck>huh, thanks for clarification!
<stikonas>gbrlwck: nimaje we definitely have sys_brk on riscv64
<stikonas>that's what stage0-posix uses
<nimaje>the segfault happens somewhere after that strange write, probably some misgenerated code to make a brk syscall and code after that trying to use invalid memory as a result
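One way to read the strace line above: on riscv64 Linux the syscall number travels in register a7 and the first arguments in a0, a1, a2, and strace names the call from the number. The sketch below is a simplified Python model of that decoding. The table is a small subset of the generic Linux syscall numbers used by riscv64. The `decode` helper and its output format are hypothetical and print three arguments for every call.

```python
# Subset of the generic Linux syscall table (used by riscv64).
RISCV64_SYSCALLS = {
    56: "openat",
    57: "close",
    63: "read",
    64: "write",
    93: "exit",
    214: "brk",
}


def decode(a7, a0, a1, a2):
    """Format a syscall roughly the way a tracer would: name from a7, args from a0..a2."""
    name = RISCV64_SYSCALLS.get(a7, "syscall_%d" % a7)
    return "%s(%d, %s, %d)" % (name, a0, hex(a1), a2)


if __name__ == "__main__":
    # The kernel saw number 64 (write) with 214 sitting in the first argument register.
    print(decode(64, 214, 0, 274877297592))
```

Under this reading, `write(214, ...)` means a write was issued with 214 as its file descriptor. That would fit a brk number ending up in an argument register instead of a7, but that is an assumption, not something the trace proves.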
<gbrlwck>is it possible to step through the whole execution with gdb (label _start is missing)?
<stikonas>_start should be present
<stikonas>it should be in lib/linux/riscv64-mes-m2/crt1.M1
<deesix>gbrlwck, I'm not sure that the word recursion is the right one every time, but those PUSH/POP are saving intermediate results before going deeper into expressions, IIRC. See cc_core.c of M2-Planet.
<gbrlwck>does `RS1_SP RS2_A0 SD` store the value in A0 in SP or the other way round?
<stikonas>gbrlwck: it stores A0 in SP
<stikonas>so it's pushing onto the stack
<stikonas>then you can pop from the stack with RD_A0 RS1_SP LD
<gbrlwck>i see
<muurkha>not *in* SP but *at* SP
<muurkha>you have to increment and decrement SP separately, but I'm guessing gbrlwck is looking at the code so that goes without saying
<gbrlwck>thanks for clarification! are we sure RV64 follows RV32 conventions (i saw mentions of this not being the case)
<stikonas>gbrlwck: which conventions?
<stikonas>M2-Planet has its own calling convention and does not follow the standard RISC-V calling conventions
<stikonas>M2-Planet is mostly stack based (and does the same thing for all arches)
<stikonas>so it uses very few registers
<gbrlwck> https://inst.eecs.berkeley.edu/~cs61c/sp18/lec/06/lec06.pdf page 50
<stikonas>(since x86 does not have that many)
<stikonas>well, the difference is pointer size I guess
<stikonas>s/pointer/register/
<stikonas>for RV32 you add 4 bytes to push register to stack
<stikonas>but for RV64 you need 8 bytes
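The push/pop pattern described above (store *at* the address in SP, adjust SP separately, 4-byte slots on RV32 and 8-byte slots on RV64) can be modelled in a few lines of Python. This is only a sketch: the `Stack` class is hypothetical, and it assumes a downward-growing stack with SP decremented before the store, which may not match the exact instruction order M2-Planet emits.

```python
class Stack:
    """Toy model of a register-sized stack addressed through a stack pointer."""

    def __init__(self, top, word_size=8):
        self.sp = top                # address held in SP
        self.word_size = word_size   # 8 for RV64, 4 for RV32
        self.memory = {}             # address -> stored word

    def push(self, value):
        self.sp -= self.word_size    # adjust SP (a separate instruction)
        self.memory[self.sp] = value # store at the address SP holds, like SD

    def pop(self):
        value = self.memory[self.sp] # load from the address SP holds, like LD
        self.sp += self.word_size    # adjust SP back
        return value


if __name__ == "__main__":
    stack = Stack(0x1000)
    stack.push(42)
    print(hex(stack.sp), stack.pop())
```

Passing `word_size=4` gives the RV32 behaviour; nothing else in the model changes, which matches the remark that the difference is the register size.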
<stikonas>.reserved .text .data and .stack spaces should be the same...
<stikonas>well, mes uses different base-address
<stikonas>but that just sets where .text starts
<stikonas>but M2-Planet mostly uses only stack pointer SP, base pointer FP, A0, A1, return address RA, TP register as temp register when doing recursion and T1 when loading a big number
<stikonas>but mostly A0 and A1
<stikonas>well, and implicitly it uses zero register...
<stikonas>but that's not visible in M1
<stikonas>and when making function calls, M2-Planet's convention is to push arguments onto stack while in calling function
<stikonas>then the called function pulls them from stack
<gbrlwck>do we expect the bug to be in M2-Planet?
<gbrlwck>and circling back to the SYS_brk thingy: i've read that SYS_brk may work with small amounts of data but might raise a segfault when trying to gain access to too much memory. the crash happens *really early* -- maybe we're just trying to initialize too many globals (there are a few in mes) ?
<gbrlwck>readelf shows neither .heap nor .stack sections. is this as intended?
<nimaje>why would there be a .stack section in a binary? heap and stack both start empty and are dynamically managed while the program is running
<stikonas>nimaje: no, .stack is not in binary, I was talking about memory layout after loading
<stikonas>gbrlwck: unlikely to be because of SYS_brk
<stikonas>we did hit similar SYS_brk issue with qemu
<stikonas>but that was after reserving maybe 16 MB of data
<stikonas>(this happened in M1 I think )
<stikonas>or rather M0
<stikonas>gbrlwck: as for whether the bug is in M2-Planet, we don't know
<stikonas>but M2-Planet is self-hosting, we build it with itself
<stikonas>and it uses a slightly different malloc (compared to meslibc) when building itself
<stikonas>so can't be completely broken
<stikonas>anyway, right now the bug could be anywhere...
<stikonas>might even be in crt1.M1
<stikonas>(given that M2-Planet is self-hosting, it can't be completely broken)
<stikonas>(so bug is more likely to be in new code)
<gbrlwck>would it be an option to compile simpler programs with our M2 toolchain until we hit the same bug?
<gbrlwck>TDD style?
<stikonas>yes, that's one way to find the bug
<stikonas>find the smallest reproducer
<stikonas>although all previous programs were built with M2libc
<stikonas>and mes has its own libc
<stikonas>so you might want to build something very small with mes libc
<stikonas>start with hello world or something like that
<gbrlwck>M1 reports: Target label GLOBAL___stdin is not valid. did i forget an include? https://github.com/gbrlwck/mes-m2/tree/tdd-bughunt
<gbrlwck>this happens with both: kaem-test.riscv64 and kaem-test.riscv64-minimal
<stikonas>gbrlwck: probably missing include/m2/lib.h
<stikonas>that's where that global is defined
<stikonas>actually you have it in kaem file...
<stikonas>gbrlwck: I tried it and my error is include/stdio.h:27:#define missing actual definition
<stikonas>I think you are running kaem without --strict
<stikonas>so it continues with linking despite compile failure
<stikonas>it's somewhat annoying but meslibc has lots of ifndefs that M2-Planet does not support
<stikonas>gbrlwck: and printf might be too complicated here
<stikonas>start with fputs
<gbrlwck>i'll try that
<gbrlwck>i get a segfault when trying to compile
<stikonas>gbrlwck: ok, I got it working
<stikonas>gbrlwck: not sure if all changes are needed https://github.com/stikonas/mes-m2/tree/tdd-bughunt
<stikonas>I removed some files from compilation list later
<stikonas>there's also the issue that not all of mes libc is supported by M2-Planet
<stikonas>you sometimes have to restrict yourself to mini subset
<stikonas>argh, now accidentally squashed my changes into your commit
<stikonas>ok, fixed (force-pushed)
<stikonas>gbrlwck: so diff is https://github.com/stikonas/mes-m2/commit/a7a81d2671237bf3b4f722fd92d4b63583dd0503
<gbrlwck>this is weird
<stikonas>?
<gbrlwck>still got a segfault with M2-Planet
<stikonas>are you using latest M2-Planet?
<stikonas>or the one from stage0-posix
<stikonas>there are newer commits
<gbrlwck>ahh
<gbrlwck>M2-Planet is most recent?
<stikonas>you need newest
<stikonas>oriansj made some fixes
<stikonas>"Catch segfault for half defined #defines and provide a warning for #unkowns"
<stikonas>feel free to update M2-Planet in stage0-posix
<gbrlwck>this might be easiest :)
<stikonas>although, you need to set up qemu for other arches too
<stikonas>to update checksums
<stikonas>if you don't have that, I can make merge request
<stikonas>ok, I'm preparing update then
<stikonas>will update mescc-tools too
<gbrlwck>i pushed to my stage0-posix
<gbrlwck>but i haven't updated the checksum yet
<stikonas>well, I'll update all
<stikonas>cause we need to update checksums for other arches too
<stikonas>testing aarch64 now
<stikonas> https://github.com/oriansj/stage0-posix/pull/63
<gbrlwck>nice, thanks
<stikonas>this also pulls in newer kaem that will not continue after segfault in --strict mode
<gbrlwck>i'm not completely sure how i've read the backlog of how you guys did that and still missed it
<gbrlwck>huh, different output, but it's still not working.. i'm gonna take a break and report back later
<stikonas>hmm, malloc seems to work too
<stikonas>ok, I might have found the bug...
<stikonas>I messed something up in syscall.c
<stikonas>some copy-paste error
<stikonas>so all syscalls were write syscalls
<stikonas>now the crash is somewhere further
<stikonas>in getenv...
<stikonas>ok, I didn't implement environment yet
<stikonas>or maybe messed up implementing it
<oriansj>stikonas: merged
<stikonas>oriansj: thanks
<stikonas>oriansj: maybe you can spot something bad in my crt1.M1?
<stikonas> https://github.com/stikonas/mes-m2/commit/1928eef6236c772e2b309fa066e6b03216db8a7b
<oriansj>I don't see the setting up of argc
<stikonas>oh yes, that's right...
<oriansj>as it is push argc, argv and then envp
<stikonas>hmm, strange that it worked in m2libc...
<stikonas>I think it's missing there too
<stikonas>unless I misunderstand something
<oriansj>well M2libc does it simpler: https://github.com/oriansj/M2libc/blob/main/x86/libc-full.M1
<oriansj>as it leaves argc on the stack and then puts argv and envp in the M2-Planet order, rather than the linux envp, argv, argc order
<oriansj>and M2libc does file setup in C code to keep everything simpler.
<oriansj>(less pieces in assembly make for easier porting)
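For reference, on Linux the stack at process entry holds argc, then the argv pointers, a NULL, the envp pointers, and another NULL. Startup code such as crt1.M1 has to turn that into whatever order the compiler's main expects. The Python sketch below only models the layout: `parse_initial_stack` is a hypothetical helper, strings stand in for pointers, and `None` stands in for NULL.

```python
def parse_initial_stack(words):
    """Split the words found at process entry into (argc, argv, envp)."""
    argc = words[0]
    argv = words[1:1 + argc]
    # words[1 + argc] is the NULL that terminates argv; envp starts right after it.
    env_start = 1 + argc + 1
    envp = []
    for word in words[env_start:]:
        if word is None:             # NULL terminates envp
            break
        envp.append(word)
    return argc, argv, envp


if __name__ == "__main__":
    entry_stack = [2, "mes-m2", "-c", None, "MES_DEBUG=2", None]  # hypothetical contents
    print(parse_initial_stack(entry_stack))
```

If the startup code skips argc, every later slot is read one word off, which is one plausible way for a missing argc setup to break argument and environment handling.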
<oriansj>it is the reason why I wanted to port mes-m2 to be able to directly use M2libc (so it gains all architectures for free)
<stikonas>indeed, the more is in C, the simpler it is
<oriansj>although I might need to add a few additions to M2libc to make it finally be a drop in replacement.
<oriansj>although getting mes-m2 GCC buildable would be an important half step
<oriansj>I probably should take the time to fix that for you
<stikonas>oriansj: is it intentional that it's argc argv and envp. I am now looking at other crt's, e.g. x86_64-mescc and it does the opposite order, envp, argv and then argc
<oriansj>well the other crts are what MesCC uses not M2-Planet if I remember correctly
<stikonas>yes, we only have x86-mes-m2 for now...
<oriansj>So doing different ordering on the stack for arguments isn't unexpected.
<stikonas>hmm, something is still messed up
<stikonas>even with argc added
<stikonas>hmm, compared to x86, I don't have sub____$i32,%esp %0x1054
<stikonas>what is 0x1054?
<oriansj>that is unfortunately meslibc magic which I believe is related to setenv but am not certain.
<stikonas>oh ok, that might be the issue
<stikonas>not sure if it has to be adjusted to 64-bits...
<stikonas>possibly...
<oriansj>mes.c is EXTREMELY touchy even with GCC (in fact it took me a couple months just to get GCC to compile mes.c in my first attempt at porting mes.c to M2-Planet (which failed))
<oriansj>it ultimately took 2-3 years and janneke doing a heroic effort to get M2-Planet building mes.c
<stikonas>yes, I know...
<oriansj>not sure if it speaks more to all the ugly flaws in M2-Planet or the complexity of doing a scheme.
<stikonas>and it's still only in a branch
<stikonas>well, M2-Planet now has some minimal preprocessor
<stikonas>so things might be a bit easier
<oriansj>I hope but I think janneke is done with dealing with M2-Planet and mes.c is a bit of a reminder of how bad I am at working with interpreters
<oriansj>So I'll try to help but I can't promise I'll be very helpful.
<stikonas>well, if mes-m2 can use M2libc at some point in the future, starting mes might be easier
<oriansj>well it would be instantly ported to every architecture supported by M2libc
<stikonas>ok, found another issue, now environment seems fine
<stikonas>(or better)
<stikonas>instead of crash I'm getting
<stikonas> +> ./bin/mes-m2 -c (display 'Hello,M2-mes!) (newline)
<stikonas>mes: boot failed: no such file: boot-0.scm
<oriansj>touch boot-0.scm will solve that
<oriansj>(you can even put your test in that file)
<stikonas>no, doesn't even put it
<stikonas>sorry, I meant touch doesn't help
<oriansj>you can copy ./mes/module/mes/boot-0.scm too
<stikonas>might end up debugging that function...
<stikonas>to see what happens
<stikonas>it's not finding it anywhere
<stikonas>probably looking somewhere bad
<oriansj>cat ./mes/module/mes/boot-0.scm >| boot-0.scm
<oriansj>it is the logic in mes.c: try_open_boot
<stikonas>no, still the same error
<stikonas>maybe environment is still messed up
<stikonas>and breaks logic in try_open_boot
<oriansj>do you have MES* environmental variables set?
<oriansj> https://www.gnu.org/software/mes/manual/html_node/Environment-Variables.html#Environment-Variables
<stikonas>oh, not yet
<stikonas>ok, I'll set them up
<stikonas>environmental variables seem to work though
<stikonas>MES_DEBUG=2 prints more info
<stikonas>even MES_ARENA=20000000 MES_MAX_ARENA=20000000 MES_STACK=6000000 MES_DEBUG=2 MES_PREFIX=$PWD ./bin/mes-m2 doesn't work
<stikonas>maybe file reading stuff is broken