IRC channel logs

2022-07-22.log

<muurkha>wrote my first RISC-V assembly program last night
<muurkha>I feel like RISC-V assembly is not as easy to read as i386, amd64, or m68k?
<muurkha>in part just because it's longer but also because the operand order in the standard assembly syntax is so inconsistent
<muurkha>or is it just unfamiliarity?
<stikonas[m]>muurkha: only sd instruction has somewhat different order
<muurkha>also sw, sh, sb
<stikonas[m]>well, yes, same thing
<stikonas[m]>but basically it is explained that it uses 2 source registers
<stikonas[m]>rather than source and destination
<muurkha>and t3 is not a destination register in bne t3, t4, overflow
<stikonas[m]>But in general I find it easy enough to read after working on stage0-posix
<muurkha>that's good! there's still hope for me yet
<stikonas[m]>Probably just takes time to get used to if you worked with other asm
<stikonas[m]>I just started with risc-v
<stikonas[m]>Well B instructions just compare two values...
<muurkha>or sometimes one, as in bnez t3, foo
<stikonas[m]>I only found those SD, etc a bit unintuitive
<stikonas[m]>Well yes but that's just shorthand for bne zero, t3, foo
<muurkha>right
<muurkha>possibly part of the problem is that I'm mostly reading disassembled GCC output :)
<stikonas[m]>Oh that might be less readable
<stikonas[m]>M0 riscv is somewhat different though
<stikonas[m]>E.g. see https://github.com/oriansj/stage0-posix-riscv64/blob/master/Development/hex2_riscv64.M1
<stikonas[m]>zero is always implicit in M0
<muurkha>yeah, I don't know if M0/M1 is more or less readable syntactically
<muurkha>but surely your code is more mentally coherent than GCC's output
<stikonas[m]>I don't know either
<stikonas[m]>It's just different
<stikonas[m]>But far easier to parse for assembler
<stikonas[m]>I guess it's like at&t vs Intel syntax
<stikonas[m]>At&t is easier to parse for assembler
<stikonas[m]>But maybe Intel syntax is easier for humans to read
<muurkha>maybe, dunno
<muurkha>I prefer the AT&T mov source, dest syntax to the Intel mov dest, source syntax
<stikonas[m]>Well, it's not just that. It is also more explicit, e.g. registers are with %, etc
<muurkha>in some cases; in other cases AT&T syntax is less explicit
<muurkha>one particularly egregious example is that mov 15, %eax is a segfault
<muurkha>you meant mov $15, %eax
<muurkha>the other case is that I think (%ebp,%esi,4) is less explicit than [EBP + ESI*4]
<stikonas[m]>It's harder to parse esi*4
<stikonas[m]>Than %esi,4
<stikonas[m]>(I mean automatically parse for assembler, not humans)
<muurkha>plausibly, yeah
<jbowen>I feel like it's more a matter of which you learn first. My first exposure to asm was Intel x86 and it still feels like "home" to me, even though I've probably written more lines of 6502 for MCUs than x86 code
<muurkha>I also feel like i386 is a bit higher level than RISC-V
<muurkha>I mean, consider addl $3192, (%ebp, %esi, 4)
<muurkha>in unabbreviated RISC-V assembly that's something like slli t1, a0, 2; add t1, s0, t1; lw t2, (t1); lui t3, 1; addi t3, t3, 904; add t2, t2, t3; sw t2, (t1)
<muurkha>except the 904 is wrong
<muurkha>that's kind of an extreme case though
<jbowen>Yeah, i386 is CISC, so you'll have more "programmer friendly" instructions
<muurkha>yeah
<muurkha>even a lot of RISCs are terser though
<muurkha>I mean RISC-V had good reasons for not including ARM-like LDM/STM or SPARC-like register windows
<muurkha>or ARM-like ubiquitous bitshifts or predication
<muurkha>whatever, compared to a 6502 I guess it's all luxury ;)
<jbowen>6502 just feels really quaint to me
<muurkha>aha, it should have been -904
<muurkha>clearly time for bed ;)
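[editor's note] The 904 versus -904 correction comes from how a constant is split across lui and addi: addi sign-extends its 12-bit immediate, so when the low 12 bits are 0x800 or more, the upper part has to be rounded up and the low part becomes negative. A minimal Python sketch of that arithmetic, assuming a hypothetical helper name `split_imm` that is not part of any assembler:

```python
def split_imm(value):
    """Split a 32-bit constant into (hi20, lo12) such that
    (hi20 << 12) + lo12 == value, with lo12 in [-2048, 2047]."""
    lo = value & 0xFFF
    if lo >= 0x800:
        lo -= 0x1000              # addi sign-extends its 12-bit immediate
    hi = ((value - lo) >> 12) & 0xFFFFF   # operand for lui
    return hi, lo

hi, lo = split_imm(3192)
print(hi, lo)   # 1 -904
```

So 3192 is loaded as lui t3, 1 (giving 4096) followed by addi t3, t3, -904.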
<jbowen>Recent quick video (< 2 min) about x86's `repne scasb` as basically a oneliner for computing string length: https://www.youtube.com/watch?v=WiyUf8u78-w
<muurkha>oh yeah, of course
<stikonas>but x86 machine code is significantly smaller
<stikonas>risc-v has all instructions 32-bit
<stikonas>there is no way we can fit hex0-riscv32 into 256 bytes
<stikonas>one would only have 43 instructions to do that
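[editor's note] The figure of 43 is consistent with a 256-byte file that carries one ELF32 file header (52 bytes) and one ELF32 program header (32 bytes), with the rest filled by 4-byte instructions. That this is how the number was derived is an assumption; the log does not say. A small sketch of the arithmetic, with hypothetical names:

```python
ELF32_EHDR = 52   # bytes in the ELF32 file header
ELF32_PHDR = 32   # bytes in one ELF32 program header
INSN_SIZE = 4     # bytes per uncompressed RISC-V instruction

def insn_budget(total_bytes, ehdr=ELF32_EHDR, phdr=ELF32_PHDR, insn=INSN_SIZE):
    """Number of whole instructions that fit after the headers."""
    return (total_bytes - ehdr - phdr) // insn

print(insn_budget(256))   # 43
```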
<oriansj>stikonas: well that wouldn't be entirely impossible if one was willing to do some really clever hex hack but yeah, one can't do hex0 cleanly in just 43 RISC instructions. (VAX might be able to do it in 43 instructions cleanly [maybe])
<oriansj>muurkha: well most "clever" cpu instructions end up being just a waste of transistors but PowerPC showed, even having a boatload of simple instructions is still actually a useful option as well.
<oriansj>It could have done without the extra special case registers and with a more general design but too late now
<muurkha>stikonas[m]: RVC is usually denser than amd64, though uncompressed RISC-V is looser
<stikonas>yeah, compressed risc-v might be
<stikonas>but I didn't use it for stage0-posix...
<muurkha>no, much more headache than just rv
<oriansj>muurkha: well compressed instruction support hasn't been deeply looked at yet; there are a great many encoding details that would have to be worked out first.
<oriansj>if I remember correctly stikonas did the impressive task of just using the core RISC-V instructions that all RISC-V chips are required to support
<muurkha>the core RISC-V instructions are pretty expressive
<stikonas>oriansj: well, I used multiplication and division a bit...
<muurkha>oh really?
<muurkha>I didn't realize that
<muurkha>those are in M
<stikonas>yes, we use them in a couple of places
<stikonas>but I think fairly late
<stikonas>maybe in M0
<muurkha>the summary of compressed instruction support is that in addition to the 5 basic 32-bit instruction formats, RVC adds another 8 16-bit formats for the most commonly used instructions
<muurkha>they're just alternate encodings for instructions that could be expressed as 32-bit instructions
<muurkha>so you can implement them in an assembler; the compiler doesn't have to know about them
<oriansj>muurkha: yes, however in M1/hex2 we need to know the details about how to encode the bits
<stikonas>ok, it's in cc_riscv64
<muurkha>yes
<stikonas> https://github.com/oriansj/stage0-posix-riscv64/blob/master/cc_riscv64.M1#L1718
<oriansj>stikonas: so easy to remove if needed
<stikonas>and https://github.com/oriansj/stage0-posix-riscv64/blob/master/cc_riscv64.M1#L833
<stikonas>yeah, one can write functions to multiply and divide
<muurkha>this gives RV64C significantly better code density than things like amd64, sparc64, or alpha, on par with arm thum2
<muurkha>*thumb2
<muurkha>stikonas[m]: in this case it could be a very short function because member_type->type->size is presumably a small integer, like, less than 64
<stikonas>yes, it is small
<muurkha>so you could implement multiplication by repeatedly adding the multiplier without even shifting
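[editor's note] The idea above, multiplication by repeated addition with no shifting, can be sketched as follows. This is an illustration in Python and not the stage0-posix code; `mul_small` is a hypothetical name, and the 64-bit mask only mimics register wraparound. It is practical only when the multiplier is small, as in the member_type->type->size case.

```python
MASK64 = (1 << 64) - 1

def mul_small(multiplicand, multiplier):
    """Multiply by adding the multiplicand `multiplier` times.
    The loop runs `multiplier` times, so keep that operand small."""
    result = 0
    while multiplier != 0:
        result = (result + multiplicand) & MASK64   # add, wrapping like a 64-bit register
        multiplier -= 1
    return result

print(mul_small(7, 12))
```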
<oriansj>muurkha: depends upon the task as there is no universal optimal density instruction set possible
<stikonas>divu a bit more complicated, but not too hard
<muurkha>oriansj: that is of course true! and surely there are exceptions
<stikonas>and it's already assembly code, no hex needed
<oriansj>stikonas: steal the division solution from armv7l
<stikonas>yeah, I saw it
<muurkha>in particular I was disappointed with the density of RV64C code for dumpulse, which I optimized to be as little code as possible
<muurkha>...on a big-endian ARM
<stikonas>but I thought that either I can leave it for somebody else to play with, or maybe until somebody actually needs it
<oriansj>it is mod, modu, divu and div
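[editor's note] A software replacement for unsigned divide and remainder can be built from shifts, compares and subtracts. The sketch below is a generic restoring (shift-subtract) division in Python; it is not the armv7l routine mentioned above, and `divu_remu` is a hypothetical name. The divide-by-zero result follows the RISC-V M extension convention (quotient all ones, remainder equal to the dividend).

```python
def divu_remu(dividend, divisor, bits=64):
    """Unsigned restoring division: returns (quotient, remainder)."""
    if divisor == 0:
        return (1 << bits) - 1, dividend      # RISC-V M convention for x / 0
    quotient = 0
    remainder = 0
    for i in range(bits - 1, -1, -1):
        # bring down the next bit of the dividend, most significant first
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << i
    return quotient, remainder

print(divu_remu(100, 7))
```

Signed div and rem can be layered on top by dividing absolute values and fixing up the signs afterwards.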
<muurkha>yeah, probably sensible
<stikonas>and I suspect it might be more needed on riscv32 rather than riscv64...
<oriansj>stikonas: you can leave it until somebody actually needs it; although a minor comment next to it might be kind
<muurkha>yeah, anything that runs Linux probably has the M extension
<stikonas>yeah, I can add it...
<stikonas>probably...
<muurkha>I don't know if Linux can even be compiled for RV64I or RV64IA rather than RV64IMACFD
<oriansj>muurkha: no one has a more dense string comparison than x86
<muurkha>oriansj: on x86 it's two bytes, plus more bytes to set up the length. surely you could do it in one byte. or ¼ byte
<oriansj>muurkha: I have yet to see any instruction set dedicate one of their prime 256 opcodes to string comparison, let alone multiple to make it even more efficient
<stikonas>muurkha: well, Linux might not, but we might reuse most of the same code for e.g. baremetal bootstrap
<oriansj>also stage0 doesn't have to run on a kernel so RV64IMACFD isn't ensured.
<stikonas>if some RV32I system comes with UEFI support then it might be useful
<stikonas>or even without UEFI support if somebody does something like builder-hex0 but replaces bios calls with hardware specific driver
<oriansj>and if we gain more comfort, just straight off a trivial bootloader
<oriansj>indeed stikonas
<stikonas>speaking of UEFI, I tried to create some TE executable but failed to get it to run...
<stikonas>but I have not found any examples of TE executables online
<oriansj>there is a chance it was something planned but not actually implemented?
<stikonas>hard to tell, more likely I just messed something up while trying to guess...
<stikonas>I could find some references to TE in tianocore ed2k
<stikonas>possibly need to sort out more fields...
<stikonas> https://paste.debian.net/1248086/
<stikonas>or maybe something wrong with addresses...
<oriansj>well the one thing I noticed about the assembly is there is a reference to SystemBoot; there might need to be a reference to it in the binary to set it regardless of whether it is used or not
<stikonas>maybe... But I would guess the issue is somewhere else...
<stikonas>well, SystemBoot there is actually system->boot. And system is just second argument to the entry point (so on x86_64 it is stored in rdx)
<oriansj>also you would want to set RAX to 42 to be sure that it did run successfully
<stikonas>in principle yes...
<stikonas>though at this stage it simply refuses to load
<stikonas>so I thought RAX 42 is the next step
<stikonas>but if you want, we can add it...
<oriansj>is stripped size supposed to be 16bits?
<stikonas>hmm, I can try that...
<stikonas>I have no idea what it has to be set to...
<stikonas>right now I only get "Command Error Status: Unsupported"
<stikonas>so it might be something even more wrong
<stikonas>should probably grep for that error in tianocore...
<oriansj>do we know of any method of making a TE binary with any standard assembler?
<stikonas>no...
<stikonas>I've only found https://uefi.org/sites/default/files/resources/PI_Spec_1_6.pdf
<stikonas>page 243
<stikonas>and TE2PE converter
<stikonas> https://uefi.org/sites/default/files/resources/PI_Spec_1_6.pdf
<stikonas> https://github.com/LongSoft/TE2PE/blob/master/TE2PE.c
<stikonas>right now we don't even know if UEFI can actually load TE executables later (not in its own internal stages but as bootloaders)
<stikonas>but if it can, TE seems to be terse enough that binaries might be as small as in stage0-posix
<oriansj>StrippedSize: would require us to figure out the number of bytes a PE header would have and do some subtraction
<stikonas>oh yes, it's not just size of file...
<oriansj>but the size of the shrink from a PE file
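[editor's note] A rough Python sketch of the TE header discussed here, based on the editor's reading of the EFI_TE_IMAGE_HEADER layout in the PI spec: a 40-byte little-endian structure in which StrippedSize is a 16-bit field. The `stripped_size` formula (bytes of PE headers that precede the section table) is an assumption about how conversion tools compute it, and all concrete values in the example are hypothetical.

```python
import struct

# Signature, Machine, NumberOfSections, Subsystem, StrippedSize,
# AddressOfEntryPoint, BaseOfCode, ImageBase,
# DataDirectory[0] (base relocation) and DataDirectory[1] (debug),
# each as (VirtualAddress, Size)
TE_HEADER_FORMAT = "<2sHBBHIIQIIII"

def stripped_size(e_lfanew, size_of_optional_header):
    """Assumed definition: DOS stub up to the PE signature, plus the
    signature (4), COFF file header (20) and optional header."""
    return e_lfanew + 4 + 20 + size_of_optional_header

def te_header(machine, sections, subsystem, stripped, entry,
              base_of_code, image_base, reloc=(0, 0), debug=(0, 0)):
    return struct.pack(TE_HEADER_FORMAT, b"VZ", machine, sections,
                       subsystem, stripped, entry, base_of_code,
                       image_base, reloc[0], reloc[1], debug[0], debug[1])

# Hypothetical x86_64 EFI application with one section
stripped = stripped_size(e_lfanew=0x40, size_of_optional_header=240)
header = te_header(machine=0x8664, sections=1, subsystem=10,
                   stripped=stripped, entry=0x1000,
                   base_of_code=0x1000, image_base=0)
print(len(header), stripped)
```

If this reading is right, addresses stored in the header still refer to the original PE layout, so a loader has to adjust them by the stripped size minus the TE header size; that detail should be checked against BasePeCoff.c before relying on it.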
<stikonas>probably best to try to check some UEFI implementations...
<stikonas>since the docs/specs are not very clear
<stikonas>oriansj: maybe this is useful https://github.com/tianocore/edk2/blob/master/BaseTools/Source/C/Common/BasePeCoff.c#L1281 ?
<stikonas>hmm, StrippedSize might be more used if somebody uses tools to convert PE32+->TE
<stikonas>maybe we can just set it to 0
<muurkha>stikonas: yeah, RV32I is a reasonable target, and something like SeRV is a much more manageable piece of hardware to build than something like ARM or Knight or something
<muurkha>tianocore eDonkey 2000?
<stikonas>well, typo...
<stikonas>it's edk2
<stikonas>I think edonkey is ed2k
<muurkha>aha