IRC channel logs

2022-02-20.log


<muurkha>I've definitely learned a lot from looking at gcc asm output
<muurkha>gcc -S and especially gcc -g -Wa,-adhlns=foo.lst
<muurkha>22:35 < stikonas> yeah, the steep curve of learning assembly is that it forces you to use syscalls from the very beginning
<muurkha>this is not really true unless you're compiling without the standard C library
<muurkha>or building for a platform that doesn't have one
<muurkha>you do not need to call open to open stdout normally
<unmatched-paren>do i need a .data section, even if there's no data?
<unmatched-paren>and do i need `ELF_end:` like in the stage0 file?
<unmatched-paren>bash: ./hello-world: cannot execute binary file: Exec format error
<unmatched-paren>:(
<muurkha>no
<muurkha>`file hello-world` will tell you what format it is
<unmatched-paren>hello-world: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
<muurkha>"relocatable" possibly means you just assembled it without linking
<muurkha>rename it to hello-world.o and try `ld -o hello-world hello-world.o`
<unmatched-paren>oh, yes, i just `nasm -o hello-world -f elf64 hello-world.asm`
<unmatched-paren>thanks :)
<muurkha>the kernel isn't as helpful as it could be in that case
<unmatched-paren>./hello-world produces... nothing :(
<muurkha>try `strace ./hello-world`
<muurkha>nothing is a big step up from "Segmentation violation"
<muurkha>or "exec format error" — nothing means it ran!
<unmatched-paren>`-1 EFAULT (Bad address)`
<muurkha>aha
<unmatched-paren>segfaults are fun :)
<muurkha>EFAULT on exec or what?
<unmatched-paren>stat(0x1, 0x48) = -1 EFAULT (Bad address)
<unmatched-paren>on stat, apparently
<muurkha>yeah, you're passing it two addresses that both don't make sense
<muurkha>but... are you calling stat on purpose?
<unmatched-paren>i never used stat...
<unmatched-paren>no
<muurkha>may be useful to know that 0x48 is 'H'
<muurkha>perhaps you're accidentally calling stat() instead of write()
<unmatched-paren>yes i am, thanks :)
<muurkha>but also you're passing it 'H' instead of the memory address where 'H' lives, which is an easy mistake to make in assembly
<unmatched-paren>oh, should i mov 'H' somewhere?
<muurkha>no
<muurkha>probably you should have something like mystring: DB "Hello world\n", 0
<unmatched-paren>in .data
<unmatched-paren>?
<muurkha>and pass $mystring
<muurkha>potentially but it doesn't really matter
<muurkha>although I'm a little rusty on Intel-style syntax
<unmatched-paren>what does DB mean?
<muurkha>define byte. it means "put these bytes in the program here"
<muurkha>maybe in Intel-style syntax $mystring is just mystring
<unmatched-paren>k
<muurkha>write() expects a memory address at which to find the data to write
<muurkha>so you have to store the data at a memory address in order to appease it. and DB (.asciz in gas syntax) is the easiest way to do that
<unmatched-paren>paren@guix-aspire ~/code/asm [env]$ ./hello-world
<unmatched-paren>Hello, world!
<unmatched-paren>\o/
<unmatched-paren>the db line is supposed to look like:
<unmatched-paren>hello_world db `Hello, world!\n`
<unmatched-paren>nasm does not process escape sequences, unless you use backticks
<unmatched-paren>and you don't use a $ for variables in intel syntax
<muurkha>congratulations!
<unmatched-paren>muurkha, stikonas: thanks for all the pointers (haha)
<muurkha>do the `` make nasm append a 0?
<muurkha>or is there just a 0 following it by coincidence?
<unmatched-paren>i think it might be implicit? idk
<muurkha>it's not implicit
<unmatched-paren>but it works without it
<muurkha>it might be coincidental
<muurkha>like, if you don't have anything after it, the rest of the segment might be filled with 0s by default
<unmatched-paren>what does that number mean, anyway?
<muurkha>number?
<muurkha>the `` syntax is new to me, and it might be a nasm thing for appending a 0 "implicitly" too
<muurkha>um, I guess in the case of write() you don't need the 0!
<muurkha>you just need to tell it how many bytes to write
<stikonas>unmatched-paren: now you can check how M0 does printing
<unmatched-paren>muurkha: ah, okay
<muurkha>it's just the C string stuff that uses the trailing 0
<stikonas> https://github.com/oriansj/stage0-posix/blob/master/AMD64/NASM/M0_AMD64.S#L818
<unmatched-paren>oh, does the zero add a null terminator?
<muurkha>"null" means "zero byte"
<stikonas>zero is null terminator
<unmatched-paren>sorry, nul
<stikonas>but it's just a convention to terminate strings with zero
<stikonas>well, it's either terminate with some character or always have string + length
<stikonas>string + length is probably more secure
<unmatched-paren>what should i try to do next? write to a file?
<stikonas>you don't want kernel to loop forever in case somebody forgot null byte
<stikonas>unmatched-paren: you can write to a file but it's more of the same
<unmatched-paren>except with added openat :)
<stikonas>just another syscall to open the file
<stikonas>yes
<stikonas>well, you can do that as intermediate step
<stikonas>but later you should start learning functions
<stikonas>that will force you to start using stack
<muurkha>open() or openat() does use a nul terminator for the filename
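The "more of the same, plus openat" idea can be sketched in C as follows. It assumes Linux and the libc `syscall()` wrapper; `write_file` is a hypothetical helper name. Note the asymmetry mentioned above: the path is nul-terminated, while the data is passed as address plus length.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: create or truncate `path` and write `len` bytes.
 * `path` must be nul-terminated; `buf` is described by address + length.
 * Returns 0 on success, -1 on failure. */
static int write_file(const char *path, const void *buf, size_t len)
{
    long fd = syscall(SYS_openat, AT_FDCWD, path,
                      O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    long n = syscall(SYS_write, (int)fd, buf, len);
    syscall(SYS_close, (int)fd);

    /* Treat a short write as a failure to keep the sketch simple. */
    return (n == (long)len) ? 0 : -1;
}
```

In assembly the same three syscalls would be issued by hand, with the file descriptor returned by openat moved into the first argument register for write and close.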
<unmatched-paren>i should probably also figure out how to write to memory and not just registers
<stikonas>unmatched-paren: well, easiest way (and that's what stage0-posix does) is to allocate memory using brk
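A rough C illustration of allocating memory by moving the program break. This sketch assumes Linux and uses the libc `sbrk()` wrapper rather than the raw brk syscall that stage0-posix is said to use; `bump_alloc` is a hypothetical name, it makes no alignment promises, and nothing is ever freed.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical bump allocator: grow the program break by `size` bytes
 * and return the start of the newly added region, or NULL on failure. */
static void *bump_alloc(size_t size)
{
    void *old_break = sbrk((intptr_t)size);   /* returns the previous break */
    if (old_break == (void *)-1)
        return NULL;
    return old_break;
}
```

Mixing this with `malloc` in a real C program is best avoided, since the C library may also move the break; in a libc-free assembly program that concern does not arise.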
<unmatched-paren>there's two openats???
<unmatched-paren>in musl:
<unmatched-paren>#define __NR_openat 257
<stikonas>unmatched-paren: actually you already dealt with reading from memory
<unmatched-paren>#define __NR_openat2 437
<stikonas>openat2 might be newer
<unmatched-paren>openat and openat2...
<stikonas>it's the same story for forking
<unmatched-paren>#define __NR_openat2 437
<unmatched-paren>oops
<unmatched-paren>The openat2() system call is an extension of openat(2) and provides a superset of its functionality.
<stikonas>after forking, you have to wait for the process
<unmatched-paren>that's what i meant to paste
<stikonas>there is waitpid, waitid, wait4...
<unmatched-paren>openat2 is very recent, it seems: https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.6-Adds-Openat2
<stikonas>none of the old syscalls get removed due to syscall interface compatibility
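The fork-then-wait pattern mentioned above might look like this in C. It is a sketch assuming a POSIX system; `fork_and_wait` is a hypothetical helper, and `waitpid` is just one of the several wait variants listed.

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`, then wait
 * for it. Returns the child's exit status, or -1 on failure. */
static int fork_and_wait(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);              /* child: terminate immediately */

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    /* Only report a status if the child exited normally. */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```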
<stikonas>unmatched-paren: anyway, you already deal with memory a bit
<stikonas>sys_write syscall takes a pointer to address in memory
<unmatched-paren>yes
<stikonas>so you already did read from memory
<stikonas>on risc-v I had to use load/store instruction to deal with memory access, but I think on amd64 opcodes can take memory addresses directly
<unmatched-paren>i'll try doing file io and memory writes tomorrow. thanks for all the help \o
<muurkha>yes, generally they can
<oriansj>unmatched-paren: this subset https://paste.debian.net/1231572/ is all you will need to know about AMD64 syscalls for a while
<oriansj>all registers indicated may change during a syscall so if you care about the values in those registers save them onto the stack prior to the syscall
<oriansj>the best way to learn AMD64 assembly is to write a simple couple line program and compile it with nasm and then use gdb (layout asm then layout regs) then do si to single step your binary. Look at the registers and notice how they change when each instruction is executed. You can do this with any instruction to get a clue to how it behaves (only div and idiv require you to read the manual)
<oriansj>the core assembly instructions you need to know are mov, add, sub, cmp, je, jne, push, pop, call and ret with mov being the one you will use the most
<oriansj>I suggest nasm assembly over at&t assembly (unless you plan on being multi-architecture then at&t is a better choice) but they both work just fine
<oriansj>stikonas: the simplest hello world in assembly would be mov 1, rax; mov 1, rdi; mov $message, rsi; mov 12, rdx; syscall; mov 60, rax; mov 0, rdi; syscall; message: "hello world\n" (so 8 instructions)
<oriansj>now syscalls don't care if the strings are null terminated or not, they just care how many bytes you told it to write and to what file (stdin, stdout and stderr are just files)
<oriansj>the reason strings in assembly usually are null terminated is because it is much easier to just write a function that reads bytes until null and syscalls than passing the length of your string (unless of course your assembler supports .length macros which handles calculating that for you)
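The "read bytes until null, then syscall" function described above can be sketched in C. This assumes Linux and the libc `syscall()` wrapper; `cstr_len` and `print_cstr` are hypothetical names, and the loop is written out by hand to mirror what an assembly version would do.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Count bytes up to, but not including, the terminating nul. */
static size_t cstr_len(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Hypothetical helper: write a nul-terminated string to `fd`.
 * The kernel never sees the nul; it only gets an address and a length.
 * Returns the number of bytes written, or -1 on error. */
static long print_cstr(int fd, const char *s)
{
    return syscall(SYS_write, fd, s, cstr_len(s));
}
```

With this helper the caller passes only the string, e.g. `print_cstr(1, "hello world\n")`, at the cost of one scan over the bytes per call.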
<oriansj>everything you need to know about assembly, you can see in cc_*.S and you should always ask questions if something is not perfectly clear
<oriansj>now if you care about learning about byte coding, displacement calculation, immediate encoding and other machine level details there are other tools you need to know about and I certainly have some notes that I can share.
<muurkha>oriansj: that is good advice but unmatched-paren had already disconnected
<oriansj>muurkha: We have logging for a reason; and those who choose not to be constantly active still can see what they miss while offline.
<oriansj>and others can learn too by reading the logs
<muurkha>I'll try to link unmatched-paren to the logs if they reappear