IRC channel logs

2021-09-17.log


<stikonas>just compiled my first C program with cc_riscv64, although cc_riscv64 is still very much WIP
<stikonas>it was just compiling int main() {}
<Jeremy_Rand>civodul: I could paste the desired HTTP header text here if you're able to convert that into the Scheme format used; would that be suitable?
<civodul>Jeremy_Rand: yes, let's do that
<Jeremy_Rand>civodul: is enabling inclusion in the HSTS preload list okay? Advantage is that web browsers will know to use HTTPS even before the first visit (so you get better security against sslstrip attacks), disadvantage is that if you decide to disable HTTPS in the future, you'd have to wait for browser updates to remove you from the preload list.
<Jeremy_Rand>(Asking because that affects which header text I paste)
<stikonas>I would say it should be fine
<stikonas>unless something happens to let's encrypt, there shouldn't be any reason to disable it
<Jeremy_Rand>stikonas: yeah, that's my thinking too. Looks like civodul disconnected; guess I'll wait for them to return before proceeding
<oriansj>Jeremy_Rand: no need. this channel is logged; civodul or rekado will see the logs and we can incorporate your fix
<oriansj>so just post your suggested fix when you think it is ok for others to see
<oriansj>stikonas: I'll be doing the AMD64 hex2 fix later today and hopefully the AArch64 this weekend.
<stikonas>yeah, AArch64 especially needs it
<stikonas>it takes 15 minutes on my AMD64 laptop on qemu
<stikonas>I think after the fix it will be down to maybe 2 minutes
<stikonas>not sure if it can be called a "fix", it's more of a workaround
<oriansj>performance enhancement as it isn't actually fixing anything?
<stikonas>what I mean is it's a qemu issue...
<stikonas>I don't think it matters for non-emulated systems
<stikonas>but ok, it fixes performance in qemu
<stikonas>on risc-v I've also fixed hex1, but it might be harder on other arches
<stikonas>I didn't want to recalculate jumps, but it just happened that I only had to insert 1 noop instruction
<oriansj>well it doesn't really save much time on other architectures to fix hex0 and hex1
<stikonas>exactly
<stikonas>it's spending 95% of time in hex2
<oriansj>oh and the Raspberry Pi memory-mapped performance improvement is still only in the 12% range; so something else might be in play there but I'll look into that later.
<stikonas>well, try running it in strace
<stikonas>to see if you can spot where time is spent
<stikonas>maybe i/o is slower there
<stikonas>but it's really dramatic on more powerful machines
<oriansj>well ramdisk should eliminate i/o as the bottleneck but I guess I might as well byte the bullet and just strace it (it'll take about an hour)
<stikonas>bottleneck might be visible quite quickly
<stikonas>without running to the end
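[Editor's sketch] One way to spot where time goes in a long strace capture is to tally the calls by name. The following is a minimal Python sketch, assuming plain strace output with one `name(args) = result` entry per line. The `cacheflush` sample line is taken from later in this log; the `read` line and the function name `count_syscalls` are made up for illustration.

```python
import re
from collections import Counter

# Matches the start of a plain strace line such as:
#   cacheflush(0x611be900, 0x611be968, 0) = 0
# An optional "[pid N]" prefix (present when following child processes) is skipped.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(\w+)\(")

def count_syscalls(lines):
    """Count how often each syscall name appears in strace output lines."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts

# Sample input: the cacheflush line comes from this log, the read line is invented.
sample = [
    "cacheflush(0x611be900, 0x611be968, 0) = 0",
    "cacheflush(0x611be900, 0x611be968, 0) = 0",
    'read(3, "a", 1) = 1',
    "+++ exited with 0 +++",
]

for name, n in count_syscalls(sample).most_common():
    print(name, n)
```

Counts alone do not show elapsed time; strace's own `-c` option prints a per-syscall summary that includes time, which may be enough without any post-processing.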
<xentrac>so, I'm noticing some things in Qfitzah I hadn't noticed before.
<xentrac>the context is that I'm trying to make the executable as small as I can; I'm down below 900 bytes at this point
<xentrac>for i386
<xentrac>if I put a value in a callee-saved register, that saves me from having to save and restore it across calls to other subroutines (2 bytes of code per call)
<xentrac>but I have to save and restore the register on entry and exit; this is 2 bytes, plus an extra byte per early return, because I have to put push %ebp or whatever at the top and pop %ebp before every ret
<xentrac>up to a maximum of 1 extra byte per early return, because I can replace pop %ebx; pop %ebp; ret with a simple jump to a shared procedure epilogue that pops each thing
<xentrac>you'd think this would sometimes be a win for space, but it basically never is because I never have enough child function calls with live temporary values. instead what wins is putting all my values in call-clobbered (caller-saved) registers and pushing them and popping them around the calls
<xentrac>which as an extra bonus allows me to move the values from one register to another for free when there's a call in between
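[Editor's sketch] The byte accounting in the messages above can be written down as a small model. This is only arithmetic on the figures xentrac gives (2 bytes for entry/exit save and restore, at most 1 extra byte per early return, 2 bytes per call for push/pop around it); the function names are hypothetical, and the model ignores the free register moves mentioned in the last message.

```python
def callee_saved_cost(early_returns):
    """Bytes to keep one value in a callee-saved register.

    push on entry + pop before the final ret = 2 bytes, plus at most
    1 extra byte per early return (upper bound quoted in the log).
    """
    return 2 + early_returns

def caller_saved_cost(calls_with_live_value):
    """Bytes to keep one value in a caller-saved register.

    push before + pop after each call that the value must survive = 2 bytes per call.
    """
    return 2 * calls_with_live_value

def cheaper_choice(calls_with_live_value, early_returns):
    """Return which register class costs fewer bytes under this model."""
    callee = callee_saved_cost(early_returns)
    caller = caller_saved_cost(calls_with_live_value)
    if callee < caller:
        return "callee-saved"
    if caller < callee:
        return "caller-saved"
    return "tie"

# (calls with a live value, early returns) for a few hypothetical functions
for calls, returns in [(1, 0), (1, 1), (3, 0)]:
    print(calls, returns, cheaper_choice(calls, returns))
```

Under this model a callee-saved register only pays off once a value is live across two or more calls, which is consistent with the observation that functions with few such calls favor caller-saved registers.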
<oriansj>a bunch of cacheflush(0x611be900, 0x611be968, 0) = 0
<xentrac>sometimes I'm also doing things like mov 4(%esp), %ecx even though that's 4 bytes; I think that's probably an error
<xentrac>this is sort of going to the opposite extreme from hex0 though: compromising comprehensibility and increasing bug-proneness in pursuit of bumming the code down by a few bytes
<xentrac>I'm thinking I might reassign %ebx, %ebp, and maybe even %esi and %edi, to be call-clobbered (caller-saved) in Qfitzah to take advantage of the more compact instruction encodings that come from the 8086
<xentrac>I'm curious to hear about other people's experiences
<stikonas>ok, should be enough work on cc_riscv64 for now, got easy stuff working (labels, goto, return and asm statements)
<stikonas>and the binary is 6.4 KB now, so I guess I'm about 1/3 done
<xentrac>congratulations!
<stikonas>well, these are really easy, they basically translate directly to assembly instructions
<stikonas>I guess parsing expressions will be where most of the remaining complexity is
<xentrac>expressions are not so bad
<xentrac>I mean if you're doing recursive descent you just refactor things like <sum> "+" <term> | <sum> "-" <term> into things like <term> ("+" <term> | "-" <term>)*
<xentrac>although C has a lot of precedence levels and implicit coercions and so it ends up still being a huge pain
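[Editor's sketch] The refactoring xentrac describes turns the left-recursive rule into a loop, which recursive descent can handle directly. Below is a minimal Python sketch for integers with `+` and `-` only; the function names are hypothetical and this is not how cc_* or M2-Planet parse expressions.

```python
import re

def tokenize(text):
    """Split text into integer and operator tokens (anything else is ignored in this sketch)."""
    return re.findall(r"\d+|[+-]", text)

def parse_term(tokens, pos):
    """<term> ::= integer. Returns (value, next position)."""
    if pos >= len(tokens) or not tokens[pos].isdigit():
        raise SyntaxError("expected a number at token %d" % pos)
    return int(tokens[pos]), pos + 1

def parse_sum(tokens):
    """<sum> ::= <term> ("+" <term> | "-" <term>)*

    The loop replaces the left recursion of <sum> "+" <term> | <sum> "-" <term>
    while still combining operands left to right.
    """
    value, pos = parse_term(tokens, 0)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        rhs, pos = parse_term(tokens, pos + 1)
        value = value + rhs if op == "+" else value - rhs
    if pos != len(tokens):
        raise SyntaxError("unexpected token at position %d" % pos)
    return value

print(parse_sum(tokenize("10 - 3 - 2")))
```

Because the accumulator is updated inside the loop, `10 - 3 - 2` is evaluated as `(10 - 3) - 2`, preserving the left associativity that the original left-recursive grammar expressed.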
<xentrac>I've actually done an LR-style parser by hand in the past, which might be easier, dunno. using repeated attempted pattern matches on the stack instead of a constant-time table
<stikonas>I don't think cc_* M2_Planet look at precedence levels (but I still need to look at that code)
<stikonas>I think it requires explicit brackets
<xentrac>that's simpler!
<xentrac>in http://canonical.org/~kragen/sw/sk if you want to look at it. http://canonical.org/~kragen/sw/sk.js is the code that compiles λ-calculus expressions to SK-combinators and then evaluates them with combinator graph reduction
<xentrac>holy shit, I wrote that 15 years ago
<xentrac>did my grammar-factoring example make sense?
<stikonas>well, I guess. But in any case I'm just following what we already have for C code (M2-Planet/cc_* prototype) and cc_amd64.M1
<stikonas>oh, continue statement is basically no-op in cc_*
<stikonas>M2-Planet has only 1 continue statement but it looks like it's in code that M2-Planet won't hit when building itself
<oriansj>yes it does have precedence levels. The hardest part is the tokenization which is why we have the debug_list being the most important function you write
<stikonas>well, that one is done some time ago
<stikonas>I wrote debug_list too...
<stikonas>didn't use it much though...
<oriansj>and the tokenization looks good? then you only have a couple hours of work before you are done.
<stikonas>a bit more than 2 hours...
<stikonas>yeah, tokenization is good
<stikonas>well, I need to do flow control / local variables and expressions
<stikonas>but hopefully will be done over weekend
<stikonas>cc_reader.c is only 200 lines in size...
<oriansj>but the hardest to debug and do correctly
<stikonas>yeah, I guess so
<oriansj>hence why debug_list is absolutely essential until it is done
<stikonas>I did debug it a bit... but it was not too bad
<oriansj>you'll need to debug it until debug_list's output matches what the C code would produce
<oriansj>hence take the C code and have it just dump the list for easier inspection.
<stikonas>well, I didn't try it yet on big programs...
<oriansj>the pointer addresses might be different but the tokens should be identical
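[Editor's sketch] Comparing two token dumps while ignoring pointer addresses could look like the following. The real output format of debug_list is not shown in this log, so this sketch assumes only that addresses are printed as `0x...` hex; the sample dumps and the function names are invented for illustration.

```python
import re

# Assumption: pointer addresses appear as 0x-prefixed hex in the dump.
# Limitation: a C token that is itself a hex literal (e.g. 0xFF) is masked too.
ADDRESS_RE = re.compile(r"0x[0-9a-fA-F]+")

def normalize(dump):
    """Replace addresses with a placeholder and drop blank lines."""
    return [ADDRESS_RE.sub("<addr>", line.strip())
            for line in dump.splitlines() if line.strip()]

def first_mismatch(expected_dump, actual_dump):
    """Return (index, expected_line, actual_line) for the first difference, or None."""
    expected = normalize(expected_dump)
    actual = normalize(actual_dump)
    for index, (exp, act) in enumerate(zip(expected, actual)):
        if exp != act:
            return index, exp, act
    if len(expected) != len(actual):
        index = min(len(expected), len(actual))
        exp = expected[index] if index < len(expected) else None
        act = actual[index] if index < len(actual) else None
        return index, exp, act
    return None

# Invented sample dumps: same tokens, different addresses.
from_c_build = "0x1000 int\n0x1010 main\n0x1020 (\n"
from_asm_build = "0x7f00 int\n0x7f20 main\n0x7f40 (\n"
print(first_mismatch(from_c_build, from_asm_build))
```

Reporting only the first mismatch keeps the output short, since one tokenization bug usually shifts every token after it.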
<stikonas>but on smaller programs it was working just fine
<stikonas>well, I can compile something like that at the moment: https://paste.debian.net/1212061/
<oriansj>the real question is M2-Planet building just fine
<stikonas>yeah, we'll see once I'm done
<stikonas>right now I'm just trying to do one small feature at a time
<oriansj>fair
<stikonas>and then retest...
<stikonas>easiest to catch any errors
<stikonas>I suppose tokenization is much harder to get working right when you do it for the first time in assembly
<stikonas>anyway, bed time here
<oriansj>here is the full strace of hex0 building hex0: https://paste.debian.net/1212062/
<Jeremy_Rand>Good point oriansj. Okay, so, if you are confident that HTTPS is supported for all subdomains where HTTP exists, the raw HTTP response header that should be sent on TLS port 443 is:
<Jeremy_Rand>Strict-Transport-Security: max-age=63072000; includeSubDomains; preload
<Jeremy_Rand>TCP port 80 should also be changed to do an HTTP redirect to HTTPS, instead of serving the same content that TLS port 443 has; let me know if you need help with that change
<Jeremy_Rand>If you are concerned that there may be unnoticed subdomains where HTTP exists but HTTPS does not, you may wish to temporarily decrease the max-age number at first while you monitor server logs for issues
<Jeremy_Rand>Hope this helps, ping me if you have any issues/questions. Or if I've disconnected accidentally, you can email me at jeremyrand at danwin1210.me to get my attention
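[Editor's sketch] The two changes Jeremy_Rand describes (send the HSTS header on port 443, redirect port 80 to HTTPS) can be illustrated with Python's standard `http.server` handlers. This is not the Scheme server configuration discussed earlier in the log; the class and function names are hypothetical, TLS setup is omitted, and the header value is copied from the message above.

```python
from http.server import BaseHTTPRequestHandler

# Header value as posted in the log; 63072000 seconds is 730 days.
HSTS_VALUE = "max-age=63072000; includeSubDomains; preload"

def https_location(host_header, path):
    """Build the HTTPS URL for a redirect, dropping any :port suffix.

    Simplification: does not handle IPv6 literals or a missing Host header.
    """
    host = host_header.split(":", 1)[0]
    return "https://" + host + path

class RedirectToHTTPS(BaseHTTPRequestHandler):
    """Handler for the plain-HTTP listener (port 80): redirect only, no content."""
    def do_GET(self):
        self.send_response(301)
        self.send_header("Location",
                         https_location(self.headers.get("Host", ""), self.path))
        self.end_headers()

class SecureSite(BaseHTTPRequestHandler):
    """Handler for the TLS listener (port 443): every response carries HSTS."""
    def do_GET(self):
        body = b"served over TLS\n"
        self.send_response(200)
        self.send_header("Strict-Transport-Security", HSTS_VALUE)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

Each handler would be passed to an `http.server.HTTPServer` instance, with the port 443 socket wrapped by the `ssl` module. Following the caution above about unnoticed subdomains, `max-age` can be set to a smaller number at first and raised once server logs look clean.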
<fossy>stikonas[m]: got almost all your guile PR rebased, just testing rn
<stikonas>fossy: good to hear that
<stikonas>oriansj: hmm, that strace looks like it's still getting stuck in mprotect