IRC channel logs

2020-12-19.log

<OriansJ>mihi: yes I am interested in the performance enhancements to hex2 and M2-Planet
<OriansJ>as for the file buffering; we could create a more advanced libc written in C and called by the minimal libc.M1 we have
<OriansJ>mihi: your mes-m2 pull request has been merged
<OriansJ>mihi: the file_print :: fputs was simply to keep me from making assumptions about what functions are available.
<OriansJ>So it is just a tedious exercise to fix the references.
<OriansJ>as for the use of in_set; it makes certain conditions easier to implement in assembly, as converting in_set(c, "\t\n !#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~") to if('\t' == c || ...) and implementing that in assembly is a non-starter. But in_set is just mov eax, [c] ; mov ebx, [string_1] ; call in_set
<OriansJ>and as such I like how it looks better; it produces denser assembly (but slightly slower).
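A minimal sketch of what an in_set() helper of this kind might look like, written in the plain-loop style M2-Planet can compile (no const, no library calls). This is an illustrative re-implementation under that assumption, not the code from M2-Planet's libc.

```c
/* Return 1 if character c occurs in the NUL-terminated string s, else 0.
 * Hypothetical re-implementation for illustration; NUL itself never matches. */
int in_set(int c, char* s)
{
    while(0 != s[0])
    {
        if(c == s[0]) return 1;
        s = s + 1;
    }
    return 0;
}
```

A call such as in_set(c, " \t\n") stands in for three comparisons: every call site becomes "load c, load string address, call", which is where the denser-but-slightly-slower tradeoff comes from.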
<OriansJ>mihi: fossy is entirely responsible for the wonderful documentation work they did for mescc-tool-seed and all of the stage0 pieces honestly
<OriansJ>we could strip FILE out of M2-Planet and make it a struct defined in file.c, with separate read and write buffers that are allocated when opened and flushed when closed. The only thing we would need to do is ensure fflush or close is done before terminating the program (except on errors)
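A rough C sketch of the idea above, assuming POSIX open/read/write/close underneath. The bfile type and the bfile_* names are hypothetical stand-ins for a FILE defined in file.c; the sketch also assumes each file is used either for reading or for writing, not both, since two independent buffers on one descriptor would need extra seek handling.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define BUF_SIZE 4096

/* Hypothetical buffered file: separate read and write buffers. */
typedef struct bfile
{
    int fd;
    char* rbuf;  /* read buffer */
    int rpos;    /* next byte to hand out from rbuf */
    int rlen;    /* number of valid bytes in rbuf */
    char* wbuf;  /* write buffer */
    int wlen;    /* bytes waiting to be written */
} bfile;

/* Buffers are allocated when the file is opened. */
bfile* bfile_open(char* path, int flags)
{
    int fd = open(path, flags, 0600);
    if(fd < 0) return NULL;
    bfile* f = calloc(1, sizeof(bfile));
    if(NULL != f)
    {
        f->rbuf = malloc(BUF_SIZE);
        f->wbuf = malloc(BUF_SIZE);
    }
    if(NULL == f || NULL == f->rbuf || NULL == f->wbuf)
    {
        if(NULL != f) { free(f->rbuf); free(f->wbuf); free(f); }
        close(fd);
        return NULL;
    }
    f->fd = fd;
    return f;
}

/* Push any pending output to the descriptor; 0 on success, -1 on error. */
int bfile_flush(bfile* f)
{
    int done = 0;
    while(done < f->wlen)
    {
        ssize_t n = write(f->fd, f->wbuf + done, f->wlen - done);
        if(n <= 0) return -1;
        done = done + (int)n;
    }
    f->wlen = 0;
    return 0;
}

int bfile_putc(int c, bfile* f)
{
    if(f->wlen == BUF_SIZE && bfile_flush(f) != 0) return -1;
    f->wbuf[f->wlen] = (char)c;
    f->wlen = f->wlen + 1;
    return c;
}

/* Returns the next byte, or -1 on end of file or error. */
int bfile_getc(bfile* f)
{
    if(f->rpos == f->rlen)
    {
        ssize_t n = read(f->fd, f->rbuf, BUF_SIZE);
        if(n <= 0) return -1;
        f->rlen = (int)n;
        f->rpos = 0;
    }
    return (unsigned char)f->rbuf[f->rpos++];
}

/* Output is flushed when the file is closed. */
int bfile_close(bfile* f)
{
    int rc = bfile_flush(f);
    if(close(f->fd) != 0) rc = -1;
    free(f->rbuf);
    free(f->wbuf);
    free(f);
    return rc;
}
```

Forgetting bfile_close (or an explicit flush) before the program terminates loses whatever is still sitting in wbuf, which is exactly the caveat raised above.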
<OriansJ>probably could fix all of the C file primitives to better match the standard (file_print -> fputs) while we are at it
<OriansJ>what do you think mihi?
<xentrac>hooray!
<OriansJ>mihi: honestly, everything except for the pointer lookup tables I would merge happily. The pointer lookup tables I'd want a closer look at, because if you can get it working in M2-Planet, I could figure out a way to add switch statement support to M2-Planet in a form that cc_x86 supports
<xentrac>that's wonderful!
<siraben>pder: ok, if the thousands of lines of assignments aren't good, would it be acceptable to output binary code from Haskell and load it in from a file?
<OriansJ>pder: we could easily add support for putchar to M2-Planet's file.c
<OriansJ>and getchar too
<OriansJ>but if you could provide an example C program that breaks because of assignments; I can fix M2-Planet so that it isn't an issue anymore or help figure out another solution
<siraben>OriansJ: is there a more elegant way to load the raw bytecode?
<pder>OriansJ, the easiest way to demonstrate the crash is to check out my branch marginally-m2-wip and run make marginally.c && ./marginally.sh. The second script is a build script for marginally using M2-Planet
<pder>I was going to try splitting that code out into a separate file and compiling it with M2-Planet on its own
<pder>I also wonder how involved it would be to add support for array assignment with an initializer list to M2-Planet?
<pder>siraben, that might work and it would be handy to be able to open and write to multiple files from Haskell
<siraben>ok, let me know what sort of binary format you need it in, little/big endian and structure
<pder>siraben, could we add more stuff to the ffi interface so we could open files and read and write to them?
<pder>If that's the case, I could output C code to one file and the prog data to another
<siraben>yes, that would be a good start, extending the FFI
<siraben>what sort of FFI functions do we have ATM?
<pder>getchar, putchar, getargcount, getargchar
<siraben>Right, and how is that called from C?
<pder>take a look at the foreign() method either in vm.c or in generated/marginally.c
<siraben>Ooh so it's some sort of blend of C and lambda calculus
<OriansJ>pder: nice you managed to hit the recursive limit in recursive_output; I can fix that by reversing the list and looping through it iteratively
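One way to do what OriansJ describes: reverse the list in place, then walk it with a loop, so stack depth no longer grows with the length of the output. The out_node struct and function names are hypothetical, and the sketch assumes items are prepended as they are generated (newest at the head), which is why a reversal is needed to print them oldest-first.

```c
#include <stdio.h>

/* Hypothetical output list node; items are prepended, so head = newest. */
struct out_node
{
    struct out_node* next;
    char* s;
};

/* Reverse the list in place and return the new head (the oldest item). */
struct out_node* reverse_list(struct out_node* head)
{
    struct out_node* root = NULL;
    while(NULL != head)
    {
        struct out_node* next = head->next;
        head->next = root;
        root = head;
        head = next;
    }
    return root;
}

/* Iterative replacement for a recursive "print the rest, then print me"
 * walk: constant stack depth however long the list is.
 * Note: this reverses (modifies) the caller's list. */
void output_list(struct out_node* head, FILE* out)
{
    struct out_node* i;
    for(i = reverse_list(head); NULL != i; i = i->next)
    {
        fputs(i->s, out);
    }
}
```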
<OriansJ>siraben: the most elegant way to load bytecode is --raw; it does no processing and just loads it
<OriansJ>I am considering taking out the messy logic used in -l and putting it into a separate program that converts its input into something that --raw will use happily
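siraben asked earlier which byte order and structure the binary should use. The log does not settle a format, so this is only a sketch assuming the bytecode image is a flat sequence of little-endian 32-bit words; read_le32 and decode_words are hypothetical names, not part of vm.c. Decoding byte by byte keeps the result the same on little- and big-endian hosts.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a little-endian 32-bit word starting at p, independent of host byte order. */
uint32_t read_le32(const unsigned char* p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Convert a raw byte image into words; returns the number of words written.
 * Trailing bytes that do not form a whole word are ignored. */
size_t decode_words(const unsigned char* bytes, size_t n_bytes, uint32_t* out)
{
    size_t i;
    for(i = 0; i + 4 <= n_bytes; i += 4)
    {
        out[i / 4] = read_le32(bytes + i);
    }
    return n_bytes / 4;
}
```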
<OriansJ>pder: well I am not absolutely sure about the complexity of adding array assignment, but it is probably something manageable with a little effort. The big problem is making sure you output the right sized output. As you would have to do something like :CONSTANT_foo &CONSTANT_foo_address :CONSTANT_foo_address %3 $181 ... %8 (or !3 !181 depending on the size of the elements of the array)
<OriansJ>%3 being 32-bit, !3 being 8-bit
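A sketch of the kind of emitter the compiler would need for an array initializer, following the label layout OriansJ writes out and picking the element prefix from the element size (% for 32-bit, ! for 8-bit). emit_array_init is a hypothetical function, not M2-Planet code; it uses one prefix for every element, rejects any other element size, and does not check that values fit in 8 bits.

```c
#include <stdio.h>
#include <string.h>

/* Write M1 text for array `name` with `count` initial values into out.
 * Returns the text length, or -1 on overflow or unsupported elem_size. */
int emit_array_init(char* out, size_t cap, const char* name,
                    const int* values, int count, int elem_size)
{
    char prefix;
    size_t len;
    int n;
    int i;

    if(elem_size == 4) prefix = '%';       /* 32-bit immediates */
    else if(elem_size == 1) prefix = '!';  /* 8-bit immediates */
    else return -1;

    n = snprintf(out, cap, ":CONSTANT_%s &CONSTANT_%s_address :CONSTANT_%s_address",
                 name, name, name);
    if(n < 0 || (size_t)n >= cap) return -1;
    len = (size_t)n;

    for(i = 0; i < count; i = i + 1)
    {
        n = snprintf(out + len, cap - len, " %c%d", prefix, values[i]);
        if(n < 0 || (size_t)n >= cap - len) return -1;
        len += (size_t)n;
    }
    return (int)len;
}
```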
<OriansJ>siraben: a good change in the Haskell code would be to harmonize the foreign function numbers in the pieces we have already converted to run on vm.c so that we can eliminate the need for --foreign 2
<OriansJ>pder: I fixed M2-Planet to handle the larger output size you needed and the patches are up
<OriansJ>and if you just stick https://paste.debian.net/1177580/ in your file.c; it'll compile without issue
<OriansJ>also, why don't I do the simple thing and just add the bigger string option to M2-Planet
<Hagfish>it feels like we're really seeing some "network effects", and this project is showing it's more than the sum of its parts
<Hagfish>keep up the great work, everyone
<OriansJ>Hagfish: we have always had network effects; janneke's development of MesCC got a lot easier the second mescc-tools started catching bugs for him. Then the introduction of label>label offset calculations made all ELF headers much easier for everyone
<OriansJ>pder: the good news is ./bin/marginally builds and runs now
<OriansJ>the bad news is it runs long enough that I think it is hung in a loop
<Hagfish>OriansJ: the beginnings of an exponential growth curve always look less significant, in hindsight, than they were at the time :)
<mihi>OriansJ, thanks for the feedback. About the file buffering, I see multiple ways of proceeding (having only looked at x86 arch and assuming the other ones are similar)
<mihi>1) change libc.M1 to call exit() instead of the hardcoded syscall, rename the current exit to _exit (as in POSIX) and implement exit() in C so that it closes all files
<mihi>2) Rely on application code to close all files
<mihi>3) Make it optional - when application calls allow_file_buffering() it is responsible to close all files
<mihi>4) have separate file.c and file_bufio.c
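Option 1 can be sketched in C as a small registry of open files that exit() flushes before handing off to the raw syscall wrapper. Everything here is hypothetical: the wfile struct, the fixed-size table and the function names are illustrative, and the exit function is called my_exit only so the sketch does not clash with the host libc's exit(); in a freestanding libc it would be exit() itself.

```c
#include <stddef.h>
#include <unistd.h>

#define MAX_OPEN 16

/* Hypothetical minimal write-buffered file: just what exit() needs. */
struct wfile
{
    int fd;
    int len;        /* bytes waiting in buf */
    char buf[512];
};

/* Registry of files that may still hold unflushed output. */
static struct wfile* open_files[MAX_OPEN];

/* Remember f so exit can flush it; 0 on success, -1 if the table is full. */
int register_file(struct wfile* f)
{
    int i;
    for(i = 0; i < MAX_OPEN; i = i + 1)
    {
        if(NULL == open_files[i]) { open_files[i] = f; return 0; }
    }
    return -1;
}

/* Forget f (an fclose would call this after flushing). */
void unregister_file(struct wfile* f)
{
    int i;
    for(i = 0; i < MAX_OPEN; i = i + 1)
    {
        if(f == open_files[i]) open_files[i] = NULL;
    }
}

/* Write out whatever f has buffered; errors are ignored since we are exiting. */
static void flush_one(struct wfile* f)
{
    int done = 0;
    while(done < f->len)
    {
        ssize_t n = write(f->fd, f->buf + done, f->len - done);
        if(n <= 0) break;
        done = done + (int)n;
    }
    f->len = 0;
}

void flush_all_files(void)
{
    int i;
    for(i = 0; i < MAX_OPEN; i = i + 1)
    {
        if(NULL != open_files[i]) flush_one(open_files[i]);
    }
}

/* exit() flushes everything, then _exit() terminates without further cleanup. */
void my_exit(int status)
{
    flush_all_files();
    _exit(status);
}
```

Error paths that must not flush can still call _exit() directly, matching the "except on errors" caveat from earlier in the log.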
<mihi>I currently am using a different struct name than FILE so that my prototype can work without changing the compiler. Also I only use one buffer and a flag indicating whether it is currently buffering input or output (and on a mode switch, seek or flush appropriately).
<mihi>your in_set example is a good example of a bad usage: I assume (without having checked all the special chars) that it should match newline, tab, space and all printable US-ASCII. And it is definitely harder to read than (c == '\t' || c == '\n' || (c >= ' ' && c <= '~')).
<mihi>side anecdote: End of last century, 11th alliance (11a.nu) released a bios password cracker, which - due to similar logic and the fact that the letter W is not used in the Swedish language - was unable to bruteforce passwords containing a 'W'.
<mihi>later came the much faster pwdigits password cracker which (due to short password hashes) could quickly find an equivalent password consisting only of digits, at most 16 chars long, for every possible hash, so iBios320 became obsolete and they never bothered releasing a fixed version.
<mihi>I'm confident it is possible to build a M2-Planet compatible version of fasthex2 with pointer lookup tables, the main question is whether it is possible to do so without sacrificing speed when compiling with mescc/gcc/tcc
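The log does not show mihi's fasthex2 code, so this is only a guess at the general shape of a pointer lookup table: a 256-entry array of function pointers indexed by the prefix character, replacing a chain of if/else comparisons with one indexed load and an indirect call (roughly what a switch compiles to as a jump table). The handler names are hypothetical; in a real assembler they would do the work, and here they only return a byte width so the table can be checked. Only the ':', '!' and '%' prefixes that appear earlier in the log are mapped.

```c
/* Hypothetical handler: returns how many output bytes a token with this
 * prefix occupies, or -1 if c is not a recognised prefix. */
typedef int (*prefix_handler)(int c);

static int not_a_prefix(int c) { (void)c; return -1; }
static int label_def(int c)    { (void)c; return 0; }  /* ':' defines a label */
static int ref_8bit(int c)     { (void)c; return 1; }  /* '!' 8-bit */
static int ref_32bit(int c)    { (void)c; return 4; }  /* '%' 32-bit */

/* One entry per possible byte value. */
static prefix_handler dispatch[256];

void init_dispatch(void)
{
    int i;
    for(i = 0; i < 256; i = i + 1) dispatch[i] = not_a_prefix;
    dispatch[':'] = label_def;
    dispatch['!'] = ref_8bit;
    dispatch['%'] = ref_32bit;
}

/* Indexed load plus indirect call; no chain of comparisons. */
int token_width(int c)
{
    return dispatch[c & 0xFF](c);
}
```

Whether this beats a switch or an if-chain under mescc, gcc or tcc is exactly the open question mihi raises; the sketch says nothing about speed.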
<mihi>anyway, got to go now. Will next try to polish my changes to hex2.c (except the pointer lookups) and blood-elf, and maybe add another patch across repos to add missing fclose calls.
<OriansJ>Hagfish: completely fair; 4 years of work did produce an interesting foundation to build upon
<OriansJ>mihi: I like option 2
<OriansJ>mihi: whether in_set is easier to read is perhaps a matter of taste, and your example doesn't actually produce matching results: https://paste.debian.net/1177589/
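A small C check of the disagreement, comparing the character set from the in_set() call quoted in the log with mihi's range test over codes 0 to 127. The in_set here is an illustrative re-implementation, and range_test relies on ASCII ordering of ' ' through '~' (the encoding point OriansJ raises next).

```c
/* Character set copied from the in_set() call quoted in the log. */
static char* printable_set =
    "\t\n !#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~";

static int in_set(int c, char* s)
{
    while(0 != s[0])
    {
        if(c == s[0]) return 1;
        s = s + 1;
    }
    return 0;
}

/* mihi's range test; assumes ASCII ordering. */
static int range_test(int c)
{
    return c == '\t' || c == '\n' || (c >= ' ' && c <= '~');
}

/* Count codes 0..127 on which the two tests disagree; *first receives the
 * lowest such code, or -1 if they agree everywhere. */
int count_mismatches(int* first)
{
    int c;
    int count = 0;
    *first = -1;
    for(c = 0; c < 128; c = c + 1)
    {
        if(in_set(c, printable_set) != range_test(c))
        {
            if(*first < 0) *first = c;
            count = count + 1;
        }
    }
    return count;
}
```

On an ASCII system this reports a single disagreement, the double quote (code 34), which the range test accepts but the long string omits.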
<OriansJ>mihi: I do like that historical example and you are probably correct in that it would be vulnerable to that class of bugs if the goal was bruteforce password cracking but fortunately we are just dealing with a subset of ASCII chars as acceptable input.
<pder>OriansJ: thanks for fixing that so quickly. I'm looking into the hang now. It appears to be related to another comparison issue. I should know something more soon.
<OriansJ>another reason why I do it this way is that *IF* the mapping between letters and the binary values that encode them is different, the behavior is still universally identical. And it is nice that it doesn't have a real performance impact, as M2-Planet can compile a 24,599-line source file in 0.35 seconds (wall clock)
<OriansJ>pder: in a few minutes you should have an update to allow arbitrary MAX_STRING sizes
<OriansJ>just got to put it through its paces
<OriansJ>just make sure to use --max-string ## before you do -f
<OriansJ>otherwise it will ignore --max-string entirely and just default to 4096
<OriansJ>and patches are up
<OriansJ>it should support 0b10101, 00400, 0xFF and 1234 number formats
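A sketch of a parser that accepts the four number formats listed (binary with 0b, octal with a leading zero, hex with 0x, plain decimal), as an option like --max-string might need. parse_number is a hypothetical helper, not M2-Planet's actual implementation, and it does not detect overflow.

```c
/* Parse 0b10101 (binary), 00400 (octal), 0xFF (hex) or 1234 (decimal).
 * Returns 0 and stores the value in *out on success, -1 on malformed input. */
int parse_number(const char* s, long* out)
{
    int base = 10;
    long value = 0;

    if(s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) { base = 16; s += 2; }
    else if(s[0] == '0' && (s[1] == 'b' || s[1] == 'B')) { base = 2; s += 2; }
    else if(s[0] == '0' && s[1] != '\0') { base = 8; s += 1; }

    if(*s == '\0') return -1;  /* empty string, or a prefix with no digits */

    for(; *s != '\0'; s++)
    {
        int d;
        if(*s >= '0' && *s <= '9') d = *s - '0';
        else if(*s >= 'a' && *s <= 'f') d = *s - 'a' + 10;
        else if(*s >= 'A' && *s <= 'F') d = *s - 'A' + 10;
        else return -1;
        if(d >= base) return -1;  /* digit not valid in this base */
        value = value * base + d;
    }
    *out = value;
    return 0;
}
```

With this scheme, 00400 is octal and evaluates to 256, while a lone 0 is treated as decimal zero.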
***clemens3_ is now known as clemens3
<Hagfish>i think that if someone wanted to work on more publicity, they could highlight how much work has gone into this (and of course who did that work)
<Hagfish>this project is not just impressive as a technical feat, but interesting as a human story too
<Hagfish>it's probably too early to film a documentary about it yet, though :)
<pder>Thanks OriansJ, I will take a look shortly. I'm still trying to figure out this hang bug. It's very strange because I can get it to work by simply adding a line that increments an unrelated variable
<OriansJ>Hagfish: well, publicity has never been a skill of mine. And if you notice, there are a lot of contributors to the various pieces that have been created over these last 4 years. If nothing else, this would be more of a story of a group of people who saw something they could solve, and of me making a mess of all of it and creating the next batch of problems to solve.
<OriansJ>I guess we are now in the race to the finish phase of the story. With 3 different potential winners. A Haskell compiler (blynn-compiler), a Scheme interpreter (mes-m2) and an insane rewrite of binutils and GCC 4.7.4 in M2-Planet's C subset (M3). With the question of which will successfully complete bootstrapping GCC first.
<OriansJ>pder: could you share that diff, so I can try to spot it?
<pder>Hmm if I assign to an unsigned variable before doing a signed comparison it seems to do an unsigned comparison instead
<pder>Sure, one moment
<pder>I just pushed the branch marginally-m2-wip. I temporarily committed generated/marginally.c, but that's just for convenience. If you check out that branch and run marginally.sh it should run and succeed. However if you remove the line that I added in the latest commit, it will hang. If you diff the M1 output, you will see a SETLE vs SETBE
<OriansJ>pder: dissecting now
<pder>I appreciate all your help
<OriansJ>ok, right now M2-Planet treats all function calls as returning a void type; so when calling promote_type with the 2 functions it is comparing NULL to NULL and returning NULL; which should always return CMP\nSETLE\nMOVEZBL\n
<OriansJ>for <=
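Why SETLE versus SETBE matters: on x86, SETLE tests signed less-or-equal and SETBE tests unsigned below-or-equal, so the same bit pattern can compare differently. This C sketch only shows the language-level semantics (the code generation noted in the comments is typical, not guaranteed); it does not reproduce the M2-Planet bug, which OriansJ traces to function calls being typed as returning void.

```c
/* Signed comparison: a compiler typically emits CMP then SETLE. */
int signed_le(int a, int b) { return a <= b; }

/* Unsigned comparison: typically CMP then SETBE. */
int unsigned_le(unsigned a, unsigned b) { return a <= b; }

/* Mixed operands: C's usual arithmetic conversions turn a into unsigned,
 * so this is an unsigned comparison as well. */
int mixed_le(int a, unsigned b) { return a <= b; }
```

With a = -1 and b = 1 the signed version is true, but both the unsigned and mixed versions are false because -1 becomes the largest unsigned value, which is the kind of flip that can turn a terminating loop into a hang.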
<pder>Should I assign the results of num() to a variable before doing the comparison?
<OriansJ>There is a deeper bug in M2-Planet to fix here and I will find it and crush it
<OriansJ>now to reduce the test to something more trivial to trace while preserving the difference
<OriansJ>down to 155 lines to search
<OriansJ>odd promote_type is getting 2 unsigneds
<OriansJ>I reduced the test down to: https://paste.debian.net/1177634/
<pder>in generated/marginally.c num() has a return type of int, but it returns a value from the mem array which is unsigned. num() used to return unsigned in earlier stages.
<pder>If I change it to unsigned num() the gcc build hangs
<OriansJ>pder: ok, I managed to find the minimal change that produces the desired behavior. (eg isolate type information side of a single statement)
<OriansJ>^side^inside^
<OriansJ>I should have a patch up shortly
<OriansJ>and ./marginally.sh *WORKS*
<OriansJ>and patches are up
<pder>awesome, thanks!
<mihi>OriansJ, probably we should agree to disagree on in_set. The fact that I did not notice that the quote char was missing from the long character sequence proves it is definitely not my style :)
<mihi>by the way, can it be that mescc-tools-seed is broken when updating M2-planet submodule inside? some functions in string.c have disappeared :)
<mihi>but I guess I can send my first pull request if only the tests in mescc-tools pass without testing the bootstrap :P -> https://github.com/oriansj/mescc-tools/pull/13
<mihi>and your machine definitely is faster than mine. Compiling M2-Planet (<4000 lines) with M2-Planet compiled M2-Planet takes 5.5 seconds for me.
<mihi>(before my file.c changes)
<fossy>What is your cpu? Even on my laptop it takes ~3 seconds
<mihi>Intel(R) Core(TM) i3-3110M CPU + one layer of VirtualBox (running 32-bit Linux on 64-bit Windows)
<mihi>fossy, but still 3 seconds is one order of magnitude slower than .35 seconds
<mihi>usually I don't write those trademarks - that is copied verbatim from msinfo :)
<mihi>btw get_machine behaves funny on cygwin - it detects amd64 cpu and tries to run the linux amd64 tests. I "patched" my local get_machine to return "other" instead :)