IRC channel logs

2022-06-05.log

<oriansj>oohh, I could make stdout be the default output target
<oriansj>/dev/stdout to be precise
<oriansj>should make testing easier
<oriansj>and fuzzing safe
<muurkha>stdin being default input and stdout being default output is more convenient in a lot of ways, but of course we can't do that before we have ways to redirect input and output
<oriansj>muurkha: we have no pipes at this stage
<Hagfish>speaking of testing, is there any instrumentation set up for measuring test coverage of some reasonable segment of the bootstrap path?
<Hagfish>maybe not test coverage, but execution coverage?
<Hagfish>i don't imagine anyone would start deleting functions from e.g. the tcc source if we found they weren't ever used, but it might provide some insights
<oriansj>well i have AFL setup and testing scripts to check for segfaults in all stage0-posix pieces but I don't think that is what you mean
<Hagfish>AFL can generate minimal testcases, right?
<Hagfish>can it generate a minimal coverage corpus?
<oriansj>Hagfish: segfaulting and crashing cases but not known good test cases
<Hagfish>ah, pity
<oriansj>well, I would never reject if someone were to create more test cases for any of the various pieces
<Hagfish>of course
<oriansj>I just haven't had time to do that beyond the basic level so far
<Hagfish>i've never thought about what testing assembly code would look like
<Hagfish>is it possible to just statically analyse that all code is reached for a few given inputs?
<oriansj>Hagfish: the same as one would write a test for a C program or a python program. write a file with valid inputs and compare the output result against what known good values should be
<muurkha>oriansj: I didn't mean pipes, just file redirection
<oriansj>muurkha: we don't have that either
<muurkha>I know
<muurkha>that's why I said "of course we can't do that before we have ways to redirect input and output"
<oriansj>Hagfish: think ruby spec tests if you are wish
<Hagfish>oriansj: i don't think i can define code being "correct" any better than that. it just feels like treating the code as a black box, which doesn't give any insights if the test fails
<oriansj>^are^^
<oriansj>Hagfish: You can easily target a single code path in all of the assembly pieces
<muurkha>but it does simplify things to not have each program opening and closing its own input and output files
<Hagfish>oriansj: i guess statically analysing code probably quite quickly bumps into the Halting Problem or exploding complexity as you try to reason about what else the code could have done if there were a bug
<oriansj>muurkha: replace -f foo -o foo .... ummm it seems simple enough for me
<oriansj>Hagfish: also we tried to make the assembly code as small and simple as possible
<Hagfish>yes, good
<oriansj>so it'll definitely fail badly if given questionable input
<oriansj>So you could do an output compare between hex0, hex1, hex2 in assembly with mescc-tools hex2 and the output should always be identical
<oriansj>same for M0 and M1
<oriansj>cc_* does have a C version which it should directly match
<Hagfish>yeah, that sort of makes the different versions be test suites for each other
<oriansj>not ideal but something we certainly can improve given enough effort
<Hagfish>and i think you're right, a decent test just has to be a simple input (that can be reasoned about) and the relevant expected outputs, so that the failure mode makes it clear which part of the implementation needs fixing
<oriansj>so simple that it obviously is correct has benefits ^_^
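
The output-compare idea above (run two implementations on the same input and require identical output) can be sketched in C. This is only an illustration: streams_identical is a hypothetical helper name, not something taken from stage0-posix or mescc-tools, and producing the two output files is left to the caller.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 when both streams hold exactly the
 * same bytes from their current positions to EOF, 0 otherwise. */
int streams_identical(FILE *a, FILE *b)
{
    int ca;
    int cb;

    assert(a != NULL);
    assert(b != NULL);

    do {
        ca = fgetc(a);
        cb = fgetc(b);
        /* A differing byte, or one stream ending early, is a mismatch. */
        if (ca != cb) return 0;
    } while (ca != EOF);

    return 1;
}
```

A test driver would open the output of, say, the assembly hex2 and the mescc-tools hex2 for the same input and fail if this returns 0.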
<stikonas>oriansj: thanks, I'll try although probably not today
<oriansj>stikonas: no worries, I'll be tweaking today anyway
<stikonas>was doing a lot of driving and hiking in the mountains today, so a bit tired
<oriansj>stikonas: then absolutely relax and enjoy yourself. no rush
<oriansj>muurkha: but sure, I'll include /dev/stdin for the default input (assuming fseek(input, 0, SEEK_END); works correctly with that)
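
Whether fseek(input, 0, SEEK_END) works on /dev/stdin depends on what stdin is attached to; a pipe or terminal is generally not seekable. A minimal sketch of a size probe that reports failure instead of assuming success follows. stream_size is a hypothetical name and is not claimed to match what replace.c actually does.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns the total size of a seekable stream,
 * or -1 when the stream cannot be seeked (e.g. a pipe or terminal).
 * The original read position is restored on success. */
long stream_size(FILE *f)
{
    long start;
    long size;

    assert(f != NULL);

    start = ftell(f);
    if (start < 0) return -1;
    if (fseek(f, 0, SEEK_END) != 0) return -1;
    size = ftell(f);
    if (fseek(f, start, SEEK_SET) != 0) return -1;
    return size;
}
```

A caller that gets -1 would have to fall back to reading byte by byte until EOF rather than sizing a buffer up front.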
<muurkha>oriansj: it's simple to invoke, but replace.c has to open() or fopen() the files and also parse more arguments
<oriansj>muurkha: we do the fopen *after* all of the arguments are parsed
<muurkha>right, of course
<muurkha>and in itself that is not a huge amount of complexity, it's just that each of the other programs in the early-stage bootstrap has to duplicate that same small amount of extra complexity
<muurkha>or unbootstrap or whatever
<muurkha>even though a lot of them read one input and write to one output
<oriansj>well doing replace a few times is pretty easy to reason about and audit
<muurkha>yeah, I'm not saying the replace command is a bad idea
<oriansj>just the pattern of -f file -o output is duplicated functionality
<muurkha>I'm saying that moving a little bit of its complexity into kaem or something similar would reduce the overall system complexity
<muurkha>right
<muurkha>it would also make it a little easier to audit I think
<muurkha>because there are less places where some program opens a file for output which could potentially be overwriting something unexpected
<oriansj>tempfiles are pretty easy to audit and being able to see every processing step as an audit record means that questionable changes can be traced back to the source program
<oriansj>so the chain M2 -f input.c -o input.M1 && M1 -f input.M1 -o input.hex2 && hex2 -f input.hex2 -o input vs M2 -f input.c | M1 -- | hex2 -o input
<muurkha>I agree
<muurkha>fwiw a super stupid replace.c without file opening or string library functions is 60 lines of C: http://canonical.org/~kragen/sw/dev3/replace.c
<muurkha>I don't think we have enum support until tcc tho
<oriansj>nor fprintf
<oriansj>and I allow \n to be in my patterns
<oriansj>fgets would break on \n
<muurkha>this implementation works okay with \n in the pattern, as long as it's at the end
<oriansj>true
<muurkha>but I didn't think that was important
<oriansj>and you can bust the buffer
<muurkha>undetectedly?
<oriansj>but you do support nulls in your input/output
<oriansj>muurkha: o replaced by ooo
<oriansj>my version would just turn hello world into hellooo wooorld
<muurkha>it crashes with "./replace: string too long" in that case
<muurkha>if I feed it 3000 o's
<muurkha>it does have a much nastier problem actually
<muurkha>if you run it as ./replace bab x
<muurkha>and you feed it a line consisting of babbabbab... many times (I tested with 3001 repetitions)
<muurkha>it outputs xxxxx...xxbabxxxx... with three unreplaced "bab"s
<muurkha>I blame fgets!
<muurkha>because with fgets there's no reliable way to tell the difference between an input line that lacks a "\n" because it ends with EOF and one that lacks a "\n" because fgets ran out of buffer space
<muurkha>if it happens that the input line is shorter than the buffer then you know it's the former
<muurkha>fixed tho
<muurkha>it does successfully turn hello world into hellooo wooorld of course
<muurkha>it doesn't really handle nulls in the input or output; fgets doesn't tell you how many bytes it read, so you're reduced to looking for nulls, and argv also doesn't tell you how long the argument strings are, so again you're reduced to looking for nulls
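
The fgets ambiguity described above can be narrowed down with an extra peek at the next byte. The following is a sketch under a loud assumption: the input contains no NUL bytes, since strlen is used to find how much fgets read. read_line_status and the LINE_* constants are hypothetical names, and this is not the fix that was applied to the linked replace.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LINE_NONE 0           /* nothing read: EOF or error */
#define LINE_COMPLETE 1       /* line ended with '\n' */
#define LINE_EOF_NO_NEWLINE 2 /* last line of input, no trailing '\n' */
#define LINE_TRUNCATED 3      /* buffer filled before the line ended */

/* Assumes the input has no NUL bytes; with NULs strlen under-reports. */
int read_line_status(char *buf, int cap, FILE *in)
{
    size_t len;
    int c;

    assert(cap >= 2);

    if (fgets(buf, cap, in) == NULL) return LINE_NONE;
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') return LINE_COMPLETE;

    /* No newline and room to spare: fgets must have hit EOF. */
    if (len < (size_t)cap - 1) return LINE_EOF_NO_NEWLINE;

    /* Buffer is exactly full: peek one byte to see if input continues. */
    c = fgetc(in);
    if (c == EOF) return LINE_EOF_NO_NEWLINE;
    ungetc(c, in);
    return LINE_TRUNCATED;
}
```

A line-based replacer that sees LINE_TRUNCATED knows a pattern may straddle the buffer boundary, which is the case that left the unreplaced "bab"s.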
<muurkha>it'd be kind of fun to compile a pattern-replacement pair into a KMP-style state machine
<muurkha>you stick an input byte into it, it changes state and emits zero or more output bytes
<muurkha>harder to tell if such a compiler is buggy though. this find/cmove loop is dumb enough that probably you could see if it was buggy
<muurkha>surely find() could be expressed better. probably with struct slice { char *s; char *end; char *buf_end; } the whole program could be expressed better
<oriansj>well a single fixed buffer that the input is pushed through which either matches or doesn't match the pattern is about as simple as it can get
<muurkha>do you mean what I wrote in that replace.c, or something else?
<oriansj>muurkha: that is what my replace.c does
<oriansj>just pushes in bytes at the end of the buffer and shift by 1 until it either matches or end of file
<muurkha>that is indeed simpler than looping over lines
<stikonas>fossy: another packaging bug https://github.com/fosslinux/live-bootstrap/issues/180
<muurkha>is that what you mean?
<stikonas>fossy: automake includes build month
<oriansj>so [][][][][][a] -> [][][][][a][b] -> [][][][a][b][c] -> [][][a][b][c][d] -> [][a][b][c][d][e] -> [a][b][c][d][e][f] -> [b][c][d][e][f][g] sort of pattern
<oriansj>on match , just dump the replacement and clear the buffer
<oriansj>hence why null is just ignored and not supported in my replace.c
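
The fixed-buffer approach described above can be sketched as follows. This is not the pasted replace.c: replace_stream is a hypothetical name, and the sketch tracks the number of held bytes with a counter instead of using NUL as the empty marker, so NUL bytes in the input pass through. The pattern itself still cannot contain NUL because it arrives as a C string.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Push input bytes through a window the size of the pattern.
 * On a match, write the replacement and clear the window. */
void replace_stream(FILE *in, FILE *out, const char *pattern, const char *replacement)
{
    size_t n = strlen(pattern);
    size_t held = 0; /* bytes currently in the window */
    size_t i;
    int c;
    char *hold;

    assert(n > 0);
    hold = malloc(n);
    assert(hold != NULL);

    while ((c = fgetc(in)) != EOF) {
        if (held == n) {
            /* Window full without a match: the oldest byte cannot
             * start a match any more, so emit it and shift left. */
            fputc(hold[0], out);
            i = 1;
            while (i < n) {
                hold[i - 1] = hold[i];
                i = i + 1;
            }
            held = held - 1;
        }
        hold[held] = (char)c;
        held = held + 1;

        if (held == n && memcmp(hold, pattern, n) == 0) {
            fputs(replacement, out);
            held = 0;
        }
    }

    /* Flush whatever is left in the window at end of input. */
    fwrite(hold, 1, held, out);
    free(hold);
}
```

With pattern "o" and replacement "ooo", the input "hello world" comes out as "hellooo wooorld"; the replacement is written directly and never re-enters the window, so it is not rescanned.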
<muurkha>aha, similar to the "stick an input byte into it, it changes state and emits zero or more output bytes" thing I was saying, but with an actual buffer and emitting some bytes later than you strictly need to
<muurkha>that's probably a better approach!
<muurkha>maybe a little slower but definitely simpler
<muurkha>(the KMP-style state machine approach is that you don't need an actual buffer; if you're in state 3 then you know that the last 3 bytes you saw were the first 3 bytes of the pattern/needle, so if the next byte is a mismatch, you know that the "buffered" bytes you need to spit out are the first three bytes of the pattern)
<muurkha>(and then usually you transition to state 0 or 1, but in cases of patterns like "abad" you might transition to some other state instead, which is the tricky part about KMP search)
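
A sketch of the KMP-style state machine described above, for comparison. The state is the number of pattern bytes matched so far, so no copy of the input is buffered: on a mismatch the bytes to emit are taken from the front of the pattern. kmp_replace_stream is a hypothetical name and the failure-table construction is the textbook one; nobody in the channel posted this code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void kmp_replace_stream(FILE *in, FILE *out, const char *pattern, const char *replacement)
{
    size_t n = strlen(pattern);
    size_t *fail;     /* fail[i]: longest proper border of pattern[0..i] */
    size_t state = 0; /* number of pattern bytes currently matched */
    size_t next;
    size_t i;
    int c;

    assert(n > 0);
    fail = malloc(n * sizeof(size_t));
    assert(fail != NULL);

    fail[0] = 0;
    for (i = 1; i < n; i = i + 1) {
        next = fail[i - 1];
        while (next > 0 && pattern[i] != pattern[next]) next = fail[next - 1];
        if (pattern[i] == pattern[next]) next = next + 1;
        fail[i] = next;
    }

    while ((c = fgetc(in)) != EOF) {
        /* Mismatch: fall back to a shorter matched prefix and emit the
         * bytes that fall out of it, which are the start of the pattern. */
        while (state > 0 && (char)c != pattern[state]) {
            next = fail[state - 1];
            fwrite(pattern, 1, state - next, out);
            state = next;
        }
        if ((char)c == pattern[state]) {
            state = state + 1;
        } else {
            fputc(c, out); /* state is 0 here: pass the byte through */
        }
        if (state == n) {
            fputs(replacement, out);
            state = 0;
        }
    }

    /* Emit the partially matched prefix left over at end of input. */
    fwrite(pattern, 1, state, out);
    free(fail);
}
```

For the pattern "abad", a mismatch in state 3 falls back to state 1 rather than 0, which is the tricky transition mentioned above: the input "abaabad" with replacement "X" produces "abaX".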
<muurkha>I guess you're using null instead of a counter to keep track of how many bytes you have in the buffer? a counter might be just as simple
<oriansj>muurkha: trimming down my replace.c it is just 74lines https://paste.debian.net/1243058/
<muurkha>nice
<oriansj>if I drop the buffering it should be a handful smaller
<muurkha>more readable too
<muurkha>ah but it's using string.h
<muurkha>sizeof(char) is defined to be 1 btw
<muurkha>more readable than the replace.c I wrote, I mean, not more readable than your previous version
<muurkha>in ANSI C, if you're on a machine like a TI DSP where a char is 32 bits, sizeof(int32_t) is also 1. that is, sizeof is defined in terms of multiples of sizeof(char), not in multiples of 8 bits or something. disclaimer: I haven't ever actually used C on such a machine!
<muurkha>not sure if GCC has a warning for "* sizeof(char)" but it probably should
<muurkha>probably it isn't intentional that you're setting hold[pattern_length-1] to hold[pattern_length] before you set it to buffer[buffer_index]
<oriansj>dropping the buffering gets it down to 60 lines: https://paste.debian.net/1243061/
<muurkha>though I guess it's harmless since hold is allocated with 4 extra bytes in it
<muurkha>very nice. and you're even counting the blank lines and comments, so it's really more like 55 very clear lines
<muurkha>if you don't like writing while (i < pattern_length - 1) you could avoid the suspicious read of hold[pattern_length] by saying size_t i = 1; while (i < pattern_length) { hold[i-1] = hold[i]; i = i + 1; }
<muurkha>it's maybe undesirable to keep fgetcing on the input once you've already seen an EOF on it, just because it keeps you from handling terminal input
<muurkha>also maybe undesirable to signal being invoked without enough command-line arguments with "Segmentation violation"
<oriansj>muurkha: so yeah, more dense and simple is easy but the complexity does allow a more clear user experience
<oriansj>I could probably add a line like: if(argc < 4) { puts("need replace input pattern replacement [output]"); exit(EXIT_FAILURE);}
<oriansj>which would clear up that segment violation
<oriansj>I also could merge read_next_byte and check_match to save an additional 5 lines
<oriansj>using hold[i++] would also be a cheap way to cut 2 lines
<oriansj>so down to 41 lines: https://paste.debian.net/1243070/
<oriansj>only the ++ behavior is invalid for M2-Planet
<oriansj>ok now down to 31 lines (and a little ugly): https://paste.debian.net/1243071/
<oriansj>and using linux C formatting saves a couple extra lines: https://paste.debian.net/1243073/ 21 lines, single C expression per line
<oriansj>if I do lisp line formatting it goes down to 18 lines
<oriansj>hmmm; why the heck isn't while(0 != hold[i-1]) hold[i--] = 0; decrementing i at all
<oriansj>17 lines: https://paste.debian.net/1243077/
<oriansj>I probably could turn that into ascii art without adding lines but I probably wasted enough time proving the point
<oriansj>getting rid of the file opening and #include <string.h>; but making it more readable: 34 lines https://paste.debian.net/1243078/
<oriansj>now the real question, is this too aggressive? https://pdp10.guru/stage0
*pabs3 would say yes :)
<unmatched-paren>yes
<fossy>stikonas[m]: ugh, i swear i fixed that...
<fossy>yes that is quite aggressive...
<unmatched-paren>i'd personally write it in a way that celebrates the achievements we've made, instead of trying to middle-finger all the detractors
<fossy>agreed
<fossy>it doesn't really help people to contribute when it's aggressive
<fossy>stikonas[m]: hm, so i fixed it for *some* things https://github.com/fosslinux/live-bootstrap/blob/master/sysa/help2man-1.36.4/patches/date.patch
<unmatched-paren>fossy: yeah, it alienates people
<fossy>now why isn't that patch working there, is help2man not regening the manpage?
<oriansj>I guess, I'll have to rewrite a good bit
<stikonas[m]>Yeah, less aggressive would be better
<unmatched-paren>as much fun as it might be to vent at the people who said it was impossible, you certainly don't want to make enemies of them
<oriansj>I guess I was just doing a bit too much Jason Scott
<oriansj>especially the archive warrior, we are here to save your shit routine. But it does seem to rhyme with bootstrapping-team: we are here to bootstrap your shit
<muurkha>on the contrary, the people who said it was impossible are the ones who build your reputation
<muurkha>your enemies are the people who say it's trivial
<muurkha>oriansj: I think "Segmentation violation" is preferable to writing an error message to stdout
<oriansj>muurkha: generally I disagree with that perspective but I was kinda pushing to see how far I could get
<oriansj>and a 17line replace.c is a bit terse
<oriansj>and it would have cost 3 lines to honestly provide the correct exit behavior for bad input
<muurkha>you disagree with the perspective of not corrupting output data with error messages?
<oriansj>muurkha: oh, I only write error messages to stderr not stdout
<muurkha>ah, I meant 01:29 <@oriansj> I could probably add a line like: if(argc < 4) { puts("need replace input pattern replacement [output]"); exit(EXIT_FAILURE);}
<muurkha>I agree that the message itself is better than "Segmentation violation" :)
<oriansj>and fputs would enable a proper write to stderr instead of puts
<muurkha>yeah. write(2, "need replace input pattern replacement [output]", 54); is hard to audit
<oriansj>muurkha: well fcntl.h includes #define STDERR_FILENO 2
<muurkha>to me the 2 is not the hard part
<oriansj>the counting of the bytes in the string
<oriansj>hence why fputs instead of write
<muurkha>right
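
The argument check discussed above, written with fputs to a caller-supplied error stream so the message never lands in the output data and no byte count has to be audited. check_args is a hypothetical helper; the message text is the one quoted in the log, with a trailing newline added.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns EXIT_SUCCESS when enough arguments were given, otherwise
 * writes a usage message to err and returns EXIT_FAILURE. */
int check_args(int argc, FILE *err)
{
    assert(err != NULL);

    if (argc < 4) {
        /* fputs finds the string length itself, unlike write(2, ...). */
        fputs("need replace input pattern replacement [output]\n", err);
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
```

In main this would be called as check_args(argc, stderr), exiting with EXIT_FAILURE when it does not return EXIT_SUCCESS.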
<oriansj>but if one allows error.h we can get away with: https://paste.debian.net/1243125/
<oriansj>but 12 lines could get shaved if we just #include <string.h>
<Hagfish>muurkha: "the people who said it was impossible are the ones who build your reputation" <-- that's such a positive mental hack, thank you
<bauen1>muurkha: yes that's sadly true way too often
<stikonas[m]>fossy: I think help2man is regenerating manpage, else the date wouldn't be current month
<stikonas[m]>I think I'll push the new hash to unbreak the build and then we can sort it out properly
<muurkha>what is, bauen1?
<muurkha>Hagfish: it's also strategically useful, not just mentally. it gives you a free, deniable way to inflate your own reputation: fervently praise the people who said it was impossible
<muurkha>by saying that such-and-such is a respected computer scientist (rather than just some loser shitposting on a forum) you are making your own achievement (doing what they said was impossible) seem more impressive. but it doesn't sound like you're bragging! it sounds like you're being humble
<muurkha>also it reduces their motivation to continue to claim that it's impossible and that you haven't actually done it
<bauen1>muurkha: the thing Hagfish quoted
<Hagfish>it's like a form of mental jujitsu, where the more negativity you face, the happier you are, and you defeat your enemy by helping their self-esteem. it's incredible
<Hagfish>or it's like a jedi mind trick, making your adversary say "you're right, maybe i should go away and rethink my life choices..."
<muurkha>also those people give you useful information; things that everyone thinks are trivial are often not worth spending your time on
<muurkha>if you have money or subordinates you can get someone else to do them
<oriansj>muurkha: much like the idea, no fight is worth entering unless there is the risk you will be knocked the fuck out but also a slim chance to win. If no risk of getting knocked out, it is below you. If no chance of success, discretion is the better part of valor.
<oriansj>So pick up goals you think are just a little too hard for you and either get knocked out or discover that you are a stronger person than you thought you were. The more you lose, the farther you are along the path to becoming the best version of yourself.