IRC channel logs
2026-04-17.log
<rlb>jcowan: looks like mem_iconveh leaves us still out of spec for truncated input, i.e. utf16->string for #vu8(120) -> "".
<rlb>but rnrs says "\ufffd"
<rlb>I suppose I could try to patch that one case back up by detecting what the count *should* have been and adding a final missing replacement myself, i.e. add one if input % 4 != 0.
<rlb>Though that's still assuming undocumented and consistent iconveh behavior, so I'll probably just leave it alone for now...
<rlb>(assumes consistent behavior for an undocumented (afaik) case)
<rlb>utf8 branch now has the utfN->string string->utfN fixes and additional tests
<apteryx>but the documentation says "If OPTIONAL-ARG?, an argument will be taken if available."; maybe this only works for short options?
<apteryx>re earlier SRFI 37 question, nevermind, it works correctly now
<old>hey trick question here
<old>I have a file timestamp using stat + stat:mtime + stat:mtimensec
<old>What type must I use to create a time object with this timestamp with srfi-19?
<old>My guess would be UTC, but I am not knowledgeable enough about file-system timestamps to be sure
<jcowan>Yes, filesystem timestamps are always kept in UTC. Almost all parts of the Unix tradition keep all non-UTC systems, whether TAI or LCT, at the edges only.
<dsmith>Yep. UTC. TAI seems like it would be a better choice, so don't have to deal with leap seconds.
<old>In the context of a build-system that compares input/output timestamps, I'm not sure leap seconds would matter much
<old>it could probably in some rare cases trigger a spurious build
<dsmith>I don't remember the actual numbers, but some filesystems have a huge granularity to recorded timestamps
<jcowan>Yes, technically Posix time accounts for leap seconds by doubling up on the broken-out labels.
<jcowan>Very few systems maintain nanosecond *accuracy*, especially since even TAI is only known retrospectively: it is a weighted average of a great many clocks worldwide.
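A minimal sketch of old's timestamp question, written in Python rather than Guile and using hypothetical helper names (`split_ns`, `utc_from_parts`, `file_mtime_parts`). It assumes only what jcowan and dsmith say above: the seconds and nanoseconds reported by stat count from the Unix epoch in UTC, so they can be split or combined without any time-zone adjustment. In srfi-19 terms that would correspond to a UTC time object.

```python
import os
from datetime import datetime, timezone

NS_PER_SEC = 1_000_000_000


def split_ns(total_ns):
    """Split integer nanoseconds since the epoch into (seconds, nanoseconds)."""
    return divmod(total_ns, NS_PER_SEC)


def utc_from_parts(seconds, nanoseconds):
    """Build a UTC datetime from the two stat fields.

    datetime only has microsecond resolution, so the last three
    nanosecond digits are dropped here.
    """
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return dt.replace(microsecond=nanoseconds // 1000)


def file_mtime_parts(path):
    """Return (seconds, nanoseconds) of a file's mtime (hypothetical helper)."""
    return split_ns(os.stat(path).st_mtime_ns)
```

For a build tool that only compares two timestamps, the integer nanosecond value can be compared directly; the datetime conversion matters only for display.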
<dsmith>When comparing the timestamps of a .go file and its corresponding .scm, Guile chooses the .go if the times are the same.
<jcowan>Huh. I would have made the opposite choice.
<jcowan>(compiling is safer than not compiling)
<dsmith>jcowan, I did some work implementing PTP on embedded Linux. We had devices that could synchronize to 30ns.
<dsmith>jcowan, That's the way I was leaning too.
<jcowan>OTOH, if almost all compiles take less than a second, then perhaps conservatism is the wrong choice because everything will get recompiled every time.
<dsmith>The argument was that after a build, the files might be copied somewhere with same timestamps. And the desire to not re-compile everything after that
<dsmith>But it still feels wrong to me...
<old>So, should I stick with UTC or TAI?
<old>Also for BLUE, there would be a user preference for hashing file content instead of just timestamp comparison
<old>but I want to support both modes because hashing could be heavyweight in some cases. Like in a CI where the environment is controlled (e.g. fresh VM), I think it might make more sense to do timestamp instead of hashing
<jcowan>Hashing is an interesting idea, but of course it's O(n) instead of O(1)
<jcowan>Fortunately source code files are not measured in terabytes.
<old>true, but as I measured the other day
<old>on my machine, I can throughput ~260 Kib/s of sha256sum with Guile
<old>I made some changes to rnrs arithmetic fixnums and now I am at 1 Mib/s
<old>good yield, but nothing compared to sha256sum(1) of coreutils at 350 Mib/s
<jcowan>I wonder if there are incremental hash algorithms with the property that if A is a prefix of B, then hash(A) is related to hash(B) such that you don't have to start over from scratch.
<old>that would defeat the cryptographic nature of such an algorithm I think
<old>I'm not an expert in that field
<jcowan>Yes, but you don't need cryptographic hashes for this purpose.
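The two staleness checks discussed above can be sketched as follows. This is an illustration in Python, not BLUE's or Guile's actual logic, and every function name is hypothetical. The tie rule defaults to the behavior dsmith describes (equal timestamps keep the compiled file), with a flag for jcowan's opposite preference. The last helper touches on the prefix question: a streaming hash such as SHA-256 can continue from the state reached after A, but only if that state object is kept around, since the finished digest alone is not enough.

```python
import hashlib


def needs_rebuild_by_mtime(source_ns, output_ns, rebuild_on_tie=False):
    """Timestamp mode: rebuild when the source is newer than the output.

    output_ns is None when the output does not exist yet.
    """
    if output_ns is None:
        return True
    if source_ns == output_ns:
        return rebuild_on_tie
    return source_ns > output_ns


def sha256_file(path, chunk_size=1 << 16):
    """Hash a file in chunks so memory use stays constant (still O(n) time)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_rebuild_by_hash(path, recorded_digest):
    """Content mode: rebuild when the digest differs from the recorded one."""
    return sha256_file(path) != recorded_digest


def digest_after_append(state, appended):
    """Extend a saved hash state with appended bytes without rehashing the prefix.

    The saved state is copied, so it can be reused for other extensions.
    """
    h = state.copy()
    h.update(appended)
    return h.hexdigest()
```

Timestamp mode costs one stat call per file, while content mode reads every byte, which is the O(1) versus O(n) trade-off jcowan points out.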
<old>for local build sure, but what if we want at some point to have a distributed build-system
<old>kind of what Guix does I guess
<jcowan>For that matter, forging timestamps is ridiculously simple if you want to pretend that a compiled file is newer than a source file when it isn't.
<old>it's more toward sharing build artifacts
<old>but I guess this is more of a trust issue anyway
<jcowan>Such a system would not be just distributed but byzantine, and I wouldn't compile my code on untrusted systems: how do you know what it's injecting?
<old>ACTION wonders how Bazel does it
<rlb>I believe I recall some discussion (maybe lwn) of having a file "generational" counter/value --- might have been partially in the context of nfs, but don't recall the details.
<rlb>Think that may be what I remembered.
<old>there's n2 also that's interesting
<ieure>People need to stop making new build systems.
<jcowan>Why? That's like saying people need to stop making new programming languages, or DEs, or editors.
<jcowan>"Let a hundred flowers bloom; let a thousand schools of thought contend."
<jcowan>Of course Mao didn't mean it, but we do.
<rlb>(It also talks about how that flavor of redo mitigates the timestamp issues lower down.)
<rlb>old: that might or might not be relevant to your situation (i.e. the additional data/logic it uses to decide --- in the next section there, and looks like it talks about bazel a bit too).
<old>rlb: right virtual mapping of file is a mess with timestamp
<old>oh ya that blog post, I have it pinned down somewhere
<rlb>...redo's approach overall seemed interesting, but I've not looked at it in detail.
<rlb>There's also some summary info/overview in that version of redo's docs somewhere, I think.
<rlb>Part of it is just the way it learns the finer grain deps as they're revealed, and just explicitly tracks the state (in sqlite last time I looked, maybe?). Don't recall --- been a good while since I poked at it.
<jcowan>rlb: did we talk about noncharacter-based error recovery while encoding and decoding? I forget.
<rlb>I don't recall that specifically, but perhaps.
<ieure>jcowan, Too dang many of them, too many features (downloading deps is IMO *not* a build tool problem), too much fragmentation means you're often learning a new build system to do anything on a new project, even if you already know the language. Build system affects how *everyone* uses the software, DEs/editors do not.
<jcowan>I find that I change build files far less often than code files.
<jcowan>If I'm working on a greenfield project, then I need to make a choice and create a build file
<jcowan>rlb: It's about dealing with dirty UTF-n data that comes in from pathnames, environment variables, etc. in a way that doesn't penalize people for the rare cases when it actually is dirty.
<jcowan>The three standard error recovery modes all have problems: they discard information. The alternatives are to allow dirty data to infect your strings (contrary to R[67]RS)
<rlb>Are you just talking about having noncharacters as another "error handler" --- if so, then I'd assumed that'd be part of the approach *if* we decide to go some noncharacters-involving route.
<jcowan>or else to have a twofold system, where you might get a bytevector instead of a string and have to handle it.
<rlb>That's also what python does(ish).
<rlb>e.g. errors="surrogateescape", etc.
<jcowan>Except that works because Python strings are allowed to contain unpaired surrogates, unlike Scheme strings.
<rlb>I didn't mean their specific approach, we'd use noncharacters, of course.
<rlb>Just the general approach of making it available (at least) as one of the recovery strategies
<rlb>If we do go this route, I'm now leaning toward the idea that maybe we distinguish "kinds"/sources of data and don't have a blanket policy, e.g. perhaps apply noncharacters to say paths, env vars and argv by default, but not port content.
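To make the information-loss point above concrete, here is a small Python sketch (the `decode_modes` helper is hypothetical) that runs one dirty byte string through the three conventional handlers and through the `surrogateescape` handler rlb mentions. Only the last one keeps enough information to recover the original bytes, and it does so by placing an unpaired surrogate in the string, which is exactly what jcowan notes Scheme strings cannot hold.

```python
def decode_modes(data):
    """Decode bytes as UTF-8 under several error handlers.

    Returns a dict mapping handler name to the decoded string,
    or to None when the handler raised instead of decoding.
    """
    results = {}
    for mode in ("strict", "replace", "ignore", "surrogateescape"):
        try:
            results[mode] = data.decode("utf-8", errors=mode)
        except UnicodeDecodeError:
            results[mode] = None
    return results


dirty = b"caf\xe9.txt"  # a Latin-1 byte where UTF-8 was expected
modes = decode_modes(dirty)

# "strict" gives no string at all, "replace" and "ignore" lose the byte,
# and only the escaped form encodes back to the original input.
round_tripped = modes["surrogateescape"].encode("utf-8", errors="surrogateescape")
```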
<jcowan>The two basic ideas are to encode a byte XY as U+FDDX U+FDDY and to quote incoming noncharacters with U+FFFE.
<rlb>That might also be a practical approach wrt the likelihood of any "data smuggling" risks we've discussed --- still covering the important/common cases.
<rlb>Right, I know the strategy fairly well --- have part of it already implemented in a branch (the enc/dec part) here.
<jcowan>Corruption in UTF-8 files is not uncommon, though.
<rlb>Though I haven't implemented the newer \s bit.
<jcowan>I'm not sure how useful that is.
<rlb>Sure, though I'm thinking that for file content, we perhaps default to "error", and you need to ask for anything else.
<rlb>file/network/etc. --- i.e. port content.
<rlb>But for paths, etc. you get noncharacters by default, so that they "round-trip" without any special effort or need to know about all this mess.
<jcowan>When Plan 9 was converted to UTF-8 throughout (in one day!) they originally went with error mode, but found that it required too much ceremony.
<jcowan>"Originally the conversion routines, described
<jcowan>below, returned errors when given invalid UTF, but we found ourselves repeatedly checking for errors and
<jcowan>ignoring them. We therefore decided to convert a bad sequence to a valid rune and continue processing."
<jcowan>They used U+0080 instead of U+FFFD, on the grounds that distinguishing the uncodable from the undecodable made sense.
<rlb>I'm currently thinking we should consider switching ports to error by default in one of our forthcoming compatibility-breaking releases (perhaps with an easy way to opt-out --- i.e. change the default).
<jcowan>Does Guile use character buffers as well as byte buffers when reading from textual ports?
<rlb>I know our convention in the C code is to have a space before function call arguments; is that also the case when we're selecting a field in a returned struct? e.g.
<rlb> x = y + some_thing (z).some_field;
<rlb>(I'll assume so for now.)
<rlb>(Just looks a bit odd.)
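The noncharacter scheme jcowan summarizes at the start of this stretch (a bad byte XY becomes U+FDDX U+FDDY, and incoming noncharacters are quoted with U+FFFE) might look roughly like this. It is a Python sketch under stated assumptions, not the code in rlb's branch: the function names are hypothetical, only U+FDD0 through U+FDDF and U+FFFE itself are treated as needing a quote, and Python's `surrogateescape` handler is used purely as a convenient way to find the undecodable bytes.

```python
NIBBLE_BASE = 0xFDD0   # U+FDD0..U+FDDF carry one hex digit each
QUOTE = "\ufffe"       # prefix for noncharacters that were really in the input


def _is_nibble(cp):
    return NIBBLE_BASE <= cp <= NIBBLE_BASE + 0xF


def decode_dirty(data):
    """Decode UTF-8, turning each undecodable byte into two noncharacters."""
    out = []
    for ch in data.decode("utf-8", errors="surrogateescape"):
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            # surrogateescape marks bad byte 0xXY as U+DCXY; re-express it.
            byte = cp - 0xDC00
            out.append(chr(NIBBLE_BASE + (byte >> 4)))
            out.append(chr(NIBBLE_BASE + (byte & 0xF)))
        elif _is_nibble(cp) or ch == QUOTE:
            # A validly encoded noncharacter: quote it so it stays distinct.
            out.append(QUOTE + ch)
        else:
            out.append(ch)
    return "".join(out)


def encode_dirty(text):
    """Inverse of decode_dirty: restore the original bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else None
        if ch == QUOTE and nxt is not None:
            out += nxt.encode("utf-8")  # quoted character, emitted literally
            i += 2
        elif _is_nibble(ord(ch)) and nxt is not None and _is_nibble(ord(nxt)):
            out.append(((ord(ch) - NIBBLE_BASE) << 4) | (ord(nxt) - NIBBLE_BASE))
            i += 2
        else:
            out += ch.encode("utf-8")
            i += 1
    return bytes(out)
```

The resulting strings contain only code points that are legal in a string without unpaired surrogates, which is the property the surrogate-based approach cannot offer here.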
<old>always found that C style .. weird
<rlb>Not what I'm used to either.
<rlb>But I've been writing so much of it, I at least don't have to correct myself all the time anymore :)
<ekaitz>maybe it doesn't explain that specific case
<jcowan>It doesn't *explain* at all. It just dictates, which means it is no more than a prejudice, and a weird one at that.
<jcowan>Nobody but GNU writes function calls in any context as foo (x).
<jcowan>The arrogance of "We find it easier to read a program when it has spaces before the open-parentheses" is stunning. Who is "we"?
<identity>«you people use spaces in your programs?»
<identity>i tried one of my small programs, the ratio of other characters to spaces is close to 32… i would guess that is just me, though
<probie>jcowan: Different language, but the most commonly used style in Ada also adds an unneeded space between function/procedure calls and their arguments
<dsmith>Ada also uses () for arrays. Weird.
<Arsen>20:29:23 <jcowan> The arrogance of "We find it easier to read a program when it has spaces before the open-parentheses" is stunning. Who is "we"?
<Arsen>it's a royal "we". fairly usual - any document that sets any standard for any project uses the word "we"
<Arsen>> In certain structures which are visible to userspace, we cannot require C99 types and cannot use the u32 form above. Thus, we use __u32 and similar types in all structures which are shared with userspace.
<probie>If we ignore the actual implementation on metal, what is an array except a function from index to value?
<Arsen>probie: their set of indices is also contiguous ;)
<rlb>In clojure vectors *are* also functions, and so are maps, and sets...
<Arsen>20:25:51 <jcowan> It doesn't *explain* at all. It just dictates, which means it is no more than a prejudice, and a weird one at that.
<Arsen>(foo bar baz) => foo (bar, baz) <- there's the explanation
<rlb>commonly used for filtering, etc.
 (filter some-set some-collection)
<Arsen>in fact that also explains the positioning of curlies under flow control statements such as if (consider progn/begin)
<rlb>(filter #{42} [1 2 3 ...])
<Arsen>re the question on 'foo ().bar'; yes, the space is correct. example from gcc: get_global_range_query ()->range_of_expr (r, op0, stmt);
<Arsen>here's one from emacs: print_string (BVAR (XBUFFER (XWINDOW (obj)->contents), name),
<Arsen>... and from libguile ;) #define SCM_HASHTABLE_N_ITEMS(x) (SCM_HASHTABLE (x)->n_items)
<ekaitz>jcowan: Do you know *everybody*?
<ekaitz>i found it surprising at the beginning, and now I started to like it
<ekaitz>maybe only GNU does this, but they have been doing it for 40 years so I respect that
<ekaitz>it's not like I'm going to be a guest in your home and don't listen to your habits
<ekaitz>if you like to take your shoes off, i do
<ekaitz>i don't think having a preference is arrogance
<ekaitz>(the half indentation is weirder, and nobody seems to complain about it)
<dsmith>(Not really. I've got to the "whatever" point a long time ago)
<dsmith>I learned C from K&R, and to me that's what C code should look like.
<ekaitz>ACTION is so dumb he doesn't have very strong opinions about anything
<ekaitz>ACTION thinks he is getting dumber
<dsmith>It's really what you are used to. I really hated how "noisy" Rust looked after previously learning Go.
<dsmith>But you just get used to it, and then it's ok.
<ekaitz>dsmith: wait! but that doesn't let me complain and be judgemental about other people! Unacceptable!
<ekaitz>also, it requires some effort from my side! doubly unacceptable!
<ekaitz>it must be somebody else's problem! they are *so arrogant*!
<dsmith>The default emacs C-mode does gnu by default. As does the indent prog
<dsmith>A nice thing about Rust and Go is there is an enforced layout. And so no pointless trivial arguments.
<rlb>...I kinda want a "remember" variant like scm_remember_and_return_1 (remembered, return) for cases like scm_remember_and_return_1 (stringbuf, u8[i]) where u8 is the internal content of stringbuf.
<rlb>Avoids a bunch of otherwise unnecessary 4 line blocks.
<rlb>(no idea what it should be named, if it's even plausible)
<rlb>e.g. you can just "if (foo) scm_remember_and_return_1 (buf, u8[i]);"
<rlb>though of course that particular case doesn't work --- it'd only work (easily) for SCM return values.
<rlb>OK, think all the likely string_refs are gone from libguile in utf8 --- i.e. the simplicity of the remaining calls may be worth the minimal extra cost, but can always revisit later. Well, there's still the use in array-handle.c that I haven't investigated, but offhand, I'm guessing it's not something that could be easily changed.
<rlb>I'll push an update later.