IRC channel logs

2021-04-11.log

<raghavgururajan>civodul: sneek is drunk. It delivered that last message too late. xD
<raghavgururajan>That too in a wrong channel. hmm.
<rlb>wingo: so say you have sparsely indexed utf-8 strings, but (internally) for an operation you want to traverse the string to the end anyway, and you know how many *chars* it has, so you'd like to avoid the cost of "finding the end, byte-wise". But u8_mbtoucr() requires you to specify how many valid bytes are remaining. So you either have to pay the cost, or lie to it, which seems questionable?
<rlb>(The cost of finding the end shouldn't be all that high, given the index, so could just accept it, but it's not *completely* trivial.)
<rlb>wingo: hmm, maybe in situations where it's OK to assume the string is valid, u8_next() might be plausible, i.e. even when the string's not null terminated, if we know that there's at least one more character.
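[Editor's sketch] The idea in the messages above, walking a string by character count when the bytes are already known to be valid UTF-8, can be illustrated roughly as follows. This is not libunistring's u8_next() or any Guile internal; utf8_seq_len and utf8_bytes_for_chars are hypothetical names, and the code assumes valid input with at least char_count characters remaining.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte length of the UTF-8 sequence that starts with lead byte b.
   ASSUMPTION: b is a valid lead byte; nothing is verified here. */
static size_t utf8_seq_len(uint8_t b)
{
    if (b < 0x80) return 1;            /* 0xxxxxxx: ASCII */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    return 4;                          /* 11110xxx */
}

/* Hypothetical helper: number of bytes spanned by the next char_count
   characters of s. Only lead bytes are inspected, so the byte length
   is never needed up front, but the input must already be trusted. */
static size_t utf8_bytes_for_chars(const uint8_t *s, size_t char_count)
{
    assert(s != NULL || char_count == 0);
    const uint8_t *p = s;
    while (char_count-- > 0)
        p += utf8_seq_len(*p);
    return (size_t)(p - s);
}
```

The trade-off is the one raised in the discussion: skipping the byte count is only safe when the content is known to be valid, because a malformed lead byte would send the pointer past the end of the buffer.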
<manumanumanu>rlb: there is a SRFI you might be interested in that has an implementation of immutable strings that has O(n)-ish access, but using bytevectors as a backend for efficient storage.
<manumanumanu>is it immutable texts?
<manumanumanu>I can't remember
<manumanumanu>it is based around the idea of string cursors and byte chunks of something like 128 bytes
***apteryx_ is now known as apteryx
<stis>Tja guilers!
***Noisytoot is now known as Noisytoot__
***Noisytoot__ is now known as Noisytoot
<rlb>manumanumanu: right, thanks, I've seen it if it's the "texts" srfi, and iirc I'm toying with something not unrelated.
<rlb>Should a hypothetical scm_c_put_utf8_chars(const uint8_t *u8, ...) require the char_count, byte count, or both?
<yoctocell>Hi, I am trying to convert a bytevector to a string using 'utf-8->string', but I am getting an error: Throw to key `decoding-error' with args `("scm_from_utf8_stringn" "input locale conversion error" 0 #vu8(1 88 97 227 44 138 231 228 132 202 66 51 172 245 9 153 162 251 92 121))'.
<rlb>yoctocell: offhand, I'd guess the argument might not be valid utf-8.
<yoctocell>Never mind, I had to use 'bytevector->base16-string' from (guix base16), I was playing around with (guix openpgp).
<yoctocell>rlb: I don't really know much about utf-8, but thanks anyways :)
<rlb>certainly
<wingo>heyo
<rlb>wingo: should a hypothetical scm_c_put_utf8_chars(const uint8_t *utf8, ...) require the char_count, byte count, or both (if you have any firm opinion, offhand). And should it trust the content, or insist on trying to verify it?
<wingo>no real firm opinion. if we go by e.g. scm_from_utf8_string vs scm_from_utf8_stringn, there should be a "n" variant that takes a byte count
<wingo>i guess verification is cheap, right? if someone wanted to skip verification they could use a byte interface
<wingo>though i guess you don't get transcoding in that case
<rlb>depends? I mean it's a full extra pass over the string, one code-point at a time.
<wingo>right but what's the ns/byte for verification
<rlb>But of course I think you *can* make it very fast (even going as far as sse/avx/whatever if you like). But still more than doing nothing.
<wingo>right but compared to whatever the port is going to do, probably it's lost in the noise. dunno
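[Editor's sketch] For a sense of what the "full extra pass" amounts to, here is a minimal scalar validator that checks a buffer and counts its characters in one traversal. It is an illustration only, not Guile's or libunistring's code; utf8_validate_count is a hypothetical name, and the "n"-style byte-count parameter follows the convention wingo mentions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: verify that s[0..byte_n) is well-formed UTF-8.
   On success, store the number of characters in *char_n and return true.
   On failure, return false and leave *char_n untouched. */
static bool utf8_validate_count(const uint8_t *s, size_t byte_n, size_t *char_n)
{
    assert(char_n != NULL);
    size_t i = 0, chars = 0;
    while (i < byte_n) {
        uint8_t b = s[i];
        size_t len;
        uint32_t cp, min;  /* decoded code point, smallest legal value */
        if (b < 0x80)                { len = 1; cp = b;        min = 0; }
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; min = 0x10000; }
        else return false;                   /* stray continuation or bad lead */
        if (len > byte_n - i) return false;  /* sequence truncated by buffer end */
        for (size_t k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return false;  /* not 10xxxxxx */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min) return false;                       /* overlong encoding */
        if (cp > 0x10FFFF) return false;                  /* beyond Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   /* surrogate */
        i += len;
        chars++;
    }
    *char_n = chars;
    return true;
}
```

Fed the 20 bytes from yoctocell's error message earlier in the log, this sketch rejects the input at the fourth byte: 227 (0xE3) opens a three-byte sequence, but the following 44 (0x2C) is not a continuation byte.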
<rlb>Part of the question here is related to internal vs external clients -- i.e. if the bytes are coming from a *string*, we know they're valid. But in other cases, I'm wondering how careful our api intends to be.
*wingo nod
<rlb>(Also in the case of internal uses, getting the byte_n is a bit more expensive, since we have to indirect through the index.)
<rlb>So possibly worth avoiding when we can.
<wingo>ah yeah indeed
<rlb>i.e. when we know we're going to have to traverse the string at least once anyway.
<rlb>(as part of the work)
<rlb>But in other cases, we can memcpy, if we know the byte_n, etc.
<rlb>Anyway, not critical, just wondered.
<rlb>(And can always have scm_c_put... and scm_i_c_put... or whatever if we decide it's warranted.)
<wingo>whee https://wingolog.org/archives/2021/04/11/guiles-reader-in-guile
<ft>\o/
<civodul>wingo: yay!
<lampilelo>oh, cool
***nckx is now known as jorts
<ArneBab>wingo: nice! Thank you for accompanying your great hacking with great writing!
<zzappie>great article! Does it mean that we are on the way to edebug style debugger?
<zzappie>wien it possible to retrieve source locations
<zzappie>*when
*zzappie zzz