IRC channel logs

<raghavgururajan>civodul: sneek is drunk. It delivered that last message too late. xD

<raghavgururajan>That too in a wrong channel. hmm.

<rlb>wingo: so say you have sparsely indexed utf-8 strings, but (internally) for an operation you want to traverse the string to the end anyway, and you know how many *chars* it has, so you'd like to avoid the cost of "finding the end, byte-wise". But u8_mbtoucr() requires you to specify how many valid bytes are remaining. So you either have to pay the cost, or lie to it, which seems questionable?

<rlb>(The cost of finding the end shouldn't be all that high, given the index, so could just accept it, but it's not *completely* trivial.)

<rlb>wingo: hmm, maybe in situations where it's OK to assume the string is valid, u8_next() might be plausible, i.e. even when the string's not null terminated, if we know that there's at least one more character.
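[Editor's sketch] The idea of walking a trusted string by character count, without knowing the byte length up front, might look roughly like this. It deliberately avoids libunistring; `utf8_seq_len` and `utf8_bytes_for_chars` are hypothetical names, and the code assumes the buffer really holds at least `n_chars` valid UTF-8 characters.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte length of the UTF-8 sequence that starts with lead byte b.
   Assumption: b is a valid lead byte; nothing is verified here. */
size_t utf8_seq_len(uint8_t b)
{
    if (b < 0x80) return 1;            /* 0xxxxxxx: ASCII */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    return 4;                          /* 11110xxx */
}

/* Number of bytes occupied by the first n_chars characters at p.
   Hypothetical helper: trusts the content, so it never needs the
   total byte count, only the character count. */
size_t utf8_bytes_for_chars(const uint8_t *p, size_t n_chars)
{
    const uint8_t *start = p;
    while (n_chars-- > 0)
        p += utf8_seq_len(*p);
    return (size_t)(p - start);
}
```

The byte length falls out as a side effect of the traversal, which is the cost rlb would like to avoid paying separately. On untrusted input this would read out of bounds, so it only suits the "we know it's valid" case.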

<manumanumanu>rlb: there is a SRFI you might be interested in that has an implementation of immutable strings that has O(n)-ish access, but using bytevectors as a backend for efficient storage.

<manumanumanu>is it immutable texts?

<manumanumanu>I can't remember

<manumanumanu>it is based around the idea of string cursors and byte chunks of something like 128 bytes

***apteryx_ is now known as apteryx

<stis>Tja guilers!

***Noisytoot is now known as Noisytoot__

***Noisytoot__ is now known as Noisytoot

<rlb>manumanumanu: right, thanks, I've seen it if it's the "texts" srfi, and iirc I'm toying with something not unrelated.

<rlb>Should a hypothetical scm_c_put_utf8_chars(const uint8_t *u8, ...) require the char_count, byte count, or both?

<yoctocell>Hi, I am trying to convert a bytevector to a string using 'utf-8->string', but I am getting an error: Throw to key `decoding-error' with args `("scm_from_utf8_stringn" "input locale conversion error" 0 #vu8(1 88 97 227 44 138 231 228 132 202 66 51 172 245 9 153 162 251 92 121))'.

<rlb>yoctocell: offhand, I'd guess the argument might not be valid utf-8.

<yoctocell>Never mind, I had to use 'bytevector->base16-string' from (guix base16), I was playing around with (guix openpgp).
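[Editor's sketch] For arbitrary binary data like the bytevector above, a hex rendering sidesteps text decoding entirely. The following is a rough C equivalent of what a base16 encoder does, not the (guix base16) implementation; `base16_encode` is a hypothetical name and lowercase digits are an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Write 2*n hex digits followed by a NUL into out.
   The caller must provide at least 2*n + 1 bytes of space. */
void base16_encode(const uint8_t *bytes, size_t n, char *out)
{
    static const char digits[] = "0123456789abcdef";
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = digits[bytes[i] >> 4];    /* high nibble */
        out[2 * i + 1] = digits[bytes[i] & 0x0F];  /* low nibble */
    }
    out[2 * n] = '\0';
}
```

Every byte value maps to two printable characters, so no input can fail to encode, unlike UTF-8 decoding.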

<yoctocell>rlb: I don't really know much about utf-8, but thanks anyways :)

<rlb>certainly

<wingo>heyo

<rlb>wingo: should a hypothetical scm_c_put_utf8_chars(const uint8_t *utf8, ...) require the char_count, byte count, or both (if you have any firm opinion, offhand)? And should it trust the content, or insist on trying to verify it?

<wingo>no real firm opinion. if we go by e.g. scm_from_utf8_string vs scm_from_utf8_stringn, there should be a "n" variant that takes a byte count
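[Editor's sketch] The naming convention wingo points to (a plain function for NUL-terminated input, an "n" variant taking a byte count) could be mirrored as below. This is not Guile code: `struct sink`, `put_utf8_chars`, and `put_utf8_charsn` are hypothetical stand-ins, with a fixed buffer playing the role of a port.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical output sink standing in for a port. */
struct sink {
    uint8_t buf[256];
    size_t len;
};

/* "n" variant: the caller supplies the byte count, so the bytes can be
   copied in one memcpy. Returns the bytes written, or 0 if they do not
   fit. The content is trusted, not verified. */
size_t put_utf8_charsn(struct sink *out, const uint8_t *utf8, size_t byte_n)
{
    if (byte_n > sizeof out->buf - out->len)
        return 0;
    memcpy(out->buf + out->len, utf8, byte_n);
    out->len += byte_n;
    return byte_n;
}

/* Plain variant: NUL-terminated input; the byte count comes from strlen. */
size_t put_utf8_chars(struct sink *out, const char *utf8)
{
    return put_utf8_charsn(out, (const uint8_t *)utf8, strlen(utf8));
}
```

Whether a character count should also be accepted is exactly the open question in the conversation; this sketch takes bytes only.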

<wingo>i guess verification is cheap, right? if someone wanted to skip verification they could use a byte interface

<wingo>though i guess you don't get transcoding in that case

<rlb>depends? I mean it's a full extra pass over the string, one code-point at a time.

<wingo>right but what's the ns/byte for verification

<rlb>But of course I think you *can* make it very fast (even going as far as sse/avx/whatever if you like). But still more than doing nothing.

<wingo>right but compared to whatever the port is going to do, probably it's lost in the noise. dunno
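[Editor's sketch] The "full extra pass over the string, one code-point at a time" could be as simple as the scalar check below. It is a plain illustration, not what Guile or libunistring does, and `utf8_is_valid` is a hypothetical name; the SIMD variants rlb mentions would be considerably more involved.

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if the n bytes at s are well-formed UTF-8, else 0. */
int utf8_is_valid(const uint8_t *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        uint8_t b = s[i];
        size_t len;
        uint32_t cp, min;

        if (b < 0x80) { i++; continue; }                 /* ASCII */
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; min = 0x10000; }
        else return 0;               /* stray continuation or bad lead byte */

        if (n - i < len) return 0;   /* sequence truncated at end of input */
        for (size_t k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0;  /* not 10xxxxxx */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min) return 0;                       /* overlong encoding */
        if (cp > 0x10FFFF) return 0;                  /* beyond Unicode */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* surrogate */
        i += len;
    }
    return 1;
}
```

Applied to the bytevector from yoctocell's error earlier in the log, this rejects the input at the fourth byte: 227 (0xE3) announces a three-byte sequence, but the next byte, 44, is not a continuation byte.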

<rlb>Part of the question here is related to internal vs external clients -- i.e. if the bytes are coming from a *string*, we know they're valid. But in other cases, I'm wondering how careful our api intends to be.

*wingo nod

<rlb>(Also in the case of internal uses, getting the byte_n is a bit more expensive, since we have to indirect through the index.)

<rlb>So possibly worth avoiding when we can.
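[Editor's sketch] To make "indirect through the index" concrete, here is one way a sparse character index over UTF-8 bytes might work. The layout is purely an assumption for illustration (the log does not describe rlb's actual design): `index[k]` holds the byte offset of character `k * STRIDE`, and `STRIDE` and `char_to_byte_offset` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define STRIDE 4  /* hypothetical: one index entry per 4 characters */

/* Byte offset of character char_i in the valid UTF-8 buffer s.
   Assumes char_i is less than the string's character count. */
size_t char_to_byte_offset(const uint8_t *s, const size_t *index,
                           size_t char_i)
{
    size_t off = index[char_i / STRIDE];  /* jump to the nearest entry */
    size_t rem = char_i % STRIDE;         /* characters left to step over */

    while (rem-- > 0) {
        off++;                            /* past the lead byte */
        while ((s[off] & 0xC0) == 0x80)   /* past continuation bytes */
            off++;
    }
    return off;
}
```

A lookup costs one index read plus at most `STRIDE - 1` character steps, which is cheap but, as rlb says, not free.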

<wingo>ah yeah indeed

<rlb>i.e. when we know we're going to have to traverse the string at least once anyway.

<rlb>(as part of the work)

<rlb>But in other cases, we can memcpy, if we know the byte_n, etc.

<rlb>Anyway, not critical, just wondered.

<rlb>(And can always have scm_c_put... and scm_i_c_put... or whatever if we decide it's warranted.)

<wingo>whee https://wingolog.org/archives/2021/04/11/guiles-reader-in-guile

<ft>\o/

<civodul>wingo: yay!

<lampilelo>oh, cool

***nckx is now known as jorts

<ArneBab>wingo: nice! Thank you for accompanying your great hacking with great writing!

<zzappie>great article! Does it mean that we are on the way to edebug style debugger?

<zzappie>wien it possible to retrieve source locations

<zzappie>*when

*zzappie zzz

2021-04-11.log