<raghavgururajan>civodul: sneek is drunk. It delivered that last message too late. xD
<rlb>wingo: so say you have sparsely indexed utf-8 strings, but (internally) for an operation you want to traverse the string to the end anyway, and you know how many *chars* it has, so you'd like to avoid the cost of "finding the end, byte-wise". But u8_mbtoucr() requires you to specify how many valid bytes are remaining. So you either have to pay the cost, or lie to it, which seems questionable?
<rlb>(The cost of finding the end shouldn't be all that high, given the index, so could just accept it, but it's not *completely* trivial.)
<rlb>wingo: hmm, maybe in situations where it's OK to assume the string is valid, u8_next() might be plausible, i.e. even when the string's not null terminated, if we know that there's at least one more character.
<manumanumanu>rlb: there is a SRFI you might be interested in that has an implementation of immutable strings that has O(n)-ish access, but using bytevectors as a backend for efficient storage.
<manumanumanu>it is based around the idea of string cursors and byte chunks of something like 128 bytes
***apteryx_ is now known as apteryx
***Noisytoot is now known as Noisytoot__
***Noisytoot__ is now known as Noisytoot
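A rough sketch of the traversal rlb describes above: walking a buffer that is already known to be valid UTF-8 by character count alone, without knowing the byte length up front. The names `utf8_seq_len` and `utf8_trusted_byte_length` are hypothetical, not Guile or libunistring API, and the code does no validation at all, so it is only safe on trusted input (e.g. bytes that came out of an existing string).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Length in bytes of the UTF-8 sequence introduced by lead byte B.
   Only meaningful for valid UTF-8; anything else is reported as 1
   so that the caller always makes progress. */
static size_t
utf8_seq_len (uint8_t b)
{
  if (b < 0x80) return 1;            /* ASCII */
  if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
  if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
  if ((b & 0xF8) == 0xF0) return 4;  /* 11110xxx */
  return 1;
}

/* Hypothetical helper: number of bytes occupied by the first
   CHAR_COUNT characters of U8.  Assumes U8 is valid UTF-8 holding
   at least CHAR_COUNT characters; it never looks for a terminator
   and never checks continuation bytes. */
static size_t
utf8_trusted_byte_length (const uint8_t *u8, size_t char_count)
{
  assert (u8 != NULL || char_count == 0);
  const uint8_t *p = u8;
  while (char_count-- > 0)
    p += utf8_seq_len (*p);
  return (size_t) (p - u8);
}
```

An operation that has to visit every character anyway could do its real work inside that loop, so the byte length falls out of the traversal instead of costing a separate lookup.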
<rlb>manumanumanu: right, thanks, I've seen it if it's the "texts" srfi, and iirc I'm toying with something not unrelated.
<rlb>Should a hypothetical scm_c_put_utf8_chars(const uint8_t *u8, ...) require the char_count, byte count, or both?
<yoctocell>Hi, I am trying to convert a bytevector to a string using 'utf-8->string', but I am getting an error: Throw to key `decoding-error' with args `("scm_from_utf8_stringn" "input locale conversion error" 0 #vu8(1 88 97 227 44 138 231 228 132 202 66 51 172 245 9 153 162 251 92 121))'.
<rlb>yoctocell: offhand, I'd guess the argument might not be valid utf-8.
<yoctocell>Never mind, I had to use 'bytevector->base16-string' from (guix base16), I was playing around with (guix openpgp).
<yoctocell>rlb: I don't really know much about utf-8, but thanks anyways :)
<rlb>wingo: should a hypothetical scm_c_put_utf8_chars(const uint8_t *utf8, ...) require the char_count, byte count, or both (if you have any firm opinion, offhand)? And should it trust the content, or insist on trying to verify it?
<wingo>no real firm opinion. if we go by e.g. scm_from_utf8_string vs scm_from_utf8_stringn, there should be a "n" variant that takes a byte count
<wingo>i guess verification is cheap, right? if someone wanted to skip verification they could use a byte interface
<wingo>though i guess you don't get transcoding in that case
<rlb>depends? I mean it's a full extra pass over the string, one code-point at a time.
<wingo>right but what's the ns/byte for verification
<rlb>But of course I think you *can* make it very fast (even going as far as sse/avx/whatever if you like). But still more than doing nothing.
<wingo>right but compared to whatever the port is going to do, probably it's lost in the noise. dunno
<rlb>Part of the question here is related to internal vs external clients -- i.e. if the bytes are coming from a *string*, we know they're valid. But in other cases, I'm wondering how careful our api intends to be.
<rlb>(Also in the case of internal uses, getting the byte_n is a bit more expensive, since we have to indirect through the index.)
<rlb>So possibly worth avoiding when we can.
<rlb>i.e. when we know we're going to have to traverse the string at least once anyway.
<rlb>(as part of the work)
<rlb>But in other cases, we can memcpy, if we know the byte_n, etc.
<rlb>Anyway, not critical, just wondered.
<rlb>(And can always have scm_c_put... and scm_i_c_put... or whatever if we decide it's warranted.)
***nckx is now known as jorts
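To make the "full extra pass" under discussion concrete, here is a minimal scalar sketch of a verification pass. The function `utf8_first_invalid` is a hypothetical name, not something from Guile or libunistring, and this is a plain byte-at-a-time checker rather than the SIMD approach rlb alludes to. It rejects stray continuation bytes, truncated sequences, overlong encodings, surrogates, and code points above U+10FFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the offset of the first byte that
   starts a malformed UTF-8 sequence in S[0..N), or N if the whole
   buffer is well formed. */
static size_t
utf8_first_invalid (const uint8_t *s, size_t n)
{
  assert (s != NULL || n == 0);
  size_t i = 0;
  while (i < n)
    {
      uint8_t b = s[i];
      size_t len;
      uint32_t cp, min;

      if (b < 0x80) { i++; continue; }  /* ASCII fast path */
      else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; min = 0x80; }
      else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; min = 0x800; }
      else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; min = 0x10000; }
      else return i;                    /* stray continuation or bad lead */

      if (n - i < len) return i;        /* sequence runs off the end */
      for (size_t k = 1; k < len; k++)
        {
          if ((s[i + k] & 0xC0) != 0x80) return i;
          cp = (cp << 6) | (s[i + k] & 0x3F);
        }
      /* Reject overlong forms, surrogates, and out-of-range values. */
      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return i;
      i += len;
    }
  return n;
}
```

Run over the bytevector from yoctocell's error, this check stops at offset 3: byte 227 (0xE3) announces a three-byte sequence, but the next byte, 44 (0x2C), is not a continuation byte. That agrees with rlb's guess that the input was not valid utf-8. A trusted internal path of the kind rlb mentions (the scm_i_c_put... idea) would simply skip this call.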
<ArneBab>wingo: nice! Thank you for accompanying your great hacking with great writing!
<zzappie>great article! Does it mean that we are on the way to edebug style debugger?
<zzappie>when it's possible to retrieve source locations