IRC channel logs
2026-04-07.log
<rlb>The case our current *_stringn API doesn't cover is when you don't want null termination, and you do want both the length *and* an indication of whether or not the string has internal nulls.
<rlb>You have to either use *_string and then compute the length afterward via strlen, or use *_stringn and check for nulls elsewhere.
<rlb>perhaps not a big deal, though
<old>rlb: how many null byte encodings are possible?
<old>well, null codepoint? Maybe I did not understand here what you meant wrt uint32_t and the null byte check
<rlb>old: who knows --- how many encodings are there? :)
<rlb>i.e. I assume the null might not even be 0 in some encodings? So I guess it depends on what we really meant.
<rlb>Unicode null might not be made up of some number of 0 bytes.
<rlb>But it is for latin-1, utf-8, and I think utf-32, at least.
<rlb>But it's 1 or 4 for those three.
<rlb>i.e. did we mean that if you call with lenp == NULL, we'll put an encoded Unicode null after the string, or did we mean that we'll put zero byte(s) after the string?
<rlb>And I think utf-16 encodes it as two zero bytes, but need to double check.
<rlb>And whatever we meant, I suspect we didn't mean to put just 1 zero byte at the end of a utf-32 encoded string...
<jcowan>There are no encodings in which null does not consist of some number of zero bytes.
<rlb>Nice --- including older ones like shift-jis and the older Korean and Cyrillic encodings, etc?
<rlb>OK, great, and thanks.
<jcowan>All encodings are based on either ASCII or EBCDIC, in both of which the zero byte is NUL
<jcowan>Historically, NUL was a row of paper tape with no punches, just as DEL was a row with all 8 columns punched.
<rlb>So then I think that patch's approach of just letting the encoder handle it is fine.
<jcowan>Encodings with less than 8 bits per character don't necessarily have NUL, but we don't need to support them.
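To make rlb's first point concrete: a caller that wants both the byte length and a flag for embedded nulls cannot recover both from strlen, which stops at the first NUL. The sketch below assumes a single-byte null (latin-1 or UTF-8); `struct encoded_info` and `describe_encoded` are hypothetical names, not part of Guile's C API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result: the byte length plus whether a NUL appears
   anywhere inside those bytes. Not part of Guile's API. */
struct encoded_info {
    size_t len;
    bool has_interior_nul;
};

/* Report both facts for an encoded buffer whose true length is known.
   memchr examines exactly len bytes, so unlike strlen it neither stops
   at nor is fooled by an embedded NUL. Assumes a single-byte NUL. */
static struct encoded_info describe_encoded(const char *buf, size_t len)
{
    struct encoded_info info;
    info.len = len;
    info.has_interior_nul = memchr(buf, '\0', len) != NULL;
    return info;
}
```

For example, the five bytes `ab\0cd` report `len == 5` with the flag set, while `strlen` on the same buffer returns 2. For UTF-16 or UTF-32 output, the scan would instead have to look for an all-zero 2- or 4-byte code unit at aligned offsets.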
<rlb>Our current code supports whatever libunistring supports (and maybe we use iconv in some places --- I don't always remember which lib which call came from offhand :) ).
<rlb>Right now I'm mostly just talking about what scm_to_stringn does, but there are some other places in the port code, etc. too.
<old>rlb: tell me when the branch is stabilized with your latest change. I will update locally and start reviewing
<rlb>And we may just want to live with it, since changing it (I suppose) could break someone who depend(s|ed) on it, but scm_strndup() does allocate/copy all n bytes, no matter how long the string is. i.e. it's really "scm_memdup" right now. (In the utf8 branch we no longer use it internally.)
<rlb>old: wrt #160, is it that too often addresses aren't "posfixable"?
<old>rlb: I think so. It's odd because
<old>we have a number and we return its bits and check the low 2 bits to determine if it's a fixnum
<old>since these are used for assertions, and I don't get any error, I assume they are actual fixnums
<old>in other words, some fixnums require allocating a non-fixnum number to determine whether the former is a fixnum
<old>the proper solution is to implement a `fixnum?' predicate in C that does the unpacking on the bits and returns a SCM boolean
<old>IIRC, it reduces GC overhead by 50% in a test I did with guile-hashing
<old>sha1 and md5 sums IIRC on a bunch of files
<old>also, it feels odd to rely on internal C knowledge to manipulate bits in Scheme
<rlb>We have to do that in various places (e.g. compiler string optimizations...I learned).
<old>indeed, and it's okay if using the table with cell types in it
<old>but rnrs arithmetic just does its own thing there
<old>anyway. I will make a fix for this, properly showing the effect on runtime in terms of GC allocations
<old>as for your encode-string-null, do you mind opening a PR so I can review it there?
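The difference rlb describes can be sketched in plain C. Both functions are hypothetical illustrations, not Guile code: `memdup_nul` copies all n bytes (the behavior attributed to `scm_strndup` above), while `bounded_strdup` follows POSIX `strndup` semantics and stops at the first NUL, so its allocation is sized to the string rather than to n.

```c
#include <stdlib.h>
#include <string.h>

/* "memdup" behavior: copy exactly n bytes, NULs included, then add a
   terminator. The caller must supply n readable bytes. */
static char *memdup_nul(const char *s, size_t n)
{
    char *copy = malloc(n + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, n);
    copy[n] = '\0';
    return copy;
}

/* strndup-like behavior: copy at most n bytes, stopping at the first
   NUL, and report how many bytes were copied. */
static char *bounded_strdup(const char *s, size_t n, size_t *copied)
{
    size_t len = 0;
    while (len < n && s[len] != '\0')
        len++;
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, len);
    copy[len] = '\0';
    if (copied != NULL)
        *copied = len;
    return copy;
}
```

Given the 8 bytes `hi\0there` and n = 8, the first function returns all 8 bytes plus a terminator; the second returns just `hi`.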
<old>you can also just send the patch to guile-devel and I will reply to it
<old>whatever fits your workflow best
<rlb>As mentioned, I can't easily create PRs right now because my guile repo predates our repo, and so it's not a "fork", and codeberg told me that couldn't be changed. So I think I may need to delete and recreate mine after figuring out if there's anything I want to preserve...
<rlb>I *could* create a (temp) branch in the main repo, though, I suppose, if you'd prefer a PR. Otherwise, the list is fine.
<rlb>i.e. whichever you like
<old>well I could also just push to my account
<old>and make the PR on your behalf
<old>does that sound good to you?
<rlb>Sure, that's fine --- and it's just that one commit at the end.
<rlb>Also, is this some object-address optimization in the compiler? Because looking at the C, it goes straight to scm_integer_from_uint64, which I didn't think would allocate in SCM_FIXABLE, offhand.
<rlb>(well, for 64-bit it does)
<rlb>ACTION is just curious
<old>hmm no, I don't think the compiler is aware of anything about object-address
<rlb>Maybe I misunderstand, but what I thought I was seeing was object-address (in gc.c) going to scm_from_uintptr_t, which for 64-bit arch redirects to scm_integer_from_uint64, which should just MAKINUM for POSFIXABLE...
<old>indeed it should... yet I had cases where allocations happened due to object-address
<old>I have yet to determine which number triggered that case
<rlb>yeah, so maybe it's "something related"
<rlb>anyway, does sound worth pursuing :)
<old>in any case, I think that having `fixnum?' in C instead of an ad-hoc definition in rnrs arithmetic is probably better
<old>oh wow, and it's exported
<rlb>OK, though if you're not sure, I'd contemplate the possibility of Chesterton's fence --- i.e. wonder if there's a chance that's there because it exposes optimization to the scheme optimizer that wouldn't be possible otherwise.
<rlb>i.e. the optimizer can see duplicated operations then and remove them or...
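A simplified model of what old proposes: with a tag-bit representation, "is it a fixnum?" is a mask-and-compare on the raw word, so it never has to materialize a number, whereas going through an address-returning function can allocate whenever the returned integer is not itself representable as a fixnum. The layout here (fixnum `i` stored as `(i << 2) | 2`) and every name in it are assumptions for illustration, loosely modeled on Guile's two-bit immediate tag; the real definitions live in libguile and may differ in detail.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified tagged word: fixnum i is stored as (i << 2) | 2, so its
   low two bits are 0b10. Hypothetical model, not libguile's macros. */
typedef uintptr_t word;

#define FIXNUM_TAG ((word) 2)
#define TAG_MASK ((word) 3)

/* Two bits go to the tag and one to the sign. */
#define MOST_POSITIVE_FIXNUM ((intptr_t) (INTPTR_MAX >> 2))

/* The whole predicate: mask the low bits and compare. No allocation. */
static bool is_fixnum(word w)
{
    return (w & TAG_MASK) == FIXNUM_TAG;
}

static word make_fixnum(intptr_t i)
{
    /* Unsigned shift avoids undefined behavior for negative i. */
    return ((word) i << 2) | FIXNUM_TAG;
}

static intptr_t fixnum_value(word w)
{
    /* Right shift of a negative signed value is implementation-defined
       in C; mainstream compilers make it arithmetic. */
    return (intptr_t) w >> 2;
}

/* Analogue of a POSFIXABLE test: can this unsigned value be returned
   as a fixnum without allocating a bignum? */
static bool posfixable(uintmax_t u)
{
    return u <= (uintmax_t) MOST_POSITIVE_FIXNUM;
}
```

A word such as `0x1000` (an aligned heap pointer in this model) has low bits `00` and is rejected by `is_fixnum` without any arithmetic on its value.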
<old>yes, the assertion also provides hints to the compiler for the instructions
<old>it's really bad that fixnum? was defined in Scheme that way and exported
<rlb>I'd guess that might lead to duplication of implementations for some bit-fiddling in scheme vs C in places.
<old>in theory, with cross-module inlining, we can't change the representation of fixnums internally without breaking ABI
<old>ofc, I don't think we will ever do that, but still...
<rlb>Oh, you're saying that maybe fixnum? should be defined in scheme in the first place, and we'd just have an independent scm_fixnum or whatever if we need it?
<rlb>ACTION is not up to speed, and may not need to be atm :)
<jcowan>One point to make (not guile-specific) is that fixnum? in the (... fixnum) library is not necessarily the same as the implementation's notion of fixnums (i.e. integers that don't require allocation).
<rlb>i.e. I could imagine needing to know how to do "is it a fixnum" in both C and scheme...
<rlb>Oh, right, I guess I'm really not up to speed.
<old>jcowan: ahh, good to know
<old>because this was leaking way too much abstraction of guile internals otherwise
<rlb>I was solely thinking about guile's fixnum.
<jcowan>It certainly should be, because the whole point of the fixnum library is efficiency.
<rlb>I suspect it's fine to leak in some places if we need to (e.g. the existing compiler string optimizations), but agree we want to be careful/conservative there, and make sure those places are "well known/documented" --- I generally like to have (at least) comments on each side, reminding about keeping the other in sync.
<rlb>e.g. "// Changes here need to be accommodated in some-func"
<old>the main problem with this leaking here is: a) it's not efficient wrt object-address
<jcowan>My opinion FWIW is that you should define fixnum? in C only
<dsmith-work>rlb, could you do something like git-format-patch your repo, fork from codeberg, and then git-apply (or whatever) them?
<old>b) the call can be inlined with some compiler optimizations, and so the representation of numbers becomes ABI
<old>this can all be avoided by defining fixnum? in C instead
<old>keeping the internal representation bits opaque (except in the compiler code)
<rlb>dsmith-work: the main thing is whether there are any issues or branches, or whatever else I'd want to preserve. Just need to sit down and review; may turn out to be fine to nuke it.
<rlb>btw, I did forget about one case (that our test suite *does* cover) where you want a "hidden string" after a null --- abstract unix domain sockets (unix(7)) where you have to say "\0name".
<rlb>see the manpage under "abstract"
<old>rlb: I've checked the scm_to_stringn patch
<jcowan>because unix(7) says they have to be null terminated
<old>I can point out things here if you don't mind
<rlb>whatever you prefer :)
<old>okay, well, it's minor things really:
<old>1. C++ comments -> C comments /* */
<old>2. Missing spaces in call to scm_num_overflow(__func__)
<old>3. I would probably make encode_null do the allocation of the null byte instead and maybe memoize the result?
<old>so that we don't make a new allocation every time for encoding a new null byte
<old>so it would become: get_null_byte(encoding, char **null, char *null_len), or something like that
<old>but for correctness, 1. and 2. are enough
<rlb>Hmm, I didn't realize we banned "//" comments. That may be a *lot* of work to remove in the utf8 series. We do have some already in main (probably why I didn't realize/remember they were banned, if they are).
<old>oh, I don't think it's banned per se
<jcowan>'man abstract' tells me 'no such thing'
<rlb>OK, well, I'm happy to do whatever here.
<old>just following the convention here. it's just nitpicking at that point
<old>don't waste time refactoring the utf-8 branch for this
<rlb>jcowan: I meant "an abstract socket address is distinguished (from a pathname socket) by the fact that sun_path[0] is a null byte ('\0')." at least here.
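The "hidden string after a null" case rlb mentions looks roughly like this in C on Linux: the address begins with a zero byte, and the name that follows is delimited by the address length passed to bind(2) or connect(2), so anything that measures it with strlen sees an empty name. `make_abstract_addr` and the name `guile-test` are hypothetical.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Fill addr with a Linux abstract-namespace address: sun_path[0] is
   '\0' and the name follows without a terminator. Returns the address
   length to pass to bind(2)/connect(2), or 0 if the name won't fit. */
static socklen_t make_abstract_addr(struct sockaddr_un *addr, const char *name)
{
    size_t name_len = strlen(name);
    if (name_len + 1 > sizeof addr->sun_path)
        return 0;
    memset(addr, 0, sizeof *addr);
    addr->sun_family = AF_UNIX;
    addr->sun_path[0] = '\0';                 /* marks the abstract namespace */
    memcpy(addr->sun_path + 1, name, name_len);
    /* The length, not a NUL, says where the name ends. */
    return (socklen_t) (offsetof(struct sockaddr_un, sun_path) + 1 + name_len);
}
```

A Scheme-level string for such an address therefore has to carry its length explicitly, which is exactly the case a plain NUL-terminated conversion cannot represent.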
<old>this can all be automated by a tool at some point
<rlb>old: we can't allocate unless we want to have to free unnecessarily in the second case.
<rlb>i.e. in the second case, we have to realloc the whole string thing and copy it in?
<old>ah right, we are just copying the result
<old>and the first case is allocating the empty string, hm
<jcowan>msvc is still on C89 by default, but does guile even support that? (and /std:c11 will get you full C11 compliance, it says here)
<old>well okay then, I think it's fine. I guess there are fast paths anyway in libunistring for handling null bytes
<rlb>I'll fix the other bits soon (I very often forget the "foo (...)" space) --- and thanks for the review.
<rlb>jcowan: I *think* we assume c99 in places, but not sure, and Andy has intimated we may move to c11 for the next backward-incompatible release (i.e. for whippet, and/or perhaps utf8).
<old>I think we will set the minimum bar to be c99
<rlb>I think maybe Andy's series requires c11 atm.
<rlb>(but not sure I recall correctly)
<old>I mean, just need to find a for loop with a definition like so: for (int x ..)
<old>that's not c89 compliant AFAIK
<old>so I guess we are indeed c99
<old>if we have c11 minimum it would certainly be nice
<old>we could drop lots of legacy stuff wrt atomics
<old>emulated with mutexes
<rlb>I'm *pretty* sure that's what Andy was planning, but not positive.
<old>hmm, so I guess the only fix for your patch is the space in the overflow call
<rlb>Of course at some point we should figure that out, and if we already *know*, then that'd be useful (to your point wrt making some things easier).
<old>if we are going to break things wrt utf-8, whippet, might as well bump a few minimum requirements
<old>at least it sounds like a good moment to do so for me
<rlb>Oh, I was going to change the comments too (I do like "//" format comments for some things, but it's not a huge deal to me either way).
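For reference, the two language features under discussion: a loop variable declared inside `for` is C99 (C89 requires declarations at the start of a block), and `<stdatomic.h>` is C11, which is what would let mutex-emulated atomics go. A minimal sketch; `counter` and `bump` are hypothetical, and whether a given atomic type is actually lock-free still depends on the platform (`atomic_is_lock_free` reports it).

```c
#include <stdatomic.h>

/* Hypothetical shared counter; with C11 atomics no mutex is needed. */
static atomic_long counter = 0;

/* Add `times` to the counter and return its value afterwards. */
static long bump(long times)
{
    /* Declaring `i` in the for statement requires C99 or later. */
    for (long i = 0; i < times; i++)
        atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
    return atomic_load(&counter);
}
```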
<old>but like I said, don't do that for the utf-8 branch
<rlb>We can wait for utf8 for the flood.
<old>we can always just apply a static tool to do so instead
<rlb>As long as that doesn't mess up my deluxe ascii art, sure :P
<rlb>ACTION is planning some more for the string(buf) structure layouts.
<rlb>Don't forget that when you get to it, you may want to read the utf8 readme (in the top-level dir, added at the *end* of the series) first.
<rlb>utf8-transition readme.
<rlb>old: do you want me to just push the patch after I fix those two things, or did you have a PR you wanted to use to handle it, or...?
<rlb>(happy with whatever)
<old>I wonder if we want to add a test
<old>do we have tests in C that are self-contained?
<old>otherwise fine with the push
<rlb>hah, raising the bar --- fair enough. We must not have anything complete, since it's never been a problem (or maybe we were getting lucky with uninitialized memory).
<rlb>I'll see if it's not too hard to add one.
<old>is scm_malloc doing 0 allocation?
<rlb>(clearly there's no rush)
<rlb>You can't alloc nothing.
<old>in any case, I guess the hard problem was the length of the null
<rlb>malloc(0) is "unspecified"
<rlb>might return null, acc. to posix
<rlb>That's why we have to allocate there. I have a more detailed comment about that in the utf8 branch...
<old>okay, well, fine by me
<rlb>I can fairly easily test in test-conversion that the result has 4 nulls for utf-32.
<rlb>but we could be fooled by trailing nulls in ram...
<old>well, if we can have that tested, then it would certainly help to catch potential issues during the utf-8 merge?
<rlb>moar buffer-over/underrun related tests are nearly always a good idea when C's involved...
<rlb>I'll poke at it and ping you once I have (might or might not be immediate) --- as mentioned, it's been this way "forever", so there's no hurry.
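A sketch of the allocation rule rlb describes, generalized to multi-byte nulls: append `null_width` zero bytes (1 for UTF-8, 2 for UTF-16, 4 for UTF-32), so the allocation is never zero-sized and `malloc(0)`, which the C standard lets return either NULL or a unique pointer, never comes up. `append_encoded_null` is a hypothetical helper, not the patch's `encode_null`.

```c
#include <stdlib.h>
#include <string.h>

/* Copy len encoded bytes and append a null that is null_width zero
   bytes wide. Since null_width >= 1, the malloc size is never 0.
   Returns NULL on bad width or allocation failure. */
static char *append_encoded_null(const char *bytes, size_t len,
                                 size_t null_width, size_t *out_len)
{
    if (null_width == 0)
        return NULL;
    char *out = malloc(len + null_width);
    if (out == NULL)
        return NULL;
    if (len > 0)
        memcpy(out, bytes, len);    /* skip memcpy for empty input */
    memset(out + len, 0, null_width);
    *out_len = len + null_width;
    return out;
}
```

A test in the spirit of the one rlb proposes checks only bytes inside the returned length (for "A" in UTF-32LE, bytes 4 through 7 must be zero), so zeros that happen to follow the allocation in memory cannot produce a false pass; running it under AddressSanitizer (`-fsanitize=address`) or Valgrind would also flag any read past the end.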
<old>won't change anything for users until release anyway
<old>although, I think we might want to hit a release for the end of summer... if we can get utf-8 merged by then and some other bits
<old>that would certainly be nice
<old>would ask ludo what he thinks of that
<old>I'm saying this, but I have tons of home renovation to do this summer
<rlb>Personally, unless we have other demands, I might just let it happen "naturally", i.e. we'll keep working (and we're certainly moving along a bit better now), i.e. figure out what we want, and then release it "whenever it's ready". Main thing we might want to decide is whether we think we want another non-abi-breaking release anytime soon, which might depend on how much we accumulate before utf8 is ready.
<rlb>Of course that also might beg both branching and versioning questions in a bit...
<rlb>i.e. we'll need a 3.0 branch if/when we merge utf8 and want to keep working on 3.0.
<rlb>(Assuming we're fine with it, I might want that branch regardless, in case we want me to upstream any future patches I have to make for debian anyway, since 3.0 is now effectively long-term there.)
<old>I think we really ought to consider using semantic versioning more
<old>Do you think that could help wrt Debian?
<old>If we go that route, then the utf-8 bits can go into the next Y series instead
<old>keeping the current Y=0 series for bug fixes
<rlb>I'm fine with that (semantic versioning), and it doesn't have to have anything to do with whether or not (or how many) Z releases we ever decide to make. i.e. we could just keep mostly releasing Ys like we have been, and the versioning semantics would just be the common ones then.
<old>sure sure. But I think it might make sense to release more often if we go that route
<old>in that, we would make a Z release twice a year
<rlb>It'll actually make more work for debian up front since I'll have to adjust all the packaging, and "multi-version install" related tooling :)
<old>up front, but in the long run?
<rlb>Perhaps --- personally I don't really care about release cadences, i.e. I'd just release whenever we have things we want (and are ready for) people to have "now".
<rlb>That might be every month for a bit if we have "trouble" or it might be once a year, or every 3 months, or...
<rlb>I'd let the cadence fit the code.
<rlb>Especially given our currently (I think) somewhat limited resources.
<rlb>But I'd also favor making it easy to release, so the decision *can* just be content-based.
<old>I think content-based, yes
<old>given some bugs can sometimes be minor but sometimes major
<rlb>And with respect to debian, the style of versioning doesn't really matter all that much, as long as we're doing something "sane" --- and I'd say "it varies"; in some cases, if I know what the fix is, it's actually easier to fix it directly in debian since I don't have to also pull in a new upstream release.
<old>it's important to fix them and get them out to the users ASAP when it's major
<old>ah right, you are keeping a set of patches with quilt I think?
<old>I had to use that to reproduce a bug and was not sure if it was introduced by the patches or was upstream
<rlb>So basically, speaking just for me, I wouldn't worry much about any official cadences, I'd figure out what versioning style I want, focus on making releases easy, and just go from there, releasing whenever I needed to.
<rlb>No, I work directly in git (using git-dpm) so the debian work is just branches, and the patches are all handled as ephemeral branches/series so I can just use git for all the heavy lifting.
<rlb>I haven't used quilt directly in a long time.
<rlb>git-dpm maintains the debian/patches/*.patch files automagically.
<rlb>Though now there are some newer options I might investigate.
<rlb>dgit, tag2upload, etc.
<rlb>see if I think they're preferable
<rlb>Oh, and any major transition (abi incompatible) is more work, perhaps the most, wrt debian, i.e. as compared to any switch to semantic versioning, because it requires new branches, another set of packages, sometimes manual accommodations for path changes, etc.
<rlb>Not to mention the long-term support questions, particularly if we can't get the previous version out of debian before the next stable release.
<rlb>we had guile 2 and 3 in debian for a *looooong* time...
<old>guile 2 is not on debian anymore?
<rlb>ACTION is very happy
<rlb>ACTION needs to investigate
<rlb>I may be wrong --- I (clearly) could have sworn we'd settled that.
<rlb>ACTION will figure it out.
<rlb>(If it's not, it sure better be for forky.)
<old>debian 11 seems to have guile 2.2
<rlb>will have to pursue (...again)
<rlb>in previous rounds, there was always some dependency that wasn't updated, and I wasn't up to fixing it myself...
<rlb>Hah, the utf-16 null test failed, but that's because u8_conv_to_encoding returns a BOM when asked to encode "\0" as "UTF-16". I should have said "UTF-16LE"...
<rlb>ACTION thanks old for the testing nudge.
<rlb>Though I'll also have to figure out what to do about that wrt encode_null...
<rlb>i.e. if it's just utf-16, then we could special-case it, but I have no idea if there are other encodings that won't/can't just give you the character you asked for.
<rlb>I may need to rethink what we mean.
<old>also, I checked, and the problem with fixnum?
<old>was that if you call it on a negative value, object-address returns a huge value, way higher than what can be encoded
<rlb>i.e. utf-16/32 are ambiguous unless you say which endianness (it's a mess)
<old>so fixnum? on -1 will allocate a heap number
<old>LE or not should yield the same for the null byte, no?
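Why `fixnum?` on -1 allocates, in the same simplified two-bit-tag model (fixnum `i` stored as `(i << 2) | 2`; the names are hypothetical, not libguile's): reading the tagged word of a negative fixnum as an unsigned integer, which is what an object-address style accessor returns, yields a value near `UINTPTR_MAX`, far above the largest fixnum, so handing it back to Scheme requires a heap-allocated bignum.

```c
#include <stdbool.h>
#include <stdint.h>

/* Tagged word for fixnum i in the simplified model: (i << 2) | 2. */
static uintptr_t tagged_bits(intptr_t i)
{
    return ((uintptr_t) i << 2) | 2u;
}

/* What an object-address style accessor sees: the raw word as an
   unsigned integer. */
static uintmax_t raw_address(intptr_t i)
{
    return (uintmax_t) tagged_bits(i);
}

/* Whether that unsigned value can itself be returned as a fixnum. */
static bool fits_in_fixnum(uintmax_t u)
{
    return u <= (uintmax_t) (INTPTR_MAX >> 2);
}
```

In this model the same happens for large positive fixnums: the raw word is about four times the value, so anything above roughly a quarter of the fixnum range produces an address that no longer fits. A direct tag check avoids the question entirely.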
<rlb>But in encode_null, all we have is whatever encoding was given to scm_to_stringn, and that might be "UTF-16"?
<rlb>and then we get a BOM rather than the correct null
<rlb>And I don't know if any other odd encodings do something similar --- need to poke around and ponder.
<old>I'm not familiar enough with the libunistring API but..
<rlb>also need to make sure I'm doing it right, but I'm pretty sure the -1 -2 we get is a BOM
<old>maybe we could just poll for the size of the encoding and just emit null bytes according to that size?
<rlb>If there's a way to ask for the "size of the encoding"...
<old>this is major impact
<rlb>And presumably all variable-length encodings would have to be just 0 for null.
<old>jcowan: you got an opinion on that? (wrt BOM)
<old>yeah. I did some benchmarking the other day of guile-hashing
<old>to see if I could use a crypto hash for BLUE
<rlb>I'm a bit surprised I don't see it in the info pages, but then again, I've had some issues (iirc) with libunistring in the past.
<old>but guile is just too slow for now for that.
<old>I think it would actually be a very good addition to core Guile
<old>crypto hashing in C, exposed
<old>at the same time, one can still use libffi for that so
<old>the only problem with my fix is that I introduced fixnum? as a core binding in numbers.c
<old>and rnrs arithmetic just imports that and re-exports it
<old>maybe we would be okay with this?
<rlb>don't know offhand --- is this another case where we'd like to be able to implement part of a module provided by libguile/*.c in scheme?
<rlb>(like I wanted for srfi-13 and/or 14)
<rlb>btw, if we eventually want it, I have uuids (RFC 4122 (Leach-Salz)) because I needed them for lokke.
<rlb>And nvm wrt the fixnum? question --- I need to understand what you did better before I'd know what I'm talking about.
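A sketch of old's suggestion to emit zero bytes by encoding width rather than asking the converter to encode U+0000, which sidesteps the BOM that an unsuffixed "UTF-16" produces. Because an encoded null is all zeros in either byte order, the width alone is enough. The table is a hand-written assumption rather than a query to libunistring, and would need checking against the encoding names Guile actually accepts; `null_width_for` and its helpers are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical table: bytes in an encoded U+0000, keyed by encoding
   name prefix. Prefixes also match the LE/BE variants. */
struct null_width {
    const char *prefix;
    size_t width;
};

static const struct null_width null_widths[] = {
    { "UTF-32", 4 },
    { "UCS-4", 4 },
    { "UTF-16", 2 },
    { "UCS-2", 2 },
};

/* Case-insensitive "does s start with prefix?". */
static int prefix_matches_ci(const char *s, const char *prefix)
{
    for (; *prefix != '\0'; s++, prefix++) {
        if (*s == '\0' ||
            tolower((unsigned char) *s) != tolower((unsigned char) *prefix))
            return 0;
    }
    return 1;
}

/* Number of zero bytes to append for this encoding. Names not in the
   table are assumed to be ASCII- or EBCDIC-based with a 1-byte NUL. */
static size_t null_width_for(const char *encoding)
{
    for (size_t i = 0; i < sizeof null_widths / sizeof null_widths[0]; i++) {
        if (prefix_matches_ci(encoding, null_widths[i].prefix))
            return null_widths[i].width;
    }
    return 1;
}
```

Memoizing, as old suggested earlier, is unnecessary here because the lookup allocates nothing; the caller writes `null_width_for(enc)` zero bytes into space it already owns.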
<old>I sure would like uuid in core, or at least provided by core guile (not in boot-9 per se)
<old>I define `scm_fixnum_p` in numbers.c
<old>so `fixnum?` is defined everywhere
<old>you don't need to import (rnrs arithmetic fixnums) to have it
<old>also it's a new C API
<old>which can come in handy if you want to manipulate fast guile numbers in C, I guess
<rlb>So did we have a fixnum? already outside rnrs, or was that it?
<rlb>i.e. are you just asking whether we want to "promote" fixnum? to the core namespace?
<old>it was only defined in that module
<old>well, my current fix is doing that
<old>we could avoid this I guess if we really want
<rlb>If so, I worry a little that that's such a common thing to want, we might actually have conflicts...
<old>or you know, just define it as %fixnum? instead
<rlb>But why wouldn't you just want to get it from rnrs?
<rlb>if you knew you needed it?
<old>I think I will open a PR :-)
<old>it will be easier for you to see what I mean
<rlb>I mean if I'm a user, and it's in rnrs, I can just get it from there. So if we can manage it, we don't need to add it to the default namespace (I'm wondering).
<old>yes, it would be the ideal
<rlb>I assume the problem is more mechanical...
<old>well, the problem is that we need this core binding to be defined, _for_ rnrs
<old>but not actually exposed
<old>without explicitly being asked for
<rlb>Oh, right, that's what I'd originally wondered.
<old>maybe there is a trick I don't know for that
<rlb>Indeed -- I'll think about that.
<old>Because rnrs was using (guile) object-address
<old>now it's using (guile) fixnum? but that exposes fixnum? in core globally
<old>which I think we ought to avoid if possible
<rlb>one way would be a .so?
<rlb>i.e. rnrs depends on a new .so
<rlb>This may well be back to the thing I wanted too...
<old>sounds way too heavy for that
<rlb>i.e. I wanted to have a srfi-13 that was part in scheme, but depended on some C functions (so part in C too), and iirc there was no great way to do that as-is.
<rlb>i.e. without a new shared lib
<rlb>I think *maybe* I talked to Andy about it eventually, but can't recall if there was a reasonable path forward. I would like to fix it, though.
<rlb>but yeah, worst case, maybe make it a completely "private" binding in the default namespace, and so the current supported way to use it is via ((rnrs arithmetic) #:select (fixnum?))...
<rlb>But we should *fix* it (in our infinite spare time).
<old>What about just defining %fixnum? instead?
<old>I don't think we have a rule on that, though
<old>in that, we don't have a reserved namespace
<rlb>Right, perhaps --- i.e. I forget if %* is effectively private, but if so, then yeah. Alternately, if we have one, use that, and if we don't, until/unless we fix the broader issue, we should designate one now-ish.
<rlb>Actually, hang on a sec...
<rlb>I may have had to do something (a hack) after a chat with Andy for srfi-207. Looking...
<rlb>04799ab95ae8d845854cd7a0bbc4609ad30bc17d i.e. %boot-9-shared-internal-state is what I was remembering.
<rlb>I'm not sure that's quite right here, but there, that's what Andy said "seemed fine for now" given what we have.
<rlb>(effectively a side-channel)
<rlb>And I think maybe %* is supposed to be private, but I'm also not sure we don't have things named that way that really aren't...
<rlb>In any case, if we decide that % really isn't a good enough indicator now, whatever the original intent, there's no reason we can't just pick something new and document it *clearly*, e.g. say ___DO_NOT_USE_ME_YES_I_MEAN_YOU_* :)
<rlb>To be *really* pedantic, we should only do that in an X release, but...
<rlb>But if you can use the "secret" hash-table (not sure), then we've already paid for that.
<old>We could use a 128-bit UUID :p
<old>that *ought* to be safe
<rlb>though technically this might not exactly be "boot-9 shared internal state" --- of course ideally it'd only be a temporary home, and we'll come up with a better solution for the sharing between C and scheme on this front "soon".
<rlb>yeah, we clearly mention things like %load-extension, but maybe it's fine --- i.e. if the rule is that %* is reserved for guile proper.
<rlb>We should ask Andy/Ludovic about %* if it's not documented.
<rlb>If it is actually reserved for guile proper, then we could also say that as guile, we reserve %__* for "internal only", and then you could just use %__fixnum? for now...
<old>well, I would argue that we need a prefix
<old>that is internal and can break anytime
<old>in other words, don't use it
<old>as opposed to, say, %load-path
<old>which is documented and well known
<rlb>My current guess is that %* is defined as "just for guile", but that we may or may not have any "internal only" naming yet.
<rlb>But reserving "%__*" now (or whatever) could easily fix that, if so.
<rlb>(if %* is already reserved for guile itself)
<rlb>i.e. even in a "Z" release
<old>wonder if the Scheme standard has some reserved identifiers like C
<rlb>We could throw in some snowman emojis...
<old>rlb: let's not go that route :-)
<old>it would certainly discourage its usage by users..
<old>not a bad idea after all
<rlb>Nice. How do you feel about making that a private binding of some kind and making the rnrs version the public one? If we'd prefer that, I'm happy to try to figure out "officially private", i.e. badger Ludovic or whoever, etc.
<old>yes, I think it makes sense
<old>so we keep scm_fixnum_p internal also
<rlb>I wonder what the underlying issue is too, i.e. is it "just" that SCM_DEFINE can only put things in one place, or...
<rlb>I may also look into that briefly.
<rlb>i.e. if we could just fix it with a manual def, or a new macro that let you say where it should be bound, or...
<rlb>But first, I'll badger Ludovic :)
<dsmith-work>So, we expose number? complex? real? rational? integer? exact-integer? in guile/libguile/numbers.c
<dsmith-work>I'm pretty sure it's possible to define a module in C that's part of libguile and then later use-modules it from a Scheme module
<dsmith-work>Ah, but then we need to somehow expose *that* to scheme.
<old>What's also important is that we have something defined in (guile)
<old>I just figured out that in tree-il, we can expand %fixnum? or fixnum? (if defined in (guile)) as a primcall
<old>with CPS optimization, the compiler bypasses the call to the C function entirely and just emits a macro-instruction
<old>which is essentially a tag check of the object
<old>this is very fast and avoids the need to call into C
<old>the instruction is immediate-tag=?
<old>under it is the version I have locally with the optimization made in tree-il
<old>basically, the function is now as small as the original check using fixnum? with object-address..
<old>to get the optimization you need to recompile, yes, but the old .go files are still valid