IRC channel logs
2026-04-09.log
<rlb>old: related to the fixnum? discussion, I poked at something like SCM_MODULE_DEFINE ("rnrs arithmetic", scm_fixnum_p, "fixnum?", ...), and while it's not too hard to create, I suspect we may run into trouble that I hit before wrt srfi-13, i.e. that the module system may not be fully available yet. (I just got a segfault, and that rang a vague bell, though I might also have made a mistake.)
<rlb>old: which then made me wonder if an "easier" path might be to eventually just make some other "private" module the default module, and then create/populate (guile) later during initialization with bindings from the private module. Then it'd be trivial to SCM_DEFINE_INTERNAL a fixnum? that rnrs arithmetic could pull in from the startup module later.
<rlb>the default module -> the initial default module during startup
<rlb>Added a second independent/optional commit to that encode-string-null branch --- it lets strftime notice and reject format strings with internal #\nuls.
<old>rlb: right that would certainly be nice, but I wonder if that would slow down boot time
<rlb>oh, the module swap?
<old>due to this indirection of pulling the bindings
<old>I might have misunderstood your proposition here
<rlb>I'm not sure whether it would cost anything much or not -- (guile) would eventually just be a module providing a subset of the "internal" module's bindings, and I think we can do that cheaply, but would have to consider it more carefully.
<rlb>The bigger question, I'd imagine, is what all would break (and would any public things break) because of undocumented "assumptions" during startup.
<old>right that sounds like an ABI-breaking change :-)
<old>But I could probably also not use SCM_DEFINE_INTERNAL and I could make a C file for rnrs arithmetic
<old>I think we have this pattern already for some modules where primitives need to be defined in C
<old>So scm_fixnum_p could be defined in libguile/arithmetic.c or something like this, which would inject the binding into the rnrs module
<old>Wonder if we would be following the rnrs specification that way tho
<old>in that, the fixnum? binding would be marked as exported, but it would be coming out of nowhere in the module definition
<rlb>Maybe I was doing it wrong, but I had trouble getting (and couldn't get) a scheme module with some code coming from libguile to work (i.e. srfi-13) --- as mentioned, I think I talked to Andy about it, and there were "issues".
<old>I'm not knowledgeable enough about this to have an answer
<rlb>but maybe I'm wrong.
<old>hmm yet we have this pattern for (system foreign) for example
<rlb>where does it do that?
<old>in any case, that's mostly a detail if we can just agree on a prefix for private bindings for now
<old>no wait that's legacy
<rlb>I'm now confused --- I see foreign.c doing scm_defines that don't appear to end up in (guile)?
<rlb>I vaguely thought that was the issue atm, that anything in libguile ended up defining things (by default) from init in (guile), *and* that also meant that depending on where you are in the scm_init_* order during startup, you might or might not be able to change the module to put your bindings somewhere else...
<rlb>i.e. during the scm_init_* calls earlier in the sequence, the only place you can put things is the default module, which is (guile) right now.
<old>it defines some primitives that will be added to (system vm loader)
<rlb>where does it do that definition?
<old>the Scheme module is doing: (load-extension (string-append "libguile-" (effective-version)) "scm_init_loader")
<old>I guess we could do the same for rnrs
<old>just load an init function when the module is first loaded
<old>then fixnum? is only defined in the module that loads the init function
<rlb>Sure, that would work.
<rlb>I suppose part of my issue is that the bits I've been working on (strings/symbols) are needed fairly "early" :)
<old>regarding fixnum optimization, I'm quite surprised that it's not efficient
<old>I managed to speed up the throughput of sha256sum by a factor of 5
<rlb>you mean that what we have now isn't, or that it's still not fast enough after your fix?
<old>well it's not fast enough for me
<old>I think I would be happy if we could get to something like 10 MiB/s
<old>I don't think it's possible to reach the 300 MiB/s of C
<rlb>not likely anytime soon I'd guess :)
<old>there's probably some SIMD optimization there and ofc nothing to be managed at all. It's pure number crunching
<rlb>but if we get within a factor of 5-10 someday, I'd guess we should be happy.
<old>well, there is something I have not tried yet, which is to rewrite guile-hashing so that everything gets inlined
<old>right now, even some of the inlinables I defined in rnrs are used as callbacks
<rlb>your idea wrt adding a flavor to the compiler sounded plausible too if we need it often enough --- imagine that might expose it to more optimization.
<old>so the optimization could go further if the hashing module was written in a way that does not pass around the operators
<rlb>i.e. like we do for strings
<rlb>(or did --- as mentioned, I removed some of those for now for utf8)
<old>what do you mean flavor?
<old>wrt scm_stringn, LGTM
<old>assuming it compiles and the tests passed
<old>I'm okay with you pushing it when you feel like it
<rlb>I also added a second patch before that one if you're interested --- it "fixes" strftime to notice and reject internal #\nuls in the format string.
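The strftime fix mentioned above rejects format strings with internal NUL bytes. A minimal C sketch of that kind of check, assuming the format arrives as a pointer plus a length: the C API stops reading at the first NUL, so anything after it would be silently ignored. The names format_has_internal_nul and checked_strftime are hypothetical and are not libguile functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: true if the counted string FMT/LEN contains a NUL
   byte anywhere within its LEN bytes.  */
static bool
format_has_internal_nul (const char *fmt, size_t len)
{
  return len > 0 && memchr (fmt, '\0', len) != NULL;
}

/* Hypothetical wrapper: returns -1 when the format is rejected or memory
   runs out, otherwise whatever strftime returned (the number of bytes
   written, or 0 if BUF was too small or the result was empty).  */
static long
checked_strftime (char *buf, size_t buf_size,
                  const char *fmt, size_t fmt_len, const struct tm *tm)
{
  assert (buf != NULL && fmt != NULL && tm != NULL);
  if (format_has_internal_nul (fmt, fmt_len))
    return -1;

  /* strftime needs a NUL-terminated format, so copy and terminate.  */
  char *copy = malloc (fmt_len + 1);
  if (copy == NULL)
    return -1;
  memcpy (copy, fmt, fmt_len);
  copy[fmt_len] = '\0';

  size_t written = strftime (buf, buf_size, copy, tm);
  free (copy);
  return (long) written;
}
```

The point of checking before the copy is that the rejection happens at the boundary where the counted string becomes a C string, which is the only place the truncation could go unnoticed.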
<rlb>(came across it while reviewing all of this)
<rlb>totally independent though
<old>I think that you might want to add a test for this one also if that's doable
<old>and verify that the test fails without your fix and then passes with it
<rlb>I did do the latter, but I wondered about the test --- i.e. why that particular one and not a test of all the other syscall args that should also reject (unless just because we noticed this one and so "regression test").
<rlb>ACTION ducks in case old says "oh, right, let's test them all" :)
<rlb>But I'm fine with adding the test if you like.
<rlb>Test is pretty easy/small.
<old>haha. well I think tests are what save us in the long run so
<old>so, more tests all the way! if they make sense ofc
<old>but I would just go with a regression test for this one quickly and later we can add more as we see fit
<rlb>Regarding the optimization, I just thought you were suggesting adding code to the compiler to handle the fixnum detection more directly -- bit-pattern-wise (primcall optimizations or whatever, which I only vaguely understand --- had to deal with some minor related bits for utf8).
<rlb>OK, sounds good -- right now the rule is "always use the scm_to_*_string flavors for nul-forbidding values", not the scm_to_stringn ones, unless you're willing to check the content yourself.
<old>ohh also regarding the fixnum? primcall .. I don't know if we can make that optimization if fixnum? is not in (guile) ..
<old>your comment about the compiler made me think of that
<old>I will need to check
<old>because the primitive expander of tree-il checks if a binding was defined by a user or if it's coming from core
<old>and I use that primitive expander to expand (fixnum? ...) to a primcall to fixnum?
<rlb>Another point in favor, maybe, for the "internal module" approach for some things (though still just a vague idea).
<old>which can then be translated to a comparison to immediate-tag=? in CPS I think
<old>I have to go now. Talk later :-)
<rlb>wrt that internal module, I think in that case (guile) really could just be republishing the same vars, so really just the cost of an additional "hash table" ish --- no extra indirection, perhaps.
<rlb>(but would need to refresh my knowledge about a few bits)
<jcowan>I've been scrolling back and forth, but I'm not sure if there are any unanswered questions for me right now
<rlb>Sorry, didn't think about the fact that you might be looking for questions --- I think we're fine for now, and much appreciate the help.
<rlb>I'll probably push that patch in a bit.
<rlb>I think I might want to add a smarter pass-if-exception --- in pytest there's one that lets you specify both the expected exception *and* a regex that the exception's message has to match.
<rlb>...because we have a lot of misc-error, which might or might not be the one the test expected.
<rlb>Hmm, I guess that's not OK if tests aren't allowed to assume we have regex support...
<rlb>(I still wonder if ideally, we might want to provide regular expressions unconditionally these days, with unchanging syntax/semantics cross-platform, even if we need an external library like pcre, instead of "maybe, and if so, it's 'whatever the platform does'".)
<rlb>It's very nice to be able to count on them, and their behavior in say clojure/jvm or python.
<rlb>...ahh, right --- pass-if-exception *does* do at least some matching, it's just strenuously undocumented.
<rlb>(nvm, I just can't *read* the docs)
<rlb>old: pushed the internal #\nul fix, and added a test to the other commit still at the end of the encode-string-null branch. Happy to push that too if/when you like.
<jcowan>which uses S-expr syntax, but there is code to translate Thompson syntax into that.
<rlb>interesting --- though in this case, it turns out our tests already depend on regex support via string-match, so my current concern was irrelevant.
<rlb>I'm not sure what that means overall though, since I thought our regex support wasn't guaranteed (i.e. does the test suite work if you don't have regex support atm?).
<rlb>ok, thanks --- I'll have to put that on the backlog to consider more carefully. Though I also wonder if we're fast enough to support real regex heavy lifting from scheme.
<rlb>(ideally I'd like to have one very fast option always available, whatever else we do)
<jcowan>Chicken and several other implementations use it, so I should think so
<jcowan>I note that irregex is already packaged for Guile 2.2
<rlb>I meant fast like libpcre, etc. --- if it's pure scheme, then I'd be very pleasantly surprised if we're even close.
<rlb>(even remotely close)
<rlb>But my understanding of the relative performances might be wrong.
<rlb>i.e. I'd like to have something built-in that's close to a fast traditional C lib wrt regex even if we also want something "nicer".
<rlb>right, pcre, re2, and a few others are in the "roughly, similarly efficient" category I'm thinking of.
<rlb>...but adopting the bits you're suggesting is likely easier, and so might well be plausible sooner, whatever else we do.
<jcowan>pcre is much less efficient, unless it has been rewritten, because it supports backtracking
<rlb>last time I poked around, unless I misunderstood, it held up pretty well in "the graphs", for a lot of cases...
<rlb>Might see if I can find that again later.
<rlb>But no strong opinion about *which* efficient lib we pick if it's otherwise "reasonable".
<rlb>(one of the things at least)
<rlb>clearly much worse at some things.
<apteryx>apparently it's valid, as this works: (let (((values a b) (values 1 2))) b)
<old>rlb: forgot to update NEWS wrt the fix of scm_to_stringn
<old>I guess you can do that in a new commit
<rlb>...though thinking about it, I can't come up with a reason you'd need to know that we currently might produce a terminator that's larger than necessary.
<dsmith-work>It might reduce some confusion when someone is wondering why there are "extra" zero bytes when debugging something...
<jcowan>"Need to know" thinking is for security weenies. Document the reasons for decisions, otherwise someday some fool will ignore Chesterton's Fence and say "One #x0 is enough"
<rlb>Oh, we're very clear in the code, at least.
<jcowan>I hope so. I have a long-term obsession with the problem of vanishing institutional memory.
<old>rlb: I think that if somehow some user finds this bug, they could always just check NEWS before making a bug report?
<old>at least, I believe that this could avoid some false bug reports for things that are already fixed
<rlb>fair --- was actually in the process of adding one :)
<old>furthermore, someone that has a custom build of Guile could git-blame NEWS to determine which commit did the fix and cherry-pick it
<old>> and rip them out. Boom.
<rlb>I also added a small change for consideration that will get the terminator right for the known cases (e.g. UTF-16/32, Latin-1, etc.)
<old>We have the regression test for that now!
<rlb>I'm not sure exactly what this affects (I believe it's snarfing-related) but I've been noticing places where we've used plain integers instead of SCM_ARG[1234...N].
<rlb>e.g. for the SCM_VALIDATE calls, etc.
<rlb>I think you're supposed to use SCM_ARG1 say instead of 1 because the snarfer uses that to "determine things".
<rlb>But I'm not sure it's a big deal either.
<rlb>I suspect I have done that myself any number of times (e.g. in the utf8 series).
<rlb>Oh, and I'll push another update there soon, fwiw (fixed "more" over the past couple of days).
<rlb>e.g. SCM_VALIDATE_STRING, etc.
<rlb>There are some comments above SCM_ARG1, etc. in error.h
<old>ah but it's just a hint no?
<rlb>perhaps --- I didn't think it was likely a big deal, just something I noticed and started fixing when in the area.
<old>> constructs must match the formal argument name,
<old>that's not clear to me
<rlb>Also not a priority, but I've wondered if we might prefer (or want to add) some VALIDATE (and related) variants that take the function name as an argument instead of the hidden reliance on FUNC_NAME. Much of the time you could just use __func__, and being able to provide something else makes it easier to have shared implementations where the caller specifies the right validation name.
<rlb>In lots of cases atm, I think we just give up and have the utility function specify NULL for the function name, but future us may thank us for having a real name in the error message ;)
<old>that makes sense. I guess the FUNC_NAME stuff is to help the snarfer but potentially also because old GCC might not have had __func__
<rlb>I've been threading through the real "caller" some in the utf8 series.
<rlb>Right --- *if* we can rely on __func__ (I thought maybe we could now, but if not...).
<rlb>then I think it may be cleaner and less mysterious
<rlb>do we already rely on __func__ in main (maybe I checked that...)
<rlb>i.e. we may already have released versions that rely on it ;)
<rlb>Hah, decoding_error in v3.0.11 (wonder if that was my fault).
<rlb>Anyway, even if we need to explicitly pass the name, threading it through still seems better when feasible.
<rlb>Technically, we'd also need to thread through the arg position, but even if we don't, and have to leave that as SCM_ARGN for now, then it's still better.
<old>Yeah I prefer explicit over mysterious arcane
<old>there's a lot of places where macros could be replaced with static inline also
<rlb>Could even consider (would have to see how it looked) a trivial inline struct for the func and arg pos info.
<old>that would certainly help debugging with GDB
<rlb>Yes, I've started doing some of that in strings.c, etc.
<old>SCM_PACK and friends ..
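A rough C sketch of the idea floated above: pass the caller's name and argument position explicitly in a small struct instead of relying on a hidden FUNC_NAME, with a static inline function doing the work and a thin macro only capturing __func__. Everything here (struct call_site, HERE, validate_nonnegative, my_list_ref) is hypothetical and is not libguile API; the error text only imitates the general shape of such messages.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical: who is validating, and which argument position is being
   checked (0 could stand in for "unknown", much like SCM_ARGn).  */
struct call_site
{
  const char *func;
  int arg_pos;
};

/* Capture the enclosing C function's name via C99 __func__.  A caller
   that implements a differently named procedure can build the struct by
   hand instead of using this macro.  */
#define HERE(pos) ((struct call_site) { __func__, (pos) })

/* Shared validator: returns 1 if VALUE is acceptable; otherwise formats a
   message naming the real caller into MSG and returns 0.  */
static inline int
validate_nonnegative (struct call_site site, long value,
                      char *msg, size_t msg_size)
{
  assert (msg != NULL && msg_size > 0);
  if (value >= 0)
    return 1;
  snprintf (msg, msg_size,
            "In procedure %s: wrong argument in position %d: %ld",
            site.func, site.arg_pos, value);
  return 0;
}

/* Example caller: the shared validator reports "my_list_ref", not the
   name of the utility function that did the check.  */
static int
my_list_ref (long index, char *msg, size_t msg_size)
{
  return validate_nonnegative (HERE (1), index, msg, msg_size);
}
```

Because validate_nonnegative is a real function rather than a macro, it can also be called directly from a debugger, which is the GDB benefit raised in the discussion.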
<old>it's real nice to just be able to do: (gdb) call SCM_UNPACK (obj)
<old>unfortunately, it's not possible with macros
<rlb>Right now we still do the compat SCM_INLINE stuff in .h's but I don't know what say C99 supports now, etc. i.e. what's now "standard"...
<old>inline is specified in c99
<old>so we should not even have this compat stuff anymore
<old>wrt strings.c: that's great!
<rlb>woohoo (I probably checked that when previously hacking on the branch)
<rlb>if we really can just "inline" in headers, might consider handling some things a bit like I do sometimes with lisp-ish macros, i.e. even if you need a macro to capture (e.g. call site) information, perhaps put most of the work in a function...
<rlb>once we get through utf8 and whatever other big changes are pending (at least Andy's work?) might be nice to take a moment to make some changes (if there are any we want to) that might cause more churn --- though of course always have to balance that against the risk of accidentally breaking things that were just fine as is...
<old>yeah I would start by merging the big chunks first to avoid any merge conflicts
<rlb>so far the rebasing hasn't been too bad, but it certainly could be.
<old>hopefully you won't have to rebase again after this month
<old>I'm closing a couple of issues/PRs now and I will attack utf-8 after
<rlb>That'd be great, but even if it takes longer, don't worry on my account --- just happy to be moving along. Still first want to find out if what I have is even broadly OK :)
<rlb>I also think there's likely a good bit of work left (mostly for me). See the utf8 transition readme, but it's not "finished" yet.
<rlb>(if nothing else, I have *many* commit messages to finish, and cleanups once a victim^h^h...reviewer can help figure out which way we want to jump, etc.)
<jab>hey you awesome guile people...I've got a question about recursion. Suppose that you are summing the numbers 1 - 10 like so: https://paste.rs/THW9p ... does the guile compiler only need to =malloc ( sizeof (int))= only once? My intuition is yes.
<jab>as in make room for the variable i only one time ?
<jab>ok the answer to my question is yes. this is tail call optimization.
<rlb>jab: you might be interested in ",help compile" from the repl.
<rlb>i.e. if you define that as a function foo, you can ,disassemble foo, etc.
<dthompson>jab: the call to loop is not in tail position
<dthompson>because the addition of i happens after loop returns
<rlb>though some of the operations might or might not be helpful without understanding lower level stuff...
<old>rlb: was this not already merged ? wrt encode-string-null
<jab>so, adding 1 - 100,000 this way would be a bad idea... hahaha
<old>hmm, not sure about the second commit
<dthompson>jab: it would work since guile has a growable stack, but tail calls would be better here.
<rlb>That branch is just a tmp branch, and I'd also stuck the strftime fix on it the other day, and now rebased it to add the new bits --- i.e. it's just a place to show the changes atm. And yes, looks like I need to finally figure out my repo...
<old>not sure how often scm_stringn is used, but adding a bunch of strcmp in it seems like a performance regression
<rlb>I wondered about that, though I also suspect it's "cheap", and we have to do the first two tests anyway if the function isn't empty.
<old>I mean, strcmp is very fast in most cases so I don't think it matters much tbh
<rlb>Not sure -- can easily omit it.
<rlb>i.e. omit the patch.
<rlb>Just realized it was easy.
<rlb>No strong feelings either way.
<old>well I think we could just avoid that. it's just bits of optimization for small allocations
<rlb>I could also restrict getting it right to the cases where we already know, i.e. when we're in the ascii branch, etc.
<rlb>(or did I already...)
<rlb>it won't cost anything (noticeable) to just allocate the 4 bytes as is, i.e. we definitely don't need this, so happy to just drop it if we like.
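For readers following the terminator discussion, here is a small C sketch of the trade-off being weighed: a few strcmp calls to pick an exact terminator width for known encodings, versus always allocating 4 zero bytes. The function terminator_width and the list of encoding names are assumptions made for illustration; this is not the patch under review, and the participants above lean toward dropping it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of zero bytes needed to terminate a string
   encoded in ENCODING.  Anything unrecognized gets 4 bytes, which is wide
   enough for every encoding listed here.  */
static size_t
terminator_width (const char *encoding)
{
  assert (encoding != NULL);

  /* Encodings with one-byte code units.  */
  if (strcmp (encoding, "US-ASCII") == 0
      || strcmp (encoding, "ISO-8859-1") == 0
      || strcmp (encoding, "UTF-8") == 0)
    return 1;

  /* Encodings with two-byte code units.  */
  if (strcmp (encoding, "UTF-16") == 0
      || strcmp (encoding, "UTF-16LE") == 0
      || strcmp (encoding, "UTF-16BE") == 0)
    return 2;

  /* UTF-32 and anything unknown: the conservative 4-byte terminator.  */
  return 4;
}
```

The saving is at most three bytes per allocation, which is why always using the conservative width is a reasonable default.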
<rlb>"it all changes in the utf8 branch anyway" I suppose.
<rlb>I mean still questions, just *different* questions :)
<old>okay for the other commits, given the tests pass and all :-)
<rlb>OK, thanks I'll push them later.
<rlb>(right, push all but the second one)
<old>the first and third merge!
<old>I'm merging with-modules
<old>we now have local module bindings at expansion time
<rlb>OK (haven't been following that one closely).
<old>well I renamed it to `with-modules` instead of `using-modules`
<old>hopefully some will try it so I can see if there are corner cases I did not cover in my tests
<rlb>(I'll have to look at it later --- not sure if it's related, but wondered if it might be relevant to lokke...)
<jab>dthompson: rlb thanks for the answer. I guess that I am having trouble knowing if a function is tail recursive or not. My current thought is, if you use the substitution model of a function and get something like this: https://paste.rs/fBxOo then you just made a linear recursive process. And all linear recursive processes cannot be tail call optimized ?
<jab>I had to take a look at SICP to write that paragraph by the way. :)
<jab>rlb: how can I use ,help compile and ,disassemble foo to know that a function is or is not tail call optimized ?
<dthompson>jab: yeah you get a (+ 1 (+ 2 (+ 3 ...))) pattern
<rlb>I'm not sure offhand --- I haven't used them much myself; I mostly just mentioned them in terms of a way to see what the compiler makes of the code.
<rlb>(there are various expand/compile/disassemble commands)
<jab>ok. I'm currently writing a blog post about functional programming, and I'm trying to make sure that it is accurate.
<dthompson>for tail calls, you would instead pass the relevant variables to the next iteration
<dthompson>(let lp ((i 0) (sum 0)) (if (< i 10) (lp (1+ i) (+ sum i)) sum))
<jcowan>All the Scheme standards specify what is and what is not tail recursive.
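The two shapes discussed above can also be written in C, as a sketch only. Unlike Scheme, the C standard does not require tail calls to be optimized, so whether sum_accumulate actually runs in constant stack space depends on the compiler and optimization level. The function names are made up for this example.

```c
#include <assert.h>

/* Linear recursive process: the addition happens after the recursive
   call returns, so every call must keep its own frame alive.  This is
   the (+ 1 (+ 2 (+ 3 ...))) pattern.  */
static long
sum_recursive (long n)
{
  assert (n >= 0);
  if (n == 0)
    return 0;
  return n + sum_recursive (n - 1);
}

/* Linear iterative process: the running total is passed to the next
   call, so the recursive call is the last thing the function does
   (it is in tail position) and nothing remains to do afterwards.  */
static long
sum_accumulate (long n, long sum)
{
  assert (n >= 0);
  if (n == 0)
    return sum;
  return sum_accumulate (n - 1, sum + n);
}
```

Calling sum_recursive (10) and sum_accumulate (10, 0) gives the same result, 55; the difference is only in how much pending work each call leaves behind.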
<jab>yeah, in SICP all "linear iterative processes" have more variables than "linear recursive processes".
<jab>jcowan: I should probably re-read the scheme standard again. Thanks.
<old>tail calls are not unique to Scheme tho. If you find it more natural, you can also experience the delight of tail-call optimization in C
<jcowan>See section 3.5 of R[57]RS, which enumerates exactly which positions of which syntaxes are tail calls.
<jab>I'll add that to my list of things to do!
<jcowan>e.g. `if` tail-calls both its then-expression and its else-expression, and `begin` tail-calls its last expression.
<rlb>I wish I'd kept a reference, but just recently I saw a discussion of the compiler optimization to transform the typical (cons x (recurse ...)) reverse into a single-pass operation. Wasn't too surprising, but might be nice to have if we could and don't.
<dthompson>guile handles this situation nicely by having a growable stack
<jab>dthompson: for smallish programs, writing "linear recursive processes" is probably ok...ideally you should write "linear iterative processes"...2 followup questions.
<rlb>right, though this transformation avoided the stack entirely iirc, "inverting" part of the operation (and iirc transforming to an iteration) --- it was clear/"obvious" at the time, but I forget the exact details.
<rlb>(maybe I'll try to find it again, though I think you'd probably come up with it on your own if you poked at it much...)
<dthompson>rlb: well if you find a link I'd be interested because I don't know of a way to avoid either using the stack or reversing the list at the end
<jab>actually just one question: It would be really nice if flycheck could lint scheme code and say, -> this function is a "linear recursive process". Please try re-writing it as a linear iterative process.
<rlb>it was something like "pushing out the end" *maybe* by keeping a next-to-last pointer or something (and I can't recall if it required/allowed "unobservable" mutation)
<rlb>anyway, sorry, not very useful until I either find it, or actually remember details :)
<jab>rlb: no worries. It's fun learning. :)
<dthompson>jab: that sort of static analysis is only possible for a subset of programs
<jcowan>3.5 also points out right at the end that some cases involve non-obvious tail-calls which a Scheme does not have to treat as such
<jcowan>e.g. if the last thing done in a procedure is (let ((x (h))) x), then h can be tail-called, although it is not in tail position.
<rlb>dthompson: I can't find it offhand, but maybe that's what it was, i.e. some variant of rewrite to track the "last pair" ish, and build "in place".
<rlb>old: related to the C vs scheme "fixnum" discussions, if we thought there actually is a (sensible) way we could allow me to (re)write some of srfi-13 in scheme, that might be interesting (and I'd want to reevaluate), because while the "work is done" now, some of the "upper levels" of the algorithms might be a lot easier to read and/or maintain in scheme *if* the perf for those bits was "fine".
<old>and that would require what exactly again
<rlb>But as mentioned, last time I poked at it, I failed --- so unless we know more now or something...
<old>defining more core bindings?
<rlb>I *think* the problem I ran into was that I was trying to have a (srfi srfi-13) that got some of its bindings from C and some in scheme, but because of where init_srfi_13 is called (and may have to be) in the init process, "modules aren't ready", or similar.
<rlb>ACTION recalls segfaults, trying to reorder init, more segfaults, etc., I think ;)
<rlb>But now I'm wondering if I might have been too focused on one thing.
<rlb>Anyway, could still be a dead end, because as we know, strings are much lower down in the "bootstrapping" than some other things.
<rlb>(dead end without broader changes)
<rlb>Discussion just got me wondering again.
<rlb>The flexibility of some of the srfi-13/14 apis might be a lot easier in scheme, even if all the core "ok now do it" code, once you figure out *what* you're doing was still in C, etc.
<rlb>But it's not important now, since we already have everything.
<rlb>*and* we may need some of that all in C, i.e. for things that are part of the libguile C API.
<rlb>(iirc the various filter operations were "notable work" in C)
<rlb>...anyway, can always, also, do that later, i.e. migrate things to scheme if/when we like (and can).
<rlb>(I do favor, at least in general, only having the necessary heavy lifting in C, when feasible.)
<old>More icing, less cake!
<rlb>yeah, for someone who really *likes* lisps, I've been spending an awful lot of time in C...
<old>ehh I just realized that set! is not working on bindings introduced by with-modules
<old>not sure if we want this feature, but I need to add support for that if so
<rlb>Is (with-modules ...) roughly just a scoped way to do what you typically do with use-modules, etc.?
<rlb>ACTION will go read the docs...
<old>well, yeah and also it does not import the module
<old>(with-modules ((srfi srfi-1)) (fold + 0 '(1 2 3))) => ((@ (srfi srfi-1) fold) + 0 '(1 2 3))
<old>but srfi-1 is not pulled into the list of imported modules
<rlb>right -- it's only visible within that scope
<rlb>I guess offhand, I might expect set! to work, but it's a new thing, so...
<old>the question is if we also want to be able to do: (with-modules ((srfi srfi-1)) (set! fold my-fold)) => (set! (@ (srfi srfi-1) fold) my-fold)
<dsmith-work>Does the set! depend on if it's a declarative module or not?
<rlb>I'd guess there might be corner cases where that'd be "useful", but either way, could ping the discussion...
<rlb>(e.g. see if civodul has a strong opinion --- I'm not up to speed enough to have one myself atm.)
<old>dsmith-work: no the expander does not verify this
<old>so it's the same as if a user manually does (set! (@ (srfi srfi-1) fold) #f), AFAIK declarative modules do not change anything about this
<old>rlb: I did a ping on the PRs
<old>I'm putting it on hold for now. Adding the set! feature is not difficult at all
<rlb>oh, ok --- haven't seen my mail in a bit
<dsmith-work>old, Oh, I didn't mean set! was special or anything. In olden times, everything exported from a module was mutable, and so the compiler couldn't inline things. Recent changes allow that inlining.
<old>indeed with declarative modules and cross-module inlining that can happen
<old>so, if the compiler inlined function F imported from module B inside module A
<old>then if module C changes that binding at runtime using the setter, A won't see it
<rlb>just noticed that string-every-c-code is another example of something that would benefit from our hypothetical "internal only" name prefix, if nothing else (even better if it were just in an internal "startup" module), or something. Also TIL api-undocumented.texi...