IRC channel logs
2024-07-21.log
<ArneBab>old: already 7 years ago … it’s crazy (and humbling) that I started with Enter Three Witches before Larian began development of Baldur’s Gate 3.
<cow_2001>"If this all sounds very complicated, it’s because it is. Some of it is essential, but probably most of it is not. The advantage of using this minimal interface is that composability is more lexically apparent than when, for example, using a stateful interface based on GOOPS. But perhaps this reflects the cognitive limitations of the programmer who made the current interface more than anything
<cow_2001>is there a tutorial for tracing and breakpoints and traps?
<cow_2001>the manual tells me a lot of stuff but with very few examples
<rlb>dthompson: ok, if you end up having time for a look here's what I currently have for the srfi-1 conversion to scheme: https://codeberg.org/rlb/guile/commits/branch/scm-srfi-1 It's the top N commits there. If you happen to know about any of the relevant bits and want to look at the other four too, so much the better :) (of course, anyone else too)
<rlb>I think all the commits should pass the tests (better for any future bisection to find my mistakes...).
<dthompson>rlb: awesome, I will take a look when I get the chance!
<rlb>certainly. Thanks for the review.
<rlb>(A couple of those early commits are just needed to fix (parallel) builds from a clean (i.e. git clean -nfdx) tree.)
<rlb>I'll probably push them in a while if I don't hear anything else on the list about them. Can always fix it later (my autofoo is middling).
<rlb>dthompson: oh, well don't on my account :) I'm happy to put that somewhere else you'd prefer. And I didn't raise it as a pr, but I could. Also happy if you prefer to send mail, badger me here, or...
<rlb>(Oh, and wrt "push them" - meant the early build-related patches, but suppose that could eventually apply for the srfi-1 changes too.)
<dthompson>I used gitlab sso to get in. that was convenient.
<dthompson>I guess these need to be in a PR in order to make comments
<rlb>Ahh, hadn't noticed that was a thing.
<rlb>Yeah, I'd assume. Haven't used codeberg much yet.
<rlb>(I mean for collaboration.)
<rlb>Right, I'll go see about it.
<rlb>It's not fond of our ^Ls :)
<rlb>wingo, civodul: unrelated - I wonder if when we set up our locale/decoding we knew that the ESCAPE handler we use didn't mean "undecodable bytes", only *characters* (for bytes it just falls back to #\?). I suspect we might not have, but just a guess. Personally, I'd like to change all of that to be "data preserving" by default (i.e. error if it can't keep your data), but don't want to spend too much time on it without a consensus.
<civodul>rlb: as discussed earlier, “data preservation” sounds like a worthy goal to me as well
<rlb>Of course, even if escape escaped "everything", I still wouldn't want it as the default, since (to me), it's "potentially unexpected data corruption".
<rlb>civodul: ok, great, I wasn't sure that's where we landed.
<rlb>civodul: and then the next question is what guidelines, if any, there are wrt backward compat, and/or opt in/out. From "we could just say that's a bug and change it in a Z", to "defaults can only be changed in an X, but we might have opt-ins like GUILE_LOCALE_CODING=strict or whatever".
<rlb>Don't have to answer that now -- just need to have some guidelines before I can come up with a plan.
<rlb>(a good one anyway :) )
<rlb>Personally, I might be closer to the "ok to break it since if it's affecting you, it's currently messing with your data" end of the spectrum :)
<rlb>Possible step one in script.c:
<rlb> lst = scm_cons (scm_from_stringn (argv[i], (size_t) -1, encoding,
<rlb>- SCM_FAILED_CONVERSION_ESCAPE_SEQUENCE),
<rlb>+ SCM_FAILED_CONVERSION_ERROR),
<TCZ>rlb: civodul dthompson im banned on lispcafe ok
<TCZ>rlb no problem continue
<rlb>Oh, I just realized I might also be behind a while wrt the list...
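The "data preserving" behaviour rlb asks for above (fail loudly instead of substituting an escape or `?`) can be sketched outside Guile. This is only an illustration using POSIX `iconv` as a stand-in for libguile's own conversion; `decodes_cleanly` is a hypothetical name, and the `UCS-4LE` target is an arbitrary choice made just to force full decoding.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper, not part of libguile.
   Returns 1 if every byte of `bytes` decodes in `encoding`,
   0 if some byte sequence is invalid or incomplete,
   -1 if the encoding itself is not supported by iconv. */
int decodes_cleanly(const char *bytes, size_t len, const char *encoding)
{
    assert(bytes != NULL && encoding != NULL);

    iconv_t cd = iconv_open("UCS-4LE", encoding);
    if (cd == (iconv_t)-1)
        return -1;

    char *in = (char *)bytes;   /* iconv does not modify the input bytes */
    size_t in_left = len;
    int ok = 1;

    while (in_left > 0) {
        char scratch[256];      /* decoded output is discarded */
        char *out = scratch;
        size_t out_left = sizeof scratch;

        if (iconv(cd, &in, &in_left, &out, &out_left) == (size_t)-1
            && errno != E2BIG) {
            /* EILSEQ (invalid) or EINVAL (truncated): refuse, do not patch up */
            ok = 0;
            break;
        }
        /* E2BIG only means the scratch buffer filled up; loop again. */
    }

    iconv_close(cd);
    return ok;
}
```

A caller in the spirit of the `script.c` change would treat a 0 result as an error for that `argv[i]` rather than continuing with altered data.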
<dthompson>rlb: I didn't read through everything in detail but I left some comments. I'll let the passing test suite be the indicator of correctness. :)
<dthompson>mostly I'd just like to eliminate some usage of set-cdr! where it doesn't involve rewriting complicated algorithms
<rlb>OK -- in general I only intended to use it in places where it'd avoid a potentially cache-unfriendly reverse!, etc. Since this was srfi-1, I put the bar a little higher on "mess worth it for perf".
<rlb>Whichever way means "plausible to tolerate it for something that's fairly fundamental, I assume" :)
<dthompson>there's little need for reverse these days since guile doesn't stack overflow
<rlb>If you're building something in the "wrong order" there is? Maybe I misunderstand.
<dthompson>but for example, in list-copy, you can just cons and recur, no need for reverse or set-cdr!
<rlb>Hmm, maybe I'm too used to worrying about other lisps there...
<rlb>That doesn't potentially cause heap allocation and/or "extra work at the end"? If not, then of course I'll need to rework a good bit of the series :)
<rlb>Because then mutation might actually be worse wrt perf...
<rlb>ACTION will be happy
<rlb>(and a little sad he didn't realize it sooner...)
<dthompson>it's no problem, basically every other language imposes a strict stack limit on you
<rlb>Well, I also don't want to grow some transient representation on the heap if I don't have to for this, I think... But I'll figure it out either way.
<dthompson>it's harder to use this strategy in the case of, say, partition
<dthompson>I don't see a way to avoid reverse there. but better to do a reverse at the end rather than set-cdr! all along the way
<dthompson>in partition you could just reverse the input list and then the 2 accumulators will be in the correct order upon termination
<rlb>I'm not sure -- i.e. for "big enough" lists I suspect the reverse approach may be a lot more expensive wrt the host cache, memory hierarchy, etc.
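dthompson's partition suggestion above (reverse the input once, then cons onto two accumulators so both come out in the original order) can be sketched as follows. This is C standing in for Scheme pairs, not the srfi-1 code under review; `struct cell`, `cons`, `reverse_copy`, `free_list`, `is_even` and `partition` are all hypothetical names, and memory is freed by hand where Guile would rely on its garbage collector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a Scheme pair holding an int. */
struct cell {
    int car;
    struct cell *cdr;
};

struct cell *cons(int car, struct cell *cdr)
{
    struct cell *c = malloc(sizeof *c);
    assert(c != NULL);
    c->car = car;
    c->cdr = cdr;
    return c;
}

/* Non-destructive reverse: allocates one fresh cell per element. */
struct cell *reverse_copy(const struct cell *lst)
{
    struct cell *acc = NULL;
    for (; lst != NULL; lst = lst->cdr)
        acc = cons(lst->car, acc);
    return acc;
}

void free_list(struct cell *lst)
{
    while (lst != NULL) {
        struct cell *next = lst->cdr;
        free(lst);
        lst = next;
    }
}

/* Example predicate used to split a list. */
int is_even(int n)
{
    return n % 2 == 0;
}

/* Walk a reversed copy of the input and push each element onto the
   front of one of two accumulators. Because the walk runs back to
   front, both results end up in the input's original order, with no
   mutation of existing cells and no reverse at the end. */
void partition(int (*pred)(int), const struct cell *lst,
               struct cell **matching, struct cell **rest)
{
    struct cell *rev = reverse_copy(lst);
    struct cell *yes = NULL;
    struct cell *no = NULL;

    for (const struct cell *p = rev; p != NULL; p = p->cdr) {
        if (pred(p->car))
            yes = cons(p->car, yes);
        else
            no = cons(p->car, no);
    }

    free_list(rev);   /* the transient copy rlb is worried about */
    *matching = yes;
    *rest = no;
}
```

The reversed copy is the cost rlb raises: for the duration of the call it roughly doubles the number of live cells, which is the trade against the set-cdr! tail-pointer style.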
<rlb>whether reverse or reverse!
<rlb>(the latter even more so because it doubles the heap size, maybe)
<rlb>Though Andy makes a good point wrt continuation safety too (whether that's a requirement or not).
<dthompson>yeah avoiding mutation is good for continuation safety
<rlb>I guess some of this may come down to what "guidelines" we want for core code like srfi-1, i.e. potential memory or cpu perf vs continuation-safety, etc. I'll finish reading (or re-reading?) the article and then ponder. Again - much appreciated.
<rlb>...course, we can also approach that incrementally -- even if we went with the similar to C set-cdr! tilt, we can always rework it later (or vice versa).
<dthompson>perf is important, of course, but much like how we no longer need to write C to get performant code, there's also less need to reach for mutation of pairs
<dthompson>I wouldn't want to bikeshed this, though. if 1:1 translations from C to Scheme get this into main quicker, let's do that.
<rlb>Sure wrt perf, though if the transient heap size were doubled for one vs the other, that's a big tradeoff we'll have to settle.
<rlb>It'd make some data sets untenable that'd otherwise be "fine", but it'd break continuations -- who wins? :)
<dthompson>at a certain scale, should you be using linked lists? probably not
<rlb>Hah, yeah, they tend to be harder on the cache (for example) anyway.
<rlb>(The set-cdr!y code also has the added "maintenance problem", i.e. it's a lot trickier...)
<dthompson>yeah I have to think a lot harder to understand them. but it's okay. :) just happy to have them in scheme.
<rlb>ACTION agrees wrt call-with-values :)
<dthompson>heh yeah just feels like srfi-1, of all srfis, should stand on its own
<rlb>OK, if I'm thinking about it right, then the set-cdr! approach will be memory and maybe cache friendly, but likely continuation, possibly concurrency (and maybe optimization?) hostile.
wrt memory, the non-mutating version will likely grow a much larger transient stack (...contemplating what's on it, i.e. still might not be *large*?).
<rlb>And the reverse! based variants are worst of both? i.e. larger stack, *and* blows out the cache at the end?
<dthompson>yeah reverse should be avoided wherever possible.
<rlb>Also, if I read the post right, the way the stacks work on the happy path, they're fairly cheap.
<rlb>...or could be, i.e. wrt break() and madv don't need.
<rlb>as long as we don't end up having to relocate
<rlb>Not sure wrt the native compilation speculation, but that's still a way off anyhow, I suspect :)
<rlb>(at the end of the article)
<dthompson>yes it's better to just let the runtime allocate more stack as needed
<dthompson>in a way we're kinda dealing with the problem at the end of the article with wasm
<dthompson>it's not the exact same problem, but similar
<rlb>So barring the continuation/concurrency question, I'm leaning toward thinking that the set-cdr! approach is possibly a bit "better" for critical/internal bits as long as you're OK with the (ideally) "one time" complexity -- i.e. assuming you don't have to revisit the code much. *But* we may want to weigh the continuation/concurrency friendliness high enough to win.
<rlb>My thinking there is just that ignoring the complexity, it may perform better both wrt memory (size and cache (at all levels) friendliness) and cpu (in cases where it leaves you with the final answer without additional work). Though might should temper that a bit with questions about many-core systems and mutation :)
<rlb>OK - I have a couple of alternates for various functions here that I "kept" to see how review went :)
<rlb>(I also have some from lokke, but those often differ too much to be directly useful.)
<rlb>And there you're *very* unlikely to even be asking about mutation :)
<rlb>(plus or minus transients, which also don't have to worry at all about call/cc...)
<rlb>ACTION notes lokke also has no transient support atm -- no idea if it ever will.
<rlb>That partition certainly looks a *lot* better :)
<dthompson>yeah, we don't have to get into set-cdr! vs. reverse! questions if we just don't use them. :) both are gross
<rlb>Yeah, overall agree with having a high(er) bar for any mutation. And offhand, if/when we do choose mutation here, I think set-cdr! (ignoring complexity) might just mostly win over reverse! based approaches, since both of them lose on the "mutation" front, but set-cdr! approaches have the potential cache/cpu/heap advantages.
<dthompson>idk maybe. I'd reach for reverse (no !) and a functional algorithm first :)
<rlb>(If so, I might try to sketch some flavor of a migration.)
<rlb>(or augmentation, depending...)
<civodul>rlb: back in the day it was missing on some platforms
<rlb>OK, well perhaps I'm mistaken, but I feel like we might need it for a plausible way to say that for now people who need correct system data should just use latin-1.
<rlb>i.e. if you're making a million fine-grain system calls, probably can't just setlocale with a lock.
<rlb>(unless I misunderstand)
<rlb>I'd also be happy to make the support optional/conditional if there's a sensible path forward there.
<civodul>oh i thought you were talking about libguile/i18n.c, which does use ‘uselocale’ where available
<mwette>rlb: Are you saying some code may need to run using latin-1?
<rlb>Sure, every system call that involves paths or envt vars, for example, if you want accurate data.
<rlb>Oh, hmm, civodul maybe I haven't investigated enough yet.
<rlb>i.e. I want to figure out what the next steps are for things like opendir, readdir, getenv, and on and on...
<mwette>I used to use LANG=C, but I discovered newer versions of GNOME won't function with that. I had to use LANG=utf-8.
<rlb>Yep, and won't work well if you also need to communicate via http
<mwette>Did you read the email on the emacs solution?
<mwette>If that is in the right context.
<rlb>mwette: not in detail yet, because as I commented, that's more of a "boil the ocean" solution, and I was under the impression that that was not in scope for now.
<rlb>And if it *is* in scope, there are any number of fancy possibilities, *maybe* including that one.
<civodul>rlb: i think we discussed it before and one idea was to have a fluid specifying the encoding that these wrappers would use?
<civodul>and it’d default to current locale encoding
<rlb>OK, that's what I'm double-checking -- more or less. Is the (general) idea that you'd have that fluid, and refer to it in every syscall, and then make suitable uselocale calls to establish whatever it specifies?
<rlb>(on the "hand waving" level)
<rlb>I guess uselocale, or possibly fall back to setlocale with a lock, with the understanding that the latter will be pretty expensive, relatively (I'd assume).
<rlb>I also wondered whether it might be reasonable to expose uselocale when available, for broader use.
<rlb>(Though of course if we did that, then on platforms that had it, you might not need the fluid...)
<rlb>Also, don't let me take up too much of your time with this if you're swamped -- this can wait. I'd just been wanting to have it myself :) But I can fairly easily defer.
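The per-call `uselocale` idea floated above (switch only the calling thread's locale around a wrapper, then restore it) might look roughly like this in plain POSIX C. It is a sketch under assumptions, not libguile code: `call_with_thread_locale` and `note_thread_locale` are hypothetical names, the locale name stands in for whatever the proposed fluid would hold, and it assumes a platform that declares `newlocale`/`uselocale` in `<locale.h>` (glibc does; some systems need `<xlocale.h>`).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: run fn(arg) with `name` installed as the calling
   thread's locale, then restore the locale the thread had before.
   Other threads are unaffected, so no global lock is needed.
   Returns 0 on success, -1 if the locale could not be created or set. */
int call_with_thread_locale(const char *name, void (*fn)(void *), void *arg)
{
    assert(name != NULL && fn != NULL);

    locale_t wanted = newlocale(LC_ALL_MASK, name, (locale_t)0);
    if (wanted == (locale_t)0)
        return -1;

    locale_t previous = uselocale(wanted);
    if (previous == (locale_t)0) {
        freelocale(wanted);
        return -1;
    }

    fn(arg);                /* e.g. a wrapper around readdir or getenv */

    uselocale(previous);    /* restore before freeing the temporary locale */
    freelocale(wanted);
    return 0;
}

/* Example callback: records whether a thread-specific locale is active. */
void note_thread_locale(void *arg)
{
    *(int *)arg = (uselocale((locale_t)0) != LC_GLOBAL_LOCALE);
}
```

Creating and freeing a locale object on every call is itself not free, so a real wrapper would probably cache the `locale_t`; the `setlocale`-with-a-lock fallback mentioned above would replace the two `uselocale` calls on platforms that lack them.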