IRC channel logs
2026-05-11.log
<rlb>I forgot about read-string/partial, etc. from rw.c --- they're a bit odd, and/or less desirable with utf8? i.e. they read the fd bytes directly into a string, which in main means you get a latin-1 string containing the binary data. With utf-8, it still works, but is likely wasteful.
<rlb>I wonder if they should be deprecated, i.e. likely want bytevectors.
<ArneBab>old: getting out of GC as bottleneck was pretty hard in low-level code, that’s why I have hopes for whippet.
<old>wondering if we could have non-heap flonums like for fixnums
<old>rlb: where is this defined? Is this only in the utf-8 branch?
<probie>If I want an array of unboxed doubles, I should use a bytevector, shouldn't I?
<lloda>which is a bytevector, but that's somewhat of an implementation detail
<dthompson>old: non-heap flonums are only possible with unboxing because otherwise there isn't enough room in a word
<probie>Ah. At some point I should actually read the Guile manual from end to end instead of skimming through parts of it
<old>dthompson: what about single precision floating point
<old>these can be encoded on a single word on 64-bit machines
<dthompson>old: guile's flonums are exclusively doubles
<old>hmm I remember reading about a technique with NaN
<dthompson>iirc that technique is not applicable to guile
<dthompson>but I'd have to hunt for the wingo blog post that explains it
<dthompson>that mentions nan boxing but doesn't explain why it's not a good fit for guile
<identity>«I looked at doing nan-boxing for Guile, and eventually decided against it. Guile has a few more constraints that V8 does not have. Firstly it is very portable, so it can't rely the address range constraints that x86-64 has, not without some hackery. It's also portable to the low end, like V8, so 64-bit values are a lose there. But the killer is the conservative GC on 32-bit systems.
[…]» https://wingolog.org/archives/2011/05/18/value-representation-in-javascript-implementations
<old>But maybe Whippet would remove that killer thing
<dthompson>in practice I find unboxing + JIT = good flonum performance
<old>that requires the compiler to see everything
<old>> Contrary to NaN-boxing, the newly obtained representation does not impact the performance of encoding and decoding other tagged objects. It is also applicable in the context of single-precision floats on 32-bit architectures, which NaN-boxing does not support.
<old>would need to see if this self-tagging thing could be done
<dthompson>old: even if flonums could be made to fit in a word, unboxing would still be needed for good performance
<dthompson>unoptimized flonum math would get faster, which is good
<dthompson>but all the type dispatch overhead will still occur
<probie>Out of curiosity, (since people are talking about flonum performance) what would have to happen for Guile to be able to generate SIMD instructions?
<dthompson>I'm far from an expert on SIMD, but I see two paths: 1) compiler magic to detect math that can be SIMD-optimized 2) explicit SIMD vector api
<old>dthompson: right. This is why I am adding rnrs flonum? as a primcall. From my testing, it can make some code much faster that way. Would be nice to see if it improves the performance of chickadee
<old>vs the (and real? inexact?)
<dthompson>old: would be great to have that as a primitive! the reason I don't use any of the rnrs monomorphic number stuff is because guile doesn't optimize it
<dthompson>chickadee may see marginal improvements but all the critical stuff has already been optimized to use unboxed ops
<old>dthompson: Well I did add fixnum? as a primcall last week
<old>Now I'm adding flonum?
<dthompson>but being able to ask flonum? leads to nicer code
<old>But I think what's truly missing is type inference in CPS
<old>The compiler ought to optimize the usual (and real? inexact?) to a simple heap-tag=?
+ flonum?
<old>so I am making flonum? a primcall directly. But I would like eventually to revisit this and do the lifting in CPS
<old>probie: I think that usually a compiler can detect an access pattern, say in a loop, and generate some SIMD for it, if available. This can be done through ifunc in C
<old>probie: For Guile, I assume that it would be possible to do something similar and decide at runtime what to JIT
<dthompson>prescheme is not guile, of course, it's a lower level language that compiles to C, but it was interesting.
<dthompson>prescheme doesn't have simd built in, but it was trivial to add bindings
<dthompson>my general feeling is that code that is so performance sensitive that it requires SIMD is probably the type of code that benefits from explicitness rather than hoping the compiler optimizes it the way you want.
<dthompson>like maybe once old has finished all these flonum improvements I will go through some of my chickadee code and change + to fx+, etc.
<dthompson>if there were simd procedures I could then optimize my 4x4 matrix math routines, for example
<old>dthompson: Actually I think you're better off staying with generic operators tbh
<old>In my experience, with sha256 at least, using fixnum arithmetic was slower than the generic one
<old>In theory. In practice not everything gets JIT
<dthompson>it's because the compiler is not aware of monomorphic arithmetic procedures
<dthompson>if it was then it would be just as fast or faster, depending on the circumstance
<old>I was very surprised for example that fx+ was not emitted as a `add' instruction I think
<old>but was instead a call
<dthompson>yeah most of the rnrs stuff is there for compatibility with standards but they don't get optimized the way the core stuff is
<probie>So, for a bit of context as to why I'm asking, I thought I'd try implementing an APL in Guile.
A lot of operations would benefit from SIMD instructions
<dthompson>it would be quite a bit of work to get simd into guile
<old>well if there are vectorized primitives
<old>that would require a new type I guess
<old>that would be the easy path. The hard path is recognizing current Guile code and optimizing it for SIMD access
<dthompson>an explicit api for a v128 type would be nice
<dthompson>I say "type" but it would need to avoid allocation of new heap objects to be useful
<identity>probie: that got me thinking: does Guile have a sideways sum procedure?
<identity>it does not seem to. would be nice to see JIT turn that into a single instruction…
<old>there are many low hanging fruit that JIT seems to be missing
<old>I think it was right shift or something like this
<dsmith>anyone have a log of when the bot went away?
<sneek>I've been aware for 2 minutes and 39 seconds
<sneek>This system has been up 3 minutes
<rlb>Note that it reads *bytes* without decoding into a latin-1 string.
<rlb>A trick that probably only makes sense if you have a latin-1 string variant?
<rlb>(rejecting wide strings)
<old>hmm yet another undocumented interface
<old>I think it should be bytevector
<rlb>I wondered if newer bytevector functions supersede it, but haven't compared closely yet.
<old>oh this is a mutation string
<rlb>Yeah, I wondered if it might predate bytevectors.
<ekaitz>rlb: doesn't your string implementation use bytevectors internally?
<rlb>The fundamental store is now the "basis string", and other variants build on that.
<rlb>I feel like strings are fundamental enough that the common case is worth optimizing, i.e. if we can have fully inline strings (cache-wise, etc.) in some cases, without a lot more complexity, we should, and of course the variable length encoding is the other larger constraint on the options.
<rlb>In this case, basis strings perhaps reduce complexity a bit --- they eliminate the need for the "mostly internal only" stringbufs we used to have.
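A rough sketch of rlb's point about raw bytes ending up in a latin-1 string, written in C since rw.c is C. It is not Guile's code: `latin1_to_utf8` and `utf8_size_of_binary` are hypothetical helpers, and reading "wasteful" as "bytes above 0x7F grow to two bytes once the string is stored as UTF-8" is an assumption about what was meant.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encode one Latin-1 character (code point 0..255)
   as UTF-8.  Returns the number of bytes written to out (1 or 2). */
static size_t latin1_to_utf8(unsigned char byte, unsigned char out[2])
{
    if (byte < 0x80) {
        out[0] = byte;                 /* ASCII range: stored unchanged */
        return 1;
    }
    /* U+0080..U+00FF need a two-byte sequence: 110xxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xC0 | (byte >> 6));
    out[1] = (unsigned char)(0x80 | (byte & 0x3F));
    return 2;
}

/* Hypothetical helper: size of n raw bytes once each byte is treated as
   a Latin-1 character and stored as UTF-8. */
static size_t utf8_size_of_binary(const unsigned char *data, size_t n)
{
    assert(data != NULL || n == 0);
    size_t total = 0;
    unsigned char tmp[2];
    for (size_t i = 0; i < n; i++)
        total += latin1_to_utf8(data[i], tmp);
    return total;
}
```

Under that reading, the four bytes 00 7F 80 FF occupy six bytes as UTF-8, whereas a bytevector would hold them as four bytes with no character interpretation at all.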
<ekaitz>and what are bytevectors implemented with? a separate thing?
<rlb>Yeah, they've always been independent, though in guile vectors are "fancy".
<ekaitz>pretty much everything in guile feels fancy
<rlb>i.e. bytevectors, homogeneous vectors, and general purpose vectors all work with a subset of the vector- functions.
<old>if utf-8 was not a variable length encoding, bytevector could be a good fit
<old>as long as the characters are properly aligned (not across a cache line boundary) so that concurrent reads/mutations are consistent
<old>but since utf-8 is VLE, mutation of a string must be done differently
<rlb>Heh, if UTF-8 were not variable length, we'd need a whole lot less of all kinds of things :)
<ekaitz>old: could you elaborate on that?
<old>There's an open issue on this hmm
<ekaitz>ACTION is trying to learn, not confront
<ekaitz>my first thought would be that utf-8 is not really a variable length encoding in a sense. It's just bytes and they are read one by one.
<old>but basically, because of VLE, you need to be consistent with the order of stores and loads
<identity>if UTF-8 was not variable-length, it would be UTF-32, and then we would need a whole lot more memory
<old>compiler and CPU wise
<identity>(or worse, return to US-ASCII/50 million encodings)
<ekaitz>the order of stores and loads -> that is very interesting
<old>The processor is free to re-order loads and stores as it sees fit, depending on the architecture. If some thread mutates say 2 bytes to change a character, a reader could see only one byte being modified, out of order
<old>resulting in wrong encoding (invalid character)
<old>All encodings that are fixed-length are fine as long as the stores/loads do not overlap two cache lines
<identity>ekaitz: ‹bytes› is not an encoding; UTF-8 is not just bytes, but an encoding of Unicode codepoints as 1–4 bytes.
hence, the length of a codepoint—the basic unit of text—is variable
<old>if you try to read a character that is 4 bytes and it spans two cache lines, then you might see some out-of-order result wrt a store to that character
<old>the only safe encodings for this are ASCII and latin-1 because characters are 1 byte and thus can't cross a cache-line boundary
<ekaitz>identity: i know, i know, but I was wondering at a different level
<old>this is why rlb and I think we need a mutation model that is different. Instead of in-place mutation of the buffer, you read the buffer, copy it, modify the copy, and publish the result atomically
<ekaitz>what i see from all this is the issue is not with the vle but the fact that you don't edit the full piece of information in one go
<ekaitz>you could do the same in something that has a fixed length
<ekaitz>but yeah, i really understood. That was a really interesting point i didn't think about.
<old>indeed, fixed length are not bulletproof
<ekaitz>ACTION never thinks about multithread/multiprocess
<old>the main problem with VLE wrt this is the unaligned aspect
<identity>to think about concurrent processes properly, you need at least 2 people thinking about them at the same time
<old>it's much easier to think of concurrent processes if no mutations are involved
<old>unfortunately, that is not possible at the lowest level
<rlb>Updated utf8 --- now has a standard way to handle building result strings that automatically starts on the stack and overflows to heap fragments when necessary; reworked some additional functions like read-delimited, get-string-all, etc. to use it. Also have added, but not pushed, srfi-152 string-map support for the fn returning strings in addition to chars.
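The "read the buffer, copy it, modify the copy, and publish the result atomically" model old describes could look roughly like the following C11 sketch. Everything here is hypothetical and is not taken from Guile or the utf8 branch: the `snapshot` and `shared_string` types, the function names, and the choice of a compare-and-swap on a pointer are all assumptions made for illustration. The sketch also never frees the superseded snapshot, on the assumption that a garbage collector reclaims it once no reader holds it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical immutable snapshot of a UTF-8 buffer; never modified
   after it has been published. */
struct snapshot {
    size_t len;
    unsigned char bytes[];             /* flexible array member */
};

/* Hypothetical string object: readers only ever load this pointer. */
struct shared_string {
    _Atomic(struct snapshot *) current;
};

static struct snapshot *snapshot_new(const unsigned char *bytes, size_t len)
{
    struct snapshot *s = malloc(sizeof *s + len);
    assert(s != NULL);
    s->len = len;
    memcpy(s->bytes, bytes, len);
    return s;
}

/* Replace bytes [pos, pos + old_len) with repl[0 .. new_len) by building a
   fresh snapshot and publishing it with a single compare-and-swap.
   Returns 1 on success, 0 if another writer published first (the caller
   may retry).  A reader sees either the old or the new snapshot in full,
   never a half-written multi-byte character. */
static int shared_string_replace(struct shared_string *str,
                                 size_t pos, size_t old_len,
                                 const unsigned char *repl, size_t new_len)
{
    struct snapshot *old =
        atomic_load_explicit(&str->current, memory_order_acquire);
    assert(pos + old_len <= old->len);

    size_t len = old->len - old_len + new_len;
    struct snapshot *fresh = malloc(sizeof *fresh + len);
    assert(fresh != NULL);
    fresh->len = len;
    memcpy(fresh->bytes, old->bytes, pos);
    memcpy(fresh->bytes + pos, repl, new_len);
    memcpy(fresh->bytes + pos + new_len,
           old->bytes + pos + old_len, old->len - pos - old_len);

    if (atomic_compare_exchange_strong_explicit(
            &str->current, &old, fresh,
            memory_order_acq_rel, memory_order_acquire))
        return 1;   /* old snapshot is left to the (assumed) collector */
    free(fresh);    /* lost the race; nothing was published */
    return 0;
}
```

Replacing the one-byte `e` in `cafe` with the two-byte UTF-8 sequence C3 A9 changes the buffer length from 4 to 5, which is the case an in-place store cannot handle; here the length and the bytes become visible together because they live in the same freshly published snapshot.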