***sneek_ is now known as sneek
<lloda>wingo, could you have a look at the array patches?
<wingo>lloda: i started looking at them last night
<wingo>i got a few patches in, will continue tonight
<lloda>wingo: there's this incompatible change where I made vector = simple vector. I hesitated, but I pushed forward b/c I couldn't get much agreement on the ml. You be the judge.
<wingo>thank you and again, apologies for the delay!
<mark_weaver>I also tried to review your patches long ago, but I got hung up because of many changes that I wasn't sure about.
<b4283>i was just using the assoc-list api, and got stuck because i thought assq-set! always mutates the list
<mark_weaver>I don't remember the details, but at least some of the changes seemed questionable to me.
<lloda>they probably are, I don't disagree
<b4283>is it intentional that users must use the common gateway "set!" because of compiler optimization reasons?
<mark_weaver>b4283: well, for starters, there's no way to mutate an empty list.
<mark_weaver>and if there's no matching association to mutate, then it gets added to the front, which again cannot be done by mutation.
<mark_weaver>so it's not really about compiler optimization at all.
<mark_weaver>it's just due to the nature of scheme lists. you must always 'set!'.
<b4283>it's confusing to have a ! in assq-set
<b4283>but i guess that's required by r?rs anyways
<mark_weaver>well, it does mutate the existing alist in some cases, so it's needed.
<mark_weaver>if the association is already in the list, then it's mutated.
<mark_weaver>if it's not, then there's nothing to mutate. it's just added to the front.
<b4283>mark_weaver: i get it, thanks for the explanation
<mark_weaver>lloda: can you explain the nature of the incompatibility? I don't quite know what you mean by "vector = simple vector". I understand the distinction, but I don't know what that equation means.
<lloda>it'll take me a minute to recall it
<wingo>i can push an up-to-date patchset...
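The behaviour mark_weaver describes for assq-set! can be seen in a short session. This is only a sketch: the variable name `settings` and the keys are made up for illustration, and the comments show what the explanation above leads one to expect.

```scheme
;; Illustrative only: SETTINGS and its keys are hypothetical names.
(define settings '())

;; An empty list cannot be mutated, so the new association only exists
;; in the return value.  Without set!, SETTINGS is unchanged.
(assq-set! settings 'color 'red)
settings                                ; still ()

;; Storing the result back is what actually updates the variable.
(set! settings (assq-set! settings 'color 'red))
settings                                ; ((color . red))

;; The key is present now, so the existing pair is mutated in place.
;; Using set! anyway is harmless and keeps the calling pattern uniform.
(set! settings (assq-set! settings 'color 'blue))

;; A key that is not present is added to the front of the list.
(set! settings (assq-set! settings 'size 10))
settings                                ; ((size . 10) (color . blue))
```

Always wrapping the call in set! means the caller never has to know which of the two cases applied.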
<mark_weaver>b4283: we probably should make it more clear in the manual.
<wingo>mark_weaver, lloda: lloda-array-cleanup in master
<mark_weaver>we should probably emphasize that you should always use 'set!' in combination with those destructive alist procedures.
<mark_weaver>wingo: thanks, I'll take a look tomorrow (need to sleep soon).
<wingo>i removed the patches to compile-assembly.scm; they might need corresponding fixes to the new compiler, who knows
<wingo>i am relying on the tests to help me figure that out
<lloda>yes, iirc those things are tested.
<wingo>lloda: do you have commit access?
<wingo>hum, we should flip that bit anyway
<lloda>the changes to compile-assembly.scm were in reading array literals.
<mark_weaver>I have patches in r7rs-wip for compile-assembly.scm to handle cyclic literals.
<wingo>lloda: go into savannah if you would and request to be added to the guile group
<wingo>that will at least let you push branches to savannah, which is easier for everyone :)
<wingo>and i think bugfixes are welcome as well, though mark or ludo will correct me ;) just send them to the list and if you get no response within a week or so, push them
<wingo>mark_weaver: cyclic literals, interesting; i wonder if master can handle those...
<wingo>should be an easy fix, if not... the linker should handle it almost automagically
<lloda>mark_weaver: simple vectors are like bytevectors or uniform vectors, but for the SCM type.
<wingo>they are one-dimensional packed arrays of SCM values
<b4283>mark_weaver: the manual already clarified that "the only safe way to use it is through set!", just that i missed it in the first place :/
<wingo>there is SCM_IS_SIMPLE_VECTOR, etc...
<lloda>however, 'vector's can be simple vectors or certain kinds of arrays also
<lloda>so whenever you use a vector-ref, vector? etc, checks are made to see if the array passes as a vector 'functionally'
<lloda>I mean, if it's an array, and if that array passes as a vector
<lloda>so 'vector's are not a unique type.
<wingo>lloda: what was your thinking when you made vector-ref only work on simple vectors?
<mark_weaver>so your patches would make the 'vector-ref', 'vector-set!', and maybe 'vector?' procedures work only on simple vectors, not arrays, is that right?
<wingo>are you treating only the array interface as the all-singing polymorphic interface?
<lloda>the thinking was that vector = uniform-vector = bytevector, just with a different element type.
<mark_weaver>it means that we can generate much simpler code for the vector ops.
<lloda>singing and polymorphic go well together
<mark_weaver>especially when we have native compilation, that will be good.
<lloda>array is then the only polymorphic type.
<wingo>i've never known when any of those things should be polymorphic, and that sounds like a fine rule to me
<mark_weaver>so the array operations will work on vectors, but not vice versa, right?
<mark_weaver>and now I'm trying to remember what I found questionable. I guess I'll have to look through the patches again :)
<mark_weaver>lloda: well, thanks for your patience on this. I'm truly sorry that you've had to wait so long. it's just a daunting review job, that's all.
<lloda>nah, you're right it's messy code.
<mark_weaver>I'm going to try to take a close look in the next week though.
<wingo>mark_weaver: do you want to take the review?
<wingo>i was going to start on it but we shouldn't duplicate work
<mark_weaver>well, it's probably good for both of us to review it.
<wingo>as you like, it doesn't matter to me
<mark_weaver>please do review it, wingo. but also please give me a chance to review it before pushing, if you don't mind.
<wingo>mark_weaver: ok, i'll hold off functional changes for more review, but i will push bug-fixes and test things and similar
<wingo>still it seems like double-review is too much, but whatever
<mark_weaver>well, I think you're probably more familiar with that area of the code. but at the same time, I want to remember what I found questionable. I might not do a full review.
<mark_weaver>it's possible that I had 'stable-2.0' too much in mind when I tried to review it last time, and that the vector==simple-vector thing worried me. I hope that's the case.
<b4283>there's a nice song about sleeping well
<civodul>lloda: welcome to the Savannah group ;-)
<lloda>I've built lloda-array-cleanup. make check passes. I've tested my programs against it. Everything seems to work except for this message:
<lloda>;;; WARNING: compilation of [...]
<lloda>;;; ERROR: don't know how to intern #2f64()
<lloda>similar errors for other literals.
<lloda>I was on 2.0.9 before, so this is new to me.
***DerGuteM1 is now known as DerGuteMoritz
<civodul>just add it to the symbol hash table!
<lloda>it's lloda-array-cleanup which is on top of master. I want to blame master b/c I had the extra patches before on top of 2.0.9 and not this issue. But I haven't tested master yet.
<wingo>yes, probably the new compiler doesn't handle that for whatever reason
<wingo>also there is no symbol hash table in master; only a weak set :)
<civodul>wingo: right, but still, it does know how to intern things, doesn't it? :-)
<wingo>but as in stable-2.0, symbols are gc'd, so it's not really interning i guess
<jmd>Calling (link "x/y/z" "w/x/y/z") I get the error ERROR: In procedure link:
<jmd>ERROR: No such file or directory
<jmd>which file or directory does it think does not exist?
<civodul>the syscall doesn't provide more info
<jmd>Oh. Then is there a mkdir which will create the necessary subdirs?
<wingo>i wonder if we should use that
<civodul>systemd and Rust have it, so i guess we must
<wingo>it's to prevent hash collision attacks
<wingo>if we switch to utf-8 strings we can hash directly over the utf-8 bytes
<wingo>right now we have to be careful to hash over codepoints, since a given string can have multiple representations
<wingo>it would be nice to pre-compute hashes for strings and symbols that we residualize into object files...
<wingo>i reckon in that case we can just use a well-known seed
<wingo>of course statically creating a hash table would be nice, too :)
<mark_weaver>I've been thinking about statically creating the symbol table for symbols in core guile for a while, to speed up startup.
<mark_weaver>but so far it's just a thought. never really looked into it.
<wingo>it's possible; probably the biggest gains though would be pre-computing hashes and pre-allocating variable cells
<wingo>at least according to valgrind
<mark_weaver>I'm not sure why pre-allocating the variable cells would be more important than preallocating the hash table chain cells.
<wingo>the symbol table doesn't have chain cells
<mark_weaver>for that matter, in order to make 'equal?' and 'write' handle cycles without a severe performance regression, I'm going to need hash tables that don't allocate anything in the common case.
*wingo gets grumpy whenever he thinks about cycles
<mark_weaver>fortunately, in both cases, the elements are removed from the hash table in LIFO order, like a stack, which makes it much simpler.
<mark_weaver>I can do the thing where I put the elements directly in the array, and if that bucket is already full, I scan until I find a free entry.
<wingo>you might check out the weak table implementation then -- it uses an open-coded robin hood hashing scheme
<wingo>with 2/3 of the memory usage of a table
<wingo>(the weak set and table implementations store the raw hash value also)
<wingo>i guess we could have deterministic stringbuf hashes, but a string's hash or a symbol's hash would have to be mixed with a per-invocation private key
<wingo>dunno, just tossing around ideas on how to pre-compute hashes while not being a vulnerability
<mark_weaver>btw, what is the symbol table keyed on nowadays in master?
<wingo>the hash of the stringbuf ;)
<mark_weaver>I seem to recall it used to be keyed on the symbol itself, which I always thought was ridiculous.
<wingo>the hash of the stringbuf backing the string
<wingo>or more precisely, the hash of the codepoints composing the symbol
<wingo>but that hash is not a component of a stringbuf.
<wingo>well, right now it goes character by character because there are different encodings
<wingo>but if we had utf-8 strings it could just hash the utf-8 bytes
<wingo>though there are utf-8 specialized string hashers in master
<wingo>just not as fast as hashing bytes
<mark_weaver>after 2.0.10 is out the door, I have two guile priorities: fixing the thread-safety of module autoloading, and utf-8 strings come after that I think.
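The scheme mark_weaver sketches above (elements stored directly in the array, scanning forward to a free entry when a slot is taken, removal in LIFO order) might look roughly like the following. This is a hedged illustration, not Guile's implementation and not the robin hood scheme wingo mentions: the names `make-probe-set`, `probe-set-add!`, `probe-set-member?` and `probe-set-remove-last!` are hypothetical, the table has a fixed size, and it is assumed never to fill up completely.

```scheme
;; Sketch only: a fixed-size, eq?-keyed set using open addressing with
;; linear probing.  None of these procedures exist in Guile.

(define %free (list 'free))             ; unique marker for an empty slot

(define (make-probe-set size)
  (make-vector size %free))

;; Return the index of the slot holding KEY, or of the first free slot
;; on its probe path.  Assumes at least one slot is always free.
(define (probe-slot table key)
  (let ((size (vector-length table)))
    (let lp ((i (hashq key size)))
      (let ((entry (vector-ref table i)))
        (if (or (eq? entry %free) (eq? entry key))
            i
            (lp (modulo (+ i 1) size)))))))

(define (probe-set-add! table key)
  (vector-set! table (probe-slot table key) key))

(define (probe-set-member? table key)
  (eq? (vector-ref table (probe-slot table key)) key))

;; Only valid when KEY is the most recently added key still in the set.
;; Clearing its slot restores the table to the state it had before KEY
;; was added, so no tombstones or re-insertion are needed.
(define (probe-set-remove-last! table key)
  (vector-set! table (probe-slot table key) %free))

;; Usage: push three keys, then pop the last one.
(define seen (make-probe-set 8))
(probe-set-add! seen 'a)
(probe-set-add! seen 'b)
(probe-set-add! seen 'c)
(probe-set-remove-last! seen 'c)
```

Adding and removing never allocate once the vector exists, which is the property asked for in the common case; growing the table is left out of the sketch.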
<mark_weaver>one question: why do we have to go character by character, anyway?
<mark_weaver>the encoding that a string is in is deterministic, based on the contents of the string.
<mark_weaver>I wonder if shared substrings is actually a win in practice. somehow, I doubt it.
<wingo>depends on your use-case, i would think
<wingo>i want to do subbytevector now...
<mark_weaver>sure. if you take huge substrings of huge strings, then definitely a win. but how often is that, I wonder.
<wingo>without it, my file upload code has to store double the memory of the upload
<wingo>i think it's probably a win; marius took it out around 2006 or so but had to put it back in due to users complaining
<wingo>and v8 has like 20 kinds of strings
<wingo>so, i don't want 20 kinds of strings, but at least one project has deemed it important enough to invest lots of time on it
<wingo>ropes, substrings, byte strings, ucs-16 strings, etc etc
<wingo>but their strings are immutable, so that's a difference
<mark_weaver>I'm going to want to allow the underlying bytes of a utf-8 string to be exposed as a bytevector.
<mark_weaver>a large number of efficient utf-8 algorithms depend on working by byte.
<wingo>ok let's do it; probably it pays off
<civodul>exposing the internal representation?
<wingo>the thing you lose is some type-based optimization things
<wingo>civodul: for implementing algorithms in scheme
<civodul>i understand, but it has to remain an internal API
<mark_weaver>for example, searching can be done by bytes in utf-8. same for regexp searches.
<wingo>it's definitely a privileged function
<civodul>string->utf8 could return a COW bytevector
<wingo>we don't have COW bytevectors
<wingo>and i don't think we want them
<wingo>i think they would make too many things slow
<wingo>mark_weaver: yes that could work
<mark_weaver>yeah, I don't know of an algorithm that needs write access to the bytes.
<wingo>it still has a runtime cost (bytevector-u8-ref not implying that bytevector-u8-set! is valid) but that's probably ok
<mark_weaver>as soon as you need to write, then you might need to change the length and that's a mess anyway.
<wingo>ok let's do immutable bytevectors then
<wingo>we can do string->utf8/read-only
<wingo>though with immutable strings it could be that the backing store is in fact mutable
<wingo>but i think we can describe that adequately in the manual
<wingo>well you might want to provide read-only capabilities to a piece of memory, but that memory might change
<mark_weaver>when utf8 strings are mutated, that will be a slow path anyway.
<mark_weaver>the string will have to be broken up into blocks, and then reassembled when converted to utf-8.
<wingo>we could pessimize string-set!
<wingo>try to do something sensible but not care too much about it
<mark_weaver>right. we had a thread on this topic years ago, and I proposed a scheme that makes 'string-set!' constant-time but with a large constant.
<civodul>mark_weaver: you had posted arguments in favor of utf-8 internally, no?
<wingo>logarithmic string-set! is fine with me too
<mark_weaver>yeah, if we went logarithmic (for string-ref too), then we could probably do things like efficient concatenation as well.
<wingo>it's a terrible design space
<mark_weaver>civodul: "O(1) accessors for UTF-8 backed strings", March 2011
<civodul>mark_weaver: that's a message that says it's doable, not a message that says it's worthwhile :-)
<mark_weaver>yeah, I'm looking for the right message. that's not quite the right one.
<civodul>but yeah, the above has interesting arguments
<mark_weaver>well, I'm having trouble finding the message where I gave the best arguments. but basically: (1) it's a single representation, so binary string operations could be optimized without handling 4 different cases, (2) it would allow us to use libunistring for many of our string operations, (3) UTF-8 operations can typically be done byte-wise without difficulty, which makes things much faster and simpler.
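As an illustration of point (3), a substring search can compare UTF-8 bytes directly, because a validly encoded needle can only match at a code point boundary in a validly encoded haystack. The sketch below is an assumption-laden example: `utf8-byte-search` is a hypothetical name, and since the read-only view discussed above does not exist, it uses `string->utf8` from `(rnrs bytevectors)`, which copies the string.

```scheme
(use-modules (rnrs bytevectors))

;; Sketch only: UTF8-BYTE-SEARCH is a hypothetical helper.  It returns
;; the byte offset (not the character index) of the first occurrence of
;; NEEDLE in HAYSTACK, or #f if there is none.
(define (utf8-byte-search haystack needle)
  (let* ((h (string->utf8 haystack))    ; copies; a real version would
         (n (string->utf8 needle))      ; want to avoid this
         (hlen (bytevector-length h))
         (nlen (bytevector-length n)))
    (let outer ((start 0))
      (cond ((> (+ start nlen) hlen) #f)
            ;; Compare byte by byte; no decoding of code points needed.
            ((let inner ((i 0))
               (or (= i nlen)
                   (and (= (bytevector-u8-ref h (+ start i))
                           (bytevector-u8-ref n i))
                        (inner (+ i 1)))))
             start)
            (else (outer (+ start 1)))))))

;; "naïve café": the ï takes two bytes, so "café" starts at byte 7
;; even though it starts at character index 6.
(utf8-byte-search "na\u00efve caf\u00e9" "caf\u00e9")   ; => 7
```

The naive quadratic scan is chosen only for brevity; the point is that the inner loop never needs to know where code points begin or end.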
<mark_weaver>and of course, conversion to/from utf-8 is very fast, and that's the common case these days.
<mark_weaver>I don't know, but if it isn't, that should probably be fixed, no?
<mark_weaver>unicode has a lot of intricate algorithms, and it seems worthy of a library rather than each project duplicating that work.
<mark_weaver>handling the bare utf-8 code points is simple. but things like case conversion are nasty.
<civodul>wingo: it's dormant, but not unmaintained, i'd say
<mark_weaver>and at some point, people will want to be able to really deal with *characters* as opposed to code points for some things.
<mark_weaver>in fact, what people really think of as characters, e.g. corresponding to a single glyph when rendered, are actually multiple code points.
<mark_weaver>in fact the scheme standards kind of got it wrong to equate characters with code points.
<tupi>wingo: whenever you feel like it, could you add me to guile-gnome, thanks
<wingo>i see that your assignment came through, great
<wingo>i think that's why i didn't do it in the past
<wingo>please ask before pushing things :)
<tupi>wingo, guile-gnome web pages are under git as well?
<wingo>tupi: they are under cvs actually; and i think they are generated from something in the guile-gnome source tree
<wingo>there is also a script somewhere to update the docs
<tupi>ok, where is the cvs server ?
<wingo>see the "use cvs" link on the upper right hand side of the savannah project
<wingo>heh, it seems that guile unrolls a factorial loop entirely if the number is 23 or less
<wingo>,optimize (let lp ((n 23)) (if (zero? n) 1 (* n (lp (1- n)))))
<wingo>$6 = 25852016738884976640000
<sneek>I last saw ijp on Jan 15 at 02:10 pm UTC, saying: who is also apparently at utah.
<add^_>hm, ok, not too long ago then, just haven't seen him in quite a while.
<davexunit>add^_: not too much. just working and stuff. frustrated with corporate software development at the moment.
<add^_>err, why are you frustrated?
<davexunit>add^_: adding hacks on top of hacks to please stakeholders.
<add^_>maybe it's "gist" in English
<davexunit>I have to implement something far more complicated than necessary for the sake of small user experience complaints.
<add^_>Life in corporations is like that I suppose..
<add^_>On another note, I'll be buying a Kinesis keyboard soon :-D
<add^_>I wonder how hard it'll be to get used to emacs with that :-P
<add^_>A friend who has one says that it would probably take a month of use to get used to (just the keyboard)
<add^_>Sounds pretty much like getting used to dvorak..
<add^_>But dvorak was annoying in emacs for me. Although I've heard people who think emacs is even easier with dvorak...
<davexunit>I haven't tried dvorak or any other keyboard layout
<add^_>If you already can touchtype or whatever it's called, on qwerty, it's probably not necessary to switch...
<taylanub>Go with Colemak if you'll switch. Less different than Dvorak, and just as good or a tiny bit better.
<add^_>taylanub: is that what you use?
<add^_>So, what's the logic behind where the keys are placed?
<taylanub>Not sure, some complicated algorithms and hand-tuning.
<add^_>It seems like all the vowels are on homerow, like on dvorak
<taylanub>(Of course, like any algorithm, they're only complicated until you know them.)
<add^_>I think there are algorithms that are still complicated even when you know them :-P
<taylanub>I once made a script for an IRC client, I think it was WeeChat, that printed the number of home-row keys for QWERTY, Dvorak, and Colemak, at the end of the sent line, for every line I sent.
<taylanub>IIRC Colemak mostly had a small edge over Dvorak, and QWERTY a huge gap behind.
<taylanub>(Of course home-row hits are just one very specific criterion.)
<add^_>I don't feel like trying yet another keyboard layout though :-/ Maybe I should..
<taylanub>Colemak goes well with Emacs BTW, from what I can tell, though I never used Emacs with anything else than Colemak so I have no comparison.
<taylanub>Also, w and f are aside, so don't accidentally C-x C-w a file instead of C-x C-f'ing it!!
<add^_>There will always be those kinds of things ;-)
<taylanub>Yeah, it actually prompts for a confirmation, you have to spell out y e s and hit RET, for some reason I did it as a reflex that one time. :\
<add^_>well, testing out dvorak again right now... Takes forever to type
<add^_>I think I'll get used to it again soon enough