IRC channel logs

2014-11-18.log


*davexunit wonders how to write an sexp -> sql translator that doesn't assume too much about valid sql keywords
<davexunit>there's a ton of databases, and their SQL keywords vary, but I don't see a way to avoid having to know about pretty much every possible keyword in order to do a proper transformation.
<ijp>why do you want a sexpy sql? what is it buying you that normal sql isn't?
<davexunit>I want to write sql as sexps so I can do the usual cool stuff with sexps like unquote
<davexunit>'(select * (from "users") (where (= id ,user-id)))
<ijp>okay, but for common case, you already have prepared statements
<ijp>(and that's the wrong kind of quoting)
<davexunit>yeah, meant to use `
<davexunit>contrived example
<ijp>I've seen this attempted a few times, and it's never been convincingly useful
<ijp>and why quoted lists, and not a function with keyword arguments?
<ijp>why not a separate kind of object altogether
<davexunit>all good questions
<DerGuteMoritz>b
<DerGuteMoritz>oops, sorry!
<davexunit>I don't want to get into ORM territory, but it seems like that's the only way to go
<davexunit>so I will just stop this project.
<DerGuteMoritz>oh, I happen to accidentally stumble into a conversation I could even contribute something meaningful to
<mark_weaver>ijp: out of curiosity, what's your opinion of sxml?
<ijp>it's complicated
<DerGuteMoritz>davexunit: check http://www.more-magic.net/posts/lispy-dsl-ssql.html
<ijp>but it's easier to justify because there are more interesting manipulations you want to perform on it
<ijp>you get some xml, you pull out bits, rearrange them, and output them as new xml
<mark_weaver>it seems admirably simple to me. I'm not sure how it could be made much simpler.
<ijp>mark_weaver: I mean my opinion is complicated, not sxml
<mark_weaver>ah, okay.
<DerGuteMoritz>I actually find ssql useful if only for being able to use paredit for it
<turbofail>ijp: well you could do the same thing with an s-expression based SQL thing. and i've done such a thing before
<turbofail>there is some value in giving a query its own data type though, it just makes entering literals in slightly more verbose
<ijp>sql is not really as, er, composable as (schema-less) xml
<DerGuteMoritz>but being able to combine fragments of sql by means of unquoting and splicing is really useful, too
<davexunit>the problem for me is that sql syntax is more complicated than xml. it's not very easy to translate properly.
<DerGuteMoritz>that's true
<DerGuteMoritz>it's very barque
<DerGuteMoritz>baroque even
<ijp>a db api based on relational algebra would be, but I've not seen that done really well
<davexunit>some things are infix, prefix, suffix, etc.
<turbofail>ijp: it's composable enough
<davexunit>so the translator needs to learn what to do with each keyword
<davexunit>(= id 1) => id = 1
<davexunit>(not-null id) => id NOT NULL
<davexunit>and so on
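A rough sketch of the keyword-by-keyword dispatch being described here, using (ice-9 match); the procedure name and the few cases it handles are invented for illustration, and it is nowhere near a complete or injection-safe SQL generator:

    (use-modules (ice-9 match))

    (define (sql->string expr)
      (match expr
        ((? number? n) (number->string n))
        ((? symbol? s) (symbol->string s))
        ((? string? s) (string-append "'" s "'"))   ; no real escaping here
        (('= a b)                                   ; infix
         (string-append (sql->string a) " = " (sql->string b)))
        (('not-null a)                              ; suffix
         (string-append (sql->string a) " NOT NULL"))
        (('and . args)                              ; variadic infix
         (string-join (map sql->string args) " AND "))
        (_ (error "don't know how to translate" expr))))

    (sql->string '(= id 1))        ; => "id = 1"
    (sql->string '(not-null id))   ; => "id NOT NULL"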
<ijp>if you treat it as a sort of mapping from keywords to values, you can work around the ordering and regain some composability
<DerGuteMoritz>davexunit: should you be interested in porting the chicken ssql egg to guile, I'd recommend you check out the coops branch in which I started a clean rewrite in terms of coops (a generic function / oop system similar to guile's goops): https://gitorious.org/chicken-eggs/ssql/source/coops
<DerGuteMoritz>it also has more features than the master version
<turbofail>ijp: that's basically what i did, i had queries represented as alists
<ijp>turbofail: I use this representation for linear equations in some python code
<DerGuteMoritz>the heart of the translator is this method: https://gitorious.org/chicken-eggs/ssql/source/79bf9bf4052277d2be0bec6aecff8d4de13dca76:ssql-translator.scm#L146
<turbofail>which is also convenient when programatically modifying queries afterwards, as it makes it easy to look up the component parts
<mark_weaver>ijp: you asked: "and why quoted lists, and not a function with keyword arguments?"
<ijp>davexunit: I think nalaginrut has some code for doing a "scheme sql" in artanis
<turbofail>or modify the query by tacking on a different version of one of the components
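One way the alist representation turbofail describes might look, rendered in Scheme for illustration (his actual code was elisp, as he notes below):

    (define query
      '((select . (id name))
        (from   . users)
        (where  . (= id 1))))

    ;; "tacking on a different version of one of the components":
    (define query2
      (cons '(where . (= id 2)) query))

    (assq 'where query2)   ; => (where = id 2), the newest WHERE wins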
<mark_weaver>ijp: so what do you do if you have a more complicated query that includes subqueries?
<ijp>I definitely had this discussion before
<mark_weaver>I'm not sure how you could do that sanely with procedure calls.
<mark_weaver>but admittedly, my knowledge of SQL is very weak.
<ijp>mark_weaver: a naïve way would be a nested function call
<ijp>assuming a really shallow translation
<mark_weaver>ijp: so these procedure calls are just data constructors then?
<ijp>more or less
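A rough sketch of that nested-constructor idea, with invented names; each call merely builds a description of the query, and a subquery is just another call in argument position:

    (define* (select columns #:key from where)   ; define* is standard in Guile 2
      `(select (columns ,@columns)
               (from ,from)
               ,@(if where `((where ,where)) '())))

    ;; a subquery:
    (select '(name)
            #:from (select '(id name) #:from 'users #:where '(= id 1)))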
<mark_weaver>well, you lose the ability to use quasiquoting then.
<turbofail>had i been working in scheme or clojure that's what i would have done, but i was using elisp, so alists seemed like the best choice
<ijp>mark_weaver: you don't need quasiquoting in the first place
<mark_weaver>I dunno, it seems that the use of lists for almost anything structured is out of fashion these days, but it seems to me a good way to represent expressions.
<turbofail>quoted expressions are still nice for doing the "expression" portions of the query
<ijp>turbofail: sure, the procedure example doesn't have a good story for that part
<mark_weaver>the other problem with using keyword arguments is that they are not portable at all. it would be nice to have a nicer way to deal with SQL that could be promoted across the scheme community.
<DerGuteMoritz> http://clsql.b9.com/documentation.html seems to take the keyword argument approach
<DerGuteMoritz>sorry, http://clsql.b9.com/ is the main entry point
<davexunit>DerGuteMoritz: that first article you sent to me is a good read
<davexunit>this is a hard problem space indeed
<DerGuteMoritz>davexunit: I will forward your praise -- be sure to read the rest of his series on lispy DSLs, too
<davexunit>harder problem than expected, I think I'll drop this project before it becomes a time sink :P
<DerGuteMoritz>:-D
<DerGuteMoritz>sounds familiar, heh
***fangism is now known as fangism-vacation
<davexunit> http://www.more-magic.net/posts/structurally-fixing-injection-bugs.html
<davexunit>guile's http library gets a mention in here
<davexunit>because it has a safe way to manipulate http headers :)
<nalaginrut>morning guilers~
<sneek>nalaginrut, you have 1 message.
<sneek>nalaginrut, ArneBab says: I’m AFK for now, but I’ll read the backlog. Please tell me when you finish reading the SRFI.
<nalaginrut>ArneBab: suer
<nalaginrut>sure
<nalaginrut>ArneBab: BTW, it's better to have wisp-mode in Emacs
<cky>sg2002: Re iterating over lists, in theory `do` is a macro over a tail-recursive loop. As to whether Guile 1.8 implements `do` that way (can't be bothered to run "git checkout" to find out), that's a different story.
<cky>sg2002: As such, again in theory, `do` and its equivalent tail-recursive loop should have identical performance.
<sg2002>cky: Thanks. Good to know.
<nalaginrut>cky: IIRC, in past versions of Guile, do was faster than a tail-recursive loop; I guess that was specific to the implementation, but I've no idea how it is now
<nalaginrut>anyway, one shouldn't worry about this, since it's up to the compiler
<mark_weaver>nalaginrut: I don't know what Guile 1.8 did, but in Guile 2 'do' is just a macro that expands into a tail-recursive loop.
<mark_weaver>(defined in boot-9.scm)
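Roughly the equivalence they are describing (not the literal boot-9.scm expansion, just its shape):

    ;; this 'do' loop ...
    (do ((i 0 (1+ i))
         (acc '() (cons i acc)))
        ((= i 5) (reverse acc)))

    ;; ... behaves like this hand-written tail-recursive named let:
    (let loop ((i 0) (acc '()))
      (if (= i 5)
          (reverse acc)
          (loop (1+ i) (cons i acc))))

    ;; both => (0 1 2 3 4)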
<mark_weaver>looks like 'do' was a primitive in Guile 1.8's evaluator
<cky>Yeah, Guile 1.x had many quirks that 2.0 tidied up. One of many reasons why I <3 2.x. :-D
<ArneBab>nalaginrut: wisp-mode is available from marmalade
<ArneBab>nalaginrut: package-install wisp-mode (though that’s an old version: I lost upload permissions when marmalade got updated): https://github.com/nicferrier/elmarmalade/issues/27
<civodul>Hello Guilers!
<nalaginrut>heya
<nalaginrut>ArneBab: nice~
<nalaginrut>mark_weaver: yeah, I'm glad to see it's actually tail-recursive
***Guest33437 is now known as spock
<dsmith-work>Morning Greetings, Guilers
<stis>evening guilers
<cluck>that was quick, must have been a polar day
<stis>:)
<sg2002>How do I get a dimension of a multidimensional array?
<mark_weaver>sg2002: I'm not sure what you mean by "get a dimension".
<mark_weaver>maybe 'array-dimensions' is what you want?
<sg2002>mark_weaver: #2((1 2 3)(4 5 6)) - I want to get (1 2 3).
<mark_weaver>lloda is the best person to ask
<mark_weaver>he's the resident expert on guile arrays :)
<sg2002>Ok, one way is converting it to a list. It's just weird that there's no bundled function.
<mark_weaver>one way is to use 'make-shared-array': (make-shared-array #2((1 2 3) (4 5 6)) (lambda (i) (list 0 i)) '(0 2)) => #(1 2 3)
<mark_weaver>but it's a very powerful tool for a simple thing. maybe there's a better way.
<mark_weaver>(array-ref #2((1 2 3)(4 5 6)) 0)
<mark_weaver>oops, sorry
<mark_weaver>(I've never used arrays myself. I tend to use nested vectors instead)
<mark_weaver>or even just nested lists, if I don't need O(1) lookups
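Putting the make-shared-array suggestion above together with array->list, for sg2002's row-extraction question (illustrative only):

    (define m #2((1 2 3) (4 5 6)))
    (array-dimensions m)   ; => (2 3)
    (array->list
     (make-shared-array m (lambda (i) (list 0 i)) 3))   ; => (1 2 3)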
<ArneBab>I did not know about the syntax for multi-dimensional arrays. It looks like I could use them for matrix stuff.
<mark_weaver>yes, they are well suited for matrices and tensors, obviously
<ArneBab>The syntax looks a bit unusual, though.
<sg2002>mark_weaver: I started with nested vectors but people here suggested that they'd work better in my case. :-)
<ArneBab>(giving the dimensions as prefix)
<ArneBab>mark_weaver: is the performance of multi-dimensional arrays acceptable for bigger matrix calculations?
<mark_weaver>these are all questions for lloda
<mark_weaver>my knowledge of guile arrays is very weak.
<davexunit>for my case, 2d/3d realtime graphics, guile's arrays were too slow, unfortunately.
<davexunit>but for your case they might be just fine! I was trying to do thousands of 4x4 matrix mults at 60FPS
<mark_weaver>individual vectors operations are definitely much faster in the current implementation. they have dedicated VM instructions, and of course they don't have to worry about things like arbitrary numbers of indices, etc.
<mark_weaver>however, I see no reason why array operations couldn't be made very fast in the future.
<ArneBab>davexunit: I might have to do matrix inversion, so speed could become an issue. Though thousands of operations per second will hopefully be some time away :)
<ArneBab>My search for efficient multi-dimensional arrays just turned up something which might be useful for this: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.101.7531
<ArneBab>Efficient representation scheme for multidimensional array operations (2002)
<ArneBab>hm… http://rosettacode.org/wiki/Matrix_multiplication#Scheme
<ArneBab>works on lists of lists…
<mark_weaver>basic matrix operations on lists of lists can be coded very concisely in scheme.
<ArneBab>I saw that - quite impressive
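For example, a concise multiply for matrices as lists of lists (an illustrative rewrite in the spirit of the Rosetta Code page linked above, not that exact code):

    (define (transpose m)
      (apply map list m))

    (define (matrix-multiply a b)
      (map (lambda (row)
             (map (lambda (col)
                    (apply + (map * row col)))
                  (transpose b)))
           a))

    (matrix-multiply '((1 2) (3 4)) '((5 6) (7 8)))
    ;; => ((19 22) (43 50))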
<ArneBab>is there a canonical way to represent hash-maps?
<ArneBab>(key-value data)
<ArneBab>(I really mean canonical - I read about a way in the info pages)
<mark_weaver>guile has association lists, hash tables, and vhashes.
<ArneBab>I think my main problem is when to choose which.
<mark_weaver>well, it depends on what kinds of operations you need to do quickly.
<ArneBab>(there is the hash-table, but why use them over alists? - https://www.gnu.org/software/guile/manual/html_node/Hash-Table-Reference.html)
<mark_weaver>association lists and vhashes allow you to extend a mapping without destroying the original.
<mark_weaver>hash tables are fundamentally mutable, there's no good way to extend one without mutating an existing one. but they are very fast.
<ArneBab>so for vhashes you can have multiple datasets share the same memory?
<mark_weaver>multiple "datasets"? I find that a confusing question.
<mark_weaver>vhashes act a lot like association lists, except that they will perform much better for huge dictionaries.
<mark_weaver>alists perform well for small dictionaries.
<ArneBab>mark_weaver: by datasets I meant data structures in different parts of the program that share memory, even though those parts of the program see different content in them.
<ArneBab>(reusing memory for the parts which are the same)
<mark_weaver>alists, vhashes, and normal scheme lists all support sharing of the "tails".
<mark_weaver>(define a '(x y z))
<mark_weaver>(define b (cons 'b a))
<ArneBab>how do you define sharing of the tail in a vhash? The value part?
<mark_weaver>(define c (cons 'c a))
<mark_weaver>now, the lists a b and c all share a common tail.
<ArneBab>ah, no: vhash-cons key value vhash [hash-proc]
<mark_weaver>alists are nothing more than lists of pairs.
<mark_weaver>so (define alist1 '((rose . red) (violet . blue) (banana . yellow)))
<mark_weaver>now you can extend it like so: (define alist2 (cons '(plum . purple) alist1))
<mark_weaver>now alist1 and alist2 share a common tail.
<mark_weaver>the API for vhashes is essentially the same as for alists, and they behave essentially the same, except that lookups are faster for very large dictionaries.
<mark_weaver>but for smaller dictionaries they will be slower.
<mark_weaver>unfortunately, it is usually the case that algorithms and data structures that work well for large cases tend to be slower for smaller ones.
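The parallel looks roughly like this (illustrative; vhash-consq and vhash-assq are the eq?-based variants from (ice-9 vlist)):

    (use-modules (ice-9 vlist))

    ;; alist style:
    (define alist1 (acons 'rose 'red (acons 'violet 'blue '())))
    (assq 'violet alist1)            ; => (violet . blue)

    ;; vhash style, same shape of API:
    (define vhash1 (vhash-consq 'rose 'red (vhash-consq 'violet 'blue vlist-null)))
    (vhash-assq 'violet vhash1)      ; => (violet . blue)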
<dsmith-work>mark_weaver: Any idea of the "very large" and "smaller" breakpoints?
<ArneBab>^ that’s what I just wanted to ask :)
<ArneBab>it would be very interesting to have that in the documentation.
<mark_weaver>I dunno, it's easy enough to do some simple benchmarks :)
<mark_weaver>it probably depends on a lot of factors
<ArneBab>with 10k elements there already seems to be a factor of 10.
<ArneBab>(define alist1 '((1 . 0))) (do ((i 1 (1+ i))) ((> i 10000)) (set! alist1 (cons `(,(1+ i) . ,i) alist1)))
<ArneBab>(define vhash1 (alist->vhash alist1))
<ArneBab> ,profile (do ((i 1 (1+ i))) ((> i 1000)) (assoc 5 alist1))
<ArneBab>,profile (do ((i 1 (1+ i))) ((> i 1000)) (vhash-assoc 5 vhash1))
<ArneBab>with 1000 elements they seem to be equal
<ArneBab>though vhash is 2x faster at retrieving element 500
<ArneBab>(which actually is earlier)
<ArneBab>the performance of vhash-assoc depends on the order of insertion, but seems to be bounded (0.04 seconds for the last element added to a 10k elements vhash, 0.44 for the first I added).
<ArneBab>(time divided by 10k - I use 10k calls to vhash-assoc so the ,profile has something to show)
<ArneBab>for the alist it is 0.02s for the last element added and 15.09s for the first added.
<ArneBab>on the other hand the alist is 2-7 times faster than the vhash for 10 element lists (depending on the index).
<dsmith-work>There ya go
<dsmith-work>ArneBab: Thanks
<ArneBab>dsmith-work: glad to ☺ it is something I wanted to check myself, and the ,profile functionality is pretty nice.
<mark_weaver>I once proposed rewriting vlists and vhashes in C to speed them up, but the idea was rejected on the (reasonable) grounds that the Scheme implementation will get faster when we have native compilation, and it's better to minimize the amount of C code we have.
<mark_weaver>just keep in mind that it's hard to do good benchmarks. the simple ones you did above are not very good, and may not reflect real-world usage much.
<ArneBab>would it be possible to create a generic structure which starts as alist and changes to a vhash if it gets too large?
<ArneBab>mark_weaver: the benchmarks are too simplistic, yes :)
<ArneBab>but they give a feeling for the size of the effect
<ijp>you'd be better off just changing vhashes to do that
<ijp>unless we expose the implementation of vhashes in some way that makes that infeasible
<ArneBab>for an alist/vhash with 1 million elements, retrieving the first element I added is 200x slower in an alist than in a vhash.
<ArneBab>retrieving the last element added is 1.6x slower, though.
<ArneBab>in the vhash
<ArneBab>(so retrieving the last element is 1.6x faster in the alist ← just to avoid ambiguities)
<ArneBab>of 1 million elements
<ArneBab>ijp: you mean making vhashes use an alist if they are small?
<mark_weaver>vhashes could be made nearly as fast as alists if the implementation was optimized.
<ijp>ArneBab: yes
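A rough sketch of the kind of wrapper ArneBab is asking about, as opposed to changing vhashes themselves (hypothetical names and a made-up threshold; nothing like this exists in Guile):

    (use-modules (ice-9 vlist))

    (define switch-threshold 32)   ; cutoff invented here; picking it needs real benchmarks

    (define (dict-add key value dict)
      (cond ((vhash? dict) (vhash-cons key value dict))
            ((>= (length dict) switch-threshold)
             (vhash-cons key value (alist->vhash dict)))   ; promote the alist to a vhash
            (else (acons key value dict))))

    (define (dict-ref key dict)
      (if (vhash? dict)
          (vhash-assoc key dict)
          (assoc key dict)))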
<ArneBab>mark_weaver: would there then be a reason to avoid vhashes?
<ArneBab>for the “middle” item (599999 of 1 million) the difference is even bigger: the vhash is 1000x faster than the alist
<ArneBab>(note though that I’m purely using integers as keys here)
<ArneBab>(and consecutive ones - I don’t know whether that makes a difference)
<ArneBab>mark_weaver: I ask that, because I’m thinking about what would go into a tutorial. Should a tutorial show alists or just go straight to vhashes?
<mark_weaver>well, alists still have two advantages: they are very simple, and since they are just normal lists, you can use all the list library procedures on them.
<mark_weaver>also, consing onto the front of a vlist or vhash actually does mutation behind the scenes, which has implications for multi-threaded programs.
<mark_weaver>i.e. vlists and vhashes are not thread safe.
<ArneBab>ah, ok. That’s a pretty good reason…
<mark_weaver>if you cons onto the front of a vlist or vhash, you must protect the vlist/vhash you are consing onto with a mutex or similar, whereas consing onto the front of a normal list or alist does not mutate the list you're consing onto, so no need for thread synchronization.
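What that protection might look like in practice (a minimal sketch; the names are invented and the shared vhash is assumed to live in a top-level binding):

    (use-modules (ice-9 threads)
                 (ice-9 vlist))

    (define table-mutex (make-mutex))
    (define table vlist-null)

    (define (table-add! key value)
      ;; vhash-cons mutates shared internal state, so serialize access to it
      (with-mutex table-mutex
        (set! table (vhash-cons key value table))))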
<mark_weaver>many a wizard has recommended keeping things as simple as you can possibly make them, and I have to agree.
<ijp>alists are way overused
<mark_weaver>ijp: what do you recommend?
<ijp>not because of any inherent advantage, but because of the simple reader syntax
<ijp>mark_weaver: if you don't need functional update, we have hashtables
<mark_weaver>I thought you favored purely functional programming, no?
<mark_weaver>though I agree that if you need speed above all else, it's hard to beat hash tables.
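For completeness, the mutable hash-table alternative ijp refers to:

    (define colors (make-hash-table))
    (hash-set! colors 'rose 'red)
    (hash-set! colors 'violet 'blue)
    (hash-ref colors 'rose)         ; => red
    (hash-ref colors 'plum 'none)   ; => none (default when a key is missing)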