IRC channel logs

<davexunit>mark_weaver brought up some great points on guile-devel about addressing portability. I don't know how to do that right now, nor do I know the finer points of using the C API correctly, but I'd like to just hack together something that works on x86_64 and work on making it pretty later.

<davexunit>s/pretty/portable/

<davexunit>I guess the next step would be to add struct-f64-ref as a primcall in TreeIL and then to CPS

<davexunit>this is where I really have no idea what I'm doing ;)

<davexunit>I guess the VM will need these new instructions, too

<civodul>yeah

<civodul>well, sounds like you're getting there anyway :-)

<davexunit>yeah it's progress!

<davexunit>I suspect this will have a *huge* impact on my game engine.

<civodul>i can imagine

<davexunit>one of the procedures I spend the most time in is a simple bounding box to vector collision test

<davexunit>it shouldn't allocate at all, but it does!

<davexunit>unboxed struct fields are one piece of that puzzle.

<davexunit>the other is doing inequality comparisons without boxing.
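
For reference, a minimal C sketch of the kind of bounding box to vector test described above. The `vec2` and `rect` types and the half-open edge convention are assumptions made for illustration; they are not taken from davexunit's engine. With unboxed doubles the test comes down to four comparisons and no heap allocation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical types for illustration only. */
typedef struct { double x, y; } vec2;
typedef struct { double x, y, width, height; } rect;

/* True when point p lies inside r. The low edges are inclusive and the
   high edges exclusive (an arbitrary choice for this sketch). */
bool rect_contains(rect r, vec2 p)
{
    assert(r.width >= 0.0 && r.height >= 0.0);
    return p.x >= r.x && p.x < r.x + r.width
        && p.y >= r.y && p.y < r.y + r.height;
}
```

Everything here is passed by value in plain doubles, which is the behaviour the unboxing work aims to get from compiled Scheme.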

<davexunit>I'll need wingo's input on that one :)

<davexunit>when I inspect GC stats in my particle simulation thing <https://media.dthompson.us/u/davexunit/m/1500-bullets/> I see that the GC is running several times per second and reclaiming 3+MiB of memory.

<davexunit>I'm interested to see how much less memory is consumed once allocation of floats is removed.

<amz3`>héllo :)

<kristofer>morning!

<civodul>davexunit: oh you have your own MediaGoblin instance, neat!

<civodul>davexunit: you've already solved the deployment crisis it seems :-)

<davexunit>civodul: nope, I'm in crisis!

<davexunit>this is a hand-rolled debian server

<davexunit>that runs some stuff from guix

<davexunit>:)

<davexunit>not mediagoblin, though.

<davexunit>wingo: comparing two floats with < and friends seems to incur boxing. it would be mighty cool if that could be improved.

<wingo>davexunit: yep, that's a thing to improve :)

<davexunit>wingo: cool, so I've correctly identified the issue. how would you go about addressing this?

<davexunit>would it add to the "instruction explosion"?

<wingo>ACTION looks for a commit

<wingo>oh yes, it's part of the splosion :)

<wingo>a necessary part tho imo

<davexunit>okay :)

<wingo>163fcf5adb5700c8d5fe2e9bd0a57ce7c7bf1c34

<wingo>do that for f64 values

<davexunit>great :)

<davexunit>thanks!

<davexunit>and the prior commit, I guess.

<davexunit>where you add instructions to branch on u64 comparisons

<davexunit>great. this will make excellent reference material.

<davexunit>I re-read your blog post on unboxing and I now have a much clearer picture about how to achieve struct field unboxing.

<wingo>hehe, excellent :-))

<wingo>everything according to plan :)

<wingo>davexunit: btw. we don't do a great job in guile at preserving order of operands for < vs >=

<wingo>i.e. (< x y) doesn't mean (>= y x) for floating-point numbers, or something like that

<wingo>maybe it's (not (>= x y))

<wingo>anyway

<wingo>that's a thing to fix in general in guile

<wingo>which for u64 comparisons of course we can get away with just < <= and =

<wingo>but for f64 comparisons you'll need the whole set of instructions, methinks.

<davexunit>ah, I see.

<wingo>so >= and > also.

<davexunit>yeah

<davexunit>that's OK for now.

<davexunit>great.

<davexunit>I'm very excited about this.

<wingo>:)

***C_Keen is now known as C-Keen

<davexunit>it feels achievable, and I've never felt that way about hacking a compiler before.

<davexunit>wingo: I'd be real curious on your thoughts for how to handle typed struct fields on platforms where scm_t_bits isn't the same size as a ulong or double.

<wingo>davexunit: good q; i don't know.

<davexunit>okay. :)

<davexunit>not knowing the answer doesn't block me from putting together a prototype that works on x86_64
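
The portability concern can be sketched in C as follows. This is only an illustration: `slot_t` is a hypothetical stand-in for a word-sized struct slot (Guile's real type is `scm_t_bits`), and the helper names are made up. Storing a raw double in one slot only works when a double is no larger than the slot, which holds on x86_64 but not on typical 32-bit targets.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: one struct slot is one machine word. */
typedef uintptr_t slot_t;

/* Nonzero when a raw double fits in a single slot. */
int double_fits_in_slot(void)
{
    return sizeof(double) <= sizeof(slot_t);
}

/* Store the bits of value in the slot; memcpy avoids aliasing problems. */
void slot_set_f64(slot_t *slot, double value)
{
    assert(double_fits_in_slot());
    *slot = 0;
    memcpy(slot, &value, sizeof value);
}

/* Read the double back out of the slot. */
double slot_ref_f64(const slot_t *slot)
{
    double value;
    assert(double_fits_in_slot());
    memcpy(&value, slot, sizeof value);
    return value;
}
```

On a platform where the assertion fails, a field would need two slots or some other representation, which is the open question in the exchange above.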

<wingo>:)

<davexunit>before work I added 'struct-f64-ref' to libguile.

<wingo>i heard about that! sweet stuff :)

<wingo>just needs good compiler integration now :)

<davexunit>after work, I will learn how to modify TreeIL and CPS to do the cool stuff.

<wingo>:)

<wingo>ok, /me knocks off. ttyl

<davexunit>see ya! thanks a bunch.

<mark_weaver>wingo, davexunit: right, (> x y) is not the same as (not (<= x y)) for floats, because of NaNs

<mark_weaver>however, (> x y) is the same thing as (< y x)

<mark_weaver>so, you need =, < and <=

<mark_weaver>but you shouldn't need > or >=
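
mark_weaver's point can be checked with a small C sketch (C doubles follow the same IEEE 754 comparison rules, assuming the compiler is not run with options such as -ffast-math). The function names are invented for this example.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* (> x y) written by swapping the operands of <. Valid even for NaN. */
bool gt_via_swap(double x, double y)
{
    return y < x;
}

/* (> x y) written by negating <=. Wrong when either operand is NaN. */
bool gt_via_negation(double x, double y)
{
    return !(x <= y);
}

/* Checks the claims from the discussion; aborts if any of them fails. */
void check_float_ordering(void)
{
    double nan = NAN;
    /* Both forms agree on ordinary numbers. */
    assert(gt_via_swap(2.0, 1.0) && gt_via_negation(2.0, 1.0));
    /* Every ordered comparison involving NaN is false. */
    assert(!(nan > 1.0));
    /* Swapping operands matches >. */
    assert(!gt_via_swap(nan, 1.0));
    /* Negating <= does not match >. */
    assert(gt_via_negation(nan, 1.0));
}
```

This is why =, < and <= are enough: > and >= can be produced by swapping operands, but none of them can be produced by negating another.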

<davexunit>mark_weaver: thank you!

<davexunit>I'm excited to hack on this.

<kristofer>with geiser, I run C-c C-b to load the buffer into the repl, current-filename returns #f - what's the ideal way to create load-path relative to the working directory?

<kristofer>nevermind

<davexunit>okay, seems that I'm missing something when it comes to adding a new primitive. I found a list called *interesting-primitive-names* that I thought I should add to, so I did. it seemed too good to be true, and it was.

<davexunit>I may need to change something a layer higher than treeil

<wingo>meep meep

<amz3>héllo :)

<amz3>kristofer: how did you solve your issue with emacs?

<kristofer>C-c C-e C-l

<davexunit>wingo: to add a new primcall to CPS, must I first add it to TreeIL?

<wingo>davexunit: no

<wingo>you might have to add it to tree-il for other reasons tho

<davexunit>okay.

<wingo>e.g. bytevector-ieee-single-native-ref is a tree-il primcall

<wingo>it ends up reifying as bv-f32-ref + f64->scm
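
As a rough C analogy for that two-step lowering (not Guile's actual implementation): the first step reads a native-endian 32-bit float out of a byte buffer and widens it to an unboxed double, and a separate step would box the double as a Scheme value. Only the first step is sketched here, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Read a native-endian 32-bit float at byte offset `offset` in a buffer of
   `len` bytes and widen it to double. Boxing the result is left out. */
double bytes_f32_native_ref(const unsigned char *bytes, size_t len, size_t offset)
{
    float single;
    assert(offset + sizeof single <= len);
    memcpy(&single, bytes + offset, sizeof single);
    return (double) single;
}
```

Keeping the boxing as a separate step is what lets a compiler drop it when the consumer also works on unboxed doubles.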

<davexunit>trying to get struct-f64-ref recognized as a primcall.

<wingo>probably you need to add it as a primcall to get that process started

<wingo>in tree-il

<wingo>resolve-primcalls turns toplevel or module references into primcalls

<wingo>in tree-il

<davexunit>I added it to *interesting-primitive-names* in TreeIL

<wingo>that might be sufficient

<davexunit>yeah, I got an explosion when compiling something that used struct-f64-ref, so that's a good start.

<davexunit>still learning how to correctly inform CPS about this new primcall

<davexunit>I found *instruction-aliases* in (language cps primitives)

<wingo>davexunit: did you make an instruction yet?

<davexunit>in the VM?

<wingo>yes

<davexunit>not yet, I was hoping to add to CPS and then see some VM error when I tried to run a program.

<wingo>prolly won't work as you want it to

<davexunit>is adding the instruction to the VM a pre-requisite somehow?

<wingo>make the instruction first, it will make the primcall compilation work better

<davexunit>okay, will do.

<wingo>will define the primcall as having a particular arity

<wingo>see prim-instructoin

<davexunit>I guess I have my workflow a bit mixed up. not sure what things can be deferred.

<wingo>*instruction

<davexunit>I was reading that, yeah.

<wingo>it wants to get (instruction-list)

<wingo>and define primcalls based on that list

<davexunit>a-ha!

<davexunit>there we go.

<wingo>the (instruction-list) comes ultimately from vm-operations.h

<davexunit>I didn't chase that symbol down yet.

<davexunit>okay.

<davexunit>that makes way more sense. thanks.

<davexunit>I didn't realize the VM C code was linked to Scheme code.

<wingo>hoo, there are some, um, macro shenanigans

<davexunit>;)

<wingo>see instructions.c some day when you don't need to do anything else :)

<davexunit>hahaha I'll do that on a rainy day

<davexunit>wingo: did you see mark's suggestion for addressing portability issues on the list?

<davexunit>worth a read. interested in your thoughts.

<wingo>ACTION reads

<wingo>yeah that was my initial thought as well. i agree with your hack-it-up, portability-later approach fwiw

<wingo>you will get good results and be motivated to finish the job :)

<davexunit>thanks :)

<wingo>that said i dunno about the end-game. in some ways a more properly ffi-oriented approach can work better

<wingo>luajit has the ability to take any c struct definition and make a type out of it

<wingo>and properly optimize access to fields in that data

<wingo>and also to create arrays of unboxed data, in a generative way

<davexunit>wingo: sounds like a complete rewrite of structs?

<wingo>e.g. local foo_t = ffi.typeof('struct { uint8_t x; uint64_t y; }')

<davexunit>oh wow it has a C parser built-in?

<wingo>local array_of_foo_t = ffi.typeof('$[]', foo_t)

<wingo>yeah :)

<wingo>local array_of_foo = ffi.new(array_of_foo_t, 42)

<wingo>that gives you an array of 42 elements

<davexunit>that's pretty great.

<wingo>it's great.

<wingo>only thing is, the gc story is squirrely.

<wingo>but maybe that's ok for us too.

<wingo>i don't know.

<wingo>ffi.new creates a gc'able object, but one which is not traced

<wingo>ffi.cast follows a complicated heuristic about what can cast to what, but doesn't protect gc

<davexunit>do you think mark's suggestion would be "good enough" for now? I don't think it exposes anything publicly that we'd be locked into.

<wingo>e.g. local void_pointer = ffi.cast('void*', array_of_foo)

<wingo>now you have a void* pointer to array_of_foo

<wingo>but one which doesn't protect the storage that backs the void*
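
For comparison, here is roughly what the struct definition in the ffi.typeof call above describes when written directly in C. The `make_foo_array` helper is an assumption added for this sketch. The exact size and padding are implementation-defined; on x86_64 the struct is typically 16 bytes, with `y` at offset 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Same fields as the struct passed to ffi.typeof above. */
typedef struct { uint8_t x; uint64_t y; } foo_t;

/* Allocate n zero-initialized foo_t values in one contiguous, unboxed
   block. The caller releases the block with free(). */
foo_t *make_foo_array(size_t n)
{
    assert(n > 0);
    return calloc(n, sizeof(foo_t));
}
```

Unlike the LuaJIT version, nothing here is garbage collected, so the lifetime problem wingo describes for ffi.cast is handled by hand in C.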

<davexunit>I'd like to have unboxed u64, s64, and f64 struct fields in the short term without making it harder to something better like you are suggesting later on.

<davexunit>to do*

<davexunit>but maybe they are necessarily in conflict.

<wingo>yeah i don't know.

<davexunit>but it sounds like the struct interface needs to be overhauled entirely to do what you'd like it to do.

<wingo>soooo another possibility is to add the ability to type bytevectors.

<wingo>right now bytevectors have display types

<wingo>e.g. f32vector, u8vector, etc

<wingo>these types are used for printing

<wingo>and stored in the first word

<wingo>after the tc7 for bytevectors.

<wingo>i think only 5 bits are used for this type, plus the tc7 is 12 bits

<wingo>leaving 20 or 52 bits for other types

<wingo>like struct types

<davexunit>pardon my ignorance, but what is tc7?

<wingo>it's part of how guile knows what the type of an object is

<davexunit>okay

<wingo>if you have a pointer to memory in heap

<davexunit>got it

<davexunit>thanks

<wingo>that data will have something in the first word describing what it is

<wingo>a tc7 is a typecode that takes 7 bits

<wingo>there are typecodes with more and fewer bits

<wingo>the leftover bits are up to the type to use
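
A small C sketch of the tag-word idea, under loud assumptions: the bit positions below (typecode in the low 7 bits, a 5-bit element type right above it) are hypothetical and chosen for illustration; Guile's real encoding may differ. With 12 bits taken, a 32-bit word has 20 bits left and a 64-bit word has 52.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only. */
#define TC7_BITS 7
#define ELEMENT_TYPE_BITS 5
#define TC7_MASK ((UINT64_C(1) << TC7_BITS) - 1)
#define ELEMENT_TYPE_MASK ((UINT64_C(1) << ELEMENT_TYPE_BITS) - 1)

/* Pack a typecode and an element type into the low 12 bits of a word. */
uint64_t make_tag(unsigned tc7, unsigned element_type)
{
    return (tc7 & TC7_MASK)
         | ((element_type & ELEMENT_TYPE_MASK) << TC7_BITS);
}

/* Extract the 7-bit typecode. */
unsigned tag_tc7(uint64_t tag)
{
    return (unsigned) (tag & TC7_MASK);
}

/* Extract the 5-bit element type stored above the typecode. */
unsigned tag_element_type(uint64_t tag)
{
    return (unsigned) ((tag >> TC7_BITS) & ELEMENT_TYPE_MASK);
}

/* Bits left over for other uses in a word of the given width. */
unsigned spare_bits(unsigned word_bits)
{
    assert(word_bits >= TC7_BITS + ELEMENT_TYPE_BITS);
    return word_bits - TC7_BITS - ELEMENT_TYPE_BITS;
}
```

The spare bits are where something like a struct type identifier could live under the typed-bytevector idea.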

<davexunit>so for this idea, we'd avoid unboxing struct types entirely?

<davexunit>er, struct fields.

<wingo>yes

<wingo>and you could more easily express things like arrays of records

<wingo>without boxing

<wingo>this might be frustrating to you, i can stop :)

<davexunit>I guess I'm just wondering if it's even worth it to continue with what I was going to do.

<davexunit>if it's not, I'll just move on to the unboxing for float comparisons

<wingo>yeah i get what you're saying and i don't know

<wingo>one part of it is, how do i speed up my program that has a lot of data types with f64 fields

<wingo>another is, how do i efficiently deal with data whose unboxed layout i know

<wingo>and then, how do i create data abstractions while controlling the layout in memory, possibly including arrays or structs or otherwise compound abstractions of that data

<davexunit>maybe this is totally unrelated, but I'm also interested in ways of modeling "struct of arrays" as opposed to "array of structs"

<wingo>i was writing a ray tracer the other day and had these problems. you want a struct { v3 origin, direction; } but do you create a container that's a guile struct and a bunch of heap references? weird stuff

<davexunit>yeah that's roughly what I'm experiencing.

<wingo>davexunit: it's related tho! i mean, how much wrapping &c do you have to do when extracting structs from an array of structs

<wingo>for a compiler to turn that into a struct of arrays is hard tho

<wingo>i think you have to contort your program to make that happen, dunno tho

<davexunit>yeah, maybe not worth thinking about, but thought that I would throw it out there.

<davexunit>I'm not *really* interested in it unless I can use such things while continuing to use immutable objects.

<davexunit>but if I were writing an imperative particle simulation in C, a struct of arrays approach would be one valid way of doing things.

<wingo>ACTION nod

<davexunit>supposedly makes better use of the cache
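
The two layouts can be sketched in C like this. The field names, the fixed capacity of 1500, and the step functions are assumptions made for the example. In the array-of-structs form one particle's fields sit together; in the struct-of-arrays form each field is contiguous across all particles, so a loop that touches only some fields reads less unrelated memory.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PARTICLES 1500

/* Array of structs: each particle's fields are adjacent in memory. */
typedef struct { double x, y, dx, dy; } particle;

/* Struct of arrays: each field is stored contiguously for all particles. */
typedef struct {
    double x[MAX_PARTICLES], y[MAX_PARTICLES];
    double dx[MAX_PARTICLES], dy[MAX_PARTICLES];
} particle_soa;

/* Advance n particles by dt using the array-of-structs layout. */
void step_aos(particle *ps, size_t n, double dt)
{
    for (size_t i = 0; i < n; i++) {
        ps[i].x += ps[i].dx * dt;
        ps[i].y += ps[i].dy * dt;
    }
}

/* Advance the first n particles by dt using the struct-of-arrays layout. */
void step_soa(particle_soa *ps, size_t n, double dt)
{
    assert(n <= MAX_PARTICLES);
    for (size_t i = 0; i < n; i++) {
        ps->x[i] += ps->dx[i] * dt;
        ps->y[i] += ps->dy[i] * dt;
    }
}
```

Both versions mutate in place, which is the imperative style davexunit contrasts with his immutable records.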

<davexunit>but hey I was able to render 1500 particles represented as immutable record types, each of which individually scripted by some monad craziness, at 60FPS on my laptop.

<wingo>:)

<wingo>what changed to get you there?

<davexunit>downside: GC runs several times a second, but the animation didn't appear to be too jumpy.

<davexunit>wingo: nothing much on my part. rewriting matrix multiplication to use unboxed arithmetic helped.

<davexunit>I tried to re-implement 2d/3d/4d vectors and rects as records that wrap bytevectors, but it seemed to be a lose overall.

<davexunit>pretty much all of the boxing that was there to begin with was still there after the rewrite.

<wingo>ack

<davexunit>I had the feeling that unboxing wasn't working when plucking elements out of an f64vector and stuffing them into a f32vector

<davexunit>does that sound like a possibility?

<wingo>sounds unlikely

<davexunit>okay

<davexunit>it must be something else

<wingo>there could have been some other use that made them boxed

<wingo>yeah

<davexunit>I expected my sprite batching code to yield much nicer bytecode after the rewrite, but things were just as boxed as before.

<wingo>without allocation sinking it's possible the other use was on the side somehow and not usually hit

<wingo>but the allocation itself was still left at its initial position

<davexunit>I inlined the getters/setters for the "fields" of this bytevector so that the bytevector ref/set calls would be in the optimized code

<davexunit>guess I'll have to poke more.

<davexunit>wingo: well, thanks for this informative chat. I'm gonna drop struct field unboxing, but I'll give float comparison ops a shot.

<wingo>i am sorry for discouraging you!

<wingo>the field is very tricky to navigate

<wingo>and i don't know the way

<davexunit>it's okay :)

<mark_weaver>here's a thought: maybe we should move towards implementing structs, arrays, and vectors all in terms of bytevectors.

<mark_weaver>I haven't yet read the recent backlog though...


2016-02-29.log