IRC channel logs

2019-11-21.log

<chrislck>manumanumanu: for my previous code I had simplified https://stackoverflow.com/questions/7313563/flatten-a-list-using-only-the-forms-in-the-little-schemer/7324493
<chrislck>combine all 4 snippets into one, using srfi-1 forms as appropriate, and it becomes clear how it works
<chrislck>but I admit passing the named let as a parameter to fold was a noob genius that made the code beautifully simple
<manumanumanu>jcowan: Which is of course not strange once you think of it (at least for a language with proper TCO). Converting it to something else than regular recursive calls seems like a waste of life :) I was just surprised, since I had never seen it before.
<roelj>Does anyone have a small example on how to use 'flock'?
<zig>hello #guile
<roelj>What's an efficient way to do thread-safe file access in Guile? (I want to read a list from a file and cons a value to it)
<roelj>And do that without losing a value from multiple threads.
<wingo>do you want to update the file?
<roelj>yes
<wingo>what would the contents of the file look like before and after, if before had 2 items and after had 3?
<wingo>if the items were numbers, say
<roelj>((D E F) (A B C)) -> ((G H I) (D E F) (A B C))
<roelj>Oh, sorry, numbers: (2 1) -> (3 2 1)
<wingo>in that case, i would create a new file -- i see that guile's api here is not great
<wingo>see the private call-with-output-file/atomic from (system base compile) actually
<wingo>i think that's what you want
<roelj>Ah, let me see
<roelj>So, IIUC, it creates a temporary file, does its I/O and puts it in the target place.
<roelj>What happens when two actions are performed on it simultaneously? Only one will end up in the target (the last action)?
<cehteh>roelj: would storing the stuff in a sqlite be an option?
<roelj>cehteh: Currently my program does not have sqlite as dependency, so I'd prefer a solution without external dependencies :)
<roelj>Maybe a log structure would work better. Is 'write' thread-safe?
<wingo>roelj: ah :) that is a more tricky issue
<wingo>write is not safe in that sense
<wingo>i agree i would use a database of some kind
<wingo>sqlite maybe, or git, or something like that
<manumanumanu>Have a .lock file then.
<wingo>yes, file locking can work too
<zig>fwiw, sqlite is a lot of ceremony because of the single-writer thing. SQLite recommends avoiding direct manipulation of files on the file system, as the file system does not do what you really expect it to do. (sure, a software vendor will not say that its very own software is useless).
<manumanumanu>until your users have an accident which crashes the program and they then can't access the files :D
<manumanumanu>maybe store a unique "session id" in the lock file
<manumanumanu>and warn if the lock file is there, but with an old session id
<manumanumanu>roelj: or use a dedicated writer thread that accepts input through a queue.
<manumanumanu>I'm full of ideas this morning!
<manumanumanu>I am just trying to postpone the inevitable: going back to editing XSLT and XML DTDs.
<zig>that is database work.
<cehteh>sqlite isn't the best for that, but if it is an option then it's likely the easiest, as it does all the synchronization in a bulletproof (but low-performing) way
<roelj>Thanks for the ideas :)
<cehteh>if it's not, then as manumanumanu said, a single writer thread is a good idea
<cehteh>otherwise you end up in locking and synchronization hell
<zig>cehteh: wiredtiger (or sqlite lsm extension) is a little bit lower level but much more versatile.
<cehteh>i wouldn't go into details but rolling that on your own with low-level posix/filesystem calls will be a pita
<zig>+1
<roelj>Hehe, I'm beginning to see the complexity of it
<cehteh>you can do a global lock thing and single writer
<cehteh>beware of deadlocks
<cehteh>and figure out how to prevent data races :D
<roelj>Reinvent a database :D
<cehteh>basically
<cehteh>or just log writing, append only, still with locking (so that you write full records)
<cehteh>once in a while reconstruct the original file
<zig>it also depends whether the data is bigger than memory or not (oom killer is still around)
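The simplest of the ideas above, serializing the read-modify-write and replacing the file through a temporary file and a rename, can be sketched in Guile as follows. This is a minimal sketch, not the private call-with-output-file/atomic that wingo mentions: the names `file-mutex`, `read-list` and `push-to-file!` are hypothetical, the temporary file name is an assumption, and the mutex only protects threads inside one process. Several processes would still need a lock file or a database.

```scheme
(use-modules (ice-9 threads))

;; Hypothetical global lock: serializes every update made by this process.
(define file-mutex (make-mutex))

;; Read the list stored in FILE, or '() if the file is missing or empty.
(define (read-list file)
  (if (file-exists? file)
      (let ((value (call-with-input-file file read)))
        (if (eof-object? value) '() value))
      '()))

;; Cons VALUE onto the list stored in FILE.  The new list is written to a
;; temporary file first and then renamed over FILE, so a reader never
;; sees a half-written list.
(define (push-to-file! file value)
  (with-mutex file-mutex
    (let ((items (read-list file))
          (tmp (string-append file ".tmp")))
      (call-with-output-file tmp
        (lambda (port)
          (write (cons value items) port)))
      (rename-file tmp file))))
```

Starting from a file containing (2 1), `(push-to-file! "items.scm" 3)` would leave (3 2 1) in it, matching roelj's example.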
<wingo>made a little thing for wasm work: https://gitlab.com/wingo/guile-wasm
<wingo>i suppose we could compile wasm files into .go files
<manumanumanu>wingo: cool. What does "decode it into a scheme representation" mean? Actual scheme code?
<manumanumanu>But I guess that would not be a small side project :D :D
<manumanumanu>Answered my own question!
<zig>neat
<zig>TIL: there is a popcount function in wasm
<zig>:/
<manumanumanu>everybody needs it!
<zig>manumanumanu: really? when is it useful?
<zig>I think, i stumbled upon it while computing the hamming distance or something like that.
<zig>hamming distance between two bit strings.
<manumanumanu>I use it when I work with bitvectors.
<manumanumanu>but mostly, it is used in HAMTs
<manumanumanu>at least, that is where I have seen it most
<zig>oh it makes sense.
<zig>hamming distance in C: __builtin_popcount(x ^ y) (ref: https://en.wikipedia.org/wiki/Hamming_distance#Algorithm_example)
<zig>it is the count of substitutions required to make x == y
<manumanumanu>Is x a power of 2? (= 1 (logcount x))
<zig>I used it in a (novel?) algorithm with simhash to compute and query similar or duplicate documents (ref: https://hyper.dev/blog/fuzzbuzz-hash-algorithm.html)
<wingo>i usually use (zero? (logand n (- n 1)))
<wingo>though that is true for 0 also
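The bit tricks mentioned here can be written out in Guile with the built-in `logcount`, `logand` and `logxor`. This is a minimal sketch; the procedure names are hypothetical, and both power-of-two tests add an explicit positivity check to exclude 0 and negative numbers.

```scheme
;; manumanumanu's test: exactly one bit set.
(define (power-of-two/logcount? n)
  (and (> n 0) (= 1 (logcount n))))

;; wingo's test: clearing the lowest set bit leaves zero.
;; The (> n 0) guard rejects 0, which would otherwise pass.
(define (power-of-two/logand? n)
  (and (> n 0) (zero? (logand n (- n 1)))))

;; Hamming distance between two non-negative integers seen as bit strings:
;; the Scheme counterpart of __builtin_popcount(x ^ y).
(define (hamming-distance x y)
  (logcount (logxor x y)))

(hamming-distance #b1011 #b1101) ; => 2
```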
<zig>manumanumanu: no it is the simhash of something, where simhash is a locality-sensitive hash.
<zig>I only have an implementation in another programming language.
<manumanumanu>wingo: the satisfaction of doing something with bitfiddling is enormous, isn't it? :D
<wingo>:)
<manumanumanu>When I finally understood that constant time hex encode I was euphoric
<wingo>i bitfiddle all the time & find it a bit gnarly, personally :P
<manumanumanu>For me, a lowly classical musician, it is still novel. Every time i think of solving anything using it, it is like an endeavour! "Will it work? Will it be fast? Will I survive?"
<jcowan> https://vaibhavsagar.com/blog/2019/09/08/popcount/ is pretty comprehensive
<manumanumanu>A friend of mine who is bitmucking professionally (embedded C stuff mostly) has popcount as a litmus test for programming languages. If it has popcount, it can probably be used for serious work. If it hasn't he can't be bothered.
<manumanumanu>wingo: that logand thing is much prettier. I expect popcount to use at least some cycles...
<wingo>if you like this sort of thing, you might enjoy the book "hacker's delight"
<manumanumanu>I will check it out, thanks
<jcowan>To each their own, as the devil said when he painted his tail sky-blue.
<jcowan>It's especially interesting that gcc and clang will *detect* an implementation of popcount they are compiling and replace it with the one instruction
<roptat>I think I already asked, but I can't remember how to put a string to lower case?
<manumanumanu>string-downcase
<manumanumanu>?
<manumanumanu>or string-downcase! if you want to maybe-mutate it.
<zig>Thanks!
<manumanumanu>jcowan: it is surprising how many compiler optimizations are just regular pattern matching, (x % 2) == 0 being the most famous one.
<manumanumanu>It has tricked millions of programmers into believing there is nothing slow with modulo...
<lloda>roptat: you can type ,a case at the REPL to find this sort of thing
<roptat>thanks, for some reason I couldn't find it in the manual, I must not have searched properly
<lloda>wish there was an override of sorts for format
<lloda>like (format p "BEGIN~pEND" (lambda (p) (write-things-to-p)))
<lloda>unnecessary I guess
<zig>there is call-with-output-string https://www.gnu.org/software/guile/manual/html_node/String-Ports.html#index-call_002dwith_002doutput_002dstring
<lloda>that'd work
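What zig suggests can be sketched like this: `call-with-output-string` collects everything a procedure writes to its port, and the resulting string is passed to `format` through `~a`. The helper name `begin-end` is hypothetical; only the core `format` and `call-with-output-string` are used.

```scheme
;; Hypothetical helper: wrap whatever PROC writes to its port in BEGIN/END.
(define (begin-end proc)
  (format #f "BEGIN~aEND" (call-with-output-string proc)))

(begin-end (lambda (port) (write '(1 2 3) port)))
;; => "BEGIN(1 2 3)END"
```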
*zig got greenlight for the one and only test directly related to full-text search in guile-babelia.
<stis>hey guilers!
<str1ngs>hello stis
<str1ngs>wingo: does guile-wasm mean guile scheme can be compiled to wasm?
<wingo>nope, just a toolkit for working with wasm files
<str1ngs>ahh okay that's what I thought.
<str1ngs>I wonder though if a C-to-wasm compiler could compile guile to wasm. that would be interesting
<zig>I tried emscripten with another scheme, I gave up, IF I work on something related to that I will target wasm directly instead of compiling the whole compiler.
<str1ngs>zig: right, compiling direct to wasm would be ideal
<nly>hi
***jao is now known as Guest36927
<str1ngs>hello nly
<nly>hey str1ngs
<nly>:)
<nly>str1ngs what do you think of this static website? https://o-nly.github.io/resonance/ generated using wintersmith
<str1ngs>nly: looks really nice. are you aware of https://dthompson.us/projects/haunt.html ?
<nly>yup
<nly>does nomad have a website yet?
<str1ngs>nly: btw I have a project maybe can be combined with https://o-nly.github.io/resonance/
<str1ngs>nly: not yet. about the time I start working on documentation I'll probably create a website
<str1ngs>nly: the project is https://github.com/mrosset/home
<nly>if you want i could try to setup a website using wintersmith or maybe haunt is even better? what say?
<str1ngs>nly: I think I'd rather have some documentation first. it's definitely on my todo list though
<nly>maybe outsource some work :)
<nly>page not found https://github.com/mrosset/home
<str1ngs>nly: ahh it was private, try now
<nly>i saw it
<zig>I am confused both of you nly and str1ngs are working on a browser?
<str1ngs>zig: right it's called nomad which is an extensible web browser using guile
<nly>at this point i'd like to point you to a website for nomad zig
<nly> http://savannah.nongnu.org/projects/nomad
<nly>but it's not made yet
<nly>by website i mean just someplace with all the relevant links, irc, source, docs etc.
<zig>ah ok you are working on the same project?
<nly>i made the bookmarks and shroud module
<nly>yes
<zig>oh ok
<str1ngs>nly: I think ideally I'd like to create the manual with texinfo. then provide that online
<str1ngs>nly: like proper GNU projects do
<zig>what branch do you recommend to read?
<str1ngs>zig: feature-release is the stable branch
<str1ngs>zig: though there are newer branches that are moving the API towards gobject introspection.
<zig>ok
<nly>str1ngs that would be best, but it's a start
<str1ngs>nly: if you wanted to work on something I'd say the manual would be a good start
<str1ngs>at least the autotools framework to generate a manual
<nly>ok, that seems reasonable
<str1ngs>nly: here is the guide line https://www.gnu.org/prep/standards/standards.html#Documentation
<str1ngs>nly: maybe we use texinfo to generate the web documentation html and then haunt or something for the landing website.
<str1ngs>nly: then we can also provide the html documentation internally, why not it's a web browser after all
<zig>"The size of the cache is the single most important tuning knob for a WiredTiger application. Ideally the cache should be configured to be large enough to hold an application's working set."
<zig>:/
<zig> https://source.wiredtiger.com/3.2.0/tune_cache.html
***sneek_ is now known as sneek
<zig>that is... sad to read.
<weinholt>zig, it would be true for any cache, no?
<zig>(a post about the future of databases http://www.cs.cmu.edu/~pavlo/blog/2015/09/the-next-50-years-of-databases.html)
<zig>what is the working set? to me there is "what is requested often" that I will call hot data and "what is requested less often" that I will call cold data. The cache should keep around only the hot data. Hence have statistics about the whole database on what is queried often or not.
<zig>So the cache, assuming it is a fraction of the total memory used by the database, should not be the whole database, it should only be the data that is requested often.
<zig>but at the moment, I am loading what I downloaded from scheme websites, html only is around 3GB. But it is more than 8GB in memory.
<zig>AND I just figured, that I made a typo (!) and the database is using mmap.
<zig>that typo was not caught because I forgot to fix a bug reported during the SRFI process (related to keywords / options that must be checked).
<zig>For the search engine usecase, it is very difficult to figure what is the working set.
<zig>also the cache must be big enough to keep the data of the whole transaction in memory.
<weinholt>zig, are you working to bring the search engine back up?
<zig>weinholt: yes
<weinholt>nice
<weinholt>it will be like the old days of the internet: a search engine that actually finds interesting stuff :)
<str1ngs>is there a link to this search engine?
<jcowan>zig: A working set is all the data you have had to access in the last n time units, for some suitable choice of n.
<zig>str1ngs: the code is at https://git.sr.ht/~amz3/guile-babelia (only scheme, no goops ;)
<zig>weinholt: hopefully..
<zig>jcowan: fwiw, there is n in time units that can be configured in wt. Instead one set the size of the cache, and min and max number of threads dedicated to eviction
<zig>here is the current config string: mmap=false,eviction=(threads_max=3,threads_min=2),log=(enabled=true,file_max=1024MB),create,cache_size=5120MB
<jcowan>you mean "there is no n"?
<zig>I have been trying since 20:00
<jcowan>Anyway, that's an abstract definition, not an implementation
<zig>NO n yes.
<zig>there is no n.
<jcowan>That wouldn't be a cache config parameter anyway: it's what the application knows about its own history.
<zig>I am somewhat "happy" that wt documentation acknowledges that tuning eviction is difficult.
<jcowan>so for example a LRU cache is a good approximation in many cases, but not (for example) when the data are being accessed in a loop larger than the cache size, in which case evicting the least-recently-used is *exactly* the wrong thing.
<jcowan>What we really want is a cache that caches all the data we are about to use.
<jcowan>"Prediction is very difficult, especially about the future."
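jcowan's point about LRU and loops can be checked with a toy simulation. This is a minimal sketch with a hypothetical `lru-hits` procedure that models the cache as a plain list, most recently used first. It is unrelated to how WiredTiger actually evicts pages.

```scheme
(use-modules (srfi srfi-1))

;; Count the hits for a list of ACCESSES against an LRU cache holding at
;; most CAPACITY keys.  The cache is a list, most recently used first.
(define (lru-hits capacity accesses)
  (let loop ((cache '()) (accesses accesses) (hits 0))
    (if (null? accesses)
        hits
        (let* ((key (car accesses))
               (hit? (member key cache))
               (others (delete key cache))
               ;; On a miss with a full cache, drop the last element,
               ;; which is the least recently used key.
               (kept (if (>= (length others) capacity)
                         (take others (- capacity 1))
                         others)))
          (loop (cons key kept)
                (cdr accesses)
                (if hit? (+ hits 1) hits))))))

;; A loop over 4 keys, repeated 3 times.
(define accesses '(a b c d a b c d a b c d))

(lru-hits 3 accesses) ; => 0, every access evicts the key needed next
(lru-hits 4 accesses) ; => 8, only the first pass misses
```

With a cache just one entry smaller than the loop, LRU always throws away the key that will be requested soonest, so the hit count drops to zero.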
<str1ngs>zig: cool thanks. oh I don't mind goops. but then I've been trying to find an alternative data structure. I guess records is the scheme way to do things?
<str1ngs>zig: I'm kinda used to C and Go structs
<str1ngs>zig: this looks very interesting, thanks for the link
<zig>yw
<zig>I guess I am not knowledgeable enough to rule out oop from scheme. I never used goops.
<zig>at least r7rs has nothing like oop.
<str1ngs>one thing I like about goops is the generic methods
***apteryx_ is now known as apteryx
<str1ngs>zig: make check works but I had to hack stemmer.scm for my computed stemmer
<str1ngs>I'm using your channel btw
<jcowan>R7RS-large will, if I have anything to say about it, provide generic functions but not classes.
<jcowan>(that is, you can *have* classes, they just won't be in the standard)
<str1ngs>jcowan: thanks I will look into that