IRC channel logs

2015-01-29.log


<Rashack>any tips on handling files with utf-8 encoded file names? I've been using file-system-tree and file-system-fold, and they both fail.
<Rashack>so, it seems to be (stat ...) that has problems, but only when it gets handed the file from file-system-fold, file-system-tree or ftw
<Rashack>i get invalid-stat for "la???s"
<Rashack>where the filename is la’s, but the quote is utf-8 encoded
<Rashack>which are the bytes e2 80 99
<Rashack>when i do (stat "la’s") from the repl everything works fine
<mark_weaver>Rashack: you need to set a UTF-8 locale if you'll be working with UTF-8 encoded filenames
<mark_weaver>Rashack: usually the best thing is to run (setlocale LC_ALL "") near the beginning of your program, which sets the locale according to the usual environment variables
<Rashack>so the repl inherits this from my shell, but my script doesn't?
<mark_weaver>the REPL sets the locale automatically, but in normal programs (including scripts) you have to do it yourself
<Rashack>ok
<mark_weaver>in 2.2 (not yet released), we'll set the locale automatically for guile scripts.
<Rashack>it works better now, thanks
<mark_weaver>glad to help!
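
A minimal sketch of the advice above, assuming a Guile 2.0 script run with a UTF-8 locale in the environment. It calls (setlocale LC_ALL "") before touching the file system, so file names are decoded with the encoding the environment names. The helper entry-sizes is a made-up name for illustration and does not come from the discussion; scandir is provided by (ice-9 ftw).

```scheme
;; Sketch only; assumes Guile 2.0 and a UTF-8 locale in the environment.
(use-modules (ice-9 ftw))

;; Adopt the locale named by LC_ALL / LC_CTYPE / LANG, as the REPL does.
;; This can raise an error if the environment names a locale that is not installed.
(setlocale LC_ALL "")

;; Hypothetical helper: pair each entry of DIR with its size in bytes.
(define (entry-sizes dir)
  (map (lambda (name)
         (cons name (stat:size (stat (string-append dir "/" name)))))
       (scandir dir (lambda (name)
                      (not (member name '("." "..")))))))

(for-each (lambda (entry)
            (format #t "~a: ~a bytes\n" (car entry) (cdr entry)))
          (entry-sizes "."))
```

Without the setlocale call, a script starts in the default "C" locale, which is presumably why the three UTF-8 bytes of the quote showed up as "???" in the error.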
<Rashack>but i'm not sure i like it (maybe because i never really understood LC_ALL)
<mark_weaver>I agree that the 2.2 behavior of automatically setting the locale according to the environment variables is what most users expect.
<mark_weaver>or is it something else that you don't like?
<Rashack>not sure :)
<mark_weaver>(setlocale LC_ALL "") essentially does the equivalent of what most scripting languages and i18n'd programs do.
<Rashack>i guess i expected "it" to just handle filenames regardless of the filename encoding
<mark_weaver>Rashack: how would you have that work? strings in guile are sequences of unicode code points, whereas POSIX filenames are sequences of bytes. how would you do that conversion?
<Rashack>i think i don't like the fact that i don't fully understand why the locale would matter in this case
<Rashack>i don't see why there would have to be a conversion
<Rashack>some code in guile reads a file, passes it to some other code in guile, that doesn't understand it
<Rashack>seems wrong to me
<Rashack>aren't "unicode code points" bytes at some level as well? Depending on the encoding
<mark_weaver>I don't see how to turn a sequence of bytes into a sequence of unicode code points without some kind of conversion.
<mark_weaver>unicode code points are not bytes.
<Rashack>maybe not in our heads, but when they're stored in a computer they are
<mark_weaver>if you want to treat filenames as unicode strings, then you need to know what the encoding of the bytes is.
<mark_weaver>they can be converted to+from sequences of bytes if and only if you know what encoding to use. that's a conversion process.
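
To make the conversion point concrete, here is a small sketch that assumes the (ice-9 iconv) module shipped with recent Guile 2.0 releases. It decodes the same six bytes (the file name from earlier in the log) under two different encodings; the variable names are illustrative only.

```scheme
;; Sketch only; assumes (ice-9 iconv) is available.
(use-modules (ice-9 iconv)
             (rnrs bytevectors))

;; The file name "la’s" as raw bytes: l, a, then e2 80 99 (U+2019 in UTF-8), then s.
(define name-bytes
  (u8-list->bytevector '(#x6c #x61 #xe2 #x80 #x99 #x73)))

;; The same bytes decoded under two different assumptions.
(define as-utf8   (bytevector->string name-bytes "UTF-8"))
(define as-latin1 (bytevector->string name-bytes "ISO-8859-1"))

(format #t "UTF-8:   ~a characters\n" (string-length as-utf8))    ; 4
(format #t "Latin-1: ~a characters\n" (string-length as-latin1))  ; 6
(format #t "third code point as UTF-8: ~a\n"
        (number->string (char->integer (string-ref as-utf8 2)) 16)) ; 2019
```

The bytes are identical in both cases; only the assumed encoding changes which sequence of code points they stand for.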
<mark_weaver>we (and I) have done a lot of research on this topic and put a lot of thought into it. if you think we did things wrong, then do some research and make a proposal :)
<Rashack>i'm trying to understand here (perhaps against better judgement)
<mark_weaver>unfortunately, the fundamental problem here is that POSIX filenames are just sequences of bytes, and there's no standard encoding for those bytes, so no way to know what bytes > 0x7f are.
<Rashack>so guile is doing some conversion
<Rashack>i can understand the confusion when trying to display a filename, decoded using something else than it was encoded with
<Rashack>but why can't guile internally handle the filenames it is being fed
<mark_weaver>the problem is that Guile strings are sequences of characters, not sequences of bytes like in C.
<mark_weaver>for most purposes this is what you want
<Rashack>as you say, if the names are just bytes there shouldn't have to be a problem, until i want to display them, unless guile has already tried to decode them
<mark_weaver>so you'd prefer for strings to be raw sequences of bytes, and do the conversion only when we display them?
<mark_weaver>or would you prefer for filenames to be bytevectors instead of strings?
<Rashack>hehe, i'm not sure (and i still don't have a clear picture of what's happening here)
<mark_weaver>I assure you, there is no magic bullet here. it's a messy problem.
<Rashack>i believe you
<Rashack>but isn't it strange that a ftw can list a file, but not do stat on it?
<mark_weaver>if we could all agree to standardize on UTF-8, and that became a de-facto standard encoding for POSIX byte strings, then the problem would be solved.
<mark_weaver>but alas, the CJK countries don't like some aspects of unicode.
<adhoc>probably because they aren't getting heard on the committees that put together UTF8
<adhoc>but as china teaches the vast majority of its kids in pinyin, the roman character set will become more widespread
<mark_weaver>I'm not trying to assign blame.
<adhoc>and making UTF8 the primary encoding method in apps by default will only help that
<adhoc>mark_weaver: nor am i
<adhoc>practically UTF8 will help the most people
<adhoc>options for UTF16 support for folks in non-roman character set languages
<adhoc>mark_weaver: thinking in characters rather than byte, but with the option to look at the stream in bytes as well is probably the best way to go
<mark_weaver>our standard hack for that is to use the ISO-8859-1 (latin-1) encoding, where every byte maps to a character.
<mark_weaver>if you really don't care how the bytes >= 0x80 are interpreted, and you just want to work with bytes as if they are characters, that's one way to do it.
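
The Latin-1 approach just described can be sketched as follows, again assuming (ice-9 iconv). Because ISO-8859-1 assigns a character to every byte value, any byte sequence survives a decode and re-encode unchanged, including sequences that are not valid UTF-8. The variable names are illustrative.

```scheme
;; Sketch only; assumes (ice-9 iconv) is available.
(use-modules (ice-9 iconv)
             (rnrs bytevectors))

;; Every possible byte value, 0 through 255.
(define all-bytes (u8-list->bytevector (iota 256)))

;; Decode as Latin-1: one character per byte, never a decoding error.
(define as-chars (bytevector->string all-bytes "ISO-8859-1"))

;; Encode again: the original bytes come back unchanged.
(define round-trip (string->bytevector as-chars "ISO-8859-1"))

(format #t "characters: ~a\n" (string-length as-chars))
(format #t "round trip preserved bytes: ~a\n"
        (bytevector=? all-bytes round-trip))
```

The price is that bytes at or above 0x80 are shown as Latin-1 characters, whatever the file's creator intended them to be.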
<adhoc>the web has left that idea behind though
<mark_weaver>and if you want to do I/O with bytes, of course we support binary I/O and bytevectors.
<mark_weaver>in theory, we could allow bytevectors to be used as filenames, and make variants of the (relatively few) procedures that *return* filenames to return it as a bytevector.
<mark_weaver>but it's a bit of a mess
<mark_weaver>adhoc: can you be more clear about what you're proposing?
<adhoc>i've done a lot of work in perl where we treat everything as UTF8
<adhoc>binary files, web requests, the lot
<adhoc>we oft get windows char set encoded stuff that simply doesn't convert to other things
<adhoc>so we get junk that breaks RSS aggregators (not our code)
<mark_weaver>how can you treat a binary file as UTF-8 ?
<adhoc>so we have to convert everything to UTF8
<adhoc>we have code that pokes through the headers and tries to figure out whats going on
<adhoc>if it isn't UTF8, like it finds other code points, it tries to re-read as the other encoding
<mark_weaver>as what other encoding?
<adhoc>sometimes this fails, usually with windows char set stuff in non-roman charset languages =/
<adhoc>and we got that a lot
<mark_weaver>so it guesses what the encoding is?
<adhoc>er, yes. based on hints
<mark_weaver>I suppose this is a cultural difference between the Scheme and Perl communities, but we are not so fond of making guesses.
<adhoc>thats fair
<adhoc>our decisions are usually solving problems in a hurry to fix some gaping hole or daft bug
<adhoc>looking for the code points of other encodings is the right way to solve this though
<adhoc>many apps that generate files assume the windows char set
<adhoc>so don't bother to add the code point into the stream
<adhoc>and you get it a lot in forms submitted to your web app
<adhoc>usually stuff cut'n'pasted from word
<adhoc>its part of our string taint libraries
<adhoc>anyhow, BBL, meeting =/
<mark_weaver>okay, bye!
<zacts>hey again
<zacts>adhoc: you are doing scheme coming from perl?
<zacts>that's what I'm doing too
<zacts>well Perl was my first language that I liked
<zacts>I can help with the transition if you want
<zacts>adhoc: I find I like scheme and clojure a ton lately. Although I still like Perl for what it is, and especially simple UNIXy one liners and other stuff like that. and regex
<nalaginrut>morning guilers~
<nalaginrut>I took a look at Gopher which is used to generate JS with some Go functions, better to have one in Artanis...
<nalaginrut>maybe a separated project
<nalaginrut>wait...if so, why not just generate asm.js...
<jgrant>nalaginrut: Is that named after that Starcraft character?
<jgrant>Also your site's documentation link doesn't work.
<jgrant>404'd
<nalaginrut>jgrant: you're luck to encounter the problem since I'm tweaking the manual link
<nalaginrut>lucky
<nalaginrut>jgrant: actually, it's named from the web framework "Sinatra"
<nalaginrut>but then I realized it's a character of Starcraft, well, I should use this one
<jgrant>nalaginrut: Ah, neat.
<jgrant>In any case, very cool work. :^)
<nalaginrut>jgrant: thanks for encouraging! ;-)
<jgrant>Might be a fun place for me to play around, if I edge a little more into webdev.
<nalaginrut>the manual is not available at moment, since someone told me use "manual" rather than "manual"
<nalaginrut>rather than "manuals"
<jgrant>Right now, I'm just buggering with Skribilo for my personal Blog/Site.
<jgrant>nalaginrut: Ah, ok.
<nalaginrut>and I'm spinning while I'm playing CVS...
*nalaginrut hate CVS ..
<jgrant>So you are condensing the documentation into one file?
<nalaginrut>jgrant: I'm sorry you have to wait a moment, since I haven't release the tarball, I'm doing the stuffs for preparing 0.0.2 release
<nalaginrut>jgrant: but you may download it from git and 'make docs'
<jgrant>nalaginrut: Not a problem, just excited to see you are still working on this. :^)
<nalaginrut>jgrant: yeah, it's just born, so I'm working on it heavily
<nalaginrut>jgrant: https://github.com/NalaGinrut/artanis
<nalaginrut>in case if you want to play it now
<jgrant>nalaginrut: Ty. I'm close to going to bed for tonight, but I'll certainly bookmark it. :^)
<nalaginrut>it's fine, good night!
<adhoc>zacts: regex's seem to be reviled by lispy people in my experience. don't know why. i don't really have the experience in lisp/scheme to find alternatives.
*adhoc will get there yet =)
<adhoc>zacts: yeah, learning scheme after many years of living in the wilderness ;)
<adhoc>zacts: twenty years ago i wrote a lot of assembler in emacs and some elisp =)
*nalaginrut fixed manual link finally...
*nalaginrut hate CVS again...
<zacts>adhoc: heh yeah
<zacts>I find regex can be a really useful DSL
<zacts>although they can be somewhat limited, oh let me show you a link though
<zacts>adhoc: https://github.com/ztellman/automat
<zacts>^ regex is a subset of this
<zacts>so really you can get more power with the automat kind of way of doing things
<zacts>but at the same time you lose the concise usefulness that is the DSL of regex
<saul>zacts, have you looked at srfi-115?
<mark_weaver>zacts: I second saul's suggestion to look at SRFI-115, which I plan to add to Guile at some point. also, there's 'irregex', which is quite close to SRFI-115.
<mark_weaver> http://srfi.schemers.org/srfi-115/srfi-115.html
<civodul>Hello Guilers!
<jgrant>civodul: o/
*jgrant should be sleeping. \o/
<wingo>moin
<civodul>hey, wingo
<wingo>ahoy, civodul
<civodul>nalaginrut: congrats on Artanis!
<nalaginrut>civodul: thanks! congrats on Guix too!
<civodul>things to upvote: https://news.ycombinator.com/item?id=8965257 & https://news.ycombinator.com/item?id=8965328
<civodul>it's a Guile day! :-)
<nalaginrut>I will learn how to write Guix script for packaging it ;-)
<nalaginrut>thank you very much ;-D
<civodul>heheh
<nalaginrut>yeah, Guile day
<nalaginrut>if we're not going to hold potluck this year, maybe today is (sounds unfair to others huh?)
<civodul>the potluck has a different spirit
<zacts>oh nice mark_weaver
<nalaginrut>civodul: hah, I'm kidding, of course it's very different ;-D
<zacts>nalaginrut: oh did you get your artanis web server released?
<nalaginrut>zacts: yes, 0.0.2 is released
<nalaginrut>and manual has more contents
<zacts>oh nice, is it going to be an official gnu project?
<nalaginrut>zacts: it is now ;-)
<zacts>oh nice
<atheia>Oh fantastic! I had no idea you were going through the process too.
<atheia>Feels like there's a nice growth of Guile based GNU pkgs at the mo…
<zacts> https://github.com/scheme/scsh
<zacts>^ it seems someone is still maintaining scsh a bit
<zacts>mark_weaver: davexunit: apparently SICM is releasing a new edition next month
<zacts> http://www.amazon.com/gp/product/0262028964/ref=s9_psimh_gw_p14_d0_i4?pf_rd_m=ATVPDKIKX0DER&pf_rd_s=desktop-1&pf_rd_r=1XQE86CF012VDV7W9863&pf_rd_t=36701&pf_rd_p=1970559082&pf_rd_i=desktop
<zacts>oops sorry for the long link
<zacts>I wonder if they are going to consider updating SICP in any way
<zacts> http://mitpress.mit.edu/books/structure-and-interpretation-classical-mechanics-2
<zacts>^ a shorter better link
<mark_weaver>they updated SICP several years ago
<mark_weaver>thanks for letting me know about the updated SICM
<dsmith-w`>Thursday Greetings, Guilers
***dsmith-w` is now known as dsmith-work
<zacts>mark_weaver: yeah I know about 2nd edition of SICP, but I wonder about a 3rd sometime in the near future. perhaps, or perhaps not
***dje is now known as xdje
<cluck>yay, new guix release! all hail the guixers \o/
<wingo>civodul: how are the last minute slides coming
<civodul>wingo: they're not really coming yet!
<civodul>i do have a rough sketch
<civodul>but it's a bit too rough
<civodul>and you?
<civodul>bah
<davexunit>:(