IRC channel logs

2021-03-02.log

<rlb>leoprikler: that was just a "hack", i.e. in the current code narrow is iso8859-1 and wide is UTF-32, but one way I was playing with migrating to utf-8 was to change narrow to be ascii, and "wide" to be utf-8 (i.e. seemed like some of that transformation might be fairly easy (fsvo easy)). And then, all strings would be utf-8 format, but some (the narrow ones) could still use the optimized fixed-width code paths.
<rlb>(...and of course if we did go that route, we might well want to change some of the terminology in the code eventually, I was experimenting.)
<rlb>"was just"
<apteryx>rekado_: wow, thank you! I'll try and study your solution now that my son sleeps ;-)
<apteryx>rekado_: I like both of your examples and accompanying explanations; I was missing the concept that the filter takes the node expression as an input and how to match on children with sub-expressions, among other things. Thank you, you've made my day!
<leoprikler>rlb: IOW "narrow" strings are char* containing only ASCII values, whereas "wide" strings are char* encoded in UTF-8?
<leoprikler>(after your transformation)
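A minimal C sketch of the narrow/wide split leoprikler describes, assuming a string is classified purely by scanning its bytes. The helper name buffer_is_ascii is hypothetical and is not a libguile function; it only illustrates that a pure-ASCII buffer is also valid UTF-8 whose byte index equals its character index, which is what lets the fixed-width code paths keep working.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of libguile: returns true when every
   byte is below 0x80.  Such a buffer is pure ASCII, so byte i is also
   character i and it could be treated as a "narrow" string. */
bool buffer_is_ascii(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        if (buf[i] >= 0x80)
            return false;   /* multi-byte UTF-8 sequence: "wide" */
    return true;
}
```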
***apteryx is now known as Guest98270
***apteryx_1 is now known as apteryx
<wingo>moin
***wxie1 is now known as wxie
<lloda>wingo: the new reader fails to read stuff like #2f64:0:3()
<lloda>it thinks f64:0:3 is all a type tag
<dsmith-work>UGT Greetings, Guilers
<wingo>lloda: just for reference, does it read #2i32:0:3() ?
<wingo>tx for report, btw
<wingo>there are weird things for the f32 / f64 cases
<apteryx>is there a way in sxpath to express the following: 'select the first node of type P coming after any node having attribute X' ?
<apteryx>more concretely, I want to match the first paragraph (p ...) appearing after a div with attribute class="UnitDoc" from this page: https://pkg.go.dev/github.com/dgraph-io/badger/v2; that is, the paragraph starting with "BadgerDB is an embeddable, persistent and fast key-value [...]"
<apteryx>my best idea so far is to define a procedure that I will put on the sxpath, that will take the nodeset corresponding to the children of body, find the position of the element which has the UnitDoc class attribute, then return the node at position + 1.
<mwette>apteryx: using sxpath or the low-level routines: there is take-after
<wingo>lloda: fixed
<lloda>wingo: sry it didn't read that either
<lloda>oh great thx
<wingo>was a little typo
<lloda>really the sizes are only necessary when you have zeros in the size vector that you cannot deduce the array body
<lloda>i mean from the array body
<lloda>once i spent a couple days chasing a bug
<lloda>i had written 2,0 somewhere where i meant 2.0
<lloda>i looked at the file till i nearly went blind
<lloda>i think having the sizes on the default output format would be useful
<lloda>well i do that because i often have to truncate the output so the sizes are the more interesting part
*spk121 hates working on patches that modify configure.ac and Makefile.am
<spk121>wingo: as long as you're patching guile, this vector-map patch from linus should go in, I think. https://lists.gnu.org/archive/html/guile-devel/2021-02/msg00006.html
***rekado_ is now known as rekado
<rekado>apteryx: there is a way, but I don’t think it would help you.
<rekado>apteryx: that website comes with empty documentation tags; they are filled with JS
<wingo>spk121: tx for note, will try to make a guile-devel sweep before releasing
<rekado>apteryx: oh, looks like I’m wrong
<rekado>apteryx: the “<p>Package badger implements…</p>” is actually a grand-child node of <div class="UnitDoc">.
<rekado>but htmlprag misunderstands the <p>.
<rekado>when you patch htmlprag to remove <p> from the parent-constraints alist you should be able to get the paragraph
<lloda>wingo: there's anoher one in line 527
<lloda>another
<wingo>lloda: plz fix :)
<lloda>roger
<apteryx>rekado: oh, that's unfortunate, that htmlprag would do that. I'm going through the go importer patch for Guix, and trying to mash some changes from https://github.com/0x2b3bfa0/guix-go-modules/commit/5defe897065c5d3e63740932b360474132c77877 in, such as licenses and description scraping from that site; in their code they wrote "pkg.go.dev doesn't close some tags", that must have what htmlprag struggles with.
<apteryx>that must be*
<lloda>wingo: pushed
<wingo>grand
<rekado>apteryx: with the patched htmlprag this gets you the first paragraph in the UnitDoc container: (begin (use-modules (srfi srfi-1) (htmlprag) (sxml xpath) (ice-9 pretty-print)) (define shtml (call-with-input-file "/tmp/lic/v2" html->sxml)) (pretty-print (first ((sxpath '(// (div (@ (equal? (class "UnitDoc")))) // (p 1))) shtml))))
<rekado>to enter an environment with the patched htmlprag: guix environment --pure --ad-hoc guile -e '(load "guix.scm")'
<rekado>and here’s the guix.scm: https://elephly.net/paste/1614694637.scm.html
<rekado>the xpath isn’t as pretty as it could be because we use “//” before (p 1), i.e. the first paragraph; the “//” allows us to ignore the structure of the children of the div, but it also flattens all elements, so (p 1) returns the first paragraphs of all children.
<rekado>hence the need for “first” to take the first of all first paragraphs.
<apteryx>thank you, I'll try it! I wonder why <p> would appear in this parent-constraints list; is this arbitrary, or following some W3C standard?
<rekado>apteryx: I think it’s just from the old days when <p> was often used as an alternative to <br>
<rekado>htmlprag intends to be pragmatic, so I guess it must have been a good idea when it was written.
<rekado>I remember people would often just use single <p> tags without ever closing them.
<apteryx>does that mean if <p> tags are not closed the patch would prevent this from being parsed leniently?
<chrislck>latest commit: autoconfigury... y is rather far from e...
<apteryx>it's a diagonal key away from it on dvorak
<chrislck>someone hacks guile on dvorak?
<apteryx>perhaps :-)
<spk121>chrislck: yeah, that was me. autoconfigury is a real term for the whole collective autoconf/make nonsense
<spk121>at least, if you google for it, it has been used fairly often
<chrislck>:) TIL
<rekado>(I use dvorak, but I don’t commit to the Guile repo)
<chrislck>most cases of autoconfigury are from 9+ years ago on google
<chrislck>spk121 is fronting a retro resurgence
<rekado>apteryx: I don’t know how the patch would break other uses of htmlprag; perhaps it could be made configurable
<spk121>chrislck: I am really old, so checks out
<rekado>apteryx: both Guix and the GWL use the patched variant to post-process their online manuals.
<apteryx>I see, in doc/build.scm.
<apteryx>Perhaps we can move the fix to the proper guile-lib package, and if nobody complains, submit the fix upstream?
<rekado>apteryx: I don’t know how best to make this change in behavior optional.
<rekado>but making the change upstream is the correct way forward
<apteryx>rekado: perhaps an easy thing would be to define and export a %default-parent-constraints alist, and a %parent-constraints parameter to allow easily overriding it
*apteryx tries
<apteryx>and then perhaps a higher level use-parent-constraints? parameter switch that could be set to #false to disable it completely.
<mdevos>I made some changes to guile's source code (exporting extra O_* flags). Can I open a guile REPL for the modified guile without installing the modified guile?
<lloda>meta/guile in the build dir? mdevos
<mdevos>lloda: thank you, that works
<lloda>or maybe | 11013 <jcowan> No, it isn't. But if you are careful you can often arrange things to
<lloda>sorry about that :-(
<lloda>maybe 'meta/uninstalled-env guile' mdevos
<lloda>i think that takes care of paths & modules as well
<daviid>lloda: would that set the necessary env vars, ./meta/build-env guile
<lloda>yeah
<lloda>thx daviid
<daviid>lloda: uninstalled will import installed .go files, as opposed to build-env, i think
<mdevos>I'm writing a patch that exports some extra O_* flags, any requests?
*spk121 O_O_O_O_O_THERIGHTSTUFF
<sneek>O_IWANTAPONY
<civodul>sneek: botsnack
<sneek>:)
<mdevos>Sorry, O_PONIES is not yet supported <https://lwn.net/Articles/351422/>. Otherwise I would add an export in guile.
<mdevos>Are ‘round quotes’ allowed in the changelog?
<lloda>wingo: there is something else wrong there :-\
<lloda>> #2f64:0:3()
<lloda>#2f64@0@3()
<lloda>indeed > (array-dimensions #2f64:0:3())
<lloda>= (0 (3 2))
<lloda>should be (0 3)
<lloda>the test i wrote is just a read what you wrote test and doesn't see that
<lloda>i'll have a closer look
<mdevos>I have a patch for a ‘bug‘ (well, a missing future), should I send a message to (a) the bug tracking, or (b) to guile-devel?
<mdevos>(Bug report: https://lists.gnu.org/archive/html/bug-guile/2021-01/msg00029.html)
<mdevos>*future -> feature
<rlb>leoprikler: exactly, and then there's the question of whether we want to store the character count *and* the byte count for the utf-8 strings -- for now I was leaning that way, i.e. the non-ascii strings have an extra word with the byte count to avoid strlen in some copying operations, etc.
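To make the "character count and byte count" idea concrete, here is a rough C sketch. The struct layout and both function names are assumptions for illustration only; they are not Guile's actual stringbuf representation. It assumes valid, NUL-terminated UTF-8 input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout, for illustration only. */
struct utf8_string {
    size_t char_count;   /* number of code points */
    size_t byte_count;   /* bytes in data, cached so copies need no strlen */
    const char *data;    /* UTF-8 bytes, assumed valid */
};

/* In valid UTF-8 every byte that is not a continuation byte
   (bit pattern 10xxxxxx) starts exactly one code point. */
size_t utf8_count_chars(const char *data, size_t byte_count)
{
    size_t n = 0;
    for (size_t i = 0; i < byte_count; i++)
        if (((unsigned char)data[i] & 0xC0) != 0x80)
            n++;
    return n;
}

struct utf8_string utf8_string_make(const char *data)
{
    assert(data != NULL);
    struct utf8_string s;
    s.byte_count = strlen(data);   /* paid once, at construction */
    s.char_count = utf8_count_chars(data, s.byte_count);
    s.data = data;
    return s;
}
```

For a pure-ASCII string the two counts are equal, which is why only the non-ASCII case would need the extra word.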
<rlb>I got the narrow strings -> just ascii, not latin-1 working fine and passing all the tests fwiw, but that's the easier part of the work.
<rlb>Also a bunch of questions about naming in the code in any final version, i.e. would we replace narrow with ascii, and/or wide with something else, or just leave them with good documentation (the latter of course might avoid as much api/abi churn).
<rlb>anyway, I was just curious - might not pursue it much further right now, since I suspect there may be a lot of work/issues created by the switch from fixed-width to variable width, particularly perhaps given our support for shared mutable strings -- not sure if we allow people to capture internal pointers, but if we do...
<spk121>rlb: I bear a lot of the blame for the current wide/narrow char implementation. Back in the 1.9 days, the wide/narrow provided an upgrade path from 1.8, where 8-bit chars were expected. I actually prototyped it a few different ways: utf-8, utf-32, and the current wide narrow system, inspired by Python's then implementation. The reason I gave up on utf-8 back then was because of how every loop and API had to be disambiguated whether the intention was bytes or codepoints. Also there was the r5rs demand that all lookups be constant time. (R7RS doesn't prescribe this now). On the plus, tho, these days string representation is internal, so should be easy to modify w/o breaking API
<rlb>ahh, right -- I was mostly "worried" about cases where modifying a shared mutable (sub)string might shift offsets, and if that might be incompatible with current apis (not sure, just wondered).
<rlb>i.e. (string-set! ...) of course might change the width of a char, and require moving all the subsequent bytes...
<rlb>(but I haven't looked at what we really support wrt mutable sharing right now)
<spk121>rlb: with current apis, should be okay if a string-set! caused a shift underneath. It is just that things like string-set! would no longer be constant time lookup. They'd be O(n) probably.
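A sketch of why string-set! stops being constant time on a UTF-8 buffer: finding the character means scanning from the start, and a replacement of a different width means shifting every later byte. The function names below are hypothetical and this is not how Guile implements string-set!; it assumes valid UTF-8 and a caller-supplied capacity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode cp (assumed to be a valid scalar value) into out; return its width. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Width of the sequence that starts with lead byte b (valid UTF-8 assumed). */
static size_t utf8_width(unsigned char b)
{
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Replace the code point at character index idx.  len is the number of
   bytes in use, cap the buffer capacity.  Returns the new byte length,
   or 0 if idx is out of range or the result would not fit. */
size_t utf8_set_char(unsigned char *buf, size_t len, size_t cap,
                     size_t idx, uint32_t cp)
{
    assert(buf != NULL);
    size_t off = 0;
    /* O(n) scan: walk idx code points to find the byte offset. */
    for (size_t i = 0; i < idx && off < len; i++)
        off += utf8_width(buf[off]);
    if (off >= len) return 0;

    size_t old_w = utf8_width(buf[off]);
    if (off + old_w > len) return 0;          /* truncated sequence */

    unsigned char enc[4];
    size_t new_w = utf8_encode(cp, enc);
    size_t new_len = len - old_w + new_w;
    if (new_len > cap) return 0;

    /* O(n) shift: move the tail when the width changes. */
    memmove(buf + off + new_w, buf + off + old_w, len - off - old_w);
    memcpy(buf + off, enc, new_w);
    return new_len;
}
```

A shared substring that recorded byte offsets into the same buffer would see those offsets go stale after such a shift, which is the concern rlb raises.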
<manumanumanu>are we talking updating the string representation?
<manumanumanu>There is _LOADS_ of useful discussion in the newer string srfis and the r7rs string discussions
<spk121>Yeah. I haven't looked at any string representation stuff in a decade, so I'm obsolete. I do like Perl's utf8 that also does raw bytes. That's cool
<dsmith-work>sneek: botsnack
<sneek>:)
<davexunit>I'm not sure I'm going to explain this well but is there a way to "inline" custom data types? like how '(1 2 3) isn't eval'd every time you call the procedure that contains the code so it's only allocated once.
<davexunit>my experience says "no, there isn't" but I figured I'd ask anyway.
<wingo>o/
<rlb>spk121: yeah, I think the plan is to eventually switch to utf-8 across the board (plus or minus any ascii optimization) but I suppose there would also be the possibility for additional indexing (always, (or as an optional type?)) to restore O("1") operations as per https://srfi.schemers.org/srfi-135/srfi-135.html Though, as mentioned, I might well not pursue this much more right now.
<rlb>manumanumanu: figured that might be one of the srfis you suggested.
<manumanumanu>rlb: yeah. However, I don't think I have ever needed o(1) indexing of strings. I hated cursor-based strings, but thinking about it I have decided that is probably what I would choose (together with immutable strings).
<manumanumanu>the ship has sailed for immutable strings by default though
<manumanumanu>davexunit: if you are not going to mutate it, why not a closure? That is not what you are asking for, and it is probably too simple. If I understand why not, maybe we can hack something together :D
<apteryx>does someone know what tools I need on top of guile, pkg-config, autoconf and automake to build guile-lib?
<apteryx>./configure says: configure: error: cannot find required auxiliary files: missing install-sh
<apteryx>guix environment guile-lib --ad-hoc autoconf automake -- ./autogen.sh --> configure:3252: error: possibly undefined macro: AC_LIB_LINKFLAGS_FROM_LIBS. I guess I need to update the build system to use a newer autoconf, perhaps.
<apteryx>Seems 'gettext' is the one missing: /gnu/store/rqb80gdyrx2q1ff2pmmyg11j7s5bm4cd-gettext-0.20.1/share/aclocal/lib-link.m4:705:AC_DEFUN([AC_LIB_LINKFLAGS_FROM_LIBS],
<apteryx>ok, it works now!
*apteryx wonders if it's normal the guile-lib test suite uses 2 GiB of RAM
<apteryx>Running test case: test-default-port
<roelj>Why does this segfault? https://paste.debian.net/1187553/
<taw10>roelj: initialise lst to SCM_EOL, not NULL
<wingo>wow, i didn't see it
<wingo>nice eyes taw10
<manumanumanu>let's rename null so as not to confuse us when reading C!
<taw10>Hehe, thanks
<roelj>taw10: Wow, thanks! I totally didn't know this :)
<dsmith-work>Basically, never NEVER let a SCM be NULL. Bad things will happen.
***V is now known as v
***v is now known as V
***Noisytoot is now known as N
***N is now known as Noisytoot
***jonsger1 is now known as jonsger