<shanecelis>Bummer, looks like civodul won't be able to be my mentor for Google Summer of Code. <civodul>shanecelis: yes, sorry about that :-/ <civodul>that makes me angry because it all had to happen on the week i was AFK <shanecelis>Yeah, it's just one day too late which is terrible. <shanecelis>Stallman asked me about it a year ago when I did the kickstarter. <civodul>yeah it's a project worth pursuing, i think <shanecelis>I could email him and say, "Hey, I submitted Emacsy, little help?" <shanecelis>But I just don't know what to expect from Stallman exactly. <shanecelis>Stallman asked me to GPL it last year. I was going to release it LGPL. <shanecelis>I'm ok with GPLing it, but I don't want to give up the copyright on it. <shanecelis>That's the thing I could see him asking for: YES but you have to assign copyright to GNU. <shanecelis>Does that seem like a credible fear, or am I just being paranoid? <civodul>and only GNU projects can have their copyright assigned to the FSF <mark_weaver>shanecelis: FWIW, I very much doubt that RMS would try to coerce you into assigning copyright to the FSF, and I know him quite well (better than anyone else here). <shanecelis>mark_weaver: Good to hear. It is just my paranoia. <mark_weaver>did civodul miss a google-imposed deadline for mentors? <mark_weaver>that's terrible. yeah, I have doubts that RMS could convince google to make an exception for something like this. <shanecelis>Yeah, I don't know that he could either. So I don't know how GSoC works exactly. Because civodul didn't register as a mentor, does that mean I'm SOL or just that I'm less likely to get a mentor? <mark_weaver>I really don't know. I suppose it's conceivable that some other GNU mentor might be convinced to be civodul's "proxy" mentor; with civodul doing the actual work. <mark_weaver>do you know off hand of an easy way for me to see the list of GNU project mentors registered this year? <shanecelis>No. The GSoC site is pretty opaque. 
I can't see the proposals or mentors or much of anything. <mark_weaver>The other problem is no GNU slots were reserved for Guile or Guix. GNU got all the slots they asked for, but one of the other projects would have to be convinced to give up one of their slots for this. That might be the harder task :-( <bipt>it's possible that some students might not take their allocated slots, if they applied successfully to multiple projects <mark_weaver>that file contains both the list of mentors and the list of projects. each project has a number of "essential slots", and a number of "desired slots". GNU got enough slots for all the "desired slots". <shanecelis>I wish I had understood this process better before the deadline. <mark_weaver>so in order to make this happen, we'd need two things: one of the projects with more "desired slots" than "essential slots" would have to be convinced to give up one of their desired slots. and we'd need one of the registered mentors to act as a proxy mentor for civodul. <shanecelis>Well, I wrote to Stallman since he had expressed an interest in Emacsy last year. I explained I intended to work with civodul but for bad luck. I asked if there was someone else he'd suggest that I might be able to work with. [shrug.] <mark_weaver>the main problem is that RMS is unlikely to be familiar enough with the situation to know who to talk to. <mark_weaver>if we found another GNU project with a "desired slot" that's clearly less important than this, then perhaps he could be convinced to write an email to the maintainer of that other project, asking them to voluntarily give it up. but I think someone else would have to figure out all the details of who to contact, etc. <mark_weaver>RMS is extremely busy and mostly focused on dealing with larger threats to software freedom. "putting out fires", as he has said. <shanecelis>Ok, let me just make sure I understand. A slot is a GSoC project, right? Accepted proposals take up a slot. 
The mentoring organizations have essential and desired slots, which are the minimum and maximum number of slots they can actually use? <shanecelis>mark_weaver: yeah, GSoC is definitely not worthy of Stallman's time. <mark_weaver>a slot corresponds to a project, yes. GNU received 32 slots. now it's up to GNU how to allocate those slots to project proposals. <mark_weaver>shanecelis: well, I wouldn't necessarily say that, but someone else would have to do the busy work. <mark_weaver>the essential-vs-desired slots thing might just be a concept internal to GNU. <mark_weaver>it might be worth RMS's time to write an email on this subject, but not to do the research. (he mostly asks volunteers such as me to do research on most subjects anyway) <shanecelis>Well, I'm happy to do some busy work but I'm not entirely sure what to do. <mark_weaver>well, the part that you might be able to help with is to look over the set of gsoc project proposals for GNU projects whose "desired slots" is greater than "essential" slots, and try to find candidates for projects that maybe aren't so important. <mark_weaver>I'd be willing to try to find another mentor who might be willing to act as civodul's proxy. <shanecelis>Are the proposals online? I know mine is, but I haven't seen a listing of the proposals. <mark_weaver>also, the mailing lists associated with projects that have more "desired slots" are worth looking at. <mark_weaver>shanecelis: let me know what you find, and I'll see what I can do to help. good luck! <shanecelis>mark_weaver: Thanks a ton! You're giving me some hope. :) <Nafai>man, I wish there had been GSOC back when I was a student <mark_weaver>yeah, though octave is considered a "high priority project" by GNU, and probably not a good candidate for this. though who knows, maybe one of the projects will fall through. <mark_weaver>I wonder about the wget project proposals: 1 essential, 2 desired. <shanecelis>I don't know. 
I found that searching, but I can't see a plain listing of all proposals anywhere. <mark_weaver>there's one other issue, which is that I think there's another project that civodul wanted to mentor that's even more important than emacsy, and that's getting GNOME packaged for Guix. so really we'd need to find two slots. <shanecelis>[nods] Yes, I see civodul listed on the Guix project. Why two? I don't understand. <mark_weaver>well, we'd want to salvage both GNOME-for-Guix and Emacsy. <mark_weaver>both of those got messed up by civodul missing the deadline. <shanecelis>Heh, maybe they're not listed to prevent exactly this kind of interference. <mark_weaver>both of those would need to be taken from other GNU project slots that are not "essential". <shanecelis>Well, from the record of the essential and desired slots. Here's what there is to choose from: <mark_weaver>now we need to find out what the essential+desired projects are for each of those, and get a sense of which ones are either considered less important, or for which no promising student has shown interest. <mark_weaver>there's a good chance that there will be some extra slots lying around. it's just a matter of finding them before the deadline. speaking of which, do we know when the deadline is? (for mentoring organizations to assign their slots to project proposals?) <shanecelis>I added up all the essential and desired: there are 20 essential and 32 desired. I was hoping maybe there would just be an unaccounted for slot. <shanecelis>May 8th to 22nd: Slot allocation trades happen amongst organizations. Mentoring organizations review and rank student proposals; where necessary, mentoring organizations may request further proposal detail from the student applicant. <mark_weaver>I think the May 8th thing happened early. We already know that we got all 32 requested slots. <mark_weaver>FWIW, although it's far from certain, I think there's a reasonable chance that we'll be able to work something out. 
<mark_weaver>I don't know, it seems fairly transparent to me, at least within GNU. The discussions are mostly happening on public mailing lists. It's just spread out over a bunch of different mailing lists. <shanecelis>Oh crap. "You must follow this project template in order to have your proposal considered!" <mark_weaver>I don't know enough about the process to know the consequences of that. <shanecelis>Ok, thanks a lot, mark_weaver, for all your assistance. I gotta go to bed. <lloda`>I'm getting sxml.simple.test errors on master, anybody else? <mark_weaver>it's because string ports can only encode characters that are in the current locale, which we plan to fix in master. <mark_weaver>actually, it won't even happen if you run "./check-guile sxml.simple.test", assuming that you're using a UTF-8 locale. it only happens because one of the earlier test files sets the locale to something more restrictive. <lloda`>it's no problem really, just curious if it was on my side, after rebasing my array branch. I'm running my own tests now. <mark_weaver>lloda`: I started to look into the array code myself, as a prelude to reviewing your patches properly, and became fairly convinced that the array code needs a more fundamental refactor. Even the relatively simple 'scm_c_array_ref_1' has to do a staggering amount of work. <mark_weaver>but admittedly, I haven't yet looked deeply enough to assess the possibilities properly. <mark_weaver>for something as basic as 'scm_c_array_ref_1' (or _2), we really ought to be micro-optimizing, to the point of minimizing the number of branch mispredictions and cache misses. right now we're not even close. <lloda`>i'd like to put an impl pointer on the vector object itself, get rid of the handle for most operations. <mark_weaver>lloda`: if you're interested, I'd invite you to think about how to redo all of this in master. <lloda`>well, I think i've got an improvement on the previous stuff. 
<lloda`>now array-ref is only two or three function calls instead of five or six it was before. <mark_weaver>lloda`: in your current version, how many calls-through-function-pointers are needed altogether for scm_c_array_ref_1? <mark_weaver>I'm not just talking about calls made directly from 'scm_c_array_ref_1' <lloda`>I don't remember it so (for function pointers; there's many more direct calls) <mark_weaver>I mean, from beginning to end, including all nested calls, how many indirect calls are made? <mark_weaver>haha. yes, it's easy to lose count, there are so many. <mark_weaver>at the top-level there are calls to 'scm_array_get_handle', 'scm_array_handle_pos_1', 'scm_array_handle_ref', and 'scm_array_handle_release'. <mark_weaver>hmm, well, okay, I guess you're right. I was mistaken. there are only two. <lloda`>That's for the direct calls, yes. as I remember, array_ref gets a handle. That alone (in stable-2.0) is a linear search in a type table and a call through a function pointer for array_get_handle, etc. <mark_weaver>but the filling in of that whole handle is terrible. <mark_weaver>oh right, the linear search. I forgot about that one. ugh. <lloda`>Well, before it was TWO linear searches for the array handle for array types, now it's only one. <lloda`>Putting the impl pointer in the SCM would solve that. <mark_weaver>in master at least, we should be able to get rid of this linear search by changing the data representation. <lloda`>I thought so, having the type bits index the table directly <mark_weaver>maybe not an entire pointer, but at least a few bits in the type tag, to be used as an index into a table. <mark_weaver>I wouldn't want to add another word to vectors if we can avoid it, since small vectors are common. <lloda`>The thing is, the array handle is only needed so that the 'root' types appear as arrays. <lloda`>If you know the type is an actual array, the impl pointer is the only thing you need from the handle. so... 
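The idea floated above, replacing the linear search through the implementation table with a few type-tag bits used directly as a table index, can be sketched roughly as follows. This is only an illustration under stated assumptions: every name here (`vector_impl`, `impls`, `IMPL_INDEX_MASK`, `find_impl_linear`, `find_impl_indexed`) is hypothetical and none of it is taken from Guile's sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; this is not Guile's actual code. */

typedef struct vector_impl {
  uint32_t tag;      /* low tag bits identifying the root vector type */
  const char *name;  /* for illustration only */
} vector_impl;

enum { TAG_VECTOR, TAG_BYTEVECTOR, TAG_STRING, TAG_BITVECTOR, N_IMPLS };

/* Assumption: two bits of the object's tag word are reserved as an index. */
#define IMPL_INDEX_MASK 0x3u

static const vector_impl impls[N_IMPLS] = {
  { TAG_VECTOR,     "vector" },
  { TAG_BYTEVECTOR, "bytevector" },
  { TAG_STRING,     "string" },
  { TAG_BITVECTOR,  "bitvector" },
};

/* Linear search: compare the tag against each registered implementation. */
static const vector_impl *find_impl_linear(uint32_t tag_word)
{
  uint32_t tag = tag_word & IMPL_INDEX_MASK;
  for (size_t i = 0; i < N_IMPLS; i++)
    if (impls[i].tag == tag)
      return &impls[i];
  return NULL;
}

/* Direct index: the reserved tag bits are the table index, so the lookup
   is one mask and one array access, with no loop and no extra word stored
   in the object. */
static const vector_impl *find_impl_indexed(uint32_t tag_word)
{
  uint32_t i = tag_word & IMPL_INDEX_MASK;
  assert(impls[i].tag == i);  /* table must be laid out in tag order */
  return &impls[i];
}
```

Both lookups return the same entry; the indexed version just trades the loop for a constraint on how the table is laid out, which is why it does not cost small vectors an extra word.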
<lloda`>I see the concern for small vectors, certainly. <lloda`>using the bits to index the table should be feasible. <mark_weaver>since we got rid of generalized vectors, we really need these operations to be fast for non-arrays as well. <mark_weaver>I wonder if most of the handle could be static for non-arrays. <lloda`>absolutely, lbnd and base are always 0, inc is always one, all of these types have a length field. <mark_weaver>(and the 2-dimensional case also, but the 1D case is more important) <lloda`>I think one should be able to get the length of a (bitvector, string, bytevector, vector) without needing to know its type. <mark_weaver>oh well. well, we can still avoid using them internally in some cases. <lloda`>it's actually the only way to access arrays at all <lloda`>yes, it's good that the interface is so limited (bad for the present, but good for the future) <mark_weaver>how would you propose to make it possible to get the length of those things without knowing the type? <lloda`>well all these types have the type tag and the length field and then whatever else in the field. <lloda`>so just put the length in the same place for all of them, presto. <mark_weaver>seems reasonable, although I've occasionally harbored ideas of making small vectors more compact by putting a few bits of the length into the type tag. <mark_weaver>I wonder if compactness of small vectors is more important, or if fast access to the length is more important. <mark_weaver>I guess maybe fast access to the length is probably more important though.. for fast range checks. <lloda`>for example if one uses array operations (like the much maligned array-map!), all those range checks only need to be done once. <mark_weaver>well, these representation details should be abstracted by C macros anyway, so we have flexibility later on. <mark_weaver>sure, the array operations that are implemented in C are easy to make efficient. 
I'm more worried about the cases where individual references or sets are done from scheme. <mark_weaver>(obviously we should encourage users to use high-level iterators when possible) <lloda`>obviously I agree that the elementary ops should be as fast as possible. <mark_weaver>maybe it's as simple as adding a few more function pointers to each array implementation, and then fixing the implementation-scan by putting an index into the type tag. <mark_weaver>where the "few more function pointers" are things like 1D-ref, 2D-ref, 1D-set!, 2D-set! <lloda`>that might be going too far I think <mark_weaver>you are much more familiar with this code than I am. I'm riffing from only vague knowledge. <lloda`>wingo already put in specializations for 1d and 2d <lloda`>all the work is done there, because those specializations convert the indices to a linear address <lloda`>and beyond that, the rank doesn't matter any longer. <mark_weaver>well, it still involves filling in the handle structure, which seems too expensive to me. <mark_weaver>but I don't know, maybe I worry too much about that. <lloda`>we can work around that, but we still need a dispatch between SCM_I_ARRAYP (an array type) or not <lloda`>so if SCM_I_ARRAYP, great we have all the lbnd inc ubnd ndims, we don't need the handle. <lloda`>if not, great, it's a 1D-vector and we don't need the handle either! <lloda`>the handle is only needed to access the elements (handle-set! and handle-ref) <lloda`>and for the 1D-vectors, also for the length (because each of those types keeps it opaquely) <mark_weaver>accessing the elements is what I'm talking about though. optimizing references and sets of individual elements from scheme. <lloda`>that's a big problem even internally <lloda`>because the root types include things like bytevectors and strings, that can't be accessed by just dereferencing a pointer. 
<lloda`>so you always need to go through an impl->vset, impl->vref <lloda`>unless strings (and maybe bitvectors) became divorced from arrays <lloda`>I proposed that to civodul, more or less <mark_weaver>well, even if we didn't divorce them completely, we could handle them less efficiently than the others. <lloda`>another point is the range checks and type checks. array-ref already does range check, so the impl->vref, impl->vset shouldn't do it again. <mark_weaver>but I don't see how that would help anyway. we could perhaps handle all of the uniform integer arrays as a single case somehow, using masks and such, but the different floating point types would still have to be handled separately. <mark_weaver>lloda`: how did you reduce the number of calls-through-function-pointers in scm_c_array_ref_1 to just one? <mark_weaver>well, I guess really I should be talking about the number of mispredicted branches. that's really the relevant thing. <lloda`>so before, there was one ctfp to get the array handle. <lloda`>then, when you call array-handle-ref, that goes through a second ctfp. <mark_weaver>and now you avoid that by special casing normal vectors and actual arrays? <lloda`>and then there's another one where array-handle-ref calls the -ref/set of the actual root vector. <lloda`>what I do now is that arrays are not a possible 'array implementation'. <lloda`>the handle doesn't carry the array, just the root vector. <lloda`>and the impl that is stored in the handle is always the root vector's impl, not the array impl (that doesn't exist anymore). <lloda`>so when the impl in the handle is called, that accesses the root vector directly <lloda`>instead of going to the array-handle-ref/set stubs, which I removed. <mark_weaver>okay, but somehow you have to get the length (which depends on the implementation) to do the range check, and then access the element. that sounds like at least two ctfps. 
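A rough sketch of the access path being described: dispatch once between an actual array and a plain root vector, do the range check once, compute the linear position, then make a single call through the root vector's `vref` pointer. The structs and names (`root_vec`, `array1`, `obj`, `ref_1`, `double_impl`) are made up for illustration, elements are fixed to `double` for brevity, and storing `len` at a fixed place in every root vector is the assumption that lets the check run without knowing the type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not Guile's real structures. */

typedef struct root_impl {
  double (*vref)(const void *data, size_t pos);  /* no range check inside */
} root_impl;

typedef struct root_vec {
  const root_impl *impl;
  size_t len;          /* assumed to sit at the same place for every type */
  void *data;
} root_vec;

typedef struct array1 {      /* a rank-1 view onto a root vector */
  const root_vec *root;
  ptrdiff_t lbnd, ubnd, inc; /* bounds are inclusive */
  size_t base;
} array1;

typedef struct obj {
  int is_array;              /* stands in for a type-tag test */
  union { const root_vec *vec; const array1 *arr; } u;
} obj;

static double double_vref(const void *data, size_t pos)
{
  return ((const double *)data)[pos];
}

static const root_impl double_impl = { double_vref };

/* Returns 0 and stores the element in *out, or -1 if idx is out of range. */
static int ref_1(const obj *o, ptrdiff_t idx, double *out)
{
  const root_vec *root;
  size_t pos;

  if (o->is_array) {
    const array1 *a = o->u.arr;
    if (idx < a->lbnd || idx > a->ubnd)
      return -1;
    root = a->root;
    pos = (size_t)((ptrdiff_t)a->base + (idx - a->lbnd) * a->inc);
  } else {
    /* Root vector: lbnd and base are 0 and inc is 1, so no handle needed. */
    root = o->u.vec;
    if (idx < 0 || (size_t)idx >= root->len)
      return -1;
    pos = (size_t)idx;
  }
  *out = root->impl->vref(root->data, pos);  /* the only indirect call */
  return 0;
}
```

In this shape the per-type `vref` never repeats the range check, and the only hard-to-predict branch left is the indirect call itself. Types such as strings or bitvectors would still need their own `vref`, since their elements cannot be read by plain pointer dereference.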
<mark_weaver>I'm interested in the cost for accessing all vector-like objects, not just actual-arrays. <lloda`>you mean with the generic interface, right? <lloda`>before there was a call to impl->get_handle. For arrays, that meant that the array handle was being filled twice. <lloda`>now there's only one call, but it's true that for root vectors there was one call before and there's one now. <lloda`>that call is necessary to get the impl pointer. <lloda`>I did more simplifications later on <mark_weaver>well, I think I need to read more of this code to be able to have an intelligent conversation about this. until then, I'm probably wasting your time to try to discuss it now. <mark_weaver>thanks for your efforts on this. I hope to review your patches properly in the next couple of weeks. <lloda`>i'm pushing the master-rebased branch now. I'll mail guile-devel. <lloda`>just want to say, what I have now is certainly not perfect, but it should be better. we can improve on top of this, I hope. <zacts>is this the channel for guix? ***b4284 is now known as b4283
<wingo>anyone going to els in june?
<ijp>okay guilers, here's a quick quiz for you <ijp>why is this wrong? (equal? (string-upcase "straße") "STRAßE") <civodul>but we get it wrong, that's what you mean, right? <ijp>this bug report inspired by a very dull argument on #emacs <wingo>is string-upcase allowed to return STRASSE ? <wingo>or is the number of characters limited by the standard <ft>There is an upper-case sharp s. I have no idea what the standard says about anything, though. ;) <ijp>(string-upcase "Straße") ⇒ "STRASSE" is the example given in the r6rs <ijp>though it also points out there is some locale dependence with those procedures <wingo>it could be guile just has a bug <ijp>can we punt this to libunistring? <civodul>yes, but we don't use that part of libunistring yet <wingo>(yes, i heard about the orthography change) <ijp>well, if it happens for german, it may happen for other languages <wingo>we could just delegate to unistring for our case conversion functions <civodul>ijp: i think it's the only case, or i think it's the one i18n people always cite <civodul>(string-locale-ci=? "Straße" "STRASSE") <civodul>and: (string=? "STRASSE" (string-locale-upcase "Straße" de)) <ijp>the problem is that the r6rs says string-ci=? should work that way <ijp>but since a person who can't speak german, and who brought this up only for pedantry on another irc channel pointed this out, I think we can safely ignore it :) <civodul>actually i18n.c uses u32_locale_tocase (from libunistring) <civodul>we could just do the same in strings.c <mark_weaver>I intend to fix the bug as part of my planned strings-as-utf8 work for master. <mark_weaver>(which would allow us to use libunistring more extensively) <civodul>it's even easily fixable in 2.0, actually
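The reason a per-character upcase cannot produce the R6RS answer is that the mapping for "ß" yields two characters, so the conversion has to work on the whole string and allow the result to grow. A toy sketch in C, assuming UTF-8 input and handling only ASCII letters plus U+00DF: `toy_upcase_utf8` and `put` are hypothetical helpers written for this note, not libunistring or Guile functions, and real code would delegate to libunistring as suggested above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append one byte if it fits, always counting it toward the needed size. */
static void put(char *out, size_t cap, size_t *n, char c)
{
  if (*n + 1 < cap)
    out[*n] = c;
  (*n)++;
}

/* Toy full-case upcase for UTF-8: ASCII letters plus U+00DF only.
   Returns the number of bytes the full result needs (excluding the NUL);
   the output is truncated and NUL-terminated if cap is too small. */
static size_t toy_upcase_utf8(const char *in, char *out, size_t cap)
{
  const unsigned char *p = (const unsigned char *)in;
  size_t n = 0;

  while (*p) {
    if (p[0] == 0xC3 && p[1] == 0x9F) {   /* UTF-8 encoding of U+00DF */
      put(out, cap, &n, 'S');             /* one character in ...      */
      put(out, cap, &n, 'S');             /* ... two characters out    */
      p += 2;
    } else if (*p >= 'a' && *p <= 'z') {
      put(out, cap, &n, (char)(*p - 'a' + 'A'));
      p++;
    } else {
      put(out, cap, &n, (char)*p);        /* everything else unchanged */
      p++;
    }
  }
  if (cap > 0)
    out[n < cap ? n : cap - 1] = '\0';
  return n;
}
```

Feeding it the UTF-8 bytes of "straße" gives "STRASSE": six characters in, seven out, which is exactly what a function of type character-to-character applied element by element cannot express.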