***Server sets mode: +nt
<youpi>junlingm: that's not really dead, it's almost dead
<youpi>i.e. just one reference left
***foggy68 is now known as foggy67
<gnu_srs2>youpi: Do you have time for some questions?
<foggy67>I tried the Debian GNU Hurd release dated 01/01/2020.
<foggy67>It starts and runs well with kvm on Ubuntu 16.04
<foggy67>But there is a problem with the upgrade
<foggy67>A perl script creates a Debconf error
<foggy67>so the upgrade can't go to the end
<youpi>"a Debconf error" means: it's printed above
<youpi>so don't paste only the last line of the output
<youpi>paste the whole thing on some pastebin
<gnu_srs2>youpi: Adding mach_prints to proc/wait.c reveals that S_proc_wait is not called for kill -CONT <pid>, in the example in man wait.
<gnu_srs2>Only S_proc_mark_cont is called, which seems OK. (I've added #define WCONTINUED 8 to bits/waitflags.h)
<foggy67>"installed debconf package post-installation script subprocess returned error exit status 127"
<youpi>gnu_srs2: well, I'm not surprised: kill -CONT <pid> doesn't call wait(), it calls kill()
<gnu_srs2>kill -STOP and kill -TERM call waitpid()
<foggy67>"/usr/bin/perl: relocation error: /usr/bin/perl: symbol __errno_location version GLIBC_2.2.6 not defined in file libpthread.so.0.3 with link time reference"
<foggy67>"dpkg: error processing package debconf (--configure): installed debconf package post-installation script subprocess returned error exit status 127"
<youpi>gnu_srs2: you mean bash calls waitpid() just after calling kill()?
<youpi>because the eventual __wait4() implementation does call proc_wait
<youpi>so there might be something odd along the way
<youpi>foggy67: ah, that's way more precise indeed
<gnu_srs2>waitpid() calls wait4() which calls proc_wait(), yes
<youpi>foggy67: basically, first upgrade libc0.3 before upgrading the rest
<youpi>gnu_srs2: that's just an example
<youpi>is bash really calling waitpid after kill?
<youpi>or are you compiling and running that program?
<gnu_srs2>kill -SIGNAL sends a signal to the child process in that example, man(2) wait.
<youpi>that doesn't imply it necessarily calls waitpid() after that
<foggy67>youpi: I entered sudo apt-get install libc0.3
<youpi>foggy67: yes, once you have upgraded perl you're screwed
<youpi>you need to restart from the image with the old perl
<foggy67>"The following packages have unmet dependencies: libc0.3-dev breaks libgcc-9-dev but 9.2.1-21 is to be installed"
<jrtc27>hmm, is this because perl does nasty things, or was a glibc header inlining something implementation-specific that changed?
<foggy67>"... this may be caused by the held packages"
<youpi>jrtc27: it's more horrible than this, the symbol was added to libpthread, and the current perl thus expects it to be there
<jrtc27>but how does it end up with perl referencing it?
<youpi>errno is a macro for __errno_location()
<youpi>but __errno_location was in libc, not in libpthread
<youpi>and there you get the hell of symbol versions, compatibility etc.
<youpi>foggy67: you can insist, by telling it to install libgcc-9-dev on the same apt-get line
<jrtc27>okay, but, I didn't know ELF kept track of which file a symbol came from
<youpi>same for gcc-9-base that it'll tell about then
<gnu_srs2>youpi: "that doesn't imply it necessarily calls waitpid() after that": running that example on Linux, waitpid() is called for kill -CONT <pid>.
<youpi>jrtc27: I haven't looked at the details, but it seems there is something happening like that at least
<youpi>I was thinking it'd perhaps be a question of symbol version, but it's GLIBC_2.2.6 in both cases
<youpi>gnu_srs2: I can't parse the second part of your sentence
<youpi>at least in a way that makes sense
<youpi>"kill -CONT <pid>" is a bash thing
<youpi>the example in waitpid() has nothing to do with bash
<gnu_srs2>Compiling and running the example in wait(2) works as expected on Linux.
<youpi>either you use kill -CONT <pid> from bash, and then possibly bash doesn't actually call waitpid
<youpi>or you run the program, and then yes waitpid will be called
<youpi>gnu_srs2: so you mean that you run the program, and *then* call kill -CONT <pid> on the pid of the program?
<gnu_srs2>./test_waitpid&, kill -STOP <child-pid>; kill -CONT <child-pid>; kill -TERM <child-pid>
<youpi>anyway, so the program is calling waitpid, isn't it?
<gnu_srs2>The call sequence is already in the man page.
<youpi>so your mach_print in proc_wait will show up when you run the program
<youpi>so do you see that proc_wait call happening with mach_print?
<youpi>when running the program, not when running kill
<gnu_srs2>Ah sorry, no. Only when sending signals to the child process.
<youpi>you should be seeing it when the program calls it
<youpi>note that when you run kill, waitpid returns, and thus the program calls waitpid again
<youpi>don't confuse the call to waitpid that happens after your kill, it's unrelated actually
<jrtc27>I've been trying to infer the problem being debugged as I missed the initial problem statement wherever it was, but to be clear:
<jrtc27>is the issue simply that waitpid(..., WCONTINUED) doesn't work and instead behaves as if you didn't pass WCONTINUED?
<gnu_srs2>Yes, S_proc_wait is called when starting the program too.
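For reference, a minimal C sketch of the kind of test program being discussed, loosely modelled on the wait(2) example but sending the signals itself instead of from the shell. The names logged_waitpid, stop_cont_term_demo and the SAW_* bits are hypothetical. It shows youpi's point: each waitpid() that reports an event is issued when the previous one returns, not when kill runs. The before/after lines go to stderr here; on the Hurd one would presumably use mach_print instead so they interleave with the proc server's output. On Linux all three events are expected to be seen.

```c
#define _POSIX_C_SOURCE 200809L   /* for kill(), WCONTINUED, WIFCONTINUED */
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical event bits returned by stop_cont_term_demo(). */
enum { SAW_STOPPED = 1, SAW_CONTINUED = 2, SAW_SIGTERM = 4 };

/* Hypothetical helper: one waitpid() call with a log line before and
   after it, so each call can be matched against server-side output. */
static pid_t logged_waitpid(pid_t pid, int *status, int options)
{
    assert(pid > 0);              /* only wait for one specific child */
    fprintf(stderr, "before waitpid(%ld, options=%d)\n", (long)pid, options);
    pid_t r = waitpid(pid, status, options);
    fprintf(stderr, "after waitpid(%ld) -> %ld\n", (long)pid, (long)r);
    return r;
}

/* Fork an idle child, then STOP, CONT and TERM it, calling waitpid()
   again after each signal. Returns a mask of SAW_* bits, or -1 if
   fork() fails. */
int stop_cont_term_demo(void)
{
    int status = 0, seen = 0;
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        for (;;)
            pause();              /* child: just sit and receive signals */
    }

    kill(child, SIGSTOP);
    if (logged_waitpid(child, &status, WUNTRACED | WCONTINUED) == child
        && WIFSTOPPED(status))
        seen |= SAW_STOPPED;

    kill(child, SIGCONT);         /* this is the report WCONTINUED asks for */
    if (logged_waitpid(child, &status, WUNTRACED | WCONTINUED) == child
        && WIFCONTINUED(status))
        seen |= SAW_CONTINUED;

    kill(child, SIGTERM);         /* then block until the child is reaped */
    if (logged_waitpid(child, &status, 0) == child
        && WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM)
        seen |= SAW_SIGTERM;

    return seen;
}
```

Checking that the result equals SAW_STOPPED | SAW_CONTINUED | SAW_SIGTERM gives a quick pass/fail for WCONTINUED support; a system that ignores the flag would presumably be missing SAW_CONTINUED, or block in the second waitpid() call.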
<youpi>I *guess* gnu_srs2 is trying to implement it
<jrtc27>ok, and we're trying to implement it
<youpi>like you say, gnu_srs2 is not used to telling what he is actually trying to do
<youpi>so we always have to divine it
<youpi>gnu_srs2: so that's the call
<youpi>so proc_wait *is* getting called
<youpi>what you miss is getting its wait loop triggered by the cont part
<gnu_srs2>I wrote above: (I've added #define WCONTINUED 8 to bits/waitflags.h)
<jrtc27>you want to know about the other side
<youpi>jrtc27: that's the implementation behind waitpid
<jrtc27>the thing that sends something *to* the wait
<jrtc27>you somehow need to get the WCONTINUED out of there
<jrtc27>but you also need to know how the response is supposed to come back
<jrtc27>to know where to forward the WCONTINUED flag to
<jrtc27>ie you need to follow the full path from waitpid to whatever state is being held, and from kill to looking at that state to notifying the waitpid
<jrtc27>and then see where SIGCONT diverges to then fix it up appropriately with more state
<gnu_srs2>youpi: Yes, the wait loop trigger is missing for WCONTINUED.
<gnu_srs2>I also wrote: Only S_proc_mark_cont is called, which seems OK
<youpi>S_proc_wait *is* also called
<youpi>once when you start the program (to see STOP)
<youpi>and once when you kill -STOP (to see CONT)
<gnu_srs2>No it is not. I have printfs when that function is called??
<youpi>(16:01:49) gnu_srs2: Yes, S_proc_wait is called when starting the program too.
<youpi>that's the first of what I mentioned
<youpi>then you have one when you kill -STOP
<youpi>but that one is the *second* one
<youpi>your program doesn't call waitpid() *when* you run kill, it calls it way before
<youpi>so don't expect proc_wait to be called when you run kill, it's way before
<youpi>"before" being when the previous waitpid() call returns, i.e. when you kill -STOP
<youpi>put mach_print around waitpid() inside the program, you'll see
<youpi>both before and after the waitpid call
<jrtc27>why not just normal printf? having it inline with the shell commands would make the cause/effect more obvious
<youpi>jrtc27: because he needs to see the output intermixed with the proc server output
<youpi>to actually understand what is happening
<jrtc27>that way you can match up the shell with the proc server output too
<youpi>gnu_srs2: what you can do is disable the hurd console
<gnu_srs2>man(2) wait: WUNTRACED also return if a child has stopped
<youpi>and run the shell commands on the mach console
<youpi>(instead of having to match things up)
<youpi>you can at least run the commands on the mach console, since you are already reading the mach_print logs there
<jrtc27>(and youpi, found out how it works, Elf_Verneed has a vn_file that points into the string table at a name that's also DT_NEEDED to say which object has the versioned symbol)
<jrtc27>(so I guess libc0.3 needs to bump the symbols file dependency for __errno_location?)
<jrtc27>(and we pretend the ones in between never happened)
<jrtc27>(though that's probably a painful transition... and I wonder why perl in particular is affected and nothing else)
<youpi>python3.8 also has the issue
<youpi>people didn't notice because people upgraded libc0.3 before packages got rebuilt against it
<jrtc27>I'd expect it to be anything that inspects errno
<youpi>not that painful transition, we just need to keep compatibility symbols available
<jrtc27>oh you can do that too if there's a way to make libc.so.0.3 provide an alias for the same symbol
<jrtc27>but probably a weak __errno_location in libc does the right thing?
<jrtc27>people who want the old libc one get it, and libc itself gets libpthread's?
<youpi>the problem is that we need to make the new libc *break* old binaries
<youpi>to make it clear: before we had the symbol only in libpthread
<jrtc27>uh I mean weak in libpthread to use libc's
<youpi>pb is: new perl python etc. look it up in libpthread
<youpi>so can't work with the old libc
<jrtc27>but presumably it doesn't exist in libpthread?
<youpi>I don't remember the details, but that's not the point
<jrtc27>grrr and symbol preemption then means libc's hides the existence of libpthread's?
<jrtc27>despite the explicit dependency on the libpthread one?
<youpi>I'm actually thinking that what we need is fixing the version dep in the .symbols file
<youpi>so they have the versioned dependency
<jrtc27>before you started talking about compatibility
<youpi>again the problem is not with the new libc
<youpi>problem is with new binaries and the old libc
<youpi>new binaries look in libpthread and can't find it there and complain
<youpi>it's not a problem of preemption
<jrtc27>ohhh right bumping the symbols file doesn't help
<youpi>it happens that when linking new perl etc. it's indeed the symbol from libpthread that is looked up
<jrtc27>it'll mean new binaries know they need the new libc0.3 package
<youpi>so we make it clear there that binaries need to depend on the new libc
<jrtc27>but won't mean old binaries know they can't use the new one
<youpi>the new libc has it in both libc and libpthread
<jrtc27>somehow I swapped from "upgrading perl but not libc0.3 breaks" to "upgrading libc0.3 but not perl breaks"
<jrtc27>so my original suggestion, which was bumping the symbols file's minimum version for __errno_location, the same as what you just said, is the solution
<youpi>(although I initially understood you said to change the version of the symbol itself, in the libpthread link, not the version in the debian/symbols file)
<youpi>(which is thus also a way to fix it, but significantly different :) )
<jrtc27>oh, no, definitely not what I was suggesting
<jrtc27>you _can_ do that but it doesn't really change anything
<jrtc27>it just makes the fact that the d/symbols bump was neglected a bit more obvious
<youpi>since new binaries will then need the new version
<youpi>and that'll be caught by the wildcards
<youpi>and bring the new deb version
<youpi>the problem is that I cheated when I made a GLIBC_2.2.6 symbol appear
<youpi>it should have shown up on 2.2.6 release only
<jrtc27>so, sort of what I meant by "neglected", but not quite
<youpi>it's not usually a problem to neglect a symbols file
<youpi>the deps will just be tighter than they need to be
<jrtc27>I was imagining a case where a symbol moved from one library to another
<jrtc27>and someone naively just moves the line in the file
<jrtc27>without knowing they also need to update the version
<youpi>when moving a symbol you're supposed to bump the version of the symbol
<youpi>since elf apparently tracks where it comes from and insists on having it that way
<youpi>and then the debian symbols file will just notice that and work properly
<jrtc27>yes, that's what I meant by "more obvious"
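The lookup youpi describes ("it's indeed the symbol from libpthread that is looked up") comes from the version-needs section jrtc27 mentioned: each Elf64_Verneed entry names a file (vn_file, an offset to a DT_NEEDED string), and its Elf64_Vernaux entries list the version names required from that file, each with an index (vna_other) that the symbol's version entry refers to. A minimal C sketch, assuming the .gnu.version_r section and its string table are already in memory; verneed_file_for_index is a hypothetical helper. The 32-bit Elf32_Verneed/Elf32_Vernaux types have the same field sizes, so the same walk would apply there.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: walk an in-memory SHT_GNU_verneed section and find
   the Vernaux whose vna_other equals a symbol's version index.
   Returns the file name (vn_file) the versioned symbol must come from,
   or NULL. If vername is non-NULL, also returns the version name. */
const char *verneed_file_for_index(const unsigned char *sec, size_t sec_len,
                                   Elf64_Half vidx,
                                   const char *strtab, size_t strtab_len,
                                   const char **vername)
{
    assert(sec != NULL && strtab != NULL);
    vidx &= 0x7fff;                      /* strip the "hidden" bit of a versym entry */
    size_t off = 0;
    for (;;) {
        Elf64_Verneed vn;
        if (off + sizeof vn > sec_len)
            return NULL;
        memcpy(&vn, sec + off, sizeof vn);   /* memcpy avoids misaligned reads */

        size_t aoff = off + vn.vn_aux;       /* first Vernaux of this file */
        for (Elf64_Half j = 0; j < vn.vn_cnt; j++) {
            Elf64_Vernaux va;
            if (aoff + sizeof va > sec_len)
                return NULL;
            memcpy(&va, sec + aoff, sizeof va);
            if (va.vna_other == vidx) {
                if (vn.vn_file >= strtab_len)
                    return NULL;
                if (vername != NULL && va.vna_name < strtab_len)
                    *vername = strtab + va.vna_name;
                return strtab + vn.vn_file;
            }
            if (va.vna_next == 0)
                break;
            aoff += va.vna_next;             /* offsets are relative */
        }
        if (vn.vn_next == 0)
            return NULL;
        off += vn.vn_next;
    }
}
```

With a synthetic section where GLIBC_2.2.6 is required once from libc.so.0.3 (index 2) and once from libpthread.so.0.3 (index 3), looking up index 3 yields libpthread.so.0.3 even though the version name is identical, which mirrors why the same GLIBC_2.2.6 tag did not help the new perl find the symbol in the old libc.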
<youpi>not only "more obvious" but "non-cheating way"
<jrtc27>but d/symbols already has the symbols grouped by soname
<youpi>messing with a symbol without bumping its version is cheating, you'll always get trouble with that
<jrtc27>so even if you don't bump the symbol version
<jrtc27>you'd still see a fatal diff from dpkg-gensymbols
<youpi>no, because we have wildcards
<youpi>because people putting versions are supposed to properly manage them
<youpi>but then you are safe as well
<jrtc27>but yes, when wildcards are involved, it makes a difference
<youpi>it'll just put the latest dep
<youpi>which is tighter than needed
<jrtc27>not if the old symbol disappeared
<jrtc27>(assuming you listed it properly in the first place)
<youpi>a library is not supposed to make a symbol disappear :)
<jrtc27>no but that's what happened here :)
<youpi>and the wildcard unfortunately caught it
<youpi>because I cheated by claiming it was GLIBC_2.2.6 while it definitely wasn't
<jrtc27>oh, __errno_location@whatever still exists in libc?
<youpi>again, a library is really not supposed to make symbols disappear
<jrtc27>that's the key piece of information I hadn't realised :)
<youpi>if that happens they have to bump soname
<jrtc27>yeah, I agree it's really not supposed to
<youpi>I wrote it above, but there was still confusion at the time :)
<jrtc27>but thought this had been a short-lived "it's here, but now it's there, sorry, deal with it"
<jrtc27>and I agree with everything you've been saying :)
<youpi>jrtc27: heh, the libc version on the image is 2.29-7, and the introduction of the symbol was in 2.29-8
<youpi>jrtc27: do you happen to know a way to tell a "new" binary from an "old" binary, e.g. with some objdump or elfutils call?
<jrtc27>readelf -V (well, -WV because boo line wrapping) will give you the verneed section, among other things
<jrtc27>though not quite sure how you get symbol names out of that
<jrtc27>I *think* `readelf -Ws` can match it up
<youpi>but it doesn't tell me the soname it looks it up from
<youpi>objdump -T can indeed give the symbol name, but not the soname
<jrtc27> 000000: Version: 1 File: libdl.so.2 Cnt: 1
<jrtc27> 0x0010: Name: GLIBC_2.2.5 Flags: none Version: 10
<jrtc27> 0x0020: Version: 1 File: libtinfo.so.5 Cnt: 1
<jrtc27> 0x0030: Name: NCURSES_TINFO_5.0.19991023 Flags: none Version: 4
<jrtc27>so you find the symbol name in the readelf -Ws output
<jrtc27>get the number in parens at the end
<jrtc27>find the Version: entry in the -WV output
<jrtc27>waltham:ssith-aws-fpga jrtc4% readelf -Ws /bin/bash | grep stdin@GLIBC_2\\.2\\.5
<jrtc27> 2075: 000000000030f590 8 OBJECT GLOBAL DEFAULT 25 stdin@GLIBC_2.2.5 (2)
<jrtc27> 0x0040: Version: 1 File: libc.so.6 Cnt: 8
<jrtc27> 0x00c0: Name: GLIBC_2.2.5 Flags: none Version: 2
<jrtc27>(probably you don't walk back but rather use something like awk and keep track of the last File: you saw, but same thing)
<jrtc27>having said that, the easier approach is probably just "binNMU everything built using one of the bad glibc's"?
<youpi>really far from all apps use libpthread
<youpi>I'd rather avoid rebuilding the world
<jrtc27>oh, hm, yes, you need to use pthreads to be affected
<youpi>but sure, with these readelf bits I can grok things out
<jrtc27>thankfully for my sanity symbol versioning is one of those things I still have to look up :)
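jrtc27's recipe (take the number in parentheses from the readelf -Ws line, then find the Name: entry with that Version: in the readelf -WV output while remembering the last File: seen) is easy to script. A minimal C sketch over captured text lines; symline_version_index and file_for_version_index are hypothetical names, and the parsing assumes the output format pasted in the log, which may differ between binutils versions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: from a `readelf -Ws` line ending in e.g.
   "stdin@GLIBC_2.2.5 (2)", return the version index in the trailing
   parentheses, or -1 if there is none. */
int symline_version_index(const char *line)
{
    assert(line != NULL);
    const char *open = strrchr(line, '(');
    if (open == NULL)
        return -1;
    char *end;
    long idx = strtol(open + 1, &end, 10);
    if (end == open + 1 || *end != ')')
        return -1;
    return (int)idx;
}

/* Hypothetical helper: walk `readelf -WV` lines, remembering the last
   "File:" seen, and copy that file name into out for the "Name:" entry
   whose "Version:" equals idx. Returns 0 on success, -1 if not found. */
int file_for_version_index(const char *const *lines, size_t n, int idx,
                           char *out, size_t outlen)
{
    char file[256] = "";
    for (size_t i = 0; i < n; i++) {
        const char *f = strstr(lines[i], "File: ");
        if (f != NULL) {
            /* this line's "Version:" is the entry format, not an index */
            if (sscanf(f + 6, "%255s", file) != 1)
                file[0] = '\0';
            continue;
        }
        const char *v = strstr(lines[i], "Version: ");
        if (strstr(lines[i], "Name: ") != NULL && v != NULL
            && atoi(v + 9) == idx && file[0] != '\0') {
            snprintf(out, outlen, "%s", file);
            return 0;
        }
    }
    return -1;
}
```

Fed the verneed lines from the log, index 2 (from the stdin@GLIBC_2.2.5 (2) line) maps to libc.so.6 and index 10 to libdl.so.2. On the Hurd, running the same check for __errno_location would show whether a binary expects it from libpthread.so.0.3.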