IRC channel logs

2020-07-31.log

<youpi>identify toto
***Server sets mode: +nt
<junlingm>how do I check if a port is dead?
<junlingm>oh, port->ip_references == 1
<youpi>junlingm: that's not really dead, it's almost dead
<youpi>i.e. just one reference left
***foggy68 is now known as foggy67
<gnu_srs2>youpi: Do you have time for some questions?
<youpi>gnu_srs2: now, yes
<foggy67>hello youpi
<foggy67>I tried the Debian GNU/Hurd release dated 01/01/2020.
<foggy67>It starts and runs well with kvm on Ubuntu 16.04
<foggy67>But there is a problem with the upgrade
<foggy67>sudo apt-get upgrade gives an error
<foggy67>A perl script creates a Debconf error
<foggy67>so the upgrade can't run to completion
<youpi>"a Debconf error", means: it's printed above
<youpi>so don't paste only the last line of the output
<youpi>paste the whole thing on some pastebin
<gnu_srs2>youpi: Adding mach_prints to proc/wait.c reveals that S_proc_wait is not called for kill -CONT <pid>, in the example in man wait.
<gnu_srs2>Only S_proc_mark_cont is called, which seems OK. (I've added #define WCONTINUED 8 to bits/waitflags.h)
<foggy67>"installed debconf package post-installation script subprocess returned error exit status 127"
<youpi>foggy67: yet further up
<youpi>gnu_srs2: well, I'm not surprised: kill -CONT <pid> doesn't call wait(), it calls kill()
<gnu_srs2>kill -STOP and kill -TERM call waitpid()
<foggy67>"/usr/bin/perl: relocation error: /usr/bin/perl: symbol __errno_location version GLIBC_2.2.6 not defined in file libpthread.so.0.3 with link time reference"
<foggy67>"dpkg: error processing package debconf (--configure) : installed debconf package post-installation script subprocess returned error exit status 127"
<youpi>gnu_srs2: you mean bash calls waitpid() just after calling kill() ?
<youpi>then add mach_prints there
<youpi>because the eventual __wait4() implementation does call proc_wait
<youpi>so there might be something odd along the way
<youpi>so mach_print() the way
<youpi>foggy67: ah, that's way more precise indeed
<youpi>and unfortunate
<gnu_srs2>waitpid() calls wait4() which calls proc_wait(), yes
<gnu_srs2>See the example in man(2) wait.
<youpi>foggy67: basically, first upgrade libc0.3 before upgrading the rest
<youpi>gnu_srs2: that's just an example
<youpi>is bash really calling waitpid after kill ?
<youpi>or are you compiling and running that program ?
<foggy67>youpi : OK
<gnu_srs2>kill -SIGNAL sends a signal to the child process in that example, man(2) wait.
<youpi>yes
<youpi>that doesn't imply it necessarily calls waitpid() after that
<jrtc27>read what youpi wrote again
<foggy67>youpi : i entered sudo apt-get install libc0.3
<foggy67>and it brings an error too
<youpi>foggy67: yes, once you have upgraded perl you're screwed
<youpi>you need to restart from the image with the old perl
<foggy67>" the following packages have unmet dependencies: libc0.3-dev breaks libgcc-9-dev but 9.2.1-21 is to be installed"
<jrtc27>hmm, is this because perl does nasty things, or was a glibc header inlining something implementation-specific that changed?
<foggy67>" ... this may be caused by the held packages"
<youpi>jrtc27: it's more horrible than this, the symbol was added to libpthread, and the current perl thus expects it to be there
<jrtc27>but how does it end up with perl referencing it?
<youpi>errno is a macro for __errno_location()
<jrtc27>oh, was it not before?
<youpi>it was
<youpi>but errno_location was in libc, not in libpthread
<youpi>and there you get the hell of symbol versions, compatibility etc.
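As a rough illustration of the statement that errno is a macro for __errno_location(): the sketch below assumes a glibc-based Linux system (not the Hurd image under discussion) and calls __errno_location() through ctypes. The helper name errno_location is hypothetical. It only shows that the function returns a stable per-thread address; it does not show which shared object the dynamic linker resolved the symbol from.

```python
import ctypes


def errno_location():
    """Return the address and current value of this thread's errno slot."""
    # CDLL(None) exposes symbols already loaded into the process
    # (assumption: the C library is glibc and exports __errno_location).
    libc = ctypes.CDLL(None)
    func = getattr(libc, "__errno_location")
    func.restype = ctypes.POINTER(ctypes.c_int)
    func.argtypes = []
    pointer = func()
    address = ctypes.cast(pointer, ctypes.c_void_p).value
    return address, pointer[0]


if __name__ == "__main__":
    address, value = errno_location()
    print(f"errno lives at {address:#x}, current value {value}")
```

Calling the helper twice from the same thread should yield the same address, since every read of errno in C goes through that one function call.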
<youpi>foggy67: you can insist, by telling it to install libgcc-9-dev on the same apt-get line
<jrtc27>okay, but, I didn't know ELF kept track of which file a symbol came from
<youpi>same for gcc-9-base that it'll tell about then
<jrtc27>youpi: or probably -f works
<jrtc27>maybe
<foggy67>youpi : yehh, your advice works ...
<gnu_srs2>youpi: that doesn't imply it necessarily calls waitpid() after that: Running that example in Linux waitpid() is called for kill -CONT <pid>.
<youpi>jrtc27: I haven't looked at the details, but it seems there is something happening like that at least
<youpi>I was thinking it'd perhaps be a question of symbol version, but it's GLIBC_2.2.6 in both cases
<youpi>gnu_srs2: I can't parse the second part of your sentence
<youpi>at least in a way that makes sense
<youpi>"kill -CONT <pid>" is a bash thing
<youpi>the example in waitpid() has nothing to do with bash
<gnu_srs2>Compiling and running the example in wait(2) works as expected on Linux.
<youpi>either you use kill -CONT <pid> from bash, and then possibly bash doesn't actually call waitpid
<youpi>or you run the program, and then yes waitpid will be called
<youpi>gnu_srs2: so you mean that you run the program, and *then* call kill -CONT <pid> on the pid of the program?
<gnu_srs2>./test_waitpid&, kill -STOP <child-pid>; kill -CONT <child-pid>; kill -TERM <child-pid>
<youpi>that's *way* clearer
<youpi>*way*way*way* clearer
<youpi>anyway, so the program is calling waitpid, isn't it?
<gnu_srs2>The call sequence is already in the man page.
<youpi>so your mach_print in proc_wait will show up when you run the program
<youpi>not when you run kill -CONT
<gnu_srs2>right :)
<youpi>so do you see that proc_wait call happening with mach_print?
<youpi>when running the program, not when running kill
<gnu_srs2>Yes, for -STOP and -TERM, not for -CONT
<gnu_srs2>Ah sorry, no. Only when sending signals to the child process.
<youpi>?
<youpi>you should be seeing it when the program calls it
<youpi>not when you run kill
<youpi>note that when you run kill, waitpid returns, and thus the program calls waitpid again
<youpi>don't confuse the call to waitpid that happens after your kill, it's unrelated actually
<jrtc27>I've been trying to infer the problem being debugged as I missed the initial problem statement wherever it was, but to be clear:
<jrtc27>is the issue simply that waitpid(..., WCONTINUED) doesn't work and instead behaves as if you didn't pass WCONTINUED?
<youpi>WCONTINUED is undefined
<gnu_srs2>Yes, S_proc_wait is called when starting the program too.
<youpi>I *guess* gnu_srs2 is trying to implement it
<jrtc27>ok, and we're trying to implement it
<youpi>like you say, gnu_srs2 is not used to tell what he is actually trying to do
<youpi>so we always have to divine it
<youpi>gnu_srs2: so that's the call
<youpi>so proc_wait *is* getting called
<youpi>what you miss is getting its wait loop triggered by the cont part
<jrtc27>why is S_proc_wait relevant?
<gnu_srs2>I wrote above: (I've added #define WCONTINUED 8 to bits/waitflags.h)
<jrtc27>you want to know about the other side
<youpi>jrtc27: that's the implementation behind waitpid
<jrtc27>the thing that sends something *to* the wait
<jrtc27>well, yes
<jrtc27>you somehow need to get the WCONTINUED out of there
<jrtc27>but you also need to know how the response is supposed to come back
<jrtc27>to know where to forward the WCONTINUED flag to
<jrtc27>ie you need to follow the full path from waitpid to whatever state is being held, and from kill to looking at that state to notifying the waitpid
<jrtc27>for a normal signal
<jrtc27>and then see where SIGCONT diverges to then fix it up appropriately with more state
<gnu_srs2>youpi: Yes, the wait loop trigger is missing for WCONTINUED.
<gnu_srs2>I also wrote: Only S_proc_mark_cont is called, which seems OK
<youpi>S_proc_wait *is* also called
<youpi>as many times as needed
<youpi>once when you start the program (to see STOP)
<youpi>and once when you kill -STOP (to see CONT)
<gnu_srs2>No it is not. I have printfs when that function is called??
<youpi>(16:01:49) gnu_srs2: Yes, S_proc_wait is called when starting the program too.
<youpi>you wrote that
<youpi>that's the first of what I mentioned
<youpi>then you have one when you kill -STOP
<youpi>but that one is the *second* one
<youpi>the one for -CONT
<youpi>your program doesn't call waitpid() *when* you run kill, it calls it way before
<youpi>so don't expect proc_wait to be called when you run kill, it's way before
<youpi>"before" being when the previous waitpid() call returns, i.e. when you kill -STOP
<youpi>put mach_print around waitpid() inside the program, you'll see
<youpi>both before and after the waitpid call
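The ordering described here (each waitpid() call is entered well before the kill that makes it return) can be sketched as follows. This is a minimal Python sketch, assuming a Linux system where WCONTINUED is available, which is exactly what was missing on the Hurd at the time. The names run_demo and describe are hypothetical, and a background thread stands in for the kill commands typed at the shell.

```python
import os
import signal
import threading
import time


def describe(status):
    """Translate a raw wait status into a short label."""
    if os.WIFSTOPPED(status):
        return "stopped"
    if os.WIFCONTINUED(status):
        return "continued"
    if os.WIFSIGNALED(status):
        return "killed"
    return "exited"


def run_demo(delay=0.3):
    """Log entry to and return from waitpid() around STOP, CONT and TERM."""
    events = []
    child = os.fork()
    if child == 0:
        # Child: idle until a signal stops or terminates it.
        while True:
            signal.pause()

    def send_signals():
        # Plays the role of kill -STOP / -CONT / -TERM run from the shell.
        for sig in (signal.SIGSTOP, signal.SIGCONT, signal.SIGTERM):
            time.sleep(delay)
            events.append(f"kill -{sig.name[3:]}")
            os.kill(child, sig)

    sender = threading.Thread(target=send_signals)
    sender.start()
    while True:
        # Logged as soon as the previous waitpid() has returned.
        events.append("waitpid enter")
        _, status = os.waitpid(child, os.WUNTRACED | os.WCONTINUED)
        label = describe(status)
        events.append(f"waitpid return: {label}")
        if label in ("killed", "exited"):
            break
    sender.join()
    return events


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

With the default delay, each "waitpid enter" line is expected to appear before the corresponding "kill" line, which is the point made above: the wait for the CONT event starts when the STOP event is reported, not when kill -CONT is run.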
<jrtc27>why not just normal printf? having it inline with the shell commands would make the cause/effect more obvious
<jrtc27>(or both)
<youpi>jrtc27: because he needs to see the output intermixed with the proc server output
<jrtc27>yeah, ok, so do both
<youpi>to actually understand what is happening
<jrtc27>that way you can match up the shell with the proc server output too
<youpi>gnu_srs2: what you can do is disable the hurd console
<gnu_srs2>man(2) wait: WUNTRACED also return if a child has stopped
<youpi>and run the shell commands on the mach console
<youpi>so it'll get mixed properly
<youpi>(instead of having to match things up)
<gnu_srs2>I log in via ssh
<youpi>you can at least run the commands on the mach console, since you are already reading the mach_print logs there
<jrtc27>(and youpi, found out how it works: Elf_Verneed has a vn_file that points into the string table at a name that's also DT_NEEDED, to say which object has the versioned symbol)
<jrtc27>(so I guess libc0.3 needs to bump the symbols file dependency for __errno_location?)
<jrtc27>(and we pretend the ones in between never happened)
<jrtc27>(though that's probably a painful transition... and I wonder why perl in particular is affected and nothing else)
<youpi>python3.8 also has the issue
<youpi>people didn't notice because people upgraded libc0.3 before packages got rebuilt against it
<jrtc27>I'd expect it to be anything that inspects errno
<youpi>yes
<jrtc27>I see
<youpi>not that painful transition, we just need to keep compatibility symbols available
<youpi>until the next release
<jrtc27>oh you can do that too if there's a way to make libc.so.0.3 provide an alias for the same symbol
<jrtc27>but probably a weak __errno_location in libc does the right thing?
<jrtc27>people who want the old libc one get it, and libc itself gets libpthread's?
<youpi>the problem is not that way
<youpi>the problem is that we need to make the new libc *break* old binaries
<youpi>err
<youpi>no that's the converse
<youpi>gree
<youpi>to make it clear: before we had the symbol only in libpthread
<jrtc27>uh I mean weak in libpthread to use libc's
<youpi>now we have it in libc too
<youpi>pb is: new perl python etc. look it up in libpthread
<youpi>so can't work with the old libc
<jrtc27>yes
<jrtc27>but presumably it doesn't exist in libpthread?
<jrtc27>any more
<youpi>it does
<youpi>it's meant to be
<youpi>I don't remember the details, but that's not the point
<jrtc27>grrr and symbol preemption then means libc's hides the existence of libpthread's?
<jrtc27>despite the explicit dependency on the libpthread one?
<youpi>I'm actually thinking that what we need is fixing the version dep in the .symbols file
<youpi>and rebuild binaries
<jrtc27>yeah that's what I suggested
<youpi>so they have the versioned dependency
<jrtc27>before you started talking about compatibility
<youpi>again the problem is not with the new libc
<youpi>problem is with new binaries and the old libc
<youpi>new binaries look in libpthread and can't find it there and complain
<youpi>we can't fix that
<youpi>it's not a problem of preemption
<jrtc27>ohhh right bumping the symbols file doesn't help
<youpi>it happens that when linking new perl etc. it's indeed the symbol from libpthread that is looked up
<youpi>it will
<youpi>I mean debian/symbols
<jrtc27>it'll mean new binaries know they need the new libc0.3 package
<youpi>so we make it clear there that binaries need to depend on the new libc
<jrtc27>but won't mean old binaries know they can't use the new one
<youpi>they *can*
<youpi>that's my point
<youpi>the new libc has it in both libc and libpthread
<jrtc27>.. right, I'm getting confused
<jrtc27>yes
<jrtc27>all good
<jrtc27>somehow I swapped from "upgrading perl but not libc0.3 breaks" to "upgrading libc0.3 but not perl breaks"
<jrtc27>so my original suggestion, which was bumping the symbols file's minimum version for __errno_location, the same as what you just said, is the solution
<jrtc27>yes?
<youpi>probably yes
<youpi>(although I initially understood you said to change the version of the symbol itself, in the libpthread link, not the version in the debian/symbols file)
<youpi>(which is thus also a way to fix it, but significantly different :) )
<jrtc27>oh, no, definitely not what I was suggesting
<jrtc27>you _can_ do that but it doesn't really change anything
<jrtc27>it just makes the fact that the d/symbols bump was neglected a bit more obvious
<youpi>it does
<youpi>since new binaries will then need the new version
<youpi>and that'll be caught by the wildcards
<youpi>and bring the new deb version
<youpi>not neglected
<jrtc27>oh it was a wildcard thing?
<youpi>the problem is that I cheated when I made a GLIBC_2.2.6 symbol appear
<jrtc27>ok
<youpi>it should have shown up on 2.2.6 release only
<jrtc27>so, sort of what I meant by "neglected", but not quite
<youpi>it's not usually a problem to neglect a symbols file
<youpi>the deps will just be tighter than they need to be
<jrtc27>I was imagining a case where a symbol moved from one library to another
<jrtc27>and someone naively just moves the line in the file
<jrtc27>without knowing they also need to update the version
<jrtc27>(debian version, that is)
<youpi>when moving a symbol you're supposed to bump the version of the symbol
<youpi>since elf apparently tracks where it comes from and insists on having it that way
<youpi>and then the debian symbols file will just notice that and work properly
<jrtc27>yes, that's what I meant by "more obvious"
<youpi>not only "more obvious" but "non-cheating way"
<jrtc27>but d/symbols already has the symbols grouped by soname
<youpi>messing with a symbol without bumping its version is cheating, you'll always get into trouble with that
<jrtc27>so even if you don't bump the symbol version
<jrtc27>you'd still see a fatal diff from dpkg-gensymbols
<jrtc27>without wildcards
<youpi>no, because we have wildcards
<youpi>on symbols with versions
<youpi>because people putting versions are supposed to properly manage them
<jrtc27>yes, but you don't _have_ to
<jrtc27>(use wildcards, that is)
<youpi>yes
<youpi>but then you are safe as well
<jrtc27>but yes, when wildcards are involved, it makes a difference
<youpi>it'll just put the latest dep
<youpi>which is more tight than needed
<jrtc27>not if the old symbol disappeared
<jrtc27>from the old location
<jrtc27>then it'll give an error
<jrtc27>(assuming you listed it properly in the first place)
<youpi>a library is not supposed to make a symbol disappear :)
<jrtc27>no but that's what happened here :)
<youpi>no!
<youpi>it didn't disappear
<youpi>that's the point :)
<youpi>a new one appeared
<youpi>and people started using it
<youpi>so we need a dep
<youpi>and the wildcard unfortunately caught it
<youpi>because I cheated by claiming it was GLIBC_2.2.6 while it definitely wasn't
<jrtc27>oh, __errno_location@whatever still exists in libc?
<youpi>yes
<jrtc27>I see
<youpi>again, a library is really not supposed to make symbols disappear
<jrtc27>that's the key piece of information I hadn't realised :)
<youpi>if that happens they have to bump soname
<jrtc27>yeah, I agree it's really not supposed to
<youpi>I wrote it above, but there was still confusion at the time :)
<jrtc27>but thought this had been a short-lived "it's here, but now it's there, sorry, deal with it"
<jrtc27>all is clear now
<jrtc27>and I agree with everything you've been saying :)
<youpi>jrtc27: heh, the libc version on the image is 2.29-7, and the introduction of the symbol was in 2.29-8
<youpi>jrtc27: do you happen to know a way to figure out a "new" binary from an "old" binary, e.g with some objdump or elfutils call?
<youpi>so I can schedule binNMUs
<jrtc27>readelf -V (well, -WV because boo line wrapping) will give you the verneed section, among other things
<jrtc27>though not quite sure how you get symbol names out of that
<jrtc27>I *think* `readelf -Ws` can match it up
<youpi>but it doesn't tell me the soname it looks it up from
<youpi>objdump -T can indeed give the symbol name, but not the soname
<jrtc27> 000000: Version: 1 File: libdl.so.2 Cnt: 1
<jrtc27> 0x0010: Name: GLIBC_2.2.5 Flags: none Version: 10
<jrtc27> 0x0020: Version: 1 File: libtinfo.so.5 Cnt: 1
<jrtc27> 0x0030: Name: NCURSES_TINFO_5.0.19991023 Flags: none Version: 4
<jrtc27>...
<jrtc27>you get headers for each file
<jrtc27>so you find the symbol name in the readelf -Ws output
<jrtc27>get the number in parens at the end
<jrtc27>find the Version: entry in the -WV output
<youpi>ah
<jrtc27>and walk back to the File:
<jrtc27>e.g.
<jrtc27>waltham:ssith-aws-fpga jrtc4% readelf -Ws /bin/bash | grep stdin@GLIBC_2\.2\.5
<jrtc27> 2075: 000000000030f590 8 OBJECT GLOBAL DEFAULT 25 stdin@GLIBC_2.2.5 (2)
<jrtc27>and
<jrtc27> 0x0040: Version: 1 File: libc.so.6 Cnt: 8
<jrtc27>...
<jrtc27> 0x00c0: Name: GLIBC_2.2.5 Flags: none Version: 2
<jrtc27>(probably you don't walk back but rather use something like awk and keep track of the last File: you saw, but same thing)
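The "keep track of the last File: you saw" approach can be sketched in Python rather than awk. This is a rough sketch, not a complete readelf parser: it assumes the version-needs text looks like the lines pasted above, and the function names parse_verneed and symbol_source are hypothetical. The sample strings are copied from the readelf output quoted in this log.

```python
import re

# Sample lines taken from the `readelf -WV` and `readelf -Ws` output above.
SAMPLE_VERNEED = """\
 000000: Version: 1 File: libdl.so.2 Cnt: 1
 0x0010: Name: GLIBC_2.2.5 Flags: none Version: 10
 0x0020: Version: 1 File: libtinfo.so.5 Cnt: 1
 0x0030: Name: NCURSES_TINFO_5.0.19991023 Flags: none Version: 4
 0x0040: Version: 1 File: libc.so.6 Cnt: 8
 0x00c0: Name: GLIBC_2.2.5 Flags: none Version: 2
"""
SAMPLE_SYMBOL = (
    " 2075: 000000000030f590 8 OBJECT GLOBAL DEFAULT 25 stdin@GLIBC_2.2.5 (2)"
)


def parse_verneed(text):
    """Map each version index to (file, version name)."""
    mapping = {}
    current_file = None
    for line in text.splitlines():
        header = re.search(r"File:\s*(\S+)", line)
        if header:
            # Remember the most recent File: header.
            current_file = header.group(1)
            continue
        entry = re.search(r"Name:\s*(\S+)\s+Flags:.*Version:\s*(\d+)", line)
        if entry and current_file is not None:
            mapping[int(entry.group(2))] = (current_file, entry.group(1))
    return mapping


def symbol_source(symbol_line, verneed):
    """Return (symbol, file) for one `readelf -Ws` line, or None."""
    match = re.search(r"(\S+)@(\S+)\s+\((\d+)\)\s*$", symbol_line)
    if not match:
        return None
    entry = verneed.get(int(match.group(3)))
    if entry is None:
        return None
    return match.group(1), entry[0]


if __name__ == "__main__":
    print(symbol_source(SAMPLE_SYMBOL, parse_verneed(SAMPLE_VERNEED)))
```

To find binaries needing a rebuild, one would presumably feed real readelf output for each binary into these helpers and check whether __errno_location resolves to the libpthread soname.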
<jrtc27>having said that, the easier approach is probably just "binNMU everything built using one of the bad glibc's"?
<jrtc27>we have buildinfo files :)
<youpi>far from all apps use libpthread
<youpi>I'd rather avoid rebuilding the world
<jrtc27>oh, hm, yes, you need to use pthreads to be affected
<jrtc27>not just libc
<jrtc27>so yeah
<jrtc27>carry on
<youpi>but sure, with these readelf bits I can grok things out
<youpi>thanks
<jrtc27>thankfully for my sanity symbol versioning is one of those things I still have to look up :)
<jrtc27>other parts of ELF less so....