IRC channel logs

2022-01-01.log

***Server sets mode: +cnt
<fossy>yeah, bauen1, src directory is gone
<doras>I'm getting the following error when building bash-2.05b:
<doras> +> make mkbuiltins
<doras> make: Makefile: error 13
<doras> make: *** No rule to make target `mkbuiltins'. Stop.
<doras>But I can't figure out what's wrong. Everything looks correct as far as I can tell.
<doras>This only happens when running the bootstrap using BuildStream, whose sandbox is internally based on bubblewrap as far as I know.
<doras>Unfortunately there's no "env" to check if there is anything suspicious in the environment.
<doras>And running "make V=1 mkbuiltins" doesn't show any additional information
<doras>No "strace" either.
<doras>How do you debug such issues?
<doras>I guess I could build static versions of those and copy them to the chroot
<doras>I managed to attach to the bootstrap process with strace externally. I'll see if it helps me figure out what goes wrong.
<doras>I think this is my issue:
<doras>[pid 3241191] open("Makefile", O_RDONLY) = -1 EACCES (Permission denied)
<doras>I think these "cp" commands somehow result in the file permission changing from "-rw-r--r--" to "--w-r--r--":
<doras> https://github.com/fosslinux/live-bootstrap/blob/master/sysa/bash-2.05b/bash-2.05b.kaem#L18
<doras>Since the owner can't read the file (only write), and the owner is the one running "make", we get EACCES.
<doras>I'm guessing umask may be involved here.
<doras>kaem is missing the umask built-in command, so it's hard to check what it is...
<stikonas>hmm, at that stage cp should be coreutils cp, not cp from stage0-posix
<doras>Then this is what changed. Up until then there wasn't an issue with "cp" and permissions.
<stikonas>well, stage0-posix cp does not support copying permissions
<stikonas>so at least for executables we set 755 after that
<doras>A difference in the BuildStream bootstrap compared to my bwrap mode bootstrap is that the user isn't a fake (namespace-mapped) or real root. It's the normal user.
<doras>Maybe this makes "cp" behave differently...
<stikonas>but why would it have some strange umask, hmm...
<stikonas>so the stage0-posix cp might not respect umask at all
<stikonas>since it uses M2libc
<stikonas>oh, actually M2libc is probably unrelated
<stikonas>hmm
<stikonas>let me see if meslibc has support for umask
<stikonas>looks like no, it's stubbed out
<doras>Does this mean that it always preserves permissions?
<doras>This is suspicious:
<doras>[pid 3241120] read(8, "\n\0/dev/tty0\0\0\0\0\0umask stub\n\0\0\0\0\0"..., 4096) = 516
<doras>[pid 3241120] write(9, "\n\0/dev/tty0\0\0\0\0\0umask stub\n\0\0\0\0\0"..., 516) = 516
<stikonas>yeah, so this is a function in meslibc
<stikonas>doras: https://git.savannah.gnu.org/cgit/mes.git/tree/lib/stub/umask.c
<stikonas>so it just prints it and doesn't do anything
<doras>Is meslibc the libc library being statically linked with when "cp" is built?
<stikonas>yes, at that time we don't have anything else yet
<stikonas>if you look at https://github.com/fosslinux/live-bootstrap/blob/master/parts.rst, musl is a few steps later
<doras>Is "umask" something that the kernel manages? Or the standard library?
<stikonas>doras: that I'm not sure
<stikonas>we can check in musl
<stikonas>it must be some information that process carries
<stikonas>doras: https://git.musl-libc.org/cgit/musl/tree/src/stat/umask.c
<stikonas>so it's entirely in the kernel
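The point that the umask lives in the kernel as per-process state can be checked with a small sketch. The helper name `create_with_umask` and the probe file are hypothetical, chosen only for illustration; the calls used are the standard `os.umask`, `os.open` and `os.fstat`. It assumes a Linux filesystem without default ACLs on the temporary directory.

```python
import os
import stat
import tempfile


def create_with_umask(mask: int, requested_mode: int) -> int:
    """Create a file under the given umask and return its permission bits."""
    old = os.umask(mask)  # umask(2): installs the new mask, returns the old one
    try:
        with tempfile.TemporaryDirectory() as tmpdir:
            path = os.path.join(tmpdir, "probe")
            # The kernel clears the umask bits from requested_mode at creation.
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, requested_mode)
            try:
                return stat.S_IMODE(os.fstat(fd).st_mode)
            finally:
                os.close(fd)
    finally:
        os.umask(old)  # restore the caller's mask


if __name__ == "__main__":
    print(oct(create_with_umask(0o022, 0o666)))
    print(oct(create_with_umask(0o077, 0o666)))
```

Because the mask is inherited by child processes, a value set by a parent such as a sandbox would apply to everything it spawns, which is the scenario being considered here.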
<doras>I'll see if I can reproduce the permissions change by calling cp myself.
<stikonas>it might be something that buildstream sets
<stikonas>if it for some reason sets umask, then it would be passed through all child processes
<stikonas>and everything in live-bootstrap is a child process of kaem-optional-seed
<doras>Strange. I see "-rwSr--r--" on the original files inside the chroot.
<doras>What does it even mean that they have setuid?
<stikonas>that capital S is suspicious
<doras>Actually, all files seem to have it. Hmmm...
<doras>Also, I don't see why BuildStream would set such a strange umask.
<doras>Removing read permissions for the owner? It's too strange to make sense.
<doras>It seems that I can reproduce it in a non-BuildStream chroot.
<doras>I used this:
<doras>env -i PATH=/usr/bin sudo chroot /path/to/bootstrap/root cp /after/bash-2.05b/mk/main.mk /after/bash-2.05b/build/bash-2.05b/Makefile.test
<doras>I end up with the following permissions:
<doras>--w-r--r-- 1 root root 2572 Jan 0 00:00 Makefile.test
<doras>And since I'm root, the read permission doesn't matter for some reason. "cat" on the file works. This must be why it doesn't reproduce when root is used for the build.
<doras>Alright, so the issue is there anyway, but it's exposed only when non-root users are used for the bootstrap.
<doras>I still suspect this stub. Maybe it results in "cp" getting the wrong impression about the umask.
<doras>I'll try with "cp -p" to see what happens.
<doras>It seems that "cp -p" works around the issue.
<doras>It keeps the permissions of the original file:
<doras>-rwSr--r-- 1 root root 2572 Jan 0 00:00 Makefile.test
<stikonas>so it works in build-stream with -p?
<stikonas>in principle it should be harmless
<doras>Haven't checked through BuildStream since it requires running the entire bootstrap again and I'm still exploring the sysroot, but I'm pretty sure it would work.
<stikonas>so should be fine to just add it to live-bootstrap in general
<doras>But it feels like a deeper issue, probably not specific to "cp".
<stikonas>well, maybe something is wrong with umask
<doras>And what would we do if "cp" is used as part of the build of the following packages?
<stikonas>but you can't easily check umask right now
<stikonas>I'm a bit confused. What do you mean?
<doras>Basically it seems that "cp" from coreutils uses this to get the umask: https://github.com/coreutils/coreutils/blob/00ea4bacf6063ccc125209d5186f8f2382c6f0d4/src/copy.c#L3292
<stikonas>well, it goes to stub
<doras>If our umask is stub, maybe it confuses it.
<stikonas>maybe...
<stikonas>in chroot mode it runs as root
<stikonas>so we don't see this
<stikonas>anyway, it should be fine to use cp -p unconditionally
<stikonas>since generally we want to preserve permissions when copying
<doras>stikonas: I was saying that our coreutils "cp" may be broken for non-root users, regardless of the actual umask.
<doras>At least as long as we use meslibc.
<doras>Or rather that it's built against it.
<doras>So if we build the next step in the bootstrap, which internally uses "cp" too, or even the build of bash itself uses "cp" internally, we may end up with similar failures in those places too.
<stikonas>yes, that's true
<stikonas>and coreutils is not rebuilt until much later
<stikonas>(step 30)
<stikonas>although it might be possible to move it to step 23 (after musl)
<doras>I created a file with rw permissions for owner, group and other ("-rwSrw-rw-"). Then copied it using "cp" without "-p". I ended up with "--w-r--r--"
<stikonas>anyway, it looks like cp -p resolves it
<stikonas>so maybe you can test it and then do PR to fix it where it's necessary
<doras>This is how "cp" creates the file:
<doras>[pid 3247028] open("/after/bash-2.05b/build/bash-2.05b/Makefile-new.test", O_WRONLY|O_CREAT, 0100266) = 4
<doras>Notice the O_WRONLY, which means owner should have write-only permissions.
<doras>And one line after the command above it uses fstat() so we can see the actual permissions being 0244 (AKA "--w-r--r--"):
<doras>[pid 3247028] fstat(4, {st_mode=S_IFREG|0244, st_size=0, ...}) = 0
<doras>Then it just keeps these permissions as-is and exits with success (0).
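The step from the `open()` mode argument 0100266 to the observed 0244 can be reproduced with plain arithmetic. This is a minimal sketch that assumes the process umask was the common 022 (the log does not state it); `mode_after_umask` is a hypothetical helper name.

```python
import stat


def mode_after_umask(requested: int, umask: int) -> int:
    """Permission bits a new file gets: keep only the permission bits of the
    requested mode, then clear every bit that is set in the umask."""
    return stat.S_IMODE(requested) & ~umask


if __name__ == "__main__":
    created = mode_after_umask(0o100266, 0o022)  # umask 022 is an assumption
    print(oct(created), stat.filemode(stat.S_IFREG | created))
```

Under that assumption the owner read bit is already missing from the mode that "cp" passes to `open()`, so the umask only removes the group and other write bits.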
<doras>You can see the entire execution as seen by strace here:
<doras> https://paste.gnome.org/pgu2qczb4/aczafu/raw
<doras>stikonas: I'd rather fix "cp".
<stikonas>doras: it might involve implementing the stub in mes libc
<stikonas>mes libc is really minimal and not everything works when you build complicated executables with it
<doras>Hmmm
<doras>Is the final sysc "cp" built with a different libc implementation?
<doras>I'll check if I see the same behavior there too.
<doras>For sanity reasons.
<stikonas>doras: even step 30 cp is built with different libc
<stikonas>so sysc is definitely fine
<doras>Yeah, it seems fine.
<stikonas>but yes, we observed other peculiar behaviour with meslibc
<stikonas>but since we don't use it for long, we didn't focus too much on it
<stikonas>after bash musl is the next thing...
<stikonas>well, there is flex before musl but it's for some technical reasons
<doras>This is when using the sysc "cp": https://paste.gnome.org/pgkiberj0/ax43wl/raw
<stikonas>(flex 2.5.11 links against heirloom tools, so it has to be built against the same libc)
<stikonas>well, that looks fine
<doras>Mmhm
<doras>This looks rather different:
<doras>open("/usr/share/info/make.info-1.copy", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0644) = 4
<doras>The last parameter is "0644", which looks like a valid permission mask, rather than "0100266" in the previous case.
<doras>Aha, reproduced it with sysc's "cp" too.
<doras>Hmm... well, not exactly. Never mind.
<stikonas>hmm, so 0100266 must be coming from mes libc
<stikonas>well, open is called here https://git.savannah.gnu.org/cgit/mes.git/tree/lib/stdio/fopen.c
<stikonas>hmm, but here it just uses 0600...
<stikonas>hmm, probably this place https://git.savannah.gnu.org/cgit/mes.git/tree/lib/posix/open.c#n40
<stikonas>but it just passes through that mode...
<doras>This is the correct source: https://github.com/coreutils/coreutils/blob/v5.0/src/copy.c#L252
<doras>I previously looked at a newer version.
<doras>I think this is the previous call: https://github.com/coreutils/coreutils/blob/v5.0/src/copy.c#L1379
<doras>So this may be relevant: https://github.com/coreutils/coreutils/blob/v5.0/src/copy.c#L117
<stikonas>hmm, yes, that looks relevant
<doras>According to this and the stub source code, our option->umask_kill should be all mask bits active (!0): https://github.com/coreutils/coreutils/blob/v5.0/src/cp.c#L746
<doras>I don't see any issue in coreutils or in meslibc that could explain this.
<doras>I'll try to add a bunch of prints in "cp" to see if it helps figure out what's going on here.
<doras>stikonas: the issue is in the line I highlighted above: https://github.com/coreutils/coreutils/blob/v5.0/src/copy.c#L117
<doras>"mode" is ~0 in that function, which is essentially -1.
<doras>Sorry, I mean, "option->umask_kill" is.
<doras>"mode" is 100664.
<doras>Yet this logic works fine on a different compiler.
<doras>So either the compiler logic is broken in some form, or the defines are wrong.
<doras>I added more prints so I can break down that statement into smaller parts and figure out which one is handled incorrectly.
<stikonas>hmm, compiler is at that point tcc 0.9.26
<stikonas>it's an old version of tcc
<stikonas>although if necessary, we can build tcc 0.9.27
<doras>I think it's tcc-0.9.27
<doras>Or at least it is already built at that point.
<stikonas>oh yes, we build it then
<stikonas>just before coreutils
<doras>I highly suspect the compiler. I'll know in a minute.
<doras>Surprisingly, it's not the compiler
<doras>It's the define
<doras>S_ISUID is 0400 instead of 04000
*doras sighs
<doras>Now where is that one coming from?
<doras>I basically need the package that creates our sys/stat.h
<doras>linux-headers-5.10.41?
<doras>Hmmm, can't be.
<stikonas>no, we don't use linux headers at that point
<stikonas>it must be mes headers
<stikonas>let me grep
<stikonas>yes, found it
<stikonas> https://git.savannah.gnu.org/cgit/mes.git/tree/include/sys/stat.h#n110
<doras>Yep!
<stikonas>janneke: ^^
<stikonas>S_ISUID is missing a zero at the end
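The effect of the short constant can be shown with the numbers quoted in the log. The sketch below paraphrases the masking step discussed above (clear the setuid and setgid bits, then apply `umask_kill`); it is not the coreutils source, and `get_dest_mode` here is only an illustrative stand-in. The value 0400 is also the owner-read bit, so clearing the mis-defined S_ISUID removes read permission for the owner.

```python
S_ISGID = 0o2000
S_ISUID_MES = 0o400     # value reported for the mes header in the log
S_ISUID_LINUX = 0o4000  # value on Linux
UMASK_KILL = ~0         # all bits set, as observed with the stubbed umask


def get_dest_mode(mode: int, umask_kill: int, s_isuid: int) -> int:
    """Paraphrase of the masking step: drop setuid/setgid, apply umask_kill."""
    return mode & (umask_kill & ~(s_isuid | S_ISGID))


if __name__ == "__main__":
    for source_mode in (0o100666, 0o100664, 0o100644):
        buggy = get_dest_mode(source_mode, UMASK_KILL, S_ISUID_MES)
        fixed = get_dest_mode(source_mode, UMASK_KILL, S_ISUID_LINUX)
        print(oct(source_mode), "->", oct(buggy), "(0400)", oct(fixed), "(04000)")
```

With the short constant, a source mode of 0100666 becomes 0100266, which matches the mode argument seen in the strace output earlier in the log.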
<stikonas>doras: maybe send a patch to https://lists.gnu.org/archive/html/bug-mes/
<stikonas>although, we'll also have to add that patch to mes-m2
<stikonas>anyway, good job debugging this!
<doras>:)
<doras>Maybe this is also the reason all of our files in bootstrap have the setuid bit flipped.
<doras>Or at least they appear to have it, I doubt they actually do.
<doras>I'll start with a patch for live-bootstrap. Or is this before we build "patch"?
<doras>Yep, it is.
<stikonas>no, patch is already built
<stikonas>although at that stage we have not yet rebuilt meslibc with patches
<stikonas>but you can move things around a bit
<doras>Is it?
<doras>Hmmm...
<doras>"patch" comes after "mes" as far as I can tell.
<doras>Unless stage0 creates "patch" too.
<doras>"mes" is basically the first thing we build in after.kaem as far as I can tell.
<doras>It's "mes" -> "tcc" -> "gzip" -> "tar" -> "sed" -> "patch"
<stikonas>doras: yes, but there is another build of mes later
<stikonas>or rather we should say mes libc
<stikonas>mes itself is no longer used after tcc
<doras>Both need the fix.
<stikonas>well, for early binaries yes
<stikonas>gzip, tar, etc...
<doras>"coreutils" is impacted by the first "mes".
<stikonas>but mes-m2 has no releases
<stikonas>it's an older fork of mes
<doras>Which is why "cp" is broken.
<stikonas>but only used with M2-Planet
<stikonas>so we can just fix it now
<stikonas>or at least once oriansj merges it
<stikonas>but ideally we also send patch to upstream mes
<doras>Sure
<stikonas>since mes-m2 might be a temporary thing
<doras>I'll create a fork and test the fix.
<stikonas>doras: it won't work if you just patch mes-m2
<stikonas>mes libc is used from mes-0.23
<stikonas>so patching mes-m2 will have absolutely no effect
<doras>I'm confused.
<doras>stikonas: where is this coming from? https://github.com/fosslinux/live-bootstrap/blob/master/sysa/mes/mes.kaem#L277
<stikonas> https://github.com/fosslinux/live-bootstrap/blob/master/sysa/tcc-0.9.26/tcc-0.9.26.kaem#L27
<stikonas>oh, it might be a bigger mess
<stikonas>headers might be from older mes-m2
<stikonas>but library itself is from newer mes
<stikonas>so right now actually
<stikonas>you only need to fix header in mes-m2
<stikonas>and I think everything will work
<stikonas>oh, but when we build next meslibc
<stikonas>meslibc itself uses its own headers from build tree
<stikonas>so not sure what happens
<stikonas>it might just work if you update mes-m2
<doras>Hmhmhm
<stikonas>but I am not sure
<doras>I'll start with that.
<stikonas>doras: some of the bugs from mes libc propagate a bit, so we had to build tcc and musl twice...
<stikonas>but it's probably not this bug you found
<bauen1>nice, i think i have a very simple and quite short uart read/write driver, things are actually easy if you're calculating the divisors with the correct clock frequency (and 24mhz != 20mhz)
<doras>stikonas: it changes a few checksums.
<doras>How do you update those?
<stikonas>doras: that's expected
<stikonas>well, just manually for now
<stikonas>I think fossy is working on something better
<stikonas>but I haven't seen any code
<stikonas>it is a bit slow and annoying...
<doras>The build fails when it finds a different checksum.
<doras>Do you run it all over again each time?
<stikonas[m]>Well, you can try to inject busybox sh and manually run next steps
<stikonas[m]>But it might just be simpler to let it run a few times
<stikonas[m]>And do something else in the meantime
<doras>I'll try to make the checksum failures a warning instead, and maybe write them to a file.
<stikonas[m]>Hmm, that might be possible now...
<stikonas[m]>I recently added support for if command in kaem
<stikonas[m]>Or alternatively sha256sum can be patched
<doras>I'm actually not building sysb at all
<doras>So I'm not sure if it would also differ.
<janneke>stikonas: thanks!
<stikonas[m]>Well, most credit goes to doras, not me
<janneke>oops, thanks doras!
<oriansj>nice find on S_ISUID; turns out M2libc is wrong for it as well
<oriansj>so fixing that now
<oriansj>also if umask was at the kernel level, the mescc-tools-extra cp would be impacted as well
<oriansj>and guessing by the kernel ABI, it would probably be the sys_umask command
<oriansj>So what would be making that call?
<doras>janneke: sure :)
<doras>Would you like me to submit a patch somewhere?