IRC channel logs

2024-01-28.log


<Foxboron>matrix_bridge: Wolfi is not really rooted in Alpine, but it has an Alpine dev. It is, though, a separate thing.
<Foxboron>cc lrvick
<Foxboron>But conceptually, it's the same thing. Build packages in containers and offer composability
<Foxboron>But like, "without trusting any existing linux distribution" isn't *really* correct, as you are basing this entirely on OCI containers, so something needs to build the runtimes.
<Foxboron>And as you *are* using Linux namespaces there has to be a kernel *somewhere*. And several builds embed uname output into the binaries. So they won't be reproducible across different linux distros
<Foxboron>They might not even be reproducible across different container runtimes
<Foxboron>Doing a new linux distro from scratch introduces more problems than it effectively solves.
<lrvick>Support for different container runtimes should be doable now that docker is finally standardizing on OCI
<Foxboron>I wouldn't be too confident on that holding true, as it would still depend on how each runtime sets up the namespaces. Even if they use OCI
<lrvick>But also I have already confirmed people getting identical hashes building from multiple distros (all using docker)
<lrvick>because at the end of the day each build ends in a scratch container, with only the files I insert, with timestamps locked etc
<Foxboron>As long as your supported set is small that will hold true. You are still going to encounter the uname embedding issue at some point
<lrvick>I can fake that if needed
<lrvick>just as in some edge cases I must use libfaketime
<lrvick>but that is a rare exception now as most distros are pushing upstreams to include reproducibility features
<Foxboron>You would end up having to insert a fake sysfs or procfs
<Foxboron>It's not as easy as libfaketime
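The uname-embedding problem above can be sketched as a toy model (hypothetical; `build_artifact` is not a real tool, it just mimics a build system that bakes `uname -r` output into its product):

```python
import hashlib

def build_artifact(source: bytes, uname_release: str) -> str:
    """Toy model of a build that embeds the host kernel release string
    into the output, so the digest depends on the machine it ran on."""
    return hashlib.sha256(source + uname_release.encode()).hexdigest()

source = b"int main(void){ return 0; }"
on_host_a = build_artifact(source, "6.1.0-18-amd64")  # e.g. a Debian host
on_host_b = build_artifact(source, "6.7.2-arch1-1")   # e.g. an Arch host

# Identical source, different digests -- not reproducible across hosts,
# which is why such values have to be patched out or faked.
assert on_host_a != on_host_b
```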
<lrvick>I have not run into this issue yet with gcc, go, python, perl, zig, or node.
<Foxboron>They are the easy ones
<Foxboron>We've been working on all the base stuff for almost a decade now :)
<lrvick>Those are also the ones that matter to me. Everything needed to build a deterministic enclave os, or an airgapped distro
<lrvick>but if I have a package I really need that has a uname issue, I can patch it out
<Foxboron>Yes, which is why I say "as long as your support set is small"
<lrvick>stagex scope is really only meant to be compilers and toolchains so people have a baseline for producing reproducible releases of their own software
<lrvick>I am not trying to support gnome
<Foxboron>Generally the base stuff is not too hard to get reproducible. I've been playing with the thought of getting the Arch docker images 100% reproducible for fun with source builds.
<stikonas>well, live-bootstrap stuff (so toolchains and many standard GNU tools) doesn't do any of that uname embedding nonsense
<stikonas>so you can build it on any kernel with the same results
<Foxboron>I have no clue why build systems do that, but here we are
<lrvick>Yeah. Already my project has been built from 3 distros. Container isolation and deterministic seed process, timestamp locking etc, gets the job done
<lrvick>I didn't really have an alternative. No existing distro meets my threat model.
<lrvick>And it is much easier to start over than try to get a very large distro to bend to a threat model most people don't have
<Foxboron>You end up having to support the distro, and people underestimate that effort
<Foxboron>The multisig stuff is not well thought out. You should depend on a transparency log to push signatures to. It would make them distributed and tamper evident.
<Foxboron>See sigsum or sigstore
<lrvick>If you and I both build a container image, and both get the same digest, and we both sign that digest, how is that not tamper evident?
<Foxboron>lrvick: How do you get my signature?
<Foxboron>and how do I get yours?
<lrvick>I am using the official containers-policy.json spec which supports multisig natively, with hardware signing.
<lrvick>Everyone can just PR signatures to the repo, and they can sync to any CDN we want
<Foxboron>So your repo is the SPOF?
<Foxboron>How do I validate that nobody has tampered with the signature at some point in the history?
<lrvick>Anyone is free to merge signatures from each others repos as they like as well.
<lrvick>How could you tamper with a signature? It is cryptographic output from my private key
<lrvick>with my well published public key
<Foxboron>Why would I trust your key and not someone elses?
<Foxboron>How would I know the file I'm looking at hasn't been substituted?
<lrvick>Well, each maintainer publishes their public keys, ideally their long lived web of trust anchored public keys, and you have a snapshot of at least some of those locally you store over time.
<lrvick>Because of multisig, you don't trust any one signature, but you trust that all these well known keys have all signed the same binary that they compiled from source, and that they are not all colluding
<lrvick>Same way we know to trust a release of bitcoin core, because 12 people around the world with long lived well known keys all sign it
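The k-of-n trust rule being described can be sketched as a toy model (hypothetical names and helper; real verification would check actual PGP or cosign signatures against a `containers-policy.json`, here an "attestation" is just a key id claiming a digest):

```python
def meets_threshold(digest: str, attestations: dict[str, str],
                    trusted_keys: set[str], threshold: int) -> bool:
    """Accept an artifact only if at least `threshold` distinct trusted
    keys attested to the exact same digest. No single key is trusted on
    its own; the bet is that the signers are not all colluding."""
    agreeing = {key for key, claimed in attestations.items()
                if key in trusted_keys and claimed == digest}
    return len(agreeing) >= threshold

attestations = {"alice": "abc123", "bob": "abc123", "mallory": "abc123"}
# mallory's key is not in the trusted set, so only alice and bob count:
assert meets_threshold("abc123", attestations, {"alice", "bob"}, 2)
assert not meets_threshold("abc123", attestations, {"alice"}, 2)
```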
<Foxboron>This depends on the availability of the signature. If the signature gets pulled, you don't have any way of knowing.
<Foxboron>Or if the signature gets tampered with. IE you and I have two different views of the signature
<Foxboron>This is what transparency logs solve, as you can confirm you have one canonical version of the history
<lrvick>If you can tamper with a PGP signature, we have bigger problems
<Foxboron>You don't need to tamper with the PGP signature if you just rm the file
<lrvick>Okay. If the binary goes away you can't verify it either? linux distributions solve this with lots of mirrors
<Foxboron>Correct
<Foxboron>And if we published everything on a transparency log you would have a very good overview of whether the mirrors have actually pulled the binary or are hiding it from you
<Foxboron>See https://github.com/kpcyrd/pacman-bintrans
<lrvick>The transparency log in my case is a signed git repo that contains all code and all expected outputs from that code
<lrvick>and that repo should be mirrored as much as possible too
<Foxboron>Simon Josefsson has this for apt
<Foxboron> https://blog.josefsson.org/2023/04/15/sigstore-protects-apt-archives-apt-verify-apt-sigstore/
<Foxboron>lrvick: a git repo is a veryvery poor transparency log.
<Foxboron>(I also wrote this for my master thesis for apt originally, but it's a bad implementation)
<Foxboron> https://tlog.linderud.dev/ <- if you are curious on people using git repos for transparency logs. But for kernel.org git pushes
<lrvick>A signed git repo with signed reviews shows all the code that went in, and who signed off on it, and who pushed reproduction signatures. Where is the fundamental flaw there?
<lrvick>Trying to use simple easy to reason about tools people already have
<Foxboron>You would have to replay the git history to validate the log. It's an expensive operation
<Foxboron>While binary trees solve this veryvery effectively and gives you a proof consisting of 5 checksums on a 1 million entry tree
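The Merkle-tree argument can be sketched as follows: an inclusion proof only needs on the order of log2(n) sibling hashes, so proofs stay tiny even for huge logs (this is a minimal illustration, not any real transparency-log implementation):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaves):
    """Build a Merkle tree bottom-up; duplicate the last node on odd levels."""
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:
            cur = cur + [cur[-1]]
        levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels

def inclusion_proof(levels, index):
    """Collect the sibling hash at each level; that is the whole proof."""
    proof = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        proof.append(level[index ^ 1])
        index //= 2
    return proof

def verify(leaf, index, proof, root):
    """Recompute the root from one leaf plus its sibling hashes."""
    node = h(leaf)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

leaves = [f"entry-{i}".encode() for i in range(1024)]
levels = build_levels(leaves)
root = levels[-1][0]
proof = inclusion_proof(levels, 123)
assert verify(leaves[123], 123, proof, root)
assert len(proof) == 10  # log2(1024); a million-entry tree needs only ~20
```

Replaying a git history is linear in the number of commits, while this proof stays logarithmic, which is the efficiency point being made.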
<lrvick>Sure, I admit there are data structures that would be more efficient, but now you need everyone to sign the transparency log -and- the git repo, and getting people to even just sign the git repo is hard enough
<Foxboron>This depends on the transparency log and how you envision people to prove that rebuilds of the container/package is done.
<Foxboron>Generally though, my point is that you *really* want the published log of signatures to be tamper evident. And one single git repository is not going to be a good solution for that
<lrvick>So instead then you recommend one single transparency log hosted where?
<lrvick>A decentralized transparency log I could get behind
<Foxboron>A proper transparency log has monitors that validate and replay the logs, so they are decentralised.
<lrvick>But the sigstore process all looked very centralized to me. Non starter imo
<Foxboron>sigstore is not great because of the lack of a proper monitor ecosystem. But it is an ecosystem instead of doing something from scratch
<Foxboron>The binary transparency log from Mozilla piggybacked on existing certificate transparency logs, but people didn't like that
<Foxboron>Google is working on standardizing on simple file system logs with a standardized omniwitness to distribute and cosign log trees. I think there is something in that. But it's less mature
<Foxboron>This is something LVFS (fwupd.org) is looking at for firmware uploads
<lrvick>It is funny you are going down this path, because I am working on something similar as the end goal
<lrvick>only all the nodes replicating will be enclaves with remote attestations
<lrvick>and then there does not need to be the git repo trust
<lrvick>but I need to bootstrap with something that meets my threat model, even if crude
<lrvick>sigstore is centralized and discourages use of smartcards and web of trust anchored keys, so nonstarter for me
<lrvick>though I am using the "sigstore" containers spec
<Foxboron>Well, not really
<lrvick>if anyone wants to replicate those signatures to sigstore central repo too, I can't stop them
<Foxboron>People focus on the OIDC aspect of sigstore. But sigstore accepts anything. You can very well have keys on smartcards or web of trust anchored keys
<Foxboron>It doesn't really care
<lrvick>did you see my sign.sh script (just a prototype)
<Foxboron>I've been drinking, let me reread :p
<lrvick>I am actually doing pgp sigstore signatures, but just local without requiring them to be pushed anywhere
<lrvick>from there they can be mirrored anywhere, and thats fine
<Foxboron>mm, the issue is that the pgp support in sigstore is limited because of their library usage. So you can't do ed25519 as an example
<lrvick>I am creating signatures that meet my threat model, and are compatible with the spec. and my own verifier also compatible with the spec.
<lrvick>the official sigstore tooling not being compatible with their own spec is amusing then
<lrvick>The OIDC support does not interest me, but I will make signatures as close to compatible with what others are doing as I can
<lrvick>But everyone in my world on the project has long lived pgp keys on smartcards with web of trust, and it would be silly to not use that trust, just as almost every linux distro does
<Foxboron>Well, WoT is dead though
<Foxboron>Which is why each distro recreates their own WoT
<lrvick>Most of my team uses WoT (with thousands of signatures in our network which includes a lot of distro maintainers) but also we use keyoxide for self certification
<Foxboron>>in our network
<Foxboron>is the key here
<lrvick>I don't care that most devs don't have trusted keys. I only care that the people that are responsible for building trust in the distro do
<Foxboron>You can't validate that. I had 200 signatures at some point and was part of the strong set. I'm pretty sure you can't verify that any more
<Foxboron>(this is probably getting a bit offtopic for the channel though)
<lrvick>I am happy to take this offline with you since you clearly care about this topic and are coming at it from a different angle. I fully expected these arguments to come, and want to refine how I engage with them.
<lrvick>the #! channel is probably a better fit, or dms
<lrvick>though most of my team is in #!
<lrvick>I am optimizing for maximum security first, then as much compatibility as I can with existing solutions second.
<lrvick>but if I can have both, great
<Foxboron>#! doesn't seem to be a channel on libera :p but that is a weird channel name
<lrvick>oh sorry we don't bridge to librera anymore, though we do own that on libera
<lrvick>#! on irc.hashbang.sh
<Foxboron>ahh
<lrvick>or #!:matrix.org
<lrvick>(anyone else that cares about these topics please join in. I assume there is some overlap)
<lrvick>bootstrapping and being able to form trust that a sufficient number of trusted parties did it for you, are tightly related imo
<Foxboron>Well, and #reproduciblebuilds on oftc :)
<Foxboron>#reproducible-builds
<lrvick>yeah I am there too
<lrvick>I whine when I have repo issues
<lrvick>though most/all are resolved now
<Foxboron>ahh, right
<lrvick>Currently trying to get reproducible digests out of docker 25, which regressed >.>
<Foxboron>Regressions at different parts of the stack are why working reproducible-builds issues is not very fun :p
<Foxboron>But I'll bounce and sleep. Nice discussing with you
<lrvick>I collect puzzles. This is a very large one
<lrvick>sure. Would love to pick your brain more on this another time. hit me up whenever.
<Foxboron>Sure, no worries :)
<fossy>lrvick: i feel like i should point this out to you, even if you have already figured it out; the output of & process of live-bootstrap does use versions of packages known to include CVEs ;)
<lrvick>CVEs are fine that far deep in the stack, since I am a cross compile and 3 versions of GCC away by the time I build my actual packages
<lrvick>The only class of CVEs that would concern me in the live-bootstrap stack would be those that could somehow persist through a cross compile to a totally new clean filesystem
<lrvick>Something that could do that would have to be an actual and very intentionally designed trusting trust attack in early versions of GCC that I would -hope- someone would have spotted by now
<lrvick>But there is just as much risk of that in a modern version of GCC. More even, since it is a lot more code today
<lrvick>If anyone feels there is a hole in this reasoning though, please do poke in it.
<lrvick>I don't carry anything from live-bootstrap forward past "stage2" in my stack
<lrvick>Even my stage3 should be able to be left to bit rot until it can no longer build modern versions of GCC that I actually use to build my final packages
<fossy>that checks out to me
<fossy>that's my opinion too, i don't see any reason why CVEs should affect it. just wanted to make sure you knew though :)
<lrvick>It is a question that will come up. I should add a FAQ
<fossy>been playing for the last two days with what is actually possible with live-bootstrap at the moment.
<fossy>answer is "actually, a lot if you go via lfs"
<fossy>with an additional 12 small-medium sized packages, one can run jhalfs (automated Linux from Scratch & Beyond LFS)
<fossy>and now bootstrap-prefix.sh works in that LFS environment without modification
<fossy>gonna try and make a stage3 now using catalyst
<fossy>if that works first bootstrapped gentoo stage3 ^-^
<lrvick>Nice. Given the path I already went down in stagex, I expect that should be no problem
<lrvick>I have a couple buildroot projects I am going to want to bootstrap soon too.
<lrvick>In other news: docker run -it stagex/stage0
<lrvick>and you can reproduce the container and digest from https://git.distrust.co/public/stagex
<lrvick>rest should follow soon. Stage1 is next (live-bootstrap)
<oriansj>fossy: nice
<oriansj>lrvick: yeah, trust is a never ending game of trying to improve and knowing that failure is possible and knowing the best move is to make failure be as obvious and open as possible. I am glad that you have been making progress on the problem you wanted to solve.
<muurkha>fossy: that's exciting!
<muurkha>lrvick: that's also exciting! Docker is a huge increment of convenience
<lrvick>Most production software today is built with containers. Normally "from alpine" or "from go" or "from python" (which are based on alpine) etc. Billions of dollars in value resting on the shoulders of a few people who push those non-reproducible containers, and who would never want that responsibility if they understood the risk.
<lrvick>This stage0s/live-bootstrap anchored set at least takes the target off any single person, as this set is a drop in replacement for the alpine defaults.
<muurkha>wow, I didn't realize it was that ambitious!
<muurkha>I thought you were just using Docker to make it more convenient to get the existing stage0 flow working
<muurkha>that's great news!
<lrvick>No, I needed a deterministic distro with no trust in any single person, and that rabbit hole took me all the way -backwards- until I hit stage0
<oriansj>and builder-hex0
<oriansj>which takes the kernel out of the trusted root too
<lrvick>I won't be able to do that step in a container, but I can hardcode the expected hash, and verify my 3 distro builds match it.
<lrvick>and then in turn if you guys are publishing the same hash from live-booted kernelless setups
<lrvick>then it all tracks
<oriansj>true, one would need a full vm or bare metal but the checksums should always be identical regardless of the host with M2-Planet+M2libc
<lrvick>Ideally a few people that are just building seeds can sign those, and I can import those signatures as yet another trust point in my stage0 dockerfile
<muurkha>oriansj: except when there are bugs ;)
<oriansj>muurkha: fair enough, but I actively will fix those when I learn about them.
<lrvick>Maybe one of my "test builds" in my stage0 setup can be running builds in qemu. As many tests as I can reasonably do in place, why not.
<oriansj>lrvick: well the reason we broke out bootstrap-seeds into a separate git repo is to allow anyone to change out the root binaries easily but if they conform to the spec, then the resulting final binaries will always be identical
<lrvick>makes it that much easier for an auditor to say "this is overkill". good
<lrvick>Yeah and that is awesome. Ideally I later break out stage0 into stage0 and seed containers, and allow swapping in alternative seed implementations
<lrvick>(once they exist)
<lrvick>That's a big reason I chose OCI as well. No single software stack
<lrvick>I am adding support for podman next, then buildah, kaniko etc
<oriansj>nice
<lrvick>so people could use different seed implementations, and different OCI implementations, and still get the same results from any combination
<oriansj>sounds like you are heading in the direction where declarative package composition will probably end up being helpful
<lrvick>Dockerfiles are temporary. I will likely end up generating them from something templated
<lrvick>gotta get my baseline first before I get fancy though
<oriansj>fair enough
<lrvick>OCI builders can take bash scripts as input, json, all sorts of options
<lrvick>just needs to follow the spec
<lrvick>buildroot generates OCI archives with just a bash script
<oriansj>which should map relatively cleanly from a list of dependencies and basic build information.
<lrvick>yeah atm dependency management is just make. I would love something better, but it has to run with tools most people already have installed
<lrvick>python might be an option.
<oriansj>lrvick: have you played with guix or nix yet?
<lrvick>Yes, but those toolchains are very large and hard to audit or reason about, so I avoided them.
<lrvick>Like I would have to compile all those tools on every endpoint, and then how do I compile those tools?
<lrvick>more chicken/egg
<oriansj>fair perspective, I found I needed to fight the default heavy approach it usually opted for
<lrvick>I did consider maybe I could use dockerfiles and make only for bootstrap, then from there I can build any tools that digest and dynamically generate all the other OCI build inputs from some simple yaml files with variables or something.
<lrvick>but that early all I have access to is gcc and perl, and I would rather write make than either c or perl
<lrvick>though I suppose I could compile some basic go with gcc...
<lrvick>Anyway. Maybe I get away with a python script. Big refactor that results in no hash changes but reduces the total LOC will be fine
<oriansj>well from live-bootstrap there is a path to rust if one wanted (via mrustc and gcc)
<fossy>nix always has me super lost when i use it. for basic things it's incredibly nice and intuitive, but then i'll try to do something non-standard and run into a block. i'm also not a huge fan of the disparity between the community and the technical specifications; particularly, that flakes are still "experimental" but the community encourages people to use them in their main form, and it is
<fossy>regularly stated that "because they are used so widespread we shouldn't introduce breaking changes"
<fossy>"On the other hand side, a lot of necessary changes can’t be done, as flake became a quasi standard with too much adoption. We are in a situation where breaking changes are a necessity, but impossible, as consumers treat flakes production ready"
<lrvick>I am the one that tried to get nix to adopt signing, and was loudly rejected. Signing would slow down drive by contributors... they saw that as a bad thing.
<fossy>i saw this quote, that had me -so- lost, something has gone wrong process-wise for that to happen
<lrvick>I kind of gave up on nix after that
<oriansj>fossy: I have always felt that we needed an explicit guix with lessons learned from gentoo
<lrvick>guix is much better than nix in so many ways, most notably that they at least single-sign things.
<lrvick>but scheme is even worse to write than dockerfiles
<oriansj>where adding something new can be as easy as just reading how you already add existing things and just copy with a few tweaks.
<lrvick>I could just write bash scripts and inject them into a universal template dockerfile to do that..
<lrvick>need to think on it. has to be something with the tools I already have in stage3, that does not involve me trying to write c
<lrvick>I am convinced humans should no longer write c
<fossy>i've been using Void Linux for a long time because the build system is incredibly easy to understand and use, even though it is flawed, although i think I will be migrating to gentoo now that they have official binary repositories, because of a stronger governance system
<oriansj>lrvick: well python and a great many other languages are not that far from gcc
<fossy>& easier to make changes, and ebuilds are comprehensible enough, although not as good as alpine or void
<muurkha>I like writing C, sorry
<lrvick>oriansj: Yeah. I -could- add a bootstrap python to my stage3, then I could get a lot done with just the standard library. I don't want any deps people have to think about.
<lrvick>I will only make changes to the setup if it means less LOC someone has to read in total.
<muurkha>writing C is rarely a practical thing to do anymore
<lrvick>I want to optimize for auditability over developer friendliness, but if I can have both, great
<muurkha>speaking of Python, you know what would be super useful? a self-contained Python 2
<oriansj>muurkha: well C has one social advantage over scheme: the language is largely fixed and people can read and understand each other's code. The same can't be said about scheme when reader macros are introduced.
<muurkha>because for example Debian has dropped its Python 2 packages, so on this new laptop I have to port my old Python programs to Python 3 every time I want to run one
<lrvick>Yeah... No magic. I like rust but to audit anything I end up unwrapping all macros into real code
<muurkha>Debian's PyPy source package actually includes a complete copy of CPython 2 now because they need it to bootstrap PyPy
<lrvick>you could probably build a static python2 binary...
<muurkha>yes, but I'd like to be able to *keep* building it
<muurkha>and I bet I'm not the only one
<lrvick>to be able to keep building it make a Dockerfile that spits out a python2 static binary
<lrvick>add it to tools in stagex ^_^
<muurkha>oriansj: C has a lot of advantages over Scheme, but I don't think I would have ever mentioned a more statically tractable syntax among them. Scheme doesn't even *have* reader macros in any of the standard versions up to R⁵RS, does it? did they get added in R⁶RS?
<muurkha>as of https://stackoverflow.com/questions/19994651/read-macros-in-scheme apparently Racket and Chicken had two incompatible reader macro systems, and SRFI-10 has another one
<oriansj>don't know when exactly they were added, but I know I have seen them in scheme code for years now.
<oriansj>but honestly; I don't like macros in languages higher than assembly.
<oriansj>like just add a CONSTANT keyword and write proper functions if you need to do something;
<muurkha> http://scheme-reports.org/mail/scheme-reports/msg00826.html strongly implies that read macros didn't exist in R⁶RS, and they surely aren't in R⁷RS-small, and R⁷RS-large hasn't been finished and probably never will be
<muurkha>so I think your implication that Scheme has read macros is mistaken; Scheme does not and never will have read macros
<muurkha>(i.e. macros that run in the reader)
<fossy>lrvick: when you say auditability, what is that the auditability of? the entire source code of the whole system?
<oriansj>yet wisp is a thing that runs on schemes
<oriansj> https://www.draketo.de/software/wisp
<muurkha>it appears that wisp runs on Guile and (somewhat) Racket, not Schemes in general
<oriansj>completely fair point
<matrix_bridge><Lance R. Vick> fossy: Yeah. Like right now my entire repo for the whole distro, is under 5k lines
<matrix_bridge><Lance R. Vick> so it is easy for someone to verify I didn't slip anything nasty in there
<matrix_bridge><Lance R. Vick> the upstream author may have, but it is easy for someone to assert I did not in a short time
<matrix_bridge><Lance R. Vick> As soon as I introduce a fancy tool that costs thousands of lines to make the developer experience better, the auditor now has to review those extra thousands of lines, and any changes to them, forever
<muurkha>yeah, seems reasonable
<matrix_bridge><Lance R. Vick> anything for which multiple implementations exist of that get the same results, like docker|kaniko|podman|buildah
<matrix_bridge><Lance R. Vick> then I don't have to audit those individually, so long as they all agree
<matrix_bridge><Lance R. Vick> I trust that they don't trust each other
<oriansj>well what if one leverages a very standard library (like sqlite) and shoves most of the complexity into database tables. Then they need only see: build tool@version and check a few hundred lines
<matrix_bridge><Lance R. Vick> bringing in libraries is something that must be constantly audited, but if I can add a few functions that make the repo more DRY and reduce the total LOC that need to be in it, great
<muurkha>the advantages I'd cite for C over Scheme are that its object types and object lifetimes are statically known, which can help with comprehension; that it's much easier to compile to efficient code; and that it's much easier to call as a library. there are also a lot of disadvantages
<matrix_bridge><Lance R. Vick> as a security engineer, my default is to remove more lines of code than I add.
<oriansj>as a security engineer for the State of Michigan (and technical lead of the CMS audit) I opt for ease of comprehension over any other metric.
<oriansj>a refactor that adds a dozen lines can make things much easier to reason about
<matrix_bridge><Lance R. Vick> 100% in favor of that
<matrix_bridge><Lance R. Vick> but if those dozen lines are adding 12 libraries, those are now in scope for auditing
<matrix_bridge><Lance R. Vick> that is the balance I have to strike
<matrix_bridge><Lance R. Vick> but I am sure the repo can be made much more DRY with a few template functions on the standard library of python or something.
<oriansj>well one doesn't need multiple versions of database libraries; a single standard version for each database backend usually works out nicely
<matrix_bridge><Lance R. Vick> Dockerfiles may be verbose to write, but they are pretty easy to comprehend.
<muurkha>yeah, I think in particular that adding static typing information is usually a good tradeoff which improves comprehensibility by more than it increases volume
<matrix_bridge><Lance R. Vick> oriansj: to have every major language covered in stagex, I only need 30 packages in total, and the dependency chain is pretty light. What are you suggesting sqlite would buy me here?
<matrix_bridge><Lance R. Vick> maybe I missed something
<matrix_bridge><Lance R. Vick> the hardest thing to comprehend in the repo, IMO, is the main makefiles. Make is awful to reason about dependency tracking.
<matrix_bridge><Lance R. Vick> I would like that to mostly go away first
<matrix_bridge><Lance R. Vick> I do have "make graph" which generates a flowchart of the deps, which helps, but yeah
<matrix_bridge><Lance R. Vick> Also as far as the dockerfiles themselves: https://git.distrust.co/public/stagex/src/branch/main/src/core/sed/Dockerfile
<matrix_bridge>I write dockerfiles every day so to me that is easy to read, but with some templating it could for sure be a lot leaner. like the wget/sha256sum could just be a bash function I inherit.
<matrix_bridge><Lance R. Vick> Dockerfiles are also getting HEREDOC syntax in a few months so all the \ line terminations will go away
<oriansj>I am expressing a rather odd idea; builds should just be information trackable in a database and thus a trivial build tool could exist that just walks the dependencies in the database back to build anything.
<matrix_bridge><Lance R. Vick> I think I got hung up on the sqlite bit, but in spirit I don't disagree
<matrix_bridge><Lance R. Vick> the "database" in my mind would be single yaml file or something, read only
<oriansj>fair enough, sqlite is just a nice stand in for a simple database library in a pinch
<matrix_bridge><Lance R. Vick> then I define what depends on what, what hashes, and what versions all there
<oriansj>indeed
<matrix_bridge><Lance R. Vick> then a few functions could parse that, and spit out a build order, and shove values into some dockerfile templates
<muurkha>sqlite isn't really that simple though
<matrix_bridge><Lance R. Vick> then in general no one will need to touch the dockerfiles
<matrix_bridge><Lance R. Vick> I could just have one global template with a bunch of variables the python script shoves into it
<matrix_bridge><Lance R. Vick> then someone that wants to bump versions of a few things opens one file, updates some hashes, commits, and runs "make"
<muurkha>cloc reports that sqlite3-3.40.1 as shipped by Debian contains 698254 source lines of code
<matrix_bridge><Lance R. Vick> might be a middle ground with only a small amount of code
<matrix_bridge><Lance R. Vick> Yeah, -actually- bringing in sqlite for figuring out dependencies of < 100 packages seems like a lot. Some dumb loops over a textfile could get the job done.
<matrix_bridge><Lance R. Vick> now, if I had 10k packages... that is a very different story
<matrix_bridge><Lance R. Vick> it is a good thought exercise to say "what common tools would make the best developer experience, and can we provide that same experience with fewer lines of code but worse (but acceptable) performance"
<oriansj>entirely fair points
<matrix_bridge><Lance R. Vick> https://stackoverflow.com/questions/11557241/python-sorting-a-dependency-list
<matrix_bridge><Lance R. Vick> I was thinking something like this. Just a topological sort using the standard library
<matrix_bridge><Lance R. Vick> then the "database" is just a dumb text file
<matrix_bridge><Lance R. Vick> and in doing so I can probably delete all the dockerfiles and cut the total code in the repo down, while also improving comprehension
<matrix_bridge><Lance R. Vick> maybe
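The stdlib topological sort being discussed can be sketched with `graphlib` (the dependency map here is hypothetical, standing in for the dumb text-file "database"):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical package -> dependencies mapping, as it might look parsed
# out of a flat text/YAML file (each package lists what must build first):
deps = {
    "gcc":      {"binutils", "musl"},
    "binutils": {"stage3"},
    "musl":     {"stage3"},
    "stage3":   set(),
}

# static_order() yields every node after all of its dependencies:
build_order = list(TopologicalSorter(deps).static_order())
assert build_order[0] == "stage3"
assert build_order[-1] == "gcc"
```

`TopologicalSorter` also raises `graphlib.CycleError` on circular dependencies, which is a useful sanity check on the "database" for free.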
<muurkha>Dockerfile familiarity may help with comprehension
<matrix_bridge><Lance R. Vick> muurkha: but it sucks to write. So my thinking was, still have dockerfiles as the actual thing that is committed to the repo
<matrix_bridge><Lance R. Vick> but have tools used only by developers, to generate those files
<matrix_bridge><Lance R. Vick> and update them as needed
<matrix_bridge><Lance R. Vick> like a function to bump all hashes/versions across the repo to latest would be nice. If the hashes and versions have a single file source of truth that could also be nice. I dunno. Will do some experiments
<matrix_bridge><Lance R. Vick> I would like to believe my Dockerfiles are at least easier to read than the official upstream counterparts they are intended to replace... lol: https://github.com/nodejs/docker-node/blob/5ce4dae24d8af4283baa45226b4de1827f128de3/21/alpine3.18/Dockerfile
<matrix_bridge><Lance R. Vick> at that point just write a bash script and ADD it
<oriansj>that is a very low bar to beat
<oriansj>and probably generated by a tool without a human looking at it
<matrix_bridge><Lance R. Vick> Likely. Which is an argument for being very careful about getting too smart with generation. Have to make sure the final output files are still easy for humans to comprehend, to your point.
<oriansj>so I am quite sure you can do great things in improving that situation ^_^
<matrix_bridge><Lance R. Vick> like that whole file is full of signature checking, and yet, "from alpine" which does not use signatures at all and is not reproducible. Most of the internet is running on these official containers. It has been this way for 10 years
<matrix_bridge><Lance R. Vick> I thought surely someone else will do something about it eventually
<matrix_bridge><Lance R. Vick> no one did, and adoption only grew
<matrix_bridge><Lance R. Vick> at some point you convince yourself the situation will never get better if you don't do it yourself
<matrix_bridge><Lance R. Vick> as with bootstrapping
<matrix_bridge><Lance R. Vick> heh
<matrix_bridge><Lance R. Vick> I specialize in supply chain attacks and have pulled them off in the wild many times in audits... easily. But I have not had good alternatives to recommend in the reports I give, so they are kind of pointless.
<matrix_bridge><Lance R. Vick> now I can at least say "this exists, but if you can find something even better, great. This just checks the boxes"
<matrix_bridge><Lance R. Vick> with most things I do, someone comes behind me and does it better in a year. Great. Then I can move on to something else! 😄
<matrix_bridge><Lance R. Vick> make things modular so others can (like the seeds)
<matrix_bridge><Lance R. Vick> I aspire to understand the seeds well enough to write my own implementation some day
<matrix_bridge><Lance R. Vick> oh, it is 5 am again. I should sleep.
<stikonas>fossy: probably some of the extra packages needed to bootstrap gentoo stage should go into live-bootstrap anyway. We do need to get ncurses and readline working...
<oriansj>well fixing a bad situation is usually harder than pointing out that there is a problem.
<oriansj>and why it took years to get the bootstrap situation to where we are today.
<oriansj>and why I am glad that you are working on improving the situation ^_^
<oriansj>muurkha: what I got with sloccount is: ansic=139034,yacc=1448
<muurkha>sloccount is probably more accurate! cloc was counting HTML. also though it counted 273k lines of C rather than 139k, which is almost exactly 2×; I wonder if it's counting a generated file or something?
<muurkha>or 283kloc if you include the headers
<muurkha>also it counts 21kloc of shell, 20kloc of Tcl, 14kloc of m4, and 9kloc of JS, which are all probably legit (but mostly don't end up in the actual compiled library)
<muurkha>lrvick: yes, agreed that dockerfiles suck to write
<muurkha>especially if you take the time to make them easier to read!
<Googulator>lrvick: as I understand it, your 64-bit bootstrap depends on the underlying kernel (behind Docker) being already 64-bit
<Googulator>is that right?
<muurkha>Docker doesn't really hide the kernel
<muurkha>I mean processes inside the Docker container don't have their own kernel
<muurkha>they use the same kernel as the rest of the system
<Googulator>I know that
<muurkha>sorry
<muurkha>I think I didn't understand what you were asking
<Googulator>my point is, is there any point in lrvick's bootstrap where we can plug in a build step to compile & kexec to a 64-bit kernel from a 32-bit one on bare metal
<Googulator>or is it dependent on the running kernel being able to natively execute 64-bit binaries
<Googulator>if I'm not wrong, at some point in the chain, an x86-32 -> x86-64 cross compiler needs to be built
<Googulator>what I'm not sure if that is already sufficient to then cross-compile a 64-bit Linux kernel (with 32-bit userspace support)
<stikonas>Googulator: well, that whole docker thing is closer to bwrap mode than qemu mode
<stikonas>it's just different frontend to linux namespaces
<Googulator>right
<Googulator>but if the cross-GCC is sufficient for building a 64-bit kernel, then it's relatively easy to add a kernel build & kexec immediately after cross-GCC is ready
<Googulator>OTOH if e.g. some other tool is needed, which is currently built and executed natively in 64-bit mode - that's an obstacle
<Googulator>e.g. can our existing binutils (which is natively 32-bit) make a working 64-bit bzImage?
<Googulator>(from object code compiled by cross-GCC)
<stikonas>yeah, cross-compiler is always sufficient for building kernel
<Googulator>no need for other things like 64-bit libc?
<Googulator>or a "cross-binutils"
<Googulator>(never cross-compiled a kernel "by hand", so no idea exactly what's needed)
<stikonas>Googulator: libc is not available for kernel builds anyway
<stikonas>you only have pure C
<stikonas>(since it's libc's job to talk to kernel and do syscalls)
<stikonas>as for binutils, yes, you do need them
<stikonas>but you can also build cross-binutils
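The step being discussed, as a sketch: a kernel build needs only a cross compiler and cross binutils, no target libc. The `x86_64-linux-musl-` toolchain prefix below is an assumption for illustration, not something live-bootstrap actually ships:

```shell
# From a 32-bit userspace, cross-build a 64-bit kernel using only the
# cross toolchain (the kernel is freestanding C; no 64-bit libc needed).
make ARCH=x86_64 CROSS_COMPILE=x86_64-linux-musl- defconfig
make ARCH=x86_64 CROSS_COMPILE=x86_64-linux-musl- bzImage

# Then switch into the new kernel without firmware involvement:
kexec -l arch/x86/boot/bzImage --command-line="console=ttyS0"
kexec -e
```

This is a command fragment to run inside a kernel source tree as root, not a standalone script; the 32-bit userspace keeps running only if the 64-bit kernel is configured with 32-bit compat support (CONFIG_IA32_EMULATION).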
<stikonas>Googulator: cross-kernel build is what we'll also have to do with UEFI bootstrap (at least for now)
<stikonas>since Fiwix is x86 only
<stikonas>one needs to build tcc with x86 backend that runs on x86_64 system
<stikonas>and binutils is not needed there because tcc emits binary code rather than assembly
<rickmasters>stikonas: I was thinking the path would be to do a 64-bit Fiwix (even though that's super difficult)
<stikonas>yeah, longer term true
<stikonas>but short term it's much easier to switch to x86...
<stikonas>once we are ready to ditch UEFI boot services
<stikonas>we'll see how mes x86_64 port goes...
<stikonas>if it doesn't go well, maybe we need to figure out graphics framebuffer and ditch UEFI earlier
<rickmasters>stikonas: there could be a lot of effort into cross-compiling in the short term though
<stikonas>shouldn't be fairly easy
<stikonas>just need to build tcc with -DTCC_TARGET_X86_64
<rickmasters>should ?
<stikonas>well, I used x86_64 -> riscv64 crosscompiler of tcc quite a bit
<rickmasters>you said 'shouldn't' so just asking if that was a typo
<stikonas>oh yeah, that's a typo
<stikonas>probably wanted to write shouldn't be too hard...
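A hedged sketch of that tcc build, assuming a tinycc source checkout where the one-source build works (exact defines vary by tcc version, and the usual `-DCONFIG_TCC_*` path defines are omitted here):

```shell
# Select tcc's x86_64 backend at build time: the resulting tcc runs on the
# host (e.g. 32-bit x86) but emits x86_64 binary code directly, so no
# separate binutils is needed (tcc writes machine code, not assembly).
gcc -o x86_64-tcc tcc.c -DONE_SOURCE=1 -DTCC_TARGET_X86_64 -ldl -lm -lpthread
```

This is a fragment to run inside the tcc source directory, not a standalone script.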
<rickmasters>stikonas: I think I was confused. You're right that it's just cross-compiling Fiwix, continuing a 32-bit bootstrap prior to Linux,
<rickmasters>stikonas: then cross-compiling Linux and any dependencies to 64-bit
<stikonas>yeah...
<stikonas>or for the first step just let the x86 bootstrap finish
<stikonas>and then try to build x86_64 kernel
<rickmasters>yeah, that could be staged now optionally at end of x86 ARCH build
<stikonas>and I still need to fix some bug in posix-runner...
<rickmasters>I could probably help with that at some point. First I need to fix a bug that cosinusoidally reported in builder-hex0's execve.
<rickmasters>Then that whole mechanism will probably be fresh in my mind. I'd like to document it better as well.
<stikonas>oh yeah, execve is another thing I need to finish
<stikonas>right now fork / execve works but probably not pure execve
<stikonas>(and that bug where mes starts failing after about 30 invocations by kaem script)
<rickmasters>If posix-runner is anything like builder-hex0's fork/execve/waitpid pattern then maybe I can help but I haven't looked at your code recently
<stikonas>yeah, it is that pattern
<stikonas>anyway, no pressure, finish your other project first :)
<rickmasters>Yeah, reported bug takes priority.
<Mikaku>I hope by the time you finish x86 live-bootstrap I'll be able to start adding 64bit support to Fiwix
<Mikaku>do you have an estimated time when you plan to finish x86 bootstrap?
<stikonas>well it is finished in some sense
<rickmasters>heh, it's never truly "finished" but I'm thinking it "works" right now.
<stikonas>now it's just some improvements
<Mikaku>I see, well, I just wanted to know if I can have enough time to finish a feature I am working on now before you guys finish the x86 build completely
<stikonas>well, for 64-bit bootstrap we need to sort out userspace first
<stikonas>and kernel stuff can temporarily go via cross-build route
<Mikaku>aha
<rickmasters>Mikaku: Don't feel like you have a deadline. I'd finish your feature. Don't worry about it.
<stikonas>exactly...
<Mikaku>so the live-bootstrap project is now in the 1.0 version? I check issue #292 from time to time but it doesn't seem updated lately
<stikonas>it's not 1.0 yet..
<Mikaku>rickmasters: ok, thanks
<rickmasters>Mikaku: There is a lot to be done before a Fiwix 64-bit could even be utilized.
<stikonas>I think fossy wants to include all Googulator's baremetal fixes
<Mikaku>rickmasters: perfect, that's what I wanted to know :-)
<Mikaku>stikonas: ok, thanks
<stikonas>and also we'll see how well posix-runner works
<stikonas>as it's easier to extend (since it's written in M2 C) than builder-hex0
<oriansj>and should be relatively easy to port to new architectures (which have UEFI)
<stikonas>yeah, I think so
<stikonas>one needs to port things like enabling syscalls (if necessary), syscall numbers, and jumping to / returning from syscalls
<oriansj>would it be complete and utter overkill in an explicit package manager to expect one to specify what programming language the source code files are?
<oriansj>or should it be optional metadata?
<rickmasters>metadata
<oriansj>ok
<oriansj>probably could make the license(s) on individual files metadata too and make the missing info a warning but not an error
<Googulator>meanwhile, updated my remaining PRs
<Googulator>(plus a new one to pull in builder-hex0 changes)
<Googulator>This includes the last changes required for quality of life on bare metal; the only thing I'm thinking of adding for bare metal is in-band logging (i.e. a logfile in the bootstrap FS itself, that can be viewed via the alternate virtual consoles, since on bare metal, you can't just pipe the VGA or HDMI output into a file like you can with the
<Googulator>virtual serial output or straight stdio in other modes)
<janneke>hmm, latest M2-Planet produces instruction `LOADS8' for arm, but there's no macro for LOADS8
<janneke>on mes, do:
<janneke>M2-Planet --architecture armv7l -f include/mes/lib-mini.h -f lib/string/strlen.c
<janneke>=>
<janneke>[..] LOADS8 R0 LOAD R0 HALF_MEMORY [..]
<stikonas>it is there
<stikonas>armv7l_defs.M1:DEFINE LOADS8 D0
<stikonas>I guess you need to sync defs from M2libc
<janneke>stikonas: ah, my bad
<janneke>ah, those puny submodules got me again
<janneke>yep, works beautifully
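For anyone else bitten by the same stale-submodule problem, the usual fix is plain git, nothing project-specific; a fragment to run inside the checkout:

```shell
# Check out the submodule commits the superproject expects (e.g. M2libc):
git submodule update --init --recursive
```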
<oriansj>janneke: yeah, unfortunately I haven't found a better alternative to git submodules yet
<oriansj>maybe if I upgrade to a format that enables true libraries perhaps?
<oriansj>but at that point, I would be building a binutils replacement that could be built by M2-Planet
<janneke>yeah, for now it's a nice hack
<stikonas>oriansj: we can have non-true libraries with M2libc
<stikonas>janneke did that for mescc after all
<stikonas>well with meslibc...
<stikonas>but we just need to compile all M1 files into libc.s and hex2 files into libc.a
<oriansj>but that wouldn't remove the M2libc submodule problem
<oriansj>in M2-Planet
<fossy>stikonas: i'm actually finding the main problem with a lot of things is static libraries...
<stikonas>oh yes, I guess at higher distro level everybody expects dynamic libraries
<stikonas>especially all that python stuff
<stikonas>our python is not very usable for normal pythonic workloads
<stikonas>well, the static build...
<fossy>yeah
<fossy>well, it can load dynamic modules (.so in lib-dynload) fine
<fossy>but it does have a few limitations
<oriansj>well basically every program seems to want a little custom perfect work in which to be built and run
<oriansj>^work^world^
<oriansj>and we can only hide from that fact so much
<stikonas>well, it's mostly that people don't test static builds that much
<stikonas>and if it's not tested, assume that stuff is kind of broken
<oriansj>well people don't think about static vs dynamic builds, they think: I want to use that functionality which library x has, which hoops do I need to jump through to get it with as little work as possible.
<oriansj>library maintainer says: I prefer $(static or dynamic) libraries for $(self-justification reason(s)) and then everyone just deals with that.
<fossy>don't know that i've ever seen a library that works with static but NOT dynamic
<fossy>most distros use dynamic linking so that is usually much more well supported
<fossy>and the issues usually come with build systems not supporting statically linked binaries rather than building static libraries
<stikonas>fossy: meslibc :)
<fossy>lol
<fossy>intended-for-general-use library