IRC channel logs

2024-12-20.log

<fossy>hey all - i've been working on a simple tool over the last 2ish weeks that uses a number of heuristics to find pre-generated code within a tarball or codebase
<fossy> https://github.com/fosslinux/problematic-source
<fossy>it's at a point where it's catching useful things
<fossy>most of them are just heuristics that i've been using personally in live-bootstrap
<fossy>my next goal is to run it on all live-bootstrap's sources
<daddy>does pre-generated include things like `./configure`, transpiled sources, etc. or is it targeting binary blobs?
<fossy>all of the above
<daddy>neat.
<fossy>transpiled sources it probably isn't very good at yet
<stikonas>fossy: oh nice
<stikonas>oh and the checks are basically plugins in https://github.com/fosslinux/problematic-source/tree/master/checks
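A minimal sketch of what one plugin-style heuristic could look like, for readers unfamiliar with the idea. This is not the actual problematic-source API: the function names `check_generated_header` and `scan_tree` and the marker strings are assumptions chosen for illustration. The check only looks for "generated by" banners near the top of each file.

```python
import re
from pathlib import Path

# Hypothetical marker patterns; not taken from problematic-source itself.
GENERATED_MARKERS = [
    re.compile(rb"Generated by GNU Autoconf"),
    re.compile(rb"generated by automake", re.IGNORECASE),
    re.compile(rb"DO NOT EDIT", re.IGNORECASE),
]


def check_generated_header(path, head_bytes=4096):
    """Return the marker patterns found in the first head_bytes of a file."""
    with open(path, "rb") as f:
        head = f.read(head_bytes)
    return [m.pattern.decode() for m in GENERATED_MARKERS if m.search(head)]


def scan_tree(root):
    """Map each flagged file (path relative to root) to the markers it matched."""
    findings = {}
    for p in sorted(Path(root).rglob("*")):
        if p.is_file():
            hits = check_generated_header(p)
            if hits:
                findings[str(p.relative_to(root))] = hits
    return findings
```

Calling `scan_tree("some-unpacked-tarball")` returns a dict of suspicious files. A banner check like this would catch a typical `./configure` but says nothing about binary blobs or transpiled sources, which need other heuristics.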
<lanodan>Oh neat, also seems like it's missing something like a pyproject.toml to be installable/packageable
<stikonas>isn't it setup.py?
<stikonas>and then you can upload it to PyPI
<daddy>you can also use pyproject.toml which i'd recommend as it's less arbitrary code execution.
<daddy>hm, i think you might need setup.py anyways
<fossy>yeah stikonas that was quite intentional to have the checks as a "plugin" type format
<fossy>yeah, i haven't done any of the python packaging stuff
<stikonas>I guess the reasoning was that if you don't trust arbitrary code in setup.py, why would you trust the rest of the code in the pip package by the same author
<fossy>isn't setup.py deprecated now?
<fossy>i need to read up on the new packaging stuff
<daddy>running setup.py directly is deprecated.
<stikonas>I'm still using it for one of the pip things I maintain
<fossy>ah, okay
<stikonas>(not bootstrapping related and not generally useful as it's for specific device but you can see an example of packaging https://gitlab.com/neohubapi/neohubapi/)
<stikonas>hmm, I actually still run setup.py there according to releasing readme
<stikonas>hmm
<stikonas>python3 setup.py bdist_wheel sdist
<stikonas>am I doing it wrong then?
<stikonas>by the way, is anybody familiar here with gnunet?
<stikonas>I was looking at DHT implementations in C, that seems to be one of them https://en.wikipedia.org/wiki/Kademlia#Implementations
<stikonas>but I guess more research is indeed needed if we want to create a distributed mirror for live-bootstrap sources
<fossy>stikonas: my (very basic) design idea was that the user would be asked for 1 or more mirrors, and then that mirror could introduce other mirrors
<fossy>but that comes with its own set of problems, of course
<stikonas>yeah, I know...
<stikonas>well, I don't have anything specific in mind
<stikonas>I just thought this sounds like an already solved problem
<stikonas>DHT does exactly that, you start with one bootstrap node
<stikonas>and then it finds more nodes
<fossy>yeah, you're totally right
<fossy>i hadn't really thought of that
<stikonas>and can do O(log N) search hash->file
<stikonas>and there are of course things to think about: piggyback on existing network or create own...
<stikonas>former probably would let us reuse some software
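The "start with one bootstrap node and find more" behaviour can be sketched with the XOR distance metric that Kademlia-style DHTs use. This is a simplified illustration, not GNUnet's or any real implementation's code: the `neighbours` dict stands in for network requests, and `iterative_lookup` is a hypothetical name. A real lookup also runs queries in parallel and handles timeouts.

```python
def xor_distance(a, b):
    """Kademlia measures closeness between two IDs as their bitwise XOR."""
    return a ^ b


def closest_nodes(target, known, k=2):
    """Return the k known node IDs closest to target under XOR distance."""
    return sorted(known, key=lambda n: xor_distance(n, target))[:k]


def iterative_lookup(target, start, neighbours, k=2):
    """Walk from a single bootstrap node towards the nodes closest to target.

    neighbours maps a node ID to the node IDs it knows about; asking a node
    for its contacts is simulated by a dict lookup (an assumption made for
    this sketch).
    """
    shortlist = closest_nodes(target, [start] + neighbours.get(start, []), k)
    queried = {start}
    while True:
        pending = [n for n in shortlist if n not in queried]
        if not pending:
            return shortlist
        for n in pending:
            queried.add(n)
            candidates = set(shortlist) | set(neighbours.get(n, []))
            shortlist = closest_nodes(target, list(candidates), k)
```

Each hop moves to nodes that share more leading bits with the target, which is where the O(log N) number of hops comes from when routing tables are well populated.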
<jackdk>I note that torrents can include HTTP seeds these days
<lanodan>Well one of the problems of setup.py is pip got stuck executing it for dynamic dependencies: https://github.com/pypa/pip/issues/1884
<jackdk>I suppose each release of such a torrent could add/remove HTTP mirrors as people stand up/give up mirroring?
<lanodan>(Although personally I stay away from pip)
<stikonas>hmm, wouldn't it be simpler to just have something like magnet link
<stikonas>so you need certain hash
<stikonas>you create uri from it and search for it
<stikonas>rather than committing torrent files to repo
<stikonas>hmm
<jackdk>I'm also unsure if it lets you say "this one specific file in this torrent is available from this URI"
<stikonas>hmm, probably just the whole set
<stikonas>though I'm unsure either
<stikonas>but I was thinking it will be basically 1 torrent per file anyway
<stikonas>(or equivalent of torrent in another network)
<stikonas>i.e. really just a way to obtain file from hash
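For the "create uri from hash" idea, a sketch of building a BitTorrent v1 magnet link from an info-hash. The helper name `magnet_uri` is hypothetical. One caveat is that the `btih` info-hash is the SHA-1 of the torrent's bencoded info dictionary, not the checksum of the file itself, so a checksum from a live-bootstrap sources file could not be used directly; some mapping between the two is assumed to exist.

```python
from urllib.parse import quote


def magnet_uri(info_hash_hex, display_name=None):
    """Build a magnet link from a 40-character hex BitTorrent v1 info-hash."""
    h = info_hash_hex.lower()
    if len(h) != 40 or any(c not in "0123456789abcdef" for c in h):
        raise ValueError("expected a 40-character hex SHA-1 info-hash")
    uri = "magnet:?xt=urn:btih:" + h
    if display_name:
        # dn is an optional human-readable name; percent-encode it fully.
        uri += "&dn=" + quote(display_name, safe="")
    return uri
```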
<stikonas>well, maybe could map entries in 1 live-bootstrap "sources" file to "torrent"
<jackdk>I was thinking "here's a torrent for all the bootstrap seeds for this particular version of the bootstrap process" though I can see arguments for both
<stikonas>but then each commit needs a new one...
<stikonas>they'll get dead very quickly
<stikonas>the big bundle for particular version might make sense for proper releases though
<lanodan>I guess if it updates that frequently something more like usual rsync mirrors would make sense
<lanodan>At least I already have that set up for iana, rfcs, gutenberg, tuhs: https://hacktivis.me/kopimi/mirror/
<jackdk>With the caveat that we want people to generate and verify their own sources from upstream
<lanodan>As in verifying tarballs origin? I guess signify wouldn't be too bad but not many projects have adopted it, and gnupg is pretty big.
<stikonas>(this is by the way discussion started in https://github.com/fosslinux/live-bootstrap/issues/485)
<jackdk>Even just verifying that what's served by a mirror is the same as the files served from upstream
<fossy>yes, and if we adopt my proposal of generating git tarballs ourselves, that too
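Checking that a mirror serves the same bytes as upstream comes down to comparing a file's digest against a known-good value. A minimal sketch, assuming SHA-256 checksums are recorded alongside the source list; the function names `sha256_of` and `verify` are illustrative only.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large tarballs need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected_hex):
    """Return True if the file's SHA-256 digest matches the expected hex digest."""
    return sha256_of(path) == expected_hex.lower()
```

This only shows that a mirror's copy matches the recorded checksum. It does not establish that the recorded checksum itself came from upstream.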
<mid-kid>where is the distfiles mirror for live-bootstrap?
<mid-kid>hboehm.info is down right now
<mid-kid>nvm it was on archive.org
<matrix_bridge><Andrius Štikonas> https://files.bootstrapping.world/
<matrix_bridge><Andrius Štikonas> Might be a bit out of date, not sure
<matrix_bridge><Andrius Štikonas> Though we didn't do much recently
<aggi>i'm thinking to bundle all necessary source distfiles with the tinycc distribution
<aggi>so it's possible to download some i486-tcc-linux-musl.iso and everything is contained
<aggi>currently i'm trying to keep various package versions in sync with both what bootstrappable got and what linux-2.4 supports
<aggi>and i noticed already it's not too trivial to find some old distfiles matching linux-2.4/linux-2.X abi