IRC channel logs
2023-07-18.log
<damo22> solid_black: i am starting my day job in a few minutes
<damo22> but keen to talk about the bootstrap ideas
<damo22> solid_black: what are you designing for bootstrap?
<damo22> i dont fully understand what i wrote in libmachdev but it was difficult to make it work correctly
<solid_black> the main goal is what I said, putting as much as possible of bootstrap specifics into one single task, and letting every other translator "just work" during early bootstrap without requiring any special logic compared to running on a fully bootstrapped Hurd system on their part
<damo22> i am assuming the system currently boots by passing a bootstrap port around to various translators
<damo22> that has the right of birth to the system
<solid_black> so I'm not super familiar with how your libmachdev stuff works, which is one of the things we need to discuss
<solid_black> but let me describe my plan first, and then we'll see how your stuff fits into it
<solid_black> well, for one thing me having a long long long year to think without being able to actually work on anything
<solid_black> but also the desire to enable more interesting translator usage for the root filesystem
<solid_black> like originally it was just ext2fs and in-Mach disk drivers, and that seems simple enough, right?
<damo22> i also agree that the system should be booted simply by one task instead of chaining one to the other, if the system cant boot without special command line options, then its a bug
<solid_black> now we get your rumpdisk, and that needs the pci arbiter, and the acpi thing, and rumpdisk itself, and maybe something else I'm forgetting
<solid_black> but now imagine you want to boot off of usb, so you need another translator
<damo22> yes rumpusbdisk is already written
<solid_black> and you'd also need to patch isofs to work as an early boot fs
<solid_black> so for that you need netdde, lwip or linux pfinet, and nfs (or my 9pfs)
<damo22> right, it gets messy but each translator is only responsible for its own driver
<solid_black> we certainly don't want to patch all of those into doing awkward early bootstrap things
<solid_black> you'd take a regular translator that knows nothing about early boot, relink it statically, and you should be able to use it in bootstrap
<solid_black> grub and mach set up a bunch of tasks, as today (although there is also an idea to move ELF loading to the bootstrap task too, but that's unclear and something to think about later)
<solid_black> then mach resumes /hurd/bootstrap, the new bootstrap task
<solid_black> this is the only task to accept special ports from the kernel via command line arguments like --kernel-task
<solid_black> and this task then tries to implement/emulate as much of the normal Hurd environment for the others
<solid_black> in particular: it provides them with stdio (so they can just read/write without having to open the Mach console device)
<solid_black> (this is actually big, because it means they'll be able to complain about bad arguments or other startup errors -- they cannot currently, because they have to successfully parse their arguments to get the device master port to open the console)
<solid_black> so the others start with working root dir and cwd ports
<damo22> i was thinking we could expose a netfs with /dev
<solid_black> this fs doesn't actually contain any data, but it's only used as a namespace for placing and looking up ports
<solid_black> so for instance netdde would start up and place its port at /dev/netdde, as normal
<damo22> so does that run inside the bootstrap task?
<solid_black> or rather: it won't be netdde itself that would place its port there, because netdde would just reply to its parent with fsys_startup, as normal
<solid_black> the bootstrap task will place its port at /dev/netdde
<solid_black> yes, the bootstrap task links against netfs and provides the bootstrap filesystem
<solid_black> again, it's not as much of an initramfs in the linux sense, as it's just a namespace for the ports/devices/servers
<damo22> how does that work with subhurds
<solid_black> so they can be placed (settrans'd) and looked up in the normal way
<solid_black> i don't see why this would be an issue with subhurds or chroot?
<damo22> for example, would a new bootstrap task run and create a new netfs?
<solid_black> another thing the bootstrap task does: it implements just enough of the process RPC interface to appear as the process server to the early bootstrap tasks
<solid_black> which means, to get the privileged (device master and kernel task) ports, they just use the regular what's-it-called function in glibc
<damo22> but there is a flaw in your design... how does the subhurd know it cant use some of the devices
<solid_black> it gets a port that boot(1) gives it as the device master
<solid_black> and boot(1) listens on that port and only exposes a couple of devices
<solid_black> "console" (the Mach console), forwarded to boot(1)'s own stdio
<solid_black> "pseudo-root" (block device), forwarded to a store that you boot off
<solid_black> and whatever real devices you tell boot(1) to pass through, such as an ethernet device (perhaps from eth-multiplexer)
<solid_black> so a subhurd typically won't use all of those {acpi, pci-arbiter, rumpdisk} things, since from its perspective the root device is implemented in Mach
<solid_black> the other reason the bootstrap task impersonates the proc server is that this way the other tasks register their message ports with it!
<solid_black> which means, we get full messaging (and signals!) during early bootstrap
<solid_black> which also means we can use the existing mechanism in glibc to set (and get) init ports
<solid_black> such as: when we start the auth server, we give everyone we started that far their new auth port
<solid_black> but we don't need a special RPC for this, we use msg_set_init_port, the one glibc already implements
<solid_black> when we start the real proc server, we query it for proc ports for each of the tasks, and set them the same way
<solid_black> and this way we migrate from our fake proc server to the real one
<solid_black> similarly, when we get the root filesystem, we send everyone their new root and cwd ports
<solid_black> but before we do that, we re-attach all the translators we have set up until that point on our little netfs bootstrap filesystem onto the new root
<solid_black> which is actually something that should be possible to do completely transparently to the translators themselves
<solid_black> and I don't know enough about acpi or pci arbiter, so let's imagine we want to boot off an nfs, using netdde and lwip
<damo22> you still need pciarbiter for netdde
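The hand-over from the fake proc server to the real one, as solid_black describes it, can be modelled in a few lines. This is only a toy sketch in Python, not Hurd code: `Task`, `BootstrapTask`, the string "ports" and the slot names are hypothetical stand-ins, and only `msg_set_init_port` is a name taken from the discussion.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Stand-in for a task started during early boot."""
    name: str
    init_ports: dict = field(default_factory=dict)

    def msg_set_init_port(self, which, port):
        # Models the msg_set_init_port RPC served on the task's message port.
        self.init_ports[which] = port


class BootstrapTask:
    """Plays the proc server until the real one is running."""

    def __init__(self):
        self.registered = []  # tasks that registered a message port with us

    def register_msgport(self, task):
        # The task believes it is talking to the real proc server here.
        self.registered.append(task)
        task.msg_set_init_port("proc", "fake-proc")

    def broadcast(self, which, port):
        # Hand every task started so far the same new init port.
        for task in self.registered:
            task.msg_set_init_port(which, port)

    def migrate_to_real_proc(self, proc_port_for):
        # The real proc server hands out one proc port per task.
        for task in self.registered:
            task.msg_set_init_port("proc", proc_port_for(task))


boot = BootstrapTask()
netdde, lwip = Task("netdde"), Task("lwip")
for t in (netdde, lwip):
    boot.register_msgport(t)

boot.broadcast("auth", "auth-port")                        # auth server is up
boot.migrate_to_real_proc(lambda t: f"proc-port:{t.name}")  # real proc is up
boot.broadcast("crdir", "real-root")                       # root fs is up
boot.broadcast("cwdir", "real-root")
```

The point of the model is that no task needs a bootstrap-specific RPC: every update goes through the same setter a task would serve on a fully booted system.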
<solid_black> netdde starts up, looks up any eth devices (using the device master port, which it queries via the fake process server interface), and sends its fsys control port to the bootstrap task in the regular fsys_startup
<solid_black> the bootstrap task sets the fsys control port as the translator on the /dev/netdde node in its netfs bootstrap fs
<solid_black> lwip looks up /dev/netdde, and sure enough, there it is
<solid_black> then lwip returns its fsys control port to the bootstrap task, which it sets on /servers/socket/2
<solid_black> then we resume nfs, and nfs just creates a socket using the regular glibc socket () call, and that looks up /servers/socket/2 and it just works
<solid_black> (imagine how many hacks it would take to make that work currently)
<solid_black> we know it's not just *a* translator, but the real root filesystem
<solid_black> so we take the netdde's and lwip's fsys control ports, and do file_set_translator on the nfs, on the same paths
<solid_black> so now /dev/netdde and /servers/socket/2 exist and are accessible both on our bootstrap fs, and on the new root fs
<solid_black> that means the root fs is ready to be used, so we make a root dir port, and broadcast it to everyone in a msg_set_init_port
<solid_black> now everyone is running on the real root fs, and our little bootstrap fs is no longer used
<solid_black> then we can resume the exec server, which is the first dynamically-linked task
<solid_black> then we just file_set_translator the exec server to /servers/exec, so the nfs doesn't have to care about this
<solid_black> that means we can now spawn tasks, instead of resuming ones loaded by mach and grub
<solid_black> and by that point, we have enough of a Unix environment to call fork() and exec()
<solid_black> then the bootstrap task does the things that /hurd/startup used to do
<damo22> why cant this entire bootstrap thing be built into gnumach, or do we need it to be a separate task so it can be used for chrooting
<solid_black> because: 1. this is using a lot of Hurd specifics and RPCs, this is not appropriate for Mach
<solid_black> 2. the microkernel mantra is the opposite of this: if something *can* be done outside of the kernel, it should be
<solid_black> so the question shouldn't be whether we can move things into mach, it should be what we can move out of mach :)
<solid_black> also imagine this, with good old ext2fs, you should really be able to start your root fs as /hurd/ext2fs.static /dev/wd0s1
<solid_black> like, not a device: thingie, not --magic-port thingie, no --next-task
<damo22> i was also wondering how to do this
<solid_black> just the same way as you'd run ext2fs for a second partition (typically /home) on an already running Hurd
<damo22> but i didnt get as far as your design
<solid_black> another thing that this design enables is far greater visibility into the boot process, and interactivity
<damo22> one thing i would like to point out about libpciaccess:
<solid_black> like if some task doesn't start or crashes or something, /hurd/bootstrap can tell the other about it
<solid_black> currently if some bootstrap task fails to run, everything appears to just hang
<solid_black> like we've seen recently with that acpi/pci reordering
<damo22> libmachdev currently has a trivfs and pci-arbiter uses this to mount a netfs on top of it
<damo22> most of that could go away if there was already a netfs to mount something
<damo22> libpciaccess is a special case: it has two modes, the first time it runs via pci-arbiter, it acquires the pci config IO ports and runs as x86 mode, every subsequent access of pci becomes a hurdish user of pci-arbiter
<solid_black> I've briefly looked into libmachdev, and it seems to be some thing built on top of libtrivfs that, true to its name, adds some Mach device RPC things,
<solid_black> so let's think about what of this will still be useful, and what won't
<damo22> but it needs /servers/bus to mount /pci
<damo22> so theres all this extra fiddling around at bootstrap to give it a /
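The netdde / lwip / nfs walkthrough earlier in the log boils down to: record each translator's control port in a data-less namespace, then set the same ports on the same paths of the real root. A rough Python model of that idea follows; `Namespace`, `reattach_all` and the string "ports" are hypothetical, and the real thing would use fsys_startup and file_set_translator on a netfs-backed filesystem.

```python
class Namespace:
    """A tree of names whose only job is to hold translator control ports."""

    def __init__(self, label):
        self.label = label
        self.translators = {}  # path -> fsys control port

    def set_translator(self, path, control_port):
        # Models file_set_translator with an already running translator.
        self.translators[path] = control_port

    def lookup(self, path):
        return self.translators[path]


def reattach_all(bootstrap_fs, new_root):
    # Same control ports, same paths, so the translators never notice.
    for path, control_port in bootstrap_fs.translators.items():
        new_root.set_translator(path, control_port)


bootstrap_fs = Namespace("bootstrap")
# The fsys_startup replies arrive at the bootstrap task, which records them.
bootstrap_fs.set_translator("/dev/netdde", "netdde-control")
bootstrap_fs.set_translator("/servers/socket/2", "lwip-control")

# nfs came up on top of those two, so it becomes the real root.
nfs_root = Namespace("nfs")
reattach_all(bootstrap_fs, nfs_root)

# Later servers are attached to the real root only.
nfs_root.set_translator("/servers/exec", "exec-control")
```

After `reattach_all`, both namespaces resolve `/dev/netdde` to the same control port, which matches the "accessible both on our bootstrap fs, and on the new root fs" step in the log.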
<solid_black> sure, we'll just make directory nodes at both /servers and /servers/bus (and /dev, and /servers/socket)
<solid_black> maybe you should start by helping me understand what is it that libmachdev actually tries to do
<damo22> libmachdev exposes the mach device interface in userspace
<damo22> so we can remove all drivers from mach
<damo22> everything that connects to hardware can be a machdev
<solid_black> yes, that I understand, but what does it actually do?
<solid_black> like does it forward Mach device RPCs to trivfs? the other way around?
<solid_black> how does it fit into servers like the pci arbiter that want to expose more than just a single node?
<damo22> i think it provides a trivfs that intercepts the device_open rpc
<damo22> it also fakes a root filesystem node so you can mount a netfs onto it
<solid_black> does anything actually use device_open with ports on the filesystem?
<damo22> libmachdev's main purpose is to let you implement hardware drivers as a mach device and have device_* rpcs actually call it
<solid_black> does it provide anything useful compared to just implementing device_read / device_write yourself?
<damo22> you still have to implement those yourself, but it doesnt go inside mach
<solid_black> ok so how does rumpdisk work at all, translator-wise, what does it expose? /dev/rump?
<solid_black> and you can open() that node and it will act as a device master port, so you can then device_open () devices (like wd0) inside of it, right?
<solid_black> that's another translator we'd need in early boot if we want to boot off /hurd/ext2fs.static /dev/wd0
<damo22> we implemented it as a storeio with device:@/dev/rumpdisk:wd0
<solid_black> so the @ sign makes it use the named file as the device master, right?
<damo22> the @ symbol means it looks up the file as the device master yes
<damo22> but the code falls back to looking up mach if it cant be found
<solid_black> I see it's even implemented in libstore, not in storeio
<solid_black> yeah so it just does file_name_lookup, then device_open on that
<damo22> pci-arbiter also needs acpi because the only way to know the IRQ of a pci device reliably is to use ACPI parser
<solid_black> but instead of handling the RPCs directly, it sets the callbacks into the machdev_device_emulations_ops structure
<damo22> in case you wanted to merge drivers?
<solid_black> so yeah it would help if you wanted multiple different devices in the same translator
<solid_black> which is of course the case inside Mach, the single kernel server does all the devices
<solid_black> but that shouldn't be the case for the Hurd translators, right? we'd just have multiple different translators
<solid_black> ok, so other than those machdev emulation dispatch, what does libmachdev do?
<damo22> it centralises the early bootstrap so all the machdevs can be the same code
<damo22> pci-arbiter creates a netfs on top of the trivfs i think
<solid_black> how well does this work if it's not actually used in early bootstrap?
<damo22> and rumpdisk opens device ("pci")
<damo22> when each task is resumed, it inherits a bootstrap port
<damo22> so rumpdisk can call pci-arbiter rpcs on it
<solid_black> hm, so I see from the code that it returns the port to the root of its translator tree actually
<solid_black> does pci-arbiter have its own rpcs? does it not just expose an fs tree?
<damo22> it has rpcs that can be called on each fs node called "config" per device
<solid_black> how does that compare to reading and writing the fs node with regular read and write?
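The `@` convention discussed here (look up the named file and use it as the device master, otherwise fall back to Mach) can be illustrated with a small resolver. This is a sketch of the behaviour as described in the log, not libstore's actual parser; `resolve_device`, `lookup_file` and the placeholder port strings are made up for the example.

```python
def resolve_device(name, lookup_file, mach_master="mach-device-master"):
    """Return (device master, device name) for a device store name.

    '@FILE:DEV' asks for DEV on the device master found at FILE; a plain
    'DEV' means the in-kernel Mach device master.
    """
    if not name.startswith("@"):
        return mach_master, name

    file_name, sep, dev = name[1:].partition(":")
    if not sep or not dev:
        raise ValueError(f"expected @FILE:DEV, got {name!r}")

    master = lookup_file(file_name)  # stands in for file_name_lookup
    if master is None:
        # The fallback damo22 mentions: use Mach if the file is not there.
        master = mach_master
    return master, dev


# A dict plays the role of the filesystem namespace.
fs = {"/dev/rumpdisk": "rumpdisk-port"}

with_rump = resolve_device("@/dev/rumpdisk:wd0", fs.get)
without_rump = resolve_device("@/dev/rumpdisk:wd0", {}.get)
plain = resolve_device("hd0", fs.get)
```

Here `with_rump` opens `wd0` on the rumpdisk translator, while `without_rump` shows the fallback path that keeps old setups with in-kernel drivers working.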
<damo22> so the second and subsequent instances of pciaccess end up calling into the fs tree of pci-arbiter
<damo22> you cant call read/write on pci memory its MMIO
<damo22> and the io ports need inb inw etc
<damo22> they need to be accessed using special accessors
<solid_black> but I can do $ hexdump /servers/bus/pci/0000/00/02/0/config
<damo22> the pcifs is implemented to allow these things
<solid_black> why is there a need for pci_conf_read as an RPC then, if you can instead use io_read on the "config" node?
<solid_black> (sorry if these questions are stupid, I've very little idea about what PCI even is)
<damo22> i think it wasnt fully implemented from the beginning
<damo22> but you definitely cannot use io_read on IO ports
<damo22> these have explicit x86 instructions to access them
<damo22> MMIO maybe, im not sure, but it has absolute physical addressing
<solid_black> i don't see how you would do this via pci.defs either?
<damo22> we expose all the device tree of pci
<damo22> you may be right, it would be best to implement pciaccess to just read/write from the filesystem once its exposed on the netfs
<solid_black> yes, the question is: 1. is there anything that you can do by using the special RPCs from pci.defs that you cannot do by using the regular read/write/ls/map on the exported filesystem tree, 2. if no, why is there even a need for pci.defs, why not always use the fs?
<solid_black> but anyway, that's irrelevant for the question of bootstrap / libmachdev
<damo22> there is a need for rpcs for IO ports
<solid_black> could you point me to where rumpdisk does device_open ("pci")? grep doesn't show anything
<damo22> i think the way it works, libmachdev uses the next port
<damo22> when the pci task resumes, it has a bootstrap port
<damo22> which is passed from previous task
<damo22> or if its the first task to be resumed, it grabs a bootstrap port from glibc?
<solid_black> how much of libmachdev functionality will still be used / useful?
<solid_black> i'd rather you implemented the Mach device RPCs directly, without the emulation structure
<solid_black> but that's an unrelated change, we can leave that in for now
<damo22> i kind of like the emulation structure as a list of function pointers, so i can see what needs to be implemented
<damo22> but thats neither here nor there
<damo22> libmachdev was a hack to make the bootstrap work to be honest
<damo22> the new one would be so much better
<solid_black> is there anything else I should know about this all?
<solid_black> what else could break if there was no libmachdev and all that?
<damo22> acpi, pci-arbiter, rumpdisk, rumpusbdisk
<damo22> pci-arbiter needs to start first
<damo22> to claim the x86 config io ports
<solid_black> that it will get through the glibc function / the proc API
<damo22> it needs a /servers/bus and the device master
<solid_black> right, so then it just does fsys_startup, and the bootstrap task places it onto /servers/bus
<solid_black> (it's not expected to do file_set_translator itself, just as when running as a normal translator)
<damo22> it exposes a netfs on /servers/bus/pci
<solid_black> so will pci-arbiter still expose mach devices? a mach device master? or will it only expose an fs tree + pci.defs?
<damo22> i think just fs tree and pci.defs
<solid_black> ok, so we drop mach dev stuff from pci-arbiter completely
<damo22> it looks up the right nodes and calls pci.defs on them
<solid_black> looks up the right node on what? there's no root filesystem at that point (in the current scheme)
<solid_black> that's why I was wondering how it does device_open ("pci")
<damo22> i think libmachdev from pci gives acpi the fsroot
<solid_black> so does it set the root node of pci-arbiter as the root dir of acpi?
<solid_black> as in, is acpi effectively chrooted to /servers/bus/pci?
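For readers unfamiliar with the "emulation structure as a list of function pointers" that both of them refer to, here is a toy Python model of the pattern: each driver fills in a table of callbacks, and a single server dispatches device calls through whichever table claims the device name. `EmulationOps`, `DeviceServer` and the toy disk are invented for illustration and do not mirror libmachdev's real structure layout.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class EmulationOps:
    """One driver's callbacks; None marks an operation it does not provide."""
    name: str
    open: Optional[Callable] = None
    read: Optional[Callable] = None
    write: Optional[Callable] = None


class DeviceServer:
    def __init__(self):
        self.emulations = []

    def register(self, ops):
        self.emulations.append(ops)

    def device_open(self, dev_name):
        # Ask each registered driver in turn until one claims the name.
        for ops in self.emulations:
            if ops.open is not None:
                handle = ops.open(dev_name)
                if handle is not None:
                    return ops, handle
        raise LookupError(dev_name)

    def device_read(self, ops, handle, offset, length):
        if ops.read is None:
            raise NotImplementedError(f"{ops.name}: read")
        return ops.read(handle, offset, length)


# A toy in-memory "disk" driver that only implements open and read.
disks = {"wd0": b"0123456789"}
server = DeviceServer()
server.register(EmulationOps(
    name="toy-disk",
    open=lambda dev_name: dev_name if dev_name in disks else None,
    read=lambda handle, offset, length: disks[handle][offset:offset + length],
))

ops, handle = server.device_open("wd0")
data = server.device_read(ops, handle, 2, 4)
```

The table makes it easy to see which operations a driver still has to provide (damo22's point), while the extra dispatch layer only pays off when several drivers share one server (solid_black's point).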
<damo22> i think acpi is chrooted to the parent of /servers
<damo22> it shares the same root as pci's trivfs
<solid_black> i still don't quite understand how netfs and trivfs within pci-arbiter interact
<damo22> you said there would be a fake /
<solid_black> yeah, in my plan / the new bootstrap scheme, there'll be a / from the very start
<damo22> ok so acpi can look up /servers/bus/pci and it will exist
<solid_black> and pci-arbiter can really sit on /servers/bus/pci (no need for trivfs there at all)
<solid_black> so my question is, do we need to change anything in acpi to get it to do that
<solid_black> maybe we'd need to remove some no-longer-required logic from acpi then?
<damo22> it looks up device ("pci") if it exists, otherwise it falls back to /servers/bus/pci
<solid_black> currently pci-arbiter exposes its mach dev master as acpi's mach dev master
<damo22> but it doesnt need that if the / exists
<solid_black> yeah, we could remove this in the new scheme, and just always open the fs node
<solid_black> (or leave it in for compatibility, we'll see about that)
<damo22> it needs /servers/bus/pci and pci.defs and /servers/acpi/tables and acpi.defs
<solid_black> would it make sense to make rumpdisk expose a tree/directory of Hurd files and not Mach devices?
<solid_black> this is not necessary for anything, but just might be a nice little cleanup
<damo22> well, it could expose a tree of block devices
<solid_black> plus the Hurd file interface is much richer than Mach device, you can do fsync for instance
<damo22> the rump kernel is bsd under the hood, so needs to be /dev/rumpdisk/ide/wd0
<solid_black> can't you convert "ide/0" to "/dev/wd0" when forwarding to the rump part?
<solid_black> not that I object to ide/wd0, but we can have something more hierarchical in the exposed tree than old-school unix device naming
<solid_black> why name your device file /dev/sda1 when you could have /dev/sata/0/1 or something
<damo22> because you cant have a file and a directory with the same name
<solid_black> but then we'd still keep the bsd names as symlinks into the /dev/rumpdisk/... tree
<solid_black> and we won't be doing that either, rumpdisk only exposes the devices, not partitions
<damo22> well you just implement a block device on the directory?
<damo22> but that would be confusing for users
<solid_black> I'd expect rumpdisk to only expose device nodes, like /dev/rumpdisk/ide/0
<solid_black> and /dev/wd0s1 being a storeio of type part:1:/dev/wd0
<solid_black> or instead of using that, you could pass that as an option to your fs, like ext2fs -T typed part:1:/dev/wd0
<solid_black> so yeah, you could do the device tree thing I'm proposing in rumpdisk, or you could leave it exposing Mach devices and have a bunch of storeios pointing to that
<damo22> yea but i cant find anything here
<solid_black> so anyway, let's say rumpdisk keeps exposing a single node that acts as a Mach device master
<solid_black> then we either need a storeio, or we could make ext2fs use that directly
<solid_black> so we start /hurd/ext2fs.static -T typed part:1:@/dev/rumpdisk:wd0
<solid_black> I'll drop all the logic in libdiskfs for detecting if it's the bootstrap filesystem and starting the exec server and spawning /hurd/startup
<solid_black> yeah, and after that the bootstrap task migrates all those translator nodes from the temporary / onto the ext2fs, broadcasts the root and cwd ports to everyone, and off we go to starting auth and proc and unix
<solid_black> so we're just removing libmachdev completely, right?
<damo22> how much work do you think this is
<solid_black> the part that is still unclear to me is how you script this thing
<solid_black> like ideally we'd want the bootstrap task to follow some sort of script
<solid_black> settrans /dev/netdde ${netdde-task} --args-to-netdde
<solid_black> and ideally the bootstrap task would implement a REPL where you'd be able to run these commands interactively
<solid_black> (mkdir, settrans, setroot are not the shell commands here, but built-in bootstrap task commands)
<solid_black> wdym a mini console? how is that different from a repl?
<damo22> maybe it can type its own commands
<solid_black> where it has a predefined script, and you can do something (press a key combo?) to instead run your own commands in a repl
<solid_black> or if it fails, it bails out and drops you into the repl, yes
<solid_black> this gives you *so much more* visibility into the boot process
<solid_black> because currently it's all scattered across grub, libdiskfs (resuming exec, spawning /hurd/startup), /hurd/startup, and various tricky pieces of logic in all of these servers
<solid_black> and if something fails, you're on your own, at best it prints an error message (if the failing task manages to open the mach console at that point)
<damo22> it probably should be called "boot"
<solid_black> which is where I stole the ideas for making this scriptable and adding a repl
<solid_black> i'm not a big fan of lisp myself, but you could convince me it's a good idea :D
<solid_black> re what can you do: first of all keep this plan in mind when hacking on libmachdev, grub, and bootstrap things
<solid_black> second, when/if this is ready, we'll have to remove libmachdev and port everything else to work without it
<solid_black> and certainly there will be issues, like it will fail to boot for some reason initially, and I'll have no idea how to debug that, so maybe you'd be able to help with figuring this out
<solid_black> and of course if you want to hack on the bootstrap task itself you're very welcome to (I'd need to put up the code I have so far somewhere)
<damo22> i could try cleaning up the unneeded code in other translators
<solid_black> also you can of course give feedback on the plan, but so far sounds like you love it and are just as excited as I am :D
<damo22> if i keep in mind that / is available early..
<damo22> can i just clean up the other stuff
<solid_black> /, and the device master can be accessed with the regular glibc function, and you can printf freely (no need to open the console)
<solid_black> well you probably run netfs_startup or whatever, and it calls that
<solid_black> you're not supposed to call fsys_getpriv or fsys_init
<damo22> i think my early attempts at writing translators did not use these
<solid_black> yes, you should assume you have /, and just do all the regular things you would do
<solid_black> and if something that you would usually do doesn't work, we should think of a way to make it work by adding more stuff in the bootstrap task
<solid_black> and please consider exposing the file tree from rumpdisk, though that's orthogonal
<damo22> you mean a tree of block devices?
<solid_black> yes, but each device node would be just a Hurd (device) file, not a Mach device
<solid_black> i.e. it'd support io_read and io_write, not device_read and device_write
<solid_black> if a node only implements the device RPCs, we need a storeio to turn it into a Hurd file, yes
<solid_black> but if you would implement the file RPCs directly, there wouldn't be a need for the intermediary storeio
<damo22> thing is, i dont know at runtime which devices are exposed by rump
<damo22> it auto probes them and prints them out but i cant tell programmatically which ones were detected
<solid_black> so rump knows which devices exist but doesn't expose it over API in any way
<damo22> because it runs as a kernel would
<solid_black> speaking of which, how good is this rump anyway? does it have better hardware support than Linux drivers (of modern Linux)?
<solid_black> i think i saw you say somewhere that it's essentially unmaintained upstream too?
<damo22> they still use it to test kernel modules
<damo22> but it lacks makefiles to separate all drivers into modules
<solid_black> then... if i may ask... why go with that instead of updating / redoing the linux drivers port?
<damo22> because netbsd internal kernel API is much much more stable than linux
<damo22> we would fall behind in a week with linux
<damo22> who would maintain the linux driver -> hurd port
<damo22> also, there is a framework that lets you compile the netbsd drivers as userspace unikernels
<damo22> such a thing does not exist for linux
<damo22> i think rump is already good enough for some things
<solid_black> but it doesn't let you get the list of devices, how does that make any sense?
<damo22> but it has extra ones that arent connected
<solid_black> and how much of a netbsd kernel is really in there? the whole thing, with fs?
<damo22> so rumpdisk only has the ahci and ide drivers
<damo22> but can detect them off the pci bus
<luckyluke> solid_black: I like your idea of the new bootstrap, it looks like a ramdisk + a script that uses the other multiboot modules and starts them as translators
<luckyluke> and the bootstrap script could be just a bootstrap module
<luckyluke> maybe the name bootstrap is a bit too generic :p
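The "script plus repl" idea that luckyluke summarises can be sketched as a tiny command interpreter: run the predefined script, and drop into an interactive prompt if a line fails. Everything below is a guess at what such a thing might look like, written in Python for brevity. Only the command names (mkdir, settrans, setroot) and the `${netdde-task}` line come from the log; the argument syntax of `setroot`, the lwip and nfs arguments, the `continue` command and the class itself are assumptions.

```python
import shlex


class Bootstrap:
    def __init__(self, modules):
        self.modules = modules   # names of tasks preloaded by the boot loader
        self.dirs = {"/"}
        self.translators = {}    # path -> (task name, args)
        self.root = None

    def resolve(self, task_ref):
        # "${netdde-task}" refers to a preloaded module named "netdde-task".
        if not (task_ref.startswith("${") and task_ref.endswith("}")):
            raise ValueError(f"expected ${{task}}, got {task_ref!r}")
        name = task_ref[2:-1]
        if name not in self.modules:
            raise LookupError(f"no such task: {name}")
        return name

    def cmd_mkdir(self, path):
        self.dirs.add(path)

    def cmd_settrans(self, path, task_ref, *args):
        self.translators[path] = (self.resolve(task_ref), args)

    def cmd_setroot(self, task_ref, *args):
        # Hypothetical syntax: the task that will serve the real root fs.
        self.root = (self.resolve(task_ref), args)

    def run_line(self, line):
        argv = shlex.split(line, comments=True)
        if not argv:
            return
        handler = getattr(self, "cmd_" + argv[0], None)
        if handler is None:
            raise ValueError(f"unknown command: {argv[0]}")
        handler(*argv[1:])

    def run(self, script, read_line=input):
        for number, line in enumerate(script.splitlines(), 1):
            try:
                self.run_line(line)
            except Exception as err:
                print(f"line {number}: {err}; dropping into the repl")
                return self.repl(read_line)
        return True

    def repl(self, read_line):
        while True:
            try:
                line = read_line("bootstrap> ")
            except EOFError:
                return False
            if line.strip() == "continue":
                return True
            try:
                self.run_line(line)
            except Exception as err:
                print(err)


SCRIPT = """
mkdir /dev
settrans /dev/netdde ${netdde-task} --args-to-netdde
settrans /servers/socket/2 ${lwip-task}
setroot ${nfs-task} server:/export
"""

boot = Bootstrap({"netdde-task", "lwip-task", "nfs-task"})
ok = boot.run(SCRIPT)
```

Because the same `run_line` serves both the script and the prompt, anything the predefined boot sequence does can also be typed by hand when it fails, which is where the extra visibility into the boot process would come from.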