IRC channel logs
2024-12-03.log
<youpi>pae can't work with the kernel linux drivers
<youpi>isn't the gnumach build disabling the linux group in the pae case?
<Pellescours>I don't think so, I don't see a disabling of linux in case of pae enable
<youpi>maybe I hoped to fix it by using bounce buffers
<youpi>#ifdef PAE #define VM_PAGE_LINUX VM_PAGE_DMA32 #else #define VM_PAGE_LINUX VM_PAGE_HIGHMEM #endif
<youpi>yes, it has some stuff to make it work
<youpi>so possibly it could be fixed
<youpi>at least for the read part that should be fine
<youpi>device_write doesn't seem to be using a bounce buffer however
<youpi>it's calling bd->ds->fops->write on a projection of the physical page, so it can just fail if the driver doesn't support 64b physical addresses
<youpi>that being said, the future is rumpdisk, so I wouldn't bother that much
<shmorg83>is there any recommended hardware for Hurd at this moment?
<Pellescours>For what I can see by playing with smp, it starts running rumpdisk and stops there, ext2fs is not yet started and rumpdisk does not seem to do anything more. It's like every thread started to wait for something and the running gnumach kernel is looping doing "machine_idle" and "idle_thread_continue" (but never waking up a thread)
<damo22>with no network and Pellescours' smp patchfix, i was able to boot fully with smp and compile gnumach with -j6
<damo22>i think netdde breaks smp because it changes the interrupt flag on processors
<youpi>isn't netdde bound on cpu0 ?
<youpi>are you running only the make inside smp?
<damo22>i removed the smp isolation and booted fully
<damo22>then was able to run make -j6 with full smp
<youpi>I'm surprised netdde would have troubles that rumpdisk doesn't
<damo22>rumpdisk uses libirqhelp but netdde uses libirqhelp AND cli / sti
<youpi>does it really use cli/sti? it should indeed really not
<youpi>we could as well bind netdde on cpu0 actually
<youpi>we could add a cpu0-binding helper in libshouldbeinlibc actually
<damo22>libdde-linux26/lib/src/net/core/dev.c -> local_irq_*
<youpi>sure, I know the linux source code :)
<youpi>I just assumed that the netdde layer was avoiding that
<youpi>netdde doesn't seem to be calling iopl
<youpi>so it wouldn't be able to cli/sti
<youpi>Mmm, we don't even support iopl actually
<youpi>it's not allowed in userland
<youpi>you need to use iopl(3) to actually enter kernel mode
<youpi>even if you are in a process
<youpi>you stay in userland in terms of memory, but kernel land in terms of i/o
<damo22>so something is wrong with netdde though, for smp
<youpi>not really sure the netdde layer was really made smp-safe
<damo22>i think i recall it worked if i isolated boot to cpu0
<youpi>or if we enable its smp safety, if any
<damo22>i found out that rump has a compile time flag that optimises for non-smp
<damo22>so it defaults to compiling with smp support
<damo22>effect: If "yes", build rump kernel with uniprocessor-optimized locking.
<damo22> An implication of this is that RUMP_NCPU==1 is required at
<damo22> runtime. If "no", build with multiprocessor-capable locking.
<Pellescours>damo22: did you remove the RUMP_NCPU=1 from your rumpkernel build?
<damo22>Pellescours: that is a runtime option
<Pellescours>does this mean that rump will always run on the same cpu?
<solid_black>rumpnet is a replacement for netdde, not the TCP/IP stack, right? or is it both?
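The bounce-buffer approach youpi alludes to for device_write under PAE could look roughly like the sketch below. It is purely conceptual: struct block_dev, the write hook's signature, phys_addr_of(), alloc_dma32_page() and free_dma32_page() are hypothetical stand-ins, not existing gnumach or linux-glue symbols; the point is only the staging of high pages through a below-4GiB copy before a legacy driver sees them.

    /* Conceptual sketch only; all helpers and types here are hypothetical.  */
    #include <errno.h>
    #include <string.h>

    #define DMA32_LIMIT (1ULL << 32)

    extern unsigned long long phys_addr_of (const void *vaddr); /* hypothetical */
    extern void *alloc_dma32_page (void);                       /* hypothetical */
    extern void free_dma32_page (void *page);                   /* hypothetical */

    struct block_dev                                            /* hypothetical */
    {
      int (*write) (struct block_dev *bd, const void *buf, unsigned long len);
    };

    static int
    device_write_bounced (struct block_dev *bd, const void *data,
                          unsigned long len)
    {
      const void *buf = data;
      void *bounce = NULL;

      /* Legacy Linux drivers only grok 32-bit physical addresses, so stage
         anything that lives above 4 GiB through a low page first.  */
      if (phys_addr_of (data) + len > DMA32_LIMIT)
        {
          bounce = alloc_dma32_page ();
          if (bounce == NULL)
            return -ENOMEM;
          memcpy (bounce, data, len);
          buf = bounce;
        }

      int err = bd->write (bd, buf, len);

      if (bounce != NULL)
        free_dma32_page (bounce);
      return err;
    }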
<Pellescours>I would say yes, and we keep the current ip stack (pfinet or lwip)
<solid_black>it would be cool to have support for vsocks (as in AF_VSOCK)
<solid_black>apparently there are two actual transports that Linux can use
<solid_black>supporting virtio, whatever that is, would be great irrespective of vsock
<solid_black>in my understanding virtio is not a specific device, but more like a class of devices? with some common traits?
<youpi>better separate the tcp/ip stack from the drivers, so the driver can crash and be fine
<damo22>i am hoping /dev/bpf can separate the driver from the stack
<solid_black>and if that's right, then I have no idea how virtio would fit into the Hurd's model
<solid_black>should it be managed by a central server in /servers/bus? (virtio-arbiter?), or should it be a libvirtio.so framework?
<damo22>we will have part of the netbsd network stack compiled into the driver so we can use AF_LINK though
<solid_black>I don't think pfinet handles /dev/eth crashes nicely
<youpi>there is apparently still one issue, but most often killing netdde is fine for pfinet
<damo22>i had a ssh connection survive between restarts of netdde
<solid_black>my point though is that vsocks are (said to be) much simpler and more reliable than eth + tcp/ip
<solid_black>which would be very valuable for us, given the instability of the network stack
<youpi>do vsocks really allow connecting to an outside tcp server?
<youpi>virtio would, however, and that could be useful indeed
<solid_black>? it's unrelated to tcp, it's a separate socket protocol
<solid_black>but af_vsock is between the host and VMs, not local to a system
<youpi>ok but I don't see the point in implementing that if it doesn't allow tcp to the outside, when virtio would
<solid_black>it would be cool for us to support that on the guest side
<solid_black>virtio is, in my very limited understanding, not a protocol by itself
<youpi>there is a lot of cool stuff that we could do, I'd say better prioritize what stabilizes the os
<youpi>it is not a protocol as in tcp, but it is as in ethernet
<youpi>shared buffer-based packet passing essentially
<youpi>I'm just talking about the position in the network stack
<solid_black>you can have an ethernet-like protocol _on top of_ virtio (virtio-net I think it was called? or something like that)
<youpi>I'm talking about the packet passing thing
<youpi>that's the problem with the "ethernet" word, it says a lot
<youpi>and most often not exactly what you want to say
<solid_black>so "supporting virtio", whatever that means/entails, would be great for a lot of things
<solid_black>and once we do, we could have pfvsock, working on top of the virtio transport, that would allow us to ssh into the system from the host w/o involving pfinet, dhcp, ...
<solid_black>so given you and damo22 understand what virtio is much better than I do, how do you think it should fit into the Hurd's architecture? as a single /dev/virtio node? is it a bus like pci? is it a helper library for actual virtio-based devices to use? ("libvirtioaccess"?)
<youpi>that being said, it'd be really useful to fix any issue you have with pfinet when the ethernet driver crashes
<youpi>because that's really a killer feature of the hurd
<solid_black>the older vmware transport for vsocks is apparently based on a pci device
<solid_black>which would mean I could try to implement it on top of libpciaccess
<youpi>is there no library-based implementation that we could use?
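For reference, this is what the guest side of AF_VSOCK looks like on Linux; a hypothetical pfvsock translator on the Hurd would presumably expose something equivalent through the usual socket interface. The port number below is an arbitrary example.

    /* Linux guest-side AF_VSOCK client connecting to a service on the host
       (CID 2).  Shown only to illustrate the interface being discussed.  */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <linux/vm_sockets.h>

    int
    main (void)
    {
      int fd = socket (AF_VSOCK, SOCK_STREAM, 0);
      if (fd < 0)
        { perror ("socket"); return 1; }

      struct sockaddr_vm addr;
      memset (&addr, 0, sizeof addr);
      addr.svm_family = AF_VSOCK;
      addr.svm_cid = VMADDR_CID_HOST;   /* CID 2: the host/hypervisor */
      addr.svm_port = 5000;             /* arbitrary example port */

      if (connect (fd, (struct sockaddr *) &addr, sizeof addr) < 0)
        { perror ("connect"); return 1; }

      const char msg[] = "hello from the guest\n";
      write (fd, msg, sizeof msg - 1);
      close (fd);
      return 0;
    }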
<damo22>at least it used to work for disk access
<damo22>you need to link rumpdisk with rumpdev_virt or something like that
<Pellescours>solid_black: I just read about vsocks and I think we need a virtio driver if we want to support it
<Pellescours>virtio allows to not have a nic emulation, and vsock allows to not use the tcp/ip stack. But to have the latter we need the former
<youpi>the tcp/ip stack isn't really a problem, we can use lwip in the long run
<Pellescours>what is the (long term) plan for lwip vs pfinet? Keep the 2 and let the user choose which implementation they want to use? or just switch to one (lwip?) and abandon the second?
<gnu_srs1>Pellescours: I'd prefer lwip over pfinet.
<gnu_srs1>Mainly due to maintenance and license issues.
<youpi>long-term is to get rid of pfinet
<youpi>not really because of licensing (it's in a separate process anyway)
<youpi>but because it's not really maintainable
<solid_black>using lwip is all good, but being able to sidestep it and tcp/ip entirely (while supporting a protocol that is quickly gaining traction in the Linux virtualization world) is also good
<solid_black>i.e. I'm not against lwip or pfinet, I'm saying for some use cases, we don't even need it
<solid_black>and again, tcp/ip brings a bunch of complexity that's not limited to pfinet/lwip
<solid_black>ok, suppose hwclock(8) will be able to access /dev/rtc
<solid_black>but don't we need some daemon to keep the system time in sync with the rtc?
<solid_black>there is the rtc, there are timers, and there is NTP
<solid_black>based on the initial reading of the rtc we set a global system time, and then maintain it using the timer
<solid_black>do we consider the timer to be more accurate than the rtc? i.e. do we only need the rtc once on startup?
<azert>solid_black: regarding virtio, how do you imagine the implementation on the Hurd?
<solid_black>but it does sound like /servers/bus/virtio is the most fitting interface
<solid_black>(and as always, I know nothing about hardware, trust anything youpi or damo22 says over what I say)
<solid_black>so I imagine there'd be things like /servers/bus/virtio/25/virtqueue/0 under there
<solid_black>and a driver would read and write data (pages) in there
<azert>Ok, the reason I asked this question is that I have the impression that we have poor support for code reuse. Let's say we have usb, virtio, sata; all three of them use scsi. But we end up implementing scsi in three different processes
<solid_black>if things work out, this would be zero-copy while fitting both Mach and virtio designs nicely
<solid_black>I don't have an understanding of how scsi and usb fit into the picture
<solid_black>I thought you then had a "virtio hard drive" (or something) implemented directly on top of virtio
<azert>But in Linux everything reuses the same scsi layer
<azert>Probably can be solved with shared libraries on the Hurd
<azert>It is: all disk interfaces will talk scsi
<solid_black>(please bear with me, 'cause I really don't understand how the bits fit together here)
<solid_black>so are you saying /dev/vd* devices are scsi on top of virtio?
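A purely speculative illustration of solid_black's /servers/bus/virtio idea: a driver could open a virtqueue node and map it into its own address space with the standard Hurd/Mach primitives, which is where the zero-copy property would come from. The path and the whole hierarchy are made up; only file_name_lookup, io_map and vm_map are real interfaces.

    /* Speculative sketch: map a (non-existent) virtqueue node into memory.  */
    #include <hurd.h>
    #include <hurd/io.h>
    #include <mach.h>
    #include <fcntl.h>
    #include <errno.h>
    #include <error.h>

    int
    main (void)
    {
      /* Hypothetical node: queue 0 of virtio device 25.  */
      file_t vq = file_name_lookup ("/servers/bus/virtio/25/virtqueue/0",
                                    O_RDWR, 0);
      if (vq == MACH_PORT_NULL)
        error (1, errno, "file_name_lookup");

      mach_port_t rdobj, wrobj;
      error_t err = io_map (vq, &rdobj, &wrobj);
      if (err)
        error (1, err, "io_map");

      vm_address_t addr = 0;
      err = vm_map (mach_task_self (), &addr, vm_page_size, 0, 1,
                    wrobj, 0, 0, VM_PROT_READ | VM_PROT_WRITE,
                    VM_PROT_READ | VM_PROT_WRITE, VM_INHERIT_NONE);
      if (err)
        error (1, err, "vm_map");

      /* The driver would now place descriptors/buffers at ADDR ...  */
      return 0;
    }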
<azert>What I mean is that, with the exception of the network stack, we don't really have a solution to separate hardware drivers from their functions
<azert>Yes, and ad arda are scsi on top of usb and so on
<azert>But maybe the way rump does, we can do that
<azert>Then I'd advise to implement virtio using rump
<solid_black>I see "block device" and "SCSI host" listed separately in the virtio spec
<solid_black>but yeah, I agree that I don't see a general nice model of how to do "stacked" things
<solid_black>possibly each step in the stack is its own translator
<youpi>solid_black: there is an init.d script to update the rtc on system shutdown
<solid_black>but if so, that does mean that we treat system time as the source of truth while the system is up, and only use the rtc to preserve a notion of time while the system is down
<solid_black>i.e. the timer used to maintain the system time is "better" (more accurate? reliable?) than the rtc
<youpi>rtc is only used at system bootup to get the time
<youpi>then the system maintains the time accurately
<youpi>the timer of the system is way more accurate than the rtc, yes
<youpi>iirc man hwclock mentions somebody who was having like a second of drift over a night off
<solid_black>...but the reason we still need the rtc at all is because it works while the power is off, so the timer doesn't tick
<azert>Indeed solid_black: I think rump stacks things nicely as libraries. Although we don't use dynamic linking I think, it could in theory
<solid_black>rump is, in my understanding, basically taking code from netbsd
<youpi>there were some odd bugs that nobody investigated, but it should be feasible yes
<solid_black>but it obviously has a huge advantage of already existing and working :)
<solid_black>looks like "virtio-blk" and "virtio-scsi" are distinct things indeed
<solid_black>"prefer virtio-scsi for attaching more than 28 disks"
<solid_black>but like really, I'd guess the absolute majority of GNU/Hurd installations are VMs
<solid_black>those can use virtio for storage, networking, vsocks (and more)
<solid_black>how does DMA / IOMMU work? can a host just export any physical page to "hardware"? what sort of things does it need to do to perform that exporting?
<youpi>I don't know the details, but the principle is that you tell the iommu what device can access what physical pages
<solid_black>does the "hardware" see the actual physical address of the page, or is there a layer of indirection like with a regular mmu?
<youpi>and with vt-x, you can also delegate a pci device to a guest
<youpi>so we could delegate to a process
<solid_black>so all devices that we support have a way to communicate with them that isn't DMA, and we only use that?
<youpi>so drivers and devices can do whatever they want with the memory
<solid_black>does a device know if there is an iommu, or is it transparent?
<solid_black>do we currently just send a physical address to a device?
<youpi>it doesn't know about the page table anyway
<solid_black>sure, I don't mean physical as opposed to virtual (of some vm_map), but rather as opposed to something else
<youpi>there is nothing else currently :)
<solid_black>well perhaps devices have some sort of descriptors to refer to pages or something, I don't know
<solid_black>does that mean for example that a device is tied to the size of a memory address?
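A minimal sketch of the "read the RTC once at boot, then let the timer maintain the clock" scheme youpi describes, assuming /dev/rtc exposes a Linux-compatible RTC_RD_TIME ioctl; that ioctl and <linux/rtc.h> are the Linux interface, and whether the Hurd mirrors them exactly is an assumption here.

    /* One-shot "set the system clock from the RTC" helper (sketch).  */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <time.h>
    #include <linux/rtc.h>   /* assumption: a compatible header with RTC_RD_TIME */

    int
    main (void)
    {
      int fd = open ("/dev/rtc", O_RDONLY);
      if (fd < 0) { perror ("open /dev/rtc"); return 1; }

      struct rtc_time rt;
      if (ioctl (fd, RTC_RD_TIME, &rt) < 0) { perror ("RTC_RD_TIME"); return 1; }

      /* The RTC is conventionally kept in UTC; convert to a time_t.  */
      struct tm tm = { .tm_sec = rt.tm_sec, .tm_min = rt.tm_min,
                       .tm_hour = rt.tm_hour, .tm_mday = rt.tm_mday,
                       .tm_mon = rt.tm_mon, .tm_year = rt.tm_year };
      time_t t = timegm (&tm);

      /* From here on the timer maintains the clock; the RTC is only written
         back at shutdown (the init.d script mentioned above).  */
      struct timespec ts = { .tv_sec = t, .tv_nsec = 0 };
      if (clock_settime (CLOCK_REALTIME, &ts) < 0)
        { perror ("clock_settime"); return 1; }
      return 0;
    }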
<youpi>probably they manipulate pseudo-physical addresses
<youpi>afaik they don't need to know about the iommu
<youpi>some devices can't grok physical addresses beyond 4G
<solid_black>yes, I mean whether the memory address bus (?) is 32-bit or 64-bit or whatever other size
<solid_black>is, say, a hard drive tied to a specific number of bits that a memory bus must have
<youpi>but the disk controller, yes
<azert>solid_black: I don't think netbsd has support for the iommu
<solid_black>a part of my original question is: are all regions of physical memory equal wrt DMA-ability?
<solid_black>because I remember vm_page in gnumach having different segments of pages or something
<youpi>all physical memory is dma-able
<youpi>but devices have address size limitations
<etno>I have been trying to troubleshoot rumpusb failing to initialize ehci, and discovered that the libpci part of rump is writing to all PCI devices' config space as part of the enumeration.
<etno>I am under the impression that the strategy of using several instances of rumpkernel in parallel can work only if we implement our own version of libpci (src/sys/dev/pci)
<azert>etno: do you mean that the pci device enumeration cannot happen twice?
<azert>if that's the case then I don't see how what you said would solve this issue. In the sense that if it can happen only once, then you need a mechanism to store the collected information somewhere that cannot be restarted or rebooted
<azert>a translator wouldn't fit the bill, you'd have to store it in a file, which is kind of a nasty business
<youpi>enumeration is fine. The problem is several drivers driving the same pci device
<youpi>we are getting eaten by this on hurd-amd64 buildd boxes
<youpi>if the chroot's /dev/rump happens to trigger, it tries to open the disk controller again, and mayhem happens
<etno>So my point is that even if several rumpkernel-based drivers attach to distinct PCI devices, I think that the second one will mess with the first one while enumerating
<azert>my guess is that to solve that, you need to implement some kind of exclusive memory in the kernel that can belong only to a single task
<youpi>no, just enumerating will not bother an existing driver
<youpi>reconfiguring PCI addresses etc. would indeed, but mere enumeration wouldn't
<etno>youpi: It could be implemented this way, yes. But my naive reading of pci/pci.c in rump makes me doubt it is currently the case.
<azert>how could different rumpdisks acknowledge each other? No way! You need a mechanism in gnumach
<azert>an ack to avoid opening the same device twice
<youpi>enumerating doesn't open a device
<youpi>it just reads the pci configuration
<youpi>which goes through the pci arbiter, so it's safe
<azert>youpi: what did you mean by the chroot /dev/rump?
<youpi>the chroot /dev/rumpdisk doesn't only enumerate, it also tries to initialize the ahci pci devices
<youpi>since it's supposed to serve wd0 etc.
<youpi>pci-arbiter should be telling it it shouldn't be doing that because they're busy
<azert>one way is to never touch it, but if it happens you'd like a graceful error instead of mayhem, right?
<youpi>possibly somehow rumpdisk should tell libpciaccess that it's driving a pci device and then prevent others from driving it too
<azert>what if there is more than one pci-arbiter, for instance?
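As a concrete illustration of the distinction being drawn here, read-only enumeration through libpciaccess (which on the Hurd is routed via pci-arbiter) only reads config space and does not touch a device that someone else is already driving. This standalone listing is not taken from rumpdisk or netdde; it just shows what "mere enumeration" amounts to.

    /* List PCI devices by reading config space read-only via libpciaccess.  */
    #include <stdio.h>
    #include <stdint.h>
    #include <pciaccess.h>

    int
    main (void)
    {
      if (pci_system_init () != 0)
        {
          fprintf (stderr, "pci_system_init failed\n");
          return 1;
        }

      struct pci_device_iterator *it = pci_slot_match_iterator_create (NULL);
      struct pci_device *dev;
      while ((dev = pci_device_next (it)) != NULL)
        {
          uint32_t id;
          /* Offset 0 of config space: vendor (low 16 bits) and device id.  */
          if (pci_device_cfg_read_u32 (dev, &id, 0) == 0)
            printf ("%04x:%02x:%02x.%d  %04x:%04x\n",
                    dev->domain, dev->bus, dev->dev, dev->func,
                    id & 0xffff, id >> 16);
        }
      pci_iterator_destroy (it);
      pci_system_cleanup ();
      return 0;
    }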
<azert>like more than one working as a master
<youpi>that's not supposed to happen
<youpi>because pci-arbiter checks for the "pci" device already existing on the master port
<azert>ok, but you'd want to handle that gracefully too
<azert>so that's implemented at the level of the master port
<azert>then the hurdish way would be that pci-arbiter exposes a pci file system, and then to use file locking, perhaps?
<azert>a chroot rump will try to lock a file that is already locked by another rump, and die gracefully
<youpi>depends what you call "hurdish"
<youpi>in the minix meaning, it's i/o itself that would be prevented, because processes don't do inb/outb themselves
<youpi>but that's terribly expensive
<azert>I call "hurdish" nothing more than the filesystem as a namespace
<youpi>since these are root translators, we can trust them to be cooperative
<youpi>and tell pci-arbiter which pci devices they will manage
<youpi>and that's where the lock is
<etno>Currently, rumpdisk uses the code in sys/dev/pci, which enumerates the PCI devices itself instead of using pci-arbiter.
<youpi>"filesystem as a namespace" is not necessarily hurdish, it's rather plan9-ish
<youpi>the hurd puts more emphasis on RPCs
<youpi>having a non-posix RPC to handle such locking is fine
<youpi>enumerating the devices is fine
<youpi>it's actually trying to drive a pci device that should get some locking
<youpi>that conf_write deserves some locking indeed
<youpi>but it would make sense for the locking to come way before that
<youpi>when the driver probing considers driving that particular pci device
<etno>Isn't this going to interfere with another driver already handling a device?
<youpi>drivers would have to cooperate on this, yes
<youpi>you could try to lock the access to the i/o ports, but that'll be messy
<etno>pci-arbiter is perfect for the job, I think
<etno>What about providing rump with an alternate implementation of libpci, based on the pci-arbiter API?
<youpi>I mean, it uses libpciaccess, doesn't it?
<youpi>but possibly libpciaccess indeed doesn't currently have such a locking operation; it would make sense to extend it this way
<youpi>when you try to run several Xorg on Linux, you'd have the same issue
<etno>Maybe we simply need to filter the devices that a rumpkernel-based driver has access to
<youpi>a filtering could work too indeed, but that'd make lspci less useful
<etno>It will then enumerate, but only the allowed devices
<Pellescours>in rumpdisk there is some device_open to check if the dev is already probed (it's for gnumach drivers, but) can this way also work in this case?
<azert>I think that locking is like filtering but done on the client side, which is maybe better since it allows for corner cases where the driver really wants to ignore the lock
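A hypothetical sketch of what the "register that I'm driving this device" operation could look like at the libpciaccess level. Neither pci_device_claim() nor pci_device_release() exists today; they are invented names, shown only to locate where such a call would sit in a driver's attach path.

    /* Sketch of a proposed (non-existent) claim operation for libpciaccess.  */
    #include <pciaccess.h>
    #include <errno.h>

    /* Hypothetical: ask pci-arbiter to record that we are now driving DEV,
       failing with EBUSY if another driver already registered it.  */
    extern int pci_device_claim (struct pci_device *dev);
    extern void pci_device_release (struct pci_device *dev);

    static int
    probe_and_attach (struct pci_device *dev)
    {
      int err = pci_device_claim (dev);
      if (err == EBUSY)
        return err;   /* e.g. a chroot's rumpdisk finding wd0 already served */
      if (err != 0)
        return err;

      /* ... enable the device, map its BARs, register interrupts ... */
      return 0;
    }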
<youpi>it needs to ask pci-arbiter what it thinks about it, and register that it will be driving it
<youpi>in the common case there is just one kernel and thus checking is easy
<etno>youpi: a "reservation indicator"
<youpi>in the hurd case it needs to be requested from a central place
<etno>Such a flag in pci-arbiter would not prevent a race between drivers probing concurrently
<youpi>probing is just about reading the config, that's not a problem
<etno>Rumpkernel seems to behave badly here
<youpi>if so, then it should ask pci-arbiter for an enumeration that avoids devices that are already driven
<youpi>as opposed to lspci, which behaves nicely and thus can enumerate all of them
<etno>This is where I see a potential race condition
<etno>There is a window between asking for the reservations and the choice to take a device
<youpi>you'd need to make the device-driving request wait for the pci-arbiter enumeration to complete
<youpi>it's fine, as long as taking a device can fail
<azert>the lock/reservation would consist of a port right, so that pci-arbiter can use dead-name notification to free the lock if a driver dies?
<etno>While the idea is nice, I fail to see how to implement reservation in rump-pci
<etno>At the level of libpci, there are only individual access requests to PCI registers, and afaict, the driver attachment lifecycle is not available.
<youpi>that'd probably need to introduce a new operation
<etno>The "driver attach" transition exists, but at a higher level. If we provide a custom implementation of libpci...
<azert>etno: isn't libpci called by the higher levels?
<etno>Instead of rumpdev-pci -> libpci -> pci-gnu, we would have rumpdev-pci -> glue-to-pciarbiter
<etno>azert: it is, but libpci implements direct access to all devices, without state or enumeration
<youpi>it can make sense to add the operation at the libpci layer
<etno>This would bring us farther from upstream, but would certainly be less work, yes
<youpi>it could be integrated upstream
<youpi>again, it does make sense when running several Xorg in parallel
<azert>youpi: you mean at the libpciaccess layer, etno means the netbsd internal libpci
<youpi>in the end it could need to be both
<azert>I think that the netbsd people might consider it a bug if their libpci ignores their rump shims
<azert>I think they would have the same or worse issues on netbsd or Linux
<azert>Rump's pci_user is there to take care of it
<azert>So they might consider integrating it upstream. But surely they would like to participate in this
<youpi>sure, discussion needs to happen
<etno>In the meantime, I'll probably implement a basic filtering in pci_user-gnu.c to validate that this is the (only) problem I hit :)
<etno>(basic filtering == match device class/id with a parameter/env var)
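A rough sketch of the "basic filtering" etno mentions: an allow-list taken from an environment variable, matched against a device's vendor:device pair or its class. The variable name RUMP_PCI_ALLOW and its syntax are invented here for illustration; the real hook would live in pci_user-gnu.c.

    /* Hypothetical filter helper; variable name and syntax are made up.  */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static int
    pci_device_allowed (unsigned vendor, unsigned device, unsigned dev_class)
    {
      const char *allow = getenv ("RUMP_PCI_ALLOW");   /* hypothetical variable */
      if (allow == NULL)
        return 1;                    /* no filter configured: allow everything */

      char pattern[32];
      /* Match either a "vvvv:dddd" vendor:device pair ...  */
      snprintf (pattern, sizeof pattern, "%04x:%04x", vendor, device);
      if (strstr (allow, pattern) != NULL)
        return 1;

      /* ... or a "class=cccccc" device class entry.  */
      snprintf (pattern, sizeof pattern, "class=%06x", dev_class);
      return strstr (allow, pattern) != NULL;
    }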