IRC channel logs

2022-02-07.log


<Pellescours>any idea?
<youpi>look in config.log for what actually happens
<Pellescours>It’s just saying: no
<Pellescours>not really helpful, but I forced the flag to true to see and…
<Pellescours>/usr/bin/ld: /root/hurd/build/pci-arbiter/../../pci-arbiter/device_map.c:44: undefined reference to `pci_device_map_legacy'
<Pellescours>the library version is not the latest one compared to what hurd expects
<youpi>which version is that?
<youpi>0.16-1+hurd.8 ?
<Pellescours>0.16-3+hurd.1
<Pellescours>the latest I have by doing apt update
<youpi>(00:23:59) Pellescours: It’s just saying: no
<youpi>in config.log?
<youpi>the config.log created by ./configure in the hurd tree?
<Pellescours>when I search for pciaccess inside yes
<Pellescours>696-configure:6656: checking for liblwip
<Pellescours>697-configure:6715: result: no
<Pellescours>698:configure:6731: checking for libpciaccess
<Pellescours>699-configure:6790: result: no
<Pellescours>700-configure:6948: creating ./config.status
<youpi>so it's just the pkg-config test that fails
<youpi>check pkg-config --exists pciaccess
<youpi>and check that you indeed have pciaccess.pc
<Pellescours>I don’t have pkg-config installed, I remember now. I already had this issue with another VM
<Pellescours>now libpciaccess is found
<Pellescours>I built pci-arbiter, thanks youpi
<Pellescours>youpi: from what I see, it seems that the process is not dead, I can still see my logs coming from the pci_device_open() func
<youpi>ah probably it's just the bug about bootstrap not getting properly named
<youpi>so we can't see it in ps
<youpi>so possibly what got bogus is libpciaccess accessing pci-arbiter through device_open("pci")
<Pellescours>yeah, and because pci-arbiter is not properly named another pci-arbiter starts
<youpi>no, the start doesn't care about naming
<youpi>it tries to device_open("pci")
<youpi>see libpciaccess' initialization function
<Pellescours>but I did a settrans -g /servers/bus/pci to prevent the new one from booting, and the old one is not providing the pcifs tree
<youpi>that's another matter
<Pellescours>so it’s fine if 2 processes pci-arbiter are running?
<youpi>it should be attaching itself to /servers/bus/pci
<youpi>but that's a different story than having device_open("pci") working
<youpi>which is what rumpdisk needs
<youpi>no it's not
<youpi>I mean
<youpi>it's fine if the second is using libpciaccess to access the first
<youpi>it's not if it's using libpciaccess to poke with x86 ports
<Pellescours>So, I only have the pci-arbiter that I launch at boot to start (settrans -g command), and I can see the logs for the pci_device_open.
<Pellescours>I was able to start rump (I can see the logs) but rump was not able to see my ahcisata disk
<Pellescours>my logs show that at some point, it fails to do device_open when pci_device_open is called
<youpi>possibly e.g. mapping BARs doesn't work or such
<youpi>that can be a debugging starting point indeed
<Pellescours>it succeeds in doing it during the startup, but when I start rump the calls fail
<Pellescours>so it’s certainly a regression in libpciaccess
<youpi>with more prints you can probably determine what exactly fails
<youpi>and which error number
<youpi>and also print on the other side, in the arbiter, in the device_open RPC
<youpi>(for a start, making sure that it's that that gets called)
<Pellescours>yes, I will print the error
<youpi>in a word: check the whole path, to work out where exactly things get wrong
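The advice above (print the device name and the error number on both sides of the call) can be sketched as a small formatting helper. This is only an illustration: `format_open_failure` and its parameters are hypothetical names, not functions from libpciaccess or pci-arbiter, and it assumes the error value is an errno-style code that `strerror()` understands.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of libpciaccess or pci-arbiter):
   format one failure record so that the call site, the device name
   and the numeric error all end up in the same log line. */
int format_open_failure(char *buf, size_t len, const char *where,
                        const char *name, int err)
{
    /* snprintf() never writes more than len bytes and always
       NUL-terminates when len > 0. */
    return snprintf(buf, len, "%s: device_open(\"%s\") failed: %d (%s)",
                    where, name, err, strerror(err));
}
```

Emitting the same kind of line from the client side and from the arbiter's device_open handler would make it possible to tell which of the two actually produced a given "no such device".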
<Pellescours>during boot pci_device_open is called a lot of times, most succeed but some fail with error "no such device", then when I start rump a first call succeeds and the 2 others fail with the same error "no such device"
<youpi>note that pci_device_open is a forwarder
<youpi>when calling device_open("time"), that goes through it
<youpi>those are expected, make sure to print the name to rule them out from your debugging
<Pellescours>yes but my logs are after the strncmp("pci"…) call
<youpi>that call doesn't return
<youpi>or do you mean after the if which is after that?
<Pellescours>yeah after the if
<youpi>(that code is bizarre, why setting err, it could just use strncmp in the if itself)
<Pellescours>no
<Pellescours>my logs are inside the if (err)
<Pellescours>so yeah that’s the forwarder function
<youpi>which if (err)?
<youpi>(there are two in that function)
<youpi>please really always be as explicit as possible
<youpi>otherwise it's just guesswork
<Pellescours>the first one. I added a mach_print(name) to know which failing calls are for the pci
<youpi>the first one is precisely *not* for pci questions
<youpi>but for device_open("time") and whatnot
<youpi>which happen to go through pci-arbiter just because it interposes the device master port
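The forwarder logic discussed above, with `strncmp` used directly in the `if` as youpi suggests, might look roughly like this. It is a stand-alone sketch, not the pci-arbiter source: `classify_device_open` and the `open_route` enum are made-up names, and the prefix length of 3 is an assumption, since the log only shows `strncmp("pci"…)`.

```c
#include <string.h>

/* Hypothetical result type: where a device_open(name) request goes. */
enum open_route { ROUTE_PCI, ROUTE_FORWARD };

/* Hypothetical stand-in for the name check in the forwarder. */
enum open_route classify_device_open(const char *name)
{
    /* strncmp() returns 0 on a match, so non-zero means "not pci".
       Assumption: only the first 3 characters are compared, so any
       name starting with "pci" takes the PCI path. */
    if (strncmp(name, "pci", 3) != 0)
        return ROUTE_FORWARD;   /* e.g. "time", "wd0", "disk:wd0" */

    return ROUTE_PCI;           /* handled by the arbiter itself */
}
```

With this shape, a debug print placed inside the first branch only ever reports forwarded names such as "time" or "wd0", which is why those failures say nothing about PCI access.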
<Pellescours>the 2 calls that fail after rump starts are for wd0 and disk:wd0, which is normal because rump did not find the sata disk
<Pellescours>So the open of the pci device succeeds, I’m adding logs to libpciaccess to understand
<Pellescours>I don’t know if it’s useful, but when I do lspci with the "normal" boot, I get a result, and when I do lspci with my pci-arbiter started at boot (and no other arbiter) I get an error
<youpi>as root or not?
<Pellescours>yes
<Pellescours>Cannot find any working access method.
<youpi>IIRC lspci uses /servers/bus/pci
<Pellescours>I don’t know if lspci reads the /servers/bus/pci tree or if it does… raced
<Pellescours>that explains
<youpi>which is about pci-arbiter attaching itself to the filesystem, that wouldn't be related to rumpdisk
<Pellescours>I build libpciaccess but I don’t see any .o nor .so :/
<Pellescours>found
<damo22>ouch, looks like a lot of issues here in the backlog of irc
<damo22>i need to go out, i will be back to take a look in ~4 hours
<damo22>hi, did you get any further with your debugging?
<damo22>i havent tried upgrading my hurd system yet to latest packages
<damo22>from reading the above, it sounds like a libpciaccess problem, or invocation of libpciaccess in pci-arbiter?
<youpi>it looks so
<damo22>the q35 machine by default has an AHCI controller, i hacked my qemu to not include the ahci controller by default and then added one manually so i could refer to it in the bus= specification
<damo22>are you sure you are attaching your disk to the correct controller?
<youpi>yes
<damo22>ok i will load my image and check what version of libpciaccess i am using
<damo22>Version: 0.16-1+hurd.7
<youpi>that's an old one indeed
<damo22># dpkg -s hurd|grep Version
<damo22>Version: 1:0.9.git20210811-5+b1
<damo22>so should i attempt an upgrade of my system?
<youpi>you'll probably get the break, yes
<damo22>ok
<damo22>upgrade completed
<damo22>ext2fs: part:2:device:wd0: No such device or address
<damo22>I noticed in libpciaccess you are calling device_close() on pci_port:
<damo22>+ device_close (pci_port);
<damo22>does that still allow the enumeration to occur later?
<youpi>I didn't write that code
<youpi>but in principle the obtained root port should be working fine
<youpi>be the device open that we used to obtain it open or not
<damo22>ok
<damo22>i need to check if this commit is present in Version: 0.16-3+hurd.1 of libpciaccess0
<damo22>* 740d2f2 (origin/master, origin/HEAD) hurd: Restore initialization order
<youpi>it is
<youpi>otherwise nedde wouldn't work at all
<youpi>that was the point of that commit :)
<damo22>hmm, thats interesting then, i can only see 3 lines changed in pci_system_hurd_create() and they seem harmless
<damo22>but the rest of the changes, the mapping, i have no idea about that
<youpi>possibly that broke the use in rump, no idea
<damo22>ahh
<youpi>does rump's libpciaccess usage print warnings if pci functions fail?
<damo22>no
<youpi>never leave an error silent :)
<damo22>its probably a regression in pci-userspace
<damo22>i can add more debug prints in the debug mode
<damo22>so did the api change for libpciaccess region mapping?
<youpi>normally they shouldn't have, but see Joan's changes
<youpi>perhaps try to grab version 0.16-1+hurd.8 from snapshot.debian.net
<youpi>that one integrated the map patch
<youpi>from what I can see, there is no source difference between 0.16-1+hurd.8 and 0.16-3+hurd.1
<youpi>and that patch is the only difference between 0.16-1+hurd.7 and 0.16-1+hurd.8
<youpi>it'd be really good if people were testing their changes against various scenarios
<damo22>im a bit confused, pci_device_hurd_map_range() uses _SERVERS_BUS_PCI to read the pci device tree, but when its a bootstrap filesystem, that wont exist right?
<damo22>do we ever use the hurd access method during bootstrap?
<youpi>we are not supposed to
<youpi>that's indeed very probably the problem
<youpi>Joan introduced that so that delegation can work through /servers/bus/pci
<youpi>but indeed at bootstrap that way can't work
<youpi>perhaps pci_device_hurd_map_range should try pci_device_x86_map_range first, and revert to /servers/bus/pci if that fails
<youpi>similarly to pci_system_hurd_create that tries the pci device first
<youpi>note however the comment that Joan had on the list
<youpi>he said that my "fix" patch broke his usage
<youpi>because that ordering doesn't match his case either
<youpi>really, it's just a matter of taking into account the different situations
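The "try one access method first, fall back to the other" shape suggested above can be sketched with stand-in functions. Everything here is hypothetical (`fake_env`, `fake_x86_map`, `fake_pcifs_map`, `map_with_fallback`); none of it is libpciaccess code, and the x86-first ordering is only the one proposed in the log, which per Joan's comment does not suit every setup.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical description of the environment a process runs in. */
struct fake_env {
    int direct_io_ok;   /* direct x86 access available (bootstrap) */
    int pcifs_ok;       /* /servers/bus/pci is populated */
};

typedef int (*map_fn)(const struct fake_env *env);  /* 0 or errno */

struct map_method {
    const char *label;
    map_fn map;
};

/* Stand-ins for the two real mapping paths. */
int fake_x86_map(const struct fake_env *env)
{
    return env->direct_io_ok ? 0 : EPERM;
}

int fake_pcifs_map(const struct fake_env *env)
{
    return env->pcifs_ok ? 0 : ENOENT;
}

/* Try each method in order; report the first that works, or the
   error of the last one tried if none does. */
int map_with_fallback(const struct map_method *m, size_t n,
                      const struct fake_env *env, const char **used)
{
    int err = ENODEV;

    for (size_t i = 0; i < n; i++) {
        err = m[i].map(env);
        if (err == 0) {
            if (used)
                *used = m[i].label;
            return 0;
        }
    }
    return err;
}

/* The ordering proposed in the log: x86 first, then /servers/bus/pci. */
const struct map_method x86_then_pcifs[] = {
    { "x86", fake_x86_map },
    { "pcifs", fake_pcifs_map },
};
```

Keeping the ordering in a table rather than in nested `if`s makes it easier to exercise both the bootstrap case and the delegated case with the same code, which is the point youpi makes about covering everyone's situation.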
<youpi>please, people, avoid staying focused on your situation only, and make sure that everyone's situation works :)
<youpi>it shouldn't happen that I'd be the one doing it, since it'd mean having to magically find out the time to do it
<damo22>how can we automate testing this stuff
<damo22>if there was a couple of unit tests, we could ensure nothing breaks before sending in patches
<youpi>debootstrapping a filesystem, running some commands to set up what should be done
<youpi>and then you can run it in qemu
<youpi>yes, unit tests can also help
<youpi>but nothing replaces actual end results
<youpi>the problem with unit tests is getting the situation properly
<youpi>here, the bootstrap situation is really not easy to obtain
<youpi>you'd want pci access
<youpi>which you cannot reasonably do on a running system
<youpi>thus qemu
<damo22>yes, its difficult
<youpi>and it's the eventual situation that matters anyway, so even if unit testing can be helpful to pinpoint, the actual eventual situation is what really needs to be checked
<damo22>i will send an email to the list to start a discussion about this problem
<damo22>maybe Joan can clarify his use case and we can fix it
<youpi>his use case is simply going through /servers/bus/pci
<youpi>so that setting permissions on these files is enough to give somebody access to a pci card