IRC channel logs

2024-04-07.log

<azert>solid_black: as a real dumbass I just tried to boot your kernel on an arm64 board just to see what would happen. Of course it didn't work, but would you like to check my steps to see if there is anything obviously wrong?
<azert>I guess you are sleeping.. just in case you are curious, this is what I ended up with so far https://paste.debian.net/1313336/
<azert>solid_black: when you are available, I will have a few questions
<gnucode>sneek: later tell azert did you try to boot an AArch64 Mach on an AMD processor? That's daring my friend!
<sneek>Okay.
<azert>Hello
<sneek>azert, you have 1 message!
<sneek>azert, gnucode says: did you try to boot an AArch64 Mach on an AMD processor? That's daring my friend!
<azert>gnucode: it's an Allwinner A64. Basically an Arm Cortex-A53
<anatoly>azert have you prepped device tree, I'd guess this is a requirement. I also wonder if it's possible to re-use device trees from linux?
<azert>anatoly: what I did is recompile u-boot myself and use the device tree for my board from the u-boot source tree, which gets compiled during the process
<azert>u-boot alone is a singular beast of a boot loader, I don't think we should look into Linux as much as u-boot
<azert>I was thinking of driving some LEDs from the GPIO for debugging, or perhaps using memory to store info that will persist after the cpu reset
<azert>I cannot envision any other ways at this point
<azert>I’ve some notes on what I did, please tell me if you are interested in sharing. I’m not making them public because nothing works so far
<azert>My intuition is that the fancy el2 and el3 should come in handy for debugging
<azert>But I cannot find anything useful on the net
<azert>Maybe it's just Google that fell in the shitter, and me refusing to use the other alternative
<azert>There should be a standard way to do kernel debugging on arm from the higher execution levels, eg attaching some sort of gdb or at least proprietary debugger up there
<anatoly>azert: I don't have an unused aarch64 board but I have another board with armv7
<azert>They force you to install this thing https://github.com/ARM-software/arm-trusted-firmware that serves absolutely no purpose
<azert>anatoly : did you ever do debugging with that?
<anatoly>azert: can you please tell me more about u-boot and device trees? How does it end up in Mach?
<anatoly>azert: nope
<azert>I just followed solid_black's instructions
<azert>You boot gnumach exactly like Linux
<azert>I'm afraid Arm is playing some dark cabal game with the debugger, keeping basic info secret
<azert>I’ll just go with leds. In theory once the IO layer is initialized one can proceed with the in-kernel debugger
<anatoly>azert: won't it look like some in-kernel support for debugging (kdbg), then jtag, and then gdb on the other side?
<azert>anatoly: for that you need the IO to be working
<azert>We are not there yet
<azert>You would expect Arm to have some lower-level debuggers available, since they already force a useless firmware on you, but they don't
<azert>Even Linux allows you to kgdb only after the IO layer is initialized
<azert>The https://github.com/ARM-software/arm-trusted-firmware the way it is now is just baffling, I cannot make any sense of it. They are providing you a scaffold with a few drivers that may as well live in el1
<azert>When I buy a PIC processor I get an assembler and a debugger for free, that’s the bare minimum you would expect from an embedded processor
<azert>It has been like that from at least the late 90s
<azert>I know that these Cortex-A were meant to run phones, but then why even sell dev boards at all?
<azert>Sorry for the rants, but even grub looks "simple" compared to u-boot. U-boot is a full OS that doesn't even have the decency to include a tool like MS-DOS debug to help you
<anatoly>:-)
<azert>Ok, I figured out that the a64 has a jtag interface, but it has been hijacked by the board developer for things like the power button
<solid_black>hi!
<sneek>Welcome back solid_black, you have 1 message!
<sneek>solid_black, gnucode says: that I am about to submit a qoth for Q1 of 2024. I would like to mention the Alpine Hurd distribution. Do you have a git repo somewhere?
<azert>Hi!
<solid_black>sneek: later tell gnucode, the temporary repo is at https://github.com/bugaevc/aports/ (see README.hurd.md)
<sneek>Will do.
<solid_black>azert: thank you so much!
<solid_black>this is really cool that you're experimenting with it
<azert>Glad you are happy
<azert>Thing is that I'm struggling to figure out how to debug
<solid_black>so the thing that is "obviously wrong" is that you're setting size cells / address cells to 2, and then only passing a single cell
<solid_black>however, looks like it crashes before it gets to parse that
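A minimal sketch of the cell-count point above, assuming the property in question is a `reg`-style (base, size) pair: with `#address-cells` and `#size-cells` both set to 2, each pair takes four big-endian 32-bit cells, so a single cell cannot describe it. The helper name `encode_reg` and the example size are hypothetical; only the 0x40000000 address appears in this log.

```python
import struct

def encode_reg(base, size, address_cells=2, size_cells=2):
    """Encode one (base, size) pair as big-endian 32-bit cells (hypothetical helper)."""
    def cells(value, count):
        if value >> (32 * count):
            raise ValueError("value does not fit in %d cell(s)" % count)
        # Most significant cell first, as in a flattened device tree.
        return [(value >> (32 * i)) & 0xFFFFFFFF for i in reversed(range(count))]

    words = cells(base, address_cells) + cells(size, size_cells)
    return struct.pack(">%dI" % len(words), *words)

# 0x40000000 is the load address seen in the log; the size is made up.
blob = encode_reg(0x40000000, 0x80000000)
print(len(blob) // 4, "cells:", blob.hex())
```

With one address cell and one size cell the same pair would need only two cells, which is why the cell counts and the property contents have to agree.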
<azert>Do you know where it crashes?
<solid_black>at 0x0000000040000084, evidently
<solid_black>your u-boot says it moved the image from 0x42000000 to 0x40000000
<azert>That’s quite soon
<azert>Yes it does move stuff for no reason
<solid_black>so it's 0x84 bytes into the image, right?
<azert>Yes
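The arithmetic behind "0x84 bytes into the image" can be sketched as follows. Both addresses are the ones quoted above; the helper name is only illustrative, and the instruction index assumes fixed 4-byte AArch64 instructions.

```python
LOAD_ADDR = 0x40000000           # where u-boot says it relocated the image
FAULT_ADDR = 0x0000000040000084  # faulting address from the crash dump

def image_offset(fault_addr, load_addr):
    """Offset of the faulting instruction from the start of the loaded image."""
    return fault_addr - load_addr

offset = image_offset(FAULT_ADDR, LOAD_ADDR)
print(hex(offset))   # 0x84
print(offset // 4)   # index of the instruction, counting from 0
```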
<solid_black>can you look at what's at that address?
<solid_black>in your build
<azert>I think I can, it’s your branch latest commit btw
<solid_black>that's _start+52 in my build here
<solid_black>it's not, let me push the latest & greatest ~~bugs~~ fixes and performance improvements
<solid_black>pushed
<azert>Ok I’ll try your latest and come back to you
<solid_black>I don't know how one would debug on bare metal, something something JTAG but I have no idea what it is or does
<solid_black>we're not going to have Linux-like kgdb, at least for the foreseeable future
<solid_black>Mach KDB is more possible
<solid_black>but that's not going to help you if it crashes early
<azert>I know, probably LEDs can help
<solid_black>also one possible reason for why it would crash: it tries to write things to UART, and UART is not yet wired up super properly
<solid_black>it will read the base address from the device tree, but who knows whether that worked in your case
<solid_black>can you share the full dtb / dts?
<azert48>If it crashed at that point, I’d be super happy
<azert48>Yes I will in the afternoon
<solid_black>what's afternoon? :D
<azert48>Well when I’m back on the computers
<solid_black>I mean, which TZ are you in?
<azert48>France
<solid_black>that's UTC+2, right? ok
<solid_black>so, 0x84 is "adr x1, .boot_stack_end" in my build, it shouldn't be anything different in yours since we're still in pure asm at this point
<solid_black>yes, u-boot's little disassembly there confirms it
<solid_black>good
<solid_black>now let's look at what that esr is
<solid_black>that's ESR_EC_IL, illegal execution state
<solid_black>meaning we performed a bad eret
<solid_black>any idea whether your board gets entered at EL{1,2,3}?
<azert48>How can loading a register fault with this?
<azert48>I'm pretty sure it's el1
<azert48>But how would I check?
<solid_black>no, what happens is we eret with an apparently invalid execution state indicated in spsr, and doing that works in a funny way
<solid_black>it does eret, but keeps the execution state / el as is, and sets the IL flag in PSTATE
<solid_black>so the very next instruction traps with ESR_EC(esr) = ESR_EC_IL
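As a rough illustration of reading the exception class out of an ESR value: the field layout below (EC in bits 31:26, the instruction-length bit at 25, ISS in bits 24:0) and the 0x0E class for "Illegal Execution state" follow the Arm architecture. The constant name mirrors the `ESR_EC_IL` mentioned above, but its definition here and the sample value are assumptions, since the real ESR is only in the paste.

```python
ESR_EC_SHIFT = 26
ESR_EC_MASK = 0x3F
ESR_LEN_BIT = 1 << 25        # instruction-length bit (not the PSTATE.IL flag)
ESR_ISS_MASK = (1 << 25) - 1

ESR_EC_IL = 0x0E             # exception class: Illegal Execution state

def decode_esr(esr):
    """Split a 32-bit ESR value into its three architectural fields."""
    return {
        "ec": (esr >> ESR_EC_SHIFT) & ESR_EC_MASK,
        "len32": bool(esr & ESR_LEN_BIT),
        "iss": esr & ESR_ISS_MASK,
    }

# Hypothetical value, constructed for illustration only.
sample = (ESR_EC_IL << ESR_EC_SHIFT) | ESR_LEN_BIT
fields = decode_esr(sample)
print(hex(sample), fields, fields["ec"] == ESR_EC_IL)
```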
<azert48>That would mean it didn’t start in el1
<solid_black>and, there's also a little other thing
<solid_black>which is that you can actually intentionally set IL
<solid_black>and I actually let userland do that in gnumach
<solid_black>i.e. you can set IL for yourself with thread_set_state
<solid_black>and it will instantly trap
<solid_black>and I have a special-purpose exception code just for this
<solid_black>anyways
<solid_black>so what we have is 1. gnumach doesn't think it's entered at el1
<solid_black>2. it tries to drop down to el1
<solid_black>3. that fails
<azert48>Wow, That was fast
<solid_black>? we still don't know what exactly is wrong
<azert48>u-boot might be running in el3
<azert48>Let's say we started in el2
<solid_black>let's try to remember / figure out what that ((0x7 << 6) | (5 << 0)) value stands for
<solid_black>I think I stole it from some online guide back when I was writing that part
<solid_black>but now that I have a much better understanding (& definitions) of all the SPSR bits, we should be able to break it down
<solid_black>and see if it's wrong
<solid_black>look at aarch64/aarch64/pcb.c for the bit definitions
<solid_black>5 is SPSR_SPSEL_N | SPSR_EL of 1
<solid_black>nRW is not set, as it shouldn't be
<solid_black>the value is 0x1c5 btw
<solid_black>the 'c' is F & I masked
<solid_black>it would be a good idea to mask A (& D?) too at this point
<solid_black>well, the '1' is exactly A being masked, good
<solid_black>and that's all there is to it
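The breakdown of 0x1c5 above can be checked with a small decoder. The bit positions are the architectural AArch64 SPSR layout; the constant names are illustrative and are not taken from `aarch64/aarch64/pcb.c`.

```python
SPSR_SP_SEL = 1 << 0     # 1 = use SP_ELx ("h" mode), 0 = use SP_EL0
SPSR_EL_SHIFT = 2        # target exception level lives in bits 3:2
SPSR_EL_MASK = 0x3
SPSR_NRW = 1 << 4        # 0 = AArch64, 1 = AArch32
SPSR_F = 1 << 6          # FIQ masked
SPSR_I = 1 << 7          # IRQ masked
SPSR_A = 1 << 8          # SError masked
SPSR_D = 1 << 9          # debug exceptions masked

def decode_spsr(value):
    """Return the fields of an SPSR value that matter for the eret."""
    return {
        "el": (value >> SPSR_EL_SHIFT) & SPSR_EL_MASK,
        "sp_elx": bool(value & SPSR_SP_SEL),
        "aarch32": bool(value & SPSR_NRW),
        "masked": [name for name, bit in
                   (("D", SPSR_D), ("A", SPSR_A), ("I", SPSR_I), ("F", SPSR_F))
                   if value & bit],
    }

value = (0x7 << 6) | (5 << 0)
print(hex(value), decode_spsr(value))
```

Decoded this way, the value targets EL1 with SP_EL1, AArch64, and A, I and F masked, while D is left unmasked, which matches the discussion.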
<solid_black>why does it IL then?
<solid_black>it could be some stupid "secure"/"non-secure" thing, does your hardware have it?
<solid_black>it also could be that RW in either SCR_EL3 or HCR_EL2 indicates AArch32 for EL1
<azert>Yes
<solid_black>can you read registers from u-boot?
<azert>I'm sure my hw has all that
<azert>You can read registers
<azert>But I’m not sure you can set breakpoints
<solid_black>can you please see the values of SCR_EL3 and HCR_EL2?
<solid_black>"ATF will then drop into U-Boot proper (in EL2)" => sounds like u-boot and gnumach are entered at EL2
<azert>Yes I can do this it will take me some hours or days
<azert> https://patchwork.kernel.org/project/kvm/patch/1454522416-6874-23-git-send-email-marc.zyngier@arm.com/ Linux can work on el2
<solid_black>I'm planning to teach Mach to be able to run in EL2 as well, for virtualization
<solid_black>but I'm going to require VHE (arm 8.1+)
<solid_black>anatoly: I could tell you more about device trees if you're interested :)
<solid_black>how's the risc-v port doing?
<biblio>solid_black: still no new update yet.
<solid_black>azert: well, I'm dump, we have the full register dump, so we can see what EL that was in by looking at x4
<solid_black>s/dumpdumb/
<solid_black>aaaaaargh
<solid_black>but you get the point
<solid_black>and since x4 is 8, this is EL2 indeed
<solid_black>x5 is 0x40000084 (where we eret'ed) and x6 is 0x1c5 (what we tried to set as CPSR), as expected
<solid_black>so this must be HCR_EL2.RW being 0, which might just be the reset value
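A short sketch of the two checks made above, assuming x4 holds a copy of the CurrentEL register at that point in the boot code. CurrentEL keeps the exception level in bits 3:2, and HCR_EL2.RW is bit 31 (1 means EL1 runs AArch64); the function names are hypothetical.

```python
HCR_EL2_RW = 1 << 31   # 1 = EL1 is AArch64, 0 = EL1 is AArch32

def current_el(currentel_value):
    """Extract the exception level from a CurrentEL register value."""
    return (currentel_value >> 2) & 0x3

def el1_is_aarch64(hcr_el2):
    """True if HCR_EL2.RW says the next lower level runs in AArch64."""
    return bool(hcr_el2 & HCR_EL2_RW)

print(current_el(0x8))                 # 2, i.e. EL2
print(el1_is_aarch64(0x0))             # False: an eret to AArch64 EL1 is illegal
print(el1_is_aarch64(HCR_EL2_RW))      # True once the RW bit is set
```

With HCR_EL2 left at 0, an SPSR asking for AArch64 EL1 describes a state the hardware considers illegal, which is consistent with the IL trap seen on the next instruction.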
<solid_black>see, we got useful info even from the little information that we had :)
<anatoly>solid_black: not doing :-D haven't spent much time on it
<solid_black>:|
<solid_black>well, and I haven't spent much time on the alpine thing, so I guess we're even :D
<anatoly>solid_black: does u-boot read dtb file?
<solid_black>it does
<anatoly>and then passes it to linux, for example
<solid_black>though I'm not sure if it uses it for much, since you configure u-boot for a particular board at build time
<solid_black>yes, it loads it, it can print / modify nodes, and eventually it passes the tree on to the kernel that it boots
<anatoly>I now remember that dtb files are under the u-boot boot directory in armbian. I was "patching" the board's dts to change the mode of the otg usb port
<anatoly>solid_black: thanks for the explanation
<anatoly>solid_black: re. "alpine" hurd, have you done more changes locally?
<solid_black>I don't remember if I pushed this, but they have done some changes to drop more libc abstractions
<solid_black>basically to hardcode musl in more places instead of "libc"
<solid_black>and I had measures to undo/counter that
<solid_black>along with a rebase
<solid_black>but that's a couple of months old by this point anyway, would need another rebase
<anatoly>I need to finish my little shell script and push stuff as well
<solid_black>hmm, so qemu too resets HCR_EL2 as 0
<solid_black>so why doesn't this crash in qemu?
<anatoly>that script will produce disk image within container, so basically run build container, then run script in container and you'll get an image to play qemu
<solid_black>actually it does crash :D
<solid_black>but I'm 95% sure I tested this
<anatoly>*to play with in qemu
<solid_black>anatoly: sounds great!
<solid_black>another important TODO item that I started hacking on back then is netdde
<anatoly>what's wrong with it? :-D
<solid_black>I should push my changes, and then it'd be great if you could look into netdde
<anatoly>you mean package it, etc
<solid_black>if I had 100x more time, I'd make us an awesome new dde, one that'd be Hurd-native from the start and would run modern linux drivers and not some ancient broken code
<solid_black>packaging it, yes
<solid_black>which is non-trivial
<anatoly>also would damo's feature for xattrs help to build an image for qemu for example?
<solid_black>maybe, yeah
<solid_black>you'd have to write a separate script to set static translator records with xattrs from Linux
<solid_black>youpi: low-priority ping
<youpi>?
<solid_black>I wanted to run my ideas/understanding for how virtualization would work by you
<anatoly>as I understood doing it the current way depends on hurd-specifics in ext2 which are not available on non-hurd environments but using xattrs solves this issue
<youpi>(I advise not to try to stuff the linux network stack in a translator, linux is way too volatile for a stable approach, see how netdde was supposed to be maintained, but nobody took the time to, while the bsd drivers should be really little problem)
<anatoly>solid_black: isn't it rump-stuff replacing netdde in future?
<solid_black>potentially, maybe
<solid_black>but we want networking working now, don't we
<youpi>rump seems a simpler way to get something now
<solid_black>youpi: so, I've been reading about VHE, which is an arm v8.1 feature that makes it possible to run general-purpose kernels (and not special hypervisors) in EL2
<youpi>what is EL2? something like the intel rings?
<solid_black>yes
<solid_black>EL0 is userland
<solid_black>EL1 is kernel
<solid_black>EL2 is hypervisor, or, with VHE, the host kernel, EL1 is guest kernel then
<youpi>ok, so sort of hypervisor level
<solid_black>yes
<solid_black>Mach currently runs in EL1
<solid_black>I'll teach it to run in EL2 if VHE is present
<solid_black>but then, we need a model/API for running VMs
<solid_black>and a rather nice and logical and beautiful design
<solid_black>is just using the same abstractions we already have
<solid_black>a VM is just a special Mach task
<solid_black>and a vCPU is a thread in that task
<solid_black>you'd use all the existing vm_ APIs to manipulate what the VM sees as its physical memory
<solid_black>the VM task cannot make syscalls of its own
<youpi>I was wondering about that
<solid_black>but it can run into exceptions, and Mach will send them off to the exception port, as usual
<youpi>how there's a difference between the syscalls from guest kernel and from guest guest
<solid_black>QEMU, or whatever hypervisor, would catch the exceptions, and do whatever it wants
<solid_black>guest guest's syscalls go to guest's kernel, that traps to EL1, not EL2
<solid_black>but when the kernel tries to write to an MMIO address to output something via UART for example,
<solid_black>well, the hypothetical QEMU port would ensure that there is no VM mapping at that "physical" address
<anatoly>do we need like virtio in mach?
<solid_black>so the VM task will get EXC_BAD_ACCESS => QEMU catches that and implements the UART write
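The trap-and-emulate idea described here can be modelled in a few lines. This is a toy simulation only, not Mach or QEMU code: a Python exception stands in for the EXC_BAD_ACCESS message, and every class name, function name and address below is hypothetical.

```python
PAGE = 0x1000

class BadAccess(Exception):
    """Stands in for an EXC_BAD_ACCESS exception delivered to the VMM."""
    def __init__(self, addr, value):
        super().__init__(hex(addr))
        self.addr, self.value = addr, value

class GuestMemory:
    """Guest-"physical" memory: only explicitly mapped pages are accessible."""
    def __init__(self):
        self.pages = {}

    def map_page(self, base):
        self.pages[base & ~(PAGE - 1)] = bytearray(PAGE)

    def write8(self, addr, value):
        page = self.pages.get(addr & ~(PAGE - 1))
        if page is None:
            raise BadAccess(addr, value)   # no mapping: trap to the VMM
        page[addr & (PAGE - 1)] = value

def run_guest(memory, writes, uart_base):
    """Replay guest stores; emulate the UART for stores that fault on it."""
    console = bytearray()
    for addr, value in writes:
        try:
            memory.write8(addr, value)
        except BadAccess as exc:
            if exc.addr != uart_base:
                raise                      # a genuine guest fault
            console.append(exc.value)      # emulated UART data register
    return console.decode("ascii")

UART_BASE = 0x09000000                     # made-up guest-physical address
ram = GuestMemory()
ram.map_page(0x40000000)
stores = [(0x40000000, 1)] + [(UART_BASE, ord(c)) for c in "hi\n"]
print(repr(run_guest(ram, stores, UART_BASE)))
```

The key design point is that the UART page is deliberately left unmapped, so ordinary RAM stores run at full speed while device accesses fault into the emulator.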
<solid_black>anatoly: I looked into virtio and decided it's too complicated to be implemented in Mach, let's leave that to userland
<solid_black>but I'm not a virtio expert by any means
<solid_black>oh, the guest kernel can do "hypercalls" (HVC instruction)
<solid_black>that does trap to EL2
<solid_black>but Mach wouldn't really implement any semantic for them
<solid_black>it will just make them into exceptions and let qemu do whatever it wants to do with them
<solid_black>from what I've seen so far, PSCI is available via HVC
<solid_black>PSCI includes stuff like "start that CPU", "stop that CPU", "shut down the whole system"
<solid_black>starting/stopping vCPUs is of course just thread_resume() / thread_suspend() that qemu would do
<solid_black>shutting down is task_terminate(vm) + exit()
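A toy dispatcher for the PSCI mapping sketched above. The function IDs are the standard PSCI ones (CPU_OFF, 64-bit CPU_ON, SYSTEM_OFF); everything else is an assumption made for illustration: vCPUs are modelled as a dict of running flags, and the comments only indicate where `thread_resume()`, `thread_suspend()` and `task_terminate()` would be called by a real VMM.

```python
PSCI_CPU_OFF = 0x84000002
PSCI_CPU_ON_64 = 0xC4000003
PSCI_SYSTEM_OFF = 0x84000008
PSCI_SUCCESS = 0
PSCI_NOT_SUPPORTED = -1

def handle_psci(function_id, caller, vcpus, target=None):
    """Handle one HVC-delivered PSCI call; return (status, vm_alive)."""
    if function_id == PSCI_CPU_ON_64:
        vcpus[target] = True        # a real VMM would thread_resume() here
        return PSCI_SUCCESS, True
    if function_id == PSCI_CPU_OFF:
        vcpus[caller] = False       # a real VMM would thread_suspend() here
        return PSCI_SUCCESS, True
    if function_id == PSCI_SYSTEM_OFF:
        for cpu in vcpus:           # a real VMM would task_terminate() and exit
            vcpus[cpu] = False
        return PSCI_SUCCESS, False
    return PSCI_NOT_SUPPORTED, True

vcpus = {0: True, 1: False}
print(handle_psci(PSCI_CPU_ON_64, caller=0, vcpus=vcpus, target=1), vcpus)
print(handle_psci(PSCI_SYSTEM_OFF, caller=0, vcpus=vcpus), vcpus)
```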
<solid_black>this all makes sense to me nicely, and I've read that newer nanokernels do things that way
<solid_black>because a nanokernel and a hypervisor are basically one and the same, and that makes perfect sense now too
<solid_black>does Intel vt-x also let you implement something like this?
<youpi>vt-x is quite similar
<youpi>though quite involved since the x86 semantic is stuffy
<solid_black>the big difference between this plan and KVM / Hypervisor.framework is with them, the guest runs inside your thread synchronously IIUC
<solid_black>i.e. you do kvm_run(), and it runs in your thread until a "vm exit" happens
<youpi>yes
<youpi>isn't that what you propose too?
<solid_black>whereas in this model, you make a separate task and a thread within it, and resume that
<youpi>ah, ok
<solid_black>and it runs independently from you
<solid_black>and a "VM exit" is a Mach exception that's delivered to you
<solid_black>you can block on incoming exceptions from the VM of course
<youpi>I don't know how userland kvm-qemu tells the kernel about the guest pagetable etc.
<youpi>but creating a task looks a very sane way to do it
<solid_black>KVM_SET_USER_MEMORY_REGION
<solid_black>see https://github.com/dpw/kvm-hello-world/blob/master/kvm-hello-world.c
<solid_black>i.e. they have a special-purpose API to map things into the guest's address space
<solid_black>and a special-purpose API (that is not ptrace) to access vCPU registers
<solid_black>whereas we'd just have a task with a vm_map, and threads with PCBs, with all the usual APIs
<solid_black>and not just user-level APIs, that'd literally be the implementation
<solid_black>this might be a better example https://zserge.com/posts/kvm/
<solid_black>all the things like paging out/in VM's "physical" memory will work in the natural way too
<solid_black>and to load a blob (a kernel, a dtb, ...) into guest's memory, you'd just vm_map it there from a file on the host
<solid_black>set MAP_COPY to make the guest memory overwritable
<solid_black>ARM 8.4+ also has a Nested Virtualization extension
<solid_black>that lets you make it look to the guest kernel in EL1 that it's running in EL2
<solid_black>with decent performance overhead
<solid_black>(so it can then run a nested VM)
<solid_black>but supporting that requires a bunch of tricky code, so I'm not implementing that any time soon
<solid_black>azert: pushed a fix for entering at EL2
<solid_black>things work for entering at EL2 on QEMU now
<solid_black>we drop down to EL1 (enabling AArch64 for EL1 first, that was the bug), and keep booting there
<solid_black>I've also written the experimental VHE / E2H code path
<solid_black>we start booting in EL2 w/ E2H, but die on user memory access
<solid_black>need to research how PAN interacts with E2H
<pavlx>hello
<pavlx>Is it possible to see a picture of Gnu/hurd that is running on the computer?
<solid_black>"When the value of this PAN state bit is 1, any privileged data access from EL1, or EL2 when HCR_EL2.E2H is 1, to a virtual memory address that is accessible to data accesses at EL0, generates a Permission fault"
<solid_black>that sounds like accessing EL1-accessible memory from EL2 doesn't cause a PAN fault
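The quoted rule can be written down as a predicate to make the reading above explicit. This encodes only the sentence quoted here and nothing else from the architecture manual (other controls that affect PAN are ignored); the function and parameter names are hypothetical.

```python
def pan_faults(pan, el, e2h, el0_accessible):
    """True if a privileged data access takes a Permission fault per the quoted rule."""
    # The rule covers accesses from EL1, or from EL2 when HCR_EL2.E2H is 1.
    covered_el = el == 1 or (el == 2 and e2h)
    return bool(pan and covered_el and el0_accessible)

# EL2 with E2H touching user-accessible memory while PAN is set: faults.
print(pan_faults(pan=True, el=2, e2h=True, el0_accessible=True))
# EL2 with E2H touching kernel-only memory: no PAN fault.
print(pan_faults(pan=True, el=2, e2h=True, el0_accessible=False))
```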
<solid_black>got booting in EL2 to work!
<solid_black>as in, all the way, with a Hurd starting and a Unix PID 1 running as expected in userland
<solid_black>no VMs yet, obviously
<solid_black>and pushed
<solid_black>azert: the last commit may break things again for you, please try reverting it if it does
<azert>solid_black: I think your plan for virtualization is super sweet
<azert>I’ll try your fixes soon and tell you if I need to revert the last commit and where we arrive in both cases
<azert>solid_black: tried, in both cases the system reboots without any output
<azert>hinting at the fact that your exception handlers have probably been installed over the ones of u-boot
<azert>which I interpret as good news
<azert>although I was hoping for the same output..
<azert>could you tell me where the exception handlers are installed in the source?
<solid_black>azert: the exception handlers are installed by aarch64/aarch64/locore.S:load_exception_vector_table()
<solid_black>which is called from aarch64/aarch64/model_dep.c:c_boot_entry()
<solid_black>right after we enable MMU & switch to highmem
<azert>thanks, i'll try to skip that
<solid_black>which would mean that we succeed in enabling the MMU
<solid_black>why skip it?
<solid_black>the most likely reason for you not seeing any output is that gnumach doesn't pick up your uart
<solid_black>is it a pl011?
<azert>no I think it's another uart
<solid_black>so you'll need to implement support for it
<azert>it's not described as a pl011 in the device tree
<azert>that's the plan
<azert>to implement it
<solid_black>awesome!
<solid_black>and when you try to do that, you'll discover that we don't have a nice story about dynamically dispatching to the right uart
<solid_black>and that things are half-hardcoded
<azert>problem is that I'm not sure we get there, I was hoping that if I disable the interrupt handlers I'll discover whether it dies accessing it
<solid_black>if you do know your uart's base address, you can insert debug prints here and there
<solid_black>to see what gets reached and what doesn't
<solid_black>just be aware of phys_to_virt
<azert>true
<azert>that's a good plan!
<azert>I'll get back to you
<solid_black>yes, please keep me updated
<solid_black>do you know if your hardware supports VHE or not?
<azert>I think it does
<solid_black>I should really look more into how VHE works
<azert>but not sure 100%
<solid_black>because right now it's in the "my code is working, but I've no idea why" state
<solid_black>I've enabled some bits in HCR, and it just worked
<solid_black>except for WFI trapping, which now doesn't work
<azert>nice
<solid_black>and I don't feel like I understand what's going on
<azert>I'm sure it's not trivial tech
<solid_black>whereas for the last month or so, I did have the feeling that I know exactly what each individual instruction does
<azert>did you check if booting in el1 still works?
<solid_black>so yeah, point is, I need to look more into VHE / E2H
<solid_black>no reason for it not to work, but sure, let me check
<solid_black>yep, still works
<pavlx>Good evening, I'm going to have my dinner here in Italy, have a good day, all
<Gooberpatrol66>wow that's wild
<Gooberpatrol66>til glusterfs took concepts from hurd https://www.gnu.org/software/hurd/open_issues/glusterfs.html
<Gooberpatrol66> https://glusterdocs-beta.readthedocs.io/en/latest/overview-concepts/translators.html
<azert>solid_black: it gets to somewhere in boot_script_exec and dies there
<azert>very very deep
<azert>I think it is useless to debug the way I'm doing further, since I'd say everything works
<azert>I'd like to port/implement the serial driver, where should it be plugged in? where is it called?
<azert>is it in device_service_create ?
<azert97>ok it's in walk_dtb_visit_node