IRC channel logs

2023-06-18.log

<almuhs>hi. I have a problem running Debian GNU/Hurd with my smp gnumach. The kernel crashes right after it starts booting https://pasteboard.co/ruL0UbnhhUY7.png
<almuhs>here is my compiler configuration https://github.com/AlmuHS/gnumach_dev_scripts/blob/main/compile_scratch.sh
<almuhs>and here my qemu conf https://github.com/AlmuHS/gnumach_dev_scripts/blob/main/qemu-hurd.sh
<almuhs>i tested with 1 cpu and with more than one, with the same problem
<almuhs>correction: with APIC and only 1 cpu, it boots
<almuhs>but the problem persists with more than one cpu
<almuhs>damo22 do you have the same problem?
<damo22>no
<damo22>i havent tried compiling gnumach on linux before
<almuhs>damo22 what flags do you use
<almuhs>?
<almuhs>in configure
<damo22>i dont remember, something similar to your script
<almuhs>../configure --host=i686-gnu CC='gcc -m32' LD='ld -melf_i386' --enable-apic --enable-kdb --enable-ncpus=$NUM_CPUS --disable-ide --disable-linux-groups
<damo22>i dont use --disable-ide
<almuhs>these are mine
<damo22>disable linux groups is enough
<almuhs>i use that to enable rumpdisk
<damo22>yeah but if you disable linux groups, theres no ide driver
<almuhs>ok, i will fix it
<damo22>its a useless flag in combination with that
<almuhs>removed flag. Trying again
<damo22>what do you set NUM_CPUS to
<damo22>i use 8
<almuhs>2, 4 ...
<damo22>if you allow 8 it should work on any intel system up to 8 cores
<damo22>eg i7
<almuhs>yes. But now i'm testing in Qemu
<almuhs>in the qemu script I set the number of cpus manually
<damo22>thats ok, you can still run qemu with -smp 2
<damo22>and it will only give the machine 2
<almuhs>yes
<damo22>but gnumach can still have 8
<almuhs>yes
<almuhs>8 as maximum
<damo22>s/8/up to 8
<damo22>can you try 8
<damo22>maybe theres a bug that only shows up when <8
<almuhs>yes, i know that bug
<damo22>??
<almuhs>wait, i misunderstood
<almuhs>i know a bug which causes a panic with -smp 8 and ncpus=8
<damo22>i mean, can you please compile with NUM_CPUS=8
<damo22>and see if it works
<almuhs>ok, wait
<damo22>with -smp 1 or 2
<almuhs>some weeks ago, my gnumach worked with the same configuration as now
<damo22>which git commit?
<damo22>master?
<almuhs>yes
<damo22>ok
<almuhs>from upstream without any additional patch
<damo22>should work then
<almuhs>but, after latest git pull, a week ago, this problem appeared
<damo22>if its broken, see if you can find the commit that broke it
<almuhs>yes, it's a possibility
<damo22>i have not submitted anything recently
<almuhs>i know. But I remember a recent commit which modified something related to apic
<almuhs>in the 64-bit work
<almuhs>i think it changed the casts in the code that searches the acpi tables for the apic table
<damo22>i can have a look at the commits
<damo22>54a4ca27230ae85bf75804d5d581ebf68e620cee
<almuhs>send link
<damo22>maybe the offset for apic is always 32 bit
<almuhs>it's possible
<almuhs>xAPIC is 32-bit based
<damo22>are you booting a 32 bit kernel?
<almuhs>yes
<damo22>yeah im not sure that commit is correct
<damo22>if
<almuhs>maybe i can undo it
<almuhs>and try again
<damo22>but vm_offset_t is 32 bit on 32 bit machine?
<almuhs>i'm not sure
<almuhs>reverting commit
<damo22>but part of that commit is makefile
<damo22>to allow 64 bit to use apic
<damo22>ideally you just revert the code part
<almuhs>reverting in local, only for test
<almuhs>this revert doesn't fix the problem
<damo22>i386/include/mach/i386/vm_types.h:typedef uintptr_t vm_offset_t;
<almuhs>and... what is the size of vm_offset_t?
<damo22>machine dependent
<almuhs>ok
<damo22>i think its 32 bit when compiled for 32 bit
<almuhs>probably
<damo22>but i think acpi table is not 64 bit
<damo22>even on 64 bit machine
<damo22>the pointers stored in the table i mean
<almuhs>it's possible, because xAPIC is 32-bit
<almuhs>and the xAPIC tables are in ACPI tables
<damo22>yes
<damo22>so the commit is most likely wrong
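A note on the pointer-width point above. The ACPI specification defines two root tables: the RSDT, whose entries are 32-bit physical addresses, and the XSDT, whose entries are 64-bit. Which table the commit under discussion touches is not established in this log, so the following is only a sketch of why the entry width matters. The table bytes are synthetic (no valid checksum), and the helper names `make_rsdt` and `read_entries` are hypothetical, not gnumach code.

```python
import struct

ACPI_HEADER_LEN = 36  # size in bytes of the common ACPI table header


def make_rsdt(entries):
    """Build a synthetic RSDT: header followed by 32-bit physical addresses."""
    body = b"".join(struct.pack("<I", e) for e in entries)
    length = ACPI_HEADER_LEN + len(body)
    # Only signature and length are filled in; the rest of the header is zeroed.
    header = struct.pack("<4sI", b"RSDT", length).ljust(ACPI_HEADER_LEN, b"\0")
    return header + body


def read_entries(table, width):
    """Read the entry array assuming each entry is `width` bytes (4 or 8)."""
    fmt = "<I" if width == 4 else "<Q"
    length = struct.unpack_from("<I", table, 4)[0]
    count = (length - ACPI_HEADER_LEN) // width
    return [
        struct.unpack_from(fmt, table, ACPI_HEADER_LEN + i * width)[0]
        for i in range(count)
    ]


table = make_rsdt([0x000F5A20, 0x000F5B40])
as32 = read_entries(table, 4)  # two separate 32-bit addresses
as64 = read_entries(table, 8)  # one bogus value: both entries fused together
print([hex(v) for v in as32])
print([hex(v) for v in as64])
```

Reading the same bytes with a pointer-sized type gives correct results on a 32-bit build and fused, invalid addresses on a 64-bit build, which is the kind of mismatch being suspected here.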
<almuhs>but i'm not compiling for 64-bit, as far as i know
<damo22>i will compile latest master to confirm
<damo22>yep its broken
<almuhs>we have to rollback to find the commit which broke that
<almuhs>but there are so many commits :-$
<damo22>bisect
<almuhs>what is this?
<damo22>roll back exactly half of the commits
<almuhs>yes, it's a good idea
<damo22>if it still breaks, the commit that broke it is in the remaining half
<almuhs>yes
<damo22>i found one that works
<damo22>about half way down
<damo22>* 25a3748b (HEAD) Fix task_info for TASK_THREAD_TIMES_INFO.
<damo22>so the broken one is more recent than that
<almuhs>this one works for me: 6b6d49a71f016a4ea60c4ae63af8dfd8c76f55ba
<damo22>thats older than mine
<damo22>we already know that works
<damo22>try one above mine in the log
<damo22>can you build this one ee2e9072
<almuhs>ok, i will try it
<damo22>i found another that works
<almuhs>which?
<damo22>keep going
<damo22>* d9c47d8e (tag: works2) x86_64: push user's VM_MAX_ADDRESS
<almuhs>this works?
<damo22>yes
<almuhs>it's just before the APIC fix commit
<almuhs>54a4ca27230ae85bf75804d5d581ebf68e620cee
<damo22>* 377a9387 (HEAD, tag: works3) kdb: Add showing new 64bit registers
<damo22>this works too
<damo22>how did you go with ee2e9072
<almuhs>wait, i was checking other commits
<almuhs>this works 54a4ca27230ae85bf75804d5d581ebf68e620cee
<damo22>theres no point checking that one, ive already covered it
<almuhs>now compiling ee2e9072
<damo22>its one of these:
<damo22>* ed7f24de (origin/master, origin/HEAD) Fix copying in MACH_PORT_DEAD on x86_64
<damo22>* 5e597575 x86_64: add a critical section on entry and exit from syscall/sysret
<damo22>* 54d025d4 x86_64: use solid intstack already during bootstrap
<damo22>* 2e6b257f copyinmsg: allow for the last message element to have msgt_number = 0.
<damo22>* f09a574a intr: Fix crash on irq notification port destruction
<damo22>* ee2e9072 x86_64: add 64-bit registers when dumping thread state
<damo22>* 4677606b x86_64: enable code for managing interrupts
<damo22>* d972c01c pmap: only map lower BIOS memory 1:1 when using Linux drivers
<almuhs>ed7f24de (origin/master, origin/HEAD) Fix copying in MACH_PORT_DEAD on x86_64 : failed
<almuhs>correction: the failure is in ee2e9072
<damo22>you need to find a commit that works followed by a commit that failed
<almuhs>yes
<almuhs>now i have a commit that fails
<damo22>which one
<almuhs> ee2e9072
<almuhs> ee2e9072 x86_64: add 64-bit registers when dumping thread state
<damo22>ok so its one of these:
<damo22>* ee2e9072 x86_64: add 64-bit registers when dumping thread state
<damo22>* 4677606b x86_64: enable code for managing interrupts
<damo22>* d972c01c pmap: only map lower BIOS memory 1:1 when using Linux drivers
<damo22>no wait
<damo22>yes
<damo22>you compile 4677606b
<damo22>and i will compile the next one
<almuhs>ok
<almuhs>4677606b x86_64: enable code for managing interrupts FAILED
<damo22>mine failed too
<damo22>so mine is the culprit
<damo22>this commit broke the build * d972c01c pmap: only map lower BIOS memory 1:1 when using Linux drivers
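The manual search above is a binary search over the commit history. A minimal sketch of it follows, using the commit hashes quoted in this log, ordered oldest to newest. It assumes that 377a9387 (tagged works3) sits directly before d972c01c, and that every commit from d972c01c onward fails to boot; only d972c01c, 4677606b, ee2e9072 and ed7f24de were actually reported as failing here. The function names `first_bad` and `is_bad` are hypothetical, and `is_bad` stands in for "build this commit and try to boot it".

```python
def first_bad(commits, is_bad):
    """Return the first failing commit.

    `commits` is ordered oldest to newest; commits[0] must be known good
    and commits[-1] known bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: last known good, hi: first known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the breakage is at mid or earlier
        else:
            lo = mid  # the breakage is after mid
    return commits[hi]


# Oldest to newest, as quoted in the log.
commits = [
    "377a9387",  # kdb: Add showing new 64bit registers (works3)
    "d972c01c",  # pmap: only map lower BIOS memory 1:1 when using Linux drivers
    "4677606b",  # x86_64: enable code for managing interrupts
    "ee2e9072",  # x86_64: add 64-bit registers when dumping thread state
    "f09a574a",  # intr: Fix crash on irq notification port destruction
    "2e6b257f",  # copyinmsg: allow for the last message element ...
    "54d025d4",  # x86_64: use solid intstack already during bootstrap
    "5e597575",  # x86_64: add a critical section on entry and exit ...
    "ed7f24de",  # Fix copying in MACH_PORT_DEAD on x86_64
]


def is_bad(commit):
    # Assumption: everything from d972c01c onward fails to boot.
    return commits.index(commit) >= commits.index("d972c01c")


culprit = first_bad(commits, is_bad)
print(culprit)
```

git ships this procedure as `git bisect`, which picks the midpoint commit automatically after each good or bad verdict.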
<almuhs>maybe this code has broken the page mapping
<almuhs>paging needs a linear mapping before it is enabled, do you remember?
<almuhs>the temporary mapping
<damo22>i think the temporary mapping is needed for acpi
<damo22>not just linux drivers
<damo22>so indeed this is broken
<almuhs>and maybe for paging
<damo22>since the tables are sitting in low memory
<almuhs>true
<damo22>youpi: this commit broke the build for ACPI tables * d972c01c pmap: only map lower BIOS memory 1:1 when using Linux drivers
<almuhs>then, to take notes, we have to ask for a revert of
<almuhs>d972c01c pmap: only map lower BIOS memory 1:1 when using Linux drivers
<almuhs>and undo the cast changes in 54a4ca27230ae85bf75804d5d581ebf68e620cee
<damo22>thanks for your help almuhs
<almuhs>we are in the same team ;)
<damo22>:)
<almuhs>while we have no time to debug the scheduling problems, we have to take care that other work doesn't break ours
<youpi>we have to discuss on the list, rather :)
<youpi>the commit probably has some goal
<youpi>we can't just walk back blindly
<almuhs>true
<damo22>ok i can send an email to the list with our findings
<almuhs>maybe there is another way to achieve the goal of these commits
<almuhs>send mail and we try to find another solution
<damo22>ok done
<almuhs>thanks
<almuhs>now i'm improving my kernel upload script to avoid booting the machine just to copy the kernel file
<almuhs>ready https://github.com/AlmuHS/gnumach_dev_scripts/blob/main/upload_kernel.sh
<damo22>why not just compile inside hurd
<damo22>install and reboot
<almuhs>it's even slower
<almuhs>and then i would have to sync my sources from host to hurd each time i change the code
<damo22>my sources are inside hurd
<damo22>on a different partition
<almuhs>but, do you edit code inside hurd?
<damo22>yes
<damo22>all my hurd development is done inside hurd
<almuhs>i prefer to use GUI editors in Linux
<damo22>ok
<almuhs>i'm going to sleep. 3:13 AM in Spain
<damo22>ouch
<almuhs>bye
<damo22>see ya