IRC channel logs
2025-07-13.log
<almuhs>And I want to know if could be possible using it like
<sneek>Welcome back almuhs, you have 1 message!
<sneek>almuhs, youpi says: the normal build already produces libtrivfs.a and the debian package already installs it
<damo22>apt install ./myhttp/path/to/web/package
<almuhs>but the settrans command for httpfs requires a website
<damo22>you can settrans the base host path
<almuhs>if we can download the deb using a translator, apt must not know that the deb file is downloaded
<almuhs>the problem is that the httpfs translator doesn't support https
<damo22>there is little point using apt to do that
<damo22>since you can already use a local mirror
<almuhs>but dpkg and apt, if they receive a deb file path as argument, simply install it (and apt checks dependencies)
<damo22>if httpfs doesnt support https then you can only fetch from http hosts
<damo22>almuhs: smp is mostly fixed now, but theres a new vm bug
<almuhs>and, can you fix the rumpdisk detection issue?
<almuhs>i had the hope that the IRQ issue with rumpnet was the same one that crashes rumpdisk
<damo22>theres only two bugs i know of remaining, the rumpdisk detection issue and this new vm bug
<almuhs>and, what is the matter with this new bug?
<damo22>i dont know enough about the vm subsystem to tell you
<damo22>but when i compile heavy on all cores, gnumach backtraces
<damo22>yesterday all the known irq issues have been fixed and merged in
<almuhs>there was a remaining bug some time ago that crashed gnumach with certain combinations of NCPUS in compilation and -smp in Qemu
<almuhs>depending on the NCPUS value in gnumach compilation, and the number of cpus set in qemu, gnumach crashed
<damo22>no i think the problem was that if you boot with -smp 2 it would hang
<almuhs>with some combinations, gnumach crashed and showed the debug console
<damo22>NCPUS must be greater or equal to the -smp number
<damo22>ie, you cant run gnumach on a machine with more cores than the number it was compiled with
<almuhs>if NCPUS is less than -smp, gnumach simply must not be able to detect all cpus
<almuhs>but must not crash (and in most cases it doesn't crash in this case)
<almuhs>the cpu searching loop finishes when i = NCPUS
<damo22>i think there is an assert somewhere
<damo22>but its not an interesting limitation
<almuhs>when I implemented this, I set that the cpu search will stop when it doesn't find more cpus, or when it reaches the NCPUS value
<damo22>gnumach supports 256 cores maximum
<damo22>maybe we should always compile smp with --enable-ncpus=256
<almuhs>because it is the max quantity supported by xAPIC
<almuhs>this was my first implementation ;) , hardcoding -mach-cpus=256
<damo22>then we dont have strange limitations
<almuhs>but gnumach usually doesn't crash in this situation
<almuhs>simply, if you have -smp 8 and NCPUS=4, gnumach only will detect 4 cpus
<damo22>its not a useful restriction i mean
<damo22>if you have 8 cores, you probably want to use them all
<almuhs>youpi told that assuming that the MAX is always 256 is inefficient
<damo22>yeah it allocates more memory for stacks etc
<almuhs>currently, the gnumach-smp offered in repositories is compiled with NCPUS=8
<almuhs>there are Xeon with many more than this, but currently we are not testing Hurd on Xeon
<almuhs>i have the NCPUS as an argument in my compilation script
<almuhs>so from time to time i test with different combinations
<almuhs>i remember a different type of crash, in which gnumach shows a backtrace
<damo22>its not a crash, rumpdisk is still running
<almuhs>i don't remember, i will have to repeat the test. But it was a gnumach crash
<almuhs>exec translator was an old bug, which was supposedly fixed a few years ago
<almuhs>the exec bug caused the boot to stay frozen in the exec loading
<almuhs>sometimes exec froze, sometimes exec did not freeze
<almuhs>maybe the exec's bug was not fixed properly
<damo22>can you test with init=/bin/bash
<almuhs>i told you that because maybe it's related with your problem
<almuhs>i haven't downloaded your latest patches
<damo22>i mean does setting that command line param allow one to test without the exec problem?
<almuhs>maybe kdb is more useful for that?
<damo22>how do you get a backtrace from a task in kdb
<almuhs>i don't remember, i barely use kdb
<almuhs>"apt upgrade" - done, git pull from upstream - done , compile gnumach with smp - in progress
<damo22>dont forget, your smp will be restricted to one core mostly, with APs in slave pset
<damo22>mine has full smp because i reverted that
<almuhs>yes, because i haven't made the git revert yet
<almuhs>which is the commit's number to revert?
<damo22>you can add remote and fetch from my repo
<almuhs>meanwhile, i've just compiled with NCPUS=16, as an experiment
<damo22>then you can check out my branch
<almuhs>ok, i was confused because "git log" shows the commits in chronological order
<damo22> lol = log --graph --decorate --pretty=oneline --abbrev-commit --all
<almuhs>testing your smp kernel. Compiled with NCPUS=16 and booted with -smp 8 works fine
<almuhs>i notice a little bug with the tty. Ctrl+c was a bit buggy
<damo22>but if you use a lot of cores simultaneously it will crash
<almuhs>now the tty doesn't execute my commands
<damo22>does that give you a shell again?
<almuhs>pruebas@debian-hurd:~$ ps -e | grep netdde
<almuhs>root 551 - S<o 0:04.78 /hurd/netdde
<almuhs>pruebas 873 p0 S 0:00.01 grep netdde
<damo22>wow thats crazy, it doesnt work for me
<damo22>are you sure you compiled with my full smp code
<damo22>i think you have the code from master
<almuhs>i made a git clone of your repository and git checkout full-smp
<damo22>i think netdde works with restricted smp
<almuhs>almu@debian:~/gnumach-sv$ git lol
<almuhs>* cb4823f8 (HEAD -> full-smp, origin/full-smp) Revert "smp: Create AP processor set and put all APs inside it"
<almuhs>* 340c089a configfrag: Enable HW_FOOTPRINT on smp
<almuhs>* 4160c2b8 i386/configfrag: Make --enable-apic the default
<almuhs>* 135fdbc0 (origin/master) ioapic: Add conditional TMR bit in EOI (no-op)
<almuhs>* ea38b460 ioapic: Make it clear that multiple ioapics don't quite work yet
<almuhs>* dfa25cdd irq: make it clear what irq_lock protects
<almuhs>* c6181cdb i386/irq.c: Make irq nesting smp safe
<almuhs>* b7fbb06f i386/smp.c: Change order of waiting for pending ICR
<almuhs>* 6fbf3116 trap: Fix printf format
<almuhs>* b40fadc3 i386 intel read fault fix
<almuhs>* 4687a5ff i386 kern: fix overflow in vm_object_print_part call
<almuhs>* 0bb929fa kdb: Fix printf format warning for phys_addr_t
<almuhs>* 3dd7daf5 i386: trap.c add prototype for handle_double_fault
<almuhs>* 878635ec i386 ldt.c make ldt_fill static
<almuhs>* 87b6b23c ioapic.c: Fix default polarity and trigger mode for irqs
<almuhs>* 608d1e19 tests: also disable stack protector
<almuhs>* 9aecfa65 tests: Also try to use mig as USER_MIG when not cross-building
<damo22>did you run autoreconf -fi and rebuild that one?
<almuhs>and the hilos application shows that the APIC ID is not always 1
<almuhs>the hilos application executes 16 pthreads calling CPUID and showing the APIC ID on screen
<almuhs>but in my installation there is no rumpnet, i think
<damo22>did you install the kernel you built?
<almuhs>the smp from the repository will always show 0 in the hilos app
<damo22>thats why i wrote rumpnet, because netdde wasnt working with smp
<almuhs>i can try a different combination of NCPUS and -smp
<almuhs>but there is a new bug in my case: the tty finally crashed
<damo22>hmm youre right, with -smp 8 netdde works!
<almuhs>the mystery of the bugs which only appear in certain combinations
<almuhs>but now the tty survived after killing the hilos app from ssh
<almuhs>i go to check if the console failure appears in a process which shows a lot of information on screen
<almuhs>rumpdisk temporarily crashes with "ls -R / "
<almuhs>the second attempt of "sudo ls -R /" works fine
<almuhs>then the problem must be in my "hilos" application. Must be a pthread related problem
<almuhs>new testing, booting with -smp 5. Frozen in rumpdisk
<almuhs>-smp 6 works fine. And the Ctrl+C works better with my hilos application
<almuhs>netdde works in this configuration
<almuhs>with NCPUS=4 and -smp 8 the system reboots immediately. This must not be an issue
<almuhs>in previous implementation, in this case gnumach only detects the cpus until NCPUS, ignoring the others. Maybe there is an unnecessary assert() ?
<almuhs>i go to sleep. In some -smp i required many attempts to boot, because rumpdisk freezes
<damo22>sneek: later tell almuhs i changed the init and startup sequence to send to all except self to wake up all cpus, because one by one could not address all lapics, so if you compile gnumach with enable-ncpus < cpus in your system, it will crash because some of the cpus you didnt enumerate will still wake up
<damo22>youpi: when usb fails and hogs irq11 i get this backtrace from outside:
<damo22>#0 ioapic_write (id=0 '\000', reg=38 '&', value=81979) at ../i386/i386at/ioapic.c:166
<damo22>#1 ioapic_write_entry (apic=0, pin=11, e=...) at ../i386/i386at/ioapic.c:188
<damo22>#2 ioapic_irq_eoi (pin=11) at ../i386/i386at/ioapic.c:328
<damo22>#3 0xc104bc40 in interrupt () at ../i386/i386at/interrupt.S:102
<damo22>irq 11 is unmasked at this point and seems like there is irq 11 storm
<damo22> pin 11 0x000000000000c03b dest=0 vec=59 active-hi level fixed physical
<damo22>it seems the protection for the masking is not enough
<damo22>ok so on UP, inside ioapic_irq_eoi() after masking and setting to edge, as soon as the level entry is restored, if the interrupt line is still high it immediately triggers a new interrupt before lapic_eoi has a chance to execute
<damo22>so i think the lapic_eoi needs to be done first
<youpi>damo22: apic doesn't seem to be working when the linux groups are enabled?
<damo22>ive been bypassing linux drivers for ages
<damo22>i think the lapic_eoi indeed needs to be moved before the ioapic eoi
<damo22>because for a level triggered interrupt, after the trigger mode is altered and masked, as soon as the entry is restored it can retrigger before lapic_eoi has a chance to execute
<damo22>then the interrupt gets stuck on
<youpi>I don't know the details, but I would have understood the converse: if you lapic_eoi, you tell that you're ready to get another interrupt, and there you go you have it, so calling it earlier will just let ioapic raise it immediately?
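The NCPUS versus -smp behaviour discussed above (almuhs: the search loop stops at NCPUS; damo22: the startup is now broadcast to all CPUs except self, so CPUs that were never enumerated still wake up) can be modelled roughly as follows. This is only an illustrative sketch: the function names are hypothetical and are not taken from gnumach, and the model just counts CPUs rather than touching any APIC.

```c
#include <assert.h>

/* Hypothetical helper, not gnumach code: number of CPUs the kernel
 * enumerates when the search loop stops either when no more CPUs are
 * found or when the compile-time limit (NCPUS) is reached. */
int enumerated_cpus(int present, int ncpus_limit)
{
    assert(present >= 1 && ncpus_limit >= 1);
    return present < ncpus_limit ? present : ncpus_limit;
}

/* CPUs that wake up after a broadcast "all except self" startup but
 * were never enumerated, so the kernel has no per-CPU state for them. */
int stray_cpus_after_broadcast(int present, int ncpus_limit)
{
    return present - enumerated_cpus(present, ncpus_limit);
}

/* Under this model a configuration is safe only when NCPUS is greater
 * than or equal to the number of CPUs present (the -smp value). */
int config_is_safe(int present, int ncpus_limit)
{
    return stray_cpus_after_broadcast(present, ncpus_limit) == 0;
}
```

With -smp 8 and NCPUS=4 the model reports 4 stray CPUs, which matches the case damo22 says will crash; NCPUS=16 with -smp 8 leaves none.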
<damo22>well maybe, but it should at least be protected with ioapic_lock?
<youpi>ioapic_lock is only about protecting the hardware access
<youpi>to avoid several cpus mangling it at the same time
<damo22>i was getting 3M interrupts on irq 11
<damo22>as soon as it unmasked, it raised again
<damo22>not unmasked, but restored the original level mode
<damo22>so as soon as the eflag was restored it raised
<damo22>ioapic_lock is also an eflag protector because its an irq lock
<youpi>but inside the interrupt handler we're supposed to have IF cleared, aren't we?
<youpi>(that just pushes the issue later on iret, though)
<damo22>i get two __disable_irq calls per ioapic_irq_eoi calls
<youpi>if you are single-stepping in qemu+gdb, I wouldn't be surprised that they get interrupt masking wrong
<youpi>apic makes rumpusbdisk way faster, 500KB/s instead of 100KB/s
<youpi>(that's still quite slow, though)
<damo22>dd if=/dev/ud0 of=/dev/null bs=1M count=100 takes about 10 seconds
<youpi>-usb -device usb-storage,drive=stick
<damo22> -drive if=none,id=usbstick,format=raw,file=testzero.dd \
<damo22> -device usb-storage,drive=usbstick \
<youpi>with xhci I'm getting 18MB/s indeed
<youpi>no, it just emulates a usb bus
<youpi>with linux I get 1MB/s with the same uhci setup
<youpi>so it's not that bad comparatively, actually
<youpi>true tests should be done on actual hardware :)
<youpi>ah, xhci performance actually *lowers* with apic, here, from 25MB/s to 18MB/s
<damo22>there is something still fishy with apic eoi
<damo22>so we are hitting the code path with !has_irq_specific_eoi on qemu
<damo22>we are calling ioapic_irq_eoi before calling the handler, and interrupts are not disabled
<youpi>don't we mask the interrupt on the ioapic before eoi?
<damo22>so i think its possible a level triggered line that stays high can keep triggering interrupts before any of them are handled
<youpi>it's done by the processor on interrupt entry
<youpi>otherwise you wouldn't be able to cope with a device overflowing with irqs
<damo22>if i run two drivers on the same device, which i probably shouldnt, the irq masking gets out of sync
<youpi>that's not supposed to happen: the irq code waits for ack from both drivers before unmasking
<damo22>yeah, somehow the counter skips past 1 on disable so never gets to mask it
<damo22>or it gets unmasked and never masked again
<damo22>can we make disable mask unconditionally?
<damo22>why do we need to be frugal with the masking
<damo22>like only doing it the first time
<youpi>I mean, ducktape-coding is no good
<youpi>if there is something wrong, better see it wrong
<damo22>i saw 3M interrupt nesting levels
<damo22>you said that irq_lock is protecting ndisabled, but what protects the act of masking?
<youpi>the decision of masking is the same
<youpi>the act of masking is the ioapic lock
<almuhs>damo22: i've just removed the -M q35 flag in my qemu script, and now i found a network problem when i boot with less than 8 cpus. Maybe it is the problem that you referred to
<sneek>Welcome back almuhs, you have 1 message!
<sneek>almuhs, damo22 says: i changed the init and startup sequence to send to all except self to wake up all cpus, because one by one could not address all lapics, so if you compile gnumach with enable-ncpus < cpus in your system, it will crash because some of the cpus you didnt enumerate will still wake up
<almuhs>this is the error. But i'm not sure if the problem is with netdde or with rumpnet
<almuhs>this lock shows in -smp 2,3,4 and 6
<almuhs>Pending to test in a real machine
<almuhs>my compilation options are these: ../configure --host=i686-gnu CC='gcc -m32' LD='ld -melf_i386' --enable-apic --enable-kdb --enable-ncpus=$NUM_CPUS --disable-linux-groups
<almuhs>compiling with ncpus=16, and booting with -smp 10, the system boots mostly fine (after two attempts)
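The counted masking scheme that damo22 and youpi discuss above (mask only on the first disable, unmask once every driver has acked) can be sketched as below. This is a single-threaded model under stated assumptions: the struct and function names are hypothetical, only the counter name ndisabled comes from the log, and the locking youpi mentions (irq_lock for the counter, the ioapic lock for the masking itself) is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one shared IRQ line; names are illustrative
 * and not taken from gnumach. */
struct irq_line {
    int  ndisabled;  /* outstanding disable requests */
    bool masked;     /* state of the simulated I/O APIC pin */
};

/* Mask the pin only on the 0 -> 1 transition of the counter. */
void irq_disable(struct irq_line *l)
{
    if (l->ndisabled++ == 0)
        l->masked = true;
}

/* Unmask only when the last outstanding disable is released. */
void irq_enable(struct irq_line *l)
{
    assert(l->ndisabled > 0);
    if (--l->ndisabled == 0)
        l->masked = false;
}
```

In this model, if two disable calls could interleave between the test and the increment, the counter would move past 1 without the pin ever being masked, which is the kind of desynchronisation damo22 describes; that is why both the counter update and the masking need to be covered by locks in a real kernel.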