IRC channel logs

2023-09-29.log


<gnucode>oh...so I met my old college professor today. He has a lab with his minimal OS. He actually gave me a short tour...And he recommended that I could build my own lab with the Hurd...
<gnucode>He said something about making changes to the Hurd, then power cycling the machine...
<gnucode>He is of the opinion that OS development be done with kprintf
<damo22>sometimes i put printf("W"); into gnumach and reboot, just to see when and how often it executes a code path
<gnucode>hmmm. That's an interesting idea.
<damo22>it's basic debugging
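A minimal userspace sketch of the printf("W") trick, assuming you also want a hit count alongside the marker characters. TRACE_MARK and maybe_slow_path are hypothetical names for illustration, not gnumach API; inside the kernel you would use gnumach's own printf instead of stdio.

```c
#include <stdio.h>

/* Hypothetical counter: how many times any TRACE_MARK was reached. */
static unsigned long trace_hits;

/* Print one marker character and count the hit. The fflush makes the
 * marker visible even if the program dies right afterwards. */
#define TRACE_MARK(ch)       \
    do {                     \
        trace_hits++;        \
        putchar(ch);         \
        fflush(stdout);      \
    } while (0)

/* Example code path we want to watch. */
static int maybe_slow_path(int x)
{
    if (x % 2 == 0) {
        TRACE_MARK('W');   /* shows when and how often this branch runs */
        return x / 2;
    }
    return x;
}
```

Counting the W's on the console (or reading trace_hits) tells you how often the branch ran between two points of interest.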
<damo22> 28 f58f6660 (((/bin/sh(119)))) [2]
<damo22> 29 f58f63a8 grep(158) [1]
<damo22> 30 f58f62c0 sed(159) [1]
<damo22>db{0}> show all runqs
<damo22>Processor set runq: count(0) low(25)
<damo22>Processor #0 runq: count(0) low(32)
<damo22>Processor #1 runq: count(0) low(32)
<damo22>runqs are empty even though there are plenty of threads
<damo22>so nothing is running, it's just idling
<damo22>i think i fixed smp slowness, except i am getting a general protection fault returning from an interrupt
<damo22>what does it mean when iret faults?
<damo22>root@zamhurd:~# nproc
<damo22>6
<janneke>ooh, great -- hoping you find the segfault soon
<damo22>the stack pointer gets set to 0
<damo22>how do i debug with gdb and get a back trace without kdb getting in the way?
<damo22>i guess i can compile gnumach without kdb
<damo22>ok so it was about to call iret, and got a IPI interrupt... is that bad?
<damo22>it also had a level interrupt from ethernet
<youpi>damo22: that's supposed to happen fine
<youpi>as in: IF is supposed to be clear before calling iret, so the interrupt happens *after* the iret
<damo22>the IF was set just before the iret executed
<damo22>i think that caused a fault
<youpi>with a lot of IPIs you can end up with cascaded interrupts that fill the stack
<youpi>why is the IF set just before iret is executed?
<damo22>not set "just as", i meant at the point where it wanted to iret, it was still set
<damo22>is it possible that th->processor_set is not set?
<youpi>it's null in the thread_template
<youpi>and it's set to null in pset_remove_thread
<damo22>!!!
<youpi>so that's probably expected it can be null
<damo22>that would be causing it
<youpi>but possibly it's always set to non-null before unlocking the thread
<youpi>thread_deallocate is the only caller of pset_remove_thread
<damo22>can i just use default_pset instead of th->processor_set ? we dont need it
<youpi>and thread_create initializes the pset from the parent_task
<youpi>so possibly the pset gets passed from tasks to tasks to threads
<youpi>starting from the first task
<youpi>well, for a start you can put an assert to check whether it does happen to be null
<damo22>ok
<youpi>not respecting the pset would mean not respecting the thread binding
<youpi>we probably don't want that
<damo22>i think we only have one processor_set
<youpi>yes but later people may create others
<youpi>we don't want to break that
<damo22>ok
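Following youpi's suggestion, a sketch of asserting on a NULL processor set instead of silently falling back to default_pset, which would ignore any thread binding. The structs are stripped-down stand-ins, not gnumach's real definitions, and thread_pset() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types; the real gnumach structs carry much more state. */
struct processor_set {
    int id;
};

struct thread {
    /* NULL in thread_template, and set back to NULL by pset_remove_thread() */
    struct processor_set *processor_set;
};

/* Hypothetical accessor: fail loudly if a thread that is about to be
 * scheduled has no processor set, rather than substituting default_pset. */
static struct processor_set *thread_pset(const struct thread *th)
{
    assert(th->processor_set != NULL);
    return th->processor_set;
}
```

If the assert never fires during boot, the NULL pset can be ruled out as the cause; if it does, the backtrace shows which path reached the scheduler with an unset pset.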
<youpi>gotta go
<damo22>thanks
<solid_black>hello!
<damo22>hi
<solid_black>hi damo22!
<damo22>did you see my latest patch
<damo22>smp almost boots correctly
<solid_black>I did see that you sent a patch, and it sounds great! but I haven't looked too closely
<solid_black>perhaps you could explain it to me?
<solid_black>why is dispatching to an idle processor bad?
<damo22>the run queues are almost empty and the code that schedules directly to idle processors seems to slow it down
<solid_black>why does it slow it down?
<damo22>so while it's deciding if it should schedule onto an idle processor it could just put the thread on a run queue and the idle threads will detect it
<damo22>i think thats why
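To make that trade-off concrete: a toy run queue with a count and a low hint, in the spirit of the count(...) and low(...) fields printed by "show all runqs" above. This is a simplified model, not gnumach's scheduler code; NRQS = 50, runq_enqueue() and runq_dequeue() are assumptions for this sketch. An idle processor only has to poll count, so plain enqueueing is enough for it to pick the thread up.

```c
#include <stdbool.h>

#define NRQS 50  /* number of priority levels; value assumed for this sketch */

/* Toy run queue: per-priority counts instead of thread lists.
 * Lower index = higher priority, as in Mach. */
struct run_queue {
    int nthreads[NRQS];  /* threads waiting at each priority */
    int count;           /* total threads queued */
    int low;             /* hint: no index below this is non-empty */
};

static void runq_init(struct run_queue *rq)
{
    for (int i = 0; i < NRQS; i++)
        rq->nthreads[i] = 0;
    rq->count = 0;
    rq->low = NRQS;      /* nothing queued yet */
}

/* Put a thread of priority pri on the queue and update the hint. */
static void runq_enqueue(struct run_queue *rq, int pri)
{
    rq->nthreads[pri]++;
    rq->count++;
    if (pri < rq->low)
        rq->low = pri;
}

/* What an idle loop would poll: is there anything to run? */
static bool runq_has_work(const struct run_queue *rq)
{
    return rq->count > 0;
}

/* Remove the highest-priority thread; returns its priority, or -1 if empty. */
static int runq_dequeue(struct run_queue *rq)
{
    if (rq->count == 0)
        return -1;
    int pri = rq->low;
    while (rq->nthreads[pri] == 0)   /* the hint may be stale; scan forward */
        pri++;
    rq->nthreads[pri]--;
    rq->count--;
    rq->low = pri;                   /* everything below pri is empty */
    return pri;
}
```

Because low is only a hint, an empty queue can report any low value (such as the 32 or 25 seen above); count is the field that tells you whether work is waiting.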
<solid_black>no really, why does it take that long to check whether it can be scheduled onto an idle processor?
<damo22>i think it takes locks that are expensive perhaps
<damo22>or there is a side effect of not putting the thread actually on a runq
<solid_black>maybe; but if that's the case everything everywhere would be super slow
<damo22>it was
<solid_black>is there a way to profile gnumach and get like a flamegraph?
<damo22>idk
<damo22>but there is a remaining bug
<damo22>it calls iret and explodes during boot
<damo22>general protection fault
<damo22>hmm or page fault
<damo22>login: Kernel Page fault trap, eip 0x46, code 0, cr2 46
<damo22>kernel: Page fault (14), code=0
<damo22>Stopped at 0x46:Kernel Page fault trap, eip 0xc1034d35, code 0, cr2 46
<damo22> Caught Page fault (14), code = 0, pc = c1034d35
<damo22>Trouble printing location 0x46.
<damo22>it seems like the net_write() call caused the cpu to jump to 0
<damo22> https://paste.debian.net/plain/1293485
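For reading traps like the one above: a small decoder for the x86 page-fault error code, using the low bits defined in the Intel SDM (P, W/R, U/S, RSVD, I/D). Code 0 decodes as a read of a not-present page from supervisor mode, which fits the kernel ending up at a near-NULL address such as 0x46. The I/D bit is only reported when no-execute or SMEP is in use, so a 0 there does not rule out an instruction fetch. pf_describe() and its output format are illustrative.

```c
#include <stdio.h>

/* Page-fault (#PF, vector 14) error code bits, per the Intel SDM. */
#define PF_P     (1u << 0)  /* 0: page not present, 1: protection violation */
#define PF_WR    (1u << 1)  /* 0: read, 1: write */
#define PF_US    (1u << 2)  /* 0: supervisor mode, 1: user mode */
#define PF_RSVD  (1u << 3)  /* reserved bit set in a paging structure */
#define PF_ID    (1u << 4)  /* instruction fetch (only reported with NX or SMEP) */

/* Write a human-readable description of a #PF error code into buf. */
static void pf_describe(unsigned code, char *buf, size_t len)
{
    snprintf(buf, len, "%s, %s, %s%s%s",
             (code & PF_P) ? "protection violation" : "not-present page",
             (code & PF_WR) ? "write" : "read",
             (code & PF_US) ? "user" : "supervisor",
             (code & PF_RSVD) ? ", reserved bit" : "",
             (code & PF_ID) ? ", instruction fetch" : "");
}
```

For the trap above, pf_describe(0, ...) yields "not-present page, read, supervisor".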
<gnucode>morning all!
<solid_black>hello gnucode!
<solid_black>I saw you recreated the event once again? why? are you sure everybody got it?
<gnucode>I edited the event. I just changed the text that it said. Then google asked if I wanted to resend the updated info.
<gnucode>Everybody on the Hurd side has accepted the invitation.
<solid_black>but according to the message I got from Google, you actually cancelled it, and then created a new one
<gnucode>I did accidentally press the delete event button, then pressed undo.
<gnucode>as far as I can tell it's still the same event.
<gnucode>sorry for the confusion.
<solid_black>*I* can handle it, that's not a concern
<gnucode>As smart as you are, I bet Kent is smart too. :)
<solid_black>Kent is much smarter than me no doubt, but we're already stretching his willingness to spend his time on this
<solid_black>also I collected a list of things I'd like to discuss, not sure an hour will be enough
<solid_black>we'll see though
<gnucode>I'm going to go work out for a half hour now. And you might be right. But it is also possible that he will be very cordial.
<solid_black>also, I did enable PipeWire media support in Firefox, but it still doesn't capture video from my front/selfie camera
<solid_black>so you won't be able to see me
<gnucode>last thing...I have to use a windows computer to join the meeting. My linux machine is not accepting my camera...
<gnucode>and I don't know how to record video on this windows computer that I do not own.
<solid_black>ohhhh that's an awesome idea, I'll reboot into windows
<solid_black>how did I not think of that myself
<gnucode>hmm. I'll think about that after I work out. gotta get started.
<solid_black>sure, go ahead, ping me when you get back
<solid_black>going proprietary all the way, aren't we
<solid_black>so much for a GNU project :D
<nikolar>kek
<wleslie>kent = kent mcleod? that could be entertaining
<gnucode>wleslie: bcachefs author. kent overstreet
<gnucode>nikolar: did you get the invite?
<nikolar>i did
<gnucode>good.
<nikolar>yup, all good
<gnucode>sweet action.
<nikolar>yeah
<xelxebar>Man, that sounds like a cool conversation to be in on!
<gnucode>xelxebar: We are planning on recording said conversation.
<gnucode>I don't want to invite everyone, because I am trying to avoid it turning into a circus. :)
<wleslie>front page of website mentions capnproto, good sign
<gnucode>solid_black I am back now. eating bkfast. will shower soon.
<gnucode>nikolar: if you wanna see a young Einstein, you could log into the meeting room now.
<gnucode>that would just let us double check things.
<gnucode>well it looks like you are on.
<solid_black>so I rebooted into Windows, let Windows Update do its thing, and now I can no longer boot into GNU/Linux
<wleslie>shame it's GPLv2 only, like btrfs
<gnucode>solid_black: that's super annoying!
<gnucode>hahaha
<solid_black>guess I'll have to figure it out after the meeting
<solid_black>the camera works nicely in Edge though
<solid_black>is anyone in the room currently? should I try joining?
<gnucode>nikolar: and I are
<gnucode>come on in.
<gnucode>solid_black: what distro do you use?
<solid_black>Fedora (with linux-surface stuff) on the host, typically Debian in VMs
<gnucode>gotcha. I'm a guix system fan myself. and OpenBSD.
<gnucode>-
<damo22>is the meeting now? oh man
<solid_black>no, we're just testing our setups
<damo22>it's 11:30pm here
<damo22>i think i will pass
<solid_black>it's still 1.5 hours until the actual meeting starts
<damo22>ok
<wleslie>in a week you'll be in DST and you'll have a real hard time of international meetings ^___^
<damo22>yep
<damo22>us aussies have a hard time with international meetings
<janneke>gnucode: most probably i'll be tied up this afternoon, i'll see if i can find a moment to attend...
<xelxebar>gnucode: Cool. Looking forward to the video!
<gnucode>janneke: no worries
<janneke>ACTION has a special day; their daughter returns home for a bit after 2 months of schooling in norway
<gnucode>awesome!
<gnucode>have fun!
<janneke>ty!
<janneke> https://hostux.social/@fsf/111144420428636640
<janneke>(fresh mentioning of old news, still nice)
<solid_black>the meeting is / will be at https://meet.google.com/zzt-gnez-wvf, if anyone else wants to join
<wleslie>you had a good hack though, I'm glad they brought the holiday forward
<wleslie>fridays off always seem more productive
<damo22>me? yes
<damo22>im trying to fix the last known bug with gnumach smp
<wleslie>I saw; interrupts have been perplexing me for the last couple of months too
<damo22>maybe i should try compiling with -O0
<wleslie>it's fun that it says /page fault/. Are you getting vector 0xd or 0xe?
<damo22>i dont know
<damo22>i thought it was a general prot fault
<damo22>because it happens when iret is called
<damo22>but maybe it's randomly hitting that or a page fault after
<damo22>when it tries to push something on the zero stack
<wleslie>could it have happened while handling the interrupt, and now that you're returning it can proceed?
<damo22>something happened on the edge of returning from an interrupt, iret caused a general prot fault
<damo22>the iret instruction itself
<wleslie>I mean, it's possible that the interrupt occurred earlier, but it was masked, right? 
<damo22>i remember reading on osdev that if interrupt flag is set before iret is called, you can get a general prot fault
<damo22>but i dont know why
<damo22>during an interrupt
<wleslie>you mean once you handle one interrupt, if there's a pending interrupt, you'll hit it when you iret?
<wleslie>did you clear the previous interrupt?
<wleslie>this is an ipi via the lapic?
<damo22>yes this is an ipi
<damo22>there is a comment in the old code that says you should call the EOI before handling the interrupt so it can occur again
<damo22>for pmap update ipi
<damo22>seems like 0x58 level interrupt is getting stuck
<wleslie>I don't have anything on how to ack an ipi (haven't gotten that far). I do have a note saying that you can take exceptions on IRET if someone loads a segment selector with a nonsense value.
<damo22>hmm maybe the initial value of gs is garbage
<wleslie>segment selectors are used to hold thread-local storage on gnu systems, and it's possible to remove those pages from the pmap I guess
<damo22>it does not use gs
<damo22>but we use gs to hold percpu area
<damo22>maybe when it restores gs, gs has a nonsense value?
<wleslie>it could be set to nonsense by the user, but I'm not sure why anyone would be messing with it outside of glibc
<wleslie>we get GP if the code segment is bogus, according to the note I have
<damo22>im pretty sure we save gs value upon entering the kernel and restore it on exit
<damo22>and in between, we set it to 0x68
<damo22>which is set up by the gdt
<wleslie>are you looking at i386at/interrupt.S ?
<damo22>locore.S
<wleslie>looks sensible
<damo22>for the first time i was able to get a shell and ssh to -smp 2
<damo22>is there a simple command i can use to stress 2 cores?
<damo22>without creating files on my disk
<janneke>stress -c2?
<janneke>hmm, stress is not a GNU tool, make that
<gnucode>hello!
<janneke>stress -c 2
<damo22>i dont have that
<janneke>apt install stress?
<janneke>(it _might_ be linux specific, dunno!)
<damo22>ah yes
<janneke>meanwhile, /me tries "guix shell stress -- stress -c 2" in a childhurd
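If stress is not at hand, a small C program does the same job without creating files: it starts N busy-looping threads for a fixed wall-clock time. This is a sketch using POSIX threads and clock_gettime(CLOCK_MONOTONIC); burn_cpus() is a hypothetical name, and unlike stress it only generates CPU load, not memory or I/O load.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Seconds from the monotonic clock, as a double. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Each worker spins until the deadline passed in arg. */
static void *spin(void *arg)
{
    double deadline = *(const double *)arg;
    volatile unsigned long x = 0;   /* volatile so the loop is not optimized away */
    while (now_seconds() < deadline)
        x++;
    return NULL;
}

/* Keep nthreads threads busy for the given number of seconds.
 * Returns the number of threads that were started and joined. */
static int burn_cpus(int nthreads, double seconds)
{
    enum { MAX_THREADS = 64 };
    pthread_t tids[MAX_THREADS];
    double deadline = now_seconds() + seconds;
    int started = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&tids[started], NULL, spin, &deadline) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);   /* deadline stays valid until all joins */
    return started;
}
```

Calling burn_cpus(2, 60.0) from main keeps two cores busy for a minute, roughly what "stress -c 2" does; build with -pthread.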
<damo22>damn, the ethernet level triggered interrupt is stuck
<damo22>maybe that is an ioapic specific issue
<wleslie>goodnight. hope you get some good sleep
<wleslie>ACTION <- sleep(28800)
<damo22>Explicit EOI is only supported for IOAPIC version 0x20
<janneke>yep seems to work
<janneke>(in my no-smp childhurd)
<damo22>qemu is emulating version 0x11 of ioapic
<gnucode>someone wants to join the call
<gnucode>sorry...a little late
<gnucode>nikolar: are you trying to join?
<nikolar>yeah i am in
<nikolar>thanks
<gnucode>yup
<nikolar>I am on my phone and can't reply on call
<gnucode>woo hoo!
<gnucode>that was an awesome chat!
<gnucode>Gooberpatrol66: you have the recording I believe. thanks again for that!
<Gooberpatrol66>shit, there's a loud grinding noise in the recording
<Gooberpatrol66>i have a broken laptop fan, it must be from that
<Gooberpatrol66>you guys didn't hear that through my mic?
<Gooberpatrol66>also my video and nikolar's comments are cut out of the screen, sorry
<nikolar>Nope
<gnucode>that's totally ok.
<gnucode>We did not hear the loud grinding noise at all.
<gnucode>honestly a video, even with poor sound, is better than nothing.
<gnucode>basic lessons learned:
<gnucode>90% of bcachefs runs in userspace
<gnucode>Kent really likes rust, and encourages us to use Rust when writing new filesystems.
<gnucode>Kent believes that many of linux's libraries could and should be modified to run in userspace.
<gnucode>and then we could use those libraries in userspace.
<gnucode>on the hurd.
<Gooberpatrol66>him trying to get all of linux VFS running through fuse is cool, that makes a fuse translator an even better sell
<gnucode>I believe he said that linux's hash tables could mostly already be used on the Hurd right now in userspace.
<gnucode>true. It was nice that he was very laid back.
<gnucode>solid_black definitely helped on asking the technical questions.
<gnucode>Gooberpatrol66: had some good questions too!
<gnucode>Gooberpatrol66: thinks that we should host Linus Torvalds next and restart the age old debate of microkernels vs. monolithic kernels!
<Gooberpatrol66>i will avenge tanenbaum
<gnucode>Kent also invited anyone to help out with bcachefs. Apparently he does a lot of mentoring for people wanting to become developers.
<gnucode>Gooberpatrol66: hahaha!
<gnucode>youpi: provided that I have your blessing to invite other cool software people to talk to the Hurd people...
<gnucode>would you like to join in these talks? What days and times work for you?
<gnucode>also bcachefs is licensed GPLv2. google owns the copyright on much of bcachefs' code. So it will most likely stay GPLv2.
<Arsen>that's fine, if it's 2+
<Gooberpatrol66>the video has been uploaded
<Gooberpatrol66> https://youtu.be/bcWsrYvc5Fg
<Gooberpatrol66> https://yewtu.be/watch?v=bcWsrYvc5Fg
<gnucode>wooo hoo!
<gnucode>people are already teasing me about my "not trying to smile face".
<gnucode>at work
<gnucode>Gooberpatrol66: can you put a link to Kent's patreon on the youtube description
<Gooberpatrol66>google needs to verify my face before i can put urls in the description, which takes 24hrs
<damo22>what is vector 0 for idt?
<damo22>some kind of fault?
<damo22>seems to be divide by zero
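For quick lookups during kdb sessions like these: the x86 exception vectors 0 to 19 as named in the Intel SDM. Vector 0 is the divide error, while 0xd (13) and 0xe (14) are the general protection and page faults discussed earlier. Vectors 32 and up are external interrupts and IPIs. The table and x86_vector_name() are just for illustration.

```c
#include <stddef.h>

/* x86 exception vectors 0-19 as named in the Intel SDM. */
static const char *const x86_exception_names[] = {
    "#DE divide error",                  /* 0 */
    "#DB debug",                         /* 1 */
    "NMI",                               /* 2 */
    "#BP breakpoint",                    /* 3 */
    "#OF overflow",                      /* 4 */
    "#BR BOUND range exceeded",          /* 5 */
    "#UD invalid opcode",                /* 6 */
    "#NM device not available",          /* 7 */
    "#DF double fault",                  /* 8 */
    "coprocessor segment overrun",       /* 9 (not generated by modern CPUs) */
    "#TS invalid TSS",                   /* 10 */
    "#NP segment not present",           /* 11 */
    "#SS stack-segment fault",           /* 12 */
    "#GP general protection",            /* 13 */
    "#PF page fault",                    /* 14 */
    "reserved",                          /* 15 */
    "#MF x87 floating-point error",      /* 16 */
    "#AC alignment check",               /* 17 */
    "#MC machine check",                 /* 18 */
    "#XM SIMD floating-point exception", /* 19 */
};

/* Name of a vector: 20-31 are newer or reserved exceptions,
 * 32 and above are external interrupts or IPIs. */
static const char *x86_vector_name(unsigned vec)
{
    size_t n = sizeof x86_exception_names / sizeof x86_exception_names[0];
    if (vec < n)
        return x86_exception_names[vec];
    return vec < 32 ? "other or reserved exception" : "external interrupt or IPI";
}
```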
<youpi>gnucode: well, the thing is that if the meeting goes bad, that can bring bad press, we don't really want that. Inviting Linus is probably not a good idea, notably
<Gooberpatrol66>yeah that was definitely a joke
<nckx>gnucode: Congrats, it seems like as usual I missed something cool.
<damo22>youpi: it seems that qemu is emulating an old ioapic version that does not support directed EOI per irq
<damo22>this is a problem for level triggered interrupts
<damo22>there is a workaround apparently in linux where you mask the irq then change the trigger mode or something like that, which i did implement in hurd but i dont know if it's working
<damo22>i introduced a check for ioapic version
<damo22>but i did not submit it as a patch yet
<damo22>i am noticing that ethernet interrupt is getting stuck, 0x58 (level)
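A sketch of the version check and the Linux-style fallback described above, under these assumptions: the version is bits 0-7 of the IOAPIC version register (index 0x01), the per-pin EOI register is only used from version 0x20 on, and on older IOAPICs the redirection entry is rewritten masked and edge-triggered, then restored, which clears its Remote IRR bit. Register access is simulated through a callback; ioapic_eoi_pin_fallback() and write_rte_fn are hypothetical names, not gnumach's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Redirection table entry bits (low dword). */
#define IOAPIC_RTE_TRIGGER_LEVEL (1ull << 15)  /* 1 = level, 0 = edge */
#define IOAPIC_RTE_MASKED        (1ull << 16)

/* Bits 0-7 of the IOAPIC version register hold the version. */
static unsigned ioapic_version(uint32_t verreg)
{
    return verreg & 0xff;
}

/* The directed-EOI register only exists from version 0x20 on. */
static bool ioapic_has_eoi_reg(uint32_t verreg)
{
    return ioapic_version(verreg) >= 0x20;
}

/* Hypothetical hook that writes a 64-bit redirection table entry. */
typedef void (*write_rte_fn)(int pin, uint64_t rte, void *ctx);

/* Fallback EOI for old IOAPICs: rewrite the entry masked and
 * edge-triggered (which clears Remote IRR), then restore it. */
static void ioapic_eoi_pin_fallback(int pin, uint64_t rte,
                                    write_rte_fn write_rte, void *ctx)
{
    uint64_t tmp = (rte | IOAPIC_RTE_MASKED) & ~IOAPIC_RTE_TRIGGER_LEVEL;
    write_rte(pin, tmp, ctx);   /* masked, edge */
    write_rte(pin, rte, ctx);   /* original trigger mode and mask */
}
```

With QEMU reporting version 0x11, ioapic_has_eoi_reg() is false, so every level-triggered EOI would go through the fallback path.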
<gnucode>nckx: haha. No worries. I just think it's funny that I have a silly smile on my face for most of the interview.
<gnucode>youpi: sounds good. It was interesting to hear kent encourage us to use various kernel libraries in userspace. But apparently there is a bit of work to make that happen
<gnucode>also most of the linux kernel is GPLv2 only.
<damo22>Kernel General protection trap, eip 0xc100a997, code 0, cr2 c10a8190
<damo22>kernel: General protection (13), code=0
<damo22>Stopped at all_intrs+0xcf: iret
<damo22>The General Protection Fault sets an error code, which is the segment selector index when the exception is segment related. Otherwise, 0.
<damo22>so its not segment related
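A decoder for the selector-format error code described in the Intel SDM: bit 0 EXT, bit 1 IDT, bit 2 TI, bits 3-15 the descriptor index. One caveat worth checking: according to the SDM's IRET exception list, a NULL return CS or SS selector also raises #GP(0), so a zeroed or garbage interrupt frame (for example after the stack pointer ends up at 0) can still produce error code 0. struct sel_error and decode_sel_error() are illustrative names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Selector-format error code pushed by #TS, #NP, #SS and #GP. */
struct sel_error {
    bool ext;        /* bit 0: caused by an event external to the program */
    bool idt;        /* bit 1: index refers to an IDT gate */
    bool ti;         /* bit 2: LDT (1) or GDT (0); meaningful only if idt is 0 */
    unsigned index;  /* bits 3-15: descriptor index */
};

static struct sel_error decode_sel_error(uint32_t code)
{
    struct sel_error e;
    e.ext = code & 1u;
    e.idt = (code >> 1) & 1u;
    e.ti = (code >> 2) & 1u;
    e.index = (code >> 3) & 0x1fffu;
    return e;
}

/* Error code 0: no selector reported (see the IRET caveat above). */
static bool sel_error_is_empty(uint32_t code)
{
    return (code & 0xffffu) == 0;
}
```

For example, the percpu gs selector 0x68 mentioned earlier decodes to GDT index 0xd with EXT, IDT and TI all clear.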