IRC channel logs

<matrix_bridge><Andrius Štikonas> oriansj, fossy: what do you think about https://github.com/fosslinux/live-bootstrap/pull/339 I'm a bit reluctant to have those assembly functions in wrap.c...

<matrix_bridge><Andrius Štikonas> Maybe it's better to put them into M2libc?

<oriansj>stikonas: The syscalls definitely belong in M2libc; the wrap.c program minus the assembly syscalls definitely should go in mescc-tools-extra

<stikonas>do you want to leave a comment there?

<stikonas>or should I write

<oriansj>I can leave a comment

<stikonas>thanks!

<oriansj>done

<Googulator>Is it intentional that the "src" command in builder-hex0 never closes the file descriptor it writes to?

<Googulator>I tried adding a close system call to the end, but it actually breaks things

<stikonas>I think only rickmasters know this...

<stikonas>s/know/knows/

<stikonas>oriansj: thanks again!

<fossy>stikonas: yeah i agree with the functions going in M2libc. I'm not sure i understand the overarching purpose of the PR yet either though. while it does reduce the binary seed, in such a mode bwrap is such a small part of the binary seed cause you have a whole Linux userland underneath, unless i'm misinterpreting the PR?

<fossy>stikonas: also if you could review #337 #340 #341 (but don't merge, I'd like them to be merged at the same time as the big PR) i'd appreciate that. those 3 are relatively smaller PRs

<fossy>but i still want eyes on them

<rickmasters>Googulator: The src command doesn't call close because close was not implemented when the src command was originally written

<rickmasters>Googulator: close was implemented later as part of per-process file descriptors

<rickmasters>Googulator: I probably should have added a close call to src when per-process file descriptors were implemented.

<rickmasters>Googulator: Without the close, the kernel process (process zero) will probably overflow its file descriptor table.

<rickmasters>Googulator: From inspection, it appears that overflowing the table will go into memory that is not used yet, so technically the close does not appear to be necessary

<rickmasters>Googulator: But I don't like that so I think adding a close is a good idea.

<rickmasters>Googulator: Just looking at the source, I can't see why that would not work, so I'm surprised you ran into trouble.

<rickmasters>Googulator: close sets eax to zero but that's normal for a system call and I don't see that causing any trouble.

<Googulator>rickmasters: the effect of adding a close is that hex2-0 never completes linking catm

<Googulator>./x86/artifact/hex2-0 ./x86/catm_x86.hex2 ./x86/artifact/catm is the command that fails

<Googulator>it just seems to loop forever

<rickmasters>Googulator: Hmm that's mysterious because that comes well after src commands are done

<rickmasters>Googulator: I can try to reproduce that by inserting a close here:

<rickmasters> https://github.com/ironmeld/builder-hex0/blob/3f20b992161a1e1976549d5a74db617164b0a1ac/builder-hex0.hex2#L2020

<Googulator>that's exactly what I did

<Googulator>I have

<Googulator>:src_finish

<Googulator>EB 07 # 89 D3 # mov ebx, edx

<Googulator>B8 04 00 00 00 # mov eax, 6 ; syscall_close

<Googulator>CD 80 # int 80

<Googulator>5F # pop edi

<Googulator>5E # pop esi

<Googulator>5A # pop edx

<Googulator>59 # pop ecx

<Googulator>5B # pop ebx

<Googulator>58 # pop eax

<Googulator>CB # ret

<Googulator>EB 07 to skip it - delete it to actually run the close
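
A minimal sketch of the displacement arithmetic behind that `EB 07`, assuming `#` starts a hex2 comment (so `89 D3` is inactive on that line) and that the bytes to skip are the `mov eax` and `int` encodings quoted above. The helper names are hypothetical and not part of builder-hex0.

```python
def rel8_to_skip(instructions):
    """Return the short-jump displacement needed to skip the given encodings.

    A short jmp (opcode EB) is relative to the address of the NEXT
    instruction, so the displacement equals the number of bytes skipped.
    """
    total = sum(len(bytes.fromhex(encoding)) for encoding in instructions)
    if total > 127:
        raise ValueError("forward distance does not fit in a signed 8-bit displacement")
    return total


def encode_short_jmp(displacement):
    """Encode 'jmp short' with a signed 8-bit displacement."""
    if not -128 <= displacement <= 127:
        raise ValueError("displacement out of rel8 range")
    return bytes([0xEB, displacement & 0xFF])


# Encodings taken from the listing in the log: mov eax, imm32 and int 80.
skipped = ["B8 04 00 00 00", "CD 80"]
jump = encode_short_jmp(rel8_to_skip(skipped))
print(jump.hex(" ").upper())
```

With `89 D3` uncommented the skipped span would grow by two bytes, so the same helper would give a displacement of 9 rather than 7.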

<rickmasters>shouldn't that be B8 06 00 00 00 ?

<Googulator>oh sh*t...

<Googulator>you're right
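
The mismatch caught here (bytes say 4, comment says 6) is the kind of thing a small checker can flag. The sketch below decodes a `mov eax, imm32` encoding and compares the immediate with the syscall named in the comment. The number table is an assumption: it uses the i386 Linux numbering, and builder-hex0's own table is not shown in this log.

```python
import struct

# Assumed i386 Linux-style syscall numbers (hypothetical table for illustration).
SYSCALLS = {3: "read", 4: "write", 5: "open", 6: "close"}


def decode_mov_eax_imm32(hex_bytes):
    """Decode 'B8 xx xx xx xx' (mov eax, imm32) and return the immediate."""
    raw = bytes.fromhex(hex_bytes)
    if len(raw) != 5 or raw[0] != 0xB8:
        raise ValueError("not a mov eax, imm32 encoding")
    # The immediate is stored little-endian after the opcode byte.
    return struct.unpack("<I", raw[1:])[0]


def matches_comment(hex_bytes, expected_name):
    """True if the encoded syscall number maps to the name in the comment."""
    number = decode_mov_eax_imm32(hex_bytes)
    return SYSCALLS.get(number, "unknown") == expected_name


for encoding in ("B8 04 00 00 00", "B8 06 00 00 00"):
    number = decode_mov_eax_imm32(encoding)
    print(encoding, "->", number, SYSCALLS.get(number, "unknown"))
```

Under that assumed table, the original bytes select `write` rather than `close`.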

<Googulator>BTW, is scratch actually used here for anything?

<Googulator>:readwrite_loop

<Googulator>85 C9 # test ecx, ecx

<Googulator>74 !src_finish # jz src_finish

<Googulator>BF 00 02 04 00 # mov edi, 0x00040200 ; scratch buffer

<Googulator>57 # push edi ; save buffer address

<Googulator>31 DB # xor ebx, ebx ; ebx=0=stdin

<Googulator>9A &read 08 00 # call read

<Googulator>89 D3 # mov ebx, edx ; prepare to write

<Googulator>5E # pop esi ; restore buffer address to esi

<Googulator>9A &write 08 00 # call write

<Googulator>49 # dec ecx ; count--

<Googulator>EB !readwrite_loop # jmp readwrite_loop

<Googulator>...but nothing in there seems to use edi or esi

<rickmasters>right, that looks unused. It's probably left over from old code that worked differently. I'd have to research that. Good find.

<Googulator>OK, with the correct system call number, it now works in qemu

<Googulator>but it unfortunately didn't solve the lockup on bare metal

<rickmasters>Googulator: Does the lockup happen every time? I seem to remember you said you got the bootstrap to work on your birthday...

<Googulator>It happens every time if the drive controller is in standard or enhanced IDE mkde

<Googulator>*mode

<Googulator>Works in AHCI or RAID

<Googulator>But only one of my boards can actually enable and boot from AHCI or RAID

<Googulator>the others either lack those options in the BIOS, or require the boot drive to be configured as IDE

<rickmasters>ok

<rickmasters>To explain the unused scratch, this is left over from when builder-hex0 only supported text files and src read/wrote a line at a time into a scratch buffer

<rickmasters>That changed to a byte at a time here: https://github.com/ironmeld/builder-hex0/commit/66ca665f1645646494f4813db8c5250742649e0d

<rickmasters>I should have removed the buffer setup in that commit. I'll work on that.

<Googulator>The freeze issue is also reproducible on an HP D530 USDT with a Socket 478 Pentium 4 (with a modified builder-hex0 that puts file data @ 0x14000000 instead of 0x54000000 - of course, such a kernel will never successfully run a real build, but it's enough to test the srcfs reading code)

<Googulator>Totally different BIOS, totally different chipset

<Googulator>I don't have a period AMD system to test with, unfortunately

<Googulator>& yet it somehow works in qemu with SeaBIOS, even if I use legacy IDE as the emulated HDD interface

<rickmasters>Googulator: I wish I could help more. I've got hardware to test with but I won't be at that location until late December.

<rickmasters>Googulator: The nature of the problem makes it difficult to determine where the fault lies.

<rickmasters>it's hard to imagine that such varied hardware has the same kind of problem - software seems more likely.

<rickmasters>Working towards the simplest test case possible seems like the way to go.

<rickmasters>Googulator: looking back at your previous comments it looks like you said that after changing stage1 memory writes to noop you were able to read consistently?

<GoogulatorMobile>rickmasters: in stage1 only

<GoogulatorMobile>The same stage1 code with a real stage2 triggers the bug in stage2

<GoogulatorMobile>I've also seen the bug trigger as early as sector 0x96 (srcfs starts at 0x93 with my current code)

<GoogulatorMobile>If I force stage1 to read garbage data, it reads well past that

<GoogulatorMobile>Actually not the same stage1 code, since what I did there was to not store read bytes at all

<rickmasters>right, so with a hacked up stage1 you can consistently read much further into the disk without lockups?

<GoogulatorMobile>Yes, stage1 with the stosb commented out reads all the way to the cylinder boundary

<GoogulatorMobile>Even the unmodified stage1 can read up to about 600KiB (where it runs out of low memory)

<GoogulatorMobile>Stage2 fails at a random point, no more than 512 sectors in (sometimes as early as the 4th sector read)

<GoogulatorMobile>& it fails regardless of which reading code I use

<GoogulatorMobile>CHS one matching stage1

<GoogulatorMobile>Or my new LBA one

<GoogulatorMobile>Both show the same behavior

<GoogulatorMobile>If I had to guess, we don't quite return to real mode properly

<GoogulatorMobile>Hmm...

<GoogulatorMobile>Do we disable Gate A20 when returning to real mode?

<GoogulatorMobile>I think not
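
For context on the A20 question, a small sketch of what the gate does to addresses: with A20 disabled, address bit 20 is forced to zero, so addresses that differ only in that bit alias each other. Whether builder-hex0 toggles A20 at all is not established in this log; the function names are hypothetical.

```python
A20_BIT = 1 << 20  # address line 20


def real_mode_linear(segment, offset):
    """Linear address formed by a real-mode segment:offset pair."""
    return (segment << 4) + offset


def physical_address(addr, a20_enabled):
    """Address seen on the bus; bit 20 is masked when A20 is disabled."""
    return addr if a20_enabled else addr & ~A20_BIT


# FFFF:0010 is the first byte above 1 MiB; with A20 off it wraps to 0.
wrap = real_mode_linear(0xFFFF, 0x0010)
print(hex(physical_address(wrap, True)), hex(physical_address(wrap, False)))

# High addresses alias too when bit 20 is set in them.
print(hex(physical_address(0x54100000, False)))
```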

<GoogulatorMobile>...flaky connection

<rickmasters>Googulator: something related to switching back and forth between 16-bit and 32-bit was my first guess as well.

<rickmasters>It's the biggest difference between stage1 and stage2.

<rickmasters>Just getting the switching to work at all was a big challenge. There isn't much code available as a guide so I had to piece it together with some guesswork.

<stikonas>fossy: I'll take a look at your PRs, but a bit later, maybe next week

<fossy>stikonas, thanks, no rush

2023-11-30.log