IRC channel logs

<bdefreese>Not sure if anyone's alive in here anymore but hey!

<nikolar>oi

<bdefreese>o/ Do you know if Samuel or any of them are in here much anymore? I haven't been around in years. :(

<nikolar>eh actually no clue

<nikolar>what's samuel's nick

<bdefreese>I'm trying to remember. Youpi maybe?

<nikolar>yeah i think i've seen youpi around

<bdefreese>OK, thanks

<pabs3>wow, blast from the past! bdefreese hasn't been around in years

<damo22>i dont recognise that nick

<damo22>must be before me

<luckyluke42>hi!

<luckyluke42>I don't know if anyone is working on stabilizing x86_64, I think one issue is that when an ipc is interrupted and retried, the retry is somehow corrupted

<youpi>that could explain the issue I had been having with shellscripts

<youpi>possibly related with the redzone or such

<luckyluke42>yes, I see it with that test code

<luckyluke42>I'm not sure, from the trace of the rpc I see that the first io_read_reply() is interrupted, and the second one fails with MIG_BAD_ARGUMENTS

<luckyluke42>which makes the pipe read return an empty string instead of "a"

<luckyluke42>and that breaks the cycle

<luckyluke42>and the first argument of io_read_reply is an int32 instead of an int64

<luckyluke42>it also seems a proper descriptor, i.e. correct name and size, I don't know if it was just a coincidence

<luckyluke42>but I can easily reproduce the issue, and in a few traces I checked I always see the same sequence when the loop terminates

<luckyluke42>what could be a convenient way to test the ipc retry logic? running it with a modified glibc?

<luckyluke42>I probably need to start running subhurds :)

<youpi>I guess it's not just the retry thing that has an issue, but the interruption mechanism

<youpi>some part of the retry thing gets mangled by the interruption mechanism

<youpi>note that you can easily try a program with a just-built glibc thanks to the testrun.sh script

<youpi>no need for subhurds for that :)

<youpi>luckyluke42: MIG_BAD_ARGUMENTS could be simply the send_size parameter getting mangled

<youpi>luckyluke42: when you see the two io_read_reply(), which rpc tracing are you doing? the shell process? could you paste the trace you get?

<luckyluke42>the size and the second parameter are the same, only the first parameter is different

<luckyluke42>I'm using the kernel tracing I sent to the ml some time ago

<luckyluke42>a bit enhanced, to uniquely identify threads and tasks, so I can make sense of the whole system

<youpi>ah, you meant the first argument gets truncated from 64bit pointer to 32bit pointer?

<luckyluke42>yes, but apparently it's only the descriptor filled in differently, the messages have the same size

<youpi>but the pointer is supposed not to change

<youpi>the message is supposed to be the same

<luckyluke42>this is the trace I have, the payloads are different https://paste.debian.net/1322700/

<luckyluke42>sorry it's not very clear, but the tracing I'm using is still a bit hacky

<luckyluke42>uh sorry wrong fragment...

<luckyluke42>that is only the ipc failing with the bad argument

<luckyluke42>ah, actually also the size in the msg header is wrong

<luckyluke42>it's 16 bytes lower

<luckyluke42>so maybe it's computed using the 32-bit ABI

<youpi>if the pointer is wrong, all that is read from it will be wrong

<luckyluke42>most of it is good, even the msgh_id

<luckyluke42>this is the message sent in the good and the bad case: https://paste.debian.net/1322702/

<youpi>is that little endian or memory byte order?

<luckyluke42>it's a dump of the msg buffer, in byte order

<luckyluke42>the payload size is actually taken from the mach_msg() parameter, which seems to be the same in the two cases

<luckyluke42>as well as the pointer to the msg buffer

<youpi>luckyluke42: to determine whether it's the stack that is getting corrupted or it's the kernel that is mangling the message, I'd say try to add a pushq $0 before the other pushq in sysdeps/mach/hurd/x86_64/intr-msg.h's INTR_MSG_TRAP, and increase the addq to $24

<youpi>if the corruption is still the same shape, it means it's probably the kernel that mangles the message

<youpi>if the corruption is shifted by 8 bytes, it means it's the stack that is getting corrupted by the signal management

<luckyluke42>ok I will try

<youpi>that being said, the payload of the bad rpc really looks like a normal reply message, with the 32b value

<luckyluke42>in the meantime, I was looking at _hurd_intr_rpc_mach_msg()

<youpi>as if the kernel actually did start the copyout of the reply message

<youpi>but the signal management didn't notice that

<youpi>and thus wrongly thinks that the receive wasn't actually done already

<youpi>i.e. wrongly thinks it can re-send the message

<luckyluke42>the mach_msg() calls were all successful, in the trace the rpc message entry is stored only if the copyin/copyout operation in the kernel succeeds

<luckyluke42>could it be that in _hurd_intr_rpc_mach_msg() the clobber and check structures need the 4 byte padding to align in size with the 64 bit ABI?

<youpi>they're just used over the existing msg pointer, so any required alignment would have been done before anyway

<youpi>and in terms of size, the field members are already supposed to be the proper sizes

<youpi>but perhaps there are some details there indeed

<youpi>it does seem correct to me, though: in all cases, a mach_msg_header_t first (already 64b-size-aligned), then mach_msg_type_t (also 64b-size-aligned), then int code/err

<luckyluke42>shouldn't the save_data hold also the 4 byte padding?

<youpi>I don't see why we should care about the padding

<youpi>it being random wouldn't hurt the message

<youpi>one thing that surprises me, however, is the content of the second payload

<youpi>02200020 01000000 ffffffff 03000000

<luckyluke42>maybe if the field in the request was a 64 bit integer, the area later used by padding would need to be restored when retrying the rpc... but this doesn't explain the change in msgh_size and field type

<youpi>It means one 32bit int

<youpi>that would here mean -1 (the ffffffff)

<youpi>it's not a proper return value
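
The reading of those four words ("one 32bit int", value -1) can be checked mechanically. This is a minimal sketch, not taken from the log: it assumes the x86_64 layout of mach_msg_type_t (8-bit name, 16-bit size in bits, inline flag at bit 29, followed by a 32-bit element count) and that MACH_MSG_TYPE_INTEGER_32 is 2. The helper name decode_descriptor is hypothetical.

```python
import struct

# The 16 payload bytes quoted in the log, in memory byte order.
payload = bytes.fromhex("02200020 01000000 ffffffff 03000000")

# Assumption: value of MACH_MSG_TYPE_INTEGER_32 in the Mach headers.
MACH_MSG_TYPE_INTEGER_32 = 2


def decode_descriptor(buf):
    """Decode an 8-byte type descriptor under the assumed x86_64 layout."""
    word, number = struct.unpack_from("<II", buf, 0)
    return {
        "name": word & 0xFF,           # bits 0-7: type name
        "size": (word >> 8) & 0xFFFF,  # bits 8-23: element size in bits
        "inline": (word >> 29) & 1,    # bit 29: data follows inline
        "number": number,              # second word: element count
    }


desc = decode_descriptor(payload)
# The element itself starts right after the 8-byte descriptor.
(value,) = struct.unpack_from("<i", payload, 8)
print(desc, value)
```

Under these assumptions the descriptor decodes to a single inline 32-bit integer whose value is -1; the trailing 03000000 word is left uninterpreted here.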

<youpi>the change in msgh_size and field type can be completely explained by the kernel actually having returned the reply, but glibc not noticing this

<youpi>and thus erroneously re-trying to send the message

<luckyluke42>in the correct case it's still -1, just as a 64 bit integer

<youpi>and we're just seeing the reply in the payload

<youpi>yes, that's expected: -1 offset, i.e. fd position

<luckyluke42>it's like the value is only half restored

<youpi>aow wait I understand what you mean

<youpi>yes, the clobber struct would need to be 64b-extended

<youpi>as in its content doesn't express what we want to save, but what got written

<youpi>I'm surprised that the code doesn't save the msgh_size field, though

<youpi>ah, but perhaps something like the kernel doesn't care about it and rather looks at the mach_msg send_size parameter, perhaps?

<youpi>the beginning of the payload is still wrong, though

<luckyluke42>yes, the size in the header is ignored in the kernel, and it's overwritten with the mach_msg() parameter

<youpi>ok

<youpi>so at least there's that rounding up clobber that is needed
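
The rounding-up point can be modelled with a plain byte buffer. This is only a sketch of the idea, not the glibc code: the byte values, the 12-byte and 16-byte lengths, and the function name interrupted_retry are all hypothetical, and it assumes the reply written over the request buffer covers 16 bytes including padding. It does not reproduce the trace, where the descriptor itself also changed.

```python
# Hypothetical request data: an 8-byte descriptor followed by a 64-bit -1.
REQUEST = bytes.fromhex("0b400020 01000000 ffffffff ffffffff")
# Hypothetical reply data written over the same area, padding included.
REPLY = bytes.fromhex("02200020 01000000 00000000 03000000")


def interrupted_retry(request, reply, saved_len):
    """Model save, clobber by the reply, then restore before a re-send."""
    buf = bytearray(request)
    saved = bytes(buf[:saved_len])  # what the save step keeps
    buf[:len(reply)] = reply        # the reply overwrites the same buffer
    buf[:saved_len] = saved         # restore only what was saved
    return bytes(buf)


full = interrupted_retry(REQUEST, REPLY, 16)     # save covers everything written
partial = interrupted_retry(REQUEST, REPLY, 12)  # save is 4 bytes too short
print(full.hex(" ", 4))
print(partial.hex(" ", 4))
```

When the saved area is as large as what got written, the re-sent buffer equals the original request. When it is 4 bytes short, the upper half of the 64-bit value keeps whatever the reply left there.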

<luckyluke42>I still don't see how the descriptor could change from 64 bit int to 32 bit int, if it seems that part of the value is restored

<youpi>check itself shouldn't be a problem since there it's really what we want to read

<youpi>yes

<youpi>(and that's what actually would trigger MIG_BAD_ARGUMENTS, since the content of the offset, by itself, doesn't matter)

<youpi>(though a wrong offset would get crazy results of course)

<youpi>luckyluke42: maybe after save_data = m->request.data; add an assert(save_data.type == m->check.type) ?

<youpi>to check that they really align the same

<youpi>I tried by hand and it seemed fine, but who knows

<luckyluke42>yes, better safe than sorry

2024-07-08.log