IRC channel logs

2024-07-08.log

<bdefreese>Not sure if anyone's alive in here anymore but hey!
<nikolar>oi
<bdefreese>o/ Do you know if Samuel or any of them are in here much anymore? I haven't been around in years. :(
<nikolar>eh actually no clue
<nikolar>what's samuel's nick
<bdefreese>I'm trying to remember. Youpi maybe?
<nikolar>yeah i think i've seen youpi around
<bdefreese>OK, thanks
<pabs3>wow, blast from the past! bdefreese hasn't been around in years
<damo22>i don't recognise that nick
<damo22>must be before me
<luckyluke42>hi!
<luckyluke42>I don't know if anyone is working on stabilizing x86_64, I think one issue is that when an ipc is interrupted and retried, the retry is somehow corrupted
<youpi>that could explain the issue I had been having with shellscripts
<youpi>possibly related with the redzone or such
<luckyluke42>yes, I see it with that test code
<luckyluke42>I'm not sure, from the trace of the rpc I see that the first io_read_reply() is interrupted, and the second one fails with MIG_BAD_ARGUMENTS
<luckyluke42>which makes the pipe read return an empty string instead of "a"
<luckyluke42>and that breaks the cycle
<luckyluke42>and the first argument of io_read_reply is an int32 instead of an int64
<luckyluke42>it also seems to be a proper descriptor, i.e. correct name and size, I don't know if it was just a coincidence
<luckyluke42>but I can easily reproduce the issue, and in a few traces I checked I always see the same sequence when the loop terminates
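A minimal sketch of the kind of reproducer under discussion, assuming the trigger is a pipe read whose io_read RPC keeps getting interrupted by a timer signal and retried; the actual test code is not shown in the log:

    #include <signal.h>
    #include <sys/time.h>
    #include <unistd.h>

    static void on_alarm (int sig) { (void) sig; }

    int
    main (void)
    {
      int fds[2];
      char c;
      /* Fire SIGALRM every millisecond; on the Hurd each signal
         interrupts the in-flight io_read RPC, which glibc's
         _hurd_intr_rpc_mach_msg then transparently retries.  */
      struct itimerval it = { { 0, 1000 }, { 0, 1000 } };

      if (pipe (fds) != 0)
        return 1;
      signal (SIGALRM, on_alarm);
      setitimer (ITIMER_REAL, &it, 0);

      for (;;)
        {
          if (write (fds[1], "a", 1) != 1)
            return 1;
          /* On the broken system this read eventually returns 0 bytes
             instead of "a", which is what breaks the cycle.  */
          if (read (fds[0], &c, 1) != 1 || c != 'a')
            return 2;
        }
    }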
<luckyluke42>what could be a convenient way to test the ipc retry logic? running it with a modified glibc?
<luckyluke42>I probably need to start running subhurds :)
<youpi>I guess it's not just the retry thing that has an issue, but the interruption mechanism
<youpi>some part of the retry thing gets mangled by the interruption mechanism
<youpi>note that you can easily try a program with a just-built glibc thanks to the testrun.sh script
<youpi>no need for subhurds for that :)
<youpi>luckyluke42: MIG_BAD_ARGUMENTS could be simply the send_size parameter getting mangled
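For reference, send_size here is the third parameter of mach_msg, which (as confirmed further down in the log) the kernel trusts over the header's msgh_size field:

    /* mach_msg prototype as in <mach/message.h>: */
    mach_msg_return_t mach_msg (mach_msg_header_t *msg,
                                mach_msg_option_t option,
                                mach_msg_size_t send_size,
                                mach_msg_size_t rcv_size,
                                mach_port_t rcv_name,
                                mach_msg_timeout_t timeout,
                                mach_port_t notify);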
<youpi>luckyluke42: when you see the two io_read_reply(), which rpc tracing are you doing? the shell process? could you paste the trace you get?
<luckyluke42>the size and the second parameter are the same, only the first parameter is different
<luckyluke42>I'm using the kernel tracing I sent to the ml some time ago
<luckyluke42>a bit enhanced, to uniquely identify threads and tasks, so I can make sense of the whole system
<youpi>ah, you meant the first argument gets truncated from 64bit pointer to 32bit pointer?
<luckyluke42>yes, but apparently it's only the descriptor that's filled in differently, the messages have the same size
<youpi>but the pointer is supposed not to change
<youpi>the message is supposed to be the same
<luckyluke42>this is the trace I have, the payloads are different https://paste.debian.net/1322700/
<luckyluke42>sorry it's not very clear, but the tracing I'm using is still a bit hacky
<luckyluke42>uh sorry wrong fragment...
<luckyluke42>that is only the ipc failing with the bad argument
<luckyluke42>ah, actually also the size in the msg header is wrong
<luckyluke42>it's 16 bytes lower
<luckyluke42>so maybe it's computed using the 32-bit ABI
<youpi>if the pointer is wrong, all that is read from it will be wrong
<luckyluke42>most of it is good, even the msgh_id
<luckyluke42>this is the message sent in the good and the bad case: https://paste.debian.net/1322702/
<youpi>is that little endian or memory byte order?
<luckyluke42>it's a dump of the msg buffer, in byte order
<luckyluke42>the payload size is actually taken from the mach_msg() parameter, which seems to be the same in the two cases
<luckyluke42>as well as the pointer to the msg buffer
<youpi>luckyluke42: to determine whether it's the stack that is getting corrupted or it's the kernel that is mangling the message, I'd say try to add a pushq $0 before the other pushq in sysdeps/mach/hurd/x86_64/intr-msg.h's INTR_MSG_TRAP, and increase the addq to $24
<youpi>if the corruption is still the same shape, it means it's probably the kernel that mangles the message
<youpi>if the corruption is shifted by 8 bytes, it means it's the stack that is getting corrupted by the signal management
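A shape-only sketch of that experiment; the operands are elided and this is not the verbatim glibc source:

    /* in sysdeps/mach/hurd/x86_64/intr-msg.h, INTR_MSG_TRAP's inline asm: */
    "pushq $0\n\t"       /* NEW: one extra quadword, shifting the stack by 8 */
    "pushq ...\n\t"      /* the pre-existing push (operand elided)           */
    /* ... the interruptible mach_msg trap runs here ...                     */
    "addq $24, %rsp\n\t" /* increased from $16 so the padding is popped too  */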
<luckyluke42>ok I will try
<youpi>that being said, the payload of the bad rpc really looks like a normal reply message, with the 32b value
<luckyluke42>in the meantime, I was looking at _hurd_intr_rpc_mach_msg()
<youpi>as if the kernel actually did start the copyout of the reply message
<youpi>but the signal management didn't notice that
<youpi>and thus wrongly thinks that the receive wasn't actually done already
<youpi>i.e. wrongly thinks it can re-send the message
<luckyluke42>the mach_msg() calls were all successful, in the trace the rpc message entry is stored only if the copyin/copyout operation in the kernel succeeds
<luckyluke42>could it be that in _hurd_intr_rpc_mach_msg() the clobber and check structures need the 4 byte padding to align in size with the 64 bit ABI?
<youpi>they're just used over the existing msg pointer, so any required alignment would have been done before anyway
<youpi>and in terms of size, the field members are already supposed to be the proper sizes
<youpi>but perhaps there are some details there indeed
<youpi>it does seem correct to me, though: in all cases, a mach_msg_header_t first (already 64b-size-aligned), then mach_msg_type_t (also 64b-size-aligned), then int code/err
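That layout is Mach's mig_reply_header_t; the size comments below assume the x86_64 ABI under discussion:

    typedef struct
    {
      mach_msg_header_t Head;        /* size is a multiple of 8           */
      mach_msg_type_t   RetCodeType; /* also 8-byte-sized on this ABI     */
      kern_return_t     RetCode;     /* the int code/err: 4 bytes, hence
                                        4 bytes of tail padding           */
    } mig_reply_header_t;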
<luckyluke42>shouldn't save_data also hold the 4 byte padding?
<youpi>I don't see why we should care about the padding
<youpi>it being random wouldn't hurt the message
<youpi>one thing that surprises me, however, is the content of the second payload
<youpi>02200020 01000000 ffffffff 03000000
<luckyluke42>maybe if the field in the request were a 64 bit integer, the area later used by padding would need to be restored when retrying the rpc... but this doesn't explain the change in msgh_size and field type
<youpi>It means one 32bit int
<youpi>that would here mean -1 (the ffffffff)
<youpi>it's not a proper return value
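Spelling out that reading of the bad payload, with mach_msg_type_t field names but without pinning down the exact bit packing:

    /* 02200020 01000000  ->  the type descriptor: msgt_name =
                              MACH_MSG_TYPE_INTEGER_32 (= 2),
                              msgt_size = 32, msgt_number = 1
       ffffffff           ->  the lone 32-bit value, (int32_t) -1
       03000000           ->  whatever happens to follow in the buffer  */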
<youpi>the change in msgh_size and field type can be completely explained by the kernel actually having returned the reply, but glibc not noticing this
<youpi>and thus erroneously re-trying to send the message
<luckyluke42>in the correct case it's still -1, just as a 64 bit integer
<youpi>and we're just seeing the reply in the payload
<youpi>yes, that's expected: -1 offset, i.e. fd position
<luckyluke42>it's like the value is only half restored
<youpi>ah wait, I understand what you mean
<youpi>yes, the clobber struct would need to be 64b-extended
<youpi>as in its content doesn't express what we want to save, but what got written
<youpi>I'm surprised that the code doesn't save the msgh_size field, though
<youpi>ah, but perhaps the kernel doesn't care about it and rather looks at the mach_msg send_size parameter?
<youpi>the beginning of the payload is still wrong, though
<luckyluke42>yes, the size in the header is ignored in the kernel, and it's overwritten with the mach_msg() parameter
<youpi>ok
<youpi>so at least there's that rounding up clobber that is needed
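For context, the structures in question overlay the message buffer roughly like this; this is paraphrased from glibc's hurd/intr-msg.c, so the exact spellings may differ:

    union msg
    {
      mach_msg_header_t header;
      mig_reply_header_t reply;
      struct                       /* read-only view used to recognize   */
      {                            /* a server reply                     */
        mach_msg_header_t header;
        int type;
        int code;
      } check;
      struct                       /* view of the bytes the kernel may   */
      {                            /* already have scribbled on; saved   */
        mach_msg_header_t header;  /* in save_data and restored before   */
        struct clobber             /* re-sending the request             */
        {
          mach_msg_type_t type;
          error_t err;             /* 4 bytes: rounding this view up to
                                      8 is the fix being discussed       */
        } data;
      } request;
    };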
<luckyluke42>I still don't see how the descriptor could change from 64 bit int to 32 bit int, if it seems that part of the value is restored
<youpi>check itself shouldn't be a problem since there it's really what we want to read
<youpi>yes
<youpi>(and that's what actually would trigger MIG_BAD_ARGUMENTS, since the content of the offset, by itself, doesn't matter)
<youpi>(though a wrong offset would get crazy results of course)
<youpi>luckyluke42: maybe after save_data = m->request.data; add an assert(save_data.type == m->check.type) ?
<youpi>to check that they really align the same
<youpi>I tried by hand and it seemed fine, but who knows
<luckyluke42>yes, better safe than sorry
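The suggested check, sketched in place; the names come from the discussion above, and if mach_msg_type_t is a bitfield struct the comparison would be spelled with memcmp rather than ==:

    save_data = m->request.data;
    /* Do the two overlaid views really alias the same descriptor?  */
    assert (save_data.type == m->check.type);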