IRC channel logs
2024-07-08.log
<bdefreese>Not sure if anyone's alive in here anymore but hey!
<bdefreese>o/ Do you know if Samuel or any of them are in here much anymore? I haven't been around in years. :(
<pabs3>wow, blast from the past! bdefreese hasn't been around in years
<luckyluke42>I don't know if anyone is working on stabilizing x86_64, I think one issue is that when an ipc is interrupted and retried, the retry is somehow corrupted
<youpi>that could explain the issue I had been having with shellscripts
<youpi>possibly related to the redzone or such
<luckyluke42>I'm not sure, from the trace of the rpc I see that the first io_read_reply() is interrupted, and the second one fails with MIG_BAD_ARGUMENTS
<luckyluke42>which makes the pipe read return an empty string instead of "a"
<luckyluke42>and the first argument of io_read_reply is an int32 instead of an int64
<luckyluke42>it also seems a proper descriptor, i.e. correct name and size, I don't know if it was just a coincidence
<luckyluke42>but I can easily reproduce the issue, and in a few traces I checked I always see the same sequence when the loop terminates
<luckyluke42>what could be a convenient way to test the ipc retry logic? running it with a modified glibc?
<youpi>I guess it's not just the retry thing that has an issue, but the interruption mechanism
<youpi>some part of the retry thing gets mangled by the interruption mechanism
<youpi>note that you can easily try a program with a just-built glibc thanks to the testrun.sh script
<youpi>no need for subhurds for that :)
<youpi>luckyluke42: MIG_BAD_ARGUMENTS could be simply the send_size parameter getting mangled
<youpi>luckyluke42: when you see the two io_read_reply(), which rpc tracing are you doing? the shell process? could you paste the trace you get?
<luckyluke42>the size and the second parameter are the same, only the first parameter is different
<luckyluke42>I'm using the kernel tracing I sent to the ml some time ago
<luckyluke42>a bit enhanced, to uniquely identify threads and tasks, so I can make sense of the whole system
<youpi>ah, you meant the first argument gets truncated from 64bit pointer to 32bit pointer?
<luckyluke42>yes, but apparently it's only the descriptor filled in differently, the messages have the same size
<youpi>but the pointer is supposed not to change
<youpi>the message is supposed to be the same
<luckyluke42>sorry it's not very clear, but the tracing I'm using is still a bit hacky
<luckyluke42>ah, actually also the size in the msg header is wrong
<youpi>if the pointer is wrong, all that is read from it will be wrong
<youpi>is that little endian or memory byte order?
<luckyluke42>the payload size is actually taken from the mach_msg() parameter, which seems to be the same in the two cases
<youpi>luckyluke42: to determine whether it's the stack that is getting corrupted or it's the kernel that is mangling the message, I'd say try to add a pushq $0 before the other pushq in sysdeps/mach/hurd/x86_64/intr-msg.h's INTR_MSG_TRAP, and increase the addq to $24
<youpi>if the corruption is still the same shape, it means it's probably the kernel that mangles the message
<youpi>if the corruption is shifted by 8 bytes, it means it's the stack that is getting corrupted by the signal management
<youpi>that being said, the payload of the bad rpc really looks like a normal reply message, with the 32b value
<luckyluke42>in the meantime, I was looking at _hurd_intr_rpc_mach_msg()
<youpi>as if the kernel actually did start the copyout of the reply message
<youpi>but the signal management didn't notice that
<youpi>and thus wrongly thinks that the receive wasn't actually done already
<youpi>i.e. wrongly thinks it can re-send the message
<luckyluke42>the mach_msg() calls were all successful, in the trace the rpc message entry is stored only if the copyin/copyout operation in the kernel succeeds
<luckyluke42>could it be that in _hurd_intr_rpc_mach_msg() the clobber and check structures need the 4 byte padding to align in size with the 64 bit ABI?
<youpi>they're just used over the existing msg pointer, so any required alignment would have been done before anyway
<youpi>and in terms of size, the field members are already supposed to be the proper sizes
<youpi>but perhaps there are some details there indeed
<youpi>it does seem correct to me, though: in all cases, a mach_msg_header_t first (already 64b-size-aligned), then mach_msg_type_t (also 64b-size-aligned), then int code/err
<luckyluke42>shouldn't the save_data hold also the 4 byte padding?
<youpi>I don't see why we should care about the padding
<youpi>it being random wouldn't hurt the message
<youpi>one thing that surprises me, however, is the content of the second payload
<youpi>02200020 01000000 ffffffff 03000000
<luckyluke42>maybe if the field in the request was a 64 bit integer, the area later used by padding would need to be restored when retrying the rpc... but this doesn't explain the change in msgh_size and field type
<youpi>that would here mean -1 (the ffffffff)
<youpi>it's not a proper return value
<youpi>the change in msgh_size and field type can be completely explained by the kernel actually having returned the reply, but glibc not noticing this
<youpi>and thus erroneously retrying to send the message
<luckyluke42>in the correct case it's still -1, just as a 64 bit integer
<youpi>and we're just seeing the reply in the payload
<youpi>yes, that's expected: -1 offset, i.e. fd position
<youpi>aow wait I understand what you mean
<youpi>yes, the clobber struct would need to be 64b-extended
<youpi>as its content doesn't express what we want to save, but what got written
<youpi>I'm surprised that the code doesn't save the msgh_size field, though
<youpi>ah, but perhaps something like the kernel doesn't care about it and rather looks at the mach_msg send_size parameter, perhaps?
<youpi>the beginning of the payload is still wrong, though
<luckyluke42>yes, the size in the header is ignored in the kernel, and it's overwritten with the mach_msg() parameter
<youpi>so at least there's that rounding up clobber that is needed
<luckyluke42>I still don't see how the descriptor could change from 64 bit int to 32 bit int, if it seems that part of the value is restored
<youpi>check itself shouldn't be a problem since there it's really what we want to read
<youpi>(and that's what actually would trigger MIG_BAD_ARGUMENTS, since the content of the offset, by itself, doesn't matter)
<youpi>(though a wrong offset would get crazy results of course)
<youpi>luckyluke42: maybe after save_data = m->request.data; add an assert(save_data.type == m->check.type) ?
<youpi>to check that they really align the same
<youpi>I tried by hand and it seemed fine, but who knows