IRC channel logs

<damo22>paculino: yes it is

<damo22>the clock is very strange

<damo22>i wrote a simple C program to test CLOCK_MONOTONIC

<damo22>i made a change to gnumach and reduced the backward rate from 50% to 13%

<damo22>but still 13% of the times i read the clock twice in a row, the clock moves backwards

<ZhaoM>damo22: hi is there a link to the simple C program?

<ZhaoM>I'm suspecting the time backward issue is due to the correction. The clock tried to correct itself, but the hpet is not included in that correction

<damo22> https://paste.debian.net/hidden/34259a1b/

<ZhaoM>how about disabling ntp and testing the clock?

<damo22>i just use clock_gettime(CLOCK_MONOTONIC, ...)
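
For readers without access to the paste, the test described above can be sketched as follows. This is not the pasted program: `count_backwards` is a hypothetical name, and the only details taken from the log are the use of `clock_gettime(CLOCK_MONOTONIC, ...)` and the style of the output shown later in the conversation.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime and CLOCK_MONOTONIC */
#endif
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: reads CLOCK_MONOTONIC repeatedly and counts how
   many reads are earlier than the read immediately before them.
   Returns -1 if the clock cannot be read. */
static long count_backwards(long iterations)
{
    struct timespec prev, cur;
    long backwards = 0;

    if (clock_gettime(CLOCK_MONOTONIC, &prev) != 0)
        return -1;

    for (long i = 0; i < iterations; i++) {
        if (clock_gettime(CLOCK_MONOTONIC, &cur) != 0)
            return -1;

        /* Compare seconds first, then nanoseconds within the same second. */
        if (cur.tv_sec < prev.tv_sec ||
            (cur.tv_sec == prev.tv_sec && cur.tv_nsec < prev.tv_nsec)) {
            printf("BACKWARDS\n %lld %ld\n %lld %ld\n",
                   (long long)prev.tv_sec, prev.tv_nsec,
                   (long long)cur.tv_sec, cur.tv_nsec);
            backwards++;
        }
        prev = cur;
    }
    return backwards;
}
```

Calling `count_backwards(10000000)` from `main` and printing the result next to the iteration count gives a summary in the "went backwards N out of M times" style quoted in this log.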

<ZhaoM>eh I see.. :/

<damo22>i have a commit that improves it, but i dont understand why its not solved

<damo22>ZhaoM: https://git.zammit.org/gnumach-sv.git/commit/?h=fix-clock-backwards-wip&id=9622c7d098ebf6bafb16cf5949cf89dc98f219a1

<nexussfan>damo22: the c program works fine with no "clock going backwards" on my amd64 hurd machine, is it an i386 specific issue?

<damo22>hmm

<ZhaoM>nexussfan: could you send out a small piece of the output, maybe you are not using hpet?

<nexussfan>rumpnet doesn't work so i have to wait a long time for the debian live environment to boot just so i can add the file to debian paste

<ZhaoM>I'm quite curious about the output of that C programme on your hurd machine

<nexussfan>last couple of lines: https://paste.debian.net/1397474

<damo22>../kern/mach_clock.c:451: time_value64_add_hpc: Assertion `0 <= (value)->nanoseconds && (value)->nanoseconds < TIME_NANOS_MAX' failed. Debugger invoked: assertion failure

<damo22>Assert(c106d0ec,c1071564,1c3,c106876c,f5fa1000)+0x29

<damo22>time_value64_add_hpc(f9eb8010,f5fa1000,f5fa2f20,c1051cd0,c10a6020)+0x8c

<damo22>host_get_uptime64(c10a6020,f9eb8034,f5fa2f20,c10037d5)+0x85

<damo22>_Xhost_get_uptime64(f5fa1010,f9eb8010,f5fa1000,c1023806,1003d38)+0x30

<damo22>amd64 does not run without hpet right?

<damo22>ZhaoM: i think i fixed it

<damo22> https://git.zammit.org/gnumach-sv.git/commit/?h=fix-clock-backwards-wip&id=389ea742fc1090417949d8c144ce0f764be6fd0f

<damo22> https://git.zammit.org/gnumach-sv.git/log/?h=fix-clock-backwards-wip

<damo22>ZhaoM: should i submit this patch?

<ZhaoM>damo22: hi I just set up the development environment, but it seems I cannot reproduce the time-go-backward issue

<ZhaoM>This is the ./configure '../configure --enable-apic --disable-linux-groups'

<ZhaoM>The commit 8d456cd9 (HEAD, origin/master) mach_clock (s...

<ZhaoM>And the simple C program output: went backwards 0 out of 10000 times

<ZhaoM>BTW I think resetting HPET is indeed more elegant than keeping the last read

<ZhaoM>The patch looks good to me

<damo22>the only thing is, the timer seems to mostly read with 0.01s granularity even with hpet

<damo22>it does occasionally read between that, but mostly not

<ZhaoM>I don't quite understand 'the timer seems to mostly read with 0.01s granularity even with hpet'. Here is the log, it seems to me the reads have higher accuracy than the 0.01s granularity https://paste.debian.net/1397487/

<damo22>ok so i changed my latest commit such that it only corrects with value MIN(ns, tick * 1000)

<damo22>and now i get:

<damo22>$ ./test

<damo22>BACKWARDS

<damo22> 171 800000000

<damo22> 171 790000980

<damo22>BACKWARDS

<damo22> 214 750000000

<damo22> 214 740005770

<damo22>went backwards 2 out of 10000000 times

<damo22>it seems the correction must be sometimes larger than the duration between two clock interrupts?

<damo22>i cant explain why its moving backwards!

<damo22>or why on qemu the clock reads are mostly rounded to nearest 0.01s

<ZhaoM>wait I think I'm a bit confused. What is the commit hash of gnumach you used to get the result of '14:36 < damo22> $ ./test'

<damo22>ah i havent pushed it

<ZhaoM>I haven't successfully reproduced the time backward issue?

<ZhaoM>s/\?//g

<damo22>really?

<damo22>are you on real hw or qemu?

<ZhaoM>qemu

<damo22>that is surprising

<ZhaoM>qemu-system-i386 --enable-kvm -m 2G -drive cache=writeback,file=hurd.img -net user,hostfwd=tcp:127.0.0.1:2222-:22 -net nic,model=e1000 -display curses

<damo22>ok

<damo22>try -smp 1

<damo22>you will need that

<damo22>so it detects apic

<youpi>damo22: I don't think we want to reset the value, we'd lose precision

<damo22>ok

<youpi>how often does the backward effect happen?

<damo22>but the hpet ticks once per 10ns on qemu

<damo22>how would you lose precision if you reset the hpet?

<youpi>you don't know how much it was before resetting it

<youpi>so you don't know how much you have removed from it

<damo22>okay but if you reset it exactly at the time you update the elapsed_ticks value, you get a new timer per clock interrupt

<youpi>at best you would atomically xchg the value to determine what it was

<youpi>but then it's really just like knowing what value it had on last clock tick

<youpi>there is no reason why that strategy can't work

<youpi>"exactly at the time" doesn't exist

<youpi>you have some instructions that take some time in between

<damo22>yes, it is protected by a lock

<youpi>so you aren't precise

<youpi>? the hpet is not locked

<youpi>it just continues incrementing

<youpi>you don't control that

<damo22>ok

<youpi>again: how often does the backward effect happen ?

<damo22>let me recompile master and measure it

<ZhaoM>weird I still cannot reproduce the time backward issue :|

<ZhaoM>qemu-system-i386 --enable-kvm -smp 1 -m 2G -drive cache=writeback,file=hurd.img -net user,hostfwd=tcp:127.0.0.1:2222-:22 -net nic,model=e1000 -display curses

<ZhaoM>The time readings still look cute: https://paste.debian.net/1397491/

<damo22>ZhaoM: if you check out my latest branch and increase the test loops to 10M you will get ~2 backward reads

<youpi>I'm thinking that perhaps you should check if hpet itself is really monotonic

<youpi>possibly qemu gets it just wrong...

<ZhaoM>ok I should try 10M

<ZhaoM>Ha got them

<ZhaoM>demo@debian:~$ ./go

<ZhaoM>BACKWARDS 82 509974180 82 500008590

<ZhaoM>BACKWARDS 103 629982970 103 620008250

<damo22>went backwards 6 out of 10000000 times with master

<damo22>with my changes, only 2

<damo22>but the clock resolution looks wrong with my changes

<youpi>does this not look like the hpet counter wrap-around ?

<youpi>if qemu's hpet ticks every 10ns, that wraps very fast

<damo22>in 42 seconds it wraps?

<youpi>well, whatever the way, the counter goes back

<youpi>because the way time_value64_add_hpc currently computes will be monotonic anyway

<damo22>as long as you dont add more than 1 second worth of nanos to the value, it wont break

<youpi>Mmm, that said, the "now - last_hpc_read" computation should be wrapping it back correctly

<youpi>but anyway, we need more information: are we really 100% sure that the HPET counter doesn't go back

<damo22>if it does, that would be a significant bug in qemu

<youpi>not unheard of

<damo22>10/10^9*4*1024*1024*1024

<damo22>42.94967296

<damo22>(seconds before counter wraps)
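
The arithmetic above can be written as a small helper. This is only a sketch: `counter_wrap_seconds` is a hypothetical name, and the 10 ns period and the 32-bit counter width are the figures quoted in this conversation for QEMU, not values checked against the HPET registers.

```c
#include <stdint.h>

/* Hypothetical helper: seconds until a free-running counter of `bits`
   width wraps, given the period of one count in nanoseconds.
   Only valid for bits < 64. */
static double counter_wrap_seconds(double period_ns, unsigned bits)
{
    double counts = (double)(UINT64_C(1) << bits);   /* 2^bits counts */
    return counts * period_ns / 1e9;
}
```

With `period_ns = 10` and `bits = 32` this evaluates 2^32 * 10 / 10^9, the same 42.94967296 seconds computed by hand above. A 0.01 s clock tick at that rate is only 1,000,000 counts, so at most one wrap can fall between two consecutive ticks.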

<youpi>thinking about it: we might be getting an interrupt between reading mtime, and reading the hpet

<youpi>i.e. get an old tick, and a new hpet

<damo22>hmm

<youpi>as in: in between the interrupt will have updated last_hpc_read

<damo22>one thing that is definitely buggy in the existing code is that: the elapsed_ticks is updated only on BSP but in every clock interrupt the last_hpc_read is updated

<youpi>that's a smp question

<damo22>if you reset the hpet at the start of every BSP clock interrupt only, you have a regular counter that counts the exact number of hpet ticks until the next clock interrupt

<youpi>not exact

<youpi>since there is a delay between getting the interrupt, and resetting the counter

<youpi>you get a small miss-up at the beginning of the interrupt

<damo22>the counter reset is an atomic instruction that writes over the register holding the value

<youpi>which means you have lost the old value, and you don't know how much that was

<youpi>you don't know how much time passed between the interrupt and the reset

<youpi>and thus you'll be imprecise anyway

<youpi>you lose information

<damo22>you could read it before resetting it :P

<youpi>same

<youpi>simplest would be to just make the read_mapped* read the hpet before and after reading mtime

<youpi>and if that doesn't match, try again

<youpi>the *last_hpc_read*

<damo22>it should be a number between 0 - (ticks * 1000) ns

<damo22>the whole point of this is to get a more accurate time, if the computation takes longer than ticks*1000 ns you may as well not bother

<youpi>?

<youpi>what computation?

<youpi>"try again" can only happen if a clock interrupt fires in between

<youpi>that won't happen twice in a row

<damo22>oh

<youpi>(also, you can't assume that the hpet-equivalent on hardware other than x86 has the possibility to reset)

<damo22>if you read hpet before and after reading mtime, and if that doesnt match, read it a third time?

<damo22>because if you just reuse the second one, its no different from current code

<youpi>not read a third time, but re-try the whole thing

<youpi>exactly like the "seconds" check

<damo22>ah

<ZhaoM>Ah the reason for the issue is that last_hpc_read may be updated just after 'uint32_t now = hpclock_read_counter();'

<ZhaoM>It takes me some time to get it :/

<youpi>no, before

<youpi>between reading mtime and reading hpclock_read_counter

<ZhaoM>ok

<youpi>the interrupt will have updated mtime, and set last_hpc_read

<youpi>so last_hpc_read does not match the mtime value

<damo22>but if the kernel reads the hpet twice in a row it might get a different value every time, even with no instruction between it?

<damo22>how do i test the value is the same

<damo22>or you mean check last_hpc_read didnt change

<ZhaoM>how about storing the value of last_hpc_read in a temporary variable before reading mtime?

<ZhaoM>and pass the variable to the time_value64_add_hpc() after the while loop

<damo22>yeah im doing that

<ZhaoM>or put last_hpc_read into mapped_time_value_t

<ZhaoM>maybe the code will be easier to understand

<youpi>we don't want to change the mapped_time structure, it's exposed to userland!

<ZhaoM>ah ok

<youpi>you can just put the old value in a local variable, yes

<youpi>damo22: I meant last_hpc_read

<damo22>../kern/mach_clock.c:462: time_value64_add_hpc: Assertion `0 <= (value)->nanoseconds && (value)->nanoseconds < TIME_NANOS_MAX' failed. Debugger invoked: assertion failure

<youpi>not hpet, which surely changes :)

<damo22>i did all that, and got the fault

<youpi>which fault?

<damo22>^

<youpi>my mach_clock's 462 is "/*"

<damo22>i havent pushed it

<youpi>that looks like a completely different issue

<youpi>and a very surprising one, actually, since we already cap ns to at most one tick length

<ZhaoM>damo22: does the capping code exist in your repo?

<youpi>or maybe the value given to time_value64_add_hpc is already wrong? you can call time_value64_assert before time_value64_add_nanos, to check that it's not already wrong

<damo22> https://git.zammit.org/gnumach-sv.git/commit/?h=fix-clock-backwards2&id=1199d04e308e660278d55860ea939686c75f9cec

<youpi>ZhaoM: well, even without the capping, there is really little reason for ns to be more than one second...

<damo22>that code caused the fault

<youpi>damo22: you don't want to cast now to int64_t !

<youpi>that breaks the wrap-around support

<ZhaoM>I guess we got a negative value
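
The difference between the two subtractions can be illustrated in isolation. This is a sketch, assuming the cast made the subtraction happen in signed 64-bit arithmetic; the commit itself is not reproduced here, and `delta_wrapping` and `delta_widened` are hypothetical names.

```c
#include <stdint.h>

/* Modulo-2^32 difference: still correct if the 32-bit counter wrapped
   once between `last` and `now`. */
static uint32_t delta_wrapping(uint32_t now, uint32_t last)
{
    return (uint32_t)(now - last);
}

/* Widening both operands before subtracting loses the modulo behaviour:
   after a wrap the result is a large negative number. */
static int64_t delta_widened(uint32_t now, uint32_t last)
{
    return (int64_t)now - (int64_t)last;
}
```

With `last = 0xFFFFFFF0` and `now = 5`, the counter has advanced 21 counts across the wrap. `delta_wrapping` returns 21, while `delta_widened` returns -4294967275, and adding a negative nanosecond amount is the kind of input that would trip the `0 <= nanoseconds` assertion quoted above.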

<damo22>oh

<youpi>and please leave the capping as it is now

<youpi>which is way more explicit about the situation

<damo22>okay

<youpi>also, you don't need two while loops, you can just check both seconds and hpc conditions at the same time

<damo22>went backwards 0 out of 10000000 times
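
The retry scheme discussed above (snapshot last_hpc_read in a local variable, read mtime, read the counter, and redo everything if either the seconds check or last_hpc_read changed) can be sketched outside the kernel as follows. This is not the gnumach code: `struct clock_state`, `read_time` and `demo_read_counter` are hypothetical stand-ins, the counter is simulated, and the capping of the added nanoseconds to one tick length is left out for brevity.

```c
#include <stdint.h>

#define NANOS_PER_SEC 1000000000LL

/* Hypothetical, simplified stand-in for the kernel's clock state. */
struct clock_state {
    volatile int64_t  seconds;        /* updated by the clock interrupt */
    volatile int64_t  nanoseconds;    /* updated by the clock interrupt */
    volatile int64_t  check_seconds;  /* must match `seconds` for a valid read */
    volatile uint32_t last_hpc_read;  /* counter value at the last tick */
};

struct timestamp { int64_t seconds; int64_t nanoseconds; };

typedef uint32_t (*counter_fn)(struct clock_state *);

/* Demo-only counter: its first call simulates a clock interrupt landing
   between the mtime read and the counter read. */
static uint32_t demo_counter = 1000;
static int demo_interrupt_pending = 1;

static uint32_t demo_read_counter(struct clock_state *cs)
{
    if (demo_interrupt_pending) {
        demo_interrupt_pending = 0;
        demo_counter += 1000000;          /* one 0.01 s tick at 10 ns/count */
        cs->nanoseconds += 10000000;      /* the interrupt advances mtime... */
        cs->last_hpc_read = demo_counter; /* ...and records the counter */
    }
    demo_counter += 5;                    /* time passes between reads */
    return demo_counter;
}

static struct timestamp
read_time(struct clock_state *cs, counter_fn read_counter, uint32_t ns_per_count)
{
    struct timestamp t;
    uint32_t last, now;

    do {
        last = cs->last_hpc_read;         /* local snapshot, taken first */
        t.seconds = cs->seconds;
        __sync_synchronize();
        t.nanoseconds = cs->nanoseconds;
        __sync_synchronize();
        now = read_counter(cs);
        /* One loop, both conditions: retry if a tick changed either one. */
    } while (t.seconds != cs->check_seconds || last != cs->last_hpc_read);

    /* Unsigned subtraction keeps counter wrap-around correct. */
    t.nanoseconds += (int64_t)(uint32_t)(now - last) * ns_per_count;
    while (t.nanoseconds >= NANOS_PER_SEC) {
        t.nanoseconds -= NANOS_PER_SEC;
        t.seconds++;
    }
    return t;
}
```

Starting from 171 s and 790000000 ns with `last_hpc_read = 1000`, the first pass sees the simulated interrupt and is discarded, and the second pass returns 171 s and 800000100 ns. Without the `last_hpc_read` check, the first pass would have combined the old 790000000 ns with the new `last_hpc_read`, giving 790000050 ns, a value earlier than one a previous caller could already have seen.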