Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Time is not monotonic on macOS runner #820

Closed
1 of 5 tasks
tysonite opened this issue May 1, 2020 · 12 comments
Closed
1 of 5 tasks

Time is not monotonic on macOS runner #820

tysonite opened this issue May 1, 2020 · 12 comments
Assignees
Labels
Area: Apple investigate Collect additional information, like space on disk, other tool incompatibilities etc. OS: macOS

Comments

@tysonite
Copy link

tysonite commented May 1, 2020

I am not sure if this is a right repo to report this kind of issue, so please feel free to close it, but point me into the right direction.

Sometimes, we note that consequence calls to get current time of operating system return values that are not monotonically increasing. The 2nd call, for example, may return value that is less than previous one.

I can reproduce this by a simple Python script that runs on macOS github-hosted runner. See it here: https://github.com/tysonite/test_cpython/blob/master/script.py

The source of interest is:

        t1 = time.time()
        time.sleep(0.001)
        t2 = time.time()

Where t2 - t1 may be negative sometimes which is proved by this run.

I expect t2 - t1 to be always positive, but probably this is something related to VM on which runner runs.

Virtual environments affected

  • macOS 10.15
  • Ubuntu 16.04 LTS
  • Ubuntu 18.04 LTS
  • Windows Server 2016 R2
  • Windows Server 2019
@miketimofeev miketimofeev added Area: Apple investigate Collect additional information, like space on disk, other tool incompatibilities etc. OS: macOS and removed needs triage labels May 4, 2020
@Darleev Darleev self-assigned this May 4, 2020
@Darleev
Copy link
Contributor

Darleev commented May 4, 2020

Hello @tysonite,
I was able to reproduce the issue only in Github Actions. I tried to reproduce the issue in ADO(Azure DevOps) agents or locally, but with no luck. It seems the issue is specific for current platform.
We will continue the investigation and keep you posted.
Also, I will try to find a way how to mitigate the issue

@Darleev
Copy link
Contributor

Darleev commented May 5, 2020

@tysonite During the investigation, I found that issue most probably related to MacOS time synchronization(using sntp). I'm trying to find a way how to disable time synchronization for MacOS at all, to test how it works without sync.
Currently, I see only 1-2 failed requests for 1.000.000 attempts. Meanwhile, could you please provide use case for this specific task(t2-t1 ever 0.0001 sec)? Maybe we will try to find a better way to mitigate the issue.

@tysonite
Copy link
Author

tysonite commented May 5, 2020

We use Github Actions to run unit/acceptance tests for Robot Framework project, sometimes, tests fail due to that issue, and it is just weird.

The test verifies that elapsed time is not negative, and the mitigation on Robot Framework side is clear, it is only needed to remove such check. But the removal of check may lead to potential errors, and we would like to avoid that.

Example of test failure: https://condescending-banach-9beeb5.netlify.app/log.html#s1-s12-s7-t25-k1-k4-k1

EDIT: Btw, sometimes, similar issue happens on Windows, but I can't reproduce it with a simple script.

@Darleev
Copy link
Contributor

Darleev commented May 6, 2020

@tysonite We've found a workaround for the issue on MacOS agent, to reach our goal, it is required to disable time synchronisation using the following commands in runtime:

sudo systemsetup -setusingnetworktime off
sudo rm -rf /etc/ntp.conf

Please try this solution on your side. We are looking forward to your reply.

@tysonite
Copy link
Author

tysonite commented May 6, 2020

Thanks, @Darleev, I will report back if any issues.

@Darleev
Copy link
Contributor

Darleev commented May 7, 2020

@tysonite could you please provide us with results of your tests? Are they successful?
We are looking forward to your reply.

@pekkaklarck
Copy link

pekkaklarck commented May 7, 2020

I merged PR robotframework/robotframework#3571 by @tysonite with these fixes and results are here:
https://github.com/robotframework/robotframework/runs/652387887

There's some macOS failure but apparently it's something different and @tysonite already reported it as #841. No time related failures this time, but these failures have been flaky so one successful run doesn't guarantee anything. We just made a release and there are no planned code changes right now that would kick a new run. Should we start one manually to see how it goes? You can obviously do that yourself if you can.

@Darleev
Copy link
Contributor

Darleev commented May 7, 2020

@pekkaklarck Thank you for an update. I have already verified that the workaround works fine . Just in case I will perform additional test, python/c++ to make sure it's not just a coincidence.
Successful run #1
Successful run #2
Successful run #3
In a row. I was able to reproduce the issue every attempt before the workaround.
Let's wait a final test results.

@tysonite
Copy link
Author

tysonite commented May 7, 2020

I also have scheduled tests running once a day here (C++ / Python): https://github.com/tysonite/test_cpython/actions. I suppose if they are fine till Monday, we could consider issue I described in the first post as fixed.

However, this is only a work-around, but not a fix if I understand this right. Despite of the good results, the fix is required globally applied to all runners, probably.

@Darleev
Copy link
Contributor

Darleev commented May 7, 2020

@tysonite currently, we are not planning to implement fix for the issue globally, and I can only provide the workaround. Synchronisation is out of the box MacOS feature, and it seems 1 failure for 1.000.000 attempts(every 0.0001) is expected behavior in such cases.
I'm going to close the issue for now. If additional assistance or information is required, please do not hesitate to contact us.

@tysonite
Copy link
Author

tysonite commented May 7, 2020

I never used macOS, and you might be right. In Linux, such behavior usually means misconfiguration if either guest or host OS, e. g. https://www.redhat.com/en/blog/avoiding-clock-drift-vms

In any case, thanks again for your support!

@Darleev Darleev closed this as completed May 7, 2020
@pekkaklarck
Copy link

For our usage the workaround is adequate. Thanks for support @Darleev and thanks for investigation @tysonite!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: Apple investigate Collect additional information, like space on disk, other tool incompatibilities etc. OS: macOS
Projects
None yet
Development

No branches or pull requests

4 participants