Author Topic: Eppur si muove  (Read 33900 times)

âspen

  • Posts: 155
Eppur si muove
« on: 28 June 2015 02:44:37 PM »
Please note

Quote
Dear WebFaction customer,

This email is to inform you that our upstream provider will be performing network maintenance on July 1, 2015 between 22:00 and 03:00 UTC the following morning.

During the maintenance window connectivity to the public network could become unavailable, we expect upwards of 30 minutes of public network downtime.

We will update our status blog at the link below as maintenance will progress:
http://statusblog.webfaction.com/2015/06/23/scheduled-maintenance-july-1-2015/

We apologize in advance for the inconvenience. Should you have any questions about this maintenance, feel free to reply to this email and our support team will be happy to help.

Regards,

The WebFaction Team

UTC stands for Coordinated Universal Time and is a replacement (when precision is required) for Greenwich Mean Time (which is now an ambiguous term and can refer to either UTC or UT1, which can be up to 0.9 seconds apart from each other).

Quote
In 1970, the Coordinated Universal Time system was devised by an international advisory group of technical experts within the International Telecommunication Union (ITU). The ITU felt it was best to designate a single abbreviation for use in all languages in order to minimize confusion. For example, in English the abbreviation for coordinated universal time would be CUT, while in French the abbreviation for temps universel coordonné would be TUC. To avoid appearing to favor any particular language, the abbreviation UTC was selected.

However, that is just by way of warm-up.  The main purpose of this note is to provide an update on the actions of the International Earth Rotation and Reference Systems Service (IERS).

Yes, really: http://iers.org

One of the tasks of this organization is to mandate the insertion of leap seconds into UTC in order to prevent it from drifting too far away from UT1.  The next leap second is scheduled to follow 23:59:59 UTC on 30th June 2015.  This will be the first time for more than 10 years that a leap second has occurred on a non-holiday weekday.

Quote
The upcoming leap second scheduled for June 30, 2015 will be the first leap second in many years to occur during a time of significant normal business activity. For example, local time for leap second implementation will be 7:59:59 PM Eastern Daylight Time or 4:59:59 PM Pacific Daylight Time on Tuesday, June 30, 2015. Leap seconds in 2005 and 2008 occurred at the end of December 31. The most recent previous leap second in 2012 occurred at the end of Saturday, June 30, 2012.
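
For anyone who wants to check the local times and weekdays in that quote, here is a minimal Python sketch. The fixed offsets (UTC-4 for EDT, UTC-7 for PDT) and the helper name local_label are my own assumptions for illustration, not anything taken from the quoted notice.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets are an assumption for this sketch: they are the summer
# (daylight saving) offsets that applied on 30 June 2015.
UTC = timezone.utc
EDT = timezone(timedelta(hours=-4), "EDT")
PDT = timezone(timedelta(hours=-7), "PDT")

# The last normal second before the leap second is inserted.
last_normal_second = datetime(2015, 6, 30, 23, 59, 59, tzinfo=UTC)


def local_label(moment, zone):
    """Format a moment as weekday plus 12-hour local time in the given zone."""
    return moment.astimezone(zone).strftime("%A %I:%M:%S %p %Z")


print(local_label(last_normal_second, EDT))
print(local_label(last_normal_second, PDT))

# Weekdays of the leap-second dates mentioned in the quote.
for year, month, day in [(2005, 12, 31), (2008, 12, 31), (2012, 6, 30), (2015, 6, 30)]:
    print(year, datetime(year, month, day).strftime("%A"))
```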

Recommended best practices for the leap second implementation involve introducing an extra second labelled 23:59:60 after the last normal second of the month.  The US National Institute of Standards and Technology does indeed achieve this.  Here for example is evidence of their triumphant adherence to best practices in 2008:



However it is recognized that few if any organizations apart from the NIST will have the budget or the will-power to go the extra mile and undertake the massive re-programming of internal systems needed to display the time correctly for one second.
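
As a small illustration of why a second labelled 23:59:60 is awkward for ordinary software, the sketch below asks Python's standard datetime type to represent it. The function name try_leap_second is invented for this example; the only library behaviour relied on is that datetime accepts seconds in the range 0 to 59.

```python
from datetime import datetime


def try_leap_second():
    """Try to build 23:59:60 on 30 June 2015 and report what happens."""
    try:
        return datetime(2015, 6, 30, 23, 59, 60)
    except ValueError as error:
        # datetime only accepts seconds from 0 to 59, so the leap second
        # has no representation in this type.
        return f"rejected: {error}"


print(try_leap_second())
```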

It is expected that many organizations may either

(1) reset the clock back by one second at the end of 23:59:59 on 30th June, or
(2) freeze the clock for one second at the end of 23:59:59 on 30th June, or
(3) run the clock at half speed for the duration of 23:59:59 on 30th June.

Needless to say, none of these alternatives are entirely problem-free and safe.  Choosing the first alternative could lead to an event being given a later timestamp than another subsequent event, thus giving rise to impossible chains of causation and anti-causation.  The second alternative could lead to the system shutting down altogether, if there is no way (with the clock frozen) of deciding when it is time to start the clock again.  The third alternative could lead to navigational errors, for example in aeroplanes which observe (during the second that the clock is running at half speed) that their velocity has apparently doubled.
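
The three alternatives can be sketched as toy clock functions in Python. This is only an illustration under my own assumptions: t is true elapsed time in seconds measured from the start of 23:59:59, each function returns the clock reading on the same scale (so a reading of 1.0 means midnight), and the aeroplane speed is an invented figure.

```python
def step_back(t):
    """(1) Run normally, then jump back one second so 23:59:59 repeats."""
    return t if t < 1.0 else t - 1.0


def freeze(t):
    """(2) Run normally, hold still for one second at the end of 23:59:59."""
    if t < 1.0:
        return t
    if t < 2.0:
        return 1.0
    return t - 1.0


def half_speed(t):
    """(3) Run at half speed for two true seconds, then resume."""
    return t / 2.0 if t < 2.0 else t - 1.0


# (1) Two events half a second apart that straddle the step:
# the later event receives the smaller timestamp.
print("step back:", step_back(0.75), step_back(1.25))

# (2) Two distinct events during the freeze receive the same timestamp.
print("freeze:", freeze(1.25), freeze(1.75))

# (3) One true second of flight looks like half a clock second,
# so the apparent speed is double the true speed.
true_speed = 250.0  # metres per true second, an invented figure
clock_elapsed = half_speed(1.5) - half_speed(0.5)
print("half speed:", true_speed * 1.0 / clock_elapsed)
```

All three functions agree again once t reaches 2.0, which is the point of the exercise: they differ only in how they misbehave on the way there.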

Quote
Historically leap second changes have created significant operational problems. All coordinated time scales will be affected by this adjustment.

Please report operational challenges you experience to the following organizations:

GPS -- United States Coast Guard Navigation Center (NAVCEN), via the NAVCEN Website, http://www.navcen.uscg.gov/ under "Report a GPS Problem"

Network Timing Protocols (NTP) -- Michael Lombardi at NIST, Boulder, Colorado at 303-497-3212, or [email protected].

Thank you for supporting this effort to better understand leap second issues.

and,

Quote
If you know your system will not work through the leap second on 30 June 2015:

Isolate the system from any external timing sources and manually insert the additional second at 23:59:59 UTC. Then reconnect the system after 00:00:00 UTC 1 July 2015.

If a problem occurs after 23:59:59 on Tuesday, 30 June the following actions should be taken:

a. Verify the time on the system by calling the time voice announcers at the National Institute of Standards and Technology (NIST) at (303) 499-7111 and (808) 335-4363 or the United States Naval Observatory (USNO) at (202) 762-1401 or (719) 567-6742.

b. If the time is off by 1 second, then reset the time to the correct time indicated by the voice announcer and if necessary, restart the system.

Google, on the other hand, have chosen a substantially different path.

Quote
Very large-scale distributed systems, like ours, demand that time be well-synchronized and expect that time always moves forwards. Computers traditionally accommodate leap seconds by setting their clock backwards by one second at the very end of the day. But this 'repeated' second can be a problem. For example, what happens to write operations that happen during that second? Does email that comes in during that second get stored correctly? What about all the unforeseen problems that may come up with the massive number of systems and servers that we run? Our systems are engineered for data integrity, and some will refuse to work if their time is sufficiently 'wrong'. We saw some of our clustered systems stop accepting work on a small scale during the leap second in 2005, and while it didn't affect the site or any of our data, we wanted to fix such issues once and for all.

This was the problem that a group of our engineers identified during 2008, with a leap second scheduled for December 31. Given our observations in 2005, we wanted to be ready this time, and in the future. How could we make sure everything at Google stays running as if nothing happened, when all our server clocks suddenly see the same second happening twice? Also, how could we make this solution scale? Would we need to audit every line of code that cares about the time? (That's a lot of code!)

The solution we came up with came to be known as the leap smear. We modified our internal NTP servers to gradually add a couple of milliseconds to every update, varying over a time window before the moment when the leap second actually happens. This meant that when it became time to add an extra second at midnight, our clocks had already taken this into account, by skewing the time over the course of the day. All of our servers were then able to continue as normal with the new year, blissfully unaware that a leap second had just occurred. We plan to use this leap smear technique again in the future, when new leap seconds are announced by the IERS.

The leap smear is talked about internally in the Site Reliability Engineering group as one of our coolest workarounds, that took a lot of experimentation and verification, but paid off by ultimately saving us massive amounts of time and energy in inspecting and refactoring code. It meant that we didn't have to sweep our entire (large) codebase, and Google engineers developing code don't have to worry about leap seconds. The team involved in solving this issue was a handful of people, distributed around the world, who were able to work together without restriction in order to solve this problem.

The solution to this challenge drove a lot of thinking to develop better ways to implement locking and consistency, and synchronizing units of work between servers across the world. It also meant we thought more about the precision of our time systems, which have a knock-on effect on our ability to minimize resource wastage and run greener data centers by reducing the amount of time we must spend waiting for responses and rarely doing excess work.
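
To make the smear idea concrete, here is a minimal Python sketch of a linear smear. It is not Google's implementation: the quoted post gives neither the shape of the adjustment nor the window length, so the linear ramp, the 24-hour WINDOW and the function names are all assumptions made for illustration.

```python
WINDOW = 86400.0  # smear window in true seconds (assumed value)


def smear_offset(seconds_into_window, window_seconds=WINDOW):
    """Portion of the leap second absorbed so far, from 0.0 up to 1.0."""
    if seconds_into_window <= 0.0:
        return 0.0
    if seconds_into_window >= window_seconds:
        return 1.0
    return seconds_into_window / window_seconds


def smeared_clock(true_elapsed, window_seconds=WINDOW):
    """Clock reading after true_elapsed seconds since the window opened."""
    return true_elapsed - smear_offset(true_elapsed, window_seconds)


# Sample the clock once an hour across the window and a little beyond.
samples = [smeared_clock(hour * 3600.0) for hour in range(0, 27)]

# The smeared clock never stops or runs backwards...
print(all(a < b for a, b in zip(samples, samples[1:])))

# ...yet by the end of the window it has lost exactly one second.
print(WINDOW - smeared_clock(WINDOW))
```

With this choice of window the clock runs slow by about 11.6 parts per million, so no individual reading ever repeats or jumps, which is the property the quoted post is after.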

Mr G

  • Posts: 156
Re: Eppur si muove
« Reply #1 on: 28 June 2015 08:01:06 PM »

âspen

  • Posts: 155
Re: Eppur si muove
« Reply #2 on: 17 July 2015 04:24:35 PM »
I think this is tomorrow:

Quote
Dear WebFaction customer,

This email is to inform you that your server will be taken offline for
power supply maintenance on July 18 between 07:00 and 10:00 UTC.

We will update our status blog at the link below as maintenance will progress:
http://statusblog.webfaction.com/2015/07/16/emergency-maintenance-on-web385-july-18-2015/

We apologize in advance for the inconvenience. Should you have any questions
about this maintenance, feel free to reply to this email and our support team
will be happy to help.

Regards,

The WebFaction Team

The time is given in UTC, which, as noted previously, is a compromise between English and French. Compare the pronunciation of CERN at 30 seconds in:

http://zeta.forum4.org/utc/PronouncingCern.html

Mr X

  • unpinged
  • Posts: 311
Re: Eppur si muove
« Reply #3 on: 19 July 2015 07:59:08 AM »
(Anyone interested in trying to pronounce français or other languages might enjoy http://www.memrise.com/courses/english/french/ which is a fun website.)