Upgrade GTM (BIG-IP DNS) with no downtime

GTM devices are among the most critical devices in your network. They manage the DNS traffic for the most important applications in your business. Upgrading these devices can be a sensitive operation, as you don’t want access to your applications to break. To avoid issues with your GTM (BIG-IP DNS) upgrade, it is better to engage an F5 specialist and have a proper plan of action for the activity. In this post we will go through the details of how to upgrade GTM (BIG-IP DNS) with no downtime.

Scenario:

Two datacenters, each with GTM and LTMs deployed on separate boxes.

Desired outcome:

Upgrade TMOS on GTMs without incurring downtime or causing a failover event.


Constraints:

  • Primary NS records and DNS are hosted by the ISP.
  • LTMs cannot be upgraded yet.
  • Little to no downtime is available to perform this upgrade.
  • Critical factor for success: reduce the TTL for the DNS records associated with the GTM. In GTM terms, these are the records tied to the listeners and the wide IPs. Use the shortest practical TTL, such as 5 to 15 minutes. This must be set well before the main change window.
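
As a rough aid for finding records whose TTL still needs lowering, the sketch below parses the answer section printed by `dig +noall +answer <name>` and lists records above a chosen limit. The function names, the 900-second limit, and the `www.gslb.mydomain.com` record are hypothetical, and the parser assumes the usual five-column answer layout (name, TTL, class, type, data). Query the authoritative server directly, since a caching resolver reports the remaining TTL rather than the configured one.

```python
def parse_answer_ttls(dig_answer: str) -> dict:
    """Map (owner name, record type) to the lowest TTL found in dig answer text."""
    ttls = {}
    for line in dig_answer.splitlines():
        line = line.strip()
        # Skip blank lines and dig comment lines, which start with ';'.
        if not line or line.startswith(";"):
            continue
        fields = line.split()
        # Expect: name, TTL, class, type, data...
        if len(fields) < 5 or not fields[1].isdigit():
            continue
        key = (fields[0].lower(), fields[3].upper())
        ttl = int(fields[1])
        ttls[key] = min(ttl, ttls.get(key, ttl))
    return ttls


def records_above(ttls: dict, max_ttl: int = 900) -> list:
    """Return the records whose TTL is still above max_ttl seconds."""
    return sorted(key for key, ttl in ttls.items() if ttl > max_ttl)


# Hypothetical answer text; in practice, paste the output of dig here.
sample = """\
gtm1.mydomain.com.   259200  IN  A      192.0.2.10
www.mydomain.com.    300     IN  CNAME  www.gslb.mydomain.com.
"""

ttls = parse_answer_ttls(sample)
for name, rtype in records_above(ttls):
    print(f"{name} {rtype}: TTL {ttls[(name, rtype)]}s is above the limit")
```
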

Pre-planning steps:

NOTE: These steps must be done before any change window to ensure a smooth transition that will not disrupt user sessions. Do not skip them; they are essential if you want to upgrade GTM (BIG-IP DNS) with no downtime.

  1. Determine the TTL for the records associated with the IPs used for the listeners and the CNAMEs used as wide IPs on the GTM. These values will have to be lowered, and the old TTL has to expire before the planned change window. (For example, if the TTL for gtm1.mydomain.com is 3 days, then the TTL value has to be modified at least 3 days before the desired change window.)
  2. Run a relicense check on all GTMs prior to the change window.
  3. Take archives of all GTM devices.
  4. Download the UCS files from the devices. This should aid any quick recovery efforts if needed.
  5. Take qkviews of all GTM devices.
  6. Upload all qkviews into iHealth. This should aid any support engagements if needed.
  7. Download ISO file for major version being used.
  8. Upload ISO file for major version to all GTM devices.
  9. Download ISO file for latest hotfix related to the major version being deployed.
  10. Upload ISO file for latest hotfix to all GTM devices.
  11. Install major version and hotfix to an inactive boot location.
  12. Read SOL13690.
  13. Ensure proper network connectivity between GTM and all monitored devices/systems. Some routing and firewall changes may be necessary to resolve any issues uncovered.
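
The timing rule in step 1 is simple arithmetic: the TTL change must be published at least one full old TTL before the change window opens. A minimal sketch follows, in which the function name, the one-hour safety margin, and the example date are all assumptions:

```python
from datetime import datetime, timedelta


def latest_ttl_change(window_start: datetime, old_ttl_seconds: int,
                      margin: timedelta = timedelta(hours=1)) -> datetime:
    """Latest moment to publish the lowered TTL so that every cached copy
    carrying the old TTL has expired before the change window starts."""
    return window_start - timedelta(seconds=old_ttl_seconds) - margin


# Hypothetical change window; the 3-day TTL matches the example in step 1.
window = datetime(2024, 6, 15, 22, 0)
deadline = latest_ttl_change(window, old_ttl_seconds=3 * 24 * 3600)
print("Lower the TTL no later than", deadline)
```

When several records are involved, use the largest of their current TTLs as `old_ttl_seconds`.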

High level review of tasks per GTM:

  1. Remove the targeted GTM from the DNS servers for the DNS subzone delegation. This prevents clients from trying a dead NS or one with inaccurate information.
  2. Change the global GSLB setting for “virtual server status” to “Depends on monitors only”. This step removes the dependency on iQuery communications for virtual server status.
  3. Remove the targeted GTM from the sync group. This prevents the upgraded GTM from sending bad status to its peer via iQuery.
  4. Ensure that no virtual servers begin to flap or otherwise change status in an unexpected manner.
  5. Confirm that GTM systems are not exchanging information via iQuery. Refer to SOL13690.
  6. Boot the targeted GTM into the install slot with the new TMOS and HF applied.
  7. Validate that the targeted GTM has iQuery sessions with all LTM instances.
  8. If the targeted GTM is communicating accurately and answering test nslookup queries for all wide IPs, add it back into the DNS servers for the DNS subzone delegation.
  9. Repeat steps 1-8 on the other GTM.
  10. Reestablish GTM Sync Group.
  11. Confirm iQuery between GTM devices. Refer to SOL13690.
  12. Update the big3d agent on the LTMs by running the big3d_install script from the GTMs. Refer to SOL13703 and SOL13312.
  13. Return the global GSLB setting for “virtual server status” back to the default setting of “Depends on servers and monitors”.
  14. Add the other GTM back to the DNS servers for the DNS subzone delegation.
  15. Restore all TTL values to their pre-change values.
  16. Take archives of all GTM devices.
  17. Download the UCS files from the devices. This should aid any quick recovery efforts if needed.
  18. Take qkviews of all GTM devices.
  19. Upload all qkviews into iHealth. This should aid any support engagements if needed.
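
The test lookups in step 8 can be scripted so that every wide IP is checked the same way on each GTM. The sketch below is one possible approach, not an F5-provided tool: `dig_short` shells out to `dig` (which must be installed) and queries one listener directly, and `failing_wide_ips` flags names that return no answer or an address outside an expected pool. All names, addresses, and function names are hypothetical.

```python
import subprocess


def dig_short(server: str, name: str, timeout: int = 5) -> set:
    """Query one DNS server directly and return its answers as a set."""
    result = subprocess.run(
        ["dig", f"@{server}", name, "+short", f"+time={timeout}", "+tries=1"],
        capture_output=True, text=True, check=False,
    )
    # Drop blank lines and dig diagnostics, which start with ';'.
    return {line.strip() for line in result.stdout.splitlines()
            if line.strip() and not line.startswith(";")}


def failing_wide_ips(answers: dict, expected: dict) -> list:
    """Return wide IPs with no answer or an answer outside the expected pool."""
    failures = []
    for name, allowed in expected.items():
        got = answers.get(name, set())
        if not got or not got <= set(allowed):
            failures.append(name)
    return sorted(failures)


# Hypothetical expected pools, plus answers as dig_short() might return them.
expected = {
    "app.gslb.mydomain.com": {"192.0.2.20", "198.51.100.20"},
    "mail.gslb.mydomain.com": {"192.0.2.25"},
}
answers = {
    "app.gslb.mydomain.com": {"192.0.2.20"},
    "mail.gslb.mydomain.com": set(),  # the GTM returned nothing
}
print(failing_wide_ips(answers, expected))

# Live use against one GTM listener (address is a placeholder):
# answers = {name: dig_short("192.0.2.53", name) for name in expected}
```

Checking against an allowed pool rather than comparing the two GTMs with each other avoids false alarms, since load balancing can legitimately return different members on each query.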

Additional notes:

  • You can tcpdump all the health monitor traffic sent by big3d and grep for the destinations. Then telnet to these destinations from the other GTM to make sure that the health monitors currently sent by the GTM being upgraded can also run from the other GTM.
  • You should check /var/log/gtm before the upgrade to look for any flapping.
  • Document the pre-change behavior so that we understand what the system was like before the upgrade.
  • The concept is that as GTM is updated with more complex processing, Big3D is updated to provide more data to support GTM.
  • An older GTM interacting with a newer Big3D will “hear” more than it wants to know, and ignore the unnecessary details. That’s OK.
  • A newer GTM interacting with an older Big3D won’t “hear” what it expects to know. That may lead to unexpected behavior. One temporary workaround is to change the global setting for “virtual server dependency”, which by default relies on both monitor and parent server status, meaning an iQuery outage to an LTM causes that LTM’s VSs to go down. Changing it to rely on the monitor status only will let LTM VSs stay up during the brief iQuery downtime of a Big3D upgrade (any monitor requests scheduled during the iQuery outage will be retried on the next interval, normally 30 seconds).
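
To make the flapping check on /var/log/gtm repeatable before and after the upgrade, a small sketch like the following can count status transitions per virtual server. The log line layout in `LINE` and the sample entries are assumptions for illustration only; compare the pattern with real lines from your own log and adjust it before relying on the result. The threshold of three transitions is also arbitrary.

```python
import re
from collections import Counter

# Assumed line shape: "... VS <name> (...) state change <old> --> <new>".
LINE = re.compile(r"VS (?P<vs>\S+) .*state change (?P<old>\w+) --> (?P<new>\w+)")


def count_transitions(lines) -> Counter:
    """Count status transitions per virtual server name."""
    counts = Counter()
    for line in lines:
        match = LINE.search(line)
        if match:
            counts[match.group("vs")] += 1
    return counts


def flapping(lines, threshold: int = 3) -> list:
    """Return virtual servers with at least `threshold` transitions."""
    counts = count_transitions(lines)
    return sorted(vs for vs, n in counts.items() if n >= threshold)


# Hypothetical log excerpt, not copied from a real device.
sample = """\
Jan 10 02:11:05 gtm1 alert gtmd[1234]: VS /Common/vs_app1 (ip:port=192.0.2.20:443) state change green --> red
Jan 10 02:11:35 gtm1 alert gtmd[1234]: VS /Common/vs_app1 (ip:port=192.0.2.20:443) state change red --> green
Jan 10 02:12:05 gtm1 alert gtmd[1234]: VS /Common/vs_app1 (ip:port=192.0.2.20:443) state change green --> red
Jan 10 03:00:00 gtm1 alert gtmd[1234]: VS /Common/vs_app2 (ip:port=192.0.2.21:80) state change green --> red
"""

print(flapping(sample.splitlines()))

# Live use on the device or on a downloaded copy of the log:
# with open("/var/log/gtm") as log:
#     print(flapping(log))
```

Running it once before the change gives a baseline to compare against after each GTM is upgraded.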

SOL references:
https://support.f5.com/kb/en-us/solutions/public/13000/700/sol13703.html
https://support.f5.com/kb/en-us/solutions/public/13000/300/sol13312.html
https://support.f5.com/kb/en-us/solutions/public/13000/600/sol13690.html