DR Automation - Using Public DNS APIs
If you have a Disaster Recovery site (cloudy or otherwise), and your DR plan involves changing public addresses when you "declare", you might want to consider automating your DNS changes.
Why would you do this?
- If you've outsourced the management of your DR site, your DR management staff might not have credentials to update DNS records
- You might not want to give your DNS admin credentials to staff outside your organization (even if you have outsourced your DR site, you might not want to)
- During a DR event, your staff (outsourced or otherwise) might not understand DNS so well. Plus your changes might be happening at 2am, or when your DNS admin is on vacation - you can't count on your whole team being there during a DR event
- Sticking with that 2am theme, even if your DNS SME is around for the change, nobody is at their best that late - even automated "we can fail over within minutes" DR plans can see hiccups - things change and the DR site isn't always kept up to date. If you don't test your DR plan periodically, you will definitely find things that you don't expect during a real event.
Long story short, the last step of most DR plans is "update the external DNS records". Assuming your firewall rules are up to date at the DR site - is that list of DNS changes also up to date?
Automating these DNS changes can take errors off the table (again, assuming that the list of changes is up to date).
"Can I even automate that?" you ask? Yup - most of the larger DNS providers give you an API, and you can script changes with PowerShell, Python or even curl in a shell script.
Looking at GoDaddy for instance, their API documentation is here:
- https://developer.godaddy.com/getstarted
- https://developer.godaddy.com/doc (you'll want the "domains api" for what we're discussing here)
Since their API examples are so easy to implement in curl, let's go with that. I could (of course) write this in Python or PowerShell and make it a "whole thing", but the object of this example is to show how simple this can be, and to give you a decent example to build on (or just re-use) in your environment.
This script changes a set of A records (listed in an input file) from the prod IPs to the DR IPs (or back).
The script:
#
# DNS Update for GoDaddy DNS Records
#
# syntax dnsupd.sh <input file> <apikey> <apisecret>
#

# check if input file exists
[ ! -f "$1" ] && { echo "$1 file not found"; exit 2; }

# check for correct variable count
if [ -z "$3" ] || [ -n "$4" ]
then
  echo "invalid variable list"
  echo "Syntax:"
  echo "dnsupd.sh <inputfilename> <apikey> <apisecret>"
  exit 3
fi

# INPUTS ARE OK, PROCEED

domain="coherentsecurity.com"
type="A"
ttl="600"
headers1="Authorization: sso-key $2:$3"

echo "=====================================" | tee -a dnsupd.log
echo "DNS Update run for $(date)" | tee -a dnsupd.log

while read hostname newip
do
  echo "A Record : " $hostname
  echo " IP Data : " $newip

  # check current value
  currentdns=$(curl -s -X GET -H "$headers1" "https://api.godaddy.com/v1/domains/$domain/records/$type/$hostname")
  currentip=$(echo $currentdns | grep -oE "\b([0-9]{1,3}\.){3}[0-9]{1,3}\b")
  echo "the change is $currentip to $newip"

  # quoted so the comparison still works if the lookup returned nothing
  if [ "$currentip" != "$newip" ]; then
    curl -X PUT "https://api.godaddy.com/v1/domains/$domain/records/$type/$hostname" \
      -H "accept: application/json" \
      -H "Content-Type: application/json" \
      -H "$headers1" \
      -d "[ { \"data\": \"$newip\", \"ttl\": $ttl } ]"
    rc=$?
    echo "retcode is $rc"
    # curl exits 0 when the request completed, non-zero when it could not be made
    if [ "$rc" -eq 0 ]; then
      echo "$hostname update to $newip success" | tee -a dnsupd.log
    else
      echo "ERROR - $hostname update to $newip failed" | tee -a dnsupd.log
    fi
  else
    echo "ERROR - Source and Dest IP are the same - $newip" | tee -a dnsupd.log
  fi
done < "$1"
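One thing to keep in mind with the script as written: curl's exit code only tells you whether the request could be made, so an API reply such as a 401 (bad key) still comes back as exit code 0. A minimal sketch of a stricter check follows, using curl's `-w '%{http_code}'` output. The helper names `is_http_success` and `put_record` are mine (hypothetical), and treating any 2xx reply as success is an assumption you should check against the GoDaddy API documentation.

```shell
# Sketch only: is_http_success and put_record are hypothetical helper names.

# Return 0 (true) when the argument is a three-digit 2xx HTTP status code.
is_http_success() {
    case "$1" in
        2[0-9][0-9]) return 0 ;;
        *)           return 1 ;;
    esac
}

# PUT one record and print only the HTTP status code
# (curl prints 000 when no HTTP response was received at all).
# Arguments: domain type hostname newip ttl "Authorization header"
put_record() {
    curl -s -o /dev/null -w '%{http_code}' -X PUT \
        "https://api.godaddy.com/v1/domains/$1/records/$2/$3" \
        -H "accept: application/json" \
        -H "Content-Type: application/json" \
        -H "$6" \
        -d "[ { \"data\": \"$4\", \"ttl\": $5 } ]"
}

# Example use inside the update loop (not run here):
#   status=$(put_record "$domain" "$type" "$hostname" "$newip" "$ttl" "$headers1")
#   if is_http_success "$status"; then
#       echo "$hostname update to $newip success" | tee -a dnsupd.log
#   else
#       echo "ERROR - $hostname update failed, HTTP $status" | tee -a dnsupd.log
#   fi
```

Logging the status code alongside the hostname also gives you something concrete to look at when the 2am run doesn't go as planned.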
The dnsupd-dr.in input file (this moves me from prod to dr addresses). Note that it's just the CN followed by the IP address (the domain is buried in the script):
ltuae 41.41.41.41
The dnsupd-prod.in input file (this moves me back to prod addresses):
ltuae 42.42.42.42
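Since the whole process hinges on these input files being right, it can be worth sanity-checking them before a real event. Here is a small sketch that flags any line that isn't "hostname, then a valid IPv4 address". The function names `valid_ipv4` and `check_input_file` are hypothetical, and the check assumes the two-column format shown above.

```shell
# Sketch only: valid_ipv4 and check_input_file are hypothetical helper names.

# Return 0 when $1 is a dotted-quad IPv4 address with every octet in 0-255.
valid_ipv4() {
    # Shape check first: exactly four dot-separated groups of 1-3 digits.
    echo "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$' || return 1
    # Range check: split on dots and test each octet.
    old_ifs=$IFS; IFS=.
    set -- $1
    IFS=$old_ifs
    for octet in "$@"; do
        [ "$octet" -le 255 ] || return 1
    done
    return 0
}

# Print every line of file $1 that is not "<hostname> <IPv4>".
# Returns 1 if any bad line was found, 0 otherwise.
check_input_file() {
    bad=0
    # The "|| [ -n ... ]" part keeps a final line that has no trailing newline.
    while read -r hostname newip extra || [ -n "$hostname" ]; do
        # Skip blank lines.
        [ -z "$hostname" ] && continue
        if [ -n "$extra" ] || ! valid_ipv4 "$newip"; then
            echo "bad line: $hostname $newip${extra:+ $extra}"
            bad=1
        fi
    done < "$1"
    return "$bad"
}

# Example (not run here):
#   check_input_file dnsupd-dr.in || echo "fix the input file before you need it"
```

Running this from a scheduled job against both files is one cheap way to catch a typo long before you "declare".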
Let's run the script, moving from prod to dr:
/bin/sh ./dnsupd.sh dnsupd-dr.in <notmyapikeyyoudont> <ormyapisecreteither>
Our example host - ltuae ? That's "Life, the Universe and Everything" (if you didn't get that from the PROD IP address)
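After a run, you'll likely want to confirm that the records actually changed. The sketch below re-reads the same input file and compares each entry against what a nameserver returns, using `dig +short`. The function names are hypothetical, and the nameserver you pass in is an assumption on my part: point it at one of the authoritative servers for your zone so you aren't looking at a cached answer.

```shell
# Sketch only: resolve_a and verify_file are hypothetical helper names.

# Print the A record(s) that nameserver $1 returns for name $2.
resolve_a() {
    dig +short "@$1" "$2" A
}

# Compare each "<hostname> <ip>" line of an input file with live DNS.
# Arguments: inputfile domain nameserver
# Returns 1 if any record does not match, 0 otherwise.
verify_file() {
    failed=0
    while read -r hostname wantip || [ -n "$hostname" ]; do
        # Skip blank lines.
        [ -z "$hostname" ] && continue
        gotip=$(resolve_a "$3" "$hostname.$2")
        if [ "$gotip" = "$wantip" ]; then
            echo "OK       $hostname.$2 -> $gotip"
        else
            echo "MISMATCH $hostname.$2 wanted $wantip got ${gotip:-nothing}"
            failed=1
        fi
    done < "$1"
    return "$failed"
}

# Example (not run here; replace the last argument with a real
# authoritative nameserver for your zone):
#   verify_file dnsupd-dr.in coherentsecurity.com <your-authoritative-ns>
```

The same check run against the prod input file during normal operations tells you whether that file still matches reality.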
Gotchas? Like any DNS migration process, the key thing to do is set your TTLs appropriately BEFORE your migration. DNS is all about expiry times - the phrase "DNS propagation" is malarkey, even though it's still uttered by every DNS provider on the planet. What the TTL does is say "after being cached for xxxx seconds, expire this entry" - the instruction is for the DNS server making the request.
If your zone TTL is 7200 (2 hours), and the remote client is querying their DNS server, that entry will be cached for 7200 seconds after their server last fetched it. So if that fetch was 7199 seconds ago, it'll expire in 1 second, and if the server just fetched it, the client is stuck with the old answer for the next 2 hours.
So if you have a business process that relies on a DNS change (like your DR process), you're going to want to keep this in mind. 2 hours is likely too long, but 5 minutes is likely too short - you don't want to be that "bad citizen" on the internet that forces everyone else to burn excessive resources on your behalf. 15 minutes (900 seconds) is a happy medium that lots of folks find reasonable - it's short enough that management will accept it, but not so short that you're "that company".
So the right time to change your TTL was yesterday, or for a DR process, many years ago. The important thing is that it should be "short enough" when you pull the trigger (not after). If it's set for 86400 (1 day) or something silly, the best time to think about it is today - like planting a tree :-)
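To put numbers on the TTL discussion, here is a small sketch: one function for the expiry arithmetic, and one that reads the TTL column from a `dig` answer. Both function names are hypothetical. Note that a caching resolver reports the time remaining on its cached copy, while an authoritative server reports the full zone TTL, so which server you ask matters.

```shell
# Sketch only: cache_remaining and current_ttl are hypothetical helper names.

# Seconds a cached answer has left, given its TTL ($1) and its age in
# the cache ($2). Prints 0 once the entry has expired.
cache_remaining() {
    left=$(( $1 - $2 ))
    [ "$left" -lt 0 ] && left=0
    echo "$left"
}

# Print the TTL of the first A record in a dig answer.
# dig answer lines look like: name TTL class type data
# Arguments: fqdn [nameserver]
current_ttl() {
    dig +noall +answer ${2:+"@$2"} "$1" A | awk '$4 == "A" { print $2; exit }'
}

# Examples (the dig query is not run here):
#   cache_remaining 7200 7199    # fetched 7199s ago with a 2 hour TTL
#   current_ttl ltuae.coherentsecurity.com
```

With a 7200 second TTL and a copy fetched 7199 seconds ago, `cache_remaining 7200 7199` prints 1. A resolver that fetched the record a moment before your change holds the old address for close to the full TTL.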
This script of course will evolve over time, and I'll likely update it for other DNS providers (as one client or another needs that) - check my GitHub for changes if you're interested - https://github.com/robvandenbrink. As always, TEST it for your organization and your situation and MODIFY IT AS NEEDED. This script is NOT meant to be a one-size-fits-all script that'll just work 100% for everyone without testing. For instance, you might choose to use CNAMEs instead of A records, or you might choose to have both sites active during PROD windows to spread load, and just delete the PROD addresses if you are in a DR situation. Or you might choose to use a GSLB (Global Server Load Balancer) with health checks instead of DNS to swing PROD traffic over to DR. Or if you have a different DNS provider, the API calls will of course be different.
If you find this useful, or if you have suggestions or updates to the script, by all means use our comment section - let's talk!!
===============
Rob VandenBrink
rob<at>coherentsecurity.com
<shameless_plug>
Want to know more about how DNS operates or how you might secure a DNS Server? Or did that Load Balancer / Health Check thing sound interesting?
Check out my book:
https://www.amazon.com/Linux-Networking-Professionals-configure-enterprise/dp/1800202393
https://www.amazon.ca/Linux-Networking-Professionals-configure-enterprise-ebook/dp/B09BZTLRKY
</shameless_plug>