HOWTO: Breaking and fixing DNS - Understanding modern DNS on Ubuntu.
One dark and stormy night I broke my DNS. I decided to move
beyond /etc/resolv.conf and see what demons (daemons?) were
lurking under the hood. “It’s complicated.” This is the story of
understanding, debugging and fixing it.
/etc/resolv.conf
If you look at /etc/resolv.conf on a Linux system today (Ubuntu
19.10) you will find something like:
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
# 127.0.0.53 is the systemd-resolved stub resolver.
# run "systemd-resolve --status" to see details about the actual nameservers.
nameserver 127.0.0.1
search lan
But the file seems to change. I’ve seen it without most of the verbiage above. I’ve seen the file contain both 127.0.0.1 and 127.0.0.53. Confusing. systemd?
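Since the file keeps changing, it helps to see quickly which nameservers it currently lists. A minimal sketch; list_nameservers is a name I made up for illustration, not a standard tool, and it only looks at active "nameserver" lines:

```shell
# list_nameservers (hypothetical helper): print the address from every
# active "nameserver" line of a resolv.conf-style file, one per line.
# Commented-out lines such as "#nameserver ..." are skipped.
# With no argument it reads /etc/resolv.conf.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}
```

Running list_nameservers with no argument reads /etc/resolv.conf. Alongside it, ls -l /etc/resolv.conf shows whether the file is a regular file or a symlink into /run, which is how systemd-resolved and resolvconf usually manage it.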
You can edit /etc/resolv.conf
First let me say that despite the dire warnings above, you can
edit /etc/resolv.conf, e.g. to make it look like:
# Generated by NetworkManager
search lan
nameserver 9.9.9.9
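And it will work until NetworkManager chooses to overwrite the
file. I am not sure whether sudo chmod 444 /etc/resolv.conf would be
enough to keep NetworkManager from overwriting it (probably not,
since it runs as root and root is not stopped by file permission bits).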
You can make /etc/resolv.conf immutable
If you do edit /etc/resolv.conf you can make it immutable to
prevent systemd from updating it:
$ sudo chattr +i /etc/resolv.conf
$ sudo rm /etc/resolv.conf
rm: cannot remove '/etc/resolv.conf': Operation not permitted
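To check later whether the flag is still set, lsattr(1) shows it as an i in the attribute column, and sudo chattr -i /etc/resolv.conf removes it again. A small sketch; has_immutable_flag is a made-up helper name, it only inspects the lsattr output it is given, and it assumes /etc/resolv.conf is a regular file on a filesystem that supports these attributes (e.g. ext4):

```shell
# has_immutable_flag (hypothetical helper): read lsattr(1) output on stdin
# and succeed (exit 0) if the attribute field of any line contains the
# lowercase 'i' (immutable) flag; otherwise exit 1.
has_immutable_flag() {
    awk '$1 ~ /i/ { found = 1 } END { exit found ? 0 : 1 }'
}

# Example use (not run here):
#   lsattr /etc/resolv.conf | has_immutable_flag && echo "immutable"
#   sudo chattr -i /etc/resolv.conf    # make the file editable again
```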
Debugging a broken DNS
I was living dangerously and simultaneously playing with https://pi-hole.net/ and letting Ubuntu try to upgrade my system. It went south. DNS stopped working. The following were some of the debugging steps I took to try to understand/fix the issue:
Testing resolution - is name resolution working?
In this phase of debugging, I try to do name resolution as configured:
- dig - no nameserver specified
- I ran
$ dig www.uu.net
to see if everything was working as intended. Nope. No response.
- dig - known-good nameserver
- I ran
$ dig www.uu.net @9.9.9.9
to see if I could resolve against a known-good nameserver. This worked. No issues with connectivity/routing.
- dig - 127.0.0.53
- I ran
$ dig www.uu.net @127.0.0.53
to see if the local systemd-resolved nameserver specified in /etc/resolv.conf was working. Nope.
- systemd-resolved - how is it configured?
- I ran
$ systemd-resolve --status
to see how systemd thought DNS was configured. The wireless interface I was using pointed to a nameserver (the DNS proxy on my wireless router) that should work:
$ systemd-resolve --status
...
Link 3 (wlp2s0)
Current Scopes: DNS
LLMNR setting: yes
MulticastDNS setting: no
DNSSEC setting: no
DNSSEC supported: no
DNS Servers: 192.168.86.1
DNS Domain: ~.
lan
- systemd-resolve - let systemd resolve a name
- dig(1) and host(1) are not the only game in town for doing command line DNS look-ups.
Systemd (of course) will do it for you:
$ systemd-resolve www.uu.net
www.uu.net: 152.195.32.39
In this case, it worked, which tells me that systemd-resolved is happy and working.
- try dig again
- Try another “normal” lookup:
$ dig www.uu.net
This failed. The conclusion seems to be that whatever the resolver library is looking at (127.0.0.53) is not working.
- edit /etc/resolv.conf
- Pointing /etc/resolv.conf at working nameservers fixed the problem:
# Generated by NetworkManager
search lan
#nameserver 127.0.0.53 # BROKEN. systemd-resolved nameserver set by NetworkManager
#nameserver 9.9.9.9 # WORKS. quad9 nameserver
nameserver 192.168.86.1 # WORKS. wireless router nameserver
Conclusion - the systemd-resolved stub resolver at 127.0.0.53 is not answering
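The manual steps above amount to asking several servers the same question and comparing the results. A rough sketch of that loop; probe_resolvers and the DIG override variable are names I invented, the addresses in the usage line are the ones from my network, and it treats an empty answer the same as no answer:

```shell
# probe_resolvers (hypothetical helper): ask each listed server for the
# same name and report which ones return an answer.
# Usage: probe_resolvers NAME SERVER [SERVER...]
# DIG may be set to another command (e.g. a stub for testing);
# it defaults to dig(1).
probe_resolvers() {
    name=$1
    shift
    for server in "$@"; do
        # +short prints only the answer data; +time and +tries keep a
        # dead server from stalling the loop for long.
        if answer=$("${DIG:-dig}" +short +time=2 +tries=1 "$name" "@$server" 2>/dev/null) \
            && [ -n "$answer" ]; then
            echo "$server: OK"
        else
            echo "$server: FAILED"
        fi
    done
}

# Example use (not run here):
#   probe_resolvers www.uu.net 127.0.0.53 192.168.86.1 9.9.9.9
```

In my situation I would expect 127.0.0.53 to show FAILED and the other two to show OK, which is the same picture the individual dig commands gave.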
What name resolution processes are running?
The next question is: what’s (not) running? What’s (not) listening?
To answer these questions, I poked at the network and the running processes:
- nmap - look for listeners
- nmap did not show a DNS listener at 127.0.0.53
gmj@ed home-computing [master] $ sudo nmap -v -sU -PS 127.0.0.53
Starting Nmap 7.60 ( https://nmap.org ) at 2020-05-10 07:51 EDT
Initiating Parallel DNS resolution of 1 host. at 07:51
Completed Parallel DNS resolution of 1 host. at 07:51, 0.02s elapsed
Initiating UDP Scan at 07:51
Scanning 127.0.0.53 [1000 ports]
Completed UDP Scan at 07:51, 2.80s elapsed (1000 total ports)
Nmap scan report for 127.0.0.53
Host is up (0.000049s latency).
Not shown: 997 closed ports
PORT STATE SERVICE
68/udp open|filtered dhcpc
631/udp open|filtered ipp
5353/udp open|filtered zeroconf
- zeroconf
- Is zeroconf listening? What is 5353?
It looks like 5353 is multicast DNS.
$ egrep -i domain\|dns /etc/services
domain 53/tcp # Domain Name Server
domain 53/udp
mdns 5353/tcp # Multicast DNS
mdns 5353/udp
- lsof -i - look at listening ports
Next, I used lsof(1) to look at listening and connected ports, successively grepping out the “known” and “uninteresting”:
gmj@ed home-computing [master] $ sudo lsof -i -n | egrep -vi established\|dropbox\|ssh\|http\|smtp\|bootp\|ipp
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
avahi-dae 1064 avahi 12u IPv4 25434 0t0 UDP *:mdns
avahi-dae 1064 avahi 13u IPv6 25435 0t0 UDP *:mdns
avahi-dae 1064 avahi 14u IPv4 25436 0t0 UDP *:42027
avahi-dae 1064 avahi 15u IPv6 25437 0t0 UDP *:44240
dnsmasq 2538 libvirt-dnsmasq 5u IPv4 37248 0t0 UDP 192.168.122.1:domain
dnsmasq 2538 libvirt-dnsmasq 6u IPv4 37249 0t0 TCP 192.168.122.1:domain (LISTEN)
brave 28951 gmj 43u IPv4 250584 0t0 UDP 224.0.0.251:mdns
Looks like avahi-dae[mon] is listening on multicast DNS (mdns) on 5353, and dnsmasq (running as libvirt-dnsmasq) is listening on 192.168.122.1:53, which is most likely libvirt's default virtual network address rather than the router. Nothing is listening on port 53 at 127.0.0.53. This is a problem.
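A more direct way to ask "is anything listening on port 53?" is to filter the list of listening sockets. A minimal sketch, assuming ss(8) from iproute2 is installed; port53_listeners is a made-up helper name, and it relies on the local address being the fifth column of ss -lntu output:

```shell
# port53_listeners (hypothetical helper): filter `ss -lntu` style output
# on stdin down to sockets whose local port is exactly 53.
# Port 5353 (mdns) does not match, because the pattern requires ":53"
# at the very end of the local address column.
port53_listeners() {
    awk '$5 ~ /:53$/'
}

# Example use (not run here):
#   sudo ss -lntup | port53_listeners
```

With a working systemd-resolved I would expect to see a line for 127.0.0.53 on port 53 here; in my broken state only the 192.168.122.1 dnsmasq listener would be expected to show up.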
Why is systemd-resolved not answering - do I care?
Do I really want to debug systemd-resolved? No. I was half planning on upgrading to the latest Ubuntu release (20.04) anyhow. This seems like the time to do it, rather than debugging this problem further.
Lessons learned
- run servers on dedicated systems
- I had been messing with https://pi-hole.net/ on this system (a laptop that mostly does not move/go off the net). There was some confusion/doubt about whether this interacted badly with things/caused the problems. It may have. I un-installed it. But running a dedicated server would be better.
- Failed Ubuntu “upgrade”
- The actual trigger that made things not work was an attempt to let the Ubuntu installer upgrade the system. This failed in strange ways. Afterwards my system, which had been Ubuntu 19.10, reported itself (in /etc/issue) as 18.04, and the pi-hole logs reported that pi-hole could not find the wireless interface it had been configured to use (but the device was still there, same name, still working…)
Next Steps
TODO Do a hard upgrade to Ubuntu 20.04
- Full backup, wipe disk, restore…
- Use ansible, docker, chef or similar to make configs repeatable.
TODO Set up a server to run pi-hole and other services
- Possibly re-purpose an old laptop or pogo-plug device running something minimal like Arch Linux
- Use ansible, docker, chef or similar to make configs repeatable.
Things to learn more about
- avahi
- So what is avahi-dae[mon]? It looks like a zero-configuration (I wish!) networking service that uses multicast DNS on a local network. Do I need to be running this?
- systemd-resolved
- I may want to learn more about this, as it is part of the new regime in most Linux distros. But not now.
For Further Reading
- resolvers, stub resolvers and nameservers
- https://unix.stackexchange.com/questions/500536/what-are-dns-server-resolver-and-stub-resolver
Day 10 of #100DaysToOffload. Delayed a day due to DNS problems :-)