Ad-Blocker Operation Using dnsmasq and custom httpd410server |
Yoyo's list of ad servers formatted for dnsmasq is basically a long list of TLDs and URLs, each followed by the IP address 127.0.0.1 so that browser requests for ads go to 127.0.0.1. My script that downloads the ad server lists changes the DNS resolution addresses from 127.0.0.1 to a local server 10.42.2.1 that runs a small http and https server daemon I wrote that always returns HTTP 410 (Gone). HTTP 410 has the intended meaning of the client never ever asking for the blocked resource again, though implementation on client web browsers and mobile apps is dubious.
I wrote this little web server daemon called httpd410server because I wanted to see ad-servers being blocked actively in real-time and the web sites or mobile apps requesting them. This little web server is coded to always return HTTP 410 (GONE) whenever any HTTP or HTTPS request comes in, and also logs to syslog the HTTP GET request it responds to. Therefore, I have a way to keep track of all ad server blocking activity in the house.
A free download link to the stuff I describe here is at the bottom of this post.
This daemon is in the class of DNS Sinkholes and not really required for ad blocking; ad blocking works fine with redirection to a IP address with no DNS sink running on the HTTP server port of that IP, resulting in "connection refused" from the operating system. I just did not like redirection to a non-existent service port - it is cool to have something to at least listen on the port, log the ad requests and return a meaningful (intentionally meaningless!) response.
As depicted in the diagram at the top, here is what happens using this setup:
- Someone in the house launches a web-browser and types in a URL, or launches a mobile app
- The web browser or the mobile app loads the main web page and requests resolution of TLDs and URLs for advertisements from ad servers on that page
- dnsmasq intercepts all DNS requests, including the ones for ad servers
- dnsmasq checks the list of ad servers for the TLD or URL being requested. If it finds a corresponding matching entry in the ad-server blocklist, it resolves the TLD or URL to a local IP address (10.42.2.1). Otherwise, the normal Internet IP address for the web resource is returned.
- The user's browser or app then connects to the IP address returned by dnsmasq to load the contents from that server. For ad server requests, the web browser or mobile app connects to 10.42.2.1 as that was the IP resolved to by dnsmasq.
- 10.42.2.1 gets a connection from the web browser or mobile app, logs it into syslog (/var/log/messages) and returns a HTTP 410 (GONE) error response to the requesting web browser or mobile app.
- The net effect is the advertisement is not displayed, and we have a log of the web site requesting ads, for purely academic observation and possible analysis. (Answers to questions like this become easy: What advertising networks do the top news services use? How many requests for ads is a web page making?)
The C++ source for the little web server is given below. It compiles straight away with gcc on Linux. It a minimal multi-threaded web server always returning only one thing - HTTP 410. The -lpthread command line parameter for compilation is required to link with the Linux POSIX threads library.
The source code can also serve as a demonstration or tutorial for beginner programmers delving into the world of Linux C or C++ multi-threaded socket server programming serving multiple ports. Some of the features of my small program that may be of particular interest to novie network programmers are:
- socket(), bind(), listen() and then select() and accept() in a loop to listen to more than one port and serve clients. The server listens on both HTTP (80) and HTTPS (443) ports.
- multi-threaded TCP socket server using Linux Posix Threads: pthread_create() etc.
- use of PTHREAD_CREATE_DETACHED in pthread_attr_setdetachstate() so that the service threads are marked as detached and when they terminate, resources are automatically released back to the system without the need for another thread to join with the terminated thread which is the default behavior.
- critical section synchronization using mutex mechanism: pthread_mutex_init(), pthread_mutex_lock and pthread_mutex_unlock(). These are used to keep track of the number of simultaneous clients being served to keep DOS (denial of service) attacks under check using an upper limit of maximum clients supported at the same time
- using the syslog (syslogd daemon provided) facility for logging: openlog(), setlogmask(), syslog() etc.
- dropping of root privileges after binding the sockets to file descriptors, using setgid() and setuid(). this server listens to protected ports below 1024, and is thus required to startup to the bind phase with root privileges. After binding, it sets its gid and uid to a non-privileged account - a standard security practice for all servers
- usage of SO_REUSEADDR to set socket option in setsockopt() to avoid TIME_WAIT induced "address already in use" errors when calling bind() on rapid server daemon restart
- usage of SO_RCVTIMEO socket option for receiving data on a socket with timeout - the program uses this to retry to recv() before giving up
- usage of the venerable unix select() call including usage of the helpful macros FD_ZERO, FD_SET, FD_ISSET etc.
- correct way of shutting down sockets and associated file descriptors - shutdown() with SHUT_RDWR) before close()
- general error handling - check all return status and do something intelligent, even if it means exiting the server daemon process in situations no obvious way out can be thought of
Here is a screen full of examples of httpd410server working:
Finally I needed a standard init script to make a service out of the daemon, and that completed the setup. Here is the classic init.d script to start and stop httpd410server. This should be copied into /etc/init.d, after which it can be added to chkconfig using "chkconfig --add httpd410server",
You can download a tarball containing the daemon, source and init script from my google drive.