active server-side load balance tuning and control


fyi: managed dns services - edgedirector.com

One of the parameters that client dns administrators are able to set is the url that is monitored by edgedirector.

This has three important implications.

First, administrators are able to set an exclusive url that is normally not used by the public. This makes it easy and convenient to filter traffic statistics to eliminate traffic associated with server health monitoring.

Second, by using a custom script as the monitor target, it is possible to make any aspect of server health a criterion for the response to the health probe. For example, an administrator might have the script evaluate the health of a backend database as part of the response. Should the backend be degraded to a degree that impairs the ability of the site to perform properly, the script can return an http 500 status code. Doing so will cause the dns service to withdraw that server from service and either spread the load to other servers or bring a hot spare online.
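A minimal sketch of such a monitor target, in python, might look like the following. Everything specific here is an assumption made for illustration: the monitor url /edge-health-check, the sqlite database standing in for the backend, port 8080, and the idea that the probe arrives as an http GET. None of these come from edgedirector.

```python
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer

MONITOR_PATH = "/edge-health-check"  # hypothetical monitor-only url
DB_PATH = "site.db"                  # hypothetical backend database


def backend_is_healthy(db_path=DB_PATH):
    """Return True if a trivial query against the backend succeeds."""
    try:
        conn = sqlite3.connect(db_path, timeout=2)
        try:
            conn.execute("SELECT 1").fetchone()
        finally:
            conn.close()
        return True
    except sqlite3.Error:
        return False


def probe_status(check=backend_is_healthy):
    """Map the health check onto the two status codes used by the monitor."""
    return 200 if check() else 500


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only the dedicated monitor url triggers the backend check.
        status = probe_status() if self.path == MONITOR_PATH else 404
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(str(status).encode())


if __name__ == "__main__":
    # Assumed port; serve the monitor url until interrupted.
    HTTPServer(("", 8080), HealthHandler).serve_forever()
```

Keeping the check in a separate function (backend_is_healthy here) makes it easy to swap in whatever test actually reflects the health of the site.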

Third, by extending the technique described above, it is also possible to throttle the load sent to a particular server that forms part of a globally distributed cluster. When a server is too busy, the script can again return an http 500, and the dns service will temporarily exclude that server ip from the answers given to client dns queries. Once the server is back below the chosen load threshold, it simply needs to start responding with an http 200 status code to be included in dns answers again.
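The throttling decision itself can be very small. The sketch below assumes the 1-minute system load average is the measure of "too busy" and uses a made-up threshold of 4.0; both are illustrative choices, and any metric the administrator trusts (connection count, queue depth, response time) could take their place.

```python
import os

LOAD_THRESHOLD = 4.0  # hypothetical threshold for the 1-minute load average


def status_for_load(load, threshold=LOAD_THRESHOLD):
    """Return 500 at or above the threshold, 200 once load is below it."""
    return 500 if load >= threshold else 200


def current_status():
    """Status code to return to the health probe right now (unix only)."""
    one_minute_load, _, _ = os.getloadavg()
    return status_for_load(one_minute_load)
```

A probe handler would simply send current_status() as its http status code, so the server drops out of dns answers while busy and rejoins them once the load falls.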

Implementing custom responses to health monitor probes does require suitable custom code. However, the benefit is substantial because the server can directly influence the dns responses for itself. It changes the one-way nature of server monitoring and dns integration into a two-way feedback servo loop. Individual servers can now limit the amount of work they take on before they reach load levels that cause critical errors. They can keep some processing power in reserve to continue servicing existing clients, while new clients are directed to other servers until the crisis has passed.

Load shifting, that is, moving clients before a complete outage, is an important advantage. It is sometimes argued that cached dns records are a potential weakness of global load balancing. However, preventive load shifting can counter that potential weakness. Load shifting combats the effects of cached records by creating time for the cache to expire before total failure. In fact, by relieving the server of some load, a potential failure might be avoided completely.

The requirements are simple: a unique url, code to respond to conditions, and the return of the appropriate http status code when probed. Only two status codes are required: a normal http 200 when all is well, and an http 500 when that node should receive no more traffic.

Clients requiring further information about their implementation of custom responses can contact support for advice.

note:
The techniques described above can be implemented for other services by implementing an http responder that acts as a proxy for the service being monitored.
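As an illustration only, the python sketch below answers http probes on behalf of a non-http service by testing whether a tcp connection to that service can be opened. The service address (a mail server on 127.0.0.1 port 25), the responder port 8081, and the use of a plain tcp connect as the health test are all assumptions; a real proxy would normally exercise the service more thoroughly.

```python
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer

SERVICE_HOST = "127.0.0.1"  # hypothetical host of the non-http service
SERVICE_PORT = 25           # hypothetical port, e.g. a local mail server


def service_is_up(host=SERVICE_HOST, port=SERVICE_PORT, timeout=2.0):
    """Return True if a tcp connection to the service can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class ProxyHealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Translate the state of the proxied service into 200 or 500.
        status = 200 if service_is_up() else 500
        self.send_response(status)
        self.end_headers()


if __name__ == "__main__":
    # Assumed port for the responder itself.
    HTTPServer(("", 8081), ProxyHealthHandler).serve_forever()
```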

source: edgedirector.com