Can you describe your deployment environment in a little more detail?
I’m going to guess that you’ve got gunicorn running behind some web server - nginx perhaps? Is nginx connected to gunicorn through that socket file?
The HTTP_HOST header is set by the client, and nginx should be passing it through unmodified. However, this error makes it look as though something (a proxy, nginx, some other middleware) is rewriting that header - unless you have other processes that are connecting directly to that socket file.
One way to check this is to change your nginx connection to use an IP address and port instead of a socket and see whether the error goes away. You could also turn gunicorn’s logging up to its most verbose level to see whether it shows the full headers it receives. Or you could deploy a minimal WSGI application that does nothing other than print the incoming request and return a “success” result.
(The objective behind any of these is to determine whether this header is being changed before or after it’s handed off to Django.)
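For the last suggestion, a minimal sketch of such a diagnostic WSGI application might look like the following. It assumes nothing beyond the standard library, and the choice of which environ keys to print is an illustration, not a requirement.

```python
import pprint
import sys


def application(environ, start_response):
    """Minimal WSGI app: dump the request metadata, then return a fixed body."""
    # Keep the HTTP_* headers plus a few CGI-style keys; skip the wsgi.* objects.
    interesting = {
        key: value
        for key, value in environ.items()
        if key.startswith("HTTP_")
        or key in ("REQUEST_METHOD", "PATH_INFO", "REMOTE_ADDR", "SERVER_NAME", "SERVER_PORT")
    }
    # Written to stderr so it does not interfere with the response body.
    pprint.pprint(interesting, stream=sys.stderr)

    body = b"success\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Saved under a hypothetical module name and served by gunicorn on the same socket in place of the Django app, this should show whether HTTP_HOST already holds the socket path before Django is involved.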
Yes, I have gunicorn and nginx. I think it’s connected through a socket file, yeah. I really don’t know much more about server administration to answer or understand any other point.
This is how nginx passes the connection to Django:
indicates that the request is being passed through a TCP connection.
Now, this setting:
has a potential problem. If the original request does not contain that header, nothing will be set for the request. According to the docs at Module ngx_http_proxy_module, it’s safer to use: proxy_set_header Host $host;
(Now, while I think this is relevant here - whether this has any direct bearing on your log issue is an open question.)
Is this site open to the public internet? If so, you’re going to get a lot of garbage requests submitted to your site - and without any up-front filtering from nginx or some other tool, these errors are not going to be fully preventable. (It’s one of the reasons why I will never stand up an application at the site root. Every app I deploy is deployed into a subdirectory. That prevents most of the site scans from triggering anything. Even with this, more than 99% of all HTTP requests are scripted scans. Thanks to “fail2ban”, they’re kept down to a reasonable level.)
Your other option, if you just wanted to mask the issue rather than fix the problem, would be to add a filter to your logging configuration to prevent those items from being logged.
Yes, I read about changing the setting to $host but I’ve tried that and kept getting those errors. Actually it seemed I got more, but I’m sure it was coincidental.
I also tried adding a custom handler for our logging configuration which is:
I created a class extending django.utils.log.AdminEmailHandler but I wasn’t sure where to go from there. I would get an error about the handler not being defined correctly.
Also, what exactly do I need to check within the class to silence that specific error?
In our site we have this feature where clients can register their own domain (that’s why we allow * all hosts and then check with custom middleware) and set up a CNAME pointing to our proxy (which redirects to the main domain). Do you think this could be related to the issue?
You would create a Filter class with a filter method. The filter method can look for that specific string and return 0. (Note that you’re dealing with a LogRecord object and not just the text of the message.)
You would then put that filter in a named entry in the filter section of your logging configuration, and identify that named entry in the filters setting of your handler(s).
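The wiring described above can be sketched with the standard library alone. The filter class name, the filter entry name and the stand-in console handler are all assumptions made for illustration. In a Django project the same dictionary shape would go into the LOGGING setting, with the filter attached to the real handler, and Django would apply it for you, so the explicit dictConfig call would not be needed.

```python
import logging
import logging.config


class SkipInvalidHostHeader(logging.Filter):
    """Drop records whose message mentions the invalid Host header."""

    def filter(self, record):
        # A falsy return value tells the handler to discard the record.
        return "Invalid HTTP_HOST header" not in record.getMessage()


LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "filters": {
        # Named entry; "()" tells dictConfig which callable builds the filter.
        "skip_invalid_host": {"()": SkipInvalidHostHeader},
    },
    "handlers": {
        # Stand-in handler so the sketch runs without Django installed.
        "filtered_console": {
            "class": "logging.StreamHandler",
            "filters": ["skip_invalid_host"],
        },
    },
    "loggers": {
        "django.request": {
            "handlers": ["filtered_console"],
            "level": "ERROR",
            "propagate": True,
        },
    },
}

# Only needed outside Django; Django applies settings.LOGGING itself.
logging.config.dictConfig(LOGGING)
```

Because the filter sits on the handler rather than on the logger, other handlers attached to the same logger still receive the record.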
That way, if you wanted to track these, you could segregate them out to a different handler.
Regarding the name, I really don’t think anything in that area has anything to do with this.
This header is originally set by the client.
That means that either something (someone) is trying to access your server using a bot/client that is setting the HTTP_HOST value to that specific value, or you have something in between the client and your applications changing the value of that header.
I would check the configuration of the proxy (unless you’re using nginx as the proxy and you’ve already provided that information).
But beyond that, the only way I know of to track something like this down is to trap and trace requests to see at what point the header is being changed. I’d also look for additional information such as the IP address of the originator to try and determine the source - you might want (or need) to go back into your nginx configuration to get more detailed logging for this in addition to enhanced logging within gunicorn.
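One way to trap these requests on the Python side is a small WSGI wrapper that records the Host header and the originating address before the application sees the request. This is only a sketch: the logger name and the class name are made up, and X-Forwarded-For will only be present if the proxy is configured to send it.

```python
import logging

trace_log = logging.getLogger("host_trace")  # hypothetical logger name


class HostTraceMiddleware:
    """WSGI wrapper that logs the Host header before calling the wrapped app."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # Log what the WSGI server handed over, untouched by Django.
        trace_log.warning(
            "host=%r remote_addr=%r forwarded_for=%r path=%r",
            environ.get("HTTP_HOST"),
            environ.get("REMOTE_ADDR"),
            environ.get("HTTP_X_FORWARDED_FOR"),
            environ.get("PATH_INFO"),
        )
        return self.app(environ, start_response)
```

In a project's wsgi.py this could wrap the object returned by django.core.wsgi.get_wsgi_application(). If the logged host is already the socket path, the change happened before gunicorn passed the request on.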
But what exactly do I need to check in the filter method of the Filter class to identify that error? What should I be looking for in the record object to tell it’s that “Invalid HTTP_HOST header” error and return False? Maybe this?
import logging

class SilenceInvalidHttpHostHeader(logging.Filter):
    def filter(self, record):
        return 'Invalid HTTP_HOST header' not in record.getMessage()
I’m checking the gunicorn error logs and I’m not finding that error at all.
Or, since you’re not looking at the variable portion of that message, you could also say ... not in record.msg and avoid the overhead of the function call.
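To illustrate the difference between the two attributes, here is a small standalone sketch. The record is built by hand with a made-up message template and argument, so it only shows how the standard library treats msg and getMessage(), not what Django actually passes to the logger.

```python
import logging

# Hand-built record: the template and the argument are illustrative only.
record = logging.LogRecord(
    name="example",
    level=logging.ERROR,
    pathname="example.py",
    lineno=1,
    msg="Invalid HTTP_HOST header: %r.",
    args=("bad.host",),
    exc_info=None,
)

template = record.msg           # the raw value passed to the logging call
rendered = record.getMessage()  # the template with args merged in

# The fixed prefix is present in both, so a substring test works on either.
prefix_in_template = "Invalid HTTP_HOST header" in template
prefix_in_rendered = "Invalid HTTP_HOST header" in rendered
```

One caveat: msg is not guaranteed to be a string (a caller may pass any object), so a filter that uses record.msg may want to guard with isinstance(record.msg, str).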
I wouldn’t expect it to be flagged as an error anywhere else.
Nor would it show up under that specific text. That message is generated within django.http.request.HttpRequest, in the get_host method.
class SilenceInvalidHttpHostHeader(logging.Filter):
    def filter(self, record):
        return 'Invalid HTTP_HOST header' not in record.msg
Doesn’t look like there’s anything else wrong with it, right?
Is there anything else I could do to silence the error? I tried following your other suggestions about how that error would happen and try to actually solve it but I really don’t know what I should be looking for.
Your handler that you have defined is only handling the errors being sent to the AdminEmailHandler. It’s not addressing what’s being written to stdout.
You have propagate = True in your logger for django.request. That means that, in addition to your own handling of these logs, the log entry is also passed along to Django’s default logger.
To filter these messages out from the console without squelching everything from ‘django.request’, you either need to inject your filter into Django’s default logger (no idea how to specifically do that), or effectively recreate the default logger’s settings in your own handler and set propagate = False.
I think I understand. Essentially I’d need to override Django’s default logging so that it silences that particular error, right? How would I go about that?
Still (again, if I understood correctly), since we have two different handlings of django.request errors here, the mail_admins handler and Django’s default logger (since it propagates), I’d be fine with django logging that error (to stdout I assume). What I don’t want is to keep getting emails with that specific error. But I did that already with the handler definition using the filter, right? So why do I keep getting those emails?
See the default logger docs - Django’s default logger does send the email messages.
That’s why you either need to override it to change its behavior or turn propagate off and handle everything yourself.
Yes. No, I have no idea specifically how to do that. In general you would somehow want to inject your filter into its configuration - but that’s as far as my knowledge goes in that area. (We always just define our own logging with propagate false.)
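A rough sketch of the “define your own logging with propagate false” approach follows. The dotted path to the filter class is hypothetical, and the handler set is only an approximation of Django’s defaults. The django.security entry is an extra assumption: Django’s logging documentation describes a django.security.DisallowedHost logger, so the same treatment may be needed there as well as on django.request.

```python
# Settings fragment: Django reads LOGGING and passes it to dictConfig.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "filters": {
        "require_debug_false": {"()": "django.utils.log.RequireDebugFalse"},
        # Hypothetical dotted path to the project's own filter class.
        "skip_invalid_host": {
            "()": "myproject.log_filters.SilenceInvalidHttpHostHeader",
        },
    },
    "handlers": {
        # Console output stays unfiltered, so the message is still visible.
        "console": {"class": "logging.StreamHandler"},
        # Emails are filtered, so this particular message is never mailed.
        "mail_admins": {
            "level": "ERROR",
            "class": "django.utils.log.AdminEmailHandler",
            "filters": ["require_debug_false", "skip_invalid_host"],
        },
    },
    "loggers": {
        "django.request": {
            "handlers": ["console", "mail_admins"],
            "level": "INFO",
            # Stop records from also reaching the parent "django" logger.
            "propagate": False,
        },
        "django.security": {
            "handlers": ["console", "mail_admins"],
            "level": "INFO",
            "propagate": False,
        },
    },
}
```

With propagate set to False on these loggers, only the handlers listed here see their records, so the filter on mail_admins decides what gets emailed.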
Wait, I’m realizing something now. The django.request logger propagates to its parent logger django. And this is defined at django.utils.log as follows:
Which means the message simply gets logged to the standard output, but no email is sent, which is an acceptable behavior for me. What am I missing here then?
Yes, I saw that myself. That’s what I meant: the django logger that the message propagates to shouldn’t be sending emails.
Could it be that the “Invalid HTTP_HOST header” error is processed by the django.security logger instead of django.request? I wish logs included the error class, not just the error message, because now I don’t know what that error is.
You can find where the error is generated in django.http.request.HttpRequest.get_host. The code at that point is examining the HTTP_HOST header in the request to see if it’s in the ALLOWED_HOSTS setting.
However, before ALLOWED_HOSTS is checked, Django first checks to see if the supplied domain name in that header is a valid hostname based upon RFC 1034 and 1035 (as the error message says).
Since '/home/scheduler/run/gunicorn.sock' is not a valid host name, the error is thrown - again before any comparison is made with the ALLOWED_HOSTS setting.
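The format check can be approximated as below. The pattern is a simplified one modelled on the host validation in django.http.request. The exact expression may differ between Django versions, so treat this as an illustration of why a filesystem path fails rather than as Django’s actual code.

```python
import re

# Simplified host pattern: a dotted name or a bracketed IPv6 literal,
# optionally followed by a port. Assumed to resemble Django's own check.
HOST_RE = re.compile(r"^([a-z0-9.-]+|\[[a-f0-9]*:[a-f0-9.:]+\])(:[0-9]+)?$")


def looks_like_valid_host(host):
    """Return True if the value has the shape of a hostname[:port]."""
    return HOST_RE.match(host.lower()) is not None


socket_path_ok = looks_like_valid_host("/home/scheduler/run/gunicorn.sock")
domain_ok = looks_like_valid_host("example.com:8000")
```

The socket path contains slashes, which are outside the allowed character set, so it is rejected on format alone regardless of what ALLOWED_HOSTS contains.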
So, you’ve got something in your stack before Django sees this request that is replacing the HTTP_HOST header with that value. My initial guess would be nginx, but I’ve got no way to determine that.
That’s why I keep suggesting that you do some very detailed logging and/or tracing of data through the stack to see where this alteration occurs.
(Also, going back to one of your earlier posts - you mention the use of some custom middleware. I’d also double- and triple-check that to ensure it’s not mangling that header.)
Sorry, the Django version I’m using only uses the console handler for the default django logger.
You’re right, that’s where the error is generated and is caught by django.request, not django.security. Well, I guess I need to solve the actual issue but it goes out of the scope of Django so I’ll be seeking help elsewhere. Thanks for everything!
As of your last comment, this is our middleware:
class DomainNameMiddleware(object):
    """
    Checks if the host is trusted. Default ALLOWED_OPTIONS and SITE_ROOT are accepted, except the wildcard (*):
    If it isn't registered as a custom domain, DisallowedHost is raised.
    If a custom domain with the host name is found, the test passes and request.domain is set to the corresponding
    Domain instance. The current managed association is set to the site's, if the user is an admin of the association.
    """
    def process_request(self, request):
        request.domain = None
        host = request.get_host()
        domain = split_domain_port(host)[0]
        allowed_hosts = [pattern for pattern in settings.ALLOWED_HOSTS if pattern != '*']
        allowed_hosts.append(settings.SITE_ROOT)
        site = Site.objects.get_current()
        # if the host is not one of the "default" ones, excluding the wildcard (*), check if it is a registered domain
        if domain and not validate_host(domain, allowed_hosts):
            site = RequestSite(request)
            try:
                if domain.startswith('www.'):
                    # if www.domain.org it should match domain.org
                    domain = domain[4:]
                request.domain = Domain.objects.get(name=domain)
                site.name = request.domain.association.name
                try:
                    set_session_association(request, request.domain.association)
                except PermissionDenied:
                    pass
            except Domain.DoesNotExist:
                logger = file_logger('disallowed_hosts')
                logger.error('DISALLOWED HOST\nHost: {}\nPath: {}\nGET: {}\nPOST: {}\nCOOKIES: {}\nMETA: {}'.format(
                    request.get_host(), request.path, request.GET, request.POST, request.COOKIES, request.META))
                return HttpResponse(status=403)
        request.site = site
I have a very similar case. I use nginx, gunicorn, and Django 5. nginx is configured to serve multiple domains. What is even weirder is that I get disallowed-host errors for domains served from the same physical server, even though they of course have different server_name directives in my nginx file. So I have no idea how my Django app gets those requests.
So it is not clear to me: if a static site and a dynamic one share the same nginx config and a request comes in for the static site, how can the dynamic site throw an exception about disallowed hosts?