We’ve known for a while that some proxies, anti-virus programs, and other client-side applications that do content analysis munge HTTP request headers to disable compression. They do this because it’s easier to do content analysis on uncompressed content. For web service providers, though, this means having to serve uncompressed content to users, resulting in higher bandwidth bills.
This was first discovered by Tony Gentilcore and Andy Martone at Google, who presented their findings and a workaround at Velocity 2009 and 2010. Marcel’s post on the YDN blog has the details and an alternate solution, so I won’t go into them here; in short, they found that 15% of requests did not have the Accept-Encoding: gzip header.
The state today
A few days ago, Avi Keinan posted some stats to the Webpagetest forums about the state of compression headers today. He found that the numbers were closer to 1%, but his sample size was fairly small, so I decided to retry it with a subset of our data from August 15. This is what we found:
- Sample size
- Requests without a suitable Accept-Encoding header: 190,364 (1.07%)
- Unique requests without a suitable Accept-Encoding header
- Unique headers that may have once been an Accept-Encoding header
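For context, counts like these come from a straightforward scan over parsed request headers. Here’s a minimal sketch, with made-up header dicts standing in for real log entries (a real check should also honor qvalues like gzip;q=0, which this one ignores):

```python
def accepts_gzip(headers):
    """True if the request carries an Accept-Encoding header that
    mentions gzip (case-insensitive; qvalues are ignored here)."""
    return "gzip" in headers.get("Accept-Encoding", "").lower()

# Made-up stand-ins for parsed request headers from an access log.
sample = [
    {"Accept-Encoding": "gzip, deflate"},
    {"Accept-Encoding": "identity"},  # explicitly no gzip
    {"-" * 15: "-" * 20},             # mangled beyond recognition
]

missing = sum(1 for h in sample if not accepts_gzip(h))
print("%d of %d requests lack a suitable Accept-Encoding header"
      % (missing, len(sample)))
```

Running this over the toy sample above flags two of the three requests.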
So there were 29 headers that might once have been an Accept-Encoding header. I arrived at these by eliminating all the headers I knew about, and then all the headers that made sense in some context or other. The ones that were left fell into a few patterns:
|Header name|Header value|What it might have been|
|---|---|---|
|(4 dashes, colon, 18-251 dashes)|||
|(x, dash, 5 x)|||
|(22 to 210 'x'es)|||
|(30 to 117 'X'es)|||
|(x, dash, 17 'x'es)|||
|(26 or 30 'x'es)|||
|(4, 12 or 17 dashes)||Accept-Encoding: gzip, deflate, sdch|
|(14 or 18 pluses)|||
|(13 or 14 pluses or 'X'es)|||
|gzip, deflate, sdch||Accept-Encoding: gzip, deflate, sdch|
Not all of these map to the Accept-Encoding header, but a large number of them do.
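The patterns above translate naturally into regular expressions. This is only a sketch: the length ranges come from our observed data, but the classifier itself is hypothetical:

```python
import re

# Regexes loosely matching the patterns in the table above; the
# length ranges (18-251 dashes, etc.) are from our observed data.
MANGLED_NAME_PATTERNS = [
    re.compile(r"^-{4}:-{18,251}$"),        # 4 dashes, colon, 18-251 dashes
    re.compile(r"^x-x{5}$"),                # x, dash, 5 x
    re.compile(r"^x{22,210}$"),             # 22 to 210 'x'es (covers 26/30)
    re.compile(r"^X{30,117}$"),             # 30 to 117 'X'es
    re.compile(r"^x-x{17}$"),               # x, dash, 17 'x'es
    re.compile(r"^-{4}$|^-{12}$|^-{17}$"),  # 4, 12 or 17 dashes
    re.compile(r"^\+{14}$|^\+{18}$"),       # 14 or 18 pluses
    re.compile(r"^[+X]{13,14}$"),           # 13 or 14 pluses or 'X'es
]

def looks_mangled(name):
    """True if a header name matches one of the mangled patterns."""
    return any(p.match(name) for p in MANGLED_NAME_PATTERNS)
```

Note that "Accept-Encoding: gzip, deflate, sdch" is 37 characters, which lines up with several of the all-dashes and all-x patterns replacing the name and value character-for-character.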
Who’s doing this?
I looked to see if there were any patterns with the user agents or other headers that would cluster these requests together. There didn’t seem to be any major patterns.
We see requests from the following Browser/OS combinations (in decreasing order of popularity):
- Safari/Mac OS X
- Safari/iOS 5.1
The common theme here is Windows: 99.668% of all requests with mangled Accept-Encoding headers were from a browser running on Windows.
The iOS requests all came through a proxy that identified itself as localhost.localdomain, so my best guess is that this was someone running their iPhone through their desktop.
The Mac OS X requests also appeared to come in through a proxy. Which leads us to proxies.
I see many requests with a cookie named _sm_au and value aaaaaaaaaaaaaaaaaaaa. This is weird because we don’t set any cookies, so we should never receive a cookie from a client. I was unable to find any information on what this cookie is — it could possibly be SMProxy — so any ideas are appreciated.
A small number of requests also included an MSISDN (a mobile subscriber number), suggesting that they came from mobile devices; however, they all ran Firefox 14 on Windows.
Unfortunately we don’t currently have information on any browser plugins that may have caused these changes, but I’ll be looking for that going forward.
The number of requests coming in with mangled Accept-Encoding headers has definitely gone down from 15% to about 1%… or it could simply be that we’re not Google. Some of these headers appear to be mangled by proxies, but not all of them.
1% of requests is still a large number when you’re handling over a billion hits a month, so it’s possibly still worth ignoring some of these headers and serving gzipped content anyway.
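One way to act on that is a "gzip anyway" decision. Here’s a minimal sketch, assuming a hypothetical allowlist of User-Agent tokens for browsers long known to support gzip; the trade-off is a small risk of serving compressed content to a client behind an intermediary that genuinely can’t handle it:

```python
# Hypothetical allowlist of User-Agent tokens for browsers that have
# shipped gzip support for years; a real list would need curation.
KNOWN_GZIP_UA_TOKENS = ("MSIE", "Firefox", "Chrome", "Safari", "Opera")

def should_gzip(headers):
    """Decide whether to serve a gzipped response."""
    ae = headers.get("Accept-Encoding", "")
    if "gzip" in ae.lower():
        return True  # client explicitly accepts gzip
    # Header missing or mangled: fall back to the User-Agent, trading
    # a tiny risk of a broken response for a big bandwidth saving.
    ua = headers.get("User-Agent", "")
    return any(token in ua for token in KNOWN_GZIP_UA_TOKENS)
```

For example, a request whose Accept-Encoding was replaced with dashes but whose User-Agent still reads "Firefox/14.0 (Windows NT 6.1)" would get gzipped content under this scheme.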