rel="canonical" for non-HTML files?
Update in June 2011: Google now supports rel="canonical" in the HTTP header! It’s party time.
Q: How would Google implement rel="canonical" for non-HTML files?
A: Likely through the Link header in the HTTP response. It would look something like this:
HTTP/1.1 200 OK
Date: Tue, 20 Apr 2010 07:28:14 GMT
Content-Type: text/html; charset=UTF-8
Link: <http://www.example.com/preferred-canonical-url.doc>; rel="canonical"
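To make the header above a little less abstract, here is a minimal Python sketch of how a client or debugging script might pull the canonical URL out of a Link header value. The function name `find_canonical` and the regular expressions are assumptions made for illustration, not anything Google has published, and the parsing is deliberately simplified (for instance, it does not handle commas inside quoted parameter values).

```python
import re
from typing import Optional

# One link-value: <target> followed by zero or more ;param=value pairs.
_LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
# The rel parameter, quoted or unquoted (hypothetical, simplified pattern).
_REL_PARAM = re.compile(r'\brel\s*=\s*"?([^";]+)"?', re.IGNORECASE)


def find_canonical(link_header: str) -> Optional[str]:
    """Return the first link target whose rel includes "canonical", else None."""
    for target, params in _LINK_VALUE.findall(link_header):
        match = _REL_PARAM.search(params)
        # rel may hold several space-separated relation types.
        if match and "canonical" in match.group(1).lower().split():
            return target
    return None


header = '<http://www.example.com/preferred-canonical-url.doc>; rel="canonical"'
print(find_canonical(header))
```

A quick check like this is one way to confirm that a server really is sending the header you expect for a given file.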
Q: When will this feature be ready?
A: Oh no, sorry if I misled. We probably won’t support this any time soon.
A: That’s not a question.
Q: So why wouldn’t you guys support rel="canonical" in the HTTP header?
A: Truth is, we’ve discussed it internally, and we’re currently leaning toward the view that it may do more harm than good:
- An HTTP header with rel="canonical" could be too obscure for many webmasters to debug; it’s a lot easier to troubleshoot when it’s in the HTML source.
- We favor verifying correct adoption/implementation before increasing support for new features. For example, we waited some time before rolling out cross-domain rel="canonical" to be sure same-domain rel="canonical" was largely implemented properly.
- Less notably, it’s not a frequently requested feature.
- Update on 04/20/2010: We still use URLs in your Sitemap as a hint for your preferred canonical, whether it’s HTML or non-HTML content (thanks to John for mentioning this!). So when we have a cluster of duplicates, your Sitemap URL can be the displayed version and obtain the linking properties from the cluster. Compared with rel="canonical", though, a Sitemap URL is a weaker signal, and it doesn’t have the ability to actually cluster dupes.
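Since the Sitemap hint works for non-HTML files too, here is a minimal Python sketch of generating a Sitemap that lists your preferred URLs. The helper name `build_sitemap` and the example URL are assumptions for illustration. The namespace is the standard one from the Sitemap protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(preferred_urls):
    """Return a Sitemap XML string with one <url><loc> entry per preferred URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in preferred_urls:
        loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
        loc.text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical preferred version of a document that has duplicates elsewhere.
sitemap_xml = build_sitemap(["http://www.example.com/preferred-canonical-url.doc"])
print(sitemap_xml)
```

List only the version of each file you want displayed. Listing the duplicates as well would muddy the hint.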
Last thing: If you feel that the lack of HTTP header support for non-HTML files is a gaping hole in rel="canonical" functionality, let us (me) know. Otherwise, it’ll probably remain a low to minuscule priority for some time to come.