Update in June 2011: Google now supports rel=”canonical” in the HTTP header! It’s party time.
Q: How would Google implement rel=”canonical” for non-HTML files?
A: Likely through the Link header field in the HTTP response. It would look something like this:
HTTP/1.1 200 OK
Date: Tue, 20 Apr 2010 07:28:14 GMT
Content-Type: application/msword
Link: <http://www.example.com/preferred-canonical-url.doc>; rel="canonical"
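For anyone curious how a client might read that header, here's a minimal Python sketch that pulls the rel="canonical" target out of a Link header value. The function name canonical_from_link_header is made up for illustration, and the parsing is deliberately simple (it doesn't handle commas or semicolons inside quoted parameter values). It is not a description of how Google processes the header.

```python
import re

# One link-value: <URI> followed by zero or more ";param" pairs.
# Simplification (assumption): parameter values contain no ',' or ';'.
_LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
_REL_PARAM = re.compile(r'^\s*rel\s*=\s*"?([^"]*)"?\s*$', re.IGNORECASE)


def canonical_from_link_header(header_value):
    """Return the first URL whose rel includes "canonical", or None."""
    for match in _LINK_VALUE.finditer(header_value):
        url, params = match.group(1), match.group(2)
        for param in params.split(";"):
            rel = _REL_PARAM.match(param)
            # rel may hold several space-separated relation types.
            if rel and "canonical" in rel.group(1).lower().split():
                return url
    return None


header = '<http://www.example.com/preferred-canonical-url.doc>; rel="canonical"'
print(canonical_from_link_header(header))
```

Called on the Link value from the example response above, it returns the preferred .doc URL; for a header with no rel="canonical" entry it returns None.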
Q: When will this feature be ready?
A: Oh no, sorry if I misled. We probably won’t support this any time soon.
A: That’s not a question.
Q: So why wouldn’t you guys support rel=”canonical” in the HTTP header?
A: Truth is, we’ve discussed it internally, and we’re currently concerned that it may do more harm than good:
- An HTTP header with rel=”canonical” could be too obscure for many webmasters to debug — it’s a lot more obvious to troubleshoot when it’s in the HTML source.
- We favor verifying correct adoption/implementation before increasing support for new features. For example, we waited some time before rolling out cross-domain rel=”canonical” to be sure same-domain rel=”canonical” was largely properly implemented.
- Less notably, it’s not an often requested feature.
- Update on 04/20/2010: We still use URLs in your Sitemap as a hint for your preferred canonical whether it’s HTML or non-HTML content (thanks to John for mentioning this!). So when we have a cluster of duplicates, your Sitemap URL can be the display version and obtain the linking properties from the cluster. Unlike rel=”canonical”, it’s not quite as strong a signal and it doesn’t have the ability to actually cluster dupes.
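As a rough illustration of the Sitemap hint, here's a minimal Python sketch that writes a bare-bones Sitemap listing the preferred URL of a non-HTML file. The helper name build_sitemap is made up for this example, and the URL is the same placeholder used earlier; only the urlset/url/loc elements and the namespace come from the Sitemap protocol itself.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(preferred_urls):
    """Return a minimal Sitemap XML string with one <url><loc> per URL."""
    # Serialize the Sitemap namespace as the default (prefix-less) namespace.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for preferred in preferred_urls:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = preferred
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical usage: list only the version you want shown in results.
sitemap_xml = build_sitemap(
    ["http://www.example.com/preferred-canonical-url.doc"]
)
print(sitemap_xml)
```

The idea is simply to list the one version you prefer and leave the duplicates out; as noted above, this is a hint rather than a guarantee.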
Last thing: If you feel that the lack of HTTP header support for non-HTML files is a gaping hole in rel=”canonical” functionality, let us (me) know. Otherwise, it’ll probably remain low to minuscule priority for some time to come.
8 Replies to “rel=”canonical” for non-HTML files?”
What about somehow including canonical data in the XML Sitemap protocol? I agree that doing it in headers would be problematic. It’s not a huge problem for me, but then I don’t have any sites with tons of non-HTML content.
Maile Ohye Reply:
April 20th, 2010 at 4:29 pm
John, you’re right, Sitemaps can help with this! Lol, I just updated the post.
Were you thinking more like a Sitemap extension? Or just Sitemaps, as is?
Also, I think it’s a rarely requested feature because dynamic content usually causes dupes, and dynamic content is usually in HTML.
That’d be an awesome feature! Great idea 🙂 But why wrap the URL in brackets?
I think the HTTP header would be a good idea for all the reasons stated on sebastian-x.com (http://sebastians-pamphlets.com/x-canonical-uri-http-header/), although I do understand that the current canonical HTML link element relies on comparing similar-looking pages, and that comparison couldn’t (?) be made between the HTTP header of one document and the known contents of another.
Maile Ohye Reply:
June 2nd, 2010 at 3:16 am
I appreciate your directing me to this post. Sebastian was on it.
Seanr, in regard to your comment, even with the HTTP header, we’d still compare the contents of each page at indexing time — that’s not the problem.
Will ponder this situation more… thanks for all the feedback.
I’ve been wanting this feature for quite some time – it would be rather useful in many places.
As for the format, if it were ever really implemented, I would suggest it should be something like:
Using the “X-” prefix, as used with e.g. X-Robots-Tag.