rel="canonical" for non-HTML files?

Update in June 2011: Google now supports rel="canonical" in the HTTP header! It’s party time.

 

Q: How would Google implement rel="canonical" for non-HTML files?

 

A: Likely through the Link header field in the HTTP response. It would look something like this:

 

HTTP/1.1 200 OK
Date: Tue, 20 Apr 2010 07:28:14 GMT
Server: Apache/2.2
Content-Type: application/msword
Link: <http://www.example.com/preferred-canonical-url.doc>; rel="canonical"
Transfer-Encoding: chunked
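
To make the response above concrete, here is a minimal Python sketch of how a client might pull the canonical URL out of a Link header value. The function name canonical_from_link_header is hypothetical, and the regex parsing is a deliberate simplification: it assumes no commas or semicolons inside quoted parameter values. It says nothing about how Google actually processes the header.

```python
import re

def canonical_from_link_header(header_value):
    """Return the first URL whose rel types include 'canonical', else None.

    Hypothetical helper for illustration only. Simplified parsing: assumes
    quoted parameter values contain no commas or semicolons.
    """
    # Each link-value looks like: <URL>; param=value; param="value"
    link_value = re.compile(r'<([^>]*)>((?:\s*;\s*[^,;]+)*)')
    rel_param = re.compile(r';\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,]+))', re.IGNORECASE)

    for match in link_value.finditer(header_value):
        url, params = match.group(1), match.group(2)
        rel = rel_param.search(params)
        if rel is None:
            continue
        # rel may hold several space-separated relation types
        rel_types = (rel.group(1) or rel.group(2)).lower().split()
        if "canonical" in rel_types:
            return url
    return None


header = '<http://www.example.com/preferred-canonical-url.doc>; rel="canonical"'
print(canonical_from_link_header(header))
```

A header-based hint like this is invisible in the page source, so a small check of this kind is one way to see what a server is really sending.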

 

Q: When will this feature be ready?

 

A: Oh no, sorry if I misled. We probably won’t support this any time soon.

 

Q: Rats!

 

A: That’s not a question.

 

Q: So why wouldn’t you guys support rel="canonical" in the HTTP header?

 

A: Truth is, we’ve discussed it internally and we’re currently leaning toward the worry that it may cause more damage than benefit.

  • An HTTP header with rel="canonical" could be too obscure for many webmasters to debug. It’s a lot more obvious to troubleshoot when it’s in the HTML source.
  • We favor verifying correct adoption/implementation before increasing support for new features. For example, we waited some time before rolling out cross-domain rel="canonical" to be sure same-domain rel="canonical" was largely properly implemented.
  • Less notably, it’s not an often requested feature.
  • Update on 04/20/2010: We still use URLs in your Sitemap as a hint for your preferred canonical whether it’s HTML or non-HTML content (thanks to John for mentioning this!). So when we have a cluster of duplicates, your Sitemap URL can be the display version and obtain the linking properties from the cluster. Unlike rel="canonical", it’s not quite as strong a signal and it doesn’t have the ability to actually cluster dupes.
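
As a rough illustration of the Sitemap hint in the last bullet, the Python sketch below writes a minimal Sitemap that lists only the preferred URL of a non-HTML document. The function name build_sitemap is hypothetical, and the URL is the example one from earlier in this post. The namespace is the standard one from the Sitemap protocol.

```python
import xml.etree.ElementTree as ET

# Standard namespace defined by the Sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(preferred_urls):
    """Return a minimal Sitemap (as a string) listing the given URLs.

    Hypothetical helper for illustration: list only the version of each
    document you would like treated as the preferred one.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in preferred_urls:
        url_element = ET.SubElement(urlset, "url")
        ET.SubElement(url_element, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap(["http://www.example.com/preferred-canonical-url.doc"]))
```

Listing a URL this way is only a hint, as noted above; it does not cluster duplicates the way rel="canonical" can.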

 

Last thing: If you feel that the lack of HTTP header support for non-HTML files is a gaping hole in rel="canonical" functionality, let us (me) know. Otherwise, it’ll probably remain low to minuscule priority for some time to come.

8 thoughts on "rel="canonical" for non-HTML files?"

  1. Hey Maile,

    What about somehow including canonical data in the XML Sitemap protocol? I agree that doing it in headers would be problematic. It’s not a huge problem for me, but then I don’t have any sites with tons of non-HTML content.

    Thanks!
    John

    Maile Ohye Reply:

    John, you’re right, Sitemaps can help with this! Lol, I just updated the post.

    Were you thinking more like a Sitemap extension? Or just Sitemaps, as is?

    Also, I think it’s a rarely requested feature because dynamic content usually causes dupes, and dynamic content is usually in HTML.

    take care,
    maile

  2. Pingback: SearchCap: The Day In Search, April 20, 2010

  3. Pingback: What to do about duplicate content on Non-Html Pages? « Follow Matt Cutts

  4. I think the HTTP header would be a good idea for all the reasons stated on sebastian-x.com. (http://sebastians-pamphlets.com/x-canonical-uri-http-header/) Although I do understand that the current canonical HTML link element relies on comparing similar appearing pages and the comparison couldn’t (?) be made by comparing the HTTP header of one document and the known contents of another.

    Maile Ohye Reply:

    I appreciate your directing me to this post. Sebastian was on it.

    Seanr, in regard to your comment, even with the HTTP header, we’d still compare the contents of each page at indexing time — that’s not the problem.

    Will ponder this situation more… thanks for all the feedback.

  5. I’ve been wanting this feature for quite some time – it would be rather useful in many places.

    As for the format, if it were ever really implemented, I would suggest it should be something like:

    X-Canonical: http://url.here

    Using the “X-” prefix, as used with e.g. X-Robots-Tag.
