HTML “text-indent: -9999px” and holding the line

Because today is Towel Day and because it’s just you and me, I can write about stuff I couldn’t say on a large platform like our Webmaster Central Blog. For example, I can write that:

 

If possible, it’s still best to avoid techniques such as “text-indent:-9999px” or “margin:-4000px” or “left:-2000em”.
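
If you want to check whether your own stylesheets use these techniques, a small audit script can help. What follows is a minimal Python sketch and not anything Google runs: the function name `find_offscreen_declarations`, the regular expression, and the 999-unit threshold are all assumptions made for illustration, and it only catches simple single-value declarations (not shorthand such as "margin: 0 -4000px").

```python
import re

# Hypothetical threshold: treat a negative offset of 999 units or more as "off screen".
OFFSCREEN_THRESHOLD = 999

# Matches declarations such as "text-indent:-9999px", "margin: -4000px", "left:-2000em".
# The lookbehind keeps "left" from matching inside names like "padding-left".
OFFSCREEN_PATTERN = re.compile(
    r"(?<![\w-])(text-indent|margin(?:-left|-top)?|left|top)"
    r"\s*:\s*-(\d+(?:\.\d+)?)\s*(px|em)",
    re.IGNORECASE,
)


def find_offscreen_declarations(css: str) -> list[str]:
    """Return the declarations in `css` that push content far off screen."""
    hits = []
    for match in OFFSCREEN_PATTERN.finditer(css):
        prop, value, unit = match.groups()
        if float(value) >= OFFSCREEN_THRESHOLD:
            hits.append(f"{prop.lower()}: -{value}{unit.lower()}")
    return hits


sample_css = """
h1.logo { text-indent: -9999px; }
.skip   { margin:-4000px; }
.promo  { position: absolute; left:-2000em; }
.quote  { text-indent: -1px; }
"""
for declaration in find_offscreen_declarations(sample_css):
    print(declaration)
```

The small "text-indent: -1px" is deliberately ignored, since a tiny negative indent is ordinary typography rather than hiding.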

 

And you can scream at me, “But I do it for accessibility! You’re mean, I’m nice!”

 

And that may be true. Another truth is that using “text-indent: -9999px”, or hiding text (keeping text out of the user’s sight in a browser), is a common spammer’s technique to hide off-topic keywords and/or links to manipulate search engine rankings.

 

[Image: hidden links using text-indent]
Example of “text-indent:-9999px” to hide unrelated links and boost PageRank to those sites. Search engines will never notice!

 

Google has top-secret algorithms designed to detect when text is hidden/positioned off screen. If this type of hidden text is detected, our important red phone rings, and this becomes one of the signals that may cause us to believe your site is deceptive.

 

Given that Google only wants to return the most relevant sites to users, if we consider your site deceptive, its rankings may be negatively affected.

 

So what should a webmaster do?

 

Try to hold the line — avoid hiding text. We’re trying to find an elegant solution. And once we do, I’ll write an official post.

 

What solutions are being considered?

 

With HTML5, my friend Ian Hickson shared a few possibilities that could satisfy both webspam and accessibility needs:

 

  1. Hide content from screen users but show it to screen reader users.
    Use media-specific CSS, e.g. @media speech { } vs @media screen { }.

    Caveat: Not yet implemented by screen readers.

  2. Hide irrelevant content, such as hiding a login form once the user is logged in.
    Use HTML5’s hidden="" attribute.

    Caveat: This was just drafted a few months ago. I’ll get Ian’s latest take on the subject once he returns from paternity leave. Congrats, Ian!

 

Happy Towel Day, everyone!

 

Update made later on Towel Day: Luigi Montanez and I have some crazy connection. He just posted on the same friggin text-indent topic (enjoy my anchor text, Luigi!). Suddenly all that was impossible is possible.

rel=”canonical” for non-HTML files?

Update in June 2011: Google now supports rel=”canonical” in the HTTP header! It’s party time.

 

Q: How would Google implement rel=”canonical” for non-HTML files?

 

A: Likely through the link entity in the HTTP header. It would look something like this:

 

HTTP/1.1 200 OK
Date: Tue, 20 Apr 2010 07:28:14 GMT
Server: Apache/2.2
Content-Type: application/msword
Link: <http://www.example.com/preferred-canonical-url.doc>; rel="canonical"
Transfer-Encoding: chunked
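
To make the header above concrete, here is a rough Python sketch of how a client could pull the canonical URL out of a Link header value. It is only an illustration under stated assumptions: the function name `canonical_from_link_header` and both regular expressions are mine, nothing here reflects how Google parses the header, and it does not handle quoted parameter values that contain commas or semicolons.

```python
import re
from typing import Optional

# One link-value: <URI> followed by zero or more ";"-separated parameters.
LINK_VALUE = re.compile(r"<([^>]*)>((?:\s*;\s*[^,;]+)*)")
# The rel parameter, with or without surrounding double quotes.
REL_PARAM = re.compile(r';\s*rel\s*=\s*"?([^";]*)"?', re.IGNORECASE)


def canonical_from_link_header(header_value: str) -> Optional[str]:
    """Return the URI of the first link whose rel includes "canonical", else None."""
    for uri, params in LINK_VALUE.findall(header_value):
        rel = REL_PARAM.search(params)
        # rel may hold several space-separated relation types.
        if rel and "canonical" in rel.group(1).lower().split():
            return uri
    return None


header = '<http://www.example.com/preferred-canonical-url.doc>; rel="canonical"'
print(canonical_from_link_header(header))
```

A header can carry several comma-separated links, so the sketch walks each one and returns the first that is marked canonical.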

 

Q: When will this feature be ready?

 

A: Oh no, sorry if I misled. We probably won’t support this any time soon.

 

Q: Rats!

 

A: That’s not a question.

 

Q: So why wouldn’t you guys support rel=”canonical” in the HTTP header?

 

A: Truth is, we’ve discussed it internally and we’re currently leaning toward the worry that it may cause more damage than benefit.

  • An HTTP header with rel=”canonical” could be too obscure for many webmasters to debug — it’s a lot more obvious to troubleshoot when it’s in the HTML source.
  • We favor verifying correct adoption/implementation before increasing support for new features. For example, we waited some time before rolling out cross-domain rel=”canonical” to be sure same-domain rel=”canonical” was largely properly implemented.
  • Less notably, it’s not an often requested feature.
  • Update on 04/20/2010: We still use URLs in your Sitemap as a hint for your preferred canonical whether it’s HTML or non-HTML content (thanks to John for mentioning this!). So when we have a cluster of duplicates, your Sitemap URL can be the display version and obtain the linking properties from the cluster. Unlike rel=”canonical”, it’s not quite as strong a signal and it doesn’t have the ability to actually cluster dupes.

 

Last thing: If you feel that the lack of HTTP header support for non-HTML files is a gaping hole in rel=”canonical” functionality, let us (me) know. Otherwise, it’ll probably remain low to minuscule priority for some time to come.

Title and name attributes in HTML anchors

How does Google currently process title and name attributes in HTML anchors?

 

<a title="sweet link!" name="nice name!" href="page.html">foo</a>

 

title = not processed by Google (please keep in mind that it could be useful for other engines or applications)

 

name = not processed for ranking/content relevance, but can be utilized for understanding page structure (such as with JavaScript functions)
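
To see the attributes the way a parser does, here is a small Python sketch using the standard library's `html.parser`. It is not Google's code; the class name `AnchorAttributes` is made up for illustration. It simply shows that title, name, and href arrive as separate fields, so a consumer is free to use one and ignore another.

```python
from html.parser import HTMLParser


class AnchorAttributes(HTMLParser):
    """Collect the href, title, and name attributes of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attributes = dict(attrs)
            self.anchors.append({
                "href": attributes.get("href"),
                "title": attributes.get("title"),
                "name": attributes.get("name"),
            })


parser = AnchorAttributes()
parser.feed('<a title="sweet link!" name="nice name!" href="page.html">foo</a>')
for anchor in parser.anchors:
    print(anchor)
```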

 

Thanks to Joachim Kupke (super nice guy) for checking the code to provide clarification.