Simple best practice for sitelink titles

Sitelink titles (anchor text) can be influenced by your webmaster charms! The URLs that Google selects for sitelinks, however, are far less open to manual influence.

 


Oprah’s sitelink titles include “The Oprah Winfrey Show,” “Contact Us,” “Why Oprah Says She’ll Never Diet…”

 

If the titles of your sitelinks aren’t exactly what you hoped for, a troubleshooting tactic is to investigate the anchor text of your internal links (as it’s one of several factors used to determine sitelink titles). For example, here are a few links on Oprah’s homepage:

 

Text link: <a href="http://www.oprah.com/omagazine.html">O, The Oprah Magazine</a>
Link to a CSS sprite (a less common case, but you get the idea): <a class="bookclub" href="http://www.oprah.com/book_club.html" alt="BOOK CLUB">BOOK CLUB</a>

 

Let’s pretend Oprah sees her sitelink “BOOK CLUB” but would prefer it displayed with standard capitalization, as “Book Club.” One way to help influence this change is for Oprah (or a web-savvy Stedman) to check the anchor text of her internal links and the alt text of her image links, making sure to use “Book Club,” not “BOOK CLUB.”
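Checking anchor text and alt text by eye works for a handful of links; for a whole page, a small script can do the first pass. Below is a minimal sketch, assuming Python’s standard-library html.parser. LinkTextAuditor and find_all_caps_labels are hypothetical helpers made up for illustration (not a Google tool), and all caps is the only problem they look for. Because the sprite link above carries alt directly on the <a>, the sketch reads alt from both the <a> and any nested <img>.

```python
from html.parser import HTMLParser


class LinkTextAuditor(HTMLParser):
    """Collects (href, anchor text, alt texts) for every <a> on a page."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._in_link = False
        self._href = None
        self._text = []
        self._alts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._in_link = True
            self._href = attrs.get("href")
            self._text, self._alts = [], []
            # The sprite example above puts alt on the <a> itself.
            if attrs.get("alt"):
                self._alts.append(attrs["alt"])
        elif tag == "img" and self._in_link and attrs.get("alt"):
            # Image links: the alt text of the image inside the link.
            self._alts.append(attrs["alt"])

    def handle_data(self, data):
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            self.entries.append((self._href, text, self._alts))
            self._in_link = False


def looks_all_caps(label):
    """True if the label has at least two letters and all of them are uppercase."""
    letters = [c for c in label if c.isalpha()]
    return len(letters) > 1 and all(c.isupper() for c in letters)


def find_all_caps_labels(html):
    """Return (href, source, label) for links whose anchor or alt text is ALL CAPS."""
    auditor = LinkTextAuditor()
    auditor.feed(html)
    auditor.close()
    flagged = []
    for href, text, alts in auditor.entries:
        labels = [("anchor text", text)] + [("alt text", alt) for alt in alts]
        for source, label in labels:
            if label and looks_all_caps(label):
                flagged.append((href, source, label))
    return flagged
```

Run against the two links above, it flags the BOOK CLUB link twice (once for its anchor text, once for its alt text) and leaves “O, The Oprah Magazine” alone.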

 

We recently updated our sitelinks FAQ to reflect this tip (thanks to the Sitelinks team for all their help!):

 

[ At the moment, sitelinks are completely automated. We're always working to improve our sitelinks algorithms, and we may incorporate webmaster input in the future. There are best practices you can follow, however, to improve the quality of your sitelinks. For example, for your site's internal links, make sure you use anchor text and alt text that's informative, compact, and avoids repetition. Read a blog post about the importance of link structure. ]

 


Recovering from a broken heart in HTTP status codes

Sometimes a breakup is a breath of fresh air and sometimes it causes chest pain. The stages below are for the chest pain moments. I’ve totally been there. If you’re there now, maybe HTTP can help you through. HTTP helps everything.

 

I often think of myself as a website with several facets/folders. It’s a prerequisite for this whole post, so please bear with me. Imagine you’re a site structured like:

 

www.you.com/career/
www.you.com/romantic-interest/
www.you.com/hobbies/
www.you.com/hobbies/world-of-warcraft/

(yeah, you’re a real winner.)

 

Just kidding. If you’re into the subdomain vs. subdirectory debate, that’s fine. I made subdirectories because that’s what came to me, but feel free to knock yourself out with subdomains.

 

Now imagine that you’ve been dumped (or you dumped the other person, whatever). And your feelings are just not cooperating. In a word, you feel “sadness.” That sucks, but it’s human, and of course things get better. It’s like working on your website.

 

Stage I
503s across the entire domain

 

www.you.com is down. This is the equivalent of being in shock. Life is difficult.

 

Foo: Bar, I heard the news… how you doing?
Bar: Site-wide 503.
Foo: Sorry, dude.

 

Stage II (optional)
404s across www.you.com/romantic-interest/
503s or 200s nearly everywhere else

 

This stage is optional and can last weeks/months/until you find someone else attractive. Never been at this stage myself, but sometimes in movies you’ll hear dialogue like:

 

Friend: I know Jack broke your heart, but how about I set you up with Dave? He’s a nice guy.
Main character: I know you’re trying to help, but no Jack, no Dave. I swear off all men completely.

 

Stage III
200s everywhere except…
503s for romantic-interest/

 

Hooray, you’re functioning! This can be a really productive stage. I bet the content on www.you.com/career/ has expanded. And with all your free time, you might even have new folders: www.you.com/hobbies/lifting-weights/ and www.you.com/hobbies/learning-tango/.

 

Stage IV
302 temporary redirect from www.you.com/romantic-interest/
200s everywhere else

 

Also known as the “rebound.” Try not to point your redirect at a 200 on crazy-person.com or someone-who-has-crushed-on-you-forever.com. The key to Stage IV is that it’s a 302, not a 301.
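If you’d like to see Stage IV in literal HTTP terms, here is a minimal sketch, assuming a plain Python WSGI app. stage_iv_app and REBOUND_URL are hypothetical names, and example.com stands in for whichever domain you rebound to. The status line is what matters: “302 Found” marks the redirect as temporary, where a 301 would declare it permanent.

```python
# Stage IV as a WSGI app: temporary redirect for one folder, 200s elsewhere.
# REBOUND_URL is hypothetical; per the advice above, choose it carefully.
REBOUND_URL = "http://www.example.com/"


def stage_iv_app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path.startswith("/romantic-interest/"):
        # 302 Found = temporary redirect; a 301 would say the move is permanent.
        start_response("302 Found", [("Location", REBOUND_URL),
                                     ("Content-Type", "text/plain; charset=utf-8")])
        return [b"Temporarily elsewhere.\n"]
    # Everything else (career/, hobbies/, ...) is functioning normally.
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Functioning!\n"]
```

To try it locally, you could hand stage_iv_app to wsgiref.simple_server.make_server from the standard library.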

 

Stage V
200s everywhere, including romantic-interest/

 

Yay for life! The orb is green. Backend ready to publish unique content. Your frontend always turning heads.


RSVPs are like little gifts

I’m hoping to have some friends over in a few weeks. I emailed them to 1) save the date, and 2) properly set expectations.

From: Maile Ohye
Date: Mon, Jul 19, 2010 at 9:50 PM
Subject: Save the date: Saturday, August 7th!

 

Mi casa. No one leaves sober.

Several sweet, feeling-type people replied:

  • “Looking forward to it!”
  • “Can’t wait!”
  • “It’s calendar’d!”

 

These people really are cuddly in person; it’s cute how that correlates.

 

And there were fun RSVPs:

  • “Hell yeah! It’s been too long since I last stumbled home drunk from your place.”

 

Agreed. My bad.

  • “Are we allowed to hit on coworkers?”

 

I like where you’re going with this, and at my place, “inappropriate” DNE.

 

I think this response from my neighbor, though, wins the prize:

  • “FYI. She is a lot of fun but I feel like I am too old for this kind of party. We will see how we feel.”

Indexing OCR text and layered PDFs

Wondering whether PDF overlays were too obscure a topic for the Webmaster Central Blog, I consulted my girlfriend in AdWords, who knows Search and, I believe, represents the general audience’s reaction:

me: marie, yt? qq

 

Marie: sure

 

me: when you read the term “pdf overlay” what do you think? does it sound like a feminine hygiene product?

 

Marie: it sounds more nonsensical than fem hyg pro

 

me: pdf overlay sounds nonsensical? really? so for search, i’m just referring to a text layer under an image in a pdf.

 

Marie: not intuitive
but again…
im in sales

 

Given this one data point*, this post is on my blog. Here’s the basic gist of three questions I was recently asked about OCR’d content and layered PDFs.

 

Can Google index textual content from OCR?

Yes. For example, we can index text layers beneath the image as found in PDF overlays.

 

(Though I have limited understanding, I’ve found that when people talk to me about PDF overlays/image+text PDFs/layered PDFs/text searchable PDFs, they’re largely referring to the same thing. To the rest of the world there may be important distinctions, and it seems like “PDF overlays” could actually be a superset, but let’s not get bogged down by crazy stuff like being accurate.)

 

Bottom line, if it’s been OCR’d, yes, it can be indexed. And PDFs with standard text, like our SEO Starter Guide, have been indexed and searchable for years.

So OCR’d content isn’t considered spammy?

The technique is fine. We’re always trying to find more ways to index quality information. In fact, in our own indexing pipeline we’re now using OCR on some documents that lack textual content. It’s early days, though, and of course standard REP (Robots Exclusion Protocol) directives still apply.
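For a quick check of whether robots.txt would keep Googlebot from fetching a given PDF, Python’s standard-library urllib.robotparser can evaluate the rules. This is a minimal sketch: the robots.txt contents and URLs are made up, is_crawlable is a hypothetical helper, and it covers robots.txt only, not other REP mechanisms such as the robots meta tag or the X-Robots-Tag header.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps a folder of scanned PDFs out of crawling.
ROBOTS_TXT = """\
User-agent: *
Disallow: /scans/
"""


def is_crawlable(url, user_agent="Googlebot", robots_txt=ROBOTS_TXT):
    """True if robots.txt allows user_agent to fetch url (robots.txt rules only)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, http://www.example.com/scans/menu.pdf is blocked while http://www.example.com/docs/guide.pdf is allowed.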

What if I use OCR on every single page I’ve ever written ever, do you think I could rank numero uno for every query forever?

Forever ever? Unlikely. It’s helpful to remember that the quality and compelling-ness of your content are still important. Long ago, like four years ago, some webmasters thought that if they dumped their entire database on the web, unleashing millions of new spreadsheets and documents, then their rankings would soar! It didn’t pan out.

 

This OCR-every-document plan has a similar feel.

 

But back to ranking: if your site has content that you feel is important to have indexed and searchable, try to make it regular (non-OCR) text on the page. It’s safer and often more user-friendly, because OCR output isn’t always clear, which makes it hard for search engines to index and for users to comprehend.

* Thanks, Marie, for assisting my rigorous research.


Google & Site performance: The compilation answer album

The comments from my last post about text indent made me feel like Captain Hammer, so this time I’m crossing my fingers to make allies, not enemies.

 

Anyone want to talk about site performance? Don’t we all love a faster site? Users dig it. Webmasters can capitalize on it. It pairs perfectly with a sauvignon blanc!

 

I’ve consolidated information from personal conversations with people like Sreeram Ramachandran and Steve Souders, and I combed Webmaster Central (WMC) blog posts and my blog comments for anything related to site performance. This information is accurate as of June 1, 2010.

 

How is a page’s performance measured?

It’s measured very, very carefully… We’re of course experimenting with several types of measurements. For instance, toolbar data from opted-in users is a signal.

 

One of the ways we measure a page’s speed incorporates both download and render time — we pay attention to the time taken from the moment the user clicks on a link until just before that document’s body.onload() handler is called. This includes:

  • DNS resolution
  • network travel time
  • browser time to construct and render the DOM
  • time to parse and execute necessary JavaScript
  • and so on and so forth

 

If actions are deferred to the body.onload() handler, they won’t affect the page load time in this measurement. Please keep in mind that there are several measurement techniques. I only highlighted one of them.
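To make the onload point concrete, here is a toy calculation in Python. The phase durations are invented, PageTimeline and its helpers are hypothetical, and the model treats the phases as strictly sequential, which real browsers don’t (they overlap downloading, parsing, and rendering). It only illustrates which side of the onload line a piece of work falls on.

```python
from dataclasses import dataclass


@dataclass
class PageTimeline:
    """Hypothetical phase durations (ms) for one page view."""
    dns_ms: float
    network_ms: float
    dom_render_ms: float
    blocking_js_ms: float       # script that must finish before onload fires
    deferred_js_ms: float = 0   # work moved into the body.onload() handler


def measured_load_ms(t: PageTimeline) -> float:
    """Click until just before body.onload() runs; deferred work is excluded."""
    return t.dns_ms + t.network_ms + t.dom_render_ms + t.blocking_js_ms


def total_work_ms(t: PageTimeline) -> float:
    """Everything the page does, including what runs inside the onload handler."""
    return measured_load_ms(t) + t.deferred_js_ms


blocking = PageTimeline(dns_ms=50, network_ms=400, dom_render_ms=250, blocking_js_ms=300)
deferred = PageTimeline(dns_ms=50, network_ms=400, dom_render_ms=250, blocking_js_ms=0,
                        deferred_js_ms=300)
print(measured_load_ms(blocking), measured_load_ms(deferred))  # 1000 700
print(total_work_ms(blocking), total_work_ms(deferred))        # 1000 1000
```

The total work is the same 1,000 ms in both cases; moving the 300 ms script into the onload handler just takes it out of this particular measurement.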

How big an impact does site performance have on Google rankings?

From our original WMC blog post:

 

[ While site speed is a new signal, it doesn't carry as much weight as the relevance of a page. Currently, fewer than 1% of search queries are affected by the site speed signal in our implementation and the signal for site speed only applies for visitors searching in English on Google.com at this point. ]

Does Webmaster Tools’ Site performance feature consider the site’s geographic preference settings and report accordingly?

Some of our speed statistics come from real user data (opted-in toolbar users). Therefore, if your site’s target audience consists of mainly Australian users, then our performance numbers should reflect their usage.

What about ads? The slowest thing on my website costing me the last 7 points to the full 100 in Page Speed is Google’s AdSense ads.

One factor that makes ads kind of slow is their use of inline DOM methods like document.write(), which prevents deferred loading (because document.write() may alter the page’s content, the browser has to wait).

 

The good news is that Steve Souders and Alex Russell, along with several of our co-workers and many outside developers, are looking into improving the speed of third-party content like ads. There are some promising things to keep an eye out for: HTML5 and its iframe attributes (seamless and srcdoc), and the FRAG tag.

 

Additionally, asynchronous loading would be a terrific improvement in the ads space. In fact, companies like BuySellAds.com are already using this technique to improve performance for their publishers.

What are the typical causes/solutions regarding fixing long time-to-first-byte metrics? Other than reducing the number of requests what other optimizations are there?

Can you flush the document early? It’s covered in Chapter 12 of “Even Faster Web Sites.”

(And then there’s the really old stuff that I’ve answered before about site performance.)
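Flushing early means sending the top of the page, including the <head> and its stylesheet link, before the slow server-side work finishes, so the browser can start fetching resources sooner. Below is a minimal sketch, assuming a Python WSGI app; early_flush_app is a hypothetical name and slow_backend_query a hypothetical stand-in for a database or API call. Each yielded chunk can go to the client as soon as the app produces it, since PEP 3333 requires WSGI servers not to delay transmitting a block.

```python
import time


def slow_backend_query():
    """Hypothetical stand-in for a slow database or API call."""
    time.sleep(0.05)
    return "<p>Search results go here.</p>"


def early_flush_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])

    def body():
        # First chunk: the <head> (with its stylesheet) and the page header,
        # sent before any slow work so the browser can start fetching and rendering.
        yield (b'<!DOCTYPE html><html><head>'
               b'<link rel="stylesheet" href="/style.css">'
               b'</head><body><div id="header">Logo and search box</div>')
        # Only now do the slow work; its output follows as a later chunk.
        yield slow_backend_query().encode("utf-8")
        yield b"</body></html>"

    return body()
```

A buffering layer in front of the app (some proxies or compression setups) can still hold chunks back, so it’s worth confirming the early flush in a real browser.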

 

Is it possible to check my server response time from different areas around the world?

Yes. WebPagetest.org can test performance from the United States (both East and West Coast; go West Coast!), the United Kingdom, China, and New Zealand.

What’s a good response time to aim for?

If your competition is fast, they may provide a better user experience than your site does for the same audience.

 

Otherwise, studies by Akamai claim 2 seconds as the threshold for ecommerce site “acceptability.” Just as an FYI, at Google we aim for under a half-second.

Does progressive rendering help users?

Definitely! Progressive rendering is when a browser displays content incrementally as it becomes available, rather than waiting and displaying it all at once. This gives users faster visual feedback and helps them feel more in control. Bing experimented with progressive rendering by sending users their visual header (like the logo and search box) quickly, then the results and ads once they were available. Bing found a 0.7% increase in satisfaction with progressive rendering, and commented that this improvement was comparable to that of a full feature rollout.

 

How can you implement progressive rendering on your site? One technique: put stylesheets at the top of the page, in the document head. This allows the browser to start displaying content ASAP.
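A quick way to spot stylesheets that have drifted out of the head is to parse the page and note any stylesheet <link> that appears after <body> starts. This is a minimal sketch using Python’s standard-library html.parser; StylesheetPositionChecker and late_stylesheets are hypothetical helpers, and the sketch assumes the page has an explicit <body> tag and ignores inline <style> blocks and @import.

```python
from html.parser import HTMLParser


class StylesheetPositionChecker(HTMLParser):
    """Records stylesheet <link>s that appear after <body> has started."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.late = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "body":
            self.in_body = True
        elif tag == "link" and self.in_body:
            # rel can hold several space-separated values, e.g. "alternate stylesheet".
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.late.append(attrs.get("href"))


def late_stylesheets(html):
    """Return hrefs of stylesheets linked from inside <body> instead of <head>."""
    checker = StylesheetPositionChecker()
    checker.feed(html)
    checker.close()
    return checker.late
```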

Sweet! That’s it for now. See you in the comments if you have questions. eof


CSS “text-indent: -9999px” and holding the line

Because today is Towel Day and because it’s just you and me, I can write about stuff I couldn’t say on a large platform like our Webmaster Central Blog. For example, I can write that:

 

If possible, it’s still best to avoid techniques such as “text-indent:-9999px” or “margin:-4000px” or “left:-2000em”.

 

And you can scream at me, “But I do it for accessibility! You’re mean, I’m nice!”

 

And that may be true. Another truth is that using “text-indent: -9999px,” or otherwise hiding text (keeping it out of the user’s sight in a browser), is a common spammer technique for hiding off-topic keywords and/or links to manipulate search engine rankings.

 

Example of “text-indent:-9999px” to hide unrelated links and boost PageRank to those sites. Search engines will never notice!

 

Google has top-secret algorithms designed to detect when text is hidden/positioned off screen. If this type of hidden text is detected, our important red phone rings, and this becomes one of the signals that may cause us to believe your site is deceptive.

 

Given that Google only wants to return the most relevant sites to users, if we consider your site deceptive, its rankings may be negatively affected.

 

So what should a webmaster do?

 

Try to hold the line: avoid hiding text. We’re trying to find an elegant solution, and once we do, I’ll write an official post.
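If you want to find these techniques in your own stylesheets before anyone else does, a rough scan for large negative offsets is a start. This is a hypothetical self-audit sketch in Python, not a description of how Google detects hidden text. The property list, the 999px threshold, and the 16px-per-em conversion are all assumptions, and the regex only catches a negative value written directly after the colon (so shorthand like margin: 0 -4000px slips through).

```python
import re

# Properties whose large negative values are typically used to push text off screen.
OFFSCREEN_PROPERTIES = {"text-indent", "margin", "margin-left", "margin-top", "left", "top"}
THRESHOLD_PX = 999   # hypothetical cutoff for "clearly off screen"
PX_PER_EM = 16       # assumes the common 16px default font size

# property : -<number><px|em>, with the minus sign right after the colon
DECLARATION = re.compile(
    r"(?P<prop>[a-z-]+)\s*:\s*-(?P<num>\d+(?:\.\d+)?)(?P<unit>px|em)\b", re.IGNORECASE)


def offscreen_declarations(css):
    """Return declarations that move content far off screen with a negative offset."""
    hits = []
    for m in DECLARATION.finditer(css):
        if m.group("prop").lower() not in OFFSCREEN_PROPERTIES:
            continue
        px = float(m.group("num"))
        if m.group("unit").lower() == "em":
            px *= PX_PER_EM
        if px >= THRESHOLD_PX:
            hits.append(m.group(0))
    return hits
```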

 

What solutions are being considered?

 

With HTML5, my friend Ian Hickson shared a few possibilities that could satisfy both webspam and accessibility needs:

 

  1. Hide content from screen users but show it to screen reader users.
    Use media-specific CSS, e.g. @media speech { } vs @media screen { }.

    Caveat: Not yet implemented by screen readers.

  2. Hide irrelevant content, such as hiding a login form once the user is logged in.
    Use HTML5’s hidden="" attribute.

    Caveat: This was just drafted a few months ago. I’ll get Ian’s latest take on the subject once he returns from paternity leave. Congrats, Ian!

 

Happy Towel Day, everyone!

 

Update made later on Towel Day: Luigi Montanez and I have some crazy connection. He just posted on the same friggin’ text-indent topic (enjoy my anchor text, Luigi!). Suddenly all that was impossible is possible.


Ciao Roma!

Holiday with my girlfriends was amazing. My new developments:

  • I want to learn more about Android. (Lol, my manager has been asking the same of me.) I’ll see if I can start a 20% project.
  • With the GPS in my Nexus One and Google Maps, one could mistake me for a person with a sense of direction.
  • Gelato can be eaten for/after every meal.
  • I think I got my mojo back.
View from Tivoli, Italy. About an hour outside of Rome.

 

Emperor Hadrian’s crib, constructed circa AD 120.

 

My gf and me (sportin’ an audio guide) at the Vatican Museum.


Reproductively focused “blathering idiots”

My team recently moved from the Android building to the new posh office space right across the creek from main campus. In our big microkitchen, I enjoyed talking with one of my new neighbors. Later that night he sent this really funny email that referenced an equally funny study:

Hi Maile,

 

Fun chatting with you today. It got me thinking about this report on a study I read, where guys have a tendency to be reduced to blathering idiots when they are talking to an attractive girl. And I was totally doing that today! I promise, I’m usually extraordinarily eloquent and charming.

 

The same study said that girls aren’t affected by chatting to handsome men, so I’m not surprised that you didn’t change. haha

 

See ya.


rel="canonical" for non-HTML files?

Q: How would Google implement rel="canonical" for non-HTML files?

 

A: Likely through the Link header field in the HTTP response. It would look something like this:

 

HTTP/1.1 200 OK
Date: Tue, 20 Apr 2010 07:28:14 GMT
Server: Apache/2.2
Content-Type: application/msword
Link: <http://www.example.com/preferred-canonical-url.doc>; rel="canonical"
Transfer-Encoding: chunked
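For illustration only (as this post explains, Google probably won’t support it any time soon), here is a minimal sketch of how a server could attach that Link header to a non-HTML file. It assumes a Python WSGI app; doc_app, the CANONICAL_FOR mapping, the duplicate path, and the placeholder body are all hypothetical.

```python
# Hypothetical mapping: duplicate file URL path -> preferred canonical URL.
CANONICAL_FOR = {
    "/downloads/report-copy.doc": "http://www.example.com/preferred-canonical-url.doc",
}


def doc_app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "application/msword")]
    canonical = CANONICAL_FOR.get(path)
    if canonical:
        # Same shape as the example header: Link: <URL>; rel="canonical"
        headers.append(("Link", f'<{canonical}>; rel="canonical"'))
    start_response("200 OK", headers)
    # Placeholder body; a real app would stream the file's bytes.
    return [b"placeholder document bytes"]
```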

 

Q: When will this feature be ready?

 

A: Oh no, sorry if I misled. We probably won’t support this any time soon.

 

Q: Rats!

 

A: That’s not a question.

 

Q: So why wouldn’t you guys support rel="canonical" in the HTTP header?

 

A: Truth is, we’ve discussed it internally, and we’re currently leaning toward the view that it may cause more harm than good.

  • An HTTP header with rel="canonical" could be too obscure for many webmasters to debug; it’s a lot more obvious to troubleshoot when it’s in the HTML source.
  • We favor verifying correct adoption/implementation before increasing support for new features. For example, we waited some time before rolling out cross-domain rel="canonical" to be sure same-domain rel="canonical" was largely implemented properly.
  • Less notably, it’s not an often-requested feature.
  • Update on 04/20/2010: We still use URLs in your Sitemap as a hint for your preferred canonical, whether it’s HTML or non-HTML content (thanks to John for mentioning this!). So when we have a cluster of duplicates, your Sitemap URL can be the display version and obtain the linking properties from the cluster. Compared with rel="canonical", though, it’s a weaker signal, and it can’t actually cluster duplicates.
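Since Sitemap URLs serve as a hint for HTML and non-HTML content alike, listing the preferred version of each non-HTML file there is one way to express that hint. Below is a minimal sketch using Python’s standard-library xml.etree.ElementTree; build_sitemap and the example URLs are hypothetical, and it emits only the required <loc> element for each URL.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Serialize the sitemap namespace as the default (unprefixed) namespace.
ET.register_namespace("", SITEMAP_NS)


def build_sitemap(urls):
    """Return a minimal Sitemap XML string with one <url><loc> entry per URL given."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url  # & etc. escaped on output
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```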

 

Last thing: If you feel that the lack of HTTP header support for non-HTML files is a gaping hole in rel="canonical" functionality, let us (me) know. Otherwise, it’ll probably remain a low-to-minuscule priority for some time to come.


Pseudocode for giving compliments

Women are diverse. And, in this beautiful diversity of women, there are some (like me) who are (at times) slightly neurotic (let’s pretend it’s endearing?). I think this is one reason why, if you’re a boy, complimenting a woman can be difficult.

 

Women are people, and people can be caught up in their thoughts, past relationships, childhoods, etc. Navigating personalities and knowing the “right”, or even just the “all right,” thing to say can be like walking through a minefield. What worked in one situation could be a total turnoff the next time around.

 

In most aspects of life, randomness sucks. If you’re a man, and if a woman has taken your compliment the wrong way, I empathize. I hope that all compliments from nice guys are accepted as they were intended, but for whatever reason, sometimes compliments falter — either they fall flat or they do more harm than good.

 

For all my neuroses, I’d still like to think that I’m logical. Here’s my first pass at creating a complimenting algorithm to help guys make more sense of (at times, crazy) people like me.

 

Again, please note that I, Maile Ohye, am strange/nutty/<your-adjective-here>. The tests and algorithm do not apply across the board.

 

Compliment test cases

 

  1. On the phone: “You’re perfect.”
    FAIL
    I could literally feel my brain pagefaulting when I heard this; my flaws are numerous. He seemed fairly sincere, but this had to be a joke. He later clarified that by “perfect,” he meant that he “respected me and held me in high regard.” So while my first reaction was “this guy is illogical,” this compliment had a happy ending.

  2. At a bar: “You’re pretty.”
    FAIL (So sorry, kind of harsh, I know)
    It’s always nice to hear that you’re pretty, but it feels a bit strange, too. I tend to wonder how many drinks he’s had, and whether he has any interest in me as a person. Besides, “pretty” isn’t an adjective I would use to describe myself. It’s just so dainty.

  3. Accidentally turning/bumping into each other at a bar: “Wow, you’re pretty!”
    PASS
    So spontaneous it’s sweet.

  4. At a bar: “You’re pretty. But you probably hear that all the time. I just really like your smile.”
    PASS
    Lol, thanks! (I’ll take it.)

  5. If you’re in a relationship together: “You look pretty!”
    PASS
    Aww. So nice of you to say.

  6. All of the compliments above, but said to me in my early twenties.
    PASS
    You could’ve said “I love your pink hair” and I would’ve eaten it up.
    Update on 04/13/2010: To clarify, I never had pink hair.

My algorithm for giving compliments in common situations

if (she’s your girlfriend || she’s not super confident) {
  needs and/or likes reassurance = true;
  desires appreciation for how she hopes to see herself = true;
  noteToSelf(needs and/or likes reassurance, desires appreciation for how she hopes to see herself);
  genericCompliment();
  // also good to randomize calling customizedCompliment()
}

if ((you’re pre-relationship) && (she’s a confident person || she’s no longer in her early 20s)) {
  needs reassurance = false;
  desires appreciation for how she hopes to see herself = true;
  noteToSelf(needs reassurance, desires appreciation for how she hopes to see herself);
  if ((your compliment is truly spontaneous) || (your authority on the topic is indisputable) || (your sincerity is unmistakable)) {
    genericCompliment();
  }
  else {
    // best to elaborate
    customizedCompliment();
  }
}

Please let me know if this doesn’t make sense.