RSVPs are like little gifts

I’m hoping to have some friends over in a few weeks. I emailed them to 1) save the date, and 2) properly set expectations.

From: Maile Ohye
Date: Mon, Jul 19, 2010 at 9:50 PM
Subject: Save the date: Saturday, August 7th!


Mi casa. No one leaves sober.

Several sweet, feeling-type people replied:

  • “Looking forward to it!”
  • “Can’t wait!”
  • “It’s calendar’d!”


These people really are cuddly in person; it’s cute how that correlates.


And there were fun RSVPs:

  • “Hell yeah! It’s been too long since I last stumbled home drunk from your place.”


Agreed. My bad.

  • “Are we allowed to hit on coworkers?”


I like where you’re going with this, and at my place, “inappropriate” DNE.


I think this response from my neighbor, though, wins the prize:

  • “FYI. She is a lot of fun but I feel like I am too old for this kind of party. We will see how we feel.”

Indexing OCR text and layered PDFs

Wondering whether PDF overlays were too obscure a topic for the Webmaster Central Blog, I consulted my girlfriend in AdWords, who knows Search and, I believe, represents the general audience reaction:

me: marie, yt? qq


Marie: sure


me: when you read the term “pdf overlay” what do you think? does it sound like a feminine hygiene product?


Marie: it sounds more nonsensical than fem hyg pro


me: pdf overlay sounds nonsensical? really? so for search, i’m just referring to a text layer under an image in a pdf.


Marie: not intuitive
but again…
im in sales


Given this one datapoint*, this post is on my blog. Here’s the basic gist of three questions about OCR’d content/layered PDFs that I was recently asked.


Can Google index textual content from OCR?

Yes. For example, we can index text layers beneath the image as found in PDF overlays.


(Though I have limited understanding, I’ve found that when people talk to me about PDF overlays/image+text PDFs/layered PDFs/text searchable PDFs, they’re largely referring to the same thing. To the rest of the world there may be important distinctions, and it seems like “PDF overlays” could actually be a superset, but let’s not get bogged down by crazy stuff like being accurate.)


Bottom line, if it’s been OCR’d, yes, it can be indexed. And PDFs with standard text, like our SEO Starter Guide, have been indexed and searchable for years.

So OCR’d content isn’t considered spammy?

The technique is fine. We’re always trying to find more ways to index quality information. In fact, in our own indexing pipeline we’re now using OCR on some documents that lack textual content. It’s the early phase, though, and of course standard Robots Exclusion Protocol (REP) directives still apply.
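
If “REP directives still apply” feels abstract: REP is the Robots Exclusion Protocol, i.e. robots.txt and friends. Here’s a minimal sketch, using Python’s standard `urllib.robotparser`, of how a robots.txt rule would apply to a scanned PDF just like any other URL. The robots.txt contents, the example.com paths, and the `is_crawlable` helper are all made up for illustration; this is not how Google’s pipeline is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; the paths are invented.
ROBOTS_TXT = """\
User-agent: *
Disallow: /scans/private/
"""


def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A scanned PDF under the disallowed directory is blocked like any other file.
print(is_crawlable(ROBOTS_TXT, "Googlebot",
                   "https://example.com/scans/private/ledger.pdf"))
# A scanned PDF elsewhere on the site is fair game.
print(is_crawlable(ROBOTS_TXT, "Googlebot",
                   "https://example.com/scans/public/menu.pdf"))
```

In other words, whether a document was OCR’d doesn’t change anything here: if the URL is disallowed, it doesn’t get crawled.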

What if I use OCR on every single page I’ve ever written ever, do you think I could rank numero uno for every query forever?

Forever ever? Unlikely. It’s helpful to remember that the quality and compelling-ness of your content are still important. Long ago, like four years, some webmasters thought that if they dumped their entire database on the web, unleashing millions of new spreadsheets and documents, their rankings would soar! It didn’t pan out.


This OCR-every-document plan has a similar feel.


But back to ranking: if your site has content that you feel is important to have indexed and searchable, try to make the content regular text (non-OCR) on the page. It’s safer and often more user-friendly, because sometimes OCR output isn’t that clear, which makes it hard for search engines to index and for users to comprehend.

* Thanks, Marie, for assisting my rigorous research.