What is the Maximum Character Length for a URL that Google Will Index?

Summary of Experiment Results

This section gives a quick overview of the answer; the rest of the post explains how that answer was reached.

Answer: Technically, Google will index URLs up to 2047 characters in length, but clicking one of those SERP links causes Google’s own servers to throw an error. So the longest URL that Google will crawl and index, and that will actually work when someone clicks on it in the SERPs, is approximately 1855 characters.

To verify that this answer is accurate, you can check to see if this page (which has 1855 characters in its URL) is still listed in Google’s search results.

Here is the full URL of this page:

https://seomofo.com/experiments/title-and-h1-of-this-post-but-for-the-sake-of-keyword-prominence-stuffing-im-going-to-mention-it-again-using-various-synonyms-stemmed-variations-and-of-coursea-big-fat-prominent-font-size-heres-the-stumper-that-stumped-me-what-is-the-max-number-of-chars-in-a-url-that-google-is-willing-to-crawl-and-index-for-whatever-reason-i-thought-i-had-read-somewhere-that-googles-limit-on-urls-was-255-characters-but-that-turned-out-to-be-wrong-so-maybe-i-just-made-that-number-up-the-best-answer-i-could-find-was-this-quote-from-googles-webmaster-trends-analyst-john-mueller-we-can-certainly-crawl-and-index-urls-over-1000-characters-long-but-that-doesnt-mean-that-its-a-good-practice-the-setup-for-this-experiment-is-going-to-be-pretty-simple-im-going-to-edit-the-permalink-of-this-post-to-be-really-really-long-then-im-going-to-see-if-google-indexes-it-i-might-even-see-if-yahoo-and-bing-index-iteven-though-no-one-really-cares-what-those-assholes-are-doing-url-character-limits-unrelated-to-google-the-question-now-is-how-many-characters-should-i-make-the-url-of-this-post-there-are-a-couple-of-sources-ill-reference-to-help-me-make-this-decision-the-first-is-this-quote-from-the-microsoft-support-pages-microsoft-internet-explorer-has-a-maximum-uniform-resource-locator-url-length-of-2083-characters-internet-explorer-also-has-a-maximum-path-length-of-2048-characters-this-limit-applies-to-both-post-request-and-get-request-urls-the-second-source-ill-cite-is-the-http-11-protocol-which-says-the-http-protocol-does-not-place-any-a-priori-limit-on-the-length-of-a-uri-servers-must-be-able-to-handle-the-uri-of-any-resource-they-serve-and-should-be-able-to-handle-uris-of-unbounded-length-if-they-provide-get-based-forms-that-could-generate-such-uris-a-server-should-return-414-request-uri-too-long-status-if-a-uri-is-longer.html

Earlier today, a very rare occurrence occurred: an SEO-related question arose to which I did not have an answer. I’m as shocked as you are. The question in question is already mentioned in the title and h1 of this post, but for the sake of keyword prominence/stuffing, I’m going to mention it again–using various synonyms, stemmed variations, and of course…a big fat prominent font size. Here’s the stumper that stumped me:

What is the max number of chars in a URL that Google is willing to crawl and index?

For whatever reason, I thought I had read somewhere that Google’s limit on URLs was 255 characters. But that turned out to be wrong, so maybe I just made that number up. The best answer I could find was this quote from Google’s Webmaster Trends Analyst, John Mueller:

“We can certainly crawl and index URLs over 1000 characters long–but that doesn’t mean that it’s a good practice.”

The setup for this experiment is going to be pretty simple: I’m going to edit the permalink of this post to be really, really long. Then I’m going to see if Google indexes it. I might even see if Yahoo! and Bing index it…even though no one really cares what those assholes are doing.

URL Character Limits Unrelated to Google

The question now is: How many characters should I make the URL of this post? There are a couple of sources I’ll reference to help me make this decision. The first is this quote from the Microsoft support pages:

Microsoft Internet Explorer has a maximum uniform resource locator (URL) length of 2,083 characters. Internet Explorer also has a maximum path length of 2,048 characters. This limit applies to both POST request and GET request URLs.

The second source I’ll cite is the HTTP/1.1 specification (RFC 2616), which says:

The HTTP protocol does not place any a priori limit on the length of a URI. Servers MUST be able to handle the URI of any resource they serve, and SHOULD be able to handle URIs of unbounded length if they provide GET-based forms that could generate such URIs. A server SHOULD return 414 (Request-URI Too Long) status if a URI is longer than the server can handle (see section 10.4.15).

Note: Servers ought to be cautious about depending on URI lengths above 255 bytes, because some older client or proxy implementations might not properly support these lengths.
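Those two thresholds are easy to sanity-check against any URL. Here’s a minimal sketch in Python (the constants are copied from the quotes above; the function name is mine):

```python
# Thresholds taken from the two sources quoted above.
IE_MAX_URL = 2083    # Internet Explorer's documented URL length limit
LEGACY_SAFE = 255    # the HTTP/1.1 note's conservative figure

def url_length_report(url):
    """Classify a URL against the two length thresholds above."""
    n = len(url)
    return {
        "length": n,
        "fits_ie": n <= IE_MAX_URL,      # within IE's limit?
        "legacy_safe": n <= LEGACY_SAFE, # safe for old clients/proxies?
    }

print(url_length_report("https://example.com/" + "a" * 2100))
```

A 2120-character URL fails both checks, which is exactly the territory this experiment is about to wander into.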

Experimental URL Character Length…

This experiment is going to try a URL of 2048 characters; that way, we can also see if Microsoft was telling the truth about Internet Explorer. I’ll update this post when I find out whether or not Google can index this ridiculously long URL.

WordPress isn’t letting me edit this page’s permalink to be anything longer than 200 characters, so I’m going to try to edit the core WP files and the database settings to get it to work. So if this post doesn’t yet have 2000+ characters, just wait a little bit and check back later. If I have to, I’ll just create a static HTML file for this post.

Okay, so apparently the 200-character limit was a combined effort between WordPress and the database. I was able to fix the database side of things by changing the varchar(200) setting on the post_name column to varchar(2048). However, I had a much harder time finding the WordPress code that was truncating the permalink slug, so I finally gave up. Instead, I edited the database directly in phpMyAdmin and pasted the huge post_name value over the 200-character one. As long as I don’t update this post through the WordPress admin, it will keep the value I gave it. Also, this is a static .html file; I used an internal rewrite in .htaccess to point that fatty URL to this file.
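For anyone following along at home, the database half of that change boils down to a single statement. This assumes WordPress’s default schema and the standard wp_ table prefix (an assumption on my part; check your own prefix first):

```sql
-- Widen the permalink-slug column from WordPress's default varchar(200).
ALTER TABLE wp_posts MODIFY post_name VARCHAR(2048);
```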

Experiment Results

The first results are in, and they’re rather interesting. Or at least confusing. Google has NOT indexed this page yet, and judging by the looks of the error reported in Google Webmaster Tools…I don’t think it’s going to. The error says:

“Invalid URL – This is not a valid URL. Please correct it and resubmit.”

I’m not sure what makes this URL invalid; according to the HTTP spec quoted above, it’s perfectly valid. Furthermore, as I’ve pointed out in the image, Google Webmaster Tools left off the “L” at the end of the anchor text URL. (WTF?) However, the href value itself is correct, and clicking on that massive link actually takes me to this page. I’ll wait and see what Yahoo and Bing do, then I’ll update again.

[Image: google-sitemaps-invalid-url]

Update: 2/24/2010 (24 hours after original post)

Okay, none of the three search engines (Google, Yahoo, Bing) indexed this page. In addition to the error shown above, Google Webmaster Tools also had a problem when I submitted this page to the Labs feature, Fetch as Googlebot. Again, it mysteriously dropped the “L” from “html”:

[Image: fetch-as-googlebot-failed]

The next experiment I’m going to try is to resubmit the Sitemap, but this time the long URL will be 2047 characters (I will shave off the very last character). I’ve also set up a 301 redirect that points the 2047-character URL to the 2048-character URL. In other words, I’m going to use the 2047-character version to lure googlebot over here, then I’m going to fatality that little bitch with an extra character…possibly bringing down Google’s entire network in the process. So here goes nothin’…

Okay…I just submitted the new Sitemap to Google Webmaster Tools, and Google accepted it! Then I went back to the Fetch As Googlebot feature and tried submitting the 2047-character URL…and it was a success! (Just FYI: I also randomly noticed that Google actually has a 2048-character limit coded into the input field.)

[Image: fetch-as-googlebot-success]

Actually, it was only a partial success. Webmaster Tools shows that the fetch was successful, but then when I click on the link to see the results, I get this:


Google Error

Request-URI Too Large

The requested URL /webmasters/tools/labs-googlebot-fetch-2… is too large to process.

(And yes…I really did take the time to re-code that ↑ in XHTML strict, instead of just taking a screen capture. I’m THAT dedicated to page speed…or something.)

That’s right…after Google saw the length of my URL, it just plain refused to accept my request and returned a 414 status code instead. Honestly, I think this was my first time ever seeing a 414 HTTP response. How exciting! No? Whatever.

None of the search engines have indexed my 2047-character URL yet (and I doubt they will), but I’ll give them another day before I try something else. Until then…

Update: 2/25/2010

None of the search engines fell for my bait n’ switch, so I’ve made some more changes. First, I changed all of my internal links to the 2047-character URL (i.e. I altered the database value for post_name again). Then I edited my .htaccess file so that (1) the internal rewrite connects the new URL to the same static HTML file I’ve been using, and (2) the 2048 URL redirects to the 2047 URL (to avoid breaking the bit.ly URLs that have already been passed around). Also, I noticed that I had overlooked a slash in my URL (corresponding to the text “prominence/stuffing”), so I have replaced it with a dash. Essentially this means the current 2047-character URL is different from the one in the bait n’ switch. Lastly, I updated my Sitemap and resubmitted it to Google, Yahoo, and Bing, via their respective webmaster tools. The final result of today’s changes is that this page now has a 2047-character URL, which is linked to from all WordPress-generated pages and pointed to in my sitemap.xml file. In other words, there’s nothing fancy going on here. Neither of the previous URLs was indexed, so as far as search engines are concerned, this is simply a new page that needs to be crawled.
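For reference, here’s roughly what the .htaccess described above looks like. The slugs are shortened placeholders (the real ones are the 2000-character monsters, which wouldn’t fit on screen), so treat this as a sketch, not a copy-paste:

```apache
RewriteEngine On
# (1) internal rewrite: serve the static file for the new 2047-character URL
RewriteRule ^experiments/NEW-2047-CHAR-SLUG\.html$ /experiments/long-url-post.html [L]
# (2) 301-redirect the old 2048-character URL to the new one (keeps bit.ly links alive)
RewriteRule ^experiments/OLD-2048-CHAR-SLUG\.html$ /experiments/NEW-2047-CHAR-SLUG.html [R=301,L]
```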

In addition to submitting my Sitemap, I also tried submitting the new 2047-character URL to each of the search engines’ “submit your URL” pages. (I never thought I’d be using those again!) One thing I discovered was that none of the URL submission forms would accept 2047 characters in their input fields. Here is the HTML for each of the input tags:

Google:
<input type="text" maxlength="256" size="40" value="" name="q">

Yahoo:
<input type="text" maxlength="256" size="60" value="http://" name="site_url">

Bing:
<input type="text" maxlength="1024" value="" size="30" name="url">

Fortunately, these character length restrictions rely on the value of the maxlength HTML attribute, which I can easily modify with the (must-have) Firefox extension, Firebug:

[Image: submit-url-to-google]

Unfortunately, any web developer with more than 3 brain cells will implement a server-side restriction as well, in order to foil the plans of any wannabe hackers who might try to circumvent the client-side restriction by editing the value of the maxlength HTML attribute. I ignored this possibility and tried it anyway. Google gave me another 414 server response error page. Yahoo gave me this:

[Image: yahoo-invalid-request]

And Bing gave me this:

Thank you for submitting your URL to Bing

MSNBot automatically indexes pages that meet accepted standards for content, design, and technical implementation.

This change will not be reflected immediately, so please be patient and check back periodically.

Bing’s response suggests that my little Firebug trick worked, and therefore their URL submission feature may have been created by web developers of the 1-brain-cell or 2-brain-cell varieties (possibly the same developers that created Internet Explorer). On the other hand, it’s quite possible that Bing’s server silently discarded my URL but returned the same “thank you” page that everyone sees. However, I have to assume the former case is true, since it conveniently set me up for that IE joke. I hate Internet Explorer. A lot.
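For what it’s worth, the server-side restriction in question is trivial to implement, which is why it’s safe to assume Google and Yahoo have one. A minimal sketch in Python (the 2048 cap mirrors the behavior observed in this experiment; the function itself is hypothetical):

```python
MAX_URI_LEN = 2048  # assumed cap, matching the 414s observed in this experiment

def check_request_uri(uri):
    """Return the HTTP status code a length-checking server might send."""
    if len(uri) > MAX_URI_LEN:
        return 414  # Request-URI Too Long
    return 200

print(check_request_uri("/submit?url=" + "a" * 3000))  # over the cap -> 414
```

No amount of Firebug trickery on the client side gets past a check like this, which is presumably what happened with Google and Yahoo above.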

The last (and most interesting) discovery I’ll mention…is that Google Webmaster Tools is reporting a 404 error for the following URL:

https://seomofo.com/experiments/earlier-today-a-very-rare-occurrence-occurred-an-seo-related-question-arose-to-which-i-did-not-have-an-answer-im-as-shocked-as-you-are-the-question-in-question-is-already-mentioned-in-the-title-and-h1-of-this-post-but-fo

I pasted this URL into my beloved SERP Snippet Optimizer tool, which revealed that this URL is exactly 255 characters long! Maybe I didn’t make up that number after all. Maybe…I’m psychic. Also note that Google is claiming to have discovered this “broken link” on the 23rd–the same day I first published this post, and at a time when the URL was 2048 characters. Here’s the screen capture:

[Image: google-crawl-error-404]

What’s the significance of this 255-character broken link? I don’t know. I’m pretty sure it’s the result of Google truncating my URL to fit into some kind of database, but where did it pull the URL from…my Sitemap? And why is there a discrepancy between the REAL maximum character length (255) and the maximum character length that Google’s front-end developers seem to believe in (256)? Perhaps Google’s database restricts this string to 2048 bits, but the first 8 bits are reserved for something (e.g. the length of each URL)? I get the feeling the significance of “255 vs. 256” and “2047 vs. 2048” would be really obvious to me if I had pursued a degree in computer science instead of bartending. If you have any insight on this, drop me an email. My address is listed in this domain’s whois records.
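One guess at the 255-vs-256 pattern (purely an assumption on my part, nothing Google has confirmed): fixed-size string buffers traditionally reserve one byte, either for a C string’s terminating NUL or for a length prefix, so an N-byte field holds at most N-1 characters:

```python
def max_chars_in_buffer(buf_size):
    # One byte is reserved (e.g. for a C string's terminating NUL),
    # so the longest storable string is buf_size - 1 characters.
    return buf_size - 1

print(max_chars_in_buffer(256), max_chars_in_buffer(2048))  # 255 2047
```

That would neatly explain both the 255/256 and the 2047/2048 discrepancies, but it’s a guess, not computer-science gospel.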

I’ll check back tomorrow, after Google has had sufficient time to crawl this page…

Update: 2/28/2010

I’m in! Over the weekend, Google and Bing have both indexed this page with the 2047-character URL.

Google

[Image: google-serp-long-url]

Bing

[Image: bing-serp-long-url]

Update: 3/2/2010

After seeing these SERP results, you might be tempted to assume that it’s perfectly safe for you to start using 2047-character URLs on all your pages…and the only loss would be your Yahoo traffic. Not so fast, champ. Depending on your browser and whether or not you’re signed into a Google account, you may end up seeing the 414 error page again.

If you are using Internet Explorer, Safari, or Chrome, OR you’re signed out of your Google account, then Google’s SERP links take you directly to the URL, my server serves you the page, and there are no problems.

If you are using Firefox or Opera, AND you’re signed in to a Google account, then Google’s SERP links do NOT take you directly to the URL. Instead, the links point to a URL on Google’s servers, where information about your click is logged, and then you’re redirected to the URL shown in the search results. The Google URL you’re first sent to follows a pattern similar to this:

http://www.google.com/url?sa={…}&source={…}&ct={…}&cd={…}&ved={…}&url={URL shown in SERP listing}&ei={…}&usg={…}&sig2={…}

From looking at that URL pattern, you can see why the URL for this page would be problematic. We’ve already established that Google returns a 414 status code for any request URI longer than 2047 characters, so when I plug a 2047-character URL into that parameter of Google’s URL, the total is obviously well over the limit. (The full Google URL ends up being roughly 2220–2230 characters, fluctuating slightly depending on whether you’re using Firefox or Opera.)
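The arithmetic behind that, roughly (the overhead figure is backed out of the observed totals, not measured directly):

```python
base = len("http://www.google.com/url?")  # 26 characters of fixed prefix
dest = 2047                               # the indexed URL's length
observed_total = 2225                     # midpoint of the ~2220-2230 range
tracking_overhead = observed_total - base - dest
print(tracking_overhead)  # ~152 characters of sa/ved/ei/usg/etc. parameters
```

So the tracking parameters alone eat up roughly 150-plus characters before the destination URL is even counted.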

In order for FF and Opera users to avoid the 414 page, they would need to disable JavaScript, so that Google’s SERP links send them directly to the destination URL, instead of routing them through Google’s server first.

The end result: Google crawled and indexed my 2047-character URL, but not all users will be able to access the URL from Google’s SERPs–some will get Google’s 414 error page. Additionally, the “Cached” link in Google’s SERP listing didn’t work in any browser, regardless of whether I was signed in or out.

Yahoo failed to crawl and index this page (it’s been up for 5 days), but we’re moving on anyway. You stink, Yahoo!

The surprising winner of the long URL competition is actually Bing, who successfully crawled and indexed my 2047-character URL…AND made it accessible from its search results. Both of the SERP links work: the regular link and the “Cached page” link.

Now I’m going to cut the URL length down to 1855 characters, to keep the Google SERP link under 2048 total characters.
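The 1855 figure follows from the same budget arithmetic; everything here except the 2048 budget and the observed totals is my own back-of-the-envelope:

```python
budget = 2048                    # keep the full click-tracking URL below this
wrapper_overhead = 2230 - 2047   # worst-case overhead observed above: 183 chars
max_destination = budget - wrapper_overhead
print(max_destination)           # 1865; 1855 leaves a small safety margin
```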

I’ll be back…

Update: 3/11/2010

The current URL (1855 characters) has been indexed by Google, and it no longer returns the Google 414 error page when someone clicks on the link from the Google search results.

Final Results

The longest functional URL that Google will crawl and index is approximately 1855 characters.

Posted in: Experiments
