Troubles with serving PDFs

24 November 05

Update: A solution

Lots of people are coming to this article through Google. I just wanted to point people to the resolution I found for Lenya, which could apply to any application you are using to serve websites.

One of the things that has haunted me since Hiram College upgraded its Lenya environment is that we were getting sporadic reports from individuals claiming that they could not download PDFs from our sites - sites that were all hosted in Lenya, that is. I admittedly wrote this off as some odd problem on their computers, but when I saw the same error messages on multiple computers, I started to worry. And that's where my quest to figure out what was going on began.

You see, in the upgrade of Lenya from an SVN release of 1.2.1 to the official release of 1.2.4, we also moved away from Tomcat and began using the default servlet container, Jetty. We did this because developers on the Lenya mailing lists were praising its speed and small memory footprint (and they are certainly right, in my opinion). In the process of this upgrade, I split up our Authoring and Live environments on two different servers, which was the basis for the article I wrote some time ago on this site.

Well, it was right after this upgrade that the problems began appearing, so immediately I suspected that Cocoon, Lenya, or Jetty was up to no good. Can you believe I still haven't found an answer? I know several other people had mentioned they had this problem, but I never saw any fixes for it. So, I come to this site to document the errors in the hopes of bringing the problem to the attention of people with more experience in such matters.

First and foremost, the problem is tied only to Internet Explorer (any version): a user clicks on a link to a PDF and expects it to open in the Adobe Acrobat Reader plugin. Note that if the user in that environment right-clicks on the link, downloads the file to their desktop, then double-clicks on it, the file works fine. But for some reason, it won't open in the Acrobat plugin. This has something to do with the strange way Internet Explorer deals with mime-types, but I won't get into that. Every other browser seems to work just fine.

I did quite a bit of digging and set up a mailing list thread where someone mentioned that Fast Web View should be turned off for PDF documents because Cocoon can't serve them properly. I want to make sure that I can isolate which application is causing this error, so I'll run tests on several different applications that are part of my entire environment. So, here comes my first test:

Fast Web View vs. Non-Fast Web View: served by Apache

I created two 4 MB files with some basic text and images - one with Fast Web View enabled and the other without. I placed both of these files in a directory underneath the Apache DocumentRoot and tried to access them both with IE on Windows XP. I was able to retrieve both consistently without an error. Hmm, OK, so this doesn't tell me whether one setting is a worse idea than the other, but with both working under the Apache DocumentRoot, it doesn't seem that Apache is the problem.
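
"Fast Web View" is Acrobat's name for a linearized PDF, and a linearized file carries a /Linearized key in a small dictionary near the very start of the file. If you want to double-check which of your two test files is which, a rough sketch like the one below can help. The function name `looks_linearized` and the sample byte strings are made up for illustration (they are not real PDFs), and this is only a heuristic, not a full PDF parser.

```python
def looks_linearized(pdf_bytes: bytes) -> bool:
    """Heuristic: a linearized PDF has a /Linearized key near the start of the file."""
    head = pdf_bytes[:1024]  # the linearization dictionary sits in the first 1024 bytes
    return head.startswith(b"%PDF-") and b"/Linearized" in head


# Tiny hand-made byte strings standing in for the two test files (not valid PDFs).
fast = b"%PDF-1.4\n1 0 obj\n<< /Linearized 1 /L 4395593 >>\nendobj\n"
plain = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"

print(looks_linearized(fast))   # True
print(looks_linearized(plain))  # False
```

On real files you would pass in the first kilobyte, for example `looks_linearized(open("fast_web_view.pdf", "rb").read(1024))`.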

Fast Web View vs. Non-Fast Web View: served by Jetty/Cocoon/Lenya

So then I took the same files and placed them under a directory within Lenya. Bypassing Apache altogether, I tried to access them directly from Jetty/Lenya/Cocoon and its default port. Here's where things get interesting.

The Fast Web View document always showed up properly in the Acrobat plugin, while the other document (without Fast Web View enabled) would just show up with a blank white screen in IE and the word "Done." in the status bar. Strange, since it was mentioned that Fast Web View documents are not good in Cocoon. Using the ieHTTPheaders tool, I got the following response headers when accessing the document with Fast Web View enabled (I modified the GET and Host headers to not give away where all this stuff is):

GET /fast_web_view.pdf HTTP/1.1
Accept: */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; 601-24; 601-25)
Host: www.someplace.com:8888
Connection: Keep-Alive
NovINet: v1.2

HTTP/1.1 200 OK
Date: Thu, 24 Nov 2005 06:15:44 GMT
Server: Jetty/4.2.23 (Linux/2.4.21-32.0.1.ELsmp i386 java/1.4.2_08)
X-Cocoon-Version: 2.1.7
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Set-Cookie: JSESSIONID=29d8h5filatoe;Path=/
Accept-Ranges: bytes
Last-Modified: Wed, 23 Nov 2005 16:34:58 GMT
Content-Type: application/pdf
Content-Length: 4395593

GET /fast_web_view.pdf HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; 601-24; 601-25)
Host: www.someplace.com:8888
Connection: Keep-Alive
Cookie: JSESSIONID=29d8h5filatoe
NovINet: v1.2

HTTP/1.1 200 OK
Date: Thu, 24 Nov 2005 06:15:45 GMT
Server: Jetty/4.2.23 (Linux/2.4.21-32.0.1.ELsmp i386 java/1.4.2_08)
X-Cocoon-Version: 2.1.7
Accept-Ranges: bytes
Last-Modified: Wed, 23 Nov 2005 16:34:58 GMT
Content-Type: application/pdf
Content-Length: 4395593

Now here are the response headers for the Non-Fast Web View enabled document:

GET /no_fast_web_view.pdf HTTP/1.1
Accept: */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; 601-24; 601-25)
Host: www.someplace.com:8888
Connection: Keep-Alive
NovINet: v1.2

HTTP/1.1 200 OK
Date: Wed, 23 Nov 2005 17:48:10 GMT
Server: Jetty/4.2.23 (Linux/2.4.21-32.0.1.ELsmp i386 java/1.4.2_08)
X-Cocoon-Version: 2.1.7
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Set-Cookie: JSESSIONID=v6iwevpddrd;Path=/
Accept-Ranges: bytes
Last-Modified: Wed, 23 Nov 2005 16:36:32 GMT
Content-Type: application/pdf
Content-Length: 4390283

GET /no_fast_web_view.pdf HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; 601-24; 601-25)
Host: www.someplace.com:8888
Connection: Keep-Alive
Cookie: JSESSIONID=v6iwevpddrd
NovINet: v1.2

HTTP/1.1 200 OK
Date: Wed, 23 Nov 2005 17:48:10 GMT
Server: Jetty/4.2.23 (Linux/2.4.21-32.0.1.ELsmp i386 java/1.4.2_08)
X-Cocoon-Version: 2.1.7
Accept-Ranges: bytes
Last-Modified: Wed, 23 Nov 2005 16:36:32 GMT
Content-Type: application/pdf
Content-Length: 4390283

GET /no_fast_web_view.pdf HTTP/1.1
Accept: */*
Range: bytes=4389259-4390282, 4185483-4389258, 3866-4185482
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; 601-24; 601-25)
Host: www.someplace.com:8888
Connection: Keep-Alive
Cache-Control: no-cache
Cookie: JSESSIONID=v6iwevpddrd
NovINet: v1.2

HTTP/1.1 416 Requested Range Not Satisfiable
Date: Wed, 23 Nov 2005 17:48:11 GMT
Server: Jetty/4.2.23 (Linux/2.4.21-32.0.1.ELsmp i386 java/1.4.2_08)
X-Cocoon-Version: 2.1.7
Accept-Ranges: bytes
Content-Type: application/pdf
Content-Length: 4390283

So it appears that the difference between the two is that when the non-Fast Web View enabled document is requested, a third GET request is sent asking for byte ranges and then the browser gets back the 416 code, stating Requested Range Not Satisfiable. Hmm, I have no idea what that means, so out comes the trusty W3C site and its copy of the HTTP/1.1 spec (RFC 2616) to give me a definition:

"A server SHOULD return a response with this status code if a request included a Range request-header field (section 14.35), and none of the range-specifier values in this field overlap the current extent of the selected resource, and the request did not include an If-Range request-header field. (For byte-ranges, this means that the first-byte-pos of all of the byte-range-spec values were greater than the current length of the selected resource.)

"When this status code is returned for a byte-range request, the response SHOULD include a Content-Range entity-header field specifying the current length of the selected resource (see section 14.16). This response MUST NOT use the multipart/byteranges content-type." (from W3C)

So I'm not really understanding much of the mumbo-jumbo up there, but it seems like the second paragraph is saying the 416 response should include a Content-Range header, and there isn't one. OK, that's nice, but I have no idea how to make the server do that, so I'm content to say there's something else going on here.
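
To make the first quoted paragraph a bit more concrete, here is a minimal Python sketch of the rule as I read it: a byte-range request is only unsatisfiable when every range starts at or beyond the end of the file. The function names are made up for this example, and the parser only handles explicit start-end pairs like the ones IE sent above. Plugging in the failing request and the 4390283-byte Content-Length suggests the ranges were perfectly valid, so the 416 looks like a problem on the server side rather than a bad request from the browser.

```python
def parse_byte_ranges(range_header):
    """Parse 'bytes=a-b, c-d' into [(a, b), ...]; only explicit start-end pairs."""
    unit, _, spec = range_header.partition("=")
    if unit.strip() != "bytes":
        raise ValueError("only byte ranges are handled in this sketch")
    ranges = []
    for part in spec.split(","):
        first, _, last = part.strip().partition("-")
        ranges.append((int(first), int(last)))
    return ranges


def is_satisfiable(range_header, resource_length):
    """Satisfiable if at least one range starts inside the resource."""
    return any(first < resource_length
               for first, _ in parse_byte_ranges(range_header))


# The Range header from the request that got the 416, and the file's length.
failing_request = "bytes=4389259-4390282, 4185483-4389258, 3866-4185482"
print(is_satisfiable(failing_request, 4390283))          # True
print(is_satisfiable("bytes=5000000-5000100", 4390283))  # False
```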

My next attempt: to isolate the problem to a specific application within the trio. Since I don't think I can separate out Cocoon from Jetty, I'll just set up an instance of Jetty on the server running on a different port and serve both files from there to see what the response is like.

Fast Web View vs. Non-Fast Web View: served by Jetty

The one thing I should mention here is that I am using the 4.2 series of Jetty (4.2.23 as bundled with Lenya, 4.2.24 for the standalone install), which has been deprecated recently in favor of the 5.1 stable release. I don't have much of a choice in using the 4.2 series since it's nicely bundled in with Lenya, so the testing will be on this version (the new 1.4 alpha version of Lenya just updated to the 5.1 stable release). I did check for bugs and such in Jetty and found nothing related to problems serving PDFs except for one mailing list thread that turned off a variable called "acceptRanges" in Jetty's configuration. I did that, restarted Jetty/Lenya and the results were no different from above.

So after setting up a standalone installation of Jetty, I added the two PDF files (identical except for the Fast Web View option, of course) to the webapps directory and accessed the files again. Here are the results for each file, starting with the Fast Web View enabled document:

GET /fast_web_view.pdf HTTP/1.1
Accept: */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; 601-24; 601-25)
Host: www.someplace.com:8888
Connection: Keep-Alive
NovINet: v1.2

HTTP/1.1 200 OK
Date: Thu, 24 Nov 2005 14:57:56 GMT
Server: Jetty/4.2.24 (Linux/2.4.21-32.0.1.ELsmp i386 java/1.4.2_08)
Content-Type: application/pdf
Content-Length: 4395593
Last-Modified: Thu, 24 Nov 2005 14:49:55 GMT
Accept-Ranges: bytes

GET /fast_web_view.pdf HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; 601-24; 601-25)
Host: www.someplace.com:8888
Connection: Keep-Alive
NovINet: v1.2

HTTP/1.1 200 OK
Date: Thu, 24 Nov 2005 14:57:57 GMT
Server: Jetty/4.2.24 (Linux/2.4.21-32.0.1.ELsmp i386 java/1.4.2_08)
Content-Type: application/pdf
Content-Length: 4395593
Last-Modified: Thu, 24 Nov 2005 14:49:55 GMT
Accept-Ranges: bytes

GET /fast_web_view.pdf HTTP/1.1
Accept: */*
Range: bytes=81631-4384434
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; 601-24; 601-25)
Host: www.someplace.com:8888
Connection: Keep-Alive
Cache-Control: no-cache
NovINet: v1.2

HTTP/1.1 206 Partial Content
Date: Thu, 24 Nov 2005 14:58:06 GMT
Server: Jetty/4.2.24 (Linux/2.4.21-32.0.1.ELsmp i386 java/1.4.2_08)
Content-Type: application/pdf
Content-Length: 4302804
Last-Modified: Thu, 24 Nov 2005 14:49:55 GMT
Accept-Ranges: bytes
Content-Range: bytes 81631-4384434/4395593

GET /fast_web_view.pdf HTTP/1.1
Accept: */*
Range: bytes=4394976-4395592, 4394974-4394975, 4393148-4394396, 4385183-4386393, 4394397-4394783, 4393126-4393147, 4386394-4393125, 4384435-4384816, 4384817-4385182
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; 601-24; 601-25)
Host: www.someplace.com:8888
Connection: Keep-Alive
Cache-Control: no-cache
NovINet: v1.2

HTTP/1.1 206 Partial Content
Date: Thu, 24 Nov 2005 14:58:34 GMT
Server: Jetty/4.2.24 (Linux/2.4.21-32.0.1.ELsmp i386 java/1.4.2_08)
Content-Type: multipart/byteranges; boundary=org.mortbay.http.MultiPartResponse.boundary.egf697h0
Transfer-Encoding: chunked

GET /fast_web_view.pdf HTTP/1.1
Accept: */*
Range: bytes=4394784-4394973
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; 601-24; 601-25)
Host: www.someplace.com:8888
Connection: Keep-Alive
Cache-Control: no-cache
NovINet: v1.2

HTTP/1.1 206 Partial Content
Date: Thu, 24 Nov 2005 14:58:36 GMT
Server: Jetty/4.2.24 (Linux/2.4.21-32.0.1.ELsmp i386 java/1.4.2_08)
Content-Type: application/pdf
Content-Length: 190
Last-Modified: Thu, 24 Nov 2005 14:49:55 GMT
Accept-Ranges: bytes
Content-Range: bytes 4394784-4394973/4395593
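
As a sanity check on the three ranged requests above, the small Python sketch below merges all the requested byte ranges to see what they add up to. The helper names are made up for this example. Together the ranges cover everything from byte 81631 through the last byte of the 4395593-byte file, so presumably (this is my guess, it is not visible in the headers) the plugin already had the first 81631 bytes from the earlier plain GET. The same arithmetic also explains the Content-Length of the single-range 206 response.

```python
def merge_ranges(ranges):
    """Merge inclusive (first, last) byte ranges that overlap or touch."""
    merged = []
    for first, last in sorted(ranges):
        if merged and first <= merged[-1][1] + 1:
            # Extends or touches the previous range: grow it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], last))
        else:
            merged.append((first, last))
    return merged


def range_length(first, last):
    """Number of bytes in an inclusive range."""
    return last - first + 1


requested = [
    (81631, 4384434),                                              # third request
    (4394976, 4395592), (4394974, 4394975), (4393148, 4394396),    # fourth request
    (4385183, 4386393), (4394397, 4394783), (4393126, 4393147),
    (4386394, 4393125), (4384435, 4384816), (4384817, 4385182),
    (4394784, 4394973),                                            # fifth request
]

print(merge_ranges(requested))         # [(81631, 4395592)]
print(range_length(81631, 4384434))    # 4302804, the Content-Length of the first 206
print(range_length(4394784, 4394973))  # 190, the Content-Length of the last 206
```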

Well, as you can see, the document loaded just fine in the browser. I was sorta expecting that. Now for the non-Fast Web View enabled document:

GET /no_fast_web_view.pdf HTTP/1.1
Accept: */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; 601-24; 601-25)
Host: www.someplace.com:8888
Connection: Keep-Alive
NovINet: v1.2

HTTP/1.1 200 OK
Date: Thu, 24 Nov 2005 15:00:34 GMT
Server: Jetty/4.2.24 (Linux/2.4.21-32.0.1.ELsmp i386 java/1.4.2_08)
Content-Type: application/pdf
Content-Length: 4390283
Last-Modified: Thu, 24 Nov 2005 14:49:55 GMT
Accept-Ranges: bytes

GET /no_fast_web_view.pdf HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; 601-24; 601-25)
Host: www.someplace.com:8888
Connection: Keep-Alive
NovINet: v1.2

HTTP/1.1 200 OK
Date: Thu, 24 Nov 2005 15:00:34 GMT
Server: Jetty/4.2.24 (Linux/2.4.21-32.0.1.ELsmp i386 java/1.4.2_08)
Content-Type: application/pdf
Content-Length: 4390283
Last-Modified: Thu, 24 Nov 2005 14:49:55 GMT
Accept-Ranges: bytes

GET /no_fast_web_view.pdf HTTP/1.1
Accept: */*
Range: bytes=4389259-4390282, 4185483-4389258, 3531-4185482
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; 601-24; 601-25)
Host: www.someplace.com:8888
Connection: Keep-Alive
Cache-Control: no-cache
NovINet: v1.2

HTTP/1.1 206 Partial Content
Date: Thu, 24 Nov 2005 15:00:34 GMT
Server: Jetty/4.2.24 (Linux/2.4.21-32.0.1.ELsmp i386 java/1.4.2_08)
Content-Type: multipart/byteranges; boundary=org.mortbay.http.MultiPartResponse.boundary.egf6bs58
Transfer-Encoding: chunked

This page also loaded without problems, which leads me to believe that somewhere between 2.1.6 and 2.1.7 of Cocoon, or between the SVN version of 1.2.1 and the stable 1.2.4 release of Lenya, something changed in the way they serve PDF files. I already sent a request to the mailing list a couple of months ago on this topic, and a couple of people were kind enough to read through the questions and frustrations and try some things, but none of them worked.

Armed with this new knowledge, I'll head back out to the Cocoon and Lenya mailing lists and see if anyone knows what the heck is going on here. If the Cocoon documentation says not to use Fast Web View enabled documents, then I'm still left with a PDF document that is not Fast Web View enabled and still doesn't appear for Internet Explorer users. If anyone reading this knows anything about what could be happening here, please post a comment and help finally get this strange behavior exposed and resolved!

One solution: An Update

So believe it or not, the answer was simple. Sorta. After presenting the research to the Lenya user mailing list, I got a response from Michael Wechner. He said that there was already a fix in place: turning off byte ranges for serving PDFs. The downside is that any PDF document has to be fully downloaded before it can be viewed. OK, not a big deal for small documents, but 10 MB documents could be a pain. Still, much better than getting an error that the PDF document couldn't load at all. I guess the complicated part is up to the developers: just why isn't it working anyway?!
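
For anyone wondering what "turning off byte ranges" means at the HTTP level, here is a rough Python sketch. It is not the Lenya or Cocoon code, just an illustration under my own assumptions: `build_response` and its `byte_ranges` flag are hypothetical names, and it only understands a single explicit range. With the flag off, the server ignores any Range header, always sends the whole file with a 200, and advertises "Accept-Ranges: none" so clients know not to bother asking for pieces.

```python
from typing import Dict, Optional, Tuple


def build_response(body: bytes, range_header: Optional[str],
                   byte_ranges: bool) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a GET on an in-memory PDF."""
    headers = {"Content-Type": "application/pdf"}
    if byte_ranges and range_header and range_header.startswith("bytes="):
        spec = range_header[len("bytes="):]
        first, _, last = spec.partition("-")
        # Only a single explicit 'first-last' range is handled in this sketch.
        if "," not in spec and first.isdigit() and last.isdigit():
            start, end = int(first), min(int(last), len(body) - 1)
            if start <= end:
                payload = body[start:end + 1]
                headers["Accept-Ranges"] = "bytes"
                headers["Content-Range"] = "bytes %d-%d/%d" % (start, end, len(body))
                headers["Content-Length"] = str(len(payload))
                return 206, headers, payload
    # Byte ranges off (or a request this sketch does not handle): send it all.
    headers["Accept-Ranges"] = "bytes" if byte_ranges else "none"
    headers["Content-Length"] = str(len(body))
    return 200, headers, body


pdf = b"%PDF-1.4 " + b"x" * 91  # 100 bytes standing in for a real file

status, headers, payload = build_response(pdf, "bytes=10-19", byte_ranges=False)
print(status, headers["Accept-Ranges"], len(payload))  # 200 none 100

status, headers, payload = build_response(pdf, "bytes=10-19", byte_ranges=True)
print(status, headers["Content-Range"], len(payload))  # 206 bytes 10-19/100 10
```

The trade-off is the one described above: the plugin can no longer fetch pages out of order, so the whole document has to arrive before anything is shown.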

One interesting thing I noted was that this document shot to the top of Google results for this problem. For those getting here and having problems with another product outside of Cocoon/Lenya, I would first check with your product's support whether they have an existing issue with serving PDFs with byte ranges on. If the cause is unknown, then turn off byte ranges if possible as a temporary solution. But lastly, test, test, test! Narrow down your problems to a particular application if need be and report bugs! Applications never get fixed if you don't report a bug, and if the bug has already been entered, reiterate your need to have it fixed.
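
If you want to do that kind of testing without firing up IE and a header-sniffing tool every time, a small script can send the same sort of request. The sketch below uses Python's standard http.client module; the `probe` function is just a name I made up, and the host, port and path you pass in are whatever your own setup uses.

```python
import http.client


def probe(host, port, path, range_value=None):
    """Send one GET (optionally with a Range header) and summarize the reply."""
    headers = {"Accept": "*/*"}
    if range_value is not None:
        headers["Range"] = range_value
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("GET", path, headers=headers)
        response = conn.getresponse()
        response.read()  # drain the body before closing the connection
        return {
            "status": response.status,
            "content_type": response.getheader("Content-Type"),
            "content_range": response.getheader("Content-Range"),
        }
    finally:
        conn.close()
```

For example, `probe("www.someplace.com", 8888, "/no_fast_web_view.pdf", "bytes=0-1023")` (using the placeholder host from the headers above) should come back with a 206, or a 200 if byte ranges are turned off. A 416 for a range that is clearly inside the file points at the server, and running the same call against each layer (Apache, Jetty alone, Jetty with Cocoon/Lenya) helps narrow down which one is misbehaving.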

Hope this update helps out some Lenya users out there!
