Background
I use PDF.js to show large scanned documents in a project. The backend guys configured the web server to support HTTP Range Requests for better rendering performance.
PDF.js supports this feature with three related options:
- disableRange
- disableStream
- disableAutoFetch
See the comments in the source code.
// Options used in the project.
const options = {
  // ...
  disableRange: false,
  disableStream: false,
  // false = keep pre-fetching the rest of the file after the first view is displayed.
  // Turning pre-fetching off only takes effect when disableStream is also true.
  disableAutoFetch: false,
  // ...
};
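To make it clearer where these flags end up, here is a minimal sketch of how they might be passed to PDF.js. The helper name buildLoadParams and the URL are my own placeholders, not part of the project or of PDF.js. The getDocument call is left as a comment because it needs a browser with pdfjs-dist loaded.

```javascript
// Hypothetical helper: merges the range-related flags into the parameter
// object that pdfjsLib.getDocument() accepts.
function buildLoadParams(url, overrides = {}) {
  return {
    url,
    disableRange: false,
    disableStream: false,
    disableAutoFetch: false,
    ...overrides, // let a caller flip individual flags for experiments
  };
}

// Placeholder URL, for illustration only.
const params = buildLoadParams("/files/scan.pdf");

// In the browser, assuming pdfjs-dist is available as pdfjsLib:
// const pdf = await pdfjsLib.getDocument(params).promise;
```

Keeping the flags in one place makes it easy to toggle a single option (for example { disableStream: true }) while debugging network behavior.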
Render Process
- Issue a GET request to fetch the PDF document.
- After the response headers have arrived:
  - Cancel the GET request as soon as possible (this is the behavior disableStream controls).
  - Issue range requests to fetch the data the viewer needs to display the first pages.
- As the user scrolls, send more range requests to fetch the necessary data.
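The range requests in the steps above can be sketched roughly as follows. This is not the PDF.js implementation, only an illustration of the idea: the function names are hypothetical, and the 65536-byte chunk size is an assumption (I believe it matches the default of the rangeChunkSize option).

```javascript
// Assumed chunk size in bytes; PDF.js exposes this as the rangeChunkSize option.
const CHUNK_SIZE = 65536;

// Build the Range header value for chunk `index` of a file of `totalLength` bytes.
// Byte ranges are inclusive on both ends.
function rangeHeaderForChunk(index, totalLength, chunkSize = CHUNK_SIZE) {
  const begin = index * chunkSize;
  if (begin >= totalLength) {
    throw new RangeError("chunk index is past the end of the file");
  }
  const end = Math.min(begin + chunkSize, totalLength) - 1;
  return `bytes=${begin}-${end}`;
}

// Hypothetical wrapper: fetch one chunk and insist on a 206 Partial Content reply.
async function fetchChunk(url, index, totalLength) {
  const response = await fetch(url, {
    headers: { Range: rangeHeaderForChunk(index, totalLength) },
  });
  if (response.status !== 206) {
    throw new Error(`expected 206 Partial Content, got ${response.status}`);
  }
  return new Uint8Array(await response.arrayBuffer());
}
```

A 200 response to such a request means the server (or a cache in between) ignored the Range header and sent the whole file, which is worth checking for when debugging.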
The Bug
When the user reloads the page after viewing all pages, the first GET request turns into one huge range request that downloads the whole document, which hurts performance badly.
How to Resolve
First, I tried disabling the cache, and the bug disappeared. Interesting: it seemed to be related to the browser's cache policy.
I searched for issues with caching range requests, and it seems caching of range requests is not supported perfectly.
So I inspected the response headers of the PDF document. The first GET request looked normal. The subsequent range requests came back with Cache-Control: public, max-age=345600. I guessed that the public cache policy might be causing the bug, so I talked with the DevOps and backend guys about my guess, and they removed public to verify it.
The bug disappeared. Problem solved!
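When comparing headers before and after a change like this, it can help to split the Cache-Control value into its directives. Below is a minimal sketch; parseCacheControl is a hypothetical helper of mine, and it deliberately ignores edge cases such as quoted values that contain commas.

```javascript
// Hypothetical helper: split a Cache-Control value into a directive map.
// Directives without an argument (such as "public") map to true.
function parseCacheControl(value) {
  const directives = {};
  for (const part of value.split(",")) {
    const [name, arg] = part.trim().split("=");
    if (!name) continue; // skip empty segments such as a trailing comma
    directives[name.toLowerCase()] =
      arg === undefined ? true : arg.replace(/^"|"$/g, "");
  }
  return directives;
}

// The header seen on the range responses in this post.
const parsed = parseCacheControl("public, max-age=345600");

// max-age is in seconds; 345600 / 86400 works out to 4 days.
const maxAgeDays = Number(parsed["max-age"]) / 86400;
```

With this, checking whether a response still carries public is a simple parsed.public === true test.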
Conclusion
It seems that Chrome makes some magic decisions when the cache policy is inappropriate. I tried to find some theory to support my guess, but found nothing.