Managing a PDF corpus to benchmark and test PDF.js
We ran extensive tests of PDF.js on 6933 PDFs. We crawled the PDFs in a way that we hope approximates the most popular PDFs on the internet. This is our first public report of the results, published in the hope that it is useful for anybody working on PDF.js.
We tested 131553 pages in 6933 PDFs over 5 runs, with an average variance of 9.16%. Render times are measured relative to the average page of a calibration document (the Tracemonkey paper). On a 2.5 GHz Intel Core i5 with 8 GB of memory, that average is 300 ms. A good user experience probably ends at around 3 to 4 times this time per page, i.e. roughly 0.9 to 1.2 s. The table below counts the pages in each bucket of relative render time for version 140429.
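The normalization described above can be sketched in a few lines of Python. This is a minimal sketch, not code from the benchmark harness: the function names are hypothetical, and it assumes that the reported "variance" is the coefficient of variation (standard deviation divided by mean) of each page's render times across runs, averaged over all pages.

```python
from statistics import mean, pstdev


def relative_times(page_ms, calibration_page_ms):
    """Express each page render time as a multiple of the calibration
    document's average page time (e.g. 600 ms / 300 ms -> 2.0x)."""
    baseline = mean(calibration_page_ms)
    return [t / baseline for t in page_ms]


def run_variation(run_times_ms):
    """Assumed definition of variance: coefficient of variation of one
    page's render times across runs, in percent (pop. std dev / mean)."""
    return 100.0 * pstdev(run_times_ms) / mean(run_times_ms)


def average_variation(runs_per_page):
    """Average the per-page run-to-run variation over all pages.
    `runs_per_page` holds one list of timings (one entry per run) per page."""
    return mean(run_variation(runs) for runs in runs_per_page)


# Hypothetical timings, not measured data.
calibration = [280.0, 320.0]  # average page time: 300 ms
print(relative_times([150.0, 600.0, 1200.0], calibration))  # [0.5, 2.0, 4.0]
print(average_variation([[300, 300, 300], [90, 110]]))      # 5.0
```

With a 300 ms baseline, the "3 to 4 times" threshold corresponds to relative times of 3.0 to 4.0.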
| Version | 0.2x | 0.4x | 0.6x | 0.8x | 1.0x | 1.2x | 1.4x | 1.6x | 1.8x | 2.0x | 2.2x | 2.4x | 2.6x | 2.8x | 3.0x | Slower | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 140429 | 42010 | 50111 | 12209 | 7097 | 4840 | 2805 | 1969 | 1510 | 1208 | 1122 | 882 | 772 | 567 | 417 | 324 | 3710 | 131553 |
| 140429 (%) | 31.9% | 38.1% | 9.3% | 5.4% | 3.7% | 2.1% | 1.5% | 1.1% | 0.9% | 0.9% | 0.7% | 0.6% | 0.4% | 0.3% | 0.2% | 2.8% | 100% |
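A histogram like the one above can be produced by bucketing relative page times in steps of 0.2x. The sketch below assumes that each column label is the bucket's upper bound (so "0.2x" counts pages up to 0.2 times the calibration page time) and that "Slower" collects everything above 3.0x; the post does not state the bucket edges explicitly, and the helper names are hypothetical.

```python
from bisect import bisect_left
from collections import Counter

# Assumed bucket upper bounds 0.2x, 0.4x, ..., 3.0x.
# round() keeps each bound equal to its decimal value despite float error.
BOUNDS = [round(0.2 * k, 1) for k in range(1, 16)]
LABELS = [f"{b:.1f}x" for b in BOUNDS] + ["Slower"]


def bucket_label(rel):
    """Return the column label for one relative page time."""
    i = bisect_left(BOUNDS, rel)  # index of the first bound >= rel
    return LABELS[i]              # i == len(BOUNDS) maps to "Slower"


def histogram(rel_times):
    """Count pages per bucket and return (label, count, percent) rows."""
    counts = Counter(bucket_label(r) for r in rel_times)
    total = len(rel_times)
    return [(lab, counts[lab], 100.0 * counts[lab] / total) for lab in LABELS]


def share_slower_than(rel_times, threshold):
    """Percentage of pages slower than `threshold` times the calibration page."""
    return 100.0 * sum(r > threshold for r in rel_times) / len(rel_times)


# The published page counts for version 140429 reproduce the percentage row.
published = [42010, 50111, 12209, 7097, 4840, 2805, 1969, 1510,
             1208, 1122, 882, 772, 567, 417, 324, 3710]
total = sum(published)  # 131553
print([f"{100 * c / total:.1f}%" for c in published])
```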
56 PDFs crashed the system when we tried to render them with version 140429.
The following documents are sorted by the slowest page they contain in version 140429; the number after "#" is that page's number:
| Slowest Page | Url |
|---|---|
| 138.38430861840067 (#27) | http://thesecretstories.com/K-5_WORKSHOP_HANDOUT.pdf |
| 126.15938018783487 (#1) | http://jrwy.zjol.com.cn/resfile/2014-01-17/07/07.pdf |
| 124.78892733591938 (#1) | http://pdf.wenweipo.com/2014/02/05/a06-0205.pdf |
| 66.09978273148057 (#1) | http://www.sanantonio.gov/dsd/pdf/MDP/MDP_730.pdf |
| 62.65082872957823 (#2) | http://www.makkahnewspaper.com/makkahNews/media/k2/attachments/31012014_1_2.pdf |
| 50.03763689644555 (#1) | http://www.socialism.com/drupal-6.8/sites/all/pdf/class/Lorde-Age%20Race%20Class%20and%20Sex.pdf |
| 44.985641595219384 (#40) | http://www.att.com/support_static_files/manuals/Samsung_Galaxy_S_III.pdf |
| 43.8799903177659 (#1) | http://www.aljarida.com/files/issues/2012/02/03/pages/pdf/33.pdf |
| 42.02718666564841 (#4) | http://launch3telecom.com/shared_media/pdf/manufacturers/rfs_8.pdf |
| 40.43932158742736 (#48) | http://www.nmfs.noaa.gov/pr/pdfs/permits/americascup_iha_application.pdf |
| 40.39531901261407 (#38) | http://betterairport.verdus.nl/upload/forum/geert-jan.verkade@curnet.nl/B%20Superbia.pdf |
| 37.82117243539982 (#38) | http://xn--80aikaaqfdpng.xn--p1ai/shared/files/201304/5_28121.pdf |
| 36.97834475218559 (#1) | http://www.digitalmarketingaward.iabschweiz.ch/fileadmin/templates/pdf/2013/Ricardolino_Pr%C3%A4sentation.pdf |
| 36.22502814885469 (#1) | http://www.drk-karlsruhe.de/fileadmin/Angebote/Veranstaltungen/Flyer%20Familientag_2013_final.pdf |
| 35.63874643890517 (#17) | http://www.aecilluminazione.com/uploads/kcFinder/files/Brochure_Tunnel_ENG_low.pdf |
| 33.496377952955484 (#1) | http://www.attajdid.ma/Pdf/3354_14-03-2014/08.pdf |
| 32.92199712880385 (#1) | http://www.jindai.ac.jp/uploads/jindai/leaflet-education-2013_03.pdf |
| 32.550535556231104 (#41) | http://sun025.sun.ac.za/portal/page/portal/Arts/Departemente1/Joernalistiek/Homepage/publications/smf/SMF.pdf |
| 32.52363525466562 (#2) | http://artsontheblock.com/wp-content/uploads/2011/07/December-2013-AOB-Newsletter.pdf |
| 30.801734820480323 (#1) | http://emidius.mi.ingv.it/DBMI08/aquilano/query_eq/quest.pdf |
| 28.996826430345262 (#1) | http://www.metrolisboa.pt/wp-content/uploads/Mapadarede_Metro_julho2012.pdf |
| 28.767014647682494 (#1) | http://wldaily.zjol.com.cn/images/2013-06/28/wlrb20130628a0007v01n.pdf |
| 28.357488677221756 (#1) | http://www.stalbert.catholic.edu.au/__files/f/2817/T%203%20W%201%20-%20Yrs%201%20&%202%20Royal%20Baby%20Page.pdf |
| 28.355158250881985 (#1) | http://www.iloungeserver.com/iLounge_iPad-iPadminiBG_d.pdf |
| 27.642042325649108 (#28) | http://www.ifpi.org/downloads/Digital-Music-Report-2014.pdf |
| 26.40154757982717 (#1) | http://epaper.ynet.com/images/2014-03/14/A20/bjqnb20140314A20.pdf |
| 25.962248800007632 (#1) | http://www.ichihara-chb.ed.jp/yawatahigashi-j/gakkoudayori/1gatugou.pdf |
| 25.181299886261243 (#1) | http://ig-infol.ru/archive/2012/41/4.pdf |
| 24.739032621019053 (#9) | http://mediauk.gamerzines.com/download/Battlefield-3_1.pdf |
| 24.701162790710807 (#9) | http://www.afd.fr/webdav/shared/PUBLICATIONS/Colonne-droite/Presentation-AFD-VA.pdf |
| 24.455738487368382 (#5) | http://ptgmedia.pearsoncmg.com/images/9780744015423/samplepages/9780744015423.pdf |
| 24.2222410942984 (#2) | http://www.coachusa.com/CoachUsaAssets/files/97/route45.pdf |
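A ranking like this one can be generated from per-page timings. This is a minimal sketch under stated assumptions: `docs` is a hypothetical mapping from each URL to its list of page times for one version, page numbers in the "#" column are 1-based, and times are printed unrounded, as in the table. It makes no claim about the unit of the times.

```python
def slowest_pages(docs):
    """Return (time, page_number, url) tuples, slowest document first.

    `docs` (hypothetical) maps each URL to its page times, page 1 first.
    Documents without any timed pages are skipped.
    """
    rows = []
    for url, times in docs.items():
        if not times:
            continue
        idx = max(range(len(times)), key=times.__getitem__)  # slowest page
        rows.append((times[idx], idx + 1, url))  # assume 1-based page numbers
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows


def to_markdown(rows):
    """Render the ranking in the same table layout as above."""
    lines = ["| Slowest Page | Url |", "|---|---|"]
    lines += [f"| {t} (#{p}) | {url} |" for t, p, url in rows]
    return "\n".join(lines)


# Hypothetical input, not measured data.
print(to_markdown(slowest_pages({"a.pdf": [1.0, 5.0, 2.0], "b.pdf": [7.5]})))
```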