Feature Wiki

Information about planned and released features

Improving PDF Generation

1 Description

Creation of PDF files from a web browser view on a Linux-based server system, using the open-source WebKit framework (PDF generator).

Goals of PDF generation:
  • Generate a PDF of the answers the examinee submitted to the questions asked, as a preliminary step to having the examinee digitally sign them
  • Create a PDF so that examinees can inspect parts of the test and/or the entire test
  • Create a PDF for archiving parts of the test and/or the entire test (documentation and archiving of results and process)
Approach:
  • To keep the settings for PDF generation as generic as possible, a new administration option "PDF Generation" (PDF-Erstellung) was added to ILIAS. On its "Settings" tab, the method used for PDF generation can be selected.
  • So far, generation via XML and XSL-FO with the ilServer has been supported. Generation with the command-line tool wkhtmltopdf has now been added. The tool should be located on the ILIAS server, but offloading it to a dedicated PDF generation server is also possible. At present, besides the path to the wkhtmltopdf tool, various settings for headers and footers can be configured, as well as the zoom factor for the output.
  • In addition, the PDF must be generated automatically by the WebKit engine at the end of the exam, without having to be triggered by a dedicated action.
  • Implementing PDF generation with WebKit is _one_ option. An alternative would be to generate the PDFs on the client, since they are later meant to be used for signing by the examinee as well. This would also avoid the potential bottleneck of server-side generation. For creating PDFs for inspection and archiving after the test has been carried out, the WebKit variant is a flexible and performant tool.
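
As an illustration of the settings described above (tool path, header and footer, zoom factor), the following minimal sketch assembles a wkhtmltopdf command line in PHP. The function name and the setting keys ('path', 'zoom', 'header_center', 'footer_right') are hypothetical and not taken from the ILIAS code base. The options --zoom, --header-center and --footer-right, and the [page] and [topage] placeholders, are standard wkhtmltopdf features.

```php
<?php
/**
 * Build a wkhtmltopdf command line from administration-style settings.
 * The setting keys are hypothetical names chosen for this sketch.
 */
function buildWkhtmltopdfCommand(array $settings, string $inputHtml, string $outputPdf): string
{
    // Path to the binary; falls back to a lookup via PATH.
    $parts = [escapeshellarg($settings['path'] ?? 'wkhtmltopdf')];

    // Zoom factor for the rendered output.
    if (isset($settings['zoom'])) {
        $parts[] = '--zoom ' . escapeshellarg((string) $settings['zoom']);
    }
    // Plain-text header and footer; wkhtmltopdf substitutes [page] and [topage].
    if (!empty($settings['header_center'])) {
        $parts[] = '--header-center ' . escapeshellarg($settings['header_center']);
    }
    if (!empty($settings['footer_right'])) {
        $parts[] = '--footer-right ' . escapeshellarg($settings['footer_right']);
    }

    // Input document first, output file last.
    $parts[] = escapeshellarg($inputHtml);
    $parts[] = escapeshellarg($outputPdf);

    return implode(' ', $parts);
}

$command = buildWkhtmltopdfCommand(
    [
        'path' => '/usr/local/bin/wkhtmltopdf',
        'zoom' => 1.25,
        'footer_right' => 'Page [page] of [topage]',
    ],
    '/tmp/answers.html',
    '/tmp/answers.pdf'
);
echo $command, PHP_EOL;
```

The resulting string could be passed to exec() on the ILIAS server or sent to a dedicated PDF generation host. Every argument is escaped because the header and footer texts come from user-editable settings.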

2 Status

3 Additional Information

  • If you want to know more about this feature, its implementation or funding, please contact: SIG EA

4 Discussion

JF 5 Mar 2012: We discussed the features today and have some questions. We schedule this topic for 4.3, but the following issues must be solved.
  • What are the technical advantages of the wkhtmltopdf solution? Do we really need two implementations or should XSL-FO be replaced? We would prefer to have only one solution.
  • We think that the settings screens would be useful for both alternatives, XSL-FO and wkhtmltopdf. Is this correct?
  • We would prefer that the generation can be triggered (for both solutions): a) Via a button after the test by a tutor and b) via cron job during the next night.
ILIAS EA, Erlangen, 15.03.2012:
  • We prefer to keep both solutions in parallel.
  • Placement: Administration :: Third-Party Software, as with the Excel export
  • Suggestion: create a new administration category under Services :: Export Formats
  • Button or manual process as an EA plugin (replacing the current Perl scripts by migrating them)
  • Cron job: yes. Integrate the cron job settings into the plugin administration as system-wide default settings for the export; individual settings on a test can override the system-wide settings.
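
The override rule proposed in the last bullet (system-wide defaults that individual tests may override) can be sketched as follows. This is only an illustration: the function name and all setting keys are hypothetical. It assumes that a null value on the test means "not set, use the system default".

```php
<?php
/**
 * Merge per-test export settings over system-wide defaults.
 * All keys used here are hypothetical names for this sketch.
 */
function resolveExportSettings(array $systemDefaults, array $testOverrides): array
{
    // Only settings that exist system-wide may be overridden on a test.
    $known = array_intersect_key($testOverrides, $systemDefaults);

    // A null value on the test means "not set": keep the system default.
    $known = array_filter($known, static function ($value) {
        return $value !== null;
    });

    return array_replace($systemDefaults, $known);
}

$systemDefaults = ['trigger' => 'cron', 'cron_hour' => 2, 'renderer' => 'xslfo'];
$testOverrides  = ['trigger' => 'button', 'cron_hour' => null, 'unknown' => 'x'];

$effective = resolveExportSettings($systemDefaults, $testOverrides);
print_r($effective);
```

Here the test switches the trigger to a button, inherits the nightly cron hour and the renderer, and the unknown key is ignored.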

Stefan Schneider, 24.10.2012 (translated from German)

Preliminary considerations and background:
The main reason for introducing an alternative PDF generation was Helmut Schottmüller's statement that the existing FOP-based solution for generating exam PDFs was not optimal (I would like to check back with him on what exactly the problems were).
Another reason was the discussion that a server-side solution for PDF generation would probably soon run into performance problems. For legal reasons as well, we concluded that the best solution would be to replace the HTML page "List of Answers", once the test has been finished, with the very PDF document that is then also archived (and possibly even signed beforehand). That way, the HTML exam template, which could be manipulated or faulty, is not transformed again afterwards or on the fly.
To establish a legally secure, unambiguous link between the examinee's identity and the document, a digital signature would be possible. For signing, the PDF would have to be sent to the client anyway and could of course also be generated there, in order to avoid performance bottlenecks.

Experience and current state:
In Marburg we have had good experience with the WebKit variant, but there were rendering problems there too, so out of necessity a xulrunner-based variant was created that was able to fix these problems. This variant is designed so that it could be integrated well into the Safe Exam Browser (SEB) for client-side PDF generation, but it can also be used server-side (tested on Debian with a virtual framebuffer). For now, however, I see no point in discussing such a tight browser bundling of ILIAS <-> SEB if we want to offer the e-exam off the shelf with ILIAS 4.4. In my view, PDF generation is a mandatory ILIAS feature that must also be possible without SEB, and it already is.

It would be important to review carefully the requirements for PDF support in ILIAS, also beyond the e-exam (certificate generation etc., PDF/A, ...), and to define the corresponding use cases and processes. Only then should we venture into the discussion about one or possibly several PDF generators. Should server-side PDF generation lead to performance problems in exam mode, I would have no problem offloading generation to the client via SEB; a first module for this already exists. Unfortunately, I have not yet gained any experience with the Java server for Lucene and PDF generation, so I cannot make any comparisons.

Attached is a first proposal for a target process model for PDF (A) display, signing and archiving (in cooperation with Nadine Koecher / DHBW Karlsruhe):

JF 29 Oct 2012: We appreciate the feature and support its integration into the trunk for ILIAS 4.4 (if XSL-FO does the job). Databay suggests following an incremental implementation path. First, PDF generation on the basis of XSL-FO should be implemented and its problems examined. If another tool is needed to generate PDFs, the topic will be put on the JF agenda again. In a second step, the possibilities of adding digital signatures will be checked.

DataBay 2013: As part of the "PDF Archiving" (PDF-Archivierung) offer of the University of Marburg, the PDF generation mechanisms were analysed and the following approach was worked out:

Matthias Kunkel, 27 Feb 2013: Will this be a general service that can be used by other components, too (e.g. PDF of wiki page or ILIAS learning module)?

Maximilian Becker, 17 May 2013:
 
Databay has looked into the matter in great detail.

First of all, we need to point out that, regarding speed and potential, Apache FOP is a good product. Unfortunately, the massive potential of FOP cannot be unleashed with the current design and the available integration. We also have to point out that, for technical reasons we cannot easily overcome (missing features in FOP and flaky XSL), the refactoring effort needed to change this situation remains incalculable until someone explicitly looks into it. The current PDF generation is highly complex from a developer's perspective. The solution of using Apache FOP in conjunction with the Java ilServer and XSL-FO is sort of fancy, but it comes at the price of technologies that are used almost solely for this particular feature. That said, we back the request to improve PDF generation by offering a system-wide replacement.

The resulting PDF files from all evaluated technologies - browser vs. browser-print, browser-print-to-pdf, Apache FOP and TCPDF - differ greatly both from the screen design we see in ILIAS and from each other.
There is not a single solution available that sufficiently converts the visual representation that the user agent renders "authoritatively". This of course relates only to visuals; from our observations, the information content is complete in all cases, unless information is conveyed through design alone.

In light of these findings, Databay strongly recommends moving the PDF generation away from Apache FOP and into a PHP library that simply receives plain HTML and generates the PDF. Our favourite alternative is TCPDF. The library is available under LGPL terms and is therefore compatible with the licence of ILIAS. The product is mature (it has been around for more than a decade) and is actively developed.

The major benefits of this approach, besides fulfilling the feature request in accordance with the requester:

Consolidation of technologies.
We prefer a PHP solution over a Java solution in a project that has selected PHP as its primary language. Moving towards a homogeneous code landscape reduces configuration and administration complexity. Finally, this influences server requirements: the necessity for Java support on the server will be reduced. PDF generation would become a feature that simply works out of the box, with no additional setup or dependencies.

Saving unnecessary conversion steps.
ILIAS currently does two (by definition lossy) conversion steps: the first renders the given HTML into XSL-FO (with the available XSL sheets we evaluated being far from perfect); the second renders XSL-FO to PDF (with FOP, which still lacks some features in the available XSL-FO). This can be done in one step, at least from our "ignorant" perspective, since we pass the HTML and page setup to TCPDF and leave any intermediate conversion steps to the library developer. More importantly, we can spare ourselves and our current and future clients from paying for the modification of XSL sheets whenever the visuals change. The XSL sheets evaluated are not flexible in this respect. We would favour a solution that does not force developers into "structurally reformatting structurally formatted markup". We find this as awkward as it sounds.

Page sizes / orientation.
Currently, page dimensions and orientation are hardcoded into our XSL sheets. As a result, we cannot natively and flexibly support foreign page formats and orientations. The workaround used so far in the certificate service, string-replacing hardcoded placeholders in the FO, is inelegant at best, as this approach is in a way the opposite of what XSL is designed for.
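
For comparison, here is a minimal sketch of how page format and orientation become ordinary parameters with TCPDF. The two helper functions are hypothetical. The argument order follows the TCPDF constructor (orientation, unit, format, unicode, encoding, diskcache, pdfa). The sketch assumes the TCPDF library is loaded separately and fails with an exception if it is not.

```php
<?php
/**
 * Translate a page setup into the leading TCPDF constructor arguments:
 * orientation, unit, format, unicode, encoding, diskcache, pdfa.
 * Hypothetical helper for this sketch.
 */
function tcpdfConstructorArgs(string $format = 'A4', bool $landscape = false, bool $pdfa = false): array
{
    return [$landscape ? 'L' : 'P', 'mm', $format, true, 'UTF-8', false, $pdfa];
}

/**
 * Render HTML with TCPDF and return the PDF as a string.
 * Assumes TCPDF has been loaded elsewhere (for example via Composer).
 */
function renderHtmlWithTcpdf(string $html, array $constructorArgs): string
{
    if (!class_exists('TCPDF')) {
        throw new RuntimeException('TCPDF is not available in this environment.');
    }
    $pdf = new TCPDF(...$constructorArgs);
    $pdf->AddPage();
    $pdf->writeHTML($html);
    // Destination 'S' returns the document as a string instead of sending it.
    return $pdf->Output('document.pdf', 'S');
}

// US Letter in landscape, without any change to a stylesheet.
$args = tcpdfConstructorArgs('LETTER', true);
print_r($args);
```

Setting the third argument of tcpdfConstructorArgs() to true would request PDF/A mode from TCPDF in the same call.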

Taking advantage of a larger user base through the use of a library.
Of course, FOP is a widely used product, and there is no doubt about its general feasibility and fitness for converting FO to PDF. The conversion from HTML to FO, however, depends on the XSL itself, which is backed only by the users in the ILIAS community, and these are distributed across the available transformations. With TCPDF, many more applications and users back the product's complete conversion step. We consider this a great advantage and find the generic nature of the conversion charming, as it lowers the cost involved in offering PDF.

Easy Themes for PDFs.
The use of XSL of course makes skins / themes possible. Using XSL for this purpose, however, makes theming a very complicated step for template developers. A conversion that uses CSS styles for the presentation of the content allows designers to use their core competencies to style PDF documents as well.

Easy support for PDF/A
Again, we have no doubt it is possible to generate PDF/A with FOP. TCPDF also supports the generation of PDF/A, but out of the box: a much shorter way of getting things done.

Future uses / perspectives
The current samples delivered by the TCPDF developers cover a great range. In it are samples for interactive forms in PDF-files, barcodes, digital signature certification and much more. These features are available and can easily be unleashed for ILIAS' purposes at a considerably lower cost for the funding parties, than with the current solution.

The secondary benefits are more subtle, but of great value as well.
We currently think of a new conversion process that consists of a "job definition" class, a "processor" class and a processing dispatcher. In pseudocode:

$job = ilPDFGenerationJob::getDefaultJob(); // Get a preconfigured job. You could also start with a "new" job or overwrite settings as necessary.
$job->addPage($some_html); // A page can, through auto-pagination and for simplicity, cover multiple sheets of "paper".
$job->addPage($even_more_optional_html); // Optionally, more pages can be attached, e.g. cover sheets or "fine print".
ilUtil::deliverData(
    ilPDFGeneration::renderPDFToBase64($job),
    ilUtil::getASCIIFilename($filename) . ".pdf",
    "application/pdf",
    false,
    true
);

This would allow the dispatcher to route the job definition to current and future alternative mechanisms, set up in the administration. (Exactly: you do not see the processor in the code above, as such knowledge does not belong in the consumer.) This makes it possible to support multiple PDF rendering mechanisms without changing code in the places where the service is used. Should a new PDF generation mechanism become necessary or wanted for one reason or another, it could even be delivered as a plugin. One example of such a plugin is adding external renderers, either with TCPDF or another solution, up to having plain WAMP service drones working for an installation with round-robin or "rand-robin".
Another example is wkhtmltopdf, a command-line tool as mentioned above. We even see the possibility of designing a plugin slot in a way that would allow taking advantage of client CPU horsepower for PDF generation, as mentioned in the original request.
This promotes good software design as well. Today, modules that want to deliver a PDF need to know a lot about the current process, namely RPC. This is messy wherever it occurs.
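
The split into job definition, processor and dispatcher described above might look roughly like the following sketch. All class and method names are hypothetical and do not claim to match the later ILIAS implementation. The FakeProcessor is a stand-in that returns marker text instead of a real PDF, so that the example runs without any PDF library.

```php
<?php
/** Job definition: collects HTML pages and knows nothing about renderers. */
final class PdfJob
{
    /** @var string[] */
    private $pages = [];

    public function addPage(string $html): self
    {
        $this->pages[] = $html;
        return $this;
    }

    /** @return string[] */
    public function getPages(): array
    {
        return $this->pages;
    }
}

/** Processor: one concrete rendering mechanism (TCPDF, wkhtmltopdf, a plugin, ...). */
interface PdfProcessor
{
    public function render(PdfJob $job): string;
}

/** Stand-in processor for this sketch; it returns marker text, not a real PDF. */
final class FakeProcessor implements PdfProcessor
{
    public function render(PdfJob $job): string
    {
        return 'FAKE-PDF[' . count($job->getPages()) . ' page(s)]';
    }
}

/** Dispatcher: routes a job to whichever processor is active in the configuration. */
final class PdfDispatcher
{
    /** @var PdfProcessor[] */
    private $processors = [];
    /** @var string|null */
    private $active = null;

    public function register(string $name, PdfProcessor $processor): void
    {
        $this->processors[$name] = $processor;
    }

    public function activate(string $name): void
    {
        if (!isset($this->processors[$name])) {
            throw new InvalidArgumentException("Unknown processor: $name");
        }
        $this->active = $name;
    }

    public function render(PdfJob $job): string
    {
        if ($this->active === null) {
            throw new LogicException('No PDF processor has been activated.');
        }
        return $this->processors[$this->active]->render($job);
    }
}

// Consumer code: builds a job and hands it over, without naming a processor.
$dispatcher = new PdfDispatcher();
$dispatcher->register('fake', new FakeProcessor());
$dispatcher->activate('fake');

$job = (new PdfJob())->addPage('<h1>Cover sheet</h1>')->addPage('<p>Answers</p>');
$output = $dispatcher->render($job);
echo $output, PHP_EOL;
```

Switching to another renderer would then only require registering and activating a different PdfProcessor; the consumer code that builds the job stays unchanged.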

The best part of this is that we will get a rather simple replacement for the current implementations. The HTML, the common starting point, must always be rendered. Instead of pushing it into FO and then initiating the conversion, or string-replacing some other FO, this HTML will now be put into the job object. In the case of the test, the current solution has led to 55+ very specific lines of code which could now be ... 5? 10? ... perfectly generic lines.
 
We believe that the PDF generation as sketched out here allows for easy integration into ILIAS as a whole and makes moving towards it easy for the developers of other components that generate PDFs:
"Make job, give away job, get file. I like."
 
We present this feature by putting it on the JourFixe-agenda for a decision.
 
We would highly appreciate discussing and answering any questions from the community and JF participants prior to the actual JourFixe, in order to save the committee's limited time.

Alex Killing, 23 May 2013: Matthias Kunkel, Stefan Meyer and I support the migration from Apache-FO to TCPDF. The Apache-FO implementation is used (ILIAS 4.3) in three components:
  • Survey
  • Test
  • Certificate
We ask Databay to
  • create a new service for PDF generation as outlined by Max in his comment, and take over the maintainership of this service.
  • document the service in the dev-guide.
  • list TCPDF under the third-party components section in the dev-guide.
  • migrate all PDF generation in tests and certificates to the new service. It would be feasible for a developer at Databay to take over the maintainership of the certificate service completely. Michael is currently listed as maintainer for the user interface, Stefan for the generation.
Leifos will try to migrate the survey to the new service.

JF 27 May 2013: We confirm the procedure above.

MB, 08 Oct 2015: Following up on the discussions in the bugtracker, I would like to bring up some thoughts about PDF-Generation and TCPDF in ILIAS.

As you can see above, the author of this feature request clearly preferred wkhtmltopdf to generate PDFs. The JourFixe "back in the days" hesitated to adopt this, in an attempt not to introduce new external / command-line dependencies. An internal solution was preferred, and TCPDF was chosen and implemented as requested. The move away from ilServer and XSL-FO to a new HTML-to-PDF solution was also decided in the expectation that relying on a conversion step backed by a larger user base would save us the hassle of defining the conversion step ourselves.

Another goal was to allow a basic ILIAS installation to be able to generate PDFs without additional configuration.

Back then, a certain drawback in terms of speed was predicted. To address this, the implementation was designed on the premise that a plugin slot would be the logical next extension of the product. Fred Neumann has already reported that, with the code structure in place, using an alternative PDF renderer was possible without major problems. This plugin slot is something I would still appreciate.

Then, at the same time, new developments crossed our plans:
1. MathJax was introduced, which renders TeX at runtime in the browser. This is obviously incompatible with the idea of pushing HTML into TCPDF.
2. The PDF archive was introduced, which generates an enormous load.
3. The move to Bootstrap, which also relies on quite some JavaScript, makes the non-JS output look less attractive.
(Regarding the last two points: XSL-FO would have had similar problems, since it is also a non-JS rendering.)

To address these issues, moving towards a headless browser can be part of the solution. But it won't deal with some central aspects:
The enormous load generated by PDF archives will still have a major impact on system performance unless more advanced mechanisms are in place.
According to DHBW Karlsruhe, which did an extensive evaluation of alternatives, PDF generation with TeX is still an issue even with headless browsers.

In order to resolve the issues that pop up all over the place, I recommend deciding the following:
- Allowing the implementation of a plugin slot for PDF generators, to enable sophisticated and scenario-dependent solutions that can include headless browsers, load balancing, addressing the requirements regarding TeX on a more "individual" basis, etc.
- Reactivating the old TeX-rendering so it can be used in non-js-renderings.
- Keeping TCPDF as the default solution.

Further changes touch other features:
PDF Storage of e-Exams
The storage solution was designed to allow for maximum transparency. This meant to save every step of changes that are relevant for the exams.
This approach has proven to be impractical. Removing PDF generation during incremental operations, corrections in particular, seems to be the way to go. A conceptual discussion with SIG-EA should be "ignited" about this change.
An even better solution would be to remove PDF archiving from the core. Issue 0016487 ("Subsequent creation of test archives are not valid") is a feature request that reduces the feature to nothing more than an export format. For this we have the test export plugin slot, and removing the feature from the core can help reduce the complexity of the module.

Finally, I would like to point out that I still consider TCPDF a library which has great benefits. We have to seriously think about how we want to use it, though. With the general movement to shift visual awesomeness to the clients, the HTML we generate is "less" than what it was when a server-based html-to-pdf was thought to be the way to go. So we should think about generating "print views" instead of just "fire and hope" massively js enhanced pages into such a mechanism.

sschneider, 10 Oct 2015: Generally I agree with Max that we should discuss creating PDF archives subsequently via a regular export slot. We can gain experience with a scalable PDF creation mechanism (a headless browser like PhantomJS).
But:
We also have to take care of the requirements for generating well-formatted on-the-fly PDFs (such as printing certificates or digital signatures), and my first experiences with TCPDF's CSS support were unimpressive. Maybe we should analyse the detailed formatting requirements and decide whether TCPDF is a suitable core component for well-formatted PDFs. I highly appreciate Max's idea of introducing individual print views for every question type in the future. Maybe the user results can be reduced to a minimal serialized dataset from which all views could be generated (print, web, mobile, ...). For the ongoing problems with the already implemented T&A archiving, I would suggest a usability patch as discussed in Mantis:
- only HTML, and no PDF archive file, is stored after the test is finished
- no new PDF versions are created on every recalculation of the test (correction of single questions)
- PDF archives are created subsequently on archive export, based on the stored HTML
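
The three points of this suggested patch can be illustrated with a small sketch: HTML is stored when the test is finished or rescored, and a PDF is rendered only on export. The class name, method names and in-memory storage are hypothetical simplifications; a real archive would write files. The renderer passed in is a fake that only counts its calls.

```php
<?php
/**
 * Archive that stores HTML only and renders PDFs lazily on export.
 * Hypothetical sketch; storage is kept in memory for brevity.
 */
final class HtmlFirstArchive
{
    /** @var array<int, string> stored HTML per test pass */
    private $html = [];
    /** @var callable converts an HTML string into PDF bytes */
    private $renderer;

    public function __construct(callable $renderer)
    {
        $this->renderer = $renderer;
    }

    /** Called when a test is finished or rescored: stores HTML, creates no PDF. */
    public function storeResult(int $passId, string $html): void
    {
        $this->html[$passId] = $html;
    }

    /** Called on archive export: renders the PDF from the stored HTML. */
    public function exportPdf(int $passId): string
    {
        if (!isset($this->html[$passId])) {
            throw new OutOfBoundsException("No stored HTML for pass $passId");
        }
        return ($this->renderer)($this->html[$passId]);
    }
}

// Fake renderer that counts how often a PDF would have been generated.
$renderCalls = 0;
$archive = new HtmlFirstArchive(function (string $html) use (&$renderCalls): string {
    $renderCalls++;
    return 'FAKE-PDF:' . md5($html);
});

$archive->storeResult(1, '<p>first scoring</p>');
$archive->storeResult(1, '<p>corrected scoring</p>'); // correction: still no PDF
$callsBeforeExport = $renderCalls;

$pdf = $archive->exportPdf(1); // the only point where rendering happens
```

In this arrangement, corrections of single questions cost one HTML write each, and the expensive rendering step runs once per export.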

5 Implementation

MJ 08 Jul 2013: Implemented by Max Becker

Implemented where? I can't find this in Administration / PDF-Export. Not with 4.4 or 4.5. Am I missing something?

MB 29 Jul 2014: Indeed, you are.
The change does not come with settings changes at the moment. PDFs are generated using TCPDF where implemented. Since we moved away from the wkhtmltopdf-approach, we have taken the opportunity to make it zero-config.

Last edited: 17. Apr 2025, 14:59, Kunkel, Matthias [mkunkel]