Difference between revisions of "PDF tools"
From Simson Garfinkel
Revision as of 16:31, 7 April 2023
PDF page manipulation
- pdftk - combines, removes, and rotates pages in PDFs
- pdfjam - resizes pages (by running through LaTeX)
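A minimal sketch of how these two tools might be driven from Python. The helper names (`pdftk_cat`, `pdfjam_resize`) and the file names are hypothetical; the page ranges and the `a4paper` paper size are only examples. The helpers just build argument lists, so nothing runs until one is passed to `subprocess.run`.

```python
import subprocess


def pdftk_cat(inputs, output, pages=None):
    """Build a pdftk 'cat' command (hypothetical helper).

    inputs: list of PDF paths.
    pages:  optional page ranges such as ["1-3", "5-end"],
            intended for use with a single input file.
    """
    cmd = ["pdftk", *inputs, "cat"]
    if pages:
        cmd += pages
    cmd += ["output", output]
    return cmd


def pdfjam_resize(src, output, paper="a4paper"):
    """Build a pdfjam command that re-typesets src onto the given paper size."""
    return ["pdfjam", "--paper", paper, "--outfile", output, src]


# Combine two files into one.
merge_cmd = pdftk_cat(["a.pdf", "b.pdf"], "merged.pdf")
# Remove page 4 by keeping the pages on either side of it.
drop_cmd = pdftk_cat(["in.pdf"], "trimmed.pdf", pages=["1-3", "5-end"])
# Set the rotation of every page to 90 degrees ("east").
rotate_cmd = pdftk_cat(["in.pdf"], "rotated.pdf", pages=["1-endeast"])
# Resize onto A4 paper.
resize_cmd = pdfjam_resize("in.pdf", "a4.pdf")

# To actually run one (needs the tool on PATH and the input files present):
# subprocess.run(merge_cmd, check=True)
```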
PDF OCR
- ocrmypdf - adds an OCR text layer to scanned PDFs using tesseract; creates PDF/A files
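A rough sketch of calling ocrmypdf from Python via its command line. The helper name `ocrmypdf_command` and the file names are hypothetical, and the choice of English (`eng`) is only an example; it assumes ocrmypdf and the matching tesseract language data are installed.

```python
import subprocess


def ocrmypdf_command(src, dst, language="eng", skip_text=False):
    """Build an ocrmypdf command line (hypothetical helper).

    language:  tesseract language code passed with -l.
    skip_text: if True, pages that already contain text are left alone.
    """
    cmd = ["ocrmypdf", "-l", language]
    if skip_text:
        cmd.append("--skip-text")
    cmd += [src, dst]
    return cmd


ocr_cmd = ocrmypdf_command("scan.pdf", "scan_ocr.pdf", skip_text=True)

# To actually run it (needs ocrmypdf on PATH and scan.pdf present):
# subprocess.run(ocr_cmd, check=True)
```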
HTML to PDF
- https://pypi.org/project/xhtml2pdf/
- https://github.com/dompdf/dompdf
- https://github.com/spipu/html2pdf
- https://wkhtmltopdf.org
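As one possible illustration, wkhtmltopdf can be called from Python as an external command. The helper name `wkhtmltopdf_command`, the file names, and the `Letter` page size are assumptions for the example; the source may be a local HTML file or a URL.

```python
import subprocess


def wkhtmltopdf_command(source, output, page_size="Letter"):
    """Build a wkhtmltopdf command line (hypothetical helper).

    source: path to an HTML file, or a URL.
    """
    return ["wkhtmltopdf", "--page-size", page_size, source, output]


html_cmd = wkhtmltopdf_command("report.html", "report.pdf")

# To actually run it (needs wkhtmltopdf on PATH and report.html present):
# subprocess.run(html_cmd, check=True)
```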
Other sources:
- https://stackoverflow.com/questions/391005/convert-html-css-to-pdf-with-php
Extract text from PDF
- pymupdf (python module)
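A short sketch of text extraction with pymupdf, assuming the package is installed (`pip install pymupdf`). The function names and the choice to join pages with a form feed are illustrative, not anything pymupdf requires. The import sits inside `extract_text` so the joining helper can be used on its own.

```python
def join_page_text(pages):
    """Join the plain text of each page, separated by form feeds."""
    return "\f".join(page.get_text() for page in pages)


def extract_text(path):
    """Return the plain text of every page in the PDF at path."""
    import fitz  # PyMuPDF's traditional import name

    with fitz.open(path) as doc:
        # Iterating over a document yields its pages in order.
        return join_page_text(doc)


# Example use (needs pymupdf installed and document.pdf present):
# print(extract_text("document.pdf"))
```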