PDF tools
From Simson Garfinkel
Latest revision as of 12:21, 12 April 2023
PDF page manipulation
- pdftk - combines, removes, and rotates pages in PDFs
- pdfjam - resizes pages (by running them through LaTeX with the pdfpages package)
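Both tools are normally run from the shell. Below is a minimal Python sketch that builds their command lines as argument lists and runs them with `subprocess`. The helper names (`pdftk_merge`, `pdftk_keep_pages`, `pdftk_rotate_all`, `pdfjam_resize`, `run`) are hypothetical. The pdftk `cat ... output` syntax, the rotation keywords (pdftk 1.44 and later), and pdfjam's `--paper`/`--outfile` options are the tools' own, but check `man pdftk` and `pdfjam --help` for your installed versions.

```python
import shlex
import subprocess


def pdftk_merge(inputs, output):
    """Concatenate whole PDFs in order: pdftk a.pdf b.pdf cat output out.pdf"""
    return ["pdftk", *inputs, "cat", "output", output]


def pdftk_keep_pages(input_pdf, page_ranges, output):
    """Keep only the listed pages/ranges, e.g. ["1-2", "4-end"] drops page 3."""
    return ["pdftk", input_pdf, "cat", *page_ranges, "output", output]


def pdftk_rotate_all(input_pdf, output, direction="east"):
    """Turn every page to the given orientation; "east" is 90 degrees clockwise."""
    return ["pdftk", input_pdf, "cat", f"1-end{direction}", "output", output]


def pdfjam_resize(input_pdf, output, paper="a4paper"):
    """Rescale pages onto a LaTeX paper size such as a4paper or letterpaper."""
    return ["pdfjam", "--paper", paper, "--outfile", output, input_pdf]


def run(cmd):
    """Echo the command as a shell line, then run it; raises on a non-zero exit."""
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

For example, `run(pdftk_keep_pages("scan.pdf", ["1-2", "4-end"], "trimmed.pdf"))` removes page 3. Passing a list rather than a single string avoids shell quoting problems with file names that contain spaces.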
PDF OCR
- ocrmypdf - runs Tesseract OCR on a PDF and writes a searchable PDF/A file
- https://techcommunity.microsoft.com/t5/ai-applied-ai-blog/generate-searchable-pdfs-with-azure-form-recognizer/ba-p/3652024 - generating searchable PDFs online with Azure Form Recognizer
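A minimal sketch of an ocrmypdf invocation, built as an argument list in Python. The helper name `ocrmypdf_command` is hypothetical. The options `-l` (Tesseract language code, e.g. `eng` or `eng+deu`), `--output-type pdfa`, and `--skip-text` are ocrmypdf's own. Without `--skip-text` (or `--force-ocr`/`--redo-ocr`), ocrmypdf stops when it finds a page that already has text.

```python
import shlex


def ocrmypdf_command(input_pdf, output_pdf, language="eng", skip_text=True):
    """Build an ocrmypdf command line that writes a searchable PDF/A file."""
    cmd = ["ocrmypdf", "-l", language, "--output-type", "pdfa"]
    if skip_text:
        # Leave pages that already carry a text layer alone instead of aborting.
        cmd.append("--skip-text")
    cmd += [input_pdf, output_pdf]
    return cmd


print(shlex.join(ocrmypdf_command("scan.pdf", "searchable.pdf")))
```

The printed line can be pasted into a shell, or the list can be passed directly to `subprocess.run(..., check=True)`.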
HTML to PDF
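- https://pypi.org/project/xhtml2pdf/ - xhtml2pdf (Python library)
- https://github.com/dompdf/dompdf - dompdf (PHP library)
- https://github.com/spipu/html2pdf - html2pdf (PHP library)
- https://wkhtmltopdf.org - wkhtmltopdf (command-line tool using the WebKit rendering engine)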
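Of these, wkhtmltopdf is a standalone command-line tool, so it can be scripted the same way as the tools above. A minimal sketch, assuming wkhtmltopdf is on the `PATH`. The helper name `wkhtmltopdf_command` is hypothetical; `--page-size` and `--orientation` are wkhtmltopdf global options, and the source may be a local HTML file or a URL.

```python
import shlex


def wkhtmltopdf_command(source, output_pdf, page_size="A4", landscape=False):
    """Build a wkhtmltopdf command; source may be a local HTML file or a URL."""
    cmd = ["wkhtmltopdf", "--page-size", page_size]
    if landscape:
        cmd += ["--orientation", "Landscape"]
    cmd += [source, output_pdf]
    return cmd


print(shlex.join(wkhtmltopdf_command("report.html", "report.pdf")))
```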
Other sources:
- https://stackoverflow.com/questions/391005/convert-html-css-to-pdf-with-php
Extract text from PDF
- PyMuPDF (Python module, imported as fitz)
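A minimal sketch of text extraction with PyMuPDF. The helper names `pages_text` and `extract_text` are hypothetical. The only library calls used are `fitz.open()` as a context manager, `len(doc)`, `doc[i]` and `Page.get_text()`. Recent PyMuPDF releases can also be imported as `pymupdf`. Joining pages with a form feed is an arbitrary choice that keeps page boundaries visible.

```python
def pages_text(doc, page_numbers=None):
    """Return plain text for the given 0-based pages of an open PyMuPDF document."""
    if page_numbers is None:
        page_numbers = range(len(doc))
    # Page.get_text() with no arguments returns the page's plain text,
    # in the order it is stored in the page (not necessarily reading order).
    return [doc[i].get_text() for i in page_numbers]


def extract_text(path, page_numbers=None):
    """Open a PDF with PyMuPDF and join the selected pages with form feeds."""
    import fitz  # PyMuPDF; imported lazily so pages_text() works without it
    with fitz.open(path) as doc:
        return "\f".join(pages_text(doc, page_numbers))
```

For example, `extract_text("paper.pdf", [0])` returns the text of the first page only. `pages_text` works on any object that supports `len()` and indexing, which makes it easy to check without a real PDF.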