Difference between revisions of "PDF tools"
From Simson Garfinkel
Latest revision as of 12:21, 12 April 2023
PDF page manipulation
- pdftk - combines, removes, and rotates pages in PDFs
- pdfjam - resizes pages (by running through LaTeX)
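As a rough illustration of the pdftk operations listed above (combine, remove, rotate), the sketch below builds pdftk command lines from Python. The helper names (`merge_cmd`, `select_pages_cmd`, `rotate_cmd`, `run`) are hypothetical and the file names are placeholders; it assumes pdftk is installed and on the PATH.

```python
import subprocess


def merge_cmd(inputs, output):
    """Command that concatenates several PDFs into one file."""
    return ["pdftk", *inputs, "cat", "output", output]


def select_pages_cmd(src, ranges, output):
    """Command that keeps only the given page ranges, e.g. ["1-4", "6-end"].

    Leaving a page out of the ranges is how a page gets removed.
    """
    return ["pdftk", src, "cat", *ranges, "output", output]


def rotate_cmd(src, output):
    """Command that sets every page's rotation to 90 degrees clockwise."""
    return ["pdftk", src, "cat", "1-endeast", "output", output]


def run(cmd):
    """Run a command built above; raises CalledProcessError on failure."""
    subprocess.run(cmd, check=True)
```

For example, `run(select_pages_cmd("in.pdf", ["1-4", "6-end"], "out.pdf"))` would drop page 5 of a document that has at least six pages.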
PDF OCR
- ocrmypdf - creates PDF/A files and runs tesseract
- https://techcommunity.microsoft.com/t5/ai-applied-ai-blog/generate-searchable-pdfs-with-azure-form-recognizer/ba-p/3652024 - Searchable PDFs online
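A minimal sketch of calling ocrmypdf from Python, assuming the `ocrmypdf` command and a tesseract language pack are installed. The helper name `ocr_cmd` and the default language are illustrative choices, not something this page specifies.

```python
import subprocess


def ocr_cmd(src, dst, language="eng"):
    """Command that adds an OCR text layer and writes a PDF/A file.

    `language` is a tesseract language code; "eng" is an assumed default.
    """
    return ["ocrmypdf", "-l", language, "--output-type", "pdfa", src, dst]


def run_ocr(src, dst, language="eng"):
    """Run ocrmypdf; raises CalledProcessError if it exits non-zero."""
    subprocess.run(ocr_cmd(src, dst, language), check=True)
```

Calling `run_ocr("scan.pdf", "scan-ocr.pdf")` would then leave the original untouched and write the searchable copy alongside it.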
HTML to PDF
- https://pypi.org/project/xhtml2pdf/
- https://github.com/dompdf/dompdf
- https://github.com/spipu/html2pdf
- https://wkhtmltopdf.org
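Of the converters above, wkhtmltopdf is a command-line program, so it can be driven the same way from a script. This is only a sketch: the helper name `html_to_pdf_cmd` and the A4 default are assumptions, and it presumes `wkhtmltopdf` is installed and on the PATH.

```python
import subprocess


def html_to_pdf_cmd(source, output, page_size="A4"):
    """Command that renders an HTML file or URL to a PDF.

    `source` may be a local .html path or an http(s) URL.
    """
    return ["wkhtmltopdf", "--page-size", page_size, source, output]


def html_to_pdf(source, output, page_size="A4"):
    """Run wkhtmltopdf; raises CalledProcessError if it exits non-zero."""
    subprocess.run(html_to_pdf_cmd(source, output, page_size), check=True)
```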
Other sources:
- https://stackoverflow.com/questions/391005/convert-html-css-to-pdf-with-php
Extract text from PDF
- pymupdf (python module)
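A short sketch of text extraction with pymupdf, assuming the package is installed (`pip install pymupdf`) and importable as `fitz`. The function names and the form-feed page separator are illustrative choices.

```python
def join_pages(texts, separator="\f"):
    """Join per-page text into one string, separating pages with a form feed."""
    return separator.join(texts)


def extract_text(path):
    """Return the plain text of every page in the PDF at `path`."""
    import fitz  # PyMuPDF; imported here so the helper above works without it

    with fitz.open(path) as doc:
        # Iterating a document yields its pages in order.
        return join_pages(page.get_text() for page in doc)
```

`extract_text("report.pdf")` returns one string for the whole document; scanned PDFs without a text layer will yield little or no text until they have been through OCR.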