PDF tools
From Simson Garfinkel
PDF page manipulation
- pdftk - combines, removes, and rotates pages in PDFs
- pdfjam - resizes pages (by running them through LaTeX's pdfpages package)
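The pdftk and pdfjam operations above can be scripted from Python. This is a minimal sketch that only builds argument lists; the helper names (pdftk_combine, pdftk_select, pdftk_rotate_all, pdfjam_resize, run) are hypothetical, and run() assumes both programs are installed and on PATH.

```python
import shutil
import subprocess
from typing import List, Sequence


def pdftk_combine(inputs: Sequence[str], output: str) -> List[str]:
    """Concatenate several PDFs, in order, into one file."""
    return ["pdftk", *inputs, "cat", "output", output]


def pdftk_select(input_pdf: str, ranges: Sequence[str], output: str) -> List[str]:
    """Keep only the listed page ranges, e.g. ["1-4", "6-end"] drops page 5."""
    return ["pdftk", input_pdf, "cat", *ranges, "output", output]


def pdftk_rotate_all(input_pdf: str, output: str, direction: str = "east") -> List[str]:
    """Set the rotation of every page: 'east' = 90 degrees clockwise, 'south' = 180, 'west' = 270."""
    return ["pdftk", input_pdf, "cat", f"1-end{direction}", "output", output]


def pdfjam_resize(input_pdf: str, output: str, paper: str = "a4paper") -> List[str]:
    """Scale each page onto a LaTeX paper size such as a4paper or letterpaper."""
    return ["pdfjam", "--paper", paper, "--outfile", output, input_pdf]


def run(cmd: List[str]) -> None:
    """Run the command if its program is on PATH; raise on a non-zero exit status."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)
```

For example, run(pdftk_select("scan.pdf", ["1-4", "6-end"], "trimmed.pdf")) writes a copy of scan.pdf without page 5. Building argv lists rather than shell strings avoids quoting problems with file names that contain spaces.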
PDF OCR
- ocrmypdf - runs tesseract OCR to add a searchable text layer and writes the result as a PDF/A file
- Azure Form Recognizer - generates searchable PDFs online: https://techcommunity.microsoft.com/t5/ai-applied-ai-blog/generate-searchable-pdfs-with-azure-form-recognizer/ba-p/3652024
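A typical ocrmypdf run can be wrapped the same way. This sketch builds the command only; ocrmypdf_cmd is a hypothetical helper, and it assumes ocrmypdf and the needed tesseract language packs are installed. The --skip-text flag leaves pages that already contain text untouched instead of stopping with an error.

```python
import shutil
import subprocess
from typing import List, Sequence


def ocrmypdf_cmd(input_pdf: str, output_pdf: str,
                 languages: Sequence[str] = ("eng",),
                 skip_text: bool = True) -> List[str]:
    """Build an ocrmypdf command that OCRs with tesseract and writes PDF/A."""
    # Tesseract language codes are joined with '+', e.g. "eng+deu".
    cmd = ["ocrmypdf", "--language", "+".join(languages), "--output-type", "pdfa"]
    if skip_text:
        # Skip pages that already have a text layer rather than failing on them.
        cmd.append("--skip-text")
    cmd += [input_pdf, output_pdf]
    return cmd


def run(cmd: List[str]) -> None:
    """Run the command if ocrmypdf is on PATH; raise on a non-zero exit status."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)
```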
HTML to PDF
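- xhtml2pdf - Python library built on ReportLab: https://pypi.org/project/xhtml2pdf/
- dompdf - PHP library: https://github.com/dompdf/dompdf
- html2pdf - PHP library built on TCPDF: https://github.com/spipu/html2pdf
- wkhtmltopdf - command-line tool that renders HTML with Qt WebKit: https://wkhtmltopdf.org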
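Of the HTML-to-PDF options, xhtml2pdf can be called directly from Python. A minimal sketch, assuming xhtml2pdf is installed; html_to_pdf is a hypothetical helper around pisa.CreatePDF, and xhtml2pdf handles only a subset of HTML and CSS, so complex layouts may render differently than in a browser.

```python
from typing import BinaryIO


def html_to_pdf(html: str, out: BinaryIO) -> bool:
    """Render an HTML string to PDF with xhtml2pdf; return True if no errors were reported."""
    # Imported lazily so this module still loads where xhtml2pdf is not installed.
    from xhtml2pdf import pisa

    status = pisa.CreatePDF(html, dest=out)
    # status.err counts the errors encountered while rendering.
    return not status.err
```

Usage: with open("out.pdf", "wb") as f: html_to_pdf("<p>Hello</p>", f)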
Other sources:
Extract text from PDF
- pymupdf - Python module (imported as fitz)
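Extracting text with pymupdf takes only a few lines. A minimal sketch, assuming PyMuPDF is installed; extract_text is a hypothetical helper that returns one string per page, which keeps page boundaries available to the caller.

```python
from typing import List


def extract_text(path: str) -> List[str]:
    """Return the plain text of each page of a PDF, in page order."""
    import fitz  # PyMuPDF; imported lazily so the module loads without it

    with fitz.open(path) as doc:
        return [page.get_text() for page in doc]
```

Usage: "\n".join(extract_text("paper.pdf")) gives the whole document as a single string. Scanned PDFs without a text layer return empty strings; run them through ocrmypdf first.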