Difference between revisions of "PDF tools"

From Simson Garfinkel
(4 intermediate revisions by the same user not shown)
Line 1: Line 1:
Tools for HTML to PDF:
==PDF page manipulation==
* pdftk - combines, removes, and rotates pages in PDFs
* pdfjam - resizes pages (by running through LaTeX)
 
==PDF OCR==
* ocrmypdf - creates PDF/A files and runs tesseract
* https://techcommunity.microsoft.com/t5/ai-applied-ai-blog/generate-searchable-pdfs-with-azure-form-recognizer/ba-p/3652024 - Searchable PDFs online
 
==HTML to PDF==


* https://pypi.org/project/xhtml2pdf/
Line 9: Line 17:


* https://stackoverflow.com/questions/391005/convert-html-css-to-pdf-with-php
==Extract text from PDF==
* pymupdf (python module)

Latest revision as of 12:21, 12 April 2023

PDF page manipulation

  • pdftk - combines, removes, and rotates pages in PDFs
  • pdfjam - resizes pages (by running through LaTeX)
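
A minimal sketch of the three pdftk operations listed above (combine, remove, rotate), written as a Python helper that only builds the command lines. The helper name `pdftk_cat` and all file names are made up for illustration. The `1-endeast` rotation form assumes a reasonably recent pdftk; older releases used a different rotation suffix.

```python
import shlex


def pdftk_cat(inputs, output, page_ranges=None):
    """Build a `pdftk <inputs> cat [ranges] output <file>` argument list."""
    cmd = ["pdftk", *inputs, "cat"]
    if page_ranges:
        cmd += list(page_ranges)
    cmd += ["output", output]
    return cmd


# Combine two files into one.
combine = pdftk_cat(["a.pdf", "b.pdf"], "combined.pdf")
# Remove page 5 by keeping every other page.
remove = pdftk_cat(["in.pdf"], "trimmed.pdf", ["1-4", "6-end"])
# Rotate every page 90 degrees clockwise (assumes the "east" suffix is supported).
rotate = pdftk_cat(["in.pdf"], "rotated.pdf", ["1-endeast"])

for cmd in (combine, remove, rotate):
    # Print a shell-safe version of each command for inspection.
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once pdftk is installed; nothing is executed by the sketch itself.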

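PDF OCR

  • ocrmypdf - creates PDF/A files and runs tesseract
  • https://techcommunity.microsoft.com/t5/ai-applied-ai-blog/generate-searchable-pdfs-with-azure-form-recognizer/ba-p/3652024 - Searchable PDFs online
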
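A sketch of driving ocrmypdf (the OCR tool this page lists) from Python. The helper names `ocrmypdf_command` and `run_ocr` and the file names are hypothetical. It assumes the `ocrmypdf` command is on PATH and that the requested tesseract language pack is installed.

```python
import shutil
import subprocess


def ocrmypdf_command(src, dst, language="eng", pdfa=True):
    """Build an ocrmypdf command line as a list of arguments."""
    cmd = ["ocrmypdf", "-l", language]
    if pdfa:
        # Ask for PDF/A output explicitly.
        cmd += ["--output-type", "pdfa"]
    cmd += [src, dst]
    return cmd


def run_ocr(src, dst, **kwargs):
    """Run ocrmypdf on src and write the searchable PDF to dst."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed or not on PATH")
    subprocess.run(ocrmypdf_command(src, dst, **kwargs), check=True)
```

For example, `run_ocr("scan.pdf", "scan-ocr.pdf")` would add a text layer to a scanned file, with tesseract doing the recognition.
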
HTML to PDF

  • https://pypi.org/project/xhtml2pdf/

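A minimal sketch of HTML to PDF conversion with xhtml2pdf (https://pypi.org/project/xhtml2pdf/). The function name `html_to_pdf` is hypothetical, and the package is assumed to be installed with pip. The import sits inside the function so the module still loads where xhtml2pdf is missing.

```python
def html_to_pdf(html, out_path):
    """Render an HTML string to a PDF file; return True on success."""
    from xhtml2pdf import pisa  # third-party package, assumed installed

    with open(out_path, "wb") as fh:
        status = pisa.CreatePDF(html, dest=fh)
    # pisa reports problems through the `err` counter on the returned status.
    return not status.err


# Example call (not executed here):
# html_to_pdf("<h1>Report</h1><p>Hello, PDF.</p>", "report.pdf")
```

xhtml2pdf supports only a subset of CSS, so complex page layouts may render differently than in a browser.
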
Other sources:

  • https://stackoverflow.com/questions/391005/convert-html-css-to-pdf-with-php

Extract text from PDF

  • pymupdf (python module)
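
A short sketch of text extraction with pymupdf. The function name `extract_text` and the file name are made up for illustration. It assumes a PyMuPDF release where `page.get_text()` is available and the module can be imported as `fitz`.

```python
def extract_text(pdf_path):
    """Return the plain text of every page in the PDF, joined by newlines."""
    import fitz  # PyMuPDF, assumed installed (pip install pymupdf)

    with fitz.open(pdf_path) as doc:
        # get_text() with no arguments returns the page's plain text.
        return "\n".join(page.get_text() for page in doc)


# Example call (not executed here):
# print(extract_text("paper.pdf"))
```

Scanned PDFs without a text layer will come back empty; those need an OCR pass first.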