Difference between revisions of "PDF tools"

From Simson Garfinkel
 
Line 4:

==PDF OCR==
* ocrmypdf - creates PDF/A files and runs tesseract
* https://techcommunity.microsoft.com/t5/ai-applied-ai-blog/generate-searchable-pdfs-with-azure-form-recognizer/ba-p/3652024 - Searchable PDFs online

==HTML to PDF==

Latest revision as of 12:21, 12 April 2023

PDF page manipulation

  • pdftk - combines, removes, and rotates pages in PDFs
  • pdfjam - resizes pages (by running through LaTeX)
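
The two tools above are command-line programs, so one way to script them is to build the argument lists in Python and hand them to `subprocess`. The following is a minimal sketch, not a definitive recipe: the helper names (`pdftk_cat`, `pdfjam_resize`, `run`) and all file names are hypothetical, and the rotation keyword `east` assumes pdftk 1.45 or later.

```python
import shutil
import subprocess


def pdftk_cat(inputs, output, page_ranges=()):
    """Build a pdftk 'cat' command line (hypothetical helper).

    With no page_ranges, pdftk concatenates every page of every input.
    """
    return ["pdftk", *inputs, "cat", *page_ranges, "output", output]


def pdfjam_resize(source, output, paper="a4paper"):
    """Build a pdfjam command line that re-typesets the pages onto `paper`."""
    return ["pdfjam", "--paper", paper, "--outfile", output, source]


def run(cmd):
    """Run a command built above, failing early if the tool is not installed."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} is not installed or not on PATH")
    subprocess.run(cmd, check=True)


# Combine two files into one.
combine = pdftk_cat(["a.pdf", "b.pdf"], "combined.pdf")

# Remove page 5 by keeping everything around it
# (assumes the document has at least six pages).
drop_page_5 = pdftk_cat(["in.pdf"], "trimmed.pdf", ["1-4", "6-end"])

# Rotate every page 90 degrees clockwise (assumes pdftk >= 1.45 syntax).
rotate = pdftk_cat(["in.pdf"], "rotated.pdf", ["1-endeast"])

# Resize every page to A4.
resize = pdfjam_resize("in.pdf", "a4.pdf")
```

Calling `run(combine)` would execute the first command. Keeping command construction separate from execution makes the argument lists easy to inspect before anything touches a file.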

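PDF OCR

  • ocrmypdf - creates PDF/A files and runs tesseract
  • https://techcommunity.microsoft.com/t5/ai-applied-ai-blog/generate-searchable-pdfs-with-azure-form-recognizer/ba-p/3652024 - Searchable PDFs online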
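
ocrmypdf, which this revision describes as creating PDF/A files and running tesseract, can be scripted in the same way as the page tools. The sketch below is an illustration under stated assumptions: the helper names (`ocrmypdf_command`, `run_ocr`) and file names are hypothetical, and the choice of `--skip-text` (leave pages that already contain text untouched) is one reasonable default rather than a recommendation from this page.

```python
import shutil
import subprocess


def ocrmypdf_command(source, output, language="eng", skip_text=True):
    """Build an ocrmypdf command line (hypothetical helper).

    --output-type pdfa requests PDF/A output; -l selects the Tesseract language.
    """
    cmd = ["ocrmypdf", "--output-type", "pdfa", "-l", language]
    if skip_text:
        # Do not OCR pages that already carry a text layer.
        cmd.append("--skip-text")
    return cmd + [source, output]


def run_ocr(source, output, **options):
    """Run ocrmypdf, failing early if it is not installed."""
    if shutil.which("ocrmypdf") is None:
        raise FileNotFoundError("ocrmypdf is not installed or not on PATH")
    subprocess.run(ocrmypdf_command(source, output, **options), check=True)
```

For example, `run_ocr("scan.pdf", "scan-ocr.pdf", language="deu")` would OCR a German-language scan, provided the matching Tesseract language data is installed.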

HTML to PDF

Other sources:

Extract text from PDF

  • pymupdf (Python module)
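
As a rough sketch of how text extraction with pymupdf might look: the function names `pages_text` and `extract_text` are hypothetical, and the code assumes a reasonably recent PyMuPDF release in which pages expose `get_text()`. The import is kept inside the function so the pure-Python part can be used and tested without the module installed.

```python
def pages_text(doc):
    """Join the plain text of every page in `doc`.

    `doc` is any iterable of objects with a get_text() method,
    such as an open PyMuPDF document.
    """
    return "\n".join(page.get_text() for page in doc)


def extract_text(path):
    """Open the PDF at `path` with PyMuPDF and return its text."""
    import fitz  # PyMuPDF; recent releases can also be imported as `pymupdf`

    with fitz.open(path) as doc:
        return pages_text(doc)
```

With PyMuPDF installed, `print(extract_text("paper.pdf"))` would print the document's text page by page. Scanned PDFs without a text layer yield empty strings here and need OCR first.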