Difference between revisions of "PDF tools"

From Simson Garfinkel
(One intermediate revision by the same user not shown)
Line 4: Line 4:


==PDF OCR==
* ocrmypdf - creates PDF/A files and runs tesseract
* https://techcommunity.microsoft.com/t5/ai-applied-ai-blog/generate-searchable-pdfs-with-azure-form-recognizer/ba-p/3652024 - Searchable PDFs online


==HTML to PDF==
Line 16: Line 17:


* https://stackoverflow.com/questions/391005/convert-html-css-to-pdf-with-php
==Extract text from PDF==
* pymupdf (Python module)

Latest revision as of 12:21, 12 April 2023

PDF page manipulation

  • pdftk - combines, removes, and rotates pages in PDFs
  • pdfjam - resizes pages (by running through LaTeX)
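
A rough sketch of how pdftk's page operations might be scripted from Python. The helper only builds the argument list for pdftk's `cat` operation; the names `pdftk_cat_command` and `run_tool` and all file names are made up for illustration, and the rotation suffix `east` assumes pdftk 1.45 or later.

```python
import shutil
import subprocess


def pdftk_cat_command(inputs, output, page_ranges=None):
    """Build the argv for `pdftk <inputs> cat [ranges] output <output>`.

    With several inputs and no ranges, pdftk concatenates them in order.
    With a single input, the ranges select (and optionally rotate) pages.
    """
    cmd = ["pdftk", *inputs, "cat"]
    if page_ranges:
        cmd += list(page_ranges)
    cmd += ["output", output]
    return cmd


def run_tool(cmd):
    """Run a command, failing early if the tool is not installed."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} is not on PATH")
    subprocess.run(cmd, check=True)


# Combine two files (hypothetical file names):
#   run_tool(pdftk_cat_command(["a.pdf", "b.pdf"], "combined.pdf"))
# Remove page 5 of a document:
#   run_tool(pdftk_cat_command(["in.pdf"], "out.pdf", ["1-4", "6-end"]))
# Rotate every page 90 degrees clockwise:
#   run_tool(pdftk_cat_command(["in.pdf"], "out.pdf", ["1-endeast"]))
```

Page ranges across several inputs need pdftk's input handles (for example `A=a.pdf`), which this sketch does not cover.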

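PDF OCR

  • ocrmypdf - creates PDF/A files and runs tesseract
  • https://techcommunity.microsoft.com/t5/ai-applied-ai-blog/generate-searchable-pdfs-with-azure-form-recognizer/ba-p/3652024 - Searchable PDFs online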
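
A minimal sketch of calling ocrmypdf from Python, assuming the `ocrmypdf` command-line tool and tesseract are installed. The function name `ocrmypdf_command` and the file names are hypothetical; the flags used (`--output-type`, `-l`, `--deskew`) are standard ocrmypdf options.

```python
import subprocess


def ocrmypdf_command(src, dst, language="eng", deskew=False):
    """Build the argv for an ocrmypdf run that writes a PDF/A file.

    `language` is a tesseract language code; `deskew` straightens
    crooked scans before OCR.
    """
    cmd = ["ocrmypdf", "--output-type", "pdfa", "-l", language]
    if deskew:
        cmd.append("--deskew")
    cmd += [src, dst]
    return cmd


# Example (hypothetical file names):
#   subprocess.run(ocrmypdf_command("scan.pdf", "scan-ocr.pdf", deskew=True),
#                  check=True)
```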

HTML to PDF

Other sources:

  • https://stackoverflow.com/questions/391005/convert-html-css-to-pdf-with-php

Extract text from PDF

  • pymupdf (Python module)
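
A short sketch of text extraction with pymupdf, assuming the package is installed (`pip install pymupdf`). The function name `extract_text` and the file name are hypothetical; the module is imported under its traditional name `fitz`.

```python
def extract_text(pdf_path):
    """Return a list with the plain text of each page in the PDF."""
    import fitz  # PyMuPDF; imported here so the module loads without it

    with fitz.open(pdf_path) as doc:
        # get_text() with no arguments returns the page's plain text.
        return [page.get_text() for page in doc]


# Example (hypothetical file name):
#   for number, text in enumerate(extract_text("paper.pdf"), start=1):
#       print(f"--- page {number} ---")
#       print(text)
```

Scanned PDFs with no text layer will yield empty strings here; those need OCR first.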