tika

Extract text and metadata from documents and images

About

Content analysis toolkit

Commands

tika

Examples

extract text from a PDF file
$ tika --text document.pdf

get metadata from an image
$ tika --metadata photo.jpg

convert a document to plain text and save it to a file
$ tika -t file.docx > output.txt

detect the document type (MIME type) of a file
$ tika --detect file.unknown

extract text from multiple files at once
$ tika --text *.pdf
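
When extracting text from several files, it can be convenient to keep one output file per input rather than a single combined stream. The following is a minimal sketch, not part of tika itself: `extract_all` is a hypothetical helper name, and it assumes a `tika` command on the PATH that accepts `--text` as in the examples above.

```shell
# Hypothetical helper: write the text of each input file to a sibling .txt file.
# Assumes a `tika` command on PATH that accepts --text, as in the examples above.
extract_all() {
  for f in "$@"; do
    # Skip anything that is not a regular file (for example, an unmatched glob).
    [ -f "$f" ] || continue
    # Replace the final extension with .txt (document.pdf -> document.txt).
    out="${f%.*}.txt"
    if tika --text "$f" > "$out"; then
      echo "wrote $out"
    else
      # Report the failure and remove the partial output file.
      echo "failed: $f" >&2
      rm -f "$out"
    fi
  done
}
```

After defining the function in the current shell, it could be called as `extract_all *.pdf`. Note that an existing `.txt` file with the same base name would be overwritten.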