tika

Extract text and metadata from documents and images

About

Content analysis toolkit

Extract text from a PDF file:
    $ tika --text document.pdf

Get metadata from an image:
    $ tika --metadata photo.jpg

Convert a document to plain text and save it to a file (-t is the short form of --text):
    $ tika -t file.docx > output.txt

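Detect a document's media type:
    $ tika --detect file.unknown

Identify a document's language (a separate option; --detect reports only the type):
    $ tika --language file.unknown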

Extract text from multiple files at once (output is written to stdout in argument order):
    $ tika --text *.pdf
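
Passing several files in one call sends all of the extracted text to a single stream. When one output file per input is wanted, a small shell loop is one option. The sketch below assumes the `tika` wrapper is on the PATH and accepts `--text` as in the examples above; `extract_each` and the `TIKA_CMD` override are hypothetical names introduced here for illustration, not part of Tika.

```shell
#!/bin/sh
# Sketch: write the text of each input file to a sibling .txt file.
# TIKA_CMD is a hypothetical override used by this sketch; it defaults to `tika`.
extract_each() {
    for src in "$@"; do
        # Skip arguments that are not regular files (e.g. an unmatched glob).
        if [ ! -f "$src" ]; then
            echo "skipping $src: not a file" >&2
            continue
        fi
        # Swap the last extension for .txt: report.pdf -> report.txt
        out="${src%.*}.txt"
        # Never overwrite the input itself (e.g. when given notes.txt).
        if [ "$out" = "$src" ]; then
            echo "skipping $src: output would overwrite input" >&2
            continue
        fi
        "${TIKA_CMD:-tika}" --text "$src" > "$out" || echo "failed: $src" >&2
    done
}
```

With the function defined in the current shell, usage might look like:

    $ extract_each *.pdf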