tika

Extract text and metadata from various document formats

About

Content analysis toolkit

Commands

tika

Examples

Extract text from a PDF file
$ tika --text document.pdf

Extract metadata from a document
$ tika --metadata document.docx

Convert document to XML representation
$ tika --xml image.jpg
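
The single-file commands above can be wrapped in a loop for batch use. The following is a minimal sketch, not part of tika itself: the function name `extract_all`, its two arguments, and the `.txt` output suffix are hypothetical choices made for illustration. It assumes a `tika` command is on the PATH and that it writes extracted text to standard output.

```shell
#!/bin/sh
# Sketch: run `tika --text` over every regular file in a directory.
# extract_all, its argument order, and the ".txt" suffix are hypothetical
# choices for this example; they are not provided by tika.
extract_all() {
    src_dir=$1
    out_dir=$2

    # Stop early if no `tika` command is available.
    if ! command -v tika >/dev/null 2>&1; then
        echo "tika not found on PATH" >&2
        return 1
    fi

    mkdir -p "$out_dir" || return 1

    for file in "$src_dir"/*; do
        # Skip directories and the unexpanded glob of an empty directory.
        [ -f "$file" ] || continue
        name=$(basename "$file")
        # Same flag as the PDF example, with standard output redirected
        # to a file (assumes tika prints the text to standard output).
        tika --text "$file" > "$out_dir/$name.txt" ||
            echo "failed: $file" >&2
    done
}
```

Usage, assuming a `docs` directory of input files:

$ extract_all ./docs ./text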