trafilatura

Extract and process main text content from web pages

Available via: brew (macOS, Linux)

About

Discovery, extraction and processing for Web text

Commands

trafilatura

Examples

Extract text from a URL:
$ trafilatura -u https://example.com

Extract text from a local HTML file and output it as JSON:
$ trafilatura --json < input.html

Extract text with metadata from a URL:
$ trafilatura --with-metadata -u https://example.com