trafilatura

Extract and process main text content from web pages



About

Discovery, extraction and processing for Web text

trafilatura

Extract text from a URL:
$ trafilatura -u https://example.com

Extract text from a local HTML file (read from standard input) and output as JSON:
$ trafilatura --json < input.html

Extract text with metadata from a URL:
$ trafilatura --with-metadata -u https://example.com
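
The flags above can be combined. The following is a minimal sketch, not an official recipe: it assumes the trafilatura command is on the PATH and that HTML piped to standard input is accepted as input. The file names sample.html and sample.json are made up for illustration.

```shell
#!/bin/sh
# Write a tiny sample page so the example does not depend on the network.
# (sample.html is a hypothetical file name used only for this sketch.)
cat > sample.html <<'EOF'
<html><head><title>Sample page</title></head>
<body><article><h1>Sample page</h1>
<p>This paragraph stands in for the main text of a real article.</p>
</article></body></html>
EOF

# Run the extraction only if the trafilatura CLI is installed.
if command -v trafilatura >/dev/null 2>&1; then
    # --json and --with-metadata are the flags shown in the examples above,
    # combined here in a single call; the HTML is read from standard input.
    trafilatura --json --with-metadata < sample.html > sample.json \
        || echo "extraction produced no output" >&2
else
    echo "trafilatura not found on PATH" >&2
fi
```

A page this short may yield little or no extracted text, so treat the result as a smoke test of the command line rather than a sample of typical output.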