A tool to automatically summarize documents using the BART machine learning model.
BART, introduced in the paper "BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension", was the state of the art in text summarization as of February 2, 2020. It is a “sequence-to-sequence model trained with denoising as pretraining objective” (Documentation & Examples).
This tool will convert a PDF to XML and then interpret that XML file using the font property of each text element.
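As a rough illustration of the font-based interpretation step, the sketch below parses a small XML sample and groups text elements by their font attribute. It is not the project's actual code. The XML layout (a `<text>` element with a `font` attribute, in the style of `pdftohtml -xml` output), the function name `split_by_font`, and the heuristic that the font covering the most characters is body text while any other font marks a heading are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample in the style of `pdftohtml -xml` output.
# The XML schema produced by the real tool may differ.
SAMPLE_XML = """<pdf2xml>
  <page number="1">
    <text top="50" left="60" font="0">Introduction</text>
    <text top="80" left="60" font="1">BART is a sequence-to-sequence model.</text>
    <text top="95" left="60" font="1">It is pretrained with a denoising objective.</text>
    <text top="130" left="60" font="0">Method</text>
    <text top="160" left="60" font="1">The PDF is converted to XML first.</text>
  </page>
</pdf2xml>"""


def split_by_font(xml_string):
    """Group text into sections using each element's font attribute.

    Assumption: the font that covers the most characters is body text,
    and text in any other font starts a new section as its heading.
    """
    root = ET.fromstring(xml_string)

    # Collect (font id, text) pairs in document order, skipping empty text.
    elements = []
    for el in root.iter("text"):
        text = "".join(el.itertext()).strip()
        if text:
            elements.append((el.get("font"), text))
    if not elements:
        return []

    # Count characters per font to guess which font is the body font.
    chars_per_font = Counter()
    for font, text in elements:
        chars_per_font[font] += len(text)
    body_font = chars_per_font.most_common(1)[0][0]

    sections = []
    for font, text in elements:
        if font != body_font:
            # Non-body font: treat as a heading that opens a new section.
            sections.append({"heading": text, "body": []})
        else:
            if not sections:
                # Body text before any heading goes into an untitled section.
                sections.append({"heading": None, "body": []})
            sections[-1]["body"].append(text)
    return sections


if __name__ == "__main__":
    for section in split_by_font(SAMPLE_XML):
        print(section["heading"], "->", " ".join(section["body"]))
```

Each section's body text could then be passed to a summarization model separately, so that the summary keeps the document's structure. A real PDF would likely need extra handling, for example for bold or italic runs inside body paragraphs, which this heuristic would misread as headings.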
Project link: https://github.com/HHousen/DocSum