DocSum

A tool to automatically summarize documents using the BART machine learning model.

BART (introduced in the paper "BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension") was the state of the art in text summarization as of February 2, 2020. It is a “sequence-to-sequence model trained with denoising as pretraining objective” (Documentation & Examples).

This tool converts a PDF to XML and then interprets that XML file using the font property of each text element.
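
As a rough illustration of the font-based interpretation described above, here is a minimal Python sketch. It is not DocSum's actual code. It assumes the XML resembles the output of Poppler's `pdftohtml -xml` (a `fontspec` element per font and a `font` attribute on each `text` element), and the function name `classify_text_by_font`, the sample document, and the "larger than the body font means heading" rule are all hypothetical choices made for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample, shaped like `pdftohtml -xml` output (an assumption;
# the XML layout DocSum actually consumes may differ).
SAMPLE_XML = """<pdf2xml>
  <page number="1">
    <fontspec id="0" size="18" family="Times" color="#000000"/>
    <fontspec id="1" size="11" family="Times" color="#000000"/>
    <text top="50" left="60" width="200" height="20" font="0">Introduction</text>
    <text top="80" left="60" width="400" height="12" font="1">First body line.</text>
    <text top="95" left="60" width="400" height="12" font="1">Second body line.</text>
  </page>
</pdf2xml>"""


def classify_text_by_font(xml_string):
    """Label each text element as 'heading' or 'body' based on its font size."""
    root = ET.fromstring(xml_string)

    # Map each font id to its size, as declared by the <fontspec> elements.
    font_sizes = {f.get("id"): int(f.get("size")) for f in root.iter("fontspec")}

    # Collect (font id, text) pairs, skipping elements with no visible text.
    elements = []
    for node in root.iter("text"):
        content = "".join(node.itertext()).strip()
        if content:
            elements.append((node.get("font"), content))
    if not elements:
        return []

    # Assume the font that covers the most characters is the body font.
    chars_per_font = Counter()
    for font_id, content in elements:
        chars_per_font[font_id] += len(content)
    body_font = chars_per_font.most_common(1)[0][0]
    body_size = font_sizes[body_font]

    # Assumed heuristic: anything set larger than the body font is a heading.
    labeled = []
    for font_id, content in elements:
        size = font_sizes.get(font_id, body_size)
        role = "heading" if size > body_size else "body"
        labeled.append((role, content))
    return labeled


if __name__ == "__main__":
    for role, content in classify_text_by_font(SAMPLE_XML):
        print(f"{role}: {content}")
```

Once text is labeled this way, the body text under each heading could be passed to the summarization model section by section. A real document would likely need a more careful rule (for example, bold or differently colored fonts of the same size).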

Project link: https://github.com/HHousen/DocSum