How to Summarize Scientific Papers Using the BART Model with Hugging Face Transformers
Learn how to perform paper summarization with BART.

Scientific papers can be hard to digest: they are long and densely structured, so it is often unclear where to start reading. Luckily, we can use language models to simplify the process by summarizing them.
In this article, we will explore how to summarize scientific papers using the BART Model. So, let’s get into it.
Preparation
To follow the tutorial, we will need to install the following packages.
pip install transformers pymupdf
You will also need PyTorch; install the build that matches your environment (CPU or CUDA).
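For a CPU-only setup, the command below is usually enough; for GPU support, use the install selector on the PyTorch website instead.
pip install torch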
With the packages installed, we can move on to the next part.
Scientific Paper Summarization with BART
BART (Bidirectional and Auto-Regressive Transformers) is a transformer-based neural network model developed by Facebook (now Meta) for sequence-to-sequence tasks such as summarization.
BART's architecture pairs a bidirectional encoder, which builds a representation of the whole input text, with an autoregressive decoder that generates the output sequence token by token. The model is pretrained by corrupting input text with noise and learning to reconstruct the original.
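As a minimal sketch (using the same facebook/bart-large-cnn checkpoint we load later), you can confirm this encoder-decoder layout by inspecting the model configuration.
from transformers import BartForConditionalGeneration

model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
print(model.config.is_encoder_decoder)                           # True
print(model.config.encoder_layers, model.config.decoder_layers)  # 12 12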
Since BART works well for summarization, we will try it on a scientific paper. For this tutorial, we will use the PDF of the Attention Is All You Need paper.
First, let’s extract all the text from the scientific paper using the following code.
import fitz  # PyMuPDF

def extract_paper_text(pdf_path):
    """Extract the raw text from every page of a PDF."""
    text = ""
    doc = fitz.open(pdf_path)
    for page in doc:
        text += page.get_text()
    doc.close()
    return text

pdf_path = "attention_is_all_you_need.pdf"
cleaned_text = extract_paper_text(pdf_path)
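A quick sanity check (the exact numbers will vary with your PDF) confirms the extraction worked.
print(len(cleaned_text))   # total characters extracted
print(cleaned_text[:200])  # the opening of the paper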
All the text has been extracted, so we can pass it to the BART model for summarization. In the code below, we split the text into fixed-size character chunks, summarize each chunk, and join the chunk summaries so the output stays coherent.
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

def summarize_text(text, model, tokenizer, max_chunk_size=1024):
    # Split the text into character-based chunks; the tokenizer will
    # truncate any chunk that exceeds BART's 1024-token input limit.
    chunks = [text[i:i + max_chunk_size] for i in range(0, len(text), max_chunk_size)]
    summaries = []
    for chunk in chunks:
        inputs = tokenizer(chunk, max_length=max_chunk_size, return_tensors="pt", truncation=True)
        summary_ids = model.generate(
            inputs["input_ids"],
            max_length=200,     # cap each chunk summary at 200 tokens
            min_length=50,
            length_penalty=2.0,
            num_beams=4,
            early_stopping=True,
        )
        summaries.append(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
    return " ".join(summaries)

summary = summarize_text(cleaned_text, model, tokenizer)
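If you want to inspect the intermediate result, print the chunk-level summary; its length will vary with the paper.
print(len(summary.split()), "words in the first-pass summary")
print(summary[:300])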
The result will be a long summary, since the model generates up to 200 tokens for every 1024-character chunk. To tighten the output, we can perform hierarchical summarization: feed the first-pass summary back into the model and summarize it again.
To do that, we add the following function.
def hierarchical_summarization(text, model, tokenizer, max_chunk_size=1024):
    # First pass: summarize each chunk of the full text.
    first_level_summary = summarize_text(text, model, tokenizer, max_chunk_size)
    # Second pass: summarize the concatenated chunk summaries.
    inputs = tokenizer(first_level_summary, max_length=max_chunk_size, return_tensors="pt", truncation=True)
    summary_ids = model.generate(
        inputs["input_ids"],
        max_length=200,
        min_length=50,
        length_penalty=2.0,
        num_beams=4,
        early_stopping=True,
    )
    final_summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return final_summary

final_summary = hierarchical_summarization(cleaned_text, model, tokenizer)
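Print the result to see the final summary.
print(final_summary)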
Output:
The Transformer is the first transduction model relying solely on self-attention to compute representations. It can reach a new state of the art in translation quality after being trained for as little as twelve hours on eight P100 GPUs. The attention function can be described as mapping a query and a set of key-value pairs to an output.
The summarization result is quite good, and it captures a few of the paper's main points. You can experiment with the chunk size to improve the summarization quality, as in the sketch below.
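As a rough sketch (the chunk sizes here are illustrative, and remember that chunks are measured in characters, not tokens), you could compare a few settings side by side.
for size in (512, 768, 1024):
    candidate = hierarchical_summarization(cleaned_text, model, tokenizer, max_chunk_size=size)
    print(f"chunk size {size}: {candidate[:120]}...")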
I hope this has helped!
Additional Resources
- Using Hugging Face Transformers with PyTorch and TensorFlow
- How to Summarize Texts Using the BART Model with Hugging Face Transformers
- How to Build and Train a Transformer Model from Scratch with Hugging Face Transformers
Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media. Cornellius writes on a variety of AI and machine learning topics.