
Last edited at 2024-07-16

Comparing summarization models to find which is most effective for generating infographics from medical research papers

Slug
automated-imfographics-generator-of-research-paper
Published
Published
Date
Jul 16, 2024
Category
AI
GenAI
Machine Learning
Deep Learning
Comparitive-Analysis
BART
T5
TextRank

Introduction

Many of you reading this have, at some point, spent hours going through one research paper after another, only to find that few of them were relevant. As technology advances, our attention spans are shrinking, so we need a way to go through a research paper quickly, in a form that is both short and engaging. For this project I focused on the medical domain: the system is built specifically for medical research papers because it is trained on that data, but you can follow the same methods for other domains as well.

Project Modules

This project will have 6 modules:
  1. Extraction: Here we will extract the text from the submitted document.
  1. Segmentation: We will segment the research paper into 5 parts, i.e. Background, Introduction, Methodology, Results and Conclusion (only Background, Results and Conclusion are used in this project).
  1. Summarization: An “Extractive” summarization will be performed on each segment.
  1. Image Extraction and Selection: The images will be extracted from the research paper, along with the caption of each image. The captions will then be compared with the overall summary, and the most relevant images will be chosen.
  1. Title Extraction: The title of the research paper is extracted using a combination of several techniques, which will be discussed in a later part.
  1. Infographics Generation: Once the summary, images and title are ready, they are presented in an infographic format.
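The post does not say how captions are compared with the summary in the Image Extraction and Selection module. One simple way to do it, shown here only as a sketch, is to score each caption by bag-of-words cosine similarity against the summary and keep the best ones. The function names, the sample captions and the sample summary below are all hypothetical and are not taken from the project's code.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_captions(captions: dict[str, str], summary: str, top_k: int = 2) -> list[str]:
    """Return the ids of the images whose captions best match the summary."""
    summary_vec = bag_of_words(summary)
    ranked = sorted(
        captions,
        key=lambda image_id: cosine_similarity(bag_of_words(captions[image_id]), summary_vec),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical captions and summary, for illustration only.
captions = {
    "fig1": "Blood pressure reduction in the treatment group over 12 weeks",
    "fig2": "Map of participating hospitals",
    "fig3": "Flowchart of patient enrolment",
}
summary = "The treatment group showed a significant reduction in blood pressure after 12 weeks."
print(rank_captions(captions, summary, top_k=1))
```

A real system would probably use sentence embeddings rather than raw word counts, but the selection logic (score every caption, keep the top few) stays the same.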
 
Overview of the Project

Results

I tested the final system on 3 different medical research papers. Each paper differed in length, topic and author, which allowed a fair comparison between the summarization models. I used 3 summarization models for this task: Text Rank, which is a graph-based ranking model, and 2 LLMs, T5 and BART. There were 2 main criteria to compare: time, and the reliability and meaningfulness of the summarized text.
 
For reference:
The three papers are called Paper 1, Paper 2 and Paper 3 in the rest of the blog.
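To make "graph-based" concrete, here is a minimal sketch of the general Text Rank idea: sentences are nodes, edges are weighted by word overlap, and a PageRank-style iteration scores each sentence. This is not the implementation used in the project. The naive sentence splitter, the overlap measure, and the parameter values (`damping=0.85`, `iterations=50`) are assumptions chosen for illustration.

```python
import math
import re


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a: set[str], b: set[str]) -> float:
    """Shared-word count, normalized by the log of each sentence's length."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # avoids dividing by log(1) + log(1) == 0
    return len(a & b) / (math.log(len(a)) + math.log(len(b)))


def textrank_summary(text: str, num_sentences: int = 2,
                     damping: float = 0.85, iterations: int = 50) -> list[str]:
    """Return the top-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    words = [set(re.findall(r"[a-z0-9]+", s.lower())) for s in sentences]
    n = len(sentences)
    if n == 0:
        return []

    # Weighted, undirected sentence graph stored as an adjacency matrix.
    weights = [[similarity(words[i], words[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sums = [sum(row) for row in weights]

    # PageRank-style power iteration over the sentence graph.
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n) if out_sums[j] > 0
            )
            for i in range(n)
        ]

    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:num_sentences]
    return [sentences[i] for i in sorted(top)]


# Hypothetical text, for illustration only.
sample = ("Aspirin reduces fever in adult patients. "
          "Aspirin also reduces pain in adult patients. "
          "The weather was sunny.")
print(textrank_summary(sample, num_sentences=2))
```

Because the output is made of sentences copied from the input, this kind of model is cheap to run, which is consistent with Text Rank being the fastest of the three in the timings below.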
 
  1. Time Analysis
I measured the total time to run the system with each summarization model. This includes the time taken by the summarization model itself and by the other dedicated models in the system, along with their processing time.

| Paper | Text Rank | T5 | BART |
|---|---|---|---|
| Paper 1 | 72 seconds | 390 seconds | 187 seconds |
| Paper 2 | 111 seconds | 292 seconds | 205 seconds |
| Paper 3 | 35 seconds | 353 seconds | 163 seconds |

From the above results, the Text Rank algorithm seems to be the clear winner on time, but let's now look at the meaningfulness and reliability of the text each algorithm produces.
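For readers who want to reproduce this kind of measurement, a minimal sketch of an end-to-end timer is shown here. It is not the project's timing code: `time_call` and `first_sentence` are hypothetical names, and `first_sentence` is only a stand-in for a full pipeline run.

```python
import time
from typing import Callable


def time_call(func: Callable[[str], str], text: str) -> tuple[str, float]:
    """Run func(text) once and return its result with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = func(text)
    elapsed = time.perf_counter() - start
    return result, elapsed


def first_sentence(text: str) -> str:
    """Hypothetical stand-in for a real summarization pipeline."""
    return text.split(".")[0] + "."


summary, seconds = time_call(first_sentence, "First point. Second point.")
print(f"{summary!r} took {seconds:.4f} s")
```

Wrapping the whole pipeline call, rather than only the summarizer, matches the way the times above are described: they include the other models in the system too.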
 
  1. Meaningfulness and reliability
Now it's time for the most crucial comparison. To study this, we will break down the infographic of each paper for each model into abstract, results and conclusion. Because the images and title are also extracted with the help of the generated summary, we will include them in the study as well. To compare the models, we will assign points to each one. For example, suppose we have 3 models: Model A, Model B and Model C. If Model A performs better than Model B, and Model B performs better than Model C, then we give 2 points to Model A, 1 point to Model B and 0 points to Model C. Now that we know the scoring scheme, let's start.
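The scoring rule can be written down in a few lines. This sketch covers only the strict ordering described above; `award_points` is a hypothetical helper, not part of the project.

```python
def award_points(ranking: list[str]) -> dict[str, int]:
    """Give n-1 points to the best model, n-2 to the next, and 0 to the worst.

    `ranking` lists the models from best to worst, with no ties.
    """
    n = len(ranking)
    return {model: n - 1 - position for position, model in enumerate(ranking)}


print(award_points(["Model A", "Model B", "Model C"]))
# {'Model A': 2, 'Model B': 1, 'Model C': 0}
```

Some of the comparisons that follow give two or three models the same score. Those ties were judged by hand and are not produced by this strict-ordering helper.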
Infographics generated from Paper 1 using Text Rank
Infographics generated from Paper 1 using T5
Infographics generated from Paper 1 using BART
Comparative Analysis:
  1. Abstract (Paper 1):
    1. Text Rank: 2 points
    1. T5: 1 point
    1. BART: 0 points
  1. Results (Paper 1):
    1. Text Rank: 0 points
    1. T5: 2 points
    1. BART: 1 point
  1. Conclusion (Paper 1):
    1. Text Rank: 0 points
    1. T5: 1 point
    1. BART: 2 points
  1. Images (Paper 1):
    1. Text Rank: 1 point
    1. T5: 1 point
    1. BART: 1 point
  1. Title (Paper 1):
    1. Text Rank: 2 points
    1. T5: 2 points
    1. BART: 1 point
Total points of each model after Paper 1:

| Model | Total points |
|---|---|
| Text Rank | 5 |
| T5 | 7 |
| BART | 5 |

Infographics generated from Paper 2 using Text Rank
Infographics generated from Paper 2 using T5
Infographics generated from Paper 2 using BART
  1. Abstract (Paper 2):
    1. Text Rank: 0 points
    1. T5: 2 points
    1. BART: 1 point
  1. Results (Paper 2):
    1. Text Rank: 2 points
    1. T5: 1 point
    1. BART: 0 points
  1. Conclusion (Paper 2):
    1. Text Rank: 2 points
    1. T5: 2 points
    1. BART: 2 points
  1. Images (Paper 2):
    1. Text Rank: 1 point
    1. T5: 2 points
    1. BART: 0 points
  1. Title (Paper 2):
    1. Text Rank: 2 points
    1. T5: 1 point
    1. BART: 2 points
Total points of each model after Paper 2:

| Model | Total points |
|---|---|
| Text Rank | 12 |
| T5 | 15 |
| BART | 10 |

Infographics generated from Paper 3 using Text Rank
Infographics generated from Paper 3 using T5
Infographics generated from Paper 3 using BART
  1. Abstract (Paper 3):
    1. Text Rank: 0 points
    1. T5: 2 points
    1. BART: 1 point
  1. Results (Paper 3):
    1. Text Rank: 2 points
    1. T5: 2 points
    1. BART: 0 points
  1. Conclusion (Paper 3):
    1. Text Rank: 2 points
    1. T5: 2 points
    1. BART: 2 points
  1. Images (Paper 3):
    1. Text Rank: 2 points
    1. T5: 2 points
    1. BART: 2 points
  1. Title (Paper 3):
    1. Text Rank: 0 points
    1. T5: 1 point
    1. BART: 2 points
Total points of each model after Paper 3:

| Model | Total points |
|---|---|
| Text Rank | 18 |
| T5 | 24 |
| BART | 17 |

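The running totals can be double-checked with a short tally. The point values in this sketch are copied from the per-paper comparisons in this post. The data layout and the `running_totals` helper are only one hypothetical way to organise them.

```python
MODELS = ("Text Rank", "T5", "BART")

# Points per section, copied from the comparisons in this post.
# The order inside each tuple follows MODELS.
SCORES = {
    "Paper 1": {"Abstract": (2, 1, 0), "Results": (0, 2, 1), "Conclusion": (0, 1, 2),
                "Images": (1, 1, 1), "Title": (2, 2, 1)},
    "Paper 2": {"Abstract": (0, 2, 1), "Results": (2, 1, 0), "Conclusion": (2, 2, 2),
                "Images": (1, 2, 0), "Title": (2, 1, 2)},
    "Paper 3": {"Abstract": (0, 2, 1), "Results": (2, 2, 0), "Conclusion": (2, 2, 2),
                "Images": (2, 2, 2), "Title": (0, 1, 2)},
}


def running_totals(scores: dict[str, dict[str, tuple[int, ...]]]) -> dict[str, dict[str, int]]:
    """Return the cumulative points of every model after each paper."""
    totals = dict.fromkeys(MODELS, 0)
    history = {}
    for paper, sections in scores.items():
        for points in sections.values():
            for model, value in zip(MODELS, points):
                totals[model] += value
        history[paper] = dict(totals)  # snapshot after this paper
    return history


for paper, totals in running_totals(SCORES).items():
    print(paper, totals)
# Paper 1 {'Text Rank': 5, 'T5': 7, 'BART': 5}
# Paper 2 {'Text Rank': 12, 'T5': 15, 'BART': 10}
# Paper 3 {'Text Rank': 18, 'T5': 24, 'BART': 17}
```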
Finally, we can see that the T5 model, with 24 points, outperforms the Text Rank (18 points) and BART (17 points) summarization models. One of the main reasons for T5's win could be that it was specifically trained on a medical dataset, while BART was trained on a generic dataset. Combining the models and using the strengths of each could lead to more accurate results.
 
If you want the source code of this project, you can head over to my GitHub profile and check it out:
automatic-infographics-system-for-medical-paper (protocorn, updated Apr 26, 2024)
If you have any questions about this project, or any other queries, you can connect with me on LinkedIn.
