Summarize a PDF

In this tutorial, we'll use Python to build a PDF summary application.

Wouldn't it be great to use AI to summarize a PDF (or another document)? In this tutorial, we will take a local PDF file, convert it to text, and then generate a summary using the Corcel Cortext Text API.

The code for this example can be found in a Jupyter Notebook on GitHub.

This demo assumes you have a local PDF that you would like to summarize. It can be easily modified for a PDF at a URL.
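For a PDF hosted at a URL, one option is to download it to a local file first and then follow the rest of the tutorial unchanged. The sketch below is a minimal example that uses only the Python standard library. The helper name download_pdf and the example URL are hypothetical and are not part of the original notebook.

```python
import shutil
import urllib.request
from pathlib import Path


def download_pdf(url, dest_path):
    """Download the file at `url` to `dest_path` and return the local path as a string."""
    dest = Path(dest_path)
    # Stream the response body to disk instead of holding it all in memory.
    with urllib.request.urlopen(url) as response, dest.open("wb") as out_file:
        shutil.copyfileobj(response, out_file)
    return str(dest)


# Hypothetical usage: the URL is a placeholder.
# file_path = download_pdf("https://example.com/myPDF.pdf", "myPDF.pdf")
```

The returned path can then be used as file_path in the conversion step.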

To convert a PDF to a text file, we will use ConvertAPI (convertapi.com). The free tier offers 250 free file conversions before you need to enter a credit card. You'll need your API secret from ConvertAPI and an API key from Corcel.

Set up your Python libraries

In the first cell of the Jupyter Notebook, we import the libraries we need and load the API secrets from a .env file:

import json

import convertapi
import requests
from dotenv import dotenv_values

# Load the API secrets from the .env file once
secrets = dotenv_values(".env")
convertapi.api_secret = secrets['convertApi_secret']
corcelKey = secrets['corcel_apikey']
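If a key is missing from the .env file, the setup cell fails with a bare KeyError. A small check, sketched below, can give a clearer message. require_keys is a hypothetical helper, and the example dict stands in for the result of dotenv_values(".env"); the key names match the ones used in this tutorial.

```python
def require_keys(config, names):
    """Return the requested values from `config`, or raise if any are missing or empty."""
    missing = [name for name in names if not config.get(name)]
    if missing:
        raise RuntimeError("Missing values in .env: " + ", ".join(missing))
    return [config[name] for name in names]


# Example with an in-memory dict standing in for dotenv_values(".env"):
example_config = {"convertApi_secret": "abc", "corcel_apikey": "xyz"}
convert_secret, corcel_key = require_keys(
    example_config, ["convertApi_secret", "corcel_apikey"]
)
```

In the notebook, you would pass dotenv_values(".env") instead of the example dict.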

Convert your PDF to text

In this cell, we set the path to the PDF file on your computer and make the API call to ConvertAPI. ConvertAPI returns a URL where the converted text file is saved. We then download that file and store its contents in a variable.

# Path to your PDF
file_path = '/path/to/myPDF.pdf'

# Use ConvertAPI to create a text document. You get 250 conversions for free.
result = convertapi.convert('txt', {'File': file_path}, from_format='pdf')
text_url = result.response['Files'][0]['Url']

# Download the converted text
textResponse = requests.get(text_url)
textResponse.raise_for_status()  # stop here if the download failed
text_toSummarize = textResponse.text
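Text extracted from a PDF often contains long runs of blank lines and spaces, and a large document may exceed what the model accepts in one request. The sketch below tidies the whitespace and caps the length. The helper name prepare_text and the 12,000-character default are assumptions chosen for illustration, not documented Corcel limits.

```python
import re


def prepare_text(text, max_chars=12000):
    """Collapse whitespace and truncate `text` to at most `max_chars` characters."""
    # Replace every run of whitespace (spaces, tabs, newlines) with a single space.
    cleaned = re.sub(r"\s+", " ", text).strip()
    # max_chars is an illustrative cap, not a documented API limit.
    return cleaned[:max_chars]


sample = "Page 1\n\n\n   Some   text\tfrom a PDF.  "
print(prepare_text(sample))  # Page 1 Some text from a PDF.
print(len(prepare_text("a" * 50, max_chars=10)))  # 10
```

Truncation simply drops the tail of the document, so the summary would only cover the part that was kept.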

Summarize the text

Now that we have the text, we can ask the Cortext Text endpoint to create a summary of the file for us:

# Ask the Cortext Text endpoint for a summary (requests was imported in the first cell)

url = "https://api.corcel.io/cortext/text"

payload = {
    "model": "cortext-ultra",
    "stream": False,
    "miners_to_query": 1,
    "top_k_miners_to_query": 40,
    "ensure_responses": True,
    "messages": [
        {
            "role": "user",
            "content": f"provide a brief summary of the following document:  {text_toSummarize}"
        }
    ]
}
headers = {
    "accept": "application/json",
    "content-type": "application/json",
    "Authorization": corcelKey
}

response = requests.post(url, json=payload, headers=headers)

print(response.text)
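The raw response can be hard to read when printed as-is. The sketch below checks the HTTP status and pretty-prints the body if it is JSON, without assuming any particular response schema. format_response is a hypothetical helper and is not part of the Corcel API.

```python
import json


def format_response(status_code, body_text):
    """Return a readable string for an HTTP response body."""
    if status_code >= 400:
        return f"Request failed with status {status_code}: {body_text}"
    try:
        # Pretty-print JSON bodies; the exact structure is not assumed here.
        return json.dumps(json.loads(body_text), indent=2)
    except json.JSONDecodeError:
        # Fall back to the raw text for non-JSON bodies.
        return body_text


# In the notebook: print(format_response(response.status_code, response.text))
print(format_response(200, '{"ok": true}'))
```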

The response from Corcel contains a concise summary of the PDF document. You can edit the prompt to adjust the length and tone of the summary.
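One way to experiment with length and tone is to build the payload from a function that takes the instruction as a parameter. build_payload is a hypothetical helper; the payload fields are copied from the request above, and the example instruction is only an illustration.

```python
def build_payload(text, instruction="provide a brief summary of the following document:"):
    """Build the Cortext request payload with a configurable instruction."""
    return {
        "model": "cortext-ultra",
        "stream": False,
        "miners_to_query": 1,
        "top_k_miners_to_query": 40,
        "ensure_responses": True,
        "messages": [
            {"role": "user", "content": f"{instruction}  {text}"}
        ],
    }


payload = build_payload(
    "Example document text.",
    instruction="Summarize the following document in three bullet points, in a formal tone:",
)
print(payload["messages"][0]["content"])
```

In the notebook, you would pass text_toSummarize as the first argument and send the resulting payload with requests.post as before.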

Conclusion

In this demo, we used ConvertAPI to convert a PDF to text. Many other tools on the web can also extract text from documents.

We then used the Cortext Text endpoint to ask for a summary of the giant text string that we extracted from the document.