Find and Replace Text in PDF using Python


A Portable Document Format (PDF) file consists of 7-bit ASCII characters, except for certain elements that may have binary content. It typically contains text, images, embedded fonts, hyperlinks, videos, interactive buttons, forms, and more. Sometimes we need to update text within a PDF file, and doing so by hand becomes cumbersome when the change has to be applied in bulk. In such situations, a programmatic solution is a viable approach, and it is even more convenient when it requires little or no environment setup or installation. This article explains how to search and replace text using the Python SDK. No Adobe Acrobat or other software needs to be downloaded or installed; all the PDF processing is performed in the Cloud.

PDF Manipulation API

Our Aspose.PDF Cloud API lets you create new PDF files and manipulate existing ones. It also supports text manipulation: you can read text items, add text items, or replace occurrences of text in a PDF file. To make this easier, we have developed Aspose.PDF Cloud SDK for Python, a wrapper around Aspose.PDF Cloud API, so all the PDF processing capabilities are available within your Python application.

The first step in using the SDK is installing it. The SDK is available for free download via PIP and the GitHub repository. Execute the following command in the terminal/command prompt to install the latest version of the SDK on the system.

pip install asposepdfcloud

MS Visual Studio

When using Visual Studio, you can also add the package to your Python project from within the IDE. Search for asposepdfcloud as a package in the Python environment window, then follow the steps numbered in the image below to complete the installation process.

Image 1:- Aspose.PDF Cloud SDK for Python package.

Cloud Dashboard Account

To ensure the integrity and privacy of our customers' data, the Cloud APIs are only accessible to authorized users. Therefore, the next step is to create an account on the Aspose.Cloud dashboard. If you have a GitHub or Google account, simply sign up with it; otherwise, click the Create a new Account button and provide the required information. Then log in to the dashboard with your credentials, expand the Applications section, and scroll down to the Client Credentials section to see the Client ID and Client Secret details.

Image 2:- Client credentials on Aspose.Cloud dashboard.

Search and Replace Text using Python

Follow the instructions below to search for a particular string and replace all its occurrences in the PDF document.

  • Firstly, create an instance of the ApiClient class, providing the Client ID and Client Secret as arguments
  • Secondly, create an instance of the PdfApi class, which takes the ApiClient object as an input argument
  • Create variables specifying the input PDF document
  • Now create an object of TextReplaceListRequest defining the text replace properties
  • Finally, call the post_document_text_replace(..) method to initiate the search and replace operation and save the resultant file in Cloud storage
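
The steps above can be sketched roughly as follows. This is a minimal sketch, not the original listing: the helper names build_replace_settings and replace_text are hypothetical, the credentials and strings in the usage comment are placeholders, and the keyword names app_key / app_sid (with app_key taken to be the Client Secret and app_sid the Client ID) are an assumption about the SDK constructor that should be checked against the SDK README. The SDK is imported inside the function so the settings helper can be tried without the package installed.

```python
def build_replace_settings(old_value, new_value, start_index=0, count_replace=0):
    """Collect keyword arguments for TextReplace and TextReplaceListRequest.

    Hypothetical helper: it only groups plain values and calls no SDK code.
    """
    return {
        "text_replace": {
            "old_value": old_value,
            "new_value": new_value,
            "regex": False,  # match the literal string rather than a pattern
        },
        "list_request": {
            "start_index": start_index,
            "count_replace": count_replace,
        },
    }


def replace_text(client_id, client_secret, file_name, settings):
    """Run the search-and-replace steps against a PDF already in Cloud storage."""
    # Imported lazily so this module loads even where the SDK is absent.
    from asposepdfcloud.api_client import ApiClient
    from asposepdfcloud.apis.pdf_api import PdfApi
    from asposepdfcloud.models import TextReplace, TextReplaceListRequest

    # Step 1: ApiClient from the dashboard credentials.
    # Assumption: app_key is the Client Secret and app_sid is the Client ID.
    api_client = ApiClient(app_key=client_secret, app_sid=client_id)

    # Step 2: PdfApi wraps the ApiClient.
    pdf_api = PdfApi(api_client)

    # Steps 3 and 4: describe the replacement for the input document.
    text_replace = TextReplace(**settings["text_replace"])
    request = TextReplaceListRequest(
        text_replaces=[text_replace], **settings["list_request"]
    )

    # Step 5: start the operation; the result is saved in Cloud storage.
    return pdf_api.post_document_text_replace(file_name, request)


# Example call (placeholders, requires network access and valid credentials):
# settings = build_replace_settings("Product Family", "Product Families", 2, 2)
# replace_text("<Client ID>", "<Client Secret>", "URL2PDF.pdf", settings)
```

Keeping the replacement settings in a plain dictionary makes it easy to reuse the same values for several documents in a bulk update.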
Image 3:- Text replacement output preview.

In the text replace request, pay particular attention to two parameters: StartIndex and CountReplace. StartIndex defines the occurrence of the text from which the replace operation starts, and CountReplace defines the number of occurrences to replace. In the image below, notice that only two occurrences of the Product Family string are updated, starting with index 2.
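
To make the interaction of the two parameters concrete, here is a small local model that works on a plain string and makes no API calls. The function replace_window is hypothetical, and treating the index as zero-based is an assumption made for illustration only; the service's actual indexing and its handling of special values such as 0 should be confirmed against the API documentation.

```python
def replace_window(text, old, new, start_index, count_replace):
    """Replace `count_replace` occurrences of `old`, starting at occurrence
    number `start_index` (zero-based in this model, which is an assumption).
    """
    parts = text.split(old)
    rebuilt = [parts[0]]
    # Each element after the first follows one occurrence of `old`.
    for occurrence, tail in enumerate(parts[1:]):
        in_window = start_index <= occurrence < start_index + count_replace
        rebuilt.append((new if in_window else old) + tail)
    return "".join(rebuilt)


sample = "x x x x x"
print(replace_window(sample, "x", "y", start_index=2, count_replace=2))
# prints "x x y y x": occurrences 2 and 3 change, the rest stay as they were
```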

Image 4:- Two occurrences of string are replaced.

For your reference, the input URL2PDF.pdf and resultant Text-Replace-Output.pdf have been attached.

Search and Replace Text using cURL Command

The beauty of REST APIs is that they can also be accessed via cURL commands. In this section, we discuss how to search and replace text using the cURL command. To access Aspose.PDF Cloud via cURL, we first need to generate a JSON Web Token (JWT) based on the client credentials shown on the Aspose.Cloud dashboard. This is mandatory because our APIs are only accessible to registered users. Execute the following command to generate the JWT.

curl -v "https://api.aspose.cloud/connect/token" \
-X POST \
-d "grant_type=client_credentials&client_id=bbf94a2c-6d7e-4020-b4d2-b9809741374e&client_secret=1c9379bb7d701c26cc87e741a29987bb" \
-H "Content-Type: application/x-www-form-urlencoded" \
-H "Accept: application/json"
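
For readers who prefer to stay in Python, the same token request can be sketched with the standard library only. The function names build_token_request and fetch_access_token are hypothetical, and reading the token from an access_token field is an assumption based on the usual OAuth 2.0 client-credentials response; inspect the actual JSON reply before relying on it.

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://api.aspose.cloud/connect/token"


def build_token_request(client_id, client_secret):
    """Build (but do not send) the POST request that asks for a JWT."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=form,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
    )


def fetch_access_token(client_id, client_secret):
    """Send the request and return the token string (needs network access)."""
    request = build_token_request(client_id, client_secret)
    with urllib.request.urlopen(request) as response:
        # Assumption: the reply is JSON with an "access_token" field.
        return json.load(response)["access_token"]


# Example call with placeholder credentials:
# token = fetch_access_token("<Client ID>", "<Client Secret>")
```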

Now that we have generated the JWT, execute the following cURL command to replace the Product Family string in the PDF document and save the updated document in the same cloud storage.

curl -X POST "https://api.aspose.cloud/v3.0/pdf/URL2PDF.pdf/text/replace" \
-H  "accept: application/json" \
-H  "authorization: Bearer <JWT Token>" \
-H  "Content-Type: application/json" \
-d "{  \"TextReplaces\": [    {      \"OldValue\": \"Product Family\",      \"NewValue\": \"Product Families\",      \"Regex\": true,      \"TextState\": {        \"FontSize\": 0,        \"Font\": \"Arial\",        \"ForegroundColor\": {          \"A\": 0,          \"R\": 252,          \"G\": 240,          \"B\": 3        },        \"BackgroundColor\": {          \"A\": 0,          \"R\": 252,          \"G\": 3,          \"B\": 248        },        \"FontStyle\": \"Regular\"      },      \"Rect\": {        \"LLX\": 0,        \"LLY\": 0,        \"URX\": 0,        \"URY\": 0      }    }  ],  \"DefaultFont\": \"Arial\",  \"StartIndex\": 2,  \"CountReplace\": 2}"
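
The escaped JSON in the command above is easy to mistype, so one option is to build the same request in Python and let json.dumps handle the quoting. This is a sketch using only the standard library: build_replace_request is a hypothetical helper, and the shortened payload in the usage comment (without TextState and Rect, and with Regex set to false for a literal match) is an assumption that those fields are optional, not a confirmed property of the API.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.aspose.cloud/v3.0/pdf"


def build_replace_request(jwt_token, file_name, payload):
    """Build (but do not send) the text replace request for one PDF file."""
    url = f"{API_BASE}/{urllib.parse.quote(file_name)}/text/replace"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {jwt_token}",
            "Content-Type": "application/json",
        },
    )


# Example usage with a placeholder token (sending requires network access):
# payload = {
#     "TextReplaces": [
#         {"OldValue": "Product Family", "NewValue": "Product Families", "Regex": False}
#     ],
#     "DefaultFont": "Arial",
#     "StartIndex": 2,
#     "CountReplace": 2,
# }
# request = build_replace_request("<JWT Token>", "URL2PDF.pdf", payload)
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```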

Conclusion

Let’s recap. In this blog post, we have explored the capabilities of Aspose.PDF Cloud for text search and replace. We have also discussed how to specify the text appearance properties of the replaced string in the resultant document, and how to control the number of text occurrences that are replaced. The complete source code of Aspose.PDF Cloud SDK for Python is available for download on GitHub. If you encounter any issues while using the API or have any further queries, please feel free to contact us via the free product support forum.
