Redact PDF using Python | Free API to Expunge sensitive data

Redact PDF files

PDF documents may contain sensitive information that needs to be expunged before the files are shared. PDF editing software can do this, but bulk operations call for an automated solution that needs no manual intervention. In this article, we discuss the steps and details on how to redact PDF files using Python and cURL commands.

PDF Processing API

To implement PDF processing capabilities in your application, try Aspose.PDF Cloud. It performs all PDF processing operations in the Cloud, so neither Adobe Acrobat nor any other application is required. To make PDF manipulation easier, we have created programming language-specific SDKs so that you can implement all the features in the language of your choice. For the scope of this article, we use Aspose.PDF Cloud SDK for Python. The first step is to install the SDK on the local system; it is available from PIP and the GitHub repository. To complete the installation, execute the following command in the terminal/command prompt.

pip install asposepdfcloud
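
To confirm that the package is visible to the interpreter you plan to use, a quick check such as the following may help. It is only a sketch: the helper name sdk_installed is made up for illustration, and it assumes the package's import name matches the pip name asposepdfcloud.

```python
import importlib.util


def sdk_installed(module_name: str = "asposepdfcloud") -> bool:
    """Return True if the current interpreter can locate the named top-level module."""
    # find_spec returns None when a top-level module cannot be found.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("asposepdfcloud installed:", sdk_installed())
```

If this reports False inside PyCharm but True in a terminal, the project is most likely bound to a different interpreter than the one pip installed into.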

PyCharm IDE

If you are using PyCharm IDE, you may directly add the SDK as a dependency in your project.

File -> Settings -> Project -> Python Interpreter -> asposepdfcloud

Image 1:- PyCharm settings option.
Image 2:- Aspose.PDF Cloud Python Package.

Free Cloud Dashboard Account

After the installation, the next major step is to sign up for a free subscription to our cloud services via the Aspose.Cloud dashboard. The subscription ensures that only authorized users can access our file processing services. If you have a GitHub or Google account, simply sign up with it; otherwise, click the Create a new Account button and provide the required information. Then log in to the dashboard with your credentials, expand the Applications section, and scroll down to the Client Credentials section to see your Client ID and Client Secret.

Image 3:- Client Credentials on Aspose.Cloud Dashboard.

Redact PDF using Python

Follow the steps below to redact PDF content using Python:

  • Create an instance of ApiClient, passing the client credentials as arguments
  • Initialize PdfApi, passing the ApiClient object as an argument
  • Create a RedactionAnnotation object and call the post_page_redaction_annotations(..) method of PdfApi
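
The steps above can be sketched roughly as follows. Treat this as an illustration under stated assumptions rather than a definitive implementation: the class and method names come from the steps listed above, while the argument order of ApiClient (Client Secret first, then Client ID), the positional order of Rectangle, the model keyword names, and the apply keyword are assumptions that should be checked against the SDK version you install. The function name redact_region and the constant REDACTION_RECT are made up for this example.

```python
# Region to cover, in the order (LLX, LLY, URX, URY); values match the cURL example in this article.
REDACTION_RECT = (20, 700, 220, 650)


def redact_region(client_id, client_secret, file_name="marketing.pdf", page_number=1):
    """Add a redaction annotation to one page of a PDF already stored in cloud storage."""
    # Imported lazily so this module still loads where the SDK is not installed.
    import asposepdfcloud
    from asposepdfcloud.apis.pdf_api import PdfApi

    # Step 1. Assumption: ApiClient takes the Client Secret first, then the Client ID.
    api_client = asposepdfcloud.api_client.ApiClient(client_secret, client_id)

    # Step 2: PdfApi wraps the authenticated client.
    pdf_api = PdfApi(api_client)

    # Step 3. Assumptions: Rectangle takes (LLX, LLY, URX, URY) positionally and the
    # model accepts snake_case keyword arguments mirroring the REST field names.
    annotation = asposepdfcloud.models.RedactionAnnotation(
        rect=asposepdfcloud.models.Rectangle(*REDACTION_RECT),
        contents="Confidential",
        overlay_text="Sensitive data",
    )

    # Assumption: the `apply` keyword mirrors the REST query parameter `apply=true`.
    return pdf_api.post_page_redaction_annotations(
        file_name, page_number, [annotation], apply=True
    )
```

A call would look like redact_region("<Client ID>", "<Client Secret>"), with the input file uploaded to cloud storage beforehand.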

Expunge PDF Content using cURL Command

cURL commands can be used to access REST APIs on any platform. Since Aspose.PDF Cloud is based on REST architecture, we can also call its APIs with cURL commands to expunge sensitive information from PDF documents. Before calling the REST API, however, we need to generate a JSON Web Token (JWT) from the client credentials shown on the Aspose.Cloud dashboard. This step is mandatory because our APIs are accessible only to registered users. Execute the following command to generate the JWT.

curl -v "https://api.aspose.cloud/connect/token" \
-X POST \
-d "grant_type=client_credentials&client_id=88d1cda8-b12c-4a80-b1ad-c85ac483c5c5&client_secret=406b404b2df649611e508bbcfcd2a77f" \
-H "Content-Type: application/x-www-form-urlencoded" \
-H "Accept: application/json"
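
For readers who prefer to stay in Python, the same token request can be sketched with the standard library alone. The URL, form fields, and headers are taken from the cURL command above; the function names are made up for illustration, and reading the token from an access_token field assumes the usual OAuth2 client-credentials response shape.

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://api.aspose.cloud/connect/token"


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) the form-encoded POST that asks for a JWT."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=form,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
    )


def fetch_token(client_id: str, client_secret: str) -> str:
    """Send the request and return the token string from the JSON response."""
    request = build_token_request(client_id, client_secret)
    with urllib.request.urlopen(request) as response:
        payload = json.loads(response.read().decode("utf-8"))
    # Assumption: the token is returned under the "access_token" key.
    return payload["access_token"]
```

Keeping the request construction separate from the network call makes it possible to inspect the request without contacting the service.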

Now execute the following command to expunge the data inside the rectangular region (“LLX”: 20, “LLY”: 700, “URX”: 220, “URY”: 650) on page 1 of marketing.pdf. After the operation, the resultant file is saved in the same cloud storage.

curl -v -X POST "https://api.aspose.cloud/v3.0/pdf/marketing.pdf/pages/1/annotations/redaction?apply=true" \
-H  "accept: application/json" \
-H  "authorization: Bearer <JWT Token>" \
-H  "Content-Type: application/json" \
-d "[  {    \"Color\": {      \"A\": 0,      \"R\": 158,      \"G\": 50,      \"B\": 168    },    \"Contents\": \"Confidential\",    \"Modified\": \"01/18/2022 12:00:00.000 AM\",    \"Id\": \"1\",    \"Flags\": [      \"Default\"    ],    \"Name\": \"Name\",    \"Rect\": {      \"LLX\": 20,      \"LLY\": 700,      \"URX\": 220,      \"URY\": 650    },    \"PageIndex\": 1,    \"ZIndex\": 1,    \"HorizontalAlignment\": \"CENTER\",    \"VerticalAlignment\": \"CENTER\",    \"QuadPoint\": [      {        \"X\": 5,        \"Y\": 10      }    ],    \"FillColor\": {      \"A\": 10,      \"R\": 50,      \"G\": 168,      \"B\": 182    },    \"BorderColor\": {      \"A\": 10,      \"R\": 168,      \"G\": 50,      \"B\": 141    },    \"OverlayText\": \"Sensitive data\",    \"Repeat\": true,    \"TextAlignment\": \"Left\"  }]"
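
The one-line JSON body above is hard to read, so here is a hedged Python sketch that builds the same annotation as a dictionary and prepares the equivalent request with the standard library. The endpoint, headers, and field values are copied from the cURL command; the function names are hypothetical, and the token argument stands for the JWT obtained in the previous step.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.aspose.cloud/v3.0/pdf"


def build_redaction_annotation(llx, lly, urx, ury, overlay_text="Sensitive data"):
    """Return one annotation dict with the same fields as the cURL body."""
    return {
        "Color": {"A": 0, "R": 158, "G": 50, "B": 168},
        "Contents": "Confidential",
        "Modified": "01/18/2022 12:00:00.000 AM",
        "Id": "1",
        "Flags": ["Default"],
        "Name": "Name",
        "Rect": {"LLX": llx, "LLY": lly, "URX": urx, "URY": ury},
        "PageIndex": 1,
        "ZIndex": 1,
        "HorizontalAlignment": "CENTER",
        "VerticalAlignment": "CENTER",
        "QuadPoint": [{"X": 5, "Y": 10}],
        "FillColor": {"A": 10, "R": 50, "G": 168, "B": 182},
        "BorderColor": {"A": 10, "R": 168, "G": 50, "B": 141},
        "OverlayText": overlay_text,
        "Repeat": True,
        "TextAlignment": "Left",
    }


def build_redaction_request(token, file_name, page_number, annotations):
    """Build (but do not send) the POST request that applies the redaction."""
    url = "{}/{}/pages/{}/annotations/redaction?apply=true".format(
        BASE_URL, urllib.parse.quote(file_name), page_number
    )
    return urllib.request.Request(
        url,
        data=json.dumps(annotations).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/json",
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
    )
```

Passing the prepared request to urllib.request.urlopen sends it; the request is built separately here so the URL and body can be checked first.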

Conclusion

In this article, we have discussed the steps to redact PDF files using Python and cURL commands. Apart from the Redaction annotation, the API supports a plethora of other annotation features; their details can be found in Working with Annotation. You may also visit the Product home page for further information about its capabilities.

Should you have any related queries or encounter any issues, please feel free to contact us via the Free Product support forum.

Related Articles

We recommend visiting the following articles to learn about: