Convert PDF to HTML in Python

Learn how to convert PDF to HTML using Python

PDF files are widely used for sharing data and information because they preserve document formatting across platforms. However, viewing a PDF requires a dedicated viewer application, and if the fonts used in the document are not available on a given platform, the text may not render correctly. One quick solution is to view the PDF as HTML. In this article, we discuss how to convert PDF to HTML in Python.

PDF to HTML Conversion API

Aspose.PDF Cloud is our REST-based solution for creating, editing, and converting PDF files to EPUB, PS, SVG, XLSX, PPTX, DOCX, HTML, and other supported document formats. To add PDF processing capabilities to a Python application, we are going to use Aspose.PDF Cloud SDK for Python. The first step is installation: the SDK is available through pip and in a GitHub repository. Run the following command in a terminal or command prompt to install the latest version of the SDK.

pip install asposepdfcloud

If you need to add the reference directly to a Python project in the Visual Studio IDE, search for asposepdfcloud as a package in the Python environment window. Follow the steps numbered in the image below to complete the installation.

Image 1: PDF to HTML conversion API.

Convert PDF to HTML in Python

Please follow the instructions below to first upload the PDF file to cloud storage and then convert it to HTML format. The resultant file is stored in the same cloud storage.

  • First, create an instance of the ApiClient class, passing the Client ID and Client Secret as arguments
  • Second, create an object of PdfApi, passing the ApiClient object as an argument
  • Third, specify the names of the input PDF and the resultant output file
  • Finally, call the put_pdf_in_storage_to_html(…) method of the PdfApi class to start the conversion. Upon successful conversion, the output is also stored in cloud storage
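
The steps above can be sketched roughly as follows. This is a minimal sketch rather than the official SDK sample. It assumes that ApiClient and PdfApi can be imported from the module paths shown, that ApiClient takes the Client Secret followed by the Client ID, and that upload_file(path, file) accepts a local file path. The helper names make_pdf_api and convert_pdf_to_html are hypothetical, so please verify these details against the SDK README before relying on them.

```python
def make_pdf_api(client_id, client_secret):
    """Build a PdfApi object from client credentials (steps 1 and 2)."""
    # Imported lazily so the conversion helper below can be used or
    # tested on its own. The import paths are an assumption about the
    # SDK's package layout.
    from asposepdfcloud.api_client import ApiClient
    from asposepdfcloud.apis.pdf_api import PdfApi

    # Assumption: the constructor takes the secret first, then the ID.
    api_client = ApiClient(client_secret, client_id)
    return PdfApi(api_client)


def convert_pdf_to_html(pdf_api, local_pdf, remote_name, out_path):
    """Upload a local PDF to cloud storage and convert it to HTML there."""
    # Upload the source file so the conversion call can find it by name.
    pdf_api.upload_file(remote_name, local_pdf)
    # Steps 3 and 4: name the input and output, then start the conversion.
    # The result is written to out_path in the same cloud storage.
    return pdf_api.put_pdf_in_storage_to_html(remote_name, out_path)
```

A typical call would be convert_pdf_to_html(make_pdf_api("<Client ID>", "<Client Secret>"), "awesomeTable.pdf", "awesomeTable.pdf", "converted.html"), where the file names are only placeholders. Keeping the credential handling separate from the conversion helper makes it easier to swap in a different PdfApi configuration later.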

PDF to HTML using cURL Command

cURL commands provide a convenient way to access REST APIs from the command line terminal, so we can also use cURL to access the Aspose.PDF Cloud API. Before triggering the conversion, we need to generate a JWT access token based on client credentials. Run the following command to generate the JWT token.

curl -v "https://api.aspose.cloud/connect/token" \
-X POST \
-d "grant_type=client_credentials&client_id=88d1cda8-b12c-4a80-b1ad-c85ac483c5c5&client_secret=406b404b2df649611e508bbcfcd2a77f" \
-H "Content-Type: application/x-www-form-urlencoded" \
-H "Accept: application/json"
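
For readers who prefer to request the token from Python rather than cURL, the same call can be sketched with the standard library alone. The endpoint, form fields, and headers mirror the command above. The function names are hypothetical, and reading the token from an "access_token" field is an assumption based on the usual OAuth 2.0 client-credentials response.

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://api.aspose.cloud/connect/token"


def build_token_request(client_id, client_secret):
    """Prepare the POST request that exchanges credentials for a JWT."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
    )


def fetch_access_token(client_id, client_secret):
    """Send the request and return the token string from the JSON reply."""
    request = build_token_request(client_id, client_secret)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # Assumption: the token is stored under the standard OAuth 2.0 key.
    return payload["access_token"]
```

Splitting request construction from sending keeps the network call in one place, so the request can be inspected without contacting the service.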

Once we have the JWT token, we can run the following command to convert a PDF file available in cloud storage to HTML format. The output is returned as a stream response.

curl -v -X GET "https://api.aspose.cloud/v3.0/pdf/awesomeTable.pdf/convert/html?documentType=Xhtml&fixedLayout=true&splitCssIntoPages=false&splitIntoPages=false&fontSavingMode=AlwaysSaveAsTTF" \
-H  "accept: multipart/form-data" \
-H  "authorization: Bearer <JWT Token>" \
-o .\Documents\PDFConversion.zip
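
The same GET request can be assembled in Python, which helps when the query parameters vary between runs. The sketch below uses only the standard library and reuses the URL pattern and headers from the command above. The function names are hypothetical, and the option values are passed through as strings exactly as they would appear in the query string.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://api.aspose.cloud/v3.0/pdf"


def build_convert_request(pdf_name, token, **options):
    """Prepare the GET request that converts a stored PDF to HTML."""
    url = f"{BASE_URL}/{urllib.parse.quote(pdf_name)}/convert/html"
    query = urllib.parse.urlencode(options)
    if query:
        url += "?" + query
    return urllib.request.Request(
        url,
        method="GET",
        headers={
            "Accept": "multipart/form-data",
            "Authorization": f"Bearer {token}",
        },
    )


def download_conversion(request, destination):
    """Send the request and save the streamed response to a local file."""
    with urllib.request.urlopen(request) as response:
        with open(destination, "wb") as output:
            output.write(response.read())
```

For example, build_convert_request("awesomeTable.pdf", token, documentType="Xhtml", fixedLayout="true") produces a request for the same endpoint as the cURL command, and download_conversion(request, "PDFConversion.zip") writes the response to disk.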

If you need to convert the PDF file to HTML and save the output in cloud storage, use the following command.

curl -v -X PUT "https://api.aspose.cloud/v3.0/pdf/completeWorkbook.pdf/convert/html?outPath=converted.html&fixedLayout=true&splitIntoPages=false&outputFormat=Zip" \
-H  "accept: application/json" \
-H  "authorization: Bearer <JWT Token>"

Image 2: PDF to HTML conversion preview.

Conclusion

In this article, we discussed how to convert PDF to HTML format, using either the Python code snippet or cURL commands to convert a PDF to a web page. We suggest reviewing the API reference to learn about the numerous parameters supported by the PutPdfInStorageToHtml API. Please note that our cloud SDKs are developed under an MIT license, so the complete source code of Aspose.PDF Cloud SDK for Python is available on GitHub. If you encounter any issues while using the API or have further questions, please feel free to contact us through the free product support forum.
