Custom Module (NodeJS): PDF to Text

MH2ag · November 20, 2020, 4:32pm

I was looking for a way to search through PDFs. With the following custom module you can extract the text from a searchable PDF and either save it to your database or use the output directly.
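For readers who want to see the idea outside of Wappler, here is a minimal sketch of the extract-then-search flow. It is not the module's actual source. It assumes the `pdf-extraction` package is called with a file buffer and resolves to an object with a `text` field; the helper names `pdfToText` and `findLines` are made up for illustration.

```javascript
const fs = require('fs');

// Hypothetical helper: read a PDF from disk and resolve to its extracted text.
// `extract` is injectable so the sketch can be tried without the npm package;
// when omitted, pdf-extraction is loaded lazily (assumed to be installed).
async function pdfToText(filePath, extract) {
  const parse = extract || require('pdf-extraction');
  const buffer = fs.readFileSync(filePath);
  const data = await parse(buffer);
  return data.text;
}

// Hypothetical helper: case-insensitive search that returns every line
// of the extracted text containing the search term.
function findLines(text, term) {
  const needle = term.toLowerCase();
  return text
    .split(/\r?\n/)
    .filter((line) => line.toLowerCase().includes(needle));
}
```

Usage would look roughly like `pdfToText('report.pdf').then((text) => findLines(text, 'invoice'))`, where `report.pdf` is a placeholder path. Scanned PDFs without a text layer will not yield searchable text this way.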

The module makes use of the following npm package:

You can download the module on GitHub:

Available output:

Thanks again @patrick, @sid & @JonL for your help with this.

sid · November 20, 2020, 4:36pm

Looks great. Thanks for trying and sharing.

JonL · November 20, 2020, 4:40pm

More goodies for Wappler
Thanks for sharing!

jellederijke · July 23, 2024, 2:23pm

I have 2 small suggestions:

Wrap the hjson definition in square brackets
Add pdf-extraction to usedModules

This way the npm package should be installed automatically. I'm not totally sure, but it is mentioned in the documentation. Perhaps it should also be added to the Dockerfile for the remote targets.

Here is my working code:

[
  {
    type: 'PDFtoText_getValue',
    module: 'pdftotext',
    action: 'getValue',
    groupTitle: 'My Modules',
    groupIcon: 'fas fa-lg fa-project-diagram comp-images',
    title: 'PDF to Text',
    icon: 'fas fa-lg fa-file-pdf comp-images',
    usedModules: {
      node: {
        "pdf-extraction": "^1.0.2"
      }
    },
    dataScheme: [
      { name: 'numpages', type: 'number' },
      { name: 'numrender', type: 'number' },
      { name: 'info', type: 'object', sub: [
        { name: 'PDFFormatVersion', type: 'text' },
        { name: 'IsAcroFormPresent', type: 'boolean' },
        { name: 'IsXFAPresent', type: 'boolean' }
      ]},
      { name: 'text', type: 'text' },
      { name: 'version', type: 'text' }
    ],
    dataPickObject: true,
    properties: [
      {
        group: 'Source File',
        variables: [
          { name: 'name', optionName: 'name', title: 'Name', type: 'text', required: true, defaultValue: '' },
          { name: 'path', optionName: 'path', title: 'Path', type: 'file', required: true, defaultValue: '', serverDataBindings: true },
          { name: 'output', optionName: 'output', title: 'Output', type: 'boolean', defaultValue: false }
        ]
      }
    ]
  }
]
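
The dataScheme above only describes the fields the data picker will show, so the action has to return an object with the same shape. Below is a rough sketch of how a parser result could be trimmed down to exactly those fields. The function name `toSchemeOutput` is hypothetical, and it assumes the parser result carries `numpages`, `numrender`, `info`, `text` and `version` as the scheme suggests; it is not taken from the module itself.

```javascript
// Hypothetical helper: keep only the fields declared in the dataScheme.
// `data` is assumed to be the object the PDF parser resolves to.
function toSchemeOutput(data) {
  const info = data.info || {};
  return {
    numpages: data.numpages,
    numrender: data.numrender,
    info: {
      PDFFormatVersion: info.PDFFormatVersion,
      // Coerce to real booleans so the values match the declared types.
      IsAcroFormPresent: Boolean(info.IsAcroFormPresent),
      IsXFAPresent: Boolean(info.IsXFAPresent)
    },
    text: data.text,
    version: data.version
  };
}
```

Any extra keys in the parser result (for example other document info entries) are simply dropped, which keeps the output in line with what the picker advertises.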