I’m searching for a way to extract text from a searchable PDF and save it to my database. I know that there are some npm packages for this, but I have no idea how to implement this.
I have absolutely no idea how to adapt the following code in my .js file to make it work:
// fs is needed for readFileSync below
const fs = require("fs");
const pdf = require("pdf-extraction");

let dataBuffer = fs.readFileSync("path to PDF file...");

pdf(dataBuffer)
  .then(function (data) {
    // number of pages
    console.log(data.numpages);
    // number of rendered pages
    console.log(data.numrender);
    // PDF info
    console.log(data.info);
    // PDF metadata
    console.log(data.metadata);
    // PDF.js version
    // check https://mozilla.github.io/pdf.js/getting_started/
    console.log(data.version);
    // PDF text
    console.log(data.text);
  })
  .catch(function (err) {
    // report read or parse failures instead of leaving the rejection unhandled
    console.error(err);
  });
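For the "save it to my database" part, here is a minimal sketch of how the snippet above could be wrapped in one function. The names extractAndStore, saveRecord, and the record fields are hypothetical, not part of pdf-extraction; the parser is passed in as an argument and is only assumed to return a promise with text and numpages, as in the code above. saveRecord stands in for whatever insert call your database client provides.

```javascript
const fs = require("fs");
const path = require("path");

// Reads a PDF from disk, extracts its text, and hands a record to a save function.
// parsePdf:   takes a Buffer and returns a promise of { text, numpages }
//             (the shape used in the snippet above).
// saveRecord: hypothetical placeholder for your own database insert.
async function extractAndStore(pdfPath, parsePdf, saveRecord) {
  const dataBuffer = await fs.promises.readFile(pdfPath);
  const data = await parsePdf(dataBuffer);

  // Hypothetical record layout; adapt the field names to your own schema.
  const record = {
    fileName: path.basename(pdfPath),
    pageCount: data.numpages,
    text: data.text.trim(),
  };

  await saveRecord(record);
  return record;
}

// Example wiring (assumes `npm install pdf-extraction` and a saveRecord of your own):
// const pdf = require("pdf-extraction");
// extractAndStore("path to PDF file...", pdf, async (record) => { /* INSERT here */ })
//   .then((record) => console.log(record.pageCount))
//   .catch((err) => console.error("PDF extraction failed:", err));
```

Passing the parser and the save function in as arguments keeps the extraction step independent of any particular database, so you can try it with a stub before connecting the real thing.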
Hey @patrick, I feel as though the nesting above is incorrect when there are multiple layers, as some data is not being handled well when trying to format/display it. Would you mind taking a look to see if the above is OK?