Is there a way to preview or view parts of MS Word files?

Is there a way to preview, view or see an extract of text from a Word file using Server Connect File Management tools?

I have a situation where I have hundreds and hundreds of MS Word files (.doc, .docm but mainly .docx). My client would like access to these files (easy enough) but the file names are often not good enough to know what the file contains or refers to.

Without actually opening the files one-by-one, is there a way to either preview, view or extract text and display it on screen? The files vary in size from 500 KB to about 12 MB.

If not, can Word files be read and the text contents be stored in a database so they can be searched?

Any suggestions and guidance would be appreciated.

Some ways of extracting data from Word (docx format anyway) are covered in this thread. If you change the .docx to .zip and take the contents apart, you should find what you’re looking for and can decide how feasible it will be to extract it. However, it might not be very helpful. There may well be tools/APIs available which could do what you need more easily.
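To make the zip idea concrete, here is a minimal sketch in Python (not a Wappler feature). It assumes the body text lives in `word/document.xml` inside the archive, which is the standard layout for .docx and .docm; legacy .doc files are a binary format and will not open this way. The `documents` table and the function names are made up for illustration, and text inside nested elements such as text boxes may show up twice.

```python
import sqlite3
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# WordprocessingML namespace used by the elements in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(path):
    """Return the body text of a .docx/.docm file, one line per paragraph."""
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # The visible text of a paragraph is split across <w:t> elements.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text.strip():
            paragraphs.append(text)
    return "\n".join(paragraphs)


def index_folder(folder, db_path):
    """Store each document's text in a hypothetical 'documents' table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents "
        "(filename TEXT PRIMARY KEY, body TEXT)"
    )
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in (".docx", ".docm"):
            continue  # skip .doc and anything else that is not a zip package
        conn.execute(
            "INSERT OR REPLACE INTO documents (filename, body) VALUES (?, ?)",
            (path.name, extract_docx_text(path)),
        )
    conn.commit()
    return conn


def preview(conn, filename, length=300):
    """Return the first `length` characters of a stored document, or None."""
    row = conn.execute(
        "SELECT substr(body, 1, ?) FROM documents WHERE filename = ?",
        (length, filename),
    ).fetchone()
    return row[0] if row else None
```

Once the text is in a table like this, a plain `LIKE '%term%'` query is enough for a basic search, and the same table could be read from a Server Connect database query to show an extract next to each file name.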

Just an idea off the top of my head: batch convert them to PDFs to allow previews in the browser?
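One possible way to do the batch conversion, sketched in Python, is to drive LibreOffice in headless mode. This is an assumption on my part rather than anything built into Wappler: it needs LibreOffice installed on the machine with `soffice` on the PATH, and the folder names and function names below are placeholders.

```python
import subprocess
from pathlib import Path

WORD_SUFFIXES = {".doc", ".docx", ".docm"}


def find_word_files(folder):
    """List the Word files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir() if p.suffix.lower() in WORD_SUFFIXES
    )


def build_convert_command(files, out_dir, soffice="soffice"):
    """Build the LibreOffice command line that converts `files` to PDF."""
    return [
        soffice, "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
    ] + [str(f) for f in files]


def convert_folder(folder, out_dir, batch_size=50):
    """Convert every Word file in `folder`, a batch at a time."""
    files = find_word_files(folder)
    for start in range(0, len(files), batch_size):
        batch = files[start:start + batch_size]
        # check=True raises CalledProcessError if LibreOffice exits with an error
        subprocess.run(build_convert_command(batch, out_dir), check=True)
    return len(files)


if __name__ == "__main__":
    # Placeholder folder names; adjust to wherever the documents live.
    print(convert_folder("word_files", "pdf_previews"), "files converted")
```

Batching keeps the command line short when there are hundreds of files while avoiding a fresh LibreOffice start-up for every single document.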

Cheers @TomD for pointing me at that thread.

And thanks @Hyperbytes for the suggestion, this is a good option. Can you tell me how to batch convert the files?

I have, in the past, created an Excel spreadsheet which extracted everything from a folder full of Word docs and separated everything out so every bookmark, paragraph, form field etc could be catalogued. A real pain in the backside that was. I would like a simple ‘Wappler’ way to do it :smile:

A couple of years ago I worked on a team making an app for short stories. We used pandoc extensively. You might have to create a custom PHP file, or an API action pointed at a custom API that calls it, but it does great things with all sorts of document formats.
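As a rough idea of what the wrapper around pandoc could look like, here is a small Python sketch (the same call could be made from a custom PHP file instead). It assumes `pandoc` is installed and on the PATH; the function names and the 200-character limit are only illustrative. As far as I know pandoc reads .docx but not the legacy binary .doc format, so those files would need converting first.

```python
import subprocess


def build_pandoc_command(path):
    """Command line asking pandoc to turn a .docx file into plain text."""
    return ["pandoc", "--from", "docx", "--to", "plain", str(path)]


def docx_to_text(path):
    """Run pandoc and return the document as plain text."""
    result = subprocess.run(
        build_pandoc_command(path),
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,  # raise if pandoc reports an error
    )
    return result.stdout


def make_excerpt(text, limit=200):
    """Collapse whitespace and cut the text down to a short preview."""
    collapsed = " ".join(text.split())
    if len(collapsed) <= limit:
        return collapsed
    return collapsed[:limit].rstrip() + "..."


if __name__ == "__main__":
    # "example.docx" is a placeholder file name.
    print(make_excerpt(docx_to_text("example.docx")))
```

The excerpt could be returned as JSON from the custom endpoint and shown beside each file name, so the client can tell what a document is about without opening it.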

There are lots of online services and utilities offering various solutions for that.