Word (.docx) conversion to HTML
Introduction
Consider this a beta release, as I am still "tweaking" some areas, but I thought I would give you all a look.
This module takes a .docx file (Microsoft Word) and converts it to an HTML document.
It is built on the mammoth npm package, and text conversion is quite effective.
Embedded images are converted to base64 strings.
There are some limitations related to image alignment as the module prioritises structure over presentation.
For those with a good understanding of Word formatting, there is a class conversion feature.
NPM source
Installing
Install as per these instructions
Interface
The extension is loaded to the "Data Transformations" group.
(Note: in v1.0.0 the module was incorrectly loaded to a group called "RSS Functions". This was corrected in v1.0.1.)
Docx file to convert.
This is simply the path to the file, normally available in the /Public tree.
Style mappings.
These can be either simple or complex.
For example, <p> tags could be mapped to <h6> tags, or more complex mappings based on .docx classes can be added.
For extended details I refer you to the mammoth npm, where mappings are explained in detail. See the section on Styles and Style Mappings.
PLEASE NOTE: Mammoth and style mappings can lead to issues. I am still working on improving the integration, so some mappings may fail. Treat this as experimental for now. An incorrect style mapping will throw a console error and all mappings will be ignored.
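Because one incorrect mapping means all mappings are ignored, it can help to sanity-check them before adding them to the extension. The snippet below is only a rough sketch and is not part of the module or of mammoth: `checkStyleMap` is a hypothetical helper. It assumes mammoth's documented style map format (one mapping per line, written as `source => target`, with blank lines and lines starting with `#` ignored) and only catches the most obvious slips.

```javascript
// Hypothetical pre-check for style mappings (not part of the module or of mammoth).
// Assumes one mapping per line in the form "source => target".
function checkStyleMap(text) {
  const problems = [];
  text.split(/\r?\n/).forEach((line, index) => {
    const trimmed = line.trim();
    // Blank lines and "#" comment lines are skipped.
    if (trimmed === "" || trimmed.startsWith("#")) return;
    const parts = trimmed.split("=>");
    // Expect exactly one "=>" with something on its left-hand side.
    if (parts.length !== 2 || parts[0].trim() === "") {
      problems.push({ line: index + 1, text: trimmed });
    }
  });
  return problems;
}

const sample = [
  "p[style-name='Heading 1'] => h6:fresh",
  "# a comment",
  "",
  "p.oops h6",
].join("\n");

console.log(checkStyleMap(sample));
```

An empty result only means the lines look like mappings; mammoth may still reject a mapping that this coarse check lets through.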
Sample use
I start with a simple Word document.
I pass it to the module and get an output of:
html: "<h1>Leave Policy</h1><h2>Purpose</h2><p>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p><h2>Scope</h2><p>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p><p>do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p><h2>1. Annual Leave</h2><h3>Entitlement</h3><p><strong>Full-time employees</strong>: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do.</p><ul><li><strong>Part-time employees</strong>: do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim</li><li><strong>Casual employees</strong> do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim </li><li>do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim Procedure</li><li>do eiusmod tempor incididunt ut labore et dolore magna aliqua. 
Ut enim ad minim Approval is subject to business needs, and refusals will be reasonable and communicated in writing.</li><li>do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim </li></ul><h3>Payment</h3><ul><li>Paid do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim greement.</li></ul><h2>2. Personal/Carer’s Leave</h2><h3>Entitlement</h3><ul><li><strong>Full-time employees</strong>: 10 do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim essively.</li><li><strong>Part-time employees</strong>: Pro-rata based on hours worked.</li><li><strong>Casual employees</strong>: do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim occasion.</li><li>do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim.</li><li>do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim.</li></ul><h3>Procedure</h3><ul><li>do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim.</li><li>A do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim.</li><li>do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim ils.</li></ul><h3>Payment</h3><ul><li>Paid do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim.</li></ul><h2>3. Compassionate/Bereavement Leave</h2><h3>Entitlement</h3><ul><li><strong>Full-time and part-time employees</strong>: 2 days of do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim /injury.</li><li><strong>Casual employees</strong>: 2 days of unpaid compassionate leave per occasion.</li><li>Additional do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim.</li></ul><h3>Procedure</h3><ul><li>eiusmod tempor incididunt ut labore et dolore o eiusmod tempor incididunt ut labore et dolore sted.</li></ul><h3>Payment</h3><ul><li>eiusmod tempor incididunt ut labore et dolore hours.</li></ul>"
which displays on a webpage as:
Pretty good, yes?
I now decide I want to map all Word Heading1 tags to HTML <h6>.
I add the mapping to the extension, whereupon all Heading1 tags should be mapped to <h6>.
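For reference, this is roughly what such a mapping looks like when mammoth is called directly. It is a sketch under assumptions rather than the module's own code: it assumes the Word style is the built-in one named "Heading 1", and `convertWithMapping` is a name I made up for illustration. The `styleMap` option and the `=> h6:fresh` syntax are as described in the mammoth documentation.

```javascript
// Mapping in mammoth's style map syntax: paragraphs with the Word style
// "Heading 1" (assumed style name) become <h6> elements.
const headingToH6 = "p[style-name='Heading 1'] => h6:fresh";

// Hypothetical wrapper; mammoth is loaded lazily so the mapping above can be
// inspected without the package installed.
async function convertWithMapping(docxPath) {
  const mammoth = require("mammoth");
  const result = await mammoth.convertToHtml(
    { path: docxPath },
    { styleMap: [headingToH6] }
  );
  // mammoth reports warnings and errors alongside the generated HTML.
  result.messages.forEach((m) => console.warn(m.type, m.message));
  return result.value; // the generated HTML string
}
```

In the extension itself only the mapping line goes into the Style mappings field; the wrapper is there purely to show where mammoth picks it up.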
Images
Embedded images are converted to base64 and returned as <img> tags.
So a Word page like this, with an image added:
results in an output of:
which carries over nicely to the HTML output.
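To make the base64 step concrete, here is a minimal sketch of how image bytes end up inside an <img> tag as a data URI. `toImgTag` is a hypothetical helper for illustration only; the exact attributes in the module's output may differ.

```javascript
// Hypothetical illustration: turn raw image bytes into an <img> tag whose
// src is a base64 data URI.
function toImgTag(bytes, contentType) {
  const base64 = Buffer.from(bytes).toString("base64");
  return `<img src="data:${contentType};base64,${base64}" />`;
}

// Two placeholder bytes stand in for real image data.
console.log(toImgTag(Buffer.from("Hi"), "image/png"));
// prints: <img src="data:image/png;base64,SGk=" />
```

Because the whole image travels inside the HTML string, a document with large images produces a correspondingly large output.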
Raw Output
Checking Raw Output strips all formatting from the output and adds a newline at the end of each section.
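The effect is similar to the sketch below, which strips tags from an HTML string and puts a newline after each paragraph, heading and list item. This is an approximation for illustration, not the module's implementation: `toRawText` is a hypothetical name, and the module may well rely on mammoth's own `extractRawText` instead.

```javascript
// Hypothetical approximation of raw output: drop all tags, ending each
// paragraph, heading and list item with a newline.
function toRawText(html) {
  return html
    .replace(/<\/(p|h[1-6]|li)>/gi, "\n") // block-level closers become newlines
    .replace(/<[^>]+>/g, "");             // every remaining tag is removed
}

const html =
  "<h1>Leave Policy</h1><p><strong>Full-time employees</strong>: 10 days</p>";
console.log(JSON.stringify(toRawText(html)));
// prints: "Leave Policy\nFull-time employees: 10 days\n"
```

Note that this sketch does not decode HTML entities such as &amp;amp;, so it is only suitable for a quick look at the text.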
Debug
Turns console logging messages on or off.
Issues
As always, for any issues or questions, you know where I am!