This is the first of a few extensions I plan on publishing over the next few months as I work on a rather large project. This extension is rather simple: it integrates the sanitize-html package into Wappler.
sanitize-html is tolerant. It is well suited for cleaning up HTML fragments such as those created by CKEditor and other rich text editors. It is especially handy for removing unwanted CSS when copying and pasting from Word.
sanitize-html allows you to specify the tags you want to permit, and the permitted attributes for each of those tags. If an attribute is a known non-boolean value, and it is empty, it will be removed. For example, checked can be empty, but href cannot.
If a tag is not permitted, the contents of the tag are not discarded. There are some exceptions to this, discussed in the “Discarding the entire contents of a disallowed tag” section of the sanitize-html README.
The syntax of poorly closed p and img elements is cleaned up.
href attributes are validated to ensure they only contain http, https, ftp and mailto URLs. Relative URLs are also allowed. Ditto for src attributes.
Allowing particular URLs as a src to an iframe tag by filtering hostnames is also supported.
HTML comments are not preserved. Additionally, sanitize-html escapes ALL text content: this means that ampersands, greater-than, and less-than signs are converted to their equivalent HTML character references (& → &amp;, < → &lt;, and so on). Additionally, in attribute values, quotation marks are escaped as well (" → &quot;).
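To make the escaping rules above concrete, here is a minimal sketch of the character mapping. The function names escapeText and escapeAttribute are made up for illustration, and this is a simplified model, not the package's actual implementation (for instance, it does not treat entities that are already present in the input specially).

```javascript
// Simplified model of the escaping described above.
// escapeText and escapeAttribute are hypothetical names, not sanitize-html APIs.

// Text content: ampersands, less-than and greater-than signs become character references.
function escapeText(text) {
  return text
    .replace(/&/g, '&amp;') // must run first so later references are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Attribute values: same as text, plus double quotation marks.
function escapeAttribute(value) {
  return escapeText(value).replace(/"/g, '&quot;');
}

console.log(escapeText('Tom & Jerry <3'));
console.log(escapeAttribute('say "hi"'));
```

The ampersand is replaced first on purpose: doing it last would escape the ampersands that the other replacements had just produced.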
Why would I need this?
Here’s one example: When you’re building a project, you might want to use a WYSIWYG editor like Quill, Summernote, or CKEditor to handle text boxes with formatting. These text editors allow users to input/create formatted text in an HTML format, which could potentially contain malicious code. Most of these editors only handle protection on the front end, and require you to implement it on the backend yourself.
While displaying content directly from the database using InnerHTML in Wappler should not directly execute any harmful code, it’s considered good practice to sanitize user input before storing it in the database in the first place, just in case.
It’s also useful for making sure users are in fact only using allowed tags. For example, if you disable headings in the Quill toolbar, you might also want to check on the backend that the submitted HTML does not contain headings. Or perhaps you simply want to check that iframe code only allows certain domains, or just clean up broken HTML.
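For the iframe case, the idea of filtering by hostname or domain can be sketched roughly like this. The function name isAllowedIframeSrc and the example hosts are assumptions for illustration only. The sketch also rejects relative and protocol-relative URLs outright, which is a simplification and not necessarily what the package does.

```javascript
// Rough sketch of hostname/domain filtering for an iframe src.
// isAllowedIframeSrc is a hypothetical helper, not part of sanitize-html or this extension.
function isAllowedIframeSrc(src, allowedHostnames, allowedDomains) {
  let url;
  try {
    url = new URL(src); // global URL class in Node.js; throws on relative URLs
  } catch (err) {
    return false; // simplification: anything that is not an absolute URL is rejected
  }
  const host = url.hostname.toLowerCase();

  // Hostnames must match exactly.
  if (allowedHostnames.includes(host)) return true;

  // Domains match the domain itself or any of its subdomains.
  return allowedDomains.some((d) => host === d || host.endsWith('.' + d));
}

// Example values only.
console.log(isAllowedIframeSrc('https://www.youtube.com/embed/abc', ['www.youtube.com'], []));
console.log(isAllowedIframeSrc('https://us02web.zoom.us/j/1', [], ['zoom.us']));
console.log(isAllowedIframeSrc('https://notzoom.us/j/1', [], ['zoom.us']));
```

Checking for a leading dot in the domain comparison is what keeps a lookalike host such as notzoom.us from passing as a subdomain of zoom.us.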
Small Example:
If the HTML text set on the Sanitize HTML action is “<p>text</p> <h3>text 2</h3> <h4>text 3</h4>”, but you only have the allowed tags set as “<p><h4>”, the returned text would be “<p>text</p> text 2 <h4>text 3</h4>”, as “<h3>” is not an allowed tag.
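The behaviour in this example (the tag is dropped, its contents stay) can be mimicked with a toy function. To be clear, this is only a model to show the effect: stripDisallowedTags is a made-up name, it uses a regular expression, and a regex is not a safe way to sanitize real HTML. The extension relies on the sanitize-html package for the actual work.

```javascript
// Toy model only: do NOT use a regex like this to sanitize real user input.
// stripDisallowedTags is a hypothetical name used for illustration.
function stripDisallowedTags(html, allowedTags) {
  const allowed = new Set(allowedTags.map((t) => t.toLowerCase()));
  // Match opening and closing tags and capture the tag name.
  return html.replace(/<\/?([a-z][a-z0-9]*)\b[^>]*>/gi, (tag, name) =>
    // Keep allowed tags untouched; drop the markup of any other tag but leave its contents.
    allowed.has(name.toLowerCase()) ? tag : ''
  );
}

const input = '<p>text</p> <h3>text 2</h3> <h4>text 3</h4>';
const output = stripDisallowedTags(input, ['p', 'h4']);
console.log(output); // <p>text</p> text 2 <h4>text 3</h4>
```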
Config:
This server connect extension currently has the following options in the Wappler UI:
disallowedTagsMode, allowedTags, nonBooleanAttributes, allowedIframeHostnames, allowedIframeDomains, allowIframeRelativeUrls
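These names match options of the sanitize-html package itself, so a configuration using all six might look roughly like the object below. The values are example assumptions, not the extension's defaults, and how the extension turns the UI fields into this object internally is also an assumption on my part here.

```javascript
// Illustrative options object; every value below is an example, not a default of the extension.
const sanitizeOptions = {
  // What to do with tags that are not allowed: 'discard' drops the tag but keeps its contents.
  disallowedTagsMode: 'discard',
  // Tags that are kept in the output.
  allowedTags: ['p', 'h4', 'iframe'],
  // Attributes that are removed when their value is empty.
  nonBooleanAttributes: ['href', 'src'],
  // Exact hostnames permitted as an iframe src.
  allowedIframeHostnames: ['www.youtube.com', 'player.vimeo.com'],
  // Domains (including subdomains) permitted as an iframe src.
  allowedIframeDomains: ['zoom.us'],
  // Whether relative URLs are accepted as an iframe src.
  allowIframeRelativeUrls: false,
};

// With the package installed, the object is passed as the second argument:
// const sanitizeHtml = require('sanitize-html');
// const clean = sanitizeHtml(dirtyHtml, sanitizeOptions);
console.log(Object.keys(sanitizeOptions).join(', '));
```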
The current version (1.0.0) has been put together relatively quickly. As I use the extension more on my own projects, I might add some more options or change some things around, but from brief testing, it all works fine. You can also contribute on GitHub.
Install:
You can install this extension automatically by following the steps here:
NPM:
Changelog:
- Fixed data bindings not working (20/03/2024)