Unicode Normalization

This question is specifically about Unicode normalization and account takeover on registration forms.

Does Wappler automatically ensure that a canonical encoding is used across all text and that no invalid characters are present? Or is this something that needs to be manually configured?

The validator should not accept a second registration that differs from an existing one only by look-alike Unicode characters.

For example, register an account whose name contains a Unicode look-alike letter:

U+212A (the Kelvin sign) normalizes to K and can be sent URL-encoded as %e2%84%aa.

  • An attacker could use this to hijack an existing account by creating the same one with Unicode characters and assigning different sign-in components (email/username/phone number etc.)
  • Records in the database of the old account could be overwritten, which could allow logging in using the new password.
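
A minimal sketch of the scenario above, runnable in Node.js. The account name `Kate` is a hypothetical example, not something from a real application; the only facts taken from the post are the code point U+212A and its URL encoding.

```javascript
// Decode the URL-encoded payload from the example above.
const kelvin = decodeURIComponent('%e2%84%aa');

// The decoded value is the single code point U+212A (KELVIN SIGN).
console.log(kelvin.codePointAt(0).toString(16)); // 212a

// A plain comparison treats it as different from Latin K...
console.log(kelvin === 'K'); // false

// ...but after Unicode normalization both collapse to the same string.
console.log(kelvin.normalize('NFKC') === 'K'); // true

// Hypothetical existing account and attacker-supplied username.
const existing = 'Kate';
const submitted = decodeURIComponent('%e2%84%aaate');

console.log(submitted === existing); // false
console.log(submitted.normalize('NFKC') === existing.normalize('NFKC')); // true
```

If one layer compares the raw strings and another compares normalized ones, the two layers can disagree on whether the account already exists, which is the gap the attack relies on.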

@JonL expert opinion plz

Assuming you are using Node.js as the server model, you can check the backend validators here:

lib/validator/

If you are using email as the identity, you shouldn’t have issues as long as you always lowercase it and apply the email validator.

However, if you are using a simple text field as the identity (e.g. a username), you should make sure to clean up the input and allow only what you want.
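
One way to "allow only what you want" is an allowlist check after folding the input, sketched below. The function name `cleanUsername`, the pattern, and the length limits are all assumptions made for illustration; they are not part of Wappler or its validators.

```javascript
// Hypothetical policy: 3-32 characters, lowercase ASCII letters, digits, '_', '.', '-'.
const USERNAME_PATTERN = /^[a-z0-9_.-]{3,32}$/;

function cleanUsername(input) {
  // Fold compatibility forms and case first, then check against the allowlist.
  const candidate = String(input).normalize('NFKC').trim().toLowerCase();
  return USERNAME_PATTERN.test(candidate) ? candidate : null;
}

console.log(cleanUsername('JonL'));        // 'jonl'
console.log(cleanUsername('\u212Aate'));   // 'kate' (Kelvin sign folded to k)
console.log(cleanUsername('Am\u00e9lie')); // null (é is outside the allowlist)
```

The cleaned value, not the raw input, is what would then be checked for uniqueness and stored. A stricter variant could test the raw input against the pattern and reject anything non-ASCII outright instead of folding it.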

Wappler doesn’t take care of this beyond what the current validators disallow.

@patrick maybe you can look into:

Not sure if it will be an issue. The email indeed looks the same, but in JavaScript comparing the two strings returns false, so the difference is detected. With databases the result can differ depending on the collation used; a binary compare would detect the difference.

Btw, U+212A is the Kelvin sign; it looks like a K but isn’t the same character.

Thanks everyone for the replies on this. I shall do a PoC on my register forms and try to break them with the goal of registering two users with competing information. I will report back!

I believe there are going to be multiple layers of vulnerability here, unfortunately (db vs js). It needs a PoC for sure :wink:

Thanks @patrick

Yeah. That’s an issue too :smiley:

Normally you want to normalise the string before adding it to the database. Take the example from the link I left above.

const name1 = '\u0041\u006d\u00e9\u006c\u0069\u0065';
const name2 = '\u0041\u006d\u0065\u0301\u006c\u0069\u0065';

name1 is Amélie (with a precomposed é, U+00E9) and name2 is Amélie (an e followed by the combining acute accent U+0301), and as you said, if you compare them they are different.

You wouldn’t want to allow those two usernames to be saved in the database as someone with malicious intent could abuse the similarity of the usernames.

So for that you would want to use the normalize() function before comparing the post data with the data in the database to disallow additional registrations.
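
As a rough sketch of that check, using the two strings from the example: the `takenUsernames` set and the `isUsernameTaken` helper are hypothetical stand-ins for a database lookup, and NFC is just one reasonable choice of form.

```javascript
const name1 = '\u0041\u006d\u00e9\u006c\u0069\u0065';       // precomposed é
const name2 = '\u0041\u006d\u0065\u0301\u006c\u0069\u0065'; // e + combining acute

console.log(name1 === name2);            // false
console.log(name1.length, name2.length); // 6 7
console.log(name1.normalize('NFC') === name2.normalize('NFC')); // true

// Hypothetical in-memory stand-in for the users table; values are stored normalized.
const takenUsernames = new Set([name1.normalize('NFC')]);

function isUsernameTaken(submitted) {
  // Normalize the posted value the same way as the stored values before comparing.
  return takenUsernames.has(submitted.normalize('NFC'));
}

console.log(isUsernameTaken(name2)); // true
```

The important part is that the same form is applied both when writing to the database and when comparing against it; mixing forms brings the mismatch back.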

And as @obsidianux said, the different layers can affect how a password reset is treated, for instance.

In summary, adding normalization is a good practice to stop bad actors from abusing visually similar usernames, but it also adds complexity, because the different layers (browser, server and database) need to be in sync in how they treat Unicode.

It’s not a trivial issue.

Nope, not a trivial issue, which is why I was concerned.

Normalizing is indeed a good practice, but it only helps with characters that can also be represented in a decomposed form, like umlaut characters.

In the example from the topic the normalize would not work, since it is not the same character represented in a different way. In his example it was a completely different character that visually looks the same as the other character.
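
How normalize() treats a given look-alike is easy to check directly, and the result depends on the character. In the sketch below the Cyrillic а (U+0430) is an illustrative pick that is not from the thread. As far as I can tell, it has no mapping to Latin a in any normalization form, whereas the Kelvin sign does carry a canonical mapping to K.

```javascript
const latinA = 'a';          // U+0061 LATIN SMALL LETTER A
const cyrillicA = '\u0430';  // U+0430 CYRILLIC SMALL LETTER A (looks like Latin a)
const kelvinSign = '\u212A'; // U+212A KELVIN SIGN (looks like Latin K)

for (const form of ['NFC', 'NFD', 'NFKC', 'NFKD']) {
  console.log(
    form,
    cyrillicA.normalize(form) === latinA, // false: a distinct letter, never folded
    kelvinSign.normalize(form) === 'K'    // true: canonically equivalent to K
  );
}

// Hypothetical extra guard for look-alikes that normalization leaves untouched.
const isAsciiOnly = (s) => /^[\x00-\x7F]*$/.test(s);
console.log(isAsciiOnly('p\u0430trick')); // false
```

So normalization covers canonically or compatibly equivalent characters, while genuinely distinct look-alikes from other scripts need a separate rule, such as an allowlist of permitted characters.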

I was referring to this. Once I started digging into this I expanded beyond the OP.

Sorry for opening a can of worms :laughing:
