Forms not submitting with special characters like é

Getting this error
HY000, file: “/home/******/public_html/dmxConnectLib/lib/db/Connection.php”, line: 108, …

SQLSTATE [HY000] General error: 1366 Incorrect string value: ‘\xCC\x011fa…’

This happens on normal text fields as well as on Medium Editor fields. Is there anything I am missing from my head section to add Unicode character support?

Well Paul, please check your database/table collation.

It's set to latin1_swedish_ci.

I can add the funny é char into the cell inside phpMyAdmin; it only errors when I try to add it from a text field on the form.
Can I change that collation thing, which I have never used before, to something different to make it work?

Why not use the utf8 charset (utf8_general_ci collation) for Unicode characters?

Better to use utf8mb4.

And yes you can change it.

Thanks. I have never touched those settings; I always just left them as they were because I had no idea what they were for, so I figured it was better not to touch them, hahaha. Thanks guys, I will change it quickly. Luckily South Africa really does not use many, if any, of those strange é thingymabobs.

Holy pooh, I changed to utf8_general_ci and my field, which was type text, has now changed to blob, and instead of data in the columns I have these [BLOB - 1.4 KiB] links. I'm a little nervous. I should have backed up my database first; that will teach me.

If I turn on Show Blob Contents then some are correct with the expected text that was there and others are just strings and strings of code.

Well glad I did not go too far, I changed back to latin1

I’ve always used general… not because I’m smart, I just have. Why utf8mb4?

What I had to do to change the collation was this: first, fix what phpMyAdmin did. Then, once it was all back to latin1_swedish_ci (which for some reason is the default of my phpMyAdmin, my MySQL server, or InnoDB), I did a full database export.

I opened the file in BBEdit, did a find and replace of latin1 with utf8mb4, dropped all tables from my database, imported the edited file, and wham, all converted and perfect.
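
For anyone who would rather script the find-and-replace step than do it in an editor, a minimal sketch could look like this. The function name `convert_dump_charset` and the sample dump text are made up for illustration. It does the same blunt text replacement as the editor, so it would also rewrite any row data that happens to contain the string latin1; take a backup first.

```python
def convert_dump_charset(dump_sql: str, old: str = "latin1", new: str = "utf8mb4") -> str:
    """Replace every occurrence of the old charset name in a SQL dump.

    This is a plain text replacement: latin1_swedish_ci becomes
    utf8mb4_swedish_ci, and CHARSET=latin1 becomes CHARSET=utf8mb4.
    """
    return dump_sql.replace(old, new)


# Hypothetical fragment of an exported dump, for illustration only.
sample = (
    "CREATE TABLE `posts` (\n"
    "  `body` text CHARACTER SET latin1 COLLATE latin1_swedish_ci\n"
    ") ENGINE=InnoDB DEFAULT CHARSET=latin1;\n"
)

converted = convert_dump_charset(sample)
print(converted)
```

For a real export you would read the dump file into a string, pass it through the function, and write the result to a new file rather than overwriting the original.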

For interest's sake - https://stackoverflow.com/questions/6769901/why-is-mysqls-default-collation-latin1-swedish-ci

My surname ends in an 'é' :slightly_smiling_face: (but I'm not French).
For a while I've been using utf8mb4. I can't quite remember why, but it seemed like the best option at the time. I would need to use this character set if I wanted to store this message in a database, because it supports emojis (but that probably wasn't the reason I chose it).

General, unicode and language locales refer to the collation, that is, how the database sorts and compares the data.

I was referring to using charset utf8mb4 instead of plain utf8, which is a bit old and doesn't fully support Unicode. MySQL's utf8 stores at most 3 bytes per character, while utf8mb4 stores up to 4.
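
To make the 3-byte versus 4-byte difference concrete, here is a small Python sketch that counts how many bytes a character needs in UTF-8. The helper names `utf8_len` and `fits_in_3_byte_utf8` are made up for illustration; the point is that anything needing 4 bytes (such as most emojis) cannot go into MySQL's plain utf8 and needs utf8mb4.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes this text needs when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_3_byte_utf8(text: str) -> bool:
    """True if every character needs at most 3 bytes in UTF-8."""
    return all(utf8_len(ch) <= 3 for ch in text)


e_acute = "\u00e9"      # é
euro = "\u20ac"         # euro sign
smiley = "\U0001F642"   # slightly smiling face emoji

for name, ch in [("e-acute", e_acute), ("euro", euro), ("smiley", smiley)]:
    print(name, utf8_len(ch), "byte(s), fits in 3-byte utf8:", fits_in_3_byte_utf8(ch))
```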

Of course, once you have decided to use utf8mb4 (right now it is the recommended charset), you should decide on your collation type.

As for going for general or unicode collations. General is usually faster than unicode but “less correct” when comparing or sorting.

In the latest MySQL versions (8+, I think) the recommended collation for the utf8mb4 charset is “utf8mb4_0900_ai_ci”. Nonetheless, general and unicode will still do the job.
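
The "ai" and "ci" suffixes in that collation name stand for accent-insensitive and case-insensitive. As a rough analogy only (this is not how MySQL implements its collations), the Python sketch below compares strings after stripping accents and folding case. The helper names `fold_ai_ci` and `equal_ai_ci` are made up for illustration.

```python
import unicodedata


def fold_ai_ci(text: str) -> str:
    """Strip combining accent marks and fold case, for comparison only."""
    decomposed = unicodedata.normalize("NFD", text)  # é becomes e + combining accent
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.casefold()


def equal_ai_ci(a: str, b: str) -> bool:
    """Accent-insensitive, case-insensitive equality (an approximation)."""
    return fold_ai_ci(a) == fold_ai_ci(b)


print(equal_ai_ci("Caf\u00e9", "cafe"))  # accents and case are ignored
print(equal_ai_ci("cafe", "cake"))       # different letters still differ
```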

I am going to bug you for one more piece of info on this, sorry. I just can’t seem to find a direct answer for it.

If the latin collation I had was 1 byte, utf8 is 3 bytes and utf8mb4 is 4 bytes, does that mean a tinytext of 256 char length in utf8mb4 is only (256/4) = 64 char length now, or is tinytext always 256 chars regardless of its collation?

The number of bytes the charset uses (not the collation) determines how many characters it can represent. To represent all the characters that Unicode has now, you need up to 4 bytes per character.

The size of the data types is a separate thing, and it is defined by the database engine.
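
To put numbers on that: the MySQL documentation gives the TINYTEXT limit as 255 bytes, and notes that fewer characters fit when the value contains multibyte characters. So the capacity is not a fixed 255/4; it depends on which characters are actually stored. The sketch below is only an illustration in Python, with Python's "latin-1" and "utf-8" codecs standing in for MySQL's latin1 and utf8mb4, and a made-up helper name `fits_in_tinytext`.

```python
TINYTEXT_MAX_BYTES = 255  # TINYTEXT limit from the MySQL docs, counted in bytes


def fits_in_tinytext(text: str, encoding: str = "utf-8") -> bool:
    """True if the encoded text is within the TINYTEXT byte limit."""
    return len(text.encode(encoding)) <= TINYTEXT_MAX_BYTES


ascii_text = "a" * 255             # 1 byte each in both encodings
accented = "\u00e9" * 255          # é: 1 byte in latin-1, 2 bytes in UTF-8
emojis = "\U0001F642" * 64         # 4 bytes each in UTF-8

print(fits_in_tinytext(ascii_text))             # plain ASCII, 255 bytes
print(fits_in_tinytext(accented, "latin-1"))    # 255 bytes in latin-1
print(fits_in_tinytext(accented))               # 510 bytes in UTF-8
print(fits_in_tinytext(emojis))                 # 256 bytes in UTF-8
```

Plain ASCII text keeps its full 255 characters under utf8mb4; only text heavy in accented or 4-byte characters loses room.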

Aha, thank you, that’s exactly what I wanted to know
