How to Fix Mojibake: Understanding Text Encodings
Mojibake is rarely random. In most cases, it appears because text was saved with one encoding and read back with another. Once you treat it as an encoding mismatch instead of a mysterious corruption event, recovery becomes much more systematic.
1. What text encoding actually does
Computers do not understand characters directly. They interpret byte sequences according to an encoding rule. Common examples include ASCII, GBK, Shift_JIS, Big5, and UTF-8. UTF-8 is the default choice for most modern web and cross-platform systems.
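To make the byte-level view concrete, here is a minimal sketch that prints the UTF-8 bytes behind two characters. The helper name toHex is an illustrative assumption, not part of any API; TextEncoder is the standard built-in and always produces UTF-8.

```javascript
// Illustrative helper (hypothetical name): render bytes as space-separated hex.
function toHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join(' ');
}

// TextEncoder always encodes to UTF-8.
const utf8Encoder = new TextEncoder();

console.log(toHex(utf8Encoder.encode('A')));  // 41
console.log(toHex(utf8Encoder.encode('中'))); // e4 b8 ad
```

The ASCII letter takes one byte and the CJK character takes three. A different encoding would store the same character as different bytes, which is why the reader must know which rule was used.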
2. Why mojibake happens
The most common cause is a mismatch between the encoding used to save the data and the encoding used to read it. A CSV exported as GBK and opened as UTF-8 usually loads without any error, because the software has no way to know its assumption is wrong, but the characters it displays will be incorrect. This shows up often in legacy files, cross-region collaboration, and older office exports.
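The mismatch can be reproduced in a few lines. This sketch assumes a runtime whose TextDecoder accepts the gbk label (modern browsers, or Node.js built with full ICU data); the byte values are the GBK encoding of "中文".

```javascript
// The bytes a GBK-encoded file would contain for the text "中文".
const gbkBytes = new Uint8Array([0xd6, 0xd0, 0xce, 0xc4]);

// Wrong assumption: decoding does not throw, but the result is garbage.
// Invalid sequences are silently replaced with U+FFFD.
const wrongText = new TextDecoder('utf-8').decode(gbkBytes);

// Right assumption: the same bytes yield the intended characters.
const rightText = new TextDecoder('gbk').decode(gbkBytes);

console.log(JSON.stringify(wrongText));
console.log(rightText); // 中文
```

Nothing was damaged in the first decode. The bytes are unchanged, and only the interpretation differs.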
3. What to check first
- Which system originally produced the file.
- What encoding your editor, browser, or import pipeline assumes by default.
- Whether the problem is only in display or has already been saved back incorrectly.
- Whether the source came from CSV exports, logs, API payloads, or email content with legacy defaults.
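Some of these checks can be partly automated. The following is a rough sketch, not a full detector: the function name inspectBytes and its result fields are assumptions made for illustration. It looks for a UTF-8 byte order mark, tests whether the bytes are valid UTF-8, and scans for the byte sequence EF BF BD, which is U+FFFD as stored in UTF-8 and often means a bad decode has already been saved back.

```javascript
// Hypothetical helper: gather quick hints about raw file bytes (Uint8Array).
function inspectBytes(bytes) {
  // A UTF-8 byte order mark is the three bytes EF BB BF at the start.
  const hasUtf8Bom =
    bytes.length >= 3 &&
    bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf;

  // With fatal: true, TextDecoder throws on malformed input
  // instead of substituting U+FFFD.
  let isValidUtf8 = true;
  try {
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch {
    isValidUtf8 = false;
  }

  // EF BF BD in the raw bytes means a replacement character was persisted,
  // so some of the original bytes are probably gone.
  let hasSavedReplacementChars = false;
  for (let i = 0; i + 2 < bytes.length; i++) {
    if (bytes[i] === 0xef && bytes[i + 1] === 0xbf && bytes[i + 2] === 0xbd) {
      hasSavedReplacementChars = true;
      break;
    }
  }

  return { hasUtf8Bom, isValidUtf8, hasSavedReplacementChars };
}
```

If isValidUtf8 is false, the file is probably in a legacy encoding and the original bytes are still intact. If hasSavedReplacementChars is true, look for an earlier copy of the file before attempting recovery.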
4. Handling it in code
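In the browser, TextDecoder is usually enough for UTF-8, and it also accepts the legacy labels defined in the WHATWG Encoding Standard, such as gbk and shift_jis. Whatever the tool, a legacy encoding has to be named explicitly so the original bytes are interpreted correctly. In Node.js, a library such as iconv-lite is a common way to do that.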
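// Browser: decode UTF-8 bytes (uint8Array holds the raw bytes)
const decoder = new TextDecoder('utf-8');
const text = decoder.decode(uint8Array);
// Node.js with iconv-lite: decode GBK bytes (buffer holds the raw bytes)
import iconv from 'iconv-lite';
// The result is an ordinary JavaScript string, ready to be written out as UTF-8
const decodedText = iconv.decode(buffer, 'gbk');
5. A safer recovery workflow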
The reliable approach is not to keep re-encoding text blindly. First identify the likely source encoding, then convert once into UTF-8 and verify the result. A useful text conversion tool should help you compare outputs quickly, inspect whether the text has recovered, and do all of that without sending the content to an external service.
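One way to compare outputs is to decode the same original bytes once per candidate encoding and review the results side by side. This is a minimal sketch under stated assumptions: the function name previewDecodings and the default candidate list are illustrative, and the runtime's TextDecoder is assumed to support the legacy labels (modern browsers, or Node.js built with full ICU data).

```javascript
// Hypothetical helper: decode the original bytes once per candidate encoding.
function previewDecodings(bytes, candidates = ['utf-8', 'gbk', 'shift_jis', 'big5']) {
  return candidates.map((encoding) => {
    try {
      // fatal: true rejects byte sequences that are invalid in this encoding.
      const text = new TextDecoder(encoding, { fatal: true }).decode(bytes);
      return { encoding, ok: true, text };
    } catch (error) {
      // Either the bytes are invalid for this encoding or the label is unsupported.
      return { encoding, ok: false, text: null };
    }
  });
}

// Example: the GBK bytes for "中文".
const sampleBytes = new Uint8Array([0xd6, 0xd0, 0xce, 0xc4]);
for (const result of previewDecodings(sampleBytes)) {
  console.log(result.encoding, result.ok ? JSON.stringify(result.text) : '(invalid)');
}
```

More than one candidate may decode without error, so a successful decode is only a filter. A person who can read the language still needs to confirm which output is the real text before it is saved as UTF-8. Every attempt starts from the untouched original bytes, which avoids stacking one bad conversion on another.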
You can use our Text Converter to inspect and convert common string representations and encoding-related text issues.
Conclusion
Mojibake is usually an interpretation problem, not a sign that the text is permanently lost. If you identify the source encoding and control the conversion path, most cases are recoverable. Standardizing on UTF-8 and checking encoding assumptions at import time remains the lowest-cost long-term fix.