
How to get string objects instead of Unicode from JSON

python
json-decoding
unicode-conversion
object-hook
by Anton Shumikhin · Feb 18, 2025
TLDR

In Python 2.7, json.loads() returns unicode objects for JSON strings. The encoding argument only tells the decoder how to interpret a byte-string input; it does not change the type of the output. To get str objects, encode the parsed values yourself.

    import json

    data = '{"name": "value"}'
    parsed = json.loads(data)  # Python 2: keys and values come back as unicode
    strings = dict((key.encode('utf-8'), value.encode('utf-8')) for key, value in parsed.items())

This works for a flat object whose values are all strings; nested or mixed data needs a recursive function such as the byteify shown further down. In Python 3.x, str is natively Unicode, so no conversion is required, and the encoding argument was removed from json.loads() in Python 3.9. If you want the conversion to happen during parsing rather than afterwards, consider implementing object_hook or object_pairs_hook in json.loads().
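For the Python 3 side, a minimal sketch of what "natively Unicode" means in practice. It assumes Python 3.6 or newer, where json.loads() also accepts UTF-8 bytes as input; the variable names are only illustrative.

```python
import json

# json.loads() accepts str, and since Python 3.6 also UTF-8 encoded bytes.
parsed = json.loads(b'{"name": "value"}')

# Decoded JSON strings are str (Unicode text), never bytes.
name = parsed["name"]

# Encode explicitly only where a bytes-based API actually needs it.
name_bytes = name.encode("utf-8")
```

On Python 3 there is usually nothing to undo: keep str everywhere and encode at the boundary.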

Customized JSON decoding

Implementing object_hook for direct conversion

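Large JSON? Go for object_hook. json.loads() calls it on every decoded JSON object (dict), so the conversion happens while parsing and saves you the headache of a separate post-processing pass. Ideal for big data, ain't it?

    import json

    def string_decoder(dct):
        # Python 2: encode unicode keys and values; numbers, lists and nested dicts pass through unchanged.
        return {key.encode('utf-8'): value.encode('utf-8') if isinstance(value, unicode) else value
                for key, value in dct.items()}

    data = '{"name": "value"}'
    strings = json.loads(data, object_hook=string_decoder)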
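object_pairs_hook works the same way, except that it receives each object as an ordered list of (key, value) pairs instead of a dict. Here is a rough sketch, not a definitive implementation: encode_text and pairs_to_bytes are hypothetical helper names, and the text_type shim is an assumption added so the snippet runs on both Python 2.7 and Python 3 (where the encoded values are bytes).

```python
import json
from collections import OrderedDict

try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str  # Python 3: .encode() yields bytes here


def encode_text(value):
    """Encode text as UTF-8; leave every other type untouched."""
    return value.encode("utf-8") if isinstance(value, text_type) else value


def pairs_to_bytes(pairs):
    # Hypothetical hook: called once per JSON object with its (key, value) pairs in document order.
    return OrderedDict((encode_text(key), encode_text(value)) for key, value in pairs)


result = json.loads('{"name": "value", "count": 3}', object_pairs_hook=pairs_to_bytes)
```

Strings that sit inside lists are not reached by this hook, so list-heavy data still needs a recursive pass.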

Embracing ruamel.yaml: Strict to String rule

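For JSON parsing with string output, the ruamel.yaml library can be your savior, since JSON is very nearly a subset of YAML 1.2. Call it a "stick to string" rule: under Python 2 the safe loader returns plain str for ASCII-only strings and falls back to unicode only for values containing non-ASCII characters.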

    from ruamel.yaml import YAML

    yaml = YAML(typ='safe')
    data = '{"name": "value"}'
    strings = yaml.load(data)

Byteify: Travel back in time to Python 2

Stuck on Python 2 and surrounded by evil Unicode objects? byteify to the rescue! Here's how to craft a byteify function that recursively converts them to UTF-8 byte strings. The dict comprehension needs Python 2.7.

    import json

    def byteify(obj):
        if isinstance(obj, dict):
            # Convert both the keys and the values of every dict
            return {byteify(key): byteify(value) for key, value in obj.iteritems()}
        elif isinstance(obj, list):
            # Convert each list element in turn
            return [byteify(element) for element in obj]
        elif isinstance(obj, unicode):
            # Encode text as a UTF-8 byte string (str in Python 2)
            return obj.encode('utf-8')
        else:
            # Numbers, booleans and None pass through unchanged
            return obj

    data = u'{"name": "value"}'
    strings = byteify(json.loads(data))  # run after parsing so top-level lists and strings are covered too
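One detail worth seeing in action: object_hook only fires for JSON objects, so a document whose top level is a list or a bare string keeps its top-level text untouched when byteify is used purely as a hook. The sketch below compares both call styles; the text_type shim is an assumption added so it runs on Python 2.7 and Python 3 alike (on Python 3 the converted values are bytes).

```python
import json

try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str  # Python 3


def byteify(obj):
    """Recursively encode text found in dicts, lists and bare strings."""
    if isinstance(obj, dict):
        return dict((byteify(key), byteify(value)) for key, value in obj.items())
    if isinstance(obj, list):
        return [byteify(item) for item in obj]
    if isinstance(obj, text_type):
        return obj.encode("utf-8")
    return obj


doc = '["top", {"inner": ["a", 1, null]}]'

# As a hook, byteify only sees the inner object; "top" stays text.
hook_only = json.loads(doc, object_hook=byteify)

# Applied to the parsed result, byteify reaches every level.
post_parse = byteify(json.loads(doc))
```

If the payload is always an object at the top level, the hook alone is enough; otherwise the post-parse call is the safer default.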

Advanced techniques: String from Unicode in every situation

Journey into the deep: Handling nested JSON

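Deeply buried JSON treasures require object_hook to be smart. It's all about recursion here, and performance is the holy grail: json.loads() calls the hook once per nested object, innermost first, so the function should skip dicts it has already converted instead of walking them again.

    import json

    def byteify_deep(obj, ignore_dicts=False):
        if isinstance(obj, unicode):
            return obj.encode('utf-8')
        if isinstance(obj, list):
            return [byteify_deep(item, ignore_dicts=True) for item in obj]
        # A dict handed over by object_hook is converted once; dicts nested inside it are already done.
        if isinstance(obj, dict) and not ignore_dicts:
            return {byteify_deep(key, True): byteify_deep(value, True) for key, value in obj.iteritems()}
        return obj

    data = u'{"name": {"first": "John", "last": "Doe"}}'
    strings = byteify_deep(json.loads(data, object_hook=byteify_deep), ignore_dicts=True)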

Old Python, new tricks

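Python 2.6 has no dictionary comprehensions? No problem, use a good ol' loop for the dict branch of the byteify function.

    def byteify(obj):
        if isinstance(obj, dict):
            result = {}
            for key, value in obj.iteritems():  # Python's version of 'old is gold'
                result[byteify(key)] = byteify(value)
            return result
        if isinstance(obj, list):
            return [byteify(element) for element in obj]
        return obj.encode('utf-8') if isinstance(obj, unicode) else obj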

Choose your battles: Updating Unicode-unaware libraries

Not all libraries are created equal. Some are Unicode-unaware. It's like not serving everyone at the party. Upgrade when you can. It'll help avoid encoding issues lurking in the shadows.

Post-parsing: The selective encoding art

Got mixed content in your JSON data and wanna convert only specific values to byte strings? A custom conversion after parsing is your Picasso. Use encode('utf-8') rather than str(), which raises UnicodeEncodeError on non-ASCII text in Python 2.

    strings = [item.encode('utf-8') if isinstance(item, unicode) else item for item in parsed_JSON_list]
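The same idea extends to dicts when only certain fields need to be bytes. A minimal sketch, assuming a flat record: encode_selected and the sample keys are hypothetical, and the text_type shim is there only so the snippet runs on both Python 2.7 and Python 3.

```python
import json

try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str  # Python 3


def encode_selected(record, keys_to_encode):
    """Return a copy of record with text under the chosen keys encoded as UTF-8."""
    converted = {}
    for key, value in record.items():
        if key in keys_to_encode and isinstance(value, text_type):
            converted[key] = value.encode("utf-8")
        else:
            converted[key] = value  # everything else is kept as parsed
    return converted


parsed = json.loads('{"path": "/tmp/report.txt", "title": "Caf\\u00e9", "size": 12}')
result = encode_selected(parsed, {"path"})
```

Here only the path value is encoded; the title keeps its accented character as Unicode text and the number is left alone.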

Ensure JSON integrity: A step ahead

Prevention is better than cure. Always validate JSON file integrity. Corruption can lead to unexpected conversion errors. Nobody likes a malformed JSON.
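One simple way to validate is to let the parser do the work and catch its error before any conversion runs. The helper below is a sketch with a hypothetical name (load_json_checked); it relies only on the fact that json.loads() raises ValueError on malformed input (json.JSONDecodeError is a ValueError subclass on Python 3).

```python
import json


def load_json_checked(text):
    """Parse text as JSON; return (data, None) on success or (None, message) on failure."""
    try:
        return json.loads(text), None
    except ValueError as exc:
        return None, str(exc)


good, good_error = load_json_checked('{"name": "value"}')
bad, bad_error = load_json_checked('{"name": "value"')  # missing closing brace
```

Only hand the data to byteify or any other converter when the error slot is None.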

Fall back on the pros

If all else fails, fall back on the tried and tested Stack Overflow solutions. There's wisdom in experience.

Judge the situation

Specific use case in your application? Torn between keeping Unicode and converting to byte strings? Evaluate before deciding. Remember, context is king.