How to get string objects instead of Unicode from JSON

by Anton Shumikhin · Feb 18, 2025

In Python 2, json.loads() returns unicode objects for JSON strings. The encoding argument only tells the decoder how to interpret a byte-string input; it does not change the type of the result (and Python 3.9 removed the argument entirely). To get str objects, encode the decoded values yourself.

import json

data = '{"name": "value"}'
decoded = json.loads(data)  # Python 2: {u'name': u'value'}
strings = dict((k.encode('utf-8'), v.encode('utf-8')) for k, v in decoded.items())

With this, strings holds plain str (byte string) keys and values, which is useful when the rest of your Python 2 code expects byte strings. In Python 3, str is natively Unicode, so no conversion is needed. For anything beyond a flat dictionary of strings, do the conversion with object_hook or object_pairs_hook in json.loads(), or with a recursive helper applied after parsing.
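On Python 3 the point about native Unicode strings is easy to confirm. A minimal check, assuming a Python 3 interpreter; the variable names are illustrative only:

```python
import json

# Python 3: JSON strings decode to str, which is Unicode text by design.
decoded = json.loads('{"name": "value"}')
key_type = type(next(iter(decoded)))
value_type = type(decoded["name"])

# Bytes are only produced when you ask for them explicitly.
as_bytes = decoded["name"].encode("utf-8")
```

Both key_type and value_type are str here, so the conversion tricks in this article only matter on Python 2.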

Customized JSON decoding

Implementing object_hook for direct conversion

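Large JSON? Go for object_hook. json.loads() calls it on every decoded JSON object (dict), so you can convert keys and values while parsing instead of walking the whole result afterwards. The hook below targets Python 2 and encodes only text, leaving numbers, booleans, and nested containers untouched; strings inside lists are not converted.

import json

def string_decoder(dct):
    # Encode unicode text to UTF-8 byte strings; pass other types through (Python 2)
    def to_str(item):
        return item.encode('utf-8') if isinstance(item, unicode) else item
    return {to_str(key): to_str(value) for key, value in dct.items()}

data = '{"name": "value"}'
strings = json.loads(data, object_hook=string_decoder)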
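The object_pairs_hook variant works much the same way, except that the hook receives a list of (key, value) pairs rather than a dict. The following is only a sketch: encode_pairs and text_type are hypothetical names introduced for illustration, and the try/except shim is an assumption added so the snippet runs on both Python 2 and Python 3 (where it produces bytes objects).

```python
import json

try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str      # Python 3 (assumption: shim so the sketch runs on both)

def encode_pairs(pairs):
    """Build a dict from (key, value) pairs, encoding text to UTF-8 bytes."""
    def enc(item):
        return item.encode("utf-8") if isinstance(item, text_type) else item
    return dict((enc(key), enc(value)) for key, value in pairs)

result = json.loads('{"name": "value", "count": 3}',
                    object_pairs_hook=encode_pairs)
```

Like the object_hook version, this touches only the direct keys and values of each object; strings nested inside lists would need a recursive helper.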

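Embracing ruamel.yaml: The stick-to-string rule

JSON is very nearly a subset of YAML 1.2, so a YAML parser such as ruamel.yaml can load JSON documents too. On Python 2, its safe loader returns ASCII-only strings as str objects; strings containing non-ASCII characters still come back as unicode, so this "stick to string" rule only holds for ASCII data.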

from ruamel.yaml import YAML

yaml = YAML(typ='safe')
data = '{"name": "value"}'
strings = yaml.load(data)

Byteify: Travel back in time to Python 2

Stuck on Python 2 and surrounded by evil Unicode objects? byteify to the rescue! The function below walks the decoded data recursively and encodes every unicode object to a UTF-8 byte string. It uses a dict comprehension, so it needs Python 2.7.

import json

def byteify(input):
    # Dicts: convert keys and values recursively
    if isinstance(input, dict):
        return {byteify(key): byteify(value) for key, value in input.iteritems()}
    # Lists: convert each element
    elif isinstance(input, list):
        return [byteify(element) for element in input]
    # Text: encode to a UTF-8 byte string
    elif isinstance(input, unicode):
        return input.encode('utf-8')
    # Numbers, booleans, None: return unchanged
    else:
        return input

data = u'{"name": "value"}'
# Convert after parsing so top-level lists and strings are covered too
strings = byteify(json.loads(data))

Advanced techniques: String from Unicode in every situation

Journey into the deep: Handling nested JSON

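Deeply buried JSON treasures require a smarter object_hook. json.loads() calls the hook on the innermost objects first, so by the time an outer object arrives its nested dicts are already converted. An ignore_dicts flag lets the function skip them instead of walking them again, which keeps the work proportional to the size of the document; a final call on the result covers top-level lists and strings.

def byteify_deep(obj, ignore_dicts=False):
    if isinstance(obj, unicode):
        return obj.encode('utf-8')
    if isinstance(obj, list):
        return [byteify_deep(item, True) for item in obj]
    # Dicts handed over by object_hook are already converted; skip them on later passes
    if isinstance(obj, dict) and not ignore_dicts:
        return {byteify_deep(k, True): byteify_deep(v, True) for k, v in obj.iteritems()}
    return obj

data = u'{"name": {"first": "John", "last": "Doe"}}'
strings = byteify_deep(json.loads(data, object_hook=byteify_deep), ignore_dicts=True)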

Old Python, new tricks

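Python 2.6 and earlier have no dict comprehensions. No problem: use a good ol' loop for the dict branch of the byteify function.

def byteify(input):
    if isinstance(input, dict):
        result = {}
        # Python's version of 'old is gold': a plain loop instead of a comprehension
        for key, value in input.iteritems():
            result[byteify(key)] = byteify(value)
        return result
    elif isinstance(input, list):
        return [byteify(element) for element in input]
    elif isinstance(input, unicode):
        return input.encode('utf-8')
    return input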

Choose your battles: Updating Unicode-unaware libraries

Not all libraries are created equal. Some are Unicode-unaware and choke on unicode objects, which is like not serving everyone at the party. Upgrade them when you can: a Unicode-aware version removes the need for conversion and helps avoid encoding issues lurking in the shadows.

Post-parsing: The selective encoding art

Got mixed content in your JSON data and want to convert only specific values to byte strings? A custom conversion step after parsing is your Picasso. Use encode() rather than str(), which raises UnicodeEncodeError on non-ASCII text in Python 2.

strings = [item.encode('utf-8') if isinstance(item, unicode) else item for item in parsed_JSON_list]
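The same idea extends to dictionaries when only certain fields should become byte strings. A minimal sketch under stated assumptions: encode_selected, text_type, and the sample record are hypothetical, introduced purely for illustration, and the try/except shim is there so the snippet runs on both Python 2 and Python 3.

```python
try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str      # Python 3 (assumption: shim so the sketch runs on both)

def encode_selected(record, keys, encoding="utf-8"):
    """Return a copy of record with text values under `keys` encoded to bytes."""
    converted = dict(record)  # shallow copy; the input dict is left untouched
    for key in keys:
        value = converted.get(key)
        if isinstance(value, text_type):
            converted[key] = value.encode(encoding)
    return converted

# Hypothetical parsed record: only "name" should become a byte string.
record = {"name": u"Zo\u00eb", "note": u"keep me as text", "age": 30}
result = encode_selected(record, ["name", "age"])
```

Values that are not text, such as the integer under "age", pass through unchanged even when their key is listed.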

Ensure JSON integrity: A step ahead

Prevention is better than cure. Validate the JSON before converting it: json.loads() raises ValueError on malformed input, so catch that at the parsing step rather than chasing unexpected conversion errors later. Nobody likes malformed JSON.
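One way to put that check in front of any conversion step is a small wrapper around json.loads(). This is a sketch, not a prescribed pattern; try_parse is a hypothetical helper name.

```python
import json

def try_parse(text):
    """Return (True, value) for valid JSON, or (False, error message) otherwise."""
    try:
        return True, json.loads(text)
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError on Python 3
        return False, str(exc)

ok, value = try_parse('{"name": "value"}')
bad_ok, error = try_parse('{"name": "value"')  # missing closing brace
```

Returning a flag alongside the value keeps a valid JSON null distinguishable from a parse failure.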