
What does the 'b' character do in front of a string literal?

python
byte-literals
encoding
unicode
by Anton Shumikhin · Jan 4, 2025
TLDR

The b prefix turns a string literal into a bytes object in Python 3. The literal may contain only ASCII characters, and each one is stored as its byte value (any other byte is written with an escape such as \xNN). This is the type to use for binary data (such as network communication or binary files) rather than text.

Example:

byte_data = b'data'  # bytes object holding the values 100, 97, 116, 97
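To make "a sequence of byte values" concrete, here is a minimal sketch of how a bytes object behaves when indexed and sliced, and of what happens when a non-ASCII character is placed directly in the literal. The variable names are illustrative only.

```python
byte_data = b'data'

# Indexing a bytes object yields an int (the byte value), not a 1-character string.
first_byte = byte_data[0]     # 100, the ASCII code of 'd'
as_list = list(byte_data)     # [100, 97, 116, 97]

# Slicing, by contrast, yields another bytes object.
first_two = byte_data[:2]     # b'da'

# A non-ASCII character typed directly into a bytes literal is a SyntaxError.
# compile() is used here so the error can be caught at runtime.
try:
    compile("b'café'", "<example>", "eval")
    literal_accepted = True
except SyntaxError:
    literal_accepted = False
```

The int-versus-bytes difference between indexing and slicing is a common surprise when porting code that treated byte strings like text.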

Using non-ASCII characters in bytes

A bytes literal cannot contain non-ASCII characters directly. Byte values above 127 are written with escape sequences such as \xNN, or produced by encoding a str.

Example:

# Escaping the non-ASCII bytes
byte_data_with_escape = b'\xf0\x9f\x98\x80'  # UTF-8 bytes of a smiley face emoji

# Encoding a string with a non-ASCII character to bytes
string_with_unicode = "😀"
encoded_data = string_with_unicode.encode('utf-8')  # Smiley face in bytes

This example shows two routes to the same four bytes: writing them out as escape sequences, or encoding a str that contains the character.
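As a quick check that the escaped literal and the encoded string really are the same data, the following sketch compares them and contrasts character count with byte count. Variable names are illustrative.

```python
escaped = b'\xf0\x9f\x98\x80'
encoded = "😀".encode('utf-8')

same_bytes = escaped == encoded   # True: both hold the UTF-8 encoding of U+1F600

char_count = len("😀")            # 1 character in the str
byte_count = len(encoded)         # 4 bytes in the bytes object

hex_view = encoded.hex()          # 'f09f9880', handy for inspecting raw bytes
```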

Bytes vs. str: A Tale of Two Types

In Python, str is a sequence of Unicode characters, while bytes is a sequence of 8-bit values or raw binary data. The encode() and decode() methods transform str to bytes and vice versa, respectively.

Example:

string = "data"
bytes_object = string.encode('utf-8')  # Welcome to binary world!
decoded_string = bytes_object.decode('utf-8')  # Back to human-readable region!
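The round trip only works when both directions use the same encoding. As a small sketch with an illustrative one-character string, the same text produces different bytes under different codecs, and decoding with the wrong one gives garbled output rather than an error:

```python
text = "é"

utf8_bytes = text.encode('utf-8')       # b'\xc3\xa9' (two bytes)
latin1_bytes = text.encode('latin-1')   # b'\xe9' (one byte)

round_trip = utf8_bytes.decode('utf-8')     # 'é', same codec both ways
mismatched = utf8_bytes.decode('latin-1')   # 'Ã©', wrong codec, no exception raised
```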

Appropriate times for a 'b' appearance

Here are the key scenarios that call for the b prefix:

  • Binary file operations
  • Data transmission over a network
  • Data processing with binary-data-friendly APIs or libraries

Example:

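# Binary file operations as in The Matrix
with open('binary_file.bin', 'wb') as file:
    file.write(b'some binary data')  # Writing raw bytes to file

# For the 'netizens': sockets send and receive bytes, not str
import socket
s = socket.socket()
s.connect(('example.com', 80))  # Must connect before sending; example.com is a placeholder host
s.sendall(b'GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n')  # HTTP request as bytes
s.close()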
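The file half of that example can be sketched as a self-contained round trip. This version assumes a temporary directory so it leaves nothing behind, and uses the 8-byte PNG signature purely as sample data. It also shows that a file opened in text mode refuses bytes.

```python
import os
import tempfile

payload = b'\x89PNG\r\n\x1a\n'  # the PNG file signature, used here only as sample bytes

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'binary_file.bin')

    # 'wb' accepts bytes and returns the number of bytes written.
    with open(path, 'wb') as file:
        written = file.write(payload)

    # 'rb' hands the data back as bytes, with no decoding or newline translation.
    with open(path, 'rb') as file:
        read_back = file.read()

    # Text mode ('w') expects str, so writing bytes raises TypeError.
    try:
        with open(path, 'w') as file:
            file.write(payload)
        text_mode_ok = True
    except TypeError:
        text_mode_ok = False
```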

The 'b' and 'r' duo

Combine b with r (raw) to form byte literals in which escape sequences such as \n or \t are not interpreted: the backslash stays a literal backslash byte.

Example:

raw_bytes = br'\n does not start a new line'  # '\n' stays as two bytes: a backslash and 'n'
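A short sketch of the difference, plus one place where raw byte literals are commonly used: regular expression patterns applied to bytes. The sample input is made up for illustration.

```python
import re

cooked = b'\n'    # escape interpreted: a single newline byte
raw = br'\n'      # escape left alone: a backslash followed by 'n'

cooked_len = len(cooked)        # 1
raw_len = len(raw)              # 2
raw_values = list(raw)          # [92, 110]
same_as_escaped = raw == b'\\n' # True

# A bytes pattern must be matched against bytes input; br'' keeps \d intact for the regex engine.
numbers = re.findall(br'\d+', b'id=42;n=7')   # [b'42', b'7']
```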

Bytes and Unicode in harmony

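The b and u prefixes mark the two sides of the bytes/text split. In Python 2, u creates a Unicode string, while b (accepted since 2.6) is simply an alias for the default 8-bit str. In Python 3, str is Unicode by default and b creates a separate bytes type; u is still accepted (since 3.3) but changes nothing. The two prefixes cannot be combined on one literal.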

Example:

unicode_string = 'hello'  # Implicit Unicode in Python 3
byte_string = b'hello'  # Explicitly a byte string

Proceed with caution: Potential issues

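  • Information loss: Encoding str to bytes with a codec that cannot represent every character raises UnicodeEncodeError by default, and silently drops or replaces characters if you pass errors='ignore' or errors='replace'. Decoding with a different codec than the one used to encode yields garbled text. Use the same, sufficiently capable encoding (usually UTF-8) in both directions.
  • TypeError: Concatenating bytes and str directly, or mixing them in operations such as join(), raises a TypeError.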
  • Presentation of byte literals: Byte literals with printable ASCII characters show as ASCII, while others appear as escape sequences.

In code:

# Do not try this at home. TypeError incoming
incorrect_concatenation = b'data' + 'more data'

Here's the right way to do it:

# And the TypeError is gone!
correct_concatenation = b'data' + 'more data'.encode('utf-8')
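The other two pitfalls in the list, information loss and the way byte literals are displayed, can be sketched the same way. The sample strings are illustrative only.

```python
# Strict encoding (the default) refuses characters the codec cannot represent.
try:
    "naïve".encode('ascii')
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# errors='replace' avoids the exception but loses the original character.
lossy = "naïve".encode('ascii', errors='replace')   # b'na?ve'

# Printable ASCII bytes are shown as characters; everything else as \x escapes.
shown = repr(b'A\x00\xff')                          # "b'A\\x00\\xff'"

# Going the other way: decode the bytes to combine them as text instead.
as_text = b'data'.decode('utf-8') + ' more data'    # 'data more data'
```

Whether to encode the str or decode the bytes depends on what the result is for: bytes for files and sockets, str for anything shown to a person.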