
What characters are valid in a URL?

By Alex Kataev · Mar 10, 2025
TLDR

A URL may contain alphanumerics (A-Z, a-z, 0-9), the unreserved marks (-, _, ., ~), and the reserved characters, which act as delimiters (:, /, ?, #, [, ], @, !, $, &, ', (, ), *, +, ,, ;, =). Everything else must be percent-encoded: each byte of the character's UTF-8 encoding is written as % followed by two hex digits, like %20 for a space. Here's an example:

https://www.example.com/search?q=coding%20tips

In this case, %20 encodes the space inside the value of the q parameter, so the query string stays well-formed.
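As a minimal sketch of how that example URL could be produced in JavaScript (the variable names are only illustrative), the raw search text is encoded before it is appended to the query string:

```javascript
// Raw text as a user might type it, including a space.
const query = "coding tips";

// encodeURIComponent replaces the space with %20.
const searchUrl = "https://www.example.com/search?q=" + encodeURIComponent(query);

console.log(searchUrl); // https://www.example.com/search?q=coding%20tips
```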

The functions of URL characters: Assigning roles

Each reserved character in a URL has a specific job. The ? starts the query string, while # introduces a fragment identifier. This fixed structure is what keeps URLs both human-readable and standards-compliant.
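To see those roles in practice, here is a small sketch using the standard URL class available in browsers and Node.js. The example address is made up; the point is how ? and # split the string into parts:

```javascript
// A made-up URL with a path, a query string, and a fragment.
const parsed = new URL("https://www.example.com/docs/guide?lang=en&page=2#install");

console.log(parsed.protocol); // https:
console.log(parsed.host);     // www.example.com
console.log(parsed.pathname); // /docs/guide
console.log(parsed.search);   // ?lang=en&page=2
console.log(parsed.hash);     // #install
```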

Within the path component of a URL, dot-segments like ../ navigate up one level when a relative reference is resolved. Other combinations change the meaning more drastically: a relative reference that begins with // is read as naming a new authority (hostname), so it can end up pointing at a different host entirely.
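Both behaviours can be checked by resolving relative references against a base with the URL class. This is just a sketch; the base address and file names are invented for the illustration:

```javascript
// Hypothetical base document.
const base = "https://www.example.com/a/b/c.html";

// "../" climbs one directory up from /a/b/ before appending d.html.
const up = new URL("../d.html", base);
console.log(up.href); // https://www.example.com/a/d.html

// A leading "//" keeps the scheme but swaps in a new authority.
const other = new URL("//other.example/x", base);
console.log(other.href); // https://other.example/x
```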

Characters outside the allowed set, such as spaces or non-ASCII characters like emojis, demand percent-encoding. Non-ASCII characters are first converted to their UTF-8 bytes, so a 😊 emoji would be encoded as %F0%9F%98%8A.

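When programmatically constructing URLs, use built-in functions such as encodeURIComponent() in JavaScript to handle this encoding. Apply it to individual components (a path segment or a query value) rather than to a whole URL, because it also escapes delimiters like / and ?.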
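To make the mechanics concrete, the sketch below encodes a string by hand, one UTF-8 byte at a time, and compares the result with encodeURIComponent(). The helper percentEncodeAllBytes is hypothetical and exists only for illustration (it escapes every byte, even letters); in real code the built-in is the better choice.

```javascript
// Illustration only: write every UTF-8 byte of `text` as %XX.
function percentEncodeAllBytes(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes
  return Array.from(bytes, (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")).join("");
}

console.log(percentEncodeAllBytes("😊")); // %F0%9F%98%8A
console.log(encodeURIComponent("😊"));    // %F0%9F%98%8A
console.log(encodeURIComponent("a b"));   // a%20b
```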

Need for percent-encoding: Making a case

Query strings use & to separate parameters and = to separate each name from its value. When either character appears inside a value, encode it (%26 for &, %3D for =) so that it is read literally rather than as a delimiter.
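A short sketch of what goes wrong without encoding, and how encoding fixes it. The value and the parameter name q are made up for the example:

```javascript
// A value that happens to contain both delimiters.
const value = "salt&pepper=tasty";

// Naive concatenation: the & splits the value into two parameters.
const naive = new URL("https://www.example.com/search?q=" + value);
console.log(naive.searchParams.get("q"));      // salt
console.log(naive.searchParams.get("pepper")); // tasty

// Encoded: the whole value survives as one parameter.
const encoded = new URL("https://www.example.com/search?q=" + encodeURIComponent(value));
console.log(encoded.search);                 // ?q=salt%26pepper%3Dtasty
console.log(encoded.searchParams.get("q")); // salt&pepper=tasty
```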

Non-ASCII (internationalized) characters and ASCII control characters must also be percent-encoded so that servers and clients interpret them consistently.

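Correctly encoded URLs thus dodge problems with caching and request handling, and stop user-supplied text from smuggling extra parameters into a query string. Encoding is not a defence against SQL injection, though; that is handled on the server with parameterized queries.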

Creating flawless URLs: Best practices

Some characters are considered unsafe because different systems, platforms, and protocols interpret them differently. Others are excluded because they have no defined role in URL syntax and are commonly used to delimit URLs in surrounding text.

Keeping raw characters like <, >, ", {, }, |, \, ^, and ` out of URLs is a best practice for avoiding breakage and security issues. Think of these as your stranger-danger list.
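One way to act on that list is a quick check before a URL is stored or emitted. The sketch below is an assumption about how such a check might look; findUnsafeChars and UNSAFE_CHARS are hypothetical names, and the character class simply mirrors the list above:

```javascript
// The characters from the list above: < > " { } | \ ^ `
const UNSAFE_CHARS = /[<>"{}|\\^`]/g;

// Return the distinct unsafe characters found in `candidate`, in order of appearance.
function findUnsafeChars(candidate) {
  return [...new Set(candidate.match(UNSAFE_CHARS) || [])];
}

console.log(findUnsafeChars("https://example.com/a|b?tag=<x>")); // [ '|', '<', '>' ]
console.log(findUnsafeChars("https://example.com/a%7Cb"));        // []
```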

Unusual scenarios and troubleshooting: Your URL first-aid kit

Browsers don't always agree on how to treat unencoded characters: what works in one might fail in another, and the pipe (|) is a classic example. Consistency is key here, and encoding such characters yourself ensures cross-browser compatibility.
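A simple way to stay consistent is to encode every path segment before joining them, so nothing is left for the browser to guess. The helper buildPath below is a hypothetical sketch, not a standard function:

```javascript
// Encode each segment separately, then join with "/".
function buildPath(segments) {
  return "/" + segments.map((segment) => encodeURIComponent(segment)).join("/");
}

// The pipe becomes %7C, and a slash inside a segment becomes %2F.
console.log(buildPath(["files", "a|b", "q1/q2"])); // /files/a%7Cb/q1%2Fq2
```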

A frequent goof-up involves spaces. Use %20 anywhere in a URL; + stands for a space only in form-encoded query strings, and is a literal plus sign everywhere else. In space, nobody can hear you scream... unless it is encoded.
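The two conventions are easy to compare side by side. In this sketch, encodeURIComponent() produces %20, while URLSearchParams follows the form-encoding rules and produces +; the parameter name q is just an example:

```javascript
const text = "coding tips";

// Generic percent-encoding: space becomes %20.
const percentStyle = encodeURIComponent(text);
console.log(percentStyle); // coding%20tips

// Form encoding for query strings: space becomes +.
const formStyle = new URLSearchParams({ q: text }).toString();
console.log(formStyle); // q=coding+tips

// Only the form-encoding parser turns + back into a space.
console.log(new URLSearchParams("q=coding+tips").get("q")); // coding tips
console.log(decodeURIComponent("coding+tips"));             // coding+tips

// A literal plus sign therefore has to be encoded in a form-encoded query.
const literalPlus = new URLSearchParams({ q: "C++" }).toString();
console.log(literalPlus); // q=C%2B%2B
```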

When dealing with user input within URLs, it's essential to validate it and encode it before use. This helps avoid broken links and security vulnerabilities. Good fences make good neighbours.
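What counts as valid depends on the application, so treat the following as a minimal sketch under one assumption: only http and https links are acceptable. The function name toSafeHttpUrl is hypothetical. It relies on the URL constructor to reject unparseable input and to normalize the rest:

```javascript
// Return a normalized http(s) URL string, or null if the input is not acceptable.
function toSafeHttpUrl(input) {
  let url;
  try {
    url = new URL(input); // throws on input that cannot be parsed as an absolute URL
  } catch {
    return null;
  }
  // Assumption for this sketch: reject every scheme except http and https.
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return null;
  }
  return url.href;
}

console.log(toSafeHttpUrl("https://example.com/a b")); // https://example.com/a%20b
console.log(toSafeHttpUrl("javascript:alert(1)"));     // null
console.log(toSafeHttpUrl("not a url"));               // null
```

Individual values that come from users should still go through encodeURIComponent() before they are placed into a path segment or query parameter.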