What characters are valid in a URL?
The valid URL characters are alphanumeric (A-Z, a-z, 0-9), safe (-, _, ., ~), extra (!, *, ', (, )), and reserved (;, /, ?, :, @, &, =, +, $, ,). RFC 3986 also counts #, [ and ] as reserved. Any other character must be percent-encoded: each byte of its UTF-8 representation is replaced by % followed by a two-digit hexadecimal code, so a space (ASCII 0x20) becomes %20. Here's an example:
https://www.example.com/search?q=coding%20tips
Here, %20 encodes the space in the search term "coding tips", giving a well-formed query string with a single parameter, q.
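The substitution rule can be sketched by hand to show what happens byte by byte. The helper below, percentEncode, is a hypothetical illustration rather than a built-in: it converts text to UTF-8 bytes, keeps only the RFC 3986 unreserved characters (A-Z, a-z, 0-9, -, ., _, ~), and writes every other byte as % plus two uppercase hex digits. That makes it stricter than the list above, since it also encodes reserved and extra characters, which is a safe choice for a single query value.

```javascript
// Hypothetical helper (not a built-in): percent-encode every UTF-8 byte
// that is not in the RFC 3986 "unreserved" set A-Z a-z 0-9 - . _ ~
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;

function percentEncode(text) {
  const bytes = new TextEncoder().encode(text); // string -> UTF-8 bytes
  let out = "";
  for (const byte of bytes) {
    const ch = String.fromCharCode(byte);
    if (byte < 0x80 && UNRESERVED.test(ch)) {
      out += ch; // unreserved ASCII characters pass through unchanged
    } else {
      // everything else becomes "%" followed by two uppercase hex digits
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(percentEncode("coding tips")); // coding%20tips
console.log(percentEncode("😊"));          // %F0%9F%98%8A
```

In real code the built-in encoders are the better choice; this sketch only makes the byte-level rule visible.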
The functions of URL characters: Assigning roles
Each reserved character has a specific role as a delimiter. The ? starts the query string, while # marks the start of the fragment identifier. This fixed structure is what lets URLs be both human-readable and standards-compliant.
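One way to see these delimiters at work is to let a URL parser split a URL into its parts. This sketch uses the WHATWG URL class that is built into browsers and Node.js; the #results fragment is a made-up addition to the article's example URL.

```javascript
// Parse the article's example URL (plus a hypothetical #results fragment)
const url = new URL("https://www.example.com/search?q=coding%20tips#results");

console.log(url.hostname);              // www.example.com
console.log(url.pathname);              // /search
console.log(url.search);                // ?q=coding%20tips  (begins at "?")
console.log(url.hash);                  // #results          (begins at "#")
console.log(url.searchParams.get("q")); // coding tips       (decoded value)
```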
Within the path, the dot-segment ../ moves up one level when a relative reference is resolved. A leading //, however, changes the meaning of a reference entirely: a relative URL that starts with // is read as a network-path reference, so the text after it is treated as a new authority (hostname), which can redirect the request to a different host.
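Both behaviours can be checked by resolving relative references against a base URL with the built-in URL class. The base URL and the evil.example host below are hypothetical values chosen for illustration.

```javascript
const base = "https://www.example.com/docs/guide/index.html";

// "../" climbs one level from /docs/guide/ before appending the rest
const up = new URL("../images/logo.png", base);
console.log(up.href); // https://www.example.com/docs/images/logo.png

// A leading "//" is a network-path reference: "evil.example" becomes the host
const other = new URL("//evil.example/login", base);
console.log(other.href); // https://evil.example/login
```

The second case is why code that accepts relative redirect targets from users often rejects anything beginning with //.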
Spaces and non-ASCII characters such as emoji must be percent-encoded. A non-ASCII character is first converted to its UTF-8 bytes, and each byte is then encoded, so the 😊 emoji (U+1F60A) becomes %F0%9F%98%8A.
When constructing URLs programmatically, use built-in functions such as encodeURIComponent() in JavaScript to handle this encoding.
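A short sketch of how this looks in practice: encodeURIComponent() is meant for a single component such as one query value, while the related built-in encodeURI() is meant for a whole URL and leaves delimiters like /, ? and = alone. The search term extends the article's example with an emoji; the variable names are illustrative.

```javascript
const term = "coding tips 😊";

// Encode only the value, then splice it into the URL
const url = "https://www.example.com/search?q=" + encodeURIComponent(term);
console.log(url); // https://www.example.com/search?q=coding%20tips%20%F0%9F%98%8A

// encodeURI keeps the URL's structural characters intact
console.log(encodeURI("https://www.example.com/search?q=coding tips"));
// https://www.example.com/search?q=coding%20tips

// encodeURIComponent on a whole URL would also encode ":", "/", "?" and "="
console.log(encodeURIComponent("https://www.example.com/search?q=coding tips"));
// https%3A%2F%2Fwww.example.com%2Fsearch%3Fq%3Dcoding%20tips
```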
Need for percent-encoding: Making a case
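Query strings use & to separate parameters and = to separate each name from its value. When these characters appear inside a name or value, encoding them (as %26 and %3D) preserves their literal meaning instead of splitting the parameter.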
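The difference shows up as soon as the query string is parsed. In this sketch the value "salt&pepper=yes" is a hypothetical search term, and URLSearchParams (built into browsers and Node.js) plays the role of the server-side parser.

```javascript
const value = "salt&pepper=yes"; // hypothetical value containing & and =

// Unencoded: "&" ends the parameter early, so pepper=yes becomes a second parameter
const broken = new URLSearchParams("q=" + value);
console.log(broken.get("q"));      // salt
console.log(broken.get("pepper")); // yes

// Encoded: & becomes %26 and = becomes %3D, so the value survives intact
const query = "q=" + encodeURIComponent(value);
console.log(query);                               // q=salt%26pepper%3Dyes
console.log(new URLSearchParams(query).get("q")); // salt&pepper=yes
```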
Beyond English, internationalized (non-ASCII) characters and ASCII control characters (0x00 to 0x1F and 0x7F) must also be percent-encoded so that servers and clients interpret them correctly.
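A few illustrative cases, assuming a JavaScript runtime with the standard encodeURIComponent() and URL APIs. The words "café" and "tab\there" are arbitrary sample inputs.

```javascript
// Non-ASCII: each UTF-8 byte is encoded (é is the two bytes C3 A9)
console.log(encodeURIComponent("café"));      // caf%C3%A9

// ASCII control character: a tab (0x09) becomes %09
console.log(encodeURIComponent("tab\there")); // tab%09here

// The WHATWG URL parser applies the same encoding to non-ASCII path characters
console.log(new URL("https://www.example.com/café").pathname); // /caf%C3%A9

// Decoding reverses the process
console.log(decodeURIComponent("caf%C3%A9")); // café
```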
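Correctly encoded URLs avoid problems with caching and request handling. Encoding is not a defence against injection attacks such as SQL injection, however: servers decode the values before using them, so input still needs validation and parameterized queries on the server side.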
Creating flawless URLs: Best practices
Some characters are considered unsafe because different systems, platforms, and protocols interpret them differently. Others are excluded because they commonly delimit URLs in surrounding text, as < and > or quotation marks do.
Keeping characters like <, >, ", {, }, |, \, ^, and ` out of URLs, or percent-encoding them when they are genuinely needed, is a best practice that prevents breakage and security issues. Think of these as your stranger-danger list.
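A minimal sketch of a check for the characters listed above. The helper name hasUnsafeChars and its regular expression are hypothetical; a real validator would likely combine this with parsing via the URL class.

```javascript
// Hypothetical helper: flags any of < > " { } | \ ^ ` in a URL string
const UNSAFE = /[<>"{}|\\^`]/;

function hasUnsafeChars(url) {
  return UNSAFE.test(url);
}

console.log(hasUnsafeChars("https://www.example.com/a|b"));   // true
console.log(hasUnsafeChars("https://www.example.com/a%7Cb")); // false

// When such characters are genuinely needed in a value, encode them instead
console.log(encodeURIComponent('<a href="x">')); // %3Ca%20href%3D%22x%22%3E
```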
Unusual scenarios and troubleshooting: Your URL first-aid kit
Browser interpretation can be mysterious: what works in one browser might fail in another. Unencoded characters such as the pipe (|) illustrate this. Consistency is key here, and encoding them (a pipe becomes %7C) ensures cross-browser compatibility.
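A frequent goof-up involves spaces. Use %20, or + in query strings, to maximize compatibility; note that + stands for a space only in form-encoded query strings, while in a path it is a literal plus sign. In space, nobody can hear you scream... unless it is encoded.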
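The two conventions come from different JavaScript APIs, which is where mix-ups usually start. This sketch reuses the article's "coding tips" search term and only built-in functions.

```javascript
// encodeURIComponent always uses %20 for a space
console.log(encodeURIComponent("coding tips"));               // coding%20tips

// URLSearchParams follows the form-encoding convention and uses +
const params = new URLSearchParams({ q: "coding tips" });
console.log(params.toString());                               // q=coding+tips

// Both forms decode back to a space when parsed as a query string
console.log(new URLSearchParams("q=coding+tips").get("q"));   // coding tips
console.log(new URLSearchParams("q=coding%20tips").get("q")); // coding tips

// decodeURIComponent leaves "+" alone, so mixing conventions can bite
console.log(decodeURIComponent("coding+tips"));               // coding+tips
```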
When user input ends up in a URL, validate and encode it thoroughly. This avoids broken links and security vulnerabilities. Good fences make good neighbours.
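A minimal sketch of both halves, assuming user input arrives as plain strings. The helper names, the 200-character limit, and the http/https allow-list are illustrative choices, not rules from a specific standard. The URL and URLSearchParams APIs do the encoding, so raw input is never concatenated into a URL.

```javascript
// Hypothetical helper: build a search URL from untrusted text
function buildSearchUrl(term) {
  if (typeof term !== "string" || term.length === 0 || term.length > 200) {
    throw new Error("search term must be a string of 1 to 200 characters");
  }
  const url = new URL("https://www.example.com/search");
  url.searchParams.set("q", term); // encodes &, =, spaces, non-ASCII, etc.
  return url.href;
}

// Hypothetical helper: accept a user-supplied link only if it parses
// and uses http or https (rejects e.g. javascript: URLs)
function isSafeHttpUrl(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return false; // not an absolute URL at all
  }
  return url.protocol === "http:" || url.protocol === "https:";
}

console.log(buildSearchUrl("salt & pepper"));          // https://www.example.com/search?q=salt+%26+pepper
console.log(isSafeHttpUrl("https://www.example.com/")); // true
console.log(isSafeHttpUrl("javascript:alert(1)"));      // false
console.log(isSafeHttpUrl("not a url"));                // false
```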