Monday, July 12, 2010

URL - safe and unsafe characters

"Unsafe characters"
    
Why:
Some characters present the possibility of being misunderstood within URLs for various reasons. These characters should also always be encoded.
Characters:
Character
Code
Points
(Hex)
Code
Points
(Dec)
Why encode?
Space
20
32
Significant sequences of spaces may be lost in some uses (especially multiple spaces)
Quotation marks
'Less Than' symbol ("<")
'Greater Than' symbol (">")
22
3C
3E
34
60
62
These characters are often used to delimit URLs in plain text.
'Pound' character ("#")
23
35
This is used in URLs to indicate where a fragment identifier (bookmarks/anchors in HTML) begins.
Percent character ("%")
25
37
This is used to URL encode/escape other characters, so it should itself also be encoded.
Misc. characters:
   Left Curly Brace ("{")
   Right Curly Brace ("}")
   Vertical Bar/Pipe ("|")
   Backslash ("\")
   Caret ("^")
   Tilde ("~")
   Left Square Bracket ("[")
   Right Square Bracket ("]")
   Grave Accent ("`")

7B
7D
7C
5C
5E
7E
5B
5D
60

123
125
124
92
94
126
91
93
96
Some systems can possibly modify these characters.

"Reserved characters"
    
Why:
URLs use some characters for special use in defining their syntax. When these characters are not used in their special role inside a URL, they need to be encoded.
Characters:
Character
Code
Points
(Hex)
Code
Points
(Dec)
 Dollar ("$")
 Ampersand ("&")
 Plus ("+")
 Comma (",")
 Forward slash/Virgule ("/")
 Colon (":")
 Semi-colon (";")
 Equals ("=")
 Question mark ("?")
 'At' symbol ("@")
24
26
2B
2C
2F
3A
3B
3D
3F
40
36
38
43
44
47
58
59
61
63
64

0 comments: