Thursday, February 2, 2017

About URL encoding

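When a browser sends an HTTP request to a server, the URL cannot contain certain characters, such as a space. Other characters, such as a colon, are reserved because they act as delimiters (for example between the scheme and the rest of the URL). When such characters appear as data inside a URL component and would conflict with that reserved purpose, they need to be percent-encoded; a space, for example, becomes %20.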
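A minimal sketch of percent-encoding a single path segment, using Python's standard urllib.parse.quote. Passing safe="" is my own choice here, so that ":" and "/" inside the segment are treated as data rather than delimiters; the segment value is made up for illustration.

```python
from urllib.parse import quote

# A path segment (hypothetical example) containing a space and a colon.
segment = "my file:v2.txt"

# quote() percent-encodes every character that is not unreserved or listed
# in `safe`. With safe="" the colon is encoded too, because here it is data.
encoded = quote(segment, safe="")
print(encoded)  # my%20file%3Av2.txt
```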

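According to http://www.ietf.org/rfc/rfc2396.txt, section 2.4.2, the URL should already be encoded when its components are combined into the full URL. Once the URL reaches the server, the server splits it into components and then decodes each component. The decoding has to happen after the split, because a decoded component may itself contain a reserved character such as "/".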
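The order described above (encode each component, join; then split, decode each component) can be sketched as follows. build_url and server_decode_path are hypothetical helper names, and only the path is handled, to keep the example short.

```python
from urllib.parse import quote, unquote, urlsplit


def build_url(host: str, segments: list[str]) -> str:
    # Client side: encode each path segment on its own, then join the
    # encoded segments with "/" delimiters.
    encoded = [quote(s, safe="") for s in segments]
    return "http://" + host + "/" + "/".join(encoded)


def server_decode_path(url: str) -> list[str]:
    # Server side: split the path into segments first, then decode each
    # segment exactly once.
    path = urlsplit(url).path
    return [unquote(s) for s in path.lstrip("/").split("/")]


url = build_url("myserver", ["docs", "a/b.txt"])
print(url)                      # http://myserver/docs/a%2Fb.txt
print(server_decode_path(url))  # ['docs', 'a/b.txt']
```

If the server decoded the whole path before splitting it, "a%2Fb.txt" would turn into "a/b.txt" and be mistaken for two segments, which is why the split comes first.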

So if the path expected by the backend is myserver/my file.txt, the encoded URL should be http://myserver/my%20file.txt. If the backend path is literally myserver/my%20file.txt, the encoded URL should be http://myserver/my%2520file.txt, because the "%" character itself must be encoded as %25. Since the server decodes each URL component only once, the browser should encode the special characters only once as well.
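A quick check of the two cases above with Python's urllib.parse, assuming the server performs exactly one unquote per component:

```python
from urllib.parse import quote, unquote

# Backend path contains a literal space.
print(quote("my file.txt"))    # my%20file.txt
# Backend path contains the literal text "%20"; "%" is encoded as %25.
print(quote("my%20file.txt"))  # my%2520file.txt

# A single decoding step on the server recovers the intended names.
assert unquote("my%20file.txt") == "my file.txt"
assert unquote("my%2520file.txt") == "my%20file.txt"
```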

Ideally each component is encoded when the URL is generated, but this is not always the case, particularly when the user types a URL into the browser's address bar. The browser handles this by checking the URL: if it finds any invalid characters (such as a space), it assumes the URL has not been encoded yet and encodes those characters. If there are no invalid characters, it assumes the URL is already encoded and skips this step.
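The heuristic described above can be modeled roughly like this. This is a simplified sketch, not how any particular browser is implemented: the ALLOWED set (RFC 3986 unreserved and reserved characters plus "%") and the function name normalize_typed_url are assumptions for illustration.

```python
from urllib.parse import quote

# Simplified assumption: characters that may appear literally in a URL.
# "%" is included so that existing escapes such as %20 are left alone.
ALLOWED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    "-._~:/?#[]@!$&'()*+,;=%"
)


def normalize_typed_url(url: str) -> str:
    """Hypothetical model of the address-bar heuristic."""
    if all(ch in ALLOWED for ch in url):
        # No invalid characters: assume the URL is already encoded.
        return url
    # Invalid characters found: encode only those characters (as UTF-8
    # percent escapes) and leave everything else untouched.
    return "".join(ch if ch in ALLOWED else quote(ch, safe="") for ch in url)


print(normalize_typed_url("http://myserver/my file.txt"))
print(normalize_typed_url("http://myserver/my%20file.txt"))
```

Both calls print http://myserver/my%20file.txt: the first because the space is encoded, the second because the URL is assumed to be encoded already and is returned unchanged.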

If a user agent other than a browser sends an HTTP request to the server, it should follow the same convention as the browser, so that the server can handle every request the same way regardless of which user agent sent it.


The same rules also apply to URLs that use the file:/// scheme.
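A minimal sketch for the file:/// case, assuming a POSIX-style local path (the path itself is made up). The path is encoded once when the URL is built, with "/" kept as a delimiter, and decoded once after the URL is split:

```python
from urllib.parse import quote, unquote, urlsplit

# Hypothetical local path containing a space.
path = "/tmp/my file.txt"

# Encode the path once; quote() keeps "/" unencoded by default.
file_url = "file://" + quote(path)
print(file_url)  # file:///tmp/my%20file.txt

# The consumer splits the URL, then decodes the path component once.
assert unquote(urlsplit(file_url).path) == path
```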
