HTTP in JavaScript

1. The Protocol

If you type eloquentjavascript.net/18_http.html into your browser’s address bar, the browser first looks up the address of the server associated with eloquentjavascript.net and tries to open a TCP connection to it on port 80, the default port for HTTP traffic. If the server exists and accepts the connection, the browser might send something like this:

GET /18_http.html HTTP/1.1
Host: eloquentjavascript.net
User-Agent: Your browser’s name

Then the server responds, through that same connection.

HTTP/1.1 200 OK
Content-Length: 65585
Content-Type: text/html
Last-Modified: Mon, 07 Jan 2019 10:29:45 GMT

<!doctype html>
… the rest of the document

The browser takes the part of the response after the blank line, its body (not to be confused with the HTML <body> tag), and displays it as an HTML document.

The information sent by the client is called the request. It starts with this line:

GET /18_http.html HTTP/1.1

The first word is the method of the request. GET means that we want to get the specified resource. Other common methods are DELETE to delete a resource, PUT to create or replace it, and POST to send information to it. Note that the server is not obliged to carry out every request it gets. If you walk up to a random website and tell it to DELETE its main page, it’ll probably refuse.

The part after the method name is the path of the resource the request applies to. In the simplest case, a resource is simply a file on the server, but the protocol doesn’t require it to be. A resource may be anything that can be transferred as if it is a file. Many servers generate the responses they produce on the fly. For example, if you open https://github.com/marijnh, the server looks in its database for a user named marijnh, and if it finds one, it will generate a profile page for that user.

After the resource path, the first line of the request mentions HTTP/1.1 to indicate the version of the HTTP protocol it is using.

In practice, many sites use HTTP version 2, which supports the same concepts as version 1.1 but is a lot more complicated so that it can be faster. Browsers will automatically switch to the appropriate protocol version when talking to a given server, and the outcome of a request is the same regardless of which version is used. Because version 1.1 is more straightforward and easier to play around with, we’ll focus on that.

The server’s response will start with a version as well, followed by the status of the response, first as a three-digit status code and then as a human-readable string.

HTTP/1.1 200 OK

Status codes starting with a 2 indicate that the request succeeded. Codes starting with 4 mean there was something wrong with the request. 404 is probably the most famous HTTP status code—it means that the resource could not be found. Codes that start with 5 mean an error happened on the server and the request is not to blame.
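As a rough illustration of these ranges, a small helper can map a status code to its class by looking at the first digit. The name describeStatus is made up for this sketch, and only the three ranges mentioned above are handled.

```javascript
// Hypothetical helper: classify a status code by its first digit.
// Only the ranges discussed in the text are covered.
function describeStatus(code) {
  const firstDigit = Math.floor(code / 100);
  if (firstDigit === 2) return "success";
  if (firstDigit === 4) return "client error";
  if (firstDigit === 5) return "server error";
  return "other";
}

console.log(describeStatus(200));
// → success
console.log(describeStatus(404));
// → client error
console.log(describeStatus(503));
// → server error
```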

The first line of a request or response may be followed by any number of headers. These are lines in the form name: value that specify extra information about the request or response. These headers were part of the example response:

Content-Length: 65585
Content-Type: text/html
Last-Modified: Mon, 07 Jan 2019 10:29:45 GMT

This tells us the size and type of the response document. In this case, it is an HTML document of 65,585 bytes. It also tells us when that document was last modified.

For most headers, the client and server are free to decide whether to include them in a request or response. But a few are required. For example, the Host header, which specifies the hostname, should be included in a request because a server might be serving multiple hostnames on a single IP address, and without that header, the server won’t know which hostname the client is trying to talk to.

After the headers, both requests and responses may include a blank line followed by a body, which contains the data being sent. GET and DELETE requests don’t send along any data, but PUT and POST requests do. Similarly, some response types, such as error responses, do not require a body.
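To make the layout of a message concrete, here is a minimal sketch of a parser for the textual form of a response: a status line, then headers, then a blank line, then the body. It is a simplification, not how browsers do it. It assumes lines are separated by a plain newline (real HTTP uses a carriage return plus newline), and the name parseResponse is invented for this example.

```javascript
// Simplified parser for the text of an HTTP/1.1 response.
// Assumes "\n" line endings and that the whole message is one string.
function parseResponse(text) {
  const blank = text.indexOf("\n\n");
  const head = blank === -1 ? text : text.slice(0, blank);
  const body = blank === -1 ? "" : text.slice(blank + 2);
  const [statusLine, ...headerLines] = head.split("\n");
  const [version, code, ...reason] = statusLine.split(" ");
  const headers = {};
  for (const line of headerLines) {
    // Split on the first colon only; values may contain colons too.
    const colon = line.indexOf(":");
    const name = line.slice(0, colon).trim().toLowerCase();
    headers[name] = line.slice(colon + 1).trim();
  }
  return {version, status: Number(code), statusText: reason.join(" "), headers, body};
}

const example = [
  "HTTP/1.1 200 OK",
  "Content-Length: 65585",
  "Content-Type: text/html",
  "",
  "<!doctype html>"
].join("\n");

const parsed = parseResponse(example);
console.log(parsed.status, parsed.statusText);
// → 200 OK
console.log(parsed.headers["content-type"]);
// → text/html
console.log(parsed.body);
// → <!doctype html>
```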

2. Browsers and HTTP

As we saw in the example, a browser will make a request when we enter a URL in its address bar. When the resulting HTML page references other files, such as images and JavaScript files, those are also retrieved.

A moderately complicated website can easily include anywhere from 10 to 200 resources. To be able to fetch those quickly, browsers will make several GET requests simultaneously, rather than waiting for the responses one at a time.

HTML pages may include forms, which allow the user to fill out information and send it to the server. This is an example of a form:

<form method="GET" action="example/message.html">
  <p>Name: <input type="text" name="name"></p>
  <p>Message:<br><textarea name="message"></textarea></p>
  <p><button type="submit">Send</button></p>
</form>

This code describes a form with two fields: a small one asking for a name and a larger one to write a message in. When you click the Send button, the form is submitted, meaning that the content of its fields is packed into an HTTP request and the browser navigates to the result of that request.

When the <form> element’s method attribute is GET (or is omitted), the information in the form is added to the end of the action URL as a query string. The browser might make a request to this URL:

GET /example/message.html?name=Jean&message=Yes%3F HTTP/1.1

The question mark indicates the end of the path part of the URL and the start of the query. It is followed by pairs of names and values, corresponding to the name attribute on the form field elements and the content of those elements, respectively. An ampersand character (&) is used to separate the pairs.

The actual message encoded in the URL is Yes?, but the question mark is replaced by a strange code. Some characters in query strings must be escaped. The question mark, represented as %3F, is one of those. There seems to be an unwritten rule that every format needs its own way of escaping characters. This one, called URL encoding, uses a percent sign followed by two hexadecimal (base 16) digits that encode the character code. In this case, 3F, which is 63 in decimal notation, is the code of a question mark character. JavaScript provides the encodeURIComponent and decodeURIComponent functions to encode and decode this format.

console.log(encodeURIComponent("Yes?"));
// → Yes%3F
console.log(decodeURIComponent("Yes%3F"));
// → Yes?
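A whole query string can also be put together and taken apart with the standard URLSearchParams class, which is available in browsers and in Node.js. The sketch below rebuilds the query from the form example; it is one possible way to do this, not something the form mechanism requires.

```javascript
// Build the query string from name/value pairs.
const params = new URLSearchParams({name: "Jean", message: "Yes?"});
const query = params.toString();
console.log(query);
// → name=Jean&message=Yes%3F

// Parsing goes the other way and undoes the escaping.
const decoded = new URLSearchParams(query);
console.log(decoded.get("message"));
// → Yes?

// The escape code is the character code written in hexadecimal.
console.log("?".charCodeAt(0).toString(16));
// → 3f
```

One difference from encodeURIComponent is that URLSearchParams follows the form encoding rules and writes a space as + rather than %20.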

If we change the method attribute of the HTML form in the example we saw earlier to POST, the HTTP request made to submit the form will use the POST method and put the query string in the body of the request, rather than adding it to the URL.

POST /example/message.html HTTP/1.1
Content-length: 24
Content-type: application/x-www-form-urlencoded

name=Jean&message=Yes%3F
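The same request can be described from JavaScript. The following sketch builds the body, checks that it really is 24 bytes long, and wraps it in a Request object without sending it. The host http://localhost:8000 is a placeholder chosen for this example and is not part of the original form.

```javascript
// Build the body the way a form with method="POST" would.
const formBody = new URLSearchParams({name: "Jean", message: "Yes?"}).toString();

// Content-Length counts bytes, not characters, so encode before measuring.
const byteLength = new TextEncoder().encode(formBody).length;
console.log(formBody, byteLength);
// → name=Jean&message=Yes%3F 24

// Assumption: http://localhost:8000 is a placeholder host for this sketch.
const request = new Request("http://localhost:8000/example/message.html", {
  method: "POST",
  headers: {"Content-Type": "application/x-www-form-urlencoded"},
  body: formBody
});
console.log(request.method, request.headers.get("Content-Type"));
// → POST application/x-www-form-urlencoded
```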

GET requests should be used for requests that do not have side effects but simply ask for information. Requests that change something on the server, for example creating a new account or posting a message, should be expressed with other methods, such as POST. Client-side software, such as a browser, knows that it shouldn’t blindly make POST requests but will often implicitly make GET requests—for example to prefetch a resource it believes the user will soon need.

We’ll come back to forms and how to interact with them from JavaScript in “Form Fields.”

3. Fetch

The interface through which browser JavaScript can make HTTP requests is called fetch. Since it is relatively new, it conveniently uses promises (which is rare for browser interfaces).

fetch("example/data.txt").then(response => {
  console.log(response.status);
  // → 200
  console.log(response.headers.get("Content-Type"));
  // → text/plain
});

Calling fetch returns a promise that resolves to a Response object holding information about the server’s response, such as its status code and its headers. The headers are wrapped in a Map-like object that treats its keys (the header names) as case insensitive because header names are not supposed to be case sensitive. This means headers.get("Content-Type") and headers.get("content-TYPE") will return the same value.
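The case-insensitive lookup can be tried without making a request, since the Headers class behind response.headers can also be constructed directly. This is only a small demonstration with a hand-made header set.

```javascript
// Headers is the class behind response.headers; build one by hand.
const headers = new Headers({"Content-Type": "text/plain"});

console.log(headers.get("Content-Type"));
// → text/plain
console.log(headers.get("content-TYPE"));
// → text/plain
console.log(headers.has("CONTENT-TYPE"));
// → true

// A header that was never set yields null rather than undefined.
console.log(headers.get("Last-Modified"));
// → null
```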

Note that the promise returned by fetch resolves successfully even if the server responded with an error code. It is rejected when the request itself fails, for example if there is a network error or if the server that the request is addressed to can’t be found.
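Because an error status does not reject the promise, code that cares about failures has to check the status itself. One common pattern, sketched here with a made-up helper named ensureOk, is to throw when the response’s ok property is false. The Response objects below are constructed by hand so the helper can be tried without a server.

```javascript
// Hypothetical helper: throw on an error status, so that a promise
// chain using it rejects.
function ensureOk(response) {
  // response.ok is true only for status codes 200 through 299.
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response;
}

// Intended use with fetch (not executed here):
//   fetch("example/data.txt").then(ensureOk).then(resp => resp.text());

console.log(ensureOk(new Response("fine", {status: 200})).status);
// → 200
try {
  ensureOk(new Response("missing", {status: 404}));
} catch (error) {
  console.log(error.message);
  // → Request failed with status 404
}
```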

The first argument to fetch is the URL that should be requested. When that URL doesn’t start with a protocol name (such as http:), it is treated as relative, which means it is interpreted relative to the current document. When it starts with a slash (/), it replaces the current path, which is the part after the server name. When it does not, the part of the current path up to and including its last slash character is put in front of the relative URL.
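These resolution rules can be observed with the standard URL class, which takes a relative URL and a base to resolve it against. The page address used as the base below is made up for illustration.

```javascript
// Assumption: the page address below is made up for illustration.
const page = "https://eloquentjavascript.net/code/chapter/18_http.html";

// No leading slash: resolved against the directory of the current page.
console.log(new URL("example/data.txt", page).href);
// → https://eloquentjavascript.net/code/chapter/example/data.txt

// Leading slash: replaces the whole current path.
console.log(new URL("/example/data.txt", page).href);
// → https://eloquentjavascript.net/example/data.txt

// A protocol name makes the URL absolute, so the page address is ignored.
console.log(new URL("http://example.com/data.txt", page).href);
// → http://example.com/data.txt
```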

To get at the actual content of a response, you can use its text method. Because the initial promise is resolved as soon as the response’s headers have been received, and because reading the response body might take a while longer, this again returns a promise.

fetch("example/data.txt")
  .then(resp => resp.text())
  .then(text => console.log(text));
// → This is the content of data.txt

A similar method, called json, returns a promise that resolves to the value you get when parsing the body as JSON or rejects if it’s not valid JSON.
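The rejection on invalid JSON is easy to overlook. The sketch below wraps json in a hypothetical helper, jsonOrNull, that turns a parse failure into null and lets any other error through. Hand-built Response objects stand in for real server responses.

```javascript
// Hypothetical helper: parse a response body as JSON, or give null
// when the body is not valid JSON.
async function jsonOrNull(response) {
  try {
    return await response.json();
  } catch (error) {
    // Invalid JSON surfaces as a SyntaxError; rethrow anything else.
    if (error instanceof SyntaxError) return null;
    throw error;
  }
}

jsonOrNull(new Response('{"name": "larry"}')).then(value => {
  console.log(value.name);
  // → larry
});
jsonOrNull(new Response("This is not JSON")).then(value => {
  console.log(value);
  // → null
});
```

Whether null is the right fallback depends on the application; rethrowing with a clearer message is an equally reasonable choice.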

By default, fetch uses the GET method to make its request and does not include a request body. You can configure it differently by passing an object with extra options as a second argument. For example, this request tries to delete example/data.txt:

fetch("example/data.txt", {method: "DELETE"}).then(resp => {
  console.log(resp.status);
  // → 405
});

The 405 status code means “method not allowed,” an HTTP server’s way of saying “I can’t do that.”

To add a request body, you can include a body option. To set headers, there’s the headers option. For example, this request includes a Range header, which instructs the server to return only part of a response.

fetch("example/data.txt", {headers: {Range: "bytes=8-19"}})
  .then(resp => resp.text())
  .then(console.log);
// → the content

The browser will automatically add some request headers, such as Host and those needed for the server to figure out the size of the body. But adding your own headers is often useful to include things such as authentication information or to tell the server which file format you’d like to receive.

4. HTTP Sandboxing

Making HTTP requests in web page scripts once again raises concerns about security. The person who controls the script might not have the same interests as the person on whose computer it is running. More specifically, if I visit themafia.org, I do not want its scripts to be able to make a request to mybank.com, using identifying information from my browser, with instructions to transfer all my money to some random account.

For this reason, browsers protect us by not allowing scripts to make HTTP requests to other domains (names such as themafia.org and mybank.com).

This can be an annoying problem when building systems that want to access several domains for legitimate reasons. Fortunately, servers can include a header like this in their response to explicitly indicate to the browser that it is okay for the request to come from another domain:

Access-Control-Allow-Origin: *
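On the server side, the decision to send this header might look roughly like the following. This is a sketch under assumptions: the function name, the allow-list, and the idea of returning the headers as a plain object are all invented here, and a real server would plug this into whatever framework it uses. Instead of *, a server may also name one specific origin, which is what the second branch does using the Origin value the browser sent.

```javascript
// Hypothetical server-side helper: decide which CORS header, if any,
// to add to a response. The function and allow-list are assumptions.
function corsHeadersFor(requestOrigin, allowedOrigins) {
  if (allowedOrigins.includes("*")) {
    return {"Access-Control-Allow-Origin": "*"};
  }
  if (allowedOrigins.includes(requestOrigin)) {
    // Echo the specific origin and tell caches the response varies by it.
    return {"Access-Control-Allow-Origin": requestOrigin, "Vary": "Origin"};
  }
  // No header: the browser keeps blocking the cross-domain request.
  return {};
}

console.log(corsHeadersFor("https://themafia.org", ["https://partner.example"]));
// → {}
console.log(corsHeadersFor("https://partner.example", ["https://partner.example"])["Access-Control-Allow-Origin"]);
// → https://partner.example
```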

5. Appreciating HTTP

When building a system that requires communication between a JavaScript program running in the browser (client-side) and a program on a server (server-side), there are several different ways to model this communication.

A commonly used model is that of remote procedure calls. In this model, communication follows the patterns of normal function calls, except that the function is actually running on another machine. Calling it involves making a request to the server that includes the function’s name and arguments. The response to that request contains the returned value.

When thinking in terms of remote procedure calls, HTTP is just a vehicle for communication, and you will most likely write an abstraction layer that hides it entirely.

Another approach is to build your communication around the concept of resources and HTTP methods. Instead of a remote procedure called addUser, you use a PUT request to /users/larry. Instead of encoding that user’s properties in function arguments, you define a JSON document format (or use an existing format) that represents a user. The body of the PUT request to create a new resource is then such a document. A resource is fetched by making a GET request to the resource’s URL (for example, /users/larry), which again returns the document representing the resource.

This second approach makes it easier to use some of the features that HTTP provides, such as support for caching resources (keeping a copy on the client for fast access). The concepts used in HTTP, which are well designed, can provide a helpful set of principles to design your server interface around.
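To contrast the two models, the sketch below describes the same “add a user” operation in both styles as a URL plus an options object for fetch. Everything specific here is an assumption made for illustration: the /rpc path, the layout of the remote procedure call body, and the user’s age property.

```javascript
// Remote-procedure-call style: the function name and arguments travel
// in the body. The /rpc path and the body layout are assumptions.
function rpcCall(name, args) {
  return {
    url: "/rpc",
    options: {
      method: "POST",
      headers: {"Content-Type": "application/json"},
      body: JSON.stringify({name, args})
    }
  };
}

// Resource style: the URL names the thing, the method says what to do.
function putUser(username, user) {
  return {
    url: `/users/${encodeURIComponent(username)}`,
    options: {
      method: "PUT",
      headers: {"Content-Type": "application/json"},
      body: JSON.stringify(user)
    }
  };
}

function getUser(username) {
  return {url: `/users/${encodeURIComponent(username)}`, options: {method: "GET"}};
}

console.log(rpcCall("addUser", ["larry", {age: 40}]).url);
// → /rpc
console.log(putUser("larry", {age: 40}).url);
// → /users/larry
console.log(getUser("larry").options.method);
// → GET
```

Each result could be handed to fetch as fetch(url, options). In the resource style, the GET request for /users/larry is an ordinary cacheable request, which is where the caching benefit mentioned above comes from.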

6. Security and HTTPS

Data traveling over the internet tends to follow a long, dangerous road. To get to its destination, it must hop through anything from coffee shop Wi-Fi hotspots to networks controlled by various companies and states. At any point along its route it may be inspected or even modified.

If it is important that something remain secret, such as the password to your email account, or that it arrive at its destination unmodified, such as the account number you transfer money to via your bank’s website, plain HTTP is not good enough.

The secure HTTP protocol, used for URLs starting with https://, wraps HTTP traffic in a way that makes it harder to read and tamper with. Before exchanging data, the client verifies that the server is who it claims to be by asking it to prove that it has a cryptographic certificate issued by a certificate authority that the browser recognizes. Next, all data going over the connection is encrypted in a way that should prevent eavesdropping and tampering.

Thus, when it works right, HTTPS prevents other people from impersonating the website you are trying to talk to and from snooping on your communication. It is not perfect, and there have been various incidents where HTTPS failed because of forged or stolen certificates and broken software, but it is a lot safer than plain HTTP.

Source: Haverbeke Marijn (2018), Eloquent JavaScript: A Modern Introduction to Programming, No Starch Press; 3rd edition.
