Java Networking: URL

The primary classes for accessing the Internet are java.net.URL and java.net.HttpURLConnection. The URL class encapsulates a Uniform Resource Locator (URL), which uniquely identifies a resource on the World Wide Web. A resource can be something simple, such as a file or a directory, or a reference to a more complicated object, such as a query to a database or a search engine. The URL class provides mechanisms to download web resources to the client computer. It also has methods to retrieve the different parts of a URL, such as the protocol, host name, and port.

1. Creating URL

The URL class has several overloaded constructors, some of which are as follows:

URL(String url)

URL(String protocol, String host, String file)

URL(String protocol, String host, int port, String file)
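As a rough illustration of the component-wise constructors listed above, the following sketch builds URLs from separate parts and prints their string form. The class name UrlConstructorsSketch, the helper fromParts, and the host www.example.com are placeholders chosen for this example, not names from the original text; the convention that a negative port means "no port given" is also an assumption of the helper.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical demo class: builds URLs from their components.
public class UrlConstructorsSketch {

    // Builds a URL from parts; a negative port here means "no explicit port".
    static String fromParts(String protocol, String host, int port, String file) {
        try {
            URL url = (port < 0)
                    ? new URL(protocol, host, file)          // three-argument form
                    : new URL(protocol, host, port, file);   // four-argument form
            return url.toExternalForm();
        } catch (MalformedURLException e) {
            // Wrapped only to keep the sketch short
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // prints http://www.example.com/docs/index.html
        System.out.println(fromParts("http", "www.example.com", -1, "/docs/index.html"));
        // prints http://www.example.com:8080/docs/index.html
        System.out.println(fromParts("http", "www.example.com", 8080, "/docs/index.html"));
    }
}
```

Building a URL from parts in this way can be convenient when the host or port comes from configuration rather than from a ready-made string.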

Any of these constructors can be used, depending on convenience. The most commonly used constructor takes a URL as a string argument and creates a URL object. For example, the following code creates a URL object for the URL http://www.google.com.

URL url = new URL("http://www.google.com");

The string used in the above example represents an absolute URL. We can also create an absolute URL object from a base URL object and a relative specification. The general form of this constructor is:

URL(URL baseURL, String relativeURL)

The following examples show how to use this constructor:

URL plusUrl = new URL(new URL("https://plus.google.com"), "u/0/?tab=wX");
// Creates the URL https://plus.google.com/u/0/?tab=wX

URL loginUrl = new URL(new URL("https://login.yahoo.com/"), "?.src=ym&.intl=in&.lang=en-IN&.done=http://mail.yahoo.com");
// Creates the URL https://login.yahoo.com/?.src=ym&.intl=in&.lang=en-IN&.done=http://mail.yahoo.com
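To make the resolution rules a little more concrete, here is a minimal sketch that resolves several kinds of relative specification against one base URL. The class name RelativeUrlSketch, the helper resolve, and the example.com and example.org addresses are illustrative assumptions, not part of the original examples.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical demo class: resolves relative specifications against a base URL.
public class RelativeUrlSketch {

    // Resolves 'relative' against 'base' and returns the resulting absolute URL.
    static String resolve(String base, String relative) {
        try {
            return new URL(new URL(base), relative).toExternalForm();
        } catch (MalformedURLException e) {
            // Wrapped only to keep the sketch short
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String base = "http://www.example.com/docs/guide/index.html";

        // A sibling document replaces the last path segment:
        // http://www.example.com/docs/guide/intro.html
        System.out.println(resolve(base, "intro.html"));

        // ".." moves up one directory:
        // http://www.example.com/docs/api/index.html
        System.out.println(resolve(base, "../api/index.html"));

        // A leading "/" replaces the whole path:
        // http://www.example.com/top.html
        System.out.println(resolve(base, "/top.html"));

        // A specification that is already absolute ignores the base:
        // https://www.example.org/
        System.out.println(resolve(base, "https://www.example.org/"));
    }
}
```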

Each of the URL constructors throws a MalformedURLException if its arguments do not form a valid URL, for example when no protocol is specified or the protocol is unknown.
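One possible way to use this exception is to check whether a string can be turned into a URL object at all. The sketch below is only an illustration; the class name UrlSyntaxCheckSketch, the helper isWellFormed, and the sample strings are made up for this example. Note that the check reflects what the URL constructor accepts, which is not a full validation of the address.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical demo class: uses MalformedURLException as a basic syntax check.
public class UrlSyntaxCheckSketch {

    // Returns true if the URL constructor accepts the string, false otherwise.
    static boolean isWellFormed(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("http://www.example.com")); // accepted
        System.out.println(isWellFormed("www.example.com"));        // rejected: no protocol
        System.out.println(isWellFormed("foo://www.example.com"));  // rejected: unknown protocol
    }
}
```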

2. Parsing URL

A URL consists of several parts, as follows:

protocol://host[:port]/[path[?params][#anchor]]

The optional parts are shown in [ ]. Examples of protocols include http, https, ftp, and file. The host together with the optional port is called the authority. The path together with the optional params part (the query string) is what the URL class calls the file, and the anchor is also called the reference. If a URL does not specify a port, the default port for the protocol is used. For example, the default port for HTTP is 80.
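The default-port rule can be sketched in code as follows. When a URL has no explicit port, getPort() returns -1, and getDefaultPort() gives the protocol's default. The class name EffectivePortSketch, the helper effectivePort, and the example.com addresses are assumptions made for this illustration.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical demo class: finds the port a URL would actually use.
public class EffectivePortSketch {

    // Returns the explicit port if present, otherwise the protocol's default port.
    static int effectivePort(String spec) {
        try {
            URL url = new URL(spec);
            int port = url.getPort(); // -1 when the URL has no explicit port
            return (port != -1) ? port : url.getDefaultPort();
        } catch (MalformedURLException e) {
            // Wrapped only to keep the sketch short
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(effectivePort("http://www.example.com/"));      // 80
        System.out.println(effectivePort("https://www.example.com/"));     // 443
        System.out.println(effectivePort("http://www.example.com:8080/")); // 8080
    }
}
```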

The URL class provides numerous methods to retrieve these parts. For example, we can get the protocol, authority, host name, port number, path, query, filename, and reference from a URL using these methods. The following program (ParsingURL.java) prints the different parts of a URL specified as a command-line argument.

import java.net.*;

public class ParsingURL
{
    public static void main(String[] args) throws Exception
    {
        // Parse the URL given as the first command-line argument
        URL aURL = new URL(args[0]);

        // Print each component of the URL
        System.out.println("Protocol = " + aURL.getProtocol());
        System.out.println("Authority = " + aURL.getAuthority());
        System.out.println("Host = " + aURL.getHost());
        System.out.println("Port = " + aURL.getPort());
        System.out.println("Default port = " + aURL.getDefaultPort());
        System.out.println("Path = " + aURL.getPath());
        System.out.println("Query = " + aURL.getQuery());
        System.out.println("File = " + aURL.getFile());
        System.out.println("Ref = " + aURL.getRef());
    }
}

When supplied the argument

http://www.uroy.biz:8080/ajp/BasicNetworking/index.html?topic=URL#ParsingURL

the program generated the following output:

Protocol = http

Authority = www.uroy.biz:8080

Host = www.uroy.biz

Port = 8080

Default port = 80

Path = /ajp/BasicNetworking/index.html

Query = topic=URL

File = /ajp/BasicNetworking/index.html?topic=URL

Ref = ParsingURL
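The output above suggests that the File value is simply the Path followed by "?" and the Query. The following sketch checks that relationship; the class name FilePartSketch and the helper fileMatchesPathAndQuery are hypothetical names introduced for this illustration.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical demo class: relates getFile() to getPath() and getQuery().
public class FilePartSketch {

    // Returns true if getFile() equals the path plus "?query" (when a query exists).
    static boolean fileMatchesPathAndQuery(String spec) {
        try {
            URL url = new URL(spec);
            String query = url.getQuery(); // null when the URL has no query part
            String rebuilt = (query == null) ? url.getPath() : url.getPath() + "?" + query;
            return rebuilt.equals(url.getFile());
        } catch (MalformedURLException e) {
            // Wrapped only to keep the sketch short
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String sample = "http://www.uroy.biz:8080/ajp/BasicNetworking/index.html?topic=URL#ParsingURL";
        System.out.println(fileMatchesPathAndQuery(sample)); // true
    }
}
```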

3. Web Page Retrieval

Connecting to a web server on the Internet with a raw socket is sometimes problematic, especially when the client computer reaches the Internet through a proxy that does not support such socket connections. The Java classes URL and URLConnection allow client applications to connect to an HTTP server very easily. This mechanism works even if the clients are behind a firewall and use an HTTP proxy. A detailed description of how to specify a proxy in a Java program can be found later in this chapter. In this section, these classes are used for accessing HTTP servers. For example, the following code creates a URL object for the URL http://www.google.com.

URL url = new URL("http://www.google.com");

You can then call its openStream() method to establish a connection with the web server specified by the URL. The method openStream() returns an InputStream object, which can be used to read the data sent back by the server. The following example displays the content of the URL specified as a command-line argument.

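// URLReadDemo.java
import java.io.InputStream;
import java.net.URL;

public class URLReadDemo
{
    public static void main(String[] args) throws Exception
    {
        int c;
        URL url = new URL(args[0]);

        // Open a connection to the resource and obtain its content stream
        InputStream in = url.openStream();

        // Print each byte as a character until the end of the stream (-1)
        while ((c = in.read()) != -1)
            System.out.print((char) c);

        in.close();
    }
}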

Run the program with the argument http://www.google.com:

java URLReadDemo http://www.google.com

A sample output is shown in Figure 12.1:


Figure 12.1: Web page retrieval using URL
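The demo above prints each byte as a character, which is adequate for plain ASCII pages. A possible variation, sketched below, decodes the stream as text line by line and closes it automatically. The class name UrlTextReaderSketch and the helper readText are hypothetical, and the choice of UTF-8 is an assumption; a real page may declare a different encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: reads the content behind a URL as text.
public class UrlTextReaderSketch {

    // Decodes the stream as UTF-8 (an assumption) and returns it with '\n' line endings.
    static String readText(InputStream in) {
        StringBuilder text = new StringBuilder();
        // try-with-resources closes the reader (and the stream) even on failure
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java UrlTextReaderSketch <url>");
            return;
        }
        URL url = new URL(args[0]);
        System.out.print(readText(url.openStream()));
    }
}
```

It would be run in the same way as URLReadDemo, with the URL as the only argument.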

The URL class only allows us to read the content of the URL. It does not give access to other parts of the HTTP protocol, such as the headers. The URLConnection class provides mechanisms to access the content as well as to inspect the properties of the resource. Most of these properties are HTTP specific and make little sense for protocols other than HTTP.
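As a rough sketch of what inspecting properties through URLConnection might look like, the following program prints the content type, content length, and header fields of a resource, plus the HTTP status code when the connection is an HttpURLConnection. The class name HeaderInspectSketch and the helpers open and describe are hypothetical. The fallback to a file: URL when no argument is given is only there so that the sketch can be tried offline.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;
import java.util.List;
import java.util.Map;

// Hypothetical demo class: inspects resource properties via URLConnection.
public class HeaderInspectSketch {

    // Creates a connection object for the URL; checked exceptions are wrapped for brevity.
    static URLConnection open(String spec) {
        try {
            return new URL(spec).openConnection();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Collects a few properties of the resource behind the connection.
    static String describe(URLConnection conn) {
        StringBuilder out = new StringBuilder();
        out.append("Content-Type: ").append(conn.getContentType()).append('\n');
        out.append("Content-Length: ").append(conn.getContentLengthLong()).append('\n');

        // The status code exists only for HTTP(S) connections
        if (conn instanceof HttpURLConnection) {
            try {
                int status = ((HttpURLConnection) conn).getResponseCode();
                out.append("Status: ").append(status).append('\n');
            } catch (IOException e) {
                out.append("Status: unavailable (").append(e.getMessage()).append(")\n");
            }
        }

        // List every header field reported for the resource
        for (Map.Entry<String, List<String>> header : conn.getHeaderFields().entrySet()) {
            out.append(header.getKey()).append(" = ").append(header.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // With no argument, fall back to the current directory as a file: URL (offline)
        String spec = (args.length > 0) ? args[0] : new File(".").toURI().toString();
        System.out.print(describe(open(spec)));
    }
}
```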

Source: Uttam Kumar Roy (2015), Advanced Java programming, Oxford University Press.
