The .NET Framework Overview: Reading Web Pages in C#

The following example demonstrates how to write a “screen scraper” in C#. The code takes a stock symbol, formats a URL to fetch a quote from the MSN Money site, and then extracts the quote from the returned HTML page using a regular expression:

using System;
using System.Net;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

class QuoteFetch {
    public QuoteFetch(string symbol)
    {
        this.symbol = symbol;
    }

    public string Last {
        get
        {
            string url = "http://moneycentral.msn.com/scripts/" +
                         "webquote.dll?ipage=qd&Symbol=";
            url += symbol;
            ExtractQuote(ReadUrl(url));
            return(last);
        }
    }

    string ReadUrl(string url)
    {
        Uri uri = new Uri(url);
        // Create the request object
        WebRequest req = WebRequest.Create(uri);
        // Dispose the response, stream, and reader once the page has been read
        using (WebResponse resp = req.GetResponse())
        using (Stream stream = resp.GetResponseStream())
        using (StreamReader sr = new StreamReader(stream))
        {
            string s = sr.ReadToEnd();
            return(s);
        }
    }

    void ExtractQuote(string s)
    {
        // Line like: "Last</TD><TD ALIGN=RIGHT NOWRAP><B>&nbsp;78 3/16"
        Regex lastmatch = new Regex(@"Last\D+(?<last>.+)<\/B>");
        last = lastmatch.Match(s).Groups[1].ToString();
    }

    string symbol;
    string last;
}

class Test {
    public static void Main(string[] args)
    {
        if (args.Length != 1)
            Console.WriteLine("Quote <symbol>");
        else
        {
            // GlobalProxySelection.Select = new DefaultControlObject("proxy", 80);
            QuoteFetch q = new QuoteFetch(args[0]);
            Console.WriteLine("{0} = {1}", args[0], q.Last);
        }
    }
}
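The two steps that need no network connection, formatting the URL and extracting the quote, can be tried on their own. The following is only a minimal sketch: the names QuoteSteps, BuildUrl, and TryExtract are hypothetical and do not appear in the listing above, the sample line is the one from the comment in ExtractQuote with a closing </B></TD> assumed, and escaping the symbol with Uri.EscapeDataString is an addition the original code does not make.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class; not part of the original listing.
static class QuoteSteps
{
    const string BaseUrl =
        "http://moneycentral.msn.com/scripts/webquote.dll?ipage=qd&Symbol=";

    // Same pattern as ExtractQuote: skip the non-digits that follow "Last",
    // then capture everything up to the closing </B>.
    static readonly Regex LastMatch = new Regex(@"Last\D+(?<last>.+)<\/B>");

    // Appends the symbol to the query string. Escaping is an assumption
    // added here so that symbols containing spaces or '&' stay intact.
    public static string BuildUrl(string symbol)
    {
        return BaseUrl + Uri.EscapeDataString(symbol);
    }

    // Returns true and the captured text when the pattern matches;
    // otherwise returns false and an empty string.
    public static bool TryExtract(string html, out string last)
    {
        Match m = LastMatch.Match(html);
        last = m.Success ? m.Groups["last"].Value : string.Empty;
        return m.Success;
    }
}

class QuoteStepsDemo
{
    static void Main()
    {
        Console.WriteLine(QuoteSteps.BuildUrl("MSFT"));

        // Sample from the comment in ExtractQuote; the closing tags are assumed.
        string sample = "Last</TD><TD ALIGN=RIGHT NOWRAP><B>&nbsp;78 3/16</B></TD>";
        string last;
        if (QuoteSteps.TryExtract(sample, out last))
            Console.WriteLine(last);   // prints "78 3/16"
        else
            Console.WriteLine("(no match)");
    }
}
```

Reading the group by name rather than by index keeps the lookup valid if further groups are later added to the pattern, and the boolean result makes a changed page layout visible to the caller instead of silently yielding an empty quote.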

In this age of Web services, screen scrapers are generally seen as passé. If a Web service is available that offers the equivalent functionality, you should use it instead of a screen scraper. However, it’s interesting to note that it has been more than three years since this code was written (in the first edition of this book), and it still works as well today as it did when first written.

Source: Gunnerson Eric, Wienholt Nick (2005), A Programmer’s Introduction to C# 2.0, Apress; 3rd edition.
