Ask the Web Pro 10-Minute Solutions

Use IE's Web Browser Control to Extract Data From a Web Site
By Charles C. Caison

Runs in Microsoft Internet Explorer 4+: Yes
Runs in Netscape Navigator: No

Being able to extract data from a Web site can be useful in many applications. For example, you could hook directly into the United States Postal Service (USPS) Web site to retrieve the most up-to-date information on zip codes in the United States. You could then use this data in your application to display the city and state automatically when a user enters a zip code.

Fortunately, you can achieve this functionality using Visual Basic and its Web Browser control. The Web Browser control is a programmatic interface to Microsoft Internet Explorer only; Microsoft built it, and it cannot control Netscape Navigator. To the best of my knowledge, Netscape has not provided a comparable programmatic interface to Navigator.

Many Visual Basic programmers build customer data entry screens. These screens typically ask for a customer name, address, and other information. That information is usually written to a data store of some kind.

You can make users happy if you reduce the number of keystrokes required in such applications. For example, you can automatically fill in information in some of the fields on the form, such as determining a city and state based on a provided zip code. A handful of companies build and sell components that can provide this functionality to your applications. I'll show you how to easily get the same (or even more up-to-date) information for free.

Those commercial components work by looking up information in a static database of zip code and state information. One problem with this approach is that the United States government sometimes adds zip codes or changes the areas that existing zip codes cover. These changes are not reflected in the static database that ships with a commercial zip code component, and if you want an updated database, you have to pay for it.

However, the United States Postal Service (USPS) Web site has a page that provides the most up-to-date information on zip codes in the U.S. to the public, free of charge. You can use Visual Basic and its Web Browser control to hook directly into the USPS Web site to retrieve this information. I'll show you how to create an application that will pass a zip code to that page, parse the return page, and display the associated city and state information (download the code).

The user interface uses an invisible Web Browser control, a textbox (that allows the user to enter the zip code), a listbox (to display the associated cities and states), and a few command buttons that have various functions. To start the application, use this code:


Private Sub Form_Load()
    ' Load the lookup page as soon as the form loads
    WebBrowser1.Navigate START_SITE
End Sub

As the application loads, you immediately navigate to the Web site of interest. Storing the URL in a constant is a good approach because the address is defined in one place and is easy to change. In this case, you use a constant with this value:


Const START_SITE = "http://www.usps.com/ncsc/lookups/lookup_ctystzip.html"

As the form loads, the Web Browser control navigates to the site. Once the page has appeared, the user enters the zip code and clicks the "Lookup" button. That button, in turn, calls this procedure:


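Private Sub SubmitPage()
    ' Copy the zip code into the first element (the text input) of the first form
    WebBrowser1.Document.Forms(0).Elements(0).Value = Trim$(txtZipCode.Text)
    ' Click the second element (the submit button) to submit the form
    WebBrowser1.Document.Forms(0).Elements(1).Click
End Sub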

This procedure works because you know the layout of the page that appears initially. In the first line of code in the procedure, you trim the Text property of the textbox and assign it to the value of the first element (which happens to be an HTML <INPUT Type=text> control) on the first form of that page. In other words, the user enters the zip code into the VB textbox, then you pass that zip code to the HTML input box. The second line of code simulates a click of the second element of that form, the SUBMIT button.

Using these two lines of code, you can programmatically pass information from the VB application to the HTML page and then submit the form by calling the Click method of its button.
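
Because the two lines above depend on the page being loaded and laid out as expected, it can help to check first. The following is only a sketch, not part of the original download: TrySubmitZip is a hypothetical helper name, and it assumes the lookup form is still the first form on the page, with the text input and the submit button as its first two elements.

```vb
' Hypothetical helper (not in the original code): returns True if the
' zip code was handed to the page, False if the page was not ready.
Private Function TrySubmitZip(ByVal sZip As String) As Boolean
    Dim oDoc As Object

    ' Document is Nothing until the control has loaded a page
    Set oDoc = WebBrowser1.Document
    If oDoc Is Nothing Then Exit Function

    ' Assumption: the lookup form is the first form on the page and
    ' holds at least the text input and the submit button
    If oDoc.Forms.Length = 0 Then Exit Function
    If oDoc.Forms(0).Elements.Length < 2 Then Exit Function

    oDoc.Forms(0).Elements(0).Value = Trim$(sZip)   ' text input
    oDoc.Forms(0).Elements(1).Click                 ' submit button
    TrySubmitZip = True
End Function
```

The "Lookup" button could then call TrySubmitZip(txtZipCode.Text) and show a message when it returns False, instead of raising a run-time error if the page has not arrived yet.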

After you submit the page, you must wait for the return page. There are several VB events that are designed to run when particular actions occur throughout the course of navigation. You need a way to determine when the page has completely returned with the information that you're looking for. Unfortunately, none of those events work in exactly the way you need in this application. Instead of using those events, you can simply loop until you find the page with the page title that you want:


' Loop until the results page, identified by its title, has loaded
Do Until WebBrowser1.Document.Title = "USPS City State / ZIP Code Associations"
    DoEvents
Loop

When the page has finished loading, the page title will have the value that you are looking for. You could also measure the amount of time that elapses during the loop, which would enable you to time out and exit if the page does not appear for some reason. Keep the DoEvents call inside the loop: it yields control so that pending messages are processed, which the Web Browser control needs in order to finish loading the page. You can place additional code between the Do and the Loop if you desire.
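
The timeout mentioned above might be sketched as follows. This is an illustration under assumptions, not the code from the download: WaitForTitle and its parameters are hypothetical names, and the elapsed time is measured with VB's Timer function, which returns the number of seconds since midnight.

```vb
' Hypothetical helper: waits until the loaded page has the expected
' title, or gives up after lTimeoutSeconds. Returns True on success.
Private Function WaitForTitle(ByVal sTitle As String, _
                              ByVal lTimeoutSeconds As Long) As Boolean
    Dim sngStart As Single
    Dim sngElapsed As Single

    sngStart = Timer    ' seconds since midnight
    Do
        ' Yield so that pending messages are processed while waiting
        DoEvents

        If Not (WebBrowser1.Document Is Nothing) Then
            If WebBrowser1.Document.Title = sTitle Then
                WaitForTitle = True
                Exit Function
            End If
        End If

        sngElapsed = Timer - sngStart
        ' Timer restarts at midnight, so correct a negative difference
        If sngElapsed < 0 Then sngElapsed = sngElapsed + 86400
    Loop Until sngElapsed >= lTimeoutSeconds
End Function
```

A call such as WaitForTitle("USPS City State / ZIP Code Associations", 30) would then return False after roughly 30 seconds if the results page never arrives; the 30-second limit is an arbitrary value chosen for illustration.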

Finally, when you are sure that the page has been returned successfully, you must parse the source code of the returned page. You can use the innerHTML property to return the HTML source code into a VB variable.


txtBodyInnerHTML = WebBrowser1.Document.Body.innerHTML 
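
If you only need the visible text of the page rather than its markup, the body's innerText property is an alternative to innerHTML. A minimal sketch, with GetBodyText as a hypothetical helper name and the assumption that the returned page has a BODY element:

```vb
' Hypothetical helper: returns the text of the page body without
' HTML tags, or an empty string if no page is loaded.
Private Function GetBodyText() As String
    If WebBrowser1.Document Is Nothing Then Exit Function
    If WebBrowser1.Document.Body Is Nothing Then Exit Function
    GetBodyText = WebBrowser1.Document.Body.innerText
End Function
```

Whether innerText or innerHTML is easier to parse depends on the page; the tags in innerHTML can serve as convenient markers around the values you want.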

What remains is simple parsing and adding items to the listbox. You are aware of the format of the page that will be returned, so you can easily strip unnecessary text and display the information you're looking for any way you choose. In this case, you display the information in a listbox.
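
The parsing step could start from a small helper like the one below. It is a sketch only: TextBetween is a hypothetical name, and the markers you pass to it depend on the actual layout of the returned page, which is not reproduced here. The comparison is case-insensitive because the HTML that the control returns may use a different letter case for tags than the page's original source.

```vb
' Hypothetical helper: returns the text found between two markers,
' or an empty string if either marker is missing.
Private Function TextBetween(ByVal sSource As String, _
                             ByVal sStartTag As String, _
                             ByVal sEndTag As String) As String
    Dim lStart As Long
    Dim lEnd As Long

    ' Find the opening marker, ignoring case
    lStart = InStr(1, sSource, sStartTag, vbTextCompare)
    If lStart = 0 Then Exit Function

    ' Move past the opening marker, then find the closing marker
    lStart = lStart + Len(sStartTag)
    lEnd = InStr(lStart, sSource, sEndTag, vbTextCompare)
    If lEnd = 0 Then Exit Function

    TextBetween = Trim$(Mid$(sSource, lStart, lEnd - lStart))
End Function
```

For example, with a made-up sample string, TextBetween("<B>Anytown, ST</B>", "<b>", "</b>") returns "Anytown, ST", which you could then pass to the listbox's AddItem method.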

There are a few points to consider about this application. First of all, because it reads the USPS page directly, it shows the most up-to-date U.S. zip code information that the site publishes. Second, this application is small and free. Lastly, the big picture to consider is that you can extract information from other Web sites in a similar manner, provided you know the layout of their pages, and use it in your own applications.

Copyright 2004 Jupitermedia Corporation All Rights Reserved.