Building Large Scale Web Applications
with Visual FoxPro

By Rick Strahl, West Wind Technologies
http://www.west-wind.com/


An Example: Egghead.com's Surplus Direct Site
    Site Statistics
    Site Configuration

Web Development Issues
    Performance
        Data Optimization
        To SQL or not to SQL
        Code Optimization
        Web Site Optimization
        Web Server Optimization
    Security

Scaling applications
    Visual FoxPro and MultiThreading
        Apartment Model Threading in 6.0
        Microsoft Transaction Server
        Pool Managers in Web Connection and FoxISAPI
    Scaling and Loadbalancing
        Scaling with multiple processors
        Scaling with multiple machines
        Scaling with redundant servers and IP routing

The Development Process
    Team Development
    Source Control
    Integration of Code and HTML


Overview

Find out about common issues involved in building Web applications that carry heavy hit loads and take a team of developers, artists and HTML designers to build.

This paper includes discussion of Web development in a team environment and integration of Visual FoxPro into this environment. You'll see how performance and scalability affect Visual FoxPro development with discussion of performance, scalability, load balancing and tuning applications and the Web Server (Internet Information Server). Other topics include security for Web applications, site maintenance and management related to Visual FoxPro and keeping track of site statistics.

This document discusses the following:

 

What's Large Scale anyway?

There are many types of applications that can be built with Visual FoxPro. When it comes to building applications that perform in a high transaction environment and require sophisticated development scenarios the rules of typical development change somewhat. Most successful commercial Web applications usually fall into this category.

When I talk about large scale Web development realize that this is a relative term that depends on your particular environment and how you approach an application. In general I consider large scale based on two particular issues:

A site can be very heavily used and thus fall into the large scale bucket by the sheer transaction volume and load issues it must deal with. Other applications involve a very complex business environment that requires the sheer size of the application to be large scale. Some applications have both.

Web Site Operation

This is probably the most common aspect people look at when judging the 'size' of a Web application. What's the volume of traffic, how many hits to this or that page, how many backend hits, how many records go into the database each day. These issues are extremely critical for server based applications as they need to be carefully balanced against the capacity of the system(s) that are running the application. Overload the system and you lock up the Web site with disastrous results and loss of money.

The Development Process

Performance is the glamour spot when talking about big applications, but at least as much attention and effort needs to go into tuning the development process as goes into performance and scalability.

Egghead.com / Surplus Direct: An Example Site

Egghead.com is on its way to becoming one of the biggest computer/software resellers on the Net. Egghead.com consists of three sites: the main Egghead site, Surplus Direct Discount Warehouse and Surplus Auction. The two Surplus sites were acquired by Egghead last year. Both Surplus sites run Visual FoxPro based applications to drive the Web sites.

All of the sites are ranked within the top 20 of the busiest commerce Internet sites.

In order to demonstrate Visual FoxPro in a live, high volume application I want to show the Surplus Direct Discount Warehouse site. This is a fairly straightforward shopping site that allows reviewing and ordering from a catalog of inventory online.

This site sells previous version software and hardware - stuff that is 'surplus' to manufacturers and other distributors. Items are sold at rock bottom prices, which are advertised via a printed catalog that's sent out by mail and via ads in all the major computer hardware magazines like Computer Shopper.

The company uses one of the heaviest advertising plans on the Internet to promote their site, which has been ranked as high as #5 among commercial sites on the Web by PCMeter (a popular Web visitor rating service), with corresponding traffic on the site. Surplus runs large scale Web advertising programs on Yahoo, Netscape, InfoSeek, Excite, Shareware.com and several other of the highest volume sites that take advertising online.

Online catalog of hardware/software

Inventory catalog
Site displays inventory of between 2,000 and 3,000 items in various categories.
Online, secure ordering
Visitors can pick up items for ordering and purchase items securely using SSL encryption. Email confirmation.

Online Credit Card Validation
Credit cards are validated online via an ISAPI extension and the validated response is processed in VFP for order completion.

Electronic Software Download
(feature has been temporarily removed pending vendor agreements)

A special section of the online site allows purchase of items that can be downloaded immediately. The Visual FoxPro application interfaces with an extended version of the Web Connection ISAPI interface that handles communication with the CyberSource third party packaging and authorization software running on a dedicated ESD server over the Internet.

Extensive custom promotional features
The site includes a number of promotional features to 'lure' potential customers. Free items advertised on banners, free items with orders over a certain dollar amount, rotating banners on the site, weekly specials displayed in frames, email specials, featured items, hot items list etc.

Sub Sites
The Surplus site often features specific vendor sub-sites. For example, a special build your own Everex computer was running for a few months. There are frequent vendor 'plug-in' apps that run and can be hooked in with minimal effort.

Extensive Site Management tools

Total Remote Administration
Most aspects of the site can be administered remotely via an HTML Admin interface that allows for server management features as well as statistics and maintenance operations on the data, such as data transfer with the HP mini.

Detailed Site and User Statistics
The site keeps track of detailed information about individual hits and shopper information in order to determine traffic patterns. Shoppers are tracked through the site anonymously, and valuable information about where they came from and how much traffic they generate is tracked into a shoppers table. The information can be displayed at a glance in online graphs and is also exported for more detailed daily reports that are run and presented in Excel.

Customization Tasks
Several tasks such as rotating Banner administration and special displays are also handled through the HTML interface. Site designers can use the HTML interface to add/delete items for these tasks to allow a fluid interface for site administration.

Data Updates
The data uploads and downloads to and from the HP mini are also administered through the HTML interface. Orders are exported once every hour and inventory is imported 3 times a day.

Running offline Web site

Both of the Surplus sites run as 'offline' Web sites, meaning they don't access the main business application directly.

HP mini Point of sale system
The main company application runs on an HP mini computer and the Web site is running offline from the mini. Data is transferred several times a day to update inventory on the Web and import orders captured online.

Security issues
The offline view serves as a security buffer from fraud. Orders go through a rigorous 3 step input validation process before being taken into the Point of Sale system. Web data entry has serious fraud potential so extra steps are taken to minimize fraud.

Accuracy issue
Web based data entry is often not as accurate as that taken by a qualified phone technician in the phone center. Although the Web site provides extensive validation, there are some things that can't be checked online, such as obviously bad order amounts. Web orders are scrutinized by a detailed import routine that rejects orders based on stringent rules. A VFP conversion program then exports the orders to the mini's import format.

Site Statistics

Peaks (exclusive front page ads on Yahoo and Netscape):

Site Configuration

The Surplus site runs on several separate machines in a server pool. Each machine is a fully functional Web server plus HTML and Visual FoxPro backends. Each server is fully redundant – if one fails it'll drop out of the server pool, but the site continues to run. Data is stored in SQL Server on a separate server.

Web Development Issues at Surplus

Let's take a look at some of the development issues that need to be dealt with when building applications for this high volume Web environment. We'll look at the following topics in more detail:

Performance

Performance is extremely critical in high transaction environments. Any application needs to run as fast as it possibly can, but for high transaction environments tuning and making sure code runs at its optimum is crucial as slow requests can tie up valuable resources that might be needed by the next request in line. There are a number of areas that should be focused on.

Data Optimization

Since we're dealing with database applications here data optimization is the most important piece to deal with. Database operations tend to be the slowest operations in any Web application and also the most resource intensive, so optimizations here can bring the biggest benefits.

To SQL or Not To SQL

Using Visual FoxPro Data

Using SQL Server Backend

Code Optimization

The next step in performance tuning deals with code optimization.

Web Site Optimization

Web Server Optimization

Following are a few suggestions for improving performance on IIS 4.0 that can be important for heavily loaded sites:

Visual FoxPro and Multi Threading

Understanding CPU load and Speed

When examining load on a site it's crucial to understand how the application is performing on a given machine. When talking about load we're mostly looking at the CPU load that is incurred by the application. This load is affected by all system components such as disk and memory, but shows itself most consistently in the level of CPU usage. As disks get saturated queries slow down and use more CPU power to get to data. As memory runs out more data is stored on disk rather than in memory cache and you get more CPU load to access the data.

The key pieces to look at are:

Scalability

With Internet commerce growing at over 100% each year it's very likely that a commercial Web site will run into growing pains. Scalability issues come to the fore especially when an application outruns a single Visual FoxPro server, and even more so when more than one server machine has to run simultaneously in order to handle the volume.

Security on the Web

When building database Web applications security is important. You wouldn't want to capture orders online including credit card numbers and then have somebody steal the entire order/customer file with that sensitive information.

Security comes in many flavors and applies to different aspects of a Web site. Is the information you pass over the Web safe? And how do you keep people from accessing certain parts of your application?

Keep it simple - let NT do the work

Windows NT provides excellent, though somewhat complex, security features that should address the majority of your security needs. NT allows rights to be configured at the file level as well as the directory level. Web directories need to have Read and typically Execute (or Script) rights set to allow Web clients to access the pages.

NT uses an account named IUSR_ followed by the machine name to identify anonymous users of the Web site, and rights must be given to this account for any areas that public users of your site should be able to access. Beyond that, however, make sure you remove any IUSR_ references (they shouldn't be there in the first place), and also the Everyone account.

Also, be careful in playing with the rights of the IUSR_ account in User manager. When working with IIS and COM it's very easy to give the IUSR_ account Admin rights to get some security issues resolved, which is fine while developing – just don't forget to undo this setting once you put your site online.

Keep data in an unmapped path

Data security should be a top priority on your list. If you keep sensitive data on your Web server, first make sure that the data is not accessible via a relative path over the Web. Ideally the data should reside in a totally off limits area away from the Web site in an unmapped path. Better still, if the data can sit on another machine that is reachable only over a separate network leg running a non-TCP/IP protocol, you can just about eliminate your risk of data piracy (at the cost of some overhead for the network access).

Setting rights on directory and files

If you must have data in a Web relative path so that the data can be downloaded via an HTML link by authorized personnel, make sure you set the proper password rights on these directories to disallow anonymous access by Web users. If you use IE 3 and IIS, NT's Challenge Response mechanism ties securely into NT's security system. With other Web servers, security of passwords passed over the Web varies.

File Security with NT Challenge Response Validation

NT supports NT Challenge Response for access to files, which means that if you're accessing a page and IUSR_ doesn't have rights, NT will try to validate your user account through the local machine, or through the domain if you have IIS configured to run through a specific domain server. If you are a user of the local network you may not be prompted for a password - if you aren't, NT will pop up a login dialog and validate you. If you type the correct password you're allowed access. Security in this fashion works both at the directory level (which really just delegates down to the file level) and the file level.

Make sure you set the Allow NT Challenge Response option in the IIS configuration.

Using Authentication from your code

You can also force authentication from dynamically generated result pages with Basic Authentication. Authentication occurs as part of the HTTP header passed back to the Web server/browser which interprets the header and pops up a validation box.

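There are two steps to make this happen: check the server variables for an authenticated username, and if there isn't one, send back an authentication header that makes the browser prompt for a login.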

Here's a simple example:

************************************************************************
* wwDemoProcess :: Authentication
*********************************
***  Function: Demonstrate how to check authorization for users
************************************************************************
FUNCTION Authentication
LOCAL lcUserName, loCGI

*** Easier reference
loCGI=THIS.oCGI

*** Try to retrieve the Authenticated Username
lcUserName=loCGI.ServerVariables("Authenticated Username")

IF EMPTY(lcUserName)   && Any validations against password here...
    *** Send Password Dialog - if successful, this request is re-run
    THIS.oHTML.HTMLAuthenticate(loCGI.GetServername())
    RETURN
ENDIF

*** The password is never retrieved in this method, so only the
*** username is echoed back
THIS.StandardPage("You've been validated for this request...",;
              "You've entered a username of <b> "+lcUserName+"</b><p>"+;
              "Subsequent requests for this server won't prompt you for "+;
              "a password again until you shut down your browser.")
RETURN

The actual authentication request is implemented via a special HTTP response that is returned instead of an HTML document. The following code generates the response that makes the browser pop up the password box when it is sent back through the Web server:

**********************************************************************
* wwHTTPHeader :: Authenticate
******************************
***  Function: Sends the authorization content type header
***            Use to pop up Security Dialog and force authentication.
***            You can use Authenticated Username (CGI)
***            to retrieve the entered user name if valid...           
***      Pass: tcRealm     -  Domain to log in to.
***            tcErrorText -  Error message to display when failing
***    Return: nothing or string if tlNoOutput=.T.
**********************************************************************
FUNCTION Authenticate
LPARAMETERS tcRealm, tcErrorText
tcRealm=IIF(type("tcRealm")="C",tcRealm,"")
tcErrorText=IIF(type("tcErrorText")="C",tcErrorText,;
                "<h2>Gotta enter your password to get in!</h2>")

THIS.cOutput=[HTTP/1.0 401 Not Authorized]+CR+;
             [WWW-Authenticate: basic realm="]+ tcRealm + ["]+ CR+CR +;
             [<HTML>]+tcErrorText+[</HTML>]
ENDFUNC             
* Authenticate

Authentication provides a built-in mechanism tied to the operating system to validate users. Once a user is authenticated you can always check the user's username, which is passed along with each subsequent request until the browser is shut down.

You can also implement your own security scheme bypassing authentication altogether and creating an HTML page that asks for login information. You can then capture the login information on your own and validate against a user table denying access if it's invalid. Note though that you need to set some sort of flag that can be checked on each request to make sure the user does not access unauthorized pages or requests directly simply by typing the URL.
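As a rough illustration of the custom approach, here is a minimal sketch of the table lookup. Everything in it is an assumption made for the example: the function name IsValidLogin, the USERS table and its cUserName and cPassword fields are hypothetical and not part of Web Connection.

```foxpro
*** Hypothetical helper - not part of Web Connection.
*** Assumes a table USERS with character fields cUserName and cPassword.
FUNCTION IsValidLogin
LPARAMETERS tcUserName, tcPassword
LOCAL llValid, lnOldArea

*** Reject missing or empty input outright
IF VARTYPE(tcUserName) # "C" OR VARTYPE(tcPassword) # "C" ;
   OR EMPTY(tcUserName) OR EMPTY(tcPassword)
   RETURN .F.
ENDIF

*** Remember the current work area so it can be restored
lnOldArea = SELECT()
IF !USED("users")
   USE users IN 0 SHARED
ENDIF
SELECT users

*** Case-insensitive user name, exact (==) password comparison
LOCATE FOR UPPER(ALLTRIM(cUserName)) == UPPER(ALLTRIM(tcUserName)) ;
       AND ALLTRIM(cPassword) == ALLTRIM(tcPassword)
llValid = FOUND()

SELECT (lnOldArea)
RETURN llValid
ENDFUNC
```

The calling request would still have to record a successful login in a per-user flag and check that flag on every later request. The sketch also compares passwords as stored text purely to keep the example short.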

Note that basic authentication is not encrypted unless you combine it with a secure transaction request (HTTPS)!

Do you need secure transactions?

By default all the information that travels over the Web is not encrypted in any way. All the information including the HTML form variables and Server information that is returned to your backend programs from the Web browser including authentication information is not encrypted. This means somebody with a protocol analyzer could potentially snatch passwords or ID or credit card numbers while in transit.

Secure server transactions use certificate based encryption built on a private and public key pair to encrypt all the content that flows between the Web server and browser. Certificates are issued by a few 3rd party key authority companies at $250 for a year. You create a key request with the server's key manager utility and fill out an online submission form for the request (see www.verisign.com for more information on obtaining a key). The authority uses the key request to generate your certificate. The certificate is returned to you as a file and merged with your existing key to provide the secure certificate on your site. Once the certificate is installed, using secure transactions means accessing the HTTPS protocol instead of HTTP - a simple change to your URL is all that is required once the key is in place to make a transaction secure.

http://www.west-wind.com/wconnect/wc.dll?wwDemo~SecureCheck
https://www.west-wind.com/wconnect/wc.dll?wwDemo~SecureCheck

Both URLs do the same thing and can be handled identically in code, but the latter is encrypted and secure. To check whether a request is secure you can check the SERVER_PORT or Server Port server variables.
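A port check along those lines might look like the sketch below. IsSecureRequest is a hypothetical helper name, and the code assumes that ServerVariables() returns the port as a string, the way the Authentication example earlier retrieves the username. Port 443 is the default HTTPS port, so a server set up to run SSL on a different port would need a different comparison.

```foxpro
*** Hypothetical helper: returns .T. when the request arrived on the
*** default HTTPS port (443).
*** Assumes toCGI.ServerVariables() returns the port as a string.
FUNCTION IsSecureRequest
LPARAMETERS toCGI
LOCAL lcPort

lcPort = toCGI.ServerVariables("Server Port")
RETURN VAL(lcPort) = 443
ENDFUNC
```

Inside a request method this could be called as IsSecureRequest(THIS.oCGI) before displaying or accepting sensitive data.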

Not all browsers support secure transactions, and attempts to access a secure page with a non-secure browser will cause the page to fail. Tell those users to get a browser from this century, Ok? <s>

Do you need secure transactions? If your site captures sensitive information like credit cards - definitely. If you're using a custom password scheme with passwords entered on HTML pages - probably. For general applications? Probably not.

Secure transactions are easy to implement. You simply use the https:// prefix instead of http:// to reference links with. But secure transactions are much, much slower than non-secure transactions. Therefore it's a good idea to use secure transactions only when you need them. For example, the Surplus site runs in secure mode only when actually capturing the order information from the user and for some of the maintenance tasks - all other site operations run non-secure.

Web Development as a team

Building complex Web sites typically involves more people than just programmers. Web applications tend to bring together a variety of skills.

Team members (for Surplus and Auction Sites):

The breakdown of the team at Surplus:

Considering the volume and income generated by the Web sites this staff is rather modest.

Source Code Control

In this environment, where multiple people are involved in the development process, Source Control is extremely important to make sure the integrity of code and HTML documents is kept intact. Source control is applied to the Visual FoxPro project and the custom ISAPI DLL extensions to the Web Connection framework, as well as the HTML pages. Graphics are not under Source Control.

Integrating HTML and Code

HTML generation is probably the most 'different' aspect of Web application development compared to traditional desktop applications. At Surplus it was extremely important to work closely with the HTML design team in creating pages that could be visually maintained by the design staff. It wouldn't have been sufficient to build a FoxPro backend application that does all of the HTML generation internally. Instead the tools needed to provide a mechanism for mixing HTML with minimal code/expression syntax so dynamic information from the database could be displayed on the HTML.

Today you have many options to build HTML based applications whether you use a script based engine like Active Server Pages or a code based engine such as Web Connection.

At Surplus a combination of code and scripting is used. All requests fire a method inside of a class that runs to process the mainline business logic that needs to occur on a request. When the code is complete it calls an HTML page stored on disk and embeds FoxPro expressions into the page. The Script page uses an Active Server like scripting language (using different tags from an older version of Web Connection) to allow embedding of simple expressions like field names or PRIVATE/PUBLIC variables into a page. Any valid string based FoxPro expression can also be embedded – this includes FoxPro native functions as well as User Defined Functions (UDFs). In addition, blocks of code can also be embedded inside of the page, but this is avoided at Surplus due to speed issues with interpreting the code at runtime.
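To make the expression embedding idea concrete, here is a minimal sketch of how a template string could be expanded. It is not Web Connection's parser: the function name ExpandTemplate and the <%= %> delimiters are assumptions chosen for the example, since the tags actually used at Surplus are not shown here.

```foxpro
*** Hypothetical sketch - not Web Connection's script parser.
*** Replaces every <%= expression %> in tcTemplate with the
*** string result of evaluating that FoxPro expression.
FUNCTION ExpandTemplate
LPARAMETERS tcTemplate
LOCAL lcResult, lnStart, lnEnd, lcExpr, lcValue

lcResult = ""
DO WHILE .T.
   lnStart = AT("<%=", tcTemplate)
   IF lnStart = 0
      EXIT
   ENDIF

   *** Position of the closing tag relative to the text after <%=
   lnEnd = AT("%>", SUBSTR(tcTemplate, lnStart + 3))
   IF lnEnd = 0
      EXIT        && unterminated tag: emit the rest unchanged
   ENDIF

   lcExpr  = ALLTRIM(SUBSTR(tcTemplate, lnStart + 3, lnEnd - 1))
   lcValue = TRANSFORM(EVALUATE(lcExpr))

   *** Keep the text before the tag, then continue after the %>
   lcResult   = lcResult + LEFT(tcTemplate, lnStart - 1) + lcValue
   tcTemplate = SUBSTR(tcTemplate, lnStart + lnEnd + 4)
ENDDO

RETURN lcResult + tcTemplate
ENDFUNC
```

With a PRIVATE variable pcCustomer set to "Rick", ExpandTemplate("<b>Hello <%= pcCustomer %></b>") should return the page text with the name filled in. Because EVALUATE() runs inside the function, it can see fields of the selected table and PRIVATE/PUBLIC variables, but not the caller's LOCAL variables.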

HTML is the Front End Interface

For Web applications HTML is the front end to the user. HTML usage can be simple using basic HTML at the lowest common denominator so all browsers can access pages, or can be advanced taking advantage of the most recent browser enhancements actually embedding advanced functionality on the client side in the HTML page.

At Surplus the focus is on making the page run on as many browsers as possible and creating pages that are small to download, so HTML extensions and scripting are kept to a minimum. This has changed recently as some interface scripting has been added to pages to allow for basic visual effects such as changing buttons etc.

Understand the limitations of HTML

Even the newest HTML standards don't provide the same functionality you'd expect from a typical GUI development environment. DHTML introduced in IE 4.0 takes a huge step in the right direction, but currently building complex forms and user interfaces is a far cry from using say the Form Designer in Visual FoxPro. The event model in the browser is also more limited and trapping events and responding to them is a little more complex and can require a fair amount of code.

Data connectivity

Pure HTML makes no provisions for data connectivity! If you're dealing with typical Web server based Web applications like Surplus Direct you're seeing an application that's all driven by the server. The server generates the HTML for a page and recreates the entire page whenever the user makes new choices and updates.

Again, DHTML makes provisions for data connectivity, but at the cost of substantial installation on the client side, which is usually not an option for public commercial applications - no one wants to wait around for 20 minutes to download a set of data ActiveX controls and the client side ADO engine at 28.8k. Most of these technologies also require IE 4 exclusively, which leaves out a large portion of the market.

The bottom line is that commercial sites will continue to be driven by heavy server side applications that rely on the server accessing the data and generating HTML from it.

Keep HTML and Code separate

Whenever possible try to build your application in such a way that business logic and HTML are clearly separated and don't reside in the same place. If you're using a tool like Web Connection or FoxISAPI, that should be easy as most of the code will sit in a VFP project and most of the HTML will sit in pages stored in a Web directory. If you're using Active Server Pages it's easy to get pages that heavily mix HTML and code which is a bear to maintain. With ASP it is a good idea to create 'code modules' as ASP include files or use ASP pages that act as router pages that contain code to perform logic and route off to the actual display pages.

The reason for all of this is two-fold. For one, it's easier to maintain code in a code environment! I don't care how much better Visual Interdev has gotten in the latest rev, it has nothing on the VFP or VB development environments in richness. Also, even with syntax color highlighting (which helps a lot) it's difficult to have to look through a huge HTML page just to find that 2 line snippet of code that was embedded in the middle of the page. Keeping the code out of the way is also important when passing off pages to the HTML team. Most of the HTML team probably don't know how the database logic works - nor should they have to look at it and be tempted to mess with it. There are some useful things that should be accessible to designers, but this should be kept to a minimum. Typical items that should be accessible are database fields, some known (and hopefully documented) variables that might be required and maybe basic operations for handling HTTP headers like redirects and Cookie read/write etc.

Scripting/Templates for data and display logic

I'm a little biased toward the 'code drives the HTML' approach to development, and this is the approach that's used at Surplus. Basically you have an application that handles each request and then branches off to a script page to handle display of the HTML.

This functionality is actually implemented at the Visual FoxPro code level within the Web Connection Framework that handles the script parsing.

Whether you use Web Connection or Active Server Pages, scripting is a necessary part of development. Scripting makes it possible to keep the display logic in an easily maintainable medium of a simple text file that can be edited and updated simply by copying the file to the Web server. Imagine that every time you change an image in a page you had to recompile your application…

Site Administration

When running a high volume site, any downtime is a problem that can drive away customers. Hence, it's important to get through any administration tasks as quickly as possible to keep the site running at full operation.

 

Visual FoxPro or Active Server Pages

I've shown a lot of functionality in this document and related it to Visual FoxPro and Web Connection, because that's what was used for the Surplus application. So the table below compares some features/functionality of Active Server and Visual FoxPro to put some of the issues discussed into the context of Active Server. Keep in mind that I'm a little biased as the author of Web Connection, but I do believe the points made here are very valid and fair.

Visual FoxPro: Business logic lives mostly in VFP, optionally in COM and minimally in scripted pages.
It's possible to test and debug applications without using COM, which makes the development process in VFP a lot easier.

Active Server Pages: Business logic lives in scripts and COM objects only.
Typical ASP applications keep a lot of business logic in scripts. For more complex operations the only way to extend the functionality is to use COM objects - objects that cannot be unloaded without shutting down the Web server or Web 'application'.

Visual FoxPro: Code drives HTML.
When using a VFP based solution like Web Connection or FoxISAPI the focus is on using code and objects to address the business logic. The environment encourages working within the development environment and using classes to access business logic. Objects created in code can even be passed forward into scripted pages. HTML and scripting are used more towards the end, for displaying the results, although you also have the option of mixing code and HTML. The environment does not encourage this though.

Active Server Pages: HTML drives code.
Active Server relies heavily on scripting to tie together logic. Since HTML and scripts live in the same page it often turns out that the HTML is the driving force of the page, using the scripting to figure the display. In my opinion this is backwards and violates the separation of business rules from interface code. This type of implementation can be avoided with ASP, but the architecture certainly doesn't encourage avoiding it.

Visual FoxPro: Easy HTML generation requires a framework.
If you're using a VFP based tool you either need to build your own library of high level functions or use a framework supplied by the vendor. This can be good or bad - some tools provide lots of functionality, but more importantly you can use VFP to extend the framework in any way you see fit.

Active Server Pages: All HTML works through the scripting engine.
With ASP the scripting engine and ASP's built in objects allow creating output and retrieving form and server data. It's built-in and the engine provides just about all the basic functionality you need. ASP does not provide generic page generation for data displays and other high level functions - you have to build that yourself.

Visual FoxPro: Easy development and debugging inside of VFP.
Building applications within Visual FoxPro makes it possible to use VFP interactively and even debug live requests within VFP, including setting break points and stepping through code. Errors show and can be fixed right away.

Active Server Pages: COM development is complex with no way of debugging live components. Scripts are not conducive to lots of code.
Objects can only run as COM objects and cannot be debugged while running inside of the IIS process. Some scenarios involving complex object passing cannot be debugged at all, as ASP's intrinsic objects are not available for you to test with outside of the IIS environment. Debugging components without a debugger is a drag!

Visual FoxPro: Code updates require a recompile. Scripts can be updated online at any time.
Servers created with VFP require recompilation and updating online. With Web Connection it's possible to update the server without shutting it down. Scripts are just text files and can be updated at any time.

Active Server Pages: Scripts can be uploaded at any time. COM objects require a server shutdown.
This is probably the strongest feature ASP has going for it: updating a script is as simple as copying a file, and this encompasses both HTML and code. Things get more complex with COM objects though - if you use and update them you have to shut down the Web server, or the Web application at least.

Visual FoxPro: Fairly complex one time setup.
Setting up a VFP based solution is fairly complex as it involves properly registering servers and making sure all the paths are properly set and configured. Troubleshooting setup issues can easily frustrate new and experienced users.

Active Server Pages: Easy setup for scripts. Complex setup for COM objects.
Scripts are easy - set Script rights on a directory, install the files and you're off and running. You need an ODBC data source for data access, but that's about it. COM object setup is complex as you have to deal with security issues and proper server configuration. Since ASP works efficiently only with InProc servers, your servers have no UI and are difficult to debug if there's an error at startup.

Visual FoxPro: VFP servers can scale much better.
As discussed previously, the pool managers in Web Connection and FoxISAPI provide some of the best ways to scale Visual FoxPro applications to multiple servers on both local and remote machines. A properly designed application can also run on multiple Web servers simultaneously.

Active Server Pages: ASP can't scale to multiple machines.
ASP cannot run on multiple machines and maintain context information such as session objects and object references. You can however call remote components, but configuration for this is tricky. You'll need to use Microsoft Transaction Server to make this work right, get security settings configured correctly and still have a scalable server.

 
