|
|
8000XXXX Errors Alerts ASP.NET 2.0 Classic ASP 1.0 Databases Access DB & ADO General SQL Server & Access Articles MySQL Other Articles Schema Tutorials Sql Server 2000 Sql Server 2005 General Concepts Search Engine Optimization (SEO)Search | ASP FAQ Tutorials :: Databases :: Other Articles :: Should I use recordset iteration, or GetRows(), or GetString()? Should I use recordset iteration, or GetRows(), or GetString()?(Special thanks to Chris Hohmann for submitting the basis for this article.) This article is a comparative analysis of three (3) approaches for retrieving and displaying data from an ADO Recordset object. They are Recordset iteration, the GetString method and the GetRows method. This article in no way seeks to dictate which approach should be used in practice. Its intent is simply to aid the developer in making an informed decision when selecting an approach. The same task will be performed using each of the three (3) approaches. Namely, display a list of the articles from www.aspfaq.com, grouped by category. We have "borrowed" the list from "The Big List" portion of the site. Each method will be timed and every effort will be made to insure a level playing field. Whenever possible, optimizations to improve the performance of a given approach will be used. This is the environment where these tests were run:
The table we are going to use:
This table stores the ID, Category and Title of the four hundred sixty six (466) articles at www.aspfaq.com (as of 2003-07-10). The INSERT statements to populate the table can be found in inserts_2467.txt. The stored procedure we are going to use:
This returns a result set that is the union of articles and categories, used as the data source for the examples throughout this article. And here is the ASP code that we will be running:
Explanation of relevant portions of code Lines 3-9: These lines of code are the same in all three functions. They handle the initialization of the start variable (used to measure the execution time of the function), the connections and recordset objects, and call the stored procedure using the "stored procedure as Connection method" technique. We use this latter method, as it strikes a balance between the benefits of using a Command object to execute stored procedures and passing command text to the Execute method of the Connection object. Command objects have the benefit of being able to perform parameter type checking. On the downside, the process for passing parameters to the Command object can be cumbersome and somewhat complicated. Conversely, using the Execute method of the Connection object is often more straightforward than its Command object counterpart. On the downside, the Execute method provides no mechanism for type validation of the parameters that are passed to the stored procedure. Using the "stored procedure as Connection method" technique allows parameters to be passed directly to the stored procedure while also offering parameter type checking. Line 10: This is the call to the GetString method of the Recordset object. Detailed information about the syntax and use of the GetString method is available on the MSDN website (see this article). Put simply, the GetString method returns a string representation of a Recordset based upon supplied parameters that determine how the Recordset should be converted to a string, how many rows to return, which character(s) should be used to separate columns/rows and what expression should be substitutes when a null value is encountered. In this example we have specified the following parameter values:
The string returned from GetString has been HTML encoded to insure it is displayed properly when rendered and outputting to the ASP Response object. Also note, that the NullExpr parameter has been employed to identify and format the category rows. Generally speaking, avoiding the use of nulls in databases is advisable. However, there are instances where nulls can be useful when utilized correctly. This is such a case. Line 26: This is the call to the GetRows method or the Recordset object. Detailed information about the syntax and use of the GetRows method is available on the MSDN website (see this article). Put simply, the GetRows method loads a Recordset into a two (2) dimensional array based upon supplied parameters that determine how many rows are returned, which row in the Recordset to start from and which fields should be returned. The first dimension of the resultant array represents the column index and the second dimension represents the row index. Both dimensions are zero-based. In this example no parameters where specified. As such, the default values were used. Namely to start at the beginning of the Recordset, return all rows and include all columns. Line 30: Declare/initialize the jMax variable used to iterate through the array. Localizing the upper bound of an array can improve performance by eliminating the need to call the Ubound function on each iteration of the loop. Lines 32-42: Iterate through each row in the array. Implicit type conversion is used to determine whether the current row is an article or a category. Nulls evaluate to false through implicit type conversion. If the first column (article id) is not null, then the row represents an article and the article id and title are displayed. If the first column is null, then the row represents a category and the category name is displayed, prefixed with "Category:" Lines 54-68: Iterate through each row in the Recordset object. The With statement is employed throughout the iteration process to minimize object model traversal and improve performance. An explicit reference to the Field value is made to eliminate any ambiguity that can arise from the context sensitive nature of using default methods/properties and again to improve performance. As in the GetRows function, implicit type conversion for null values is used to determine whether the row is an article or a category. Lines 76-87: Output a standard HTML document containing a single <pre> tag in the body. Within the <pre> tag, all three (3) functions are called one hundred (100) times, and the total execution time associated with each function is displayed. Also note the call to the Clear method of the Response object. This clears the data outputted thus far. The aforementioned code generates over 140,000 lines of text. For the purposes of this analysis, there is no need to actually render said output. Analysis GetString: 0.015625 Pros The GetString method was consistently the fastest approach. Its performance is largely attributable to the fact that it minimizes the use of objects and complex data structures such as arrays. As such, while execution time was the only metric used to rate performance, the resource utilization (memory, stack, handles, etc.) for this approach would also be quite small for the same reasons. Note should also be taken that one optimization method not employed for the GetRows approach in this article was to make use of the NumRows parameter. The NumRows parameter determines how many rows are retrieved from the Recordset. For large result sets, looping through the Recordset and outputting a "bursts" of records for each iteration of the loop can provide a significant performance boost in comparison to retrieving the entire result set in a single call. The reason for this lies in the fact that allocating buffer space for large strings can become very expensive. The optimal setting for NumRows is dependant upon a number of factors including, the size of each row, available memory and concurrent load on the ASP server. For this article, empirical testing showed that the additional cost of the looping construct exceeded the performance benefits of specifying the NumRows parameter. Cons The GetString method provides limited functionality with regards to formatting the data contained in the result set. The ColumnDelimiter and RowDelimiter parameters are applied indiscriminately. There is no opportunity for row level or column specific processing beyond the use of the NullExpr parameter. This limitation in formatting capabilities can be overcome in part by altering the Recordset source to return pre-formatted data. This method however has the drawback of increasing the amount of data returned by the Recordset and increasing the complexity and execution time of the Recordset data source. Another alternative is to specify a value for the DataFormat property of each Field in the Recordset prior to calling the GetString method. Unfortunately, this method requires the creation and configuration of an StdDataFormat object for each Field. The StdDataFormat object is part of the Visual Basic (6.0) object model. As such, this method requires that MSSTDFMT.DLL be installed on the web server. MSSTDFMT.DLL is the Visual Basic (6.0) direct link library that contains the implementation of the StdDataFormat object. The GetString approach is also linear in nature. As such, processing occurs on a row-by-row basis. The GetString method provides no means for processing on a column-by-column basis or a cell-by-cell basis. This shortcoming can also be overcome by altering the Recordset data source with the same aforementioned costs. GetRows: 0.03125 Pros The GetRows method was the second fastest approach, taking about twice as long to run as the GetString method. This is attributable to the fact that while the GetRows method minimizes the use of objects, it does rely on an array which is a larger and more complex data structure. The GetRows example demonstrates an advantage over the Recordset iteration approach that is not readily apparent in the GetString example. Namely, both GetString and GetRows benefit from their ability to release the resources allocated for the Recordset early in the processing cycle. For the GetString examples, this is a moot point since no further processing is required after the string is output. However, in the GetRows example, we are able to close and deallocate both the Recordset object and the Connection object immediately after loading the array, then proceed to the processing and display phase of the function. While not as fast, the GetRows method offers a number of benefits that address some of the shortcomings of the GetString method. First and foremost, the GetRows method offers a great deal of flexibility in how the result set data can be processed and formatted. By its very nature, the 2-dimensional array returned by the GetRows method is non-linear. This has important implications for the ways in which the result set data can be processed. The data in the result set can be processed in a row-by-row, column-by-column or cell-by-cell manner, or any combination there-of. Formatting can also take place at the row, column and/or cell level. An array also has the benefit of being column name independent. From a maintenance standpoint, if the names of the columns returned by the Recordset change, the code does not need to be altered to accommodate the new column names. Finally, there is one aspect of the GetRows method that was not explored in this article; namely, the Fields parameter. The Fields parameter of the GetRows method determines which Field(s) from the Recordset should be loaded into the resultant array. This functionality can be useful in situations where supplemental data is returned from the Recordset for the purposes of setting the Filter or Sort properties of the Recordset based upon information not available during the retrieval of the data, but not for inclusion in the display. Cons From a development standpoint, the GetRows method can be somewhat more complicated than its GetString and Recordset iteration counterparts. The fact that columns and rows in the array must be referenced positionally and that the array dimensions are zero based can be counterintuitive to someone who is accustomed to referencing columns by name and navigating through the Recordset with methods such as MoveNext. There are a number of techniques available to overcome this shortcoming. One such method is to create a Dictionary object to store the column index of each Field in the Recordset. By doing so, the Dictionary object can be used to lookup a Field's associated column index when referencing the array. The Dictionary object can be populated by enumerating the Fields collection of the Recordset object. Another downside to the GetRows method is that the resultant array is specific to the VBScript language. While other scripting languages recognize VBArrays, additional steps must be taken to transform the VBArray into, for example a native JScript array. VBArrays also lack many of the methods and properties available to JScript array. For example, JScript arrays provide a powerful method that by default sorts an array lexically but also accepts a custom function as a parameter to the method call. In the absence of such functionality one must sort the data prior to calling the GetRows method by setting the Recordset Sort property or altering the underlying data source. Or in the alternative, build a custom function to reproduce the sorting capabilities of JScript arrays. Finally, as counterpoint to the aforementioned maintenance benefit of GetRows approach, this method can be more difficult to maintain if the order of the columns in a Recordset changes. This shortcoming can also be address by enumerating the Fields collection of a Recordset into a Dictionary object. Recordset: 0.046875 Pros The Recordset object offers a wealth of powerful tools to interact with the result set of its underlying data source. We have already examined two (2) of the most powerful methods, GetString and GetRows. However, in addition to these methods, the Recordset object provides access to more than fifty other properties and methods that would be difficult to reproduce using a string or array representation of the Recordset data. For example, the Save method of the Recordset object allow the underlying data to be converted into XML format. The XML data can be saved to a file or more interestingly be sent to any object that implements the OLE DB IStream interface, such as the ASP Response object. While outside the scope of this article, this technique provides exciting possibilities to move from simply displaying Recordset data to publishing that data in a format that can be utilized by any XML aware application. As part of the ADO object model, the Recordset object has access not only to its own properties, collections and methods but also the properties, collections and methods for all the objects associated with the Recordset. By referencing the Fields collection, one could add a new field to the Recordset and populate it with data that was not available when the data was being retrieved. For instance, in the Recordset iteration example, one could add a new field to the Recordset to indicate whether or not the specified article is currently available on the aspfaq.com website by polling the site with XMLHTTP and populate said field while iterating the Recordset. In the case of the GetString method, achieving the same result would not be possible. In the case of the GetRows approach, a new array would need to be created to accommodate the new column and the data from the original array accompanied by the article status would need to be migrated. Cons The additional functionality provided by the Recordset object comes at the price of performance and resource utilization. Recordset iteration was the slowest approach, taking roughly three times as long as the GetString approach. It is important to note that it is not the Recordset itself that degrades performance and resource utilization. In each of the three (3) examples, the same Recordset object is being instantiated in an identical manner. The performance of the Recordset iteration method is attributable to its heavy use of properties and methods of the Recordset object and repeated traversal of the object model. The Recordset iteration approach also requires that the Recordset object say open through the entire processing and display cycle. This results in much larger resource utilization footprint. In addition to performance and resource utilization concerns, Recordset objects are significantly more complex from a structural standpoint and as such are inherently less stable than low-level objects such as strings and arrays. For certain types of Recordsets and Connections, the sustained communication also adds to the size of the resource utilization footprint under the category of inter-process and network bandwidth. If columns are referenced by ordinal position, then the Recordset iteration approach suffers the same shortcomings that the GetRows approach does from a code maintenance standpoint. If the position of the columns changes in the underlying data source, code will need to be modified accordingly. If columns are referenced by name, then the alternative holds true. Code will need to be modified to reference the new column names. Finally, Recordset iteration, like the GetString approach, is linear in nature. As such Recordset iteration suffers from some of the same shortcomings as the GetString approach does in this regard. Performing column-by-column processing or column-based formatting would either require multiple passes through the Recordset or altering its underlying data source. Conclusions There is no one right approach to displaying data from a Recordset. As in all development, a balance must be struck between a number of factors including performance, maintainability and development effort. Hopefully having these factors well defined in the light of some of the information presented here will aid you in selecting the right approach for your particular circumstance. As always, due diligence demands that one not accept any premise at face value. The reader is encouraged to rigorously test the findings in this article and challenge any unsubstantiated claims. Related Articles How do I build a query with optional parameters? How do I calculate the median in a table? How do I create a store locator feature? How do I deal with MEMO, TEXT, HYPERLINK, and CURRENCY columns? How do I deal with multiple resultsets from a stored procedure? How do I debug my SQL statements? How do I determine if a column exists in a given table? How do I enable or disable connection pooling? How do I enumerate through the DSNs on a machine? How do I find a stored procedure containing <text>? How do I get a list of Access tables and their row counts? How do I get the latest version of the JET OLEDB drivers? How do I handle alphabetic paging? How do I handle BIT / BOOLEAN columns? How do I handle error checking in a stored procedure? How do I ignore common words in a search? How do I page through a recordset? How do I present one-to-many relationships in my ASP page? How do I prevent duplicates in a table? How do I prevent my ASP pages from waiting for backend activity? How do I prevent NULLs in my database from mucking up my HTML? How do I protect my Access database (MDB file)? How do I protect my stored procedure code? How do I protect myself against the W32.Slammer worm? How do I remove duplicates from a table? How do I rename a column? How do I retrieve a random record? How do I return row numbers with my query? How do I send a database query to a text file? How do I simulate an array inside a stored procedure? How do I solve 'Could not find installable ISAM' errors? How do I solve 'Operation must use an updateable query' errors? How do I temporarily disable a trigger? How do I use a SELECT list alias in the WHERE or GROUP BY clause? How do I use a variable in an ORDER BY clause? Should I index my database table(s), and if so, how? Should I store images in the database or the filesystem? Should I use a #temp table or a @table variable? Should I use a view, a stored procedure, or a user-defined function? What are all these dt_ stored procedures, and can I remove them? What are the limitations of MS Access? What are the limitations of MSDE? What are the valid styles for converting datetime to string? What datatype should I use for my character-based database columns? What datatype should I use for numeric columns? What does "ambiguous column name" mean? What is this 'Multiple-step OLE DB' error? What is wrong with 'SELECT *'? What naming convention should I use in my database? What should I choose for my primary key? What should my connection string look like? When should I use CreateObject to create my recordset objects? Where can I get this 'Books Online' documentation? Where do I get MSDE? Which database platform should I use for my ASP application? Which tool should I use: Enterprise Manager or Query Analyzer? Why are there gaps in my IDENTITY / AUTOINCREMENT column? Why can I not 'open a database created with a previous version...'? Why can't I access a database or text file on another server? Why can't I use the TOP keyword? Why do I get 'Argument data type text is invalid for argument [...]'? Why do I get 'Not enough space on temporary disk' errors? Why does ASP give me ActiveX errors when connecting to a database? Should I use COALESCE() or ISNULL()? Where can I get basic info about using stored procedures? |