//  home   //  advanced search   //  news   //  categories   //  sql build chart   //  downloads   //  statistics
ASP FAQ Tutorials

   8000XXXX Errors
   ASP.NET 2.0
   Classic ASP 1.0
      Access DB & ADO
      General SQL Server & Access Articles
      Other Articles
      Schema Tutorials
      Sql Server 2000
      Sql Server 2005
   General Concepts
   Search Engine Optimization (SEO)

Contact Us
Site Map



ASP FAQ Tutorials :: Databases :: Other Articles :: Should I use a view, a stored procedure, or a user-defined function?

Should I use a view, a stored procedure, or a user-defined function?

A pretty common question is when to use views, stored procedures, and user-defined functions. More importantly, what are the differences between them, and what advantages do they each have that make them the best choice in certain scenarios? 
Well, as with many technologies, the answer really is: it depends. I know, you're groaning, because you hate that answer. But hopefully, by showing you the pros and cons of each element, you can decide for yourself which to use in *your* situation--and you can back that up with your own performance tests and whatever other criteria you want to measure. 
Stored Procedures 
    If I can borrow from traditional programming languages like VB for a moment, a stored procedure is like a subroutine. It is used to perform tasks within the database, whether it be to INSERT, UPDATE, DELETE, SELECT, send return values, send output parameters, send e-mail, call command line arguments, encapsulate business logic, enforce data integrity, or any combination thereof. Here is a fictitious example: 
    CREATE PROCEDURE dbo.doStuff 
        @dt SMALLDATETIME 
        DECLARE     @un SYSNAME, 
            @now SMALLDATETIME 
        SET @un = SUSER_SNAME() 
        SET @now = GETDATE() 
        INSERT dbo.myLog 
        DELETE dbo.myLog 
            WHERE dt < (@now-7) 
        UPDATE dbo.Users 
            SET LastActivity = @now 
            WHERE UserName = @un 
        SELECT TOP 3 ProcName, dt 
            FROM MyLog 
            WHERE UserName = @un 
            ORDER BY dt DESC 
        DECLARE @subject VARCHAR(255) 
        SET @subject = @un + ' used the doStuff procedure.' 
        EXEC master..xp_smtp_sendmail 
            @from = 'foo@bar.com', 
            @to = 'bar@foo.com', 
            @server = 'mail.myserver.com', 
            @subject = @subject 
        DECLARE @cmd VARCHAR(255) 
        SET @cmd = 'del c:\users\'+@un+'\archive\*.log' 
        EXEC master..xp_cmdshell @cmd, NO_OUTPUT 
        RETURN 0 
    Stored procedures accept parameters, and are the preferred method of both manipulating data and simply returning data. They are compiled when first run, and the query plans are stored and cached by SQL Server's optimizer, and those cached plans are swapped out depending on frequency of usage. Generally, a stored procedure will perform faster than an ad hoc query, but there are certain cases (e.g. when a bad plan is cached) that this is not the case. 
    It is, for the most part, difficult to share data between stored procedures within SQL Server. As an example, you cannot say: 
    SELECT * FROM ProcedureName 
    -- nor 
    SELECT * FROM a 
        INNER JOIN (EXEC StoredProcedureName) 
        ON ...
    Erland Sommarskog has devoted a lengthy article to the treatment of this topic: How to share data between stored procedu....
    A view is like a table, but SQL Server does not store the data, only the definition. Views are typically used for two primary purposes: to simplify complex schemas and/or queries, and to implement security. With the exception of indexed and distributed partitioned views, they are *not* used to enhance performance! This is a very common myth, and I'm not sure where the sentiment comes from, but it is inaccurate. 
    By simplifying complex schemas or queries, I mean several things. One could simply be to return the *relevant* columns from a very wide table. You might have an inventory table with 200 columns, detailing a product's every last detail, from weight, size and color to origin country, manufacture date and shipping carrier. However, the database developer trying to write a report for the salespeople, trying to determine which product was their best seller last month, could use a view that only referenced the ProductID and ProductName columns (and join that against the Orders/OrderDetails tables). 
    Another is to simulate JOINs to reduce the complexity of other queries. For example, let's say you have the following query, that is used in several places: 
        a.foo, b.bar, c.state 
        FROM a 
        INNER JOIN b 
        ON a.userID = b.userID 
        INNER JOIN c 
        ON b.userID = c.userID 
        AND c.state IN ('RI', 'MA')
    Well, if you were to create a view like this: 
        SELECT b.userID, b.bar, c.state 
            FROM b 
            INNER JOIN c 
            ON b.userID = c.userID 
            AND c.state IN ('RI', 'MA') 
    Now your above query can look like this: 
    SELECT a.foo, b.bar, b.state 
        FROM a 
        INNER JOIN dbo.RIMA b 
        ON a.userID = b.userID
    Essentially, the optimizer swaps out your reference to dbo.RIMA with the actual code from the view, and they can all be optimized together. And it sure is easier to look at. 
    Now, that's a pretty trivial example; I'm sure you can come up with much more convoluted and real examples where simply cutting down on the number of lines of code in a query can speed development time greatly. It can also be easier to explain to junior database developers that to get a result of x, you query the view m, instead of having to explain (or provide direct access to) the joins required against the main tables. 
    Which brings me to my point about security. Let's say you have a table called Employees, and it has columns like salary and SSN. Well, you probably don't want everyone with access to the database to be able to see those columns, so you can create a view called EmployeeInfo which neglects to select those columns, and deny access to the base table to the users (either by placing them in specific roles/groups and denying access that way, or issuing explicit DENY statements to each user). This way, users don't even have to know that there is a table with more information behind the scenes. 
    CREATE TABLE dbo.Employees 

        EmployeeID INT, 
        FirstName VARCHAR(32), 
        LastName VARCHAR(32), 
        Salary INT, 
        SSN CHAR(9) 

    CREATE VIEW dbo.EmployeeInfo 
        SELECT EmployeeID, FirstName, LastName 
            FROM dbo.Employees 
    A view does not accept parameters, which is definitely a strange concept for users migrating from Microsoft Access. To restrict the rows returned by a view, you use a WHERE clause in the query that is accessing the view, not in the view itself. Table-valued functions, described further down, are more like parameterized views; in fact, they were called parameterized views during development, according to Hal Berenson. 
    Like a subquery or inline query, a view is not ordered. There is a kludgy workaround where users implement something like this: 
    CREATE VIEW dbo.Kludge 
        SELECT TOP 100 PERCENT ... 
            FROM table 
            ORDER BY column
    But this can cause problems and, AFAIK, is still not guaranteed to be in the order you define (even in this case, the order will be dictated by the outer query that selects from this view). If you want the results of your view to be in a specific order, then like the WHERE clause above, you add an ORDER BY in the query that is calling the view, not in the view itself. 
    I said earlier that views do not store the data, only a definition of the query. This is true except for the case of indexed views, in which the data IS materialized to disk and a query plan that uses the clustered index can be cached. Books Online has plenty of reading material on indexed views: Designing an Indexed View and Creating an Indexed View.
User-Defined Functions (UDFs) 
    In general, UDFs can be a serious source of performance issues. Also, UDFs cannot be used for DML operations (INSERT/UPDATE/DELETE), cannot use non-deterministic functions (see Article #2439), cannot use dynamic SQL, and cannot have error-handling (e.g. RAISERROR). 
    However, UDFs do have their place in the SQL Server world. Note that there are three kinds of user-defined functions: scalar functions, inline table-valued functions, and multi-statement table-valued functions. 
    • Scalar functions 
      Scalar functions are usually deemed a performance hit, because they are executed on a per-row basis. There is no caching or statistics to take advantage of, however they can be useful if you are merely obtaining single values or you want to isolate very convoluted logic from the query itself.  
      As an example, see Article #2519 for a technique used to populate a calendar table with every year's unique date for Good Friday and Easter Monday. Since this is only done at table population time, and not at run time, even using nested functions is acceptable performance-wise.  
      Another application for scalar functions is to simulate global variables. For example: 
      CREATE TABLE dbo.Properties 

          PropertyName VARCHAR(32), 
          IntValue INT 

      CREATE FUNCTION dbo.getPropertyValue 

          @propertyName VARCHAR(32) 

              SELECT IntValue 
              FROM dbo.Properties 
              WHERE PropertyName = @propertyName 
      (Of course, there are a number of ways to implement a global properties scheme, and this is only one of them. That is for a forthcoming article.) 
      In most cases, you want to avoid the use of scalar UDFs in SELECT, WHERE, GROUP BY and ORDER BY clauses. As discussed above, these will essentially force the optimizer to execute the function once for each row in the query, and depending on the complexity of the function, this could bring your system to its knees. 
      However, there are always cases where you can't avoid all evils. I worked on a system where log files were stored with their GMT timestamp, but reporting was done based on our local timeā€”and there was no freedom to adjust the schema or the log file generation. In a place where daylight savings time is observed, we used a calendar table to properly identify 4:00 AM vs. 5:00 AM in relation to GMT. So, in order to determine the actual day that a log file's data belonged to, we had to perform a function against its timestamp to determine, via the calendar table, whether it was summertime or wintertime, and subtract 4 or 5 hours accordingly. This allowed us to dump log files into their proper Eastern US time "buckets" instead of constantly guessing whether 4:30 GMT this morning belonged to yesterday or today. 
      I am not dead-set against scalar functions, and I am sure there are scenarios where performance is more than acceptable. But in general, like cursors, the majority of cases will show that a scalar function will hurt performance. It really is up to you to compare different implementations and decide which is best. 
    • Inline table-valued functions 
      These are a bit more fun than scalar functions. Some will say they are preferential over stored procedures because they can easily be used in JOINs and regular SELECTs; some will say they are preferential over views because they can be parameterized. In this way, they behave just like Access parameterized views, with the added benefit that a query plan can be cached. 
      Let's go back to our calendar table example from Article #2519, where we have created this function: 
      CREATE FUNCTION dbo.dateRange  
          @sDate SMALLDATETIME,  
          @eDate SMALLDATETIME  
      RETURN (SELECT dt FROM dbo.Calendar  
      WHERE dt BETWEEN @sDate AND @eDate)  
      This will allow as to add this easily to a more complex query, e.g. imagine if we wanted to correlate employees' sales figures to their number of vacation days in a defined time period, we could use something like this: 
          s.EmployeeID, COUNT(s.EmployeeID), SUM(s.total), COUNT(v.dt) 
          FROM Sales s 
          INNER JOIN (dbo.dateRange('20050101', '20050115')) d 
          ON s.dt = d.dt 
          INNER JOIN VacationDays v 
          ON s.employeeID = v.employeeID 
          AND d.dt = v.dt 
          GROUP BY s.EmployeeID
      As a stored procedure, the inner date query would look like this: 
      CREATE PROCEDURE dbo.dateRange 
          @sDate SMALLDATETIME,  
          @eDate SMALLDATETIME  
          SET NOCOUNT ON 
          SELECT dt 
              FROM dbo.Calendar  
              WHERE dt BETWEEN @sDate AND @eDate  
      But, as mentioned, this couldn't easily be used in a more complex query as a derived table or subquery. For that, we'd need to implement it as a function. 
    • Multi-statement table-valued functions 
      Like scalar functions, these act like a traditional function and are called once per row. Sometimes, that is the intent; for example, take a look at the function used to break apart a comma-separated list (Article #2248).  
      However, in most cases, this is the kind of thing that is better handled at the client/presentation layer, not only due to performance concerns, but also because it's just not a relational kind of operation. I'd be interested to hear of any examples of scalar- or multi-statement UDFs that have some benefit being applied in the SELECT, WHERE, GROUP BY or ORDER BY clauses.

Now, having said all that, there is a difference between client->server interaction, and interaction within SQL Server itself. All client->server interactions with SQL Server, whether they be from an application or an end user in Query Analyzer, should be through stored procedures. Processes outside of SQL Server should not have direct access to tables, views or functions. So my basic ground rule is, for anything client-facing, build a stored procedure. Whether that stored procedure accesses tables, views or UDFs behind the scenes, rests entirely on the points above that are relevant to your environment, and the testing you perform in order to yield which implementation works best. 
Note that due to time constraints and the availability of warm bodies, my ideal scenario is not always possible--but it can always be a goal on the radar screen.

Related Articles

How do I build a query with optional parameters?
How do I calculate the median in a table?
How do I create a store locator feature?
How do I deal with MEMO, TEXT, HYPERLINK, and CURRENCY columns?
How do I deal with multiple resultsets from a stored procedure?
How do I debug my SQL statements?
How do I determine if a column exists in a given table?
How do I enable or disable connection pooling?
How do I enumerate through the DSNs on a machine?
How do I find a stored procedure containing <text>?
How do I get a list of Access tables and their row counts?
How do I get the latest version of the JET OLEDB drivers?
How do I handle alphabetic paging?
How do I handle BIT / BOOLEAN columns?
How do I handle error checking in a stored procedure?
How do I ignore common words in a search?
How do I page through a recordset?
How do I present one-to-many relationships in my ASP page?
How do I prevent duplicates in a table?
How do I prevent my ASP pages from waiting for backend activity?
How do I prevent NULLs in my database from mucking up my HTML?
How do I protect my Access database (MDB file)?
How do I protect my stored procedure code?
How do I protect myself against the W32.Slammer worm?
How do I remove duplicates from a table?
How do I rename a column?
How do I retrieve a random record?
How do I return row numbers with my query?
How do I send a database query to a text file?
How do I simulate an array inside a stored procedure?
How do I solve 'Could not find installable ISAM' errors?
How do I solve 'Operation must use an updateable query' errors?
How do I temporarily disable a trigger?
How do I use a SELECT list alias in the WHERE or GROUP BY clause?
How do I use a variable in an ORDER BY clause?
Should I index my database table(s), and if so, how?
Should I store images in the database or the filesystem?
Should I use a #temp table or a @table variable?
Should I use recordset iteration, or GetRows(), or GetString()?
What are all these dt_ stored procedures, and can I remove them?
What are the limitations of MS Access?
What are the limitations of MSDE?
What are the valid styles for converting datetime to string?
What datatype should I use for my character-based database columns?
What datatype should I use for numeric columns?
What does "ambiguous column name" mean?
What is this 'Multiple-step OLE DB' error?
What is wrong with 'SELECT *'?
What naming convention should I use in my database?
What should I choose for my primary key?
What should my connection string look like?
When should I use CreateObject to create my recordset objects?
Where can I get this 'Books Online' documentation?
Where do I get MSDE?
Which database platform should I use for my ASP application?
Which tool should I use: Enterprise Manager or Query Analyzer?
Why are there gaps in my IDENTITY / AUTOINCREMENT column?
Why can I not 'open a database created with a previous version...'?
Why can't I access a database or text file on another server?
Why can't I use the TOP keyword?
Why do I get 'Argument data type text is invalid for argument [...]'?
Why do I get 'Not enough space on temporary disk' errors?
Why does ASP give me ActiveX errors when connecting to a database?
Should I use COALESCE() or ISNULL()?
Where can I get basic info about using stored procedures?



Created: 3/20/2005 | Last Updated: 4/12/2005 | broken links | helpful | not helpful | statistics
© Copyright 2006, UBR, Inc. All Rights Reserved. (237)


Copyright 1999-2006, All rights reserved.
Finding content
Finding content.  An error has occured...