The guidelines in this section will help you to optimize system performance when designing .NET applications.
Using Connection Pooling
Connecting to a database is the single slowest operation inside a data-centric application, which is why connection management is important to application performance. Optimize your application by connecting once and using multiple statement objects instead of performing multiple connections, and avoid reconnecting to a data source after establishing the initial connection.
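For example, the following sketch reuses one open connection for two statements instead of connecting twice (the table names and connection string are placeholders):
// A minimal sketch: connect once, then reuse the connection for
// several commands. The table names and connection string are
// placeholders.
DbProviderFactory Factory = DbProviderFactories.GetFactory("DDTek.Oracle");
DbConnection Conn = Factory.CreateConnection();
Conn.ConnectionString = "Connection String info";
Conn.Open();   // one physical connection
DbCommand Cmd1 = Conn.CreateCommand();
Cmd1.CommandText = "SELECT count(*) FROM EMPLOYEES";
object EmployeeCount = Cmd1.ExecuteScalar();
DbCommand Cmd2 = Conn.CreateCommand();
Cmd2.CommandText = "SELECT count(*) FROM DEPARTMENTS";
object DepartmentCount = Cmd2.ExecuteScalar();
Conn.Close();  // both statements shared the same connection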
Connection pooling is part of the ADO.NET data provider. Connection pooling lets you reuse active connections. Closing connections in the application code does not close the physical connection to the database. When an application requests a connection, an active connection is reused, thus avoiding the network input/output needed to create a new connection.
Pre-allocate connections using the Min Pool Size connection string option. Decide what connection strings you will need. Remember that each unique connection string creates a new connection pool.
In the following C# code example, three new connection objects are created, but only two connection pools are required to manage them. Note that the first and second connection strings differ only in the values assigned for the User ID, Password, and Min Pool Size connection string options.
DbProviderFactory Factory = DbProviderFactories.GetFactory("DDTek.Oracle");
DbConnection Conn1 = Factory.CreateConnection();
Conn1.ConnectionString =
   "Host=Accounting;Port=1521;User ID=scott;Password=tiger;" +
   "Service Name=ORCL;Min Pool Size=50";
Conn1.Open();
// Pool A is created and filled with connections to the
// minimum pool size
DbConnection Conn2 = Factory.CreateConnection();
Conn2.ConnectionString =
   "Host=Accounting;Port=1521;User ID=Jack;Password=quake;" +
   "Service Name=ORCL;Min Pool Size=100";
Conn2.Open();
// Pool B is created because the connection strings differ
DbConnection Conn3 = Factory.CreateConnection();
Conn3.ConnectionString =
   "Host=Accounting;Port=1521;User ID=scott;Password=tiger;" +
   "Service Name=ORCL;Min Pool Size=50";
Conn3.Open();
// Conn3 is assigned an existing connection that was created in
// Pool A when the pool was created for Conn1
Once created, connection pools are not destroyed until the active process ends or the connection lifetime is exceeded. Maintenance of inactive or empty pools involves minimal system overhead.
Connection and statement handling should be addressed before implementation. Time spent thoughtfully handling connection management improves both application performance and maintainability.
Opening and Closing Connections
If you are not using connection pooling, open connections just before they are needed. Opening them earlier than necessary decreases the number of connections available to other users and can increase the demand for resources.
To keep resources available, explicitly Close the connection as soon as it is no longer needed. If you wait for the garbage collector to implicitly clean up connections that go out of scope, the connections are not made available to the connection pool immediately, tying up resources that are not being used.
In C#, you can close connections inside a finally block. Code in the finally block always runs, even if an exception occurs. This guarantees explicit closing of connections. For example:
try
{
   MyConn.Open();
   ...
   // Do some other interesting work
}
catch (Exception ex)
{
   // Handle exceptions
}
finally
{
   // Close the connection
   if (MyConn != null)
      MyConn.Close();
}
If you use connection pooling, opening and closing connections is not an expensive operation. Using the Close() method of the data provider's Connection object adds or returns the connection to the connection pool. Remember, however, that closing a connection automatically closes all DataReader objects associated with the connection.
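Alternatively, a C# using block gives the same guarantee more concisely: Dispose() closes the connection and returns it to the pool even if an exception occurs. A minimal sketch, assuming the factory and placeholder connection string from the earlier examples:
// Dispose() runs when the block exits, closing the connection and
// returning it to the pool even if an exception was thrown.
using (DbConnection Conn = Factory.CreateConnection())
{
   Conn.ConnectionString = "Connection String info";
   Conn.Open();
   // Do some work with the connection
}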
Managing Commits in Transactions
Committing transactions is slow because of the disk input/output and, potentially, the network input/output involved. Always start a transaction after connecting; otherwise, your applications run in autocommit mode.
What does a commit actually involve? The database server must flush back to disk every data page that contains updated or new data. This is usually a sequential write to a journal file, but it is nonetheless disk input/output. By default, autocommit is on when connecting to a data source. Autocommit mode usually impairs performance because of the significant amount of disk input/output needed to commit every operation.
Furthermore, some database servers do not provide an autocommit mode natively. For this type of server, the ADO.NET data provider must explicitly issue a COMMIT statement and a BEGIN TRANSACTION for every operation sent to the server. In addition to the large amount of disk input/output required to support autocommit mode, a performance penalty is paid for up to three network requests for every statement issued by an application.
The following code fragment starts a transaction for Oracle:
OracleConnection MyConn =
   new OracleConnection("Connection String info");
MyConn.Open();
// Start a transaction
OracleTransaction TransId = MyConn.BeginTransaction();
// Enlist a command in the current transaction
OracleCommand OracleToDS = new OracleCommand();
OracleToDS.Transaction = TransId;
...
// Continue on and do more useful work in the transaction
Although using transactions can help application performance, do not take this tip too far. Leaving transactions active can reduce throughput by holding locks on rows for extended periods, preventing other users from accessing those rows. Commit transactions in intervals that allow maximum concurrency.
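For example, a batch update might commit every 1,000 operations instead of holding a single transaction open for the entire run. The following sketch reuses MyConn from the fragment above; the PendingStatements list and the batch size are illustrative assumptions:
// A sketch of committing in intervals. Each commit releases row
// locks, so other users wait for at most one batch.
int RowsInBatch = 0;
OracleTransaction Txn = MyConn.BeginTransaction();
foreach (string SqlText in PendingStatements)  // hypothetical work list
{
   OracleCommand Cmd = new OracleCommand(SqlText, MyConn);
   Cmd.Transaction = Txn;
   Cmd.ExecuteNonQuery();
   if (++RowsInBatch == 1000)
   {
      Txn.Commit();                     // release locks for this batch
      Txn = MyConn.BeginTransaction();  // start the next interval
      RowsInBatch = 0;
   }
}
Txn.Commit();  // commit the final partial batch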
Choosing the Right Transaction Model
Many systems support distributed transactions; that is, transactions that span multiple connections. Distributed transactions are substantially slower than local transactions due to the logging and network input/output necessary to communicate between all the components involved in the distributed transaction (the ADO.NET data provider, the transaction monitor, and the database system). Distributed transactions should be used only when transactions must span multiple DBMSs or multiple servers.
Unless distributed transactions are required, avoid them; use local transactions whenever possible.
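For work confined to a single database, a local transaction on one connection avoids that coordination overhead entirely. A minimal sketch, reusing the factory pattern from the earlier examples (the UPDATE statement is illustrative):
// A local transaction: one connection, one DbTransaction, no
// transaction monitor or two-phase commit.
using (DbConnection Conn = Factory.CreateConnection())
{
   Conn.ConnectionString = "Connection String info";
   Conn.Open();
   DbTransaction Txn = Conn.BeginTransaction();
   DbCommand Cmd = Conn.CreateCommand();
   Cmd.Transaction = Txn;
   Cmd.CommandText = "UPDATE EMPLOYEES SET sal = sal * 1.05";
   Cmd.ExecuteNonQuery();
   Txn.Commit();  // a local commit; no coordination with other resources
}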
Using Commands that Retrieve Little or No Data
Commands such as INSERT, UPDATE, and DELETE do not return data. Use these commands with the ExecuteNonQuery method of the Command object. Although you can successfully execute these commands using the ExecuteReader method, the ADO.NET data provider properly optimizes database access for INSERT, UPDATE, and DELETE statements only through the ExecuteNonQuery method.
The following example shows how to insert a row into the EMPLOYEES table using ExecuteNonQuery:
DBConn.Open();
DBTxn = DBConn.BeginTransaction();
// Set the Connection property of the Command object
DBCmd.Connection = DBConn;
// Set the text of the Command to the INSERT statement
DBCmd.CommandText = "INSERT into EMPLOYEES VALUES (15,
'HAYES'
,
'ADMIN'
,6,
'17-APR-2002'
,18000,NULL,4)";
// Set the transaction property of the Command object
DBCmd.Transaction = DBTxn;
// Execute the statement with ExecuteNonQuery, because we are not
// returning results
DBCmd.ExecuteNonQuery();
// Now commit the transaction
DBTxn.Commit();
// Close the connection
DBConn.Close();
Use the ExecuteScalar method of the Command object to return a single value, such as a sum or a count, from the database. The ExecuteScalar method returns only the value of the first column of the first row of the result set. Once again, you could use the ExecuteReader method to successfully execute such queries, but by using the ExecuteScalar method, you instruct the ADO.NET data provider to optimize for a result set that consists of a single row and a single column. By doing so, the data provider can avoid a lot of overhead and improve performance. The following example shows how to retrieve the count of a group:
// Retrieve the number of employees who make more than $50000
// from the EMPLOYEES table
// Open connection to Sybase database
SybaseConnection Conn;
Conn = new SybaseConnection(
   "host=bowhead;port=4100;User ID=test01;" +
   "Password=test01;Database Name=Accounting");
Conn.Open();
// Make a command object
SybaseCommand salCmd = new SybaseCommand(
   "SELECT count(sal) FROM EMPLOYEES WHERE sal>'50000'", Conn);
try
{
   int count = (int)salCmd.ExecuteScalar();
}
catch (Exception ex)
{
   // Display any exceptions in a messagebox
   MessageBox.Show(ex.Message);
}
// Close the connection
Conn.Close();
Using Commands Multiple Times
Choosing whether to use the Command.Prepare method can have a significant positive (or negative) effect on query execution performance. The Command.Prepare method instructs the underlying data provider to optimize for multiple executions of statements that use parameter markers. Note that it is possible to prepare any command regardless of which execution method is used (ExecuteReader, ExecuteNonQuery, or ExecuteScalar).
Consider the case where an ADO.NET data provider implements Command.Prepare by creating a stored procedure on the server that contains the prepared statement. Creating stored procedures involves substantial overhead, but the statement can then be executed multiple times. Although creating the stored procedure is performance-expensive, the cost of each execution is minimized because the query is parsed and optimization paths are stored when the procedure is created. Applications that execute the same statement multiple times can benefit greatly from calling Command.Prepare and then executing that Command multiple times.
However, using Command.Prepare for a statement that is executed only once results in unnecessary overhead. Furthermore, applications that use Command.Prepare for large, single-execution query batches exhibit poor performance. Similarly, applications that always use Command.Prepare or that never use it do not perform as well as those that use a logical combination of prepared and unprepared statements.
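The following sketch shows the pattern for a statement that is executed many times: prepare a parameterized command once, then rebind and re-execute it inside a loop. It assumes an open DbConnection named Conn; the EMPLOYEES columns and the ? parameter markers are illustrative, since marker syntax varies by provider:
// A sketch of preparing once and executing many times.
DbCommand InsCmd = Conn.CreateCommand();
InsCmd.CommandText =
   "INSERT INTO EMPLOYEES (emp_id, emp_name) VALUES (?, ?)";
DbParameter IdParam = InsCmd.CreateParameter();
InsCmd.Parameters.Add(IdParam);
DbParameter NameParam = InsCmd.CreateParameter();
InsCmd.Parameters.Add(NameParam);
InsCmd.Prepare();  // pay the preparation cost once
for (int Id = 1; Id <= 1000; Id++)
{
   IdParam.Value = Id;
   NameParam.Value = "Name" + Id;
   InsCmd.ExecuteNonQuery();  // each execution reuses the prepared statement
}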
Using Native Managed Data Providers
Bridges into unmanaged code, that is, code outside the .NET environment, adversely affect performance. Calling unmanaged code from managed code causes the CLR (Common Language Runtime) to make additional checks on calls to the unmanaged code, which impacts performance (see "The Importance of Managed Code for .NET Performance and Security" for more information).
The .NET CLR is a very efficient and highly tuned environment. By using 100% managed code so that your .NET assemblies run inside the CLR, you can take advantage of the numerous built-in services to enhance the performance of your managed application and the productivity of your staff. The CLR provides automatic memory management, so developers don't have to spend time debugging memory leaks. Automatic lifetime control of objects includes garbage collection, scalability features, and support for side-by-side versions. In addition, .NET Framework security enforces restrictions on managed code that protect the code and data from being misused or damaged by other code. An administrator can define a security policy to grant or revoke permissions at the enterprise, machine, assembly, or user level.
However, many ADO.NET data provider architectures must bridge outside the CLR into native code to establish network communication with the database server. The overhead and processing required to enter this bridge is slow in the current version of the CLR.
Depending on your architecture, you may not realize that the underlying ADO.NET data provider is incurring this security risk and performance penalty. Be careful when choosing an ADO.NET data provider that advertises itself as a 100% or pure managed code data provider. If the "Managed Data Provider" requires unmanaged database clients or other unmanaged pieces, it is not a 100% managed data access solution. Only a few vendors produce true managed code providers that implement their entire stack as a managed component.