The guidelines in this section will help you to optimize system performance when designing .NET applications.
Using Connection Pooling
Connecting to a database is the single slowest operation inside a data-centric application, which is why connection management is important to application performance. Optimize your application by connecting once and using multiple statement objects instead of performing multiple connections, and avoid reconnecting to a data source after establishing the initial connection.
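For example, the following sketch reuses one open connection for two statements instead of connecting twice (the table names and connection string are placeholders):
// A minimal sketch: connect once, then reuse the connection for
// several commands. The table names and connection string are
// placeholders.
DbProviderFactory Factory = DbProviderFactories.GetFactory("DDTek.Oracle");
DbConnection Conn = Factory.CreateConnection();
Conn.ConnectionString = "Connection String info";
Conn.Open();   // one physical connection
DbCommand Cmd1 = Conn.CreateCommand();
Cmd1.CommandText = "SELECT count(*) FROM EMPLOYEES";
object EmployeeCount = Cmd1.ExecuteScalar();
DbCommand Cmd2 = Conn.CreateCommand();
Cmd2.CommandText = "SELECT count(*) FROM DEPARTMENTS";
object DepartmentCount = Cmd2.ExecuteScalar();
Conn.Close();  // both statements shared the same connection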
Connection pooling is part of the ADO.NET data provider. Connection pooling lets you reuse active connections. Closing connections in the application code does not close the physical connection to the database. When an application requests a connection, an active connection is reused, thus avoiding the network input/output needed to create a new connection.
Pre-allocate connections using the Min Pool Size connection string option. Decide what connection strings you will need. Remember that each unique connection string creates a new connection pool.
In the following C# code example, three new connection objects are created, but only two connection pools are required to manage them. Note that the first and second connection strings differ only in the values assigned for the User ID, Password, and Min Pool Size connection string options.
DbProviderFactory Factory = DbProviderFactories.GetFactory("DDTek.Oracle");
DbConnection Conn1 = Factory.CreateConnection();
Conn1.ConnectionString =
   "Host=Accounting;Port=1521;User ID=scott;Password=tiger;" +
   "Service Name=ORCL;Min Pool Size=50";
Conn1.Open();
// Pool A is created and filled with connections to the
// minimum pool size
DbConnection Conn2 = Factory.CreateConnection();
Conn2.ConnectionString =
   "Host=Accounting;Port=1521;User ID=Jack;Password=quake;" +
   "Service Name=ORCL;Min Pool Size=100";
Conn2.Open();
// Pool B is created because the connection strings differ
DbConnection Conn3 = Factory.CreateConnection();
Conn3.ConnectionString =
   "Host=Accounting;Port=1521;User ID=scott;Password=tiger;" +
   "Service Name=ORCL;Min Pool Size=50";
Conn3.Open();
// Conn3 is assigned an existing connection that was created in
// Pool A when the pool was created for Conn1
Once created, connection pools are not destroyed until the active process ends or the connection lifetime is exceeded. Maintenance of inactive or empty pools involves minimal system overhead.
Connection and statement handling should be addressed before implementation. Time spent thoughtfully handling connection management improves both application performance and maintainability.
Opening and Closing Connections
If you are not using connection pooling, open connections just before they are needed. Opening them earlier than necessary decreases the number of connections available to other users and can increase the demand for resources.
To keep resources available, explicitly Close the connection as soon as it is no longer needed. If you wait for the garbage collector to implicitly clean up connections that go out of scope, the connections are not made available to the connection pool immediately, tying up resources that are not being used.
In C#, you can close connections inside a finally block. Code in the finally block always runs, even if an exception occurs. This guarantees explicit closing of connections. For example:
try
{
   MyConn.Open();
   ...
   // Do some other interesting work
}
catch (Exception ex)
{
   // Handle exceptions
}
finally
{
   // Close the connection
   if (MyConn != null)
      MyConn.Close();
}
If you use connection pooling, opening and closing connections is not an expensive operation. Using the Close() method of the data provider's Connection object adds or returns the connection to the connection pool. Remember, however, that closing a connection automatically closes all DataReader objects associated with the connection.
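Alternatively, a C# using block gives the same guarantee more concisely: Dispose() closes the connection and returns it to the pool even if an exception occurs. A minimal sketch, assuming the factory and placeholder connection string from the earlier examples:
// Dispose() runs when the block exits, closing the connection and
// returning it to the pool even if an exception was thrown.
using (DbConnection Conn = Factory.CreateConnection())
{
   Conn.ConnectionString = "Connection String info";
   Conn.Open();
   // Do some work with the connection
}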
Managing Commits in Transactions
Committing transactions is slow because of the disk input/output and, potentially, the network input/output involved. Always start a transaction after connecting; otherwise, your applications run in autocommit mode.
What does a commit actually involve? The database server must flush back to disk every data page that contains updated or new data. This is usually a sequential write to a journal file, but it is nonetheless disk input/output. By default, autocommit is on when connecting to a data source. Autocommit mode usually impairs performance because of the significant amount of disk input/output needed to commit every operation.
Furthermore, some database servers do not provide an autocommit mode natively. For this type of server, the ADO.NET data provider must explicitly issue a COMMIT statement and a BEGIN TRANSACTION for every operation sent to the server. In addition to the large amount of disk input/output required to support autocommit mode, a performance penalty is paid for up to three network requests for every statement issued by an application.
The following code fragment starts a transaction for Oracle:
OracleConnection MyConn =
   new OracleConnection("Connection String info");
MyConn.Open();
// Start a transaction
OracleTransaction TransId = MyConn.BeginTransaction();
// Enlist a command in the current transaction
OracleCommand OracleToDS = new OracleCommand();
OracleToDS.Transaction = TransId;
...
// Continue on and do more useful work in the transaction
Although using transactions can help application performance, do not take this tip too far. Leaving transactions active can reduce throughput by holding locks on rows for extended periods, preventing other users from accessing those rows. Commit transactions in intervals that allow maximum concurrency.
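For example, a batch update might commit every 1,000 operations instead of holding a single transaction open for the entire run. The following sketch reuses MyConn from the fragment above; the PendingStatements list and the batch size are illustrative assumptions:
// A sketch of committing in intervals. Each commit releases row
// locks, so other users wait for at most one batch.
int RowsInBatch = 0;
OracleTransaction Txn = MyConn.BeginTransaction();
foreach (string SqlText in PendingStatements)  // hypothetical work list
{
   OracleCommand Cmd = new OracleCommand(SqlText, MyConn);
   Cmd.Transaction = Txn;
   Cmd.ExecuteNonQuery();
   if (++RowsInBatch == 1000)
   {
      Txn.Commit();                     // release locks for this batch
      Txn = MyConn.BeginTransaction();  // start the next interval
      RowsInBatch = 0;
   }
}
Txn.Commit();  // commit the final partial batch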
Choosing the Right Transaction Model
Many systems support distributed transactions; that is, transactions that span multiple connections. Distributed transactions are substantially slower than local transactions due to the logging and network input/output necessary to communicate between all the components involved in the distributed transaction (the ADO.NET data provider, the transaction monitor, and the database system). Distributed transactions should be used only when transactions must span multiple DBMSs or multiple servers.
Unless distributed transactions are required, avoid them; use local transactions whenever possible.
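For work confined to a single database, a local transaction on one connection avoids that coordination overhead entirely. A minimal sketch, reusing the factory pattern from the earlier examples (the UPDATE statement is illustrative):
// A local transaction: one connection, one DbTransaction, no
// transaction monitor or two-phase commit.
using (DbConnection Conn = Factory.CreateConnection())
{
   Conn.ConnectionString = "Connection String info";
   Conn.Open();
   DbTransaction Txn = Conn.BeginTransaction();
   DbCommand Cmd = Conn.CreateCommand();
   Cmd.Transaction = Txn;
   Cmd.CommandText = "UPDATE EMPLOYEES SET sal = sal * 1.05";
   Cmd.ExecuteNonQuery();
   Txn.Commit();  // a local commit; no coordination with other resources
}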
Using Commands that Retrieve Little or No Data
Commands such as INSERT, UPDATE, and DELETE do not return data. Use these commands with the ExecuteNonQuery method of the Command object. Although you can successfully execute these commands using the ExecuteReader method, the ADO.NET data provider properly optimizes database access for INSERT, UPDATE, and DELETE statements only through the ExecuteNonQuery method.
The following example shows how to insert a row into the EMPLOYEES table using ExecuteNonQuery:
DBConn.Open();
DBTxn = DBConn.BeginTransaction();
// Set the Connection property of the Command object
DBCmd.Connection = DBConn;
// Set the text of the Command to the INSERT statement
DBCmd.CommandText = "INSERT into EMPLOYEES VALUES (15,
'HAYES'
,
'ADMIN'
,6,
'17-APR-2002'
,18000,NULL,4)";
// Set the transaction property of the Command object
DBCmd.Transaction = DBTxn;
// Execute the statement with ExecuteNonQuery, because we are not
// returning results
DBCmd.ExecuteNonQuery();
// Now commit the transaction
DBTxn.Commit();
// Close the connection
DBConn.Close();
Use the ExecuteScalar method of the Command object to return a single value, such as a sum or a count, from the database. The ExecuteScalar method returns only the value of the first column of the first row of the result set. Once again, you could use the ExecuteReader method to successfully execute such queries, but by using the ExecuteScalar method, you instruct the ADO.NET data provider to optimize for a result set that consists of a single row and a single column. By doing so, the data provider can avoid a lot of overhead and improve performance. The following example shows how to retrieve the count of a group:
// Retrieve the number of employees who make more than $50000
// from the EMPLOYEES table
// Open connection to Sybase database
SybaseConnection Conn;
Conn = new SybaseConnection(
   "host=bowhead;port=4100;User ID=test01;" +
   "Password=test01;Database Name=Accounting");
Conn.Open();
// Make a command object
SybaseCommand salCmd = new SybaseCommand(
   "SELECT count(sal) FROM EMPLOYEES WHERE sal>'50000'", Conn);
try
{
   int count = (int)salCmd.ExecuteScalar();
}
catch (Exception ex)
{
   // Display any exceptions in a messagebox
   MessageBox.Show(ex.Message);
}
// Close the connection
Conn.Close();
Using Commands Multiple Times
Choosing whether to use the Command.Prepare method can have a significant positive (or negative) effect on query execution performance. The Command.Prepare method instructs the underlying data provider to optimize for multiple executions of statements that use parameter markers. Note that it is possible to prepare any command regardless of which execution method is used (ExecuteReader, ExecuteNonQuery, or ExecuteScalar).
Consider the case where an ADO.NET data provider implements Command.Prepare by creating a stored procedure on the server that contains the prepared statement. Creating stored procedures involves substantial overhead, but the statement can then be executed multiple times. Although creating the stored procedure is performance-expensive, the cost of each execution is minimized because the query is parsed and optimization paths are stored when the procedure is created. Applications that execute the same statement multiple times can benefit greatly from calling Command.Prepare and then executing that Command multiple times.
However, using Command.Prepare for a statement that is executed only once results in unnecessary overhead. Furthermore, applications that use Command.Prepare for large, single-execution query batches exhibit poor performance. Similarly, applications that always use Command.Prepare or that never use it do not perform as well as those that use a logical combination of prepared and unprepared statements.
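The following sketch shows the pattern for a statement that is executed many times: prepare a parameterized command once, then rebind and re-execute it inside a loop. It assumes an open DbConnection named Conn; the EMPLOYEES columns and the ? parameter markers are illustrative, since marker syntax varies by provider:
// A sketch of preparing once and executing many times.
DbCommand InsCmd = Conn.CreateCommand();
InsCmd.CommandText =
   "INSERT INTO EMPLOYEES (emp_id, emp_name) VALUES (?, ?)";
DbParameter IdParam = InsCmd.CreateParameter();
InsCmd.Parameters.Add(IdParam);
DbParameter NameParam = InsCmd.CreateParameter();
InsCmd.Parameters.Add(NameParam);
InsCmd.Prepare();  // pay the preparation cost once
for (int Id = 1; Id <= 1000; Id++)
{
   IdParam.Value = Id;
   NameParam.Value = "Name" + Id;
   InsCmd.ExecuteNonQuery();  // each execution reuses the prepared statement
}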
Using Native Managed Data Providers
Bridges into unmanaged code, that is, code outside the .NET environment, adversely affect performance. Calling unmanaged code from managed code causes the CLR (Common Language Runtime) to make additional checks on calls to the unmanaged code, which impacts performance (see "The Importance of Managed Code for .NET Performance and Security" for more information).
The .NET CLR is a very efficient and highly tuned environment. By using 100% managed code so that your .NET assemblies run inside the CLR, you can take advantage of the numerous built-in services to enhance the performance of your managed application and the productivity of your staff. The CLR provides automatic memory management, so developers don't have to spend time debugging memory leaks. Automatic lifetime control of objects includes garbage collection, scalability features, and support for side-by-side versions. In addition, .NET Framework security enforces restrictions on managed code that protect the code and data from being misused or damaged by other code. An administrator can define a security policy to grant or revoke permissions at the enterprise, machine, assembly, or user level.
However, many ADO.NET data provider architectures must bridge outside the CLR into native code to establish network communication with the database server. The overhead and processing required to enter this bridge is slow in the current version of the CLR.
Depending on your architecture, you may not realize that the underlying ADO.NET data provider is incurring this security risk and performance penalty. Be careful when choosing an ADO.NET data provider that advertises itself as a 100% or pure managed code data provider. If the "Managed Data Provider" requires unmanaged database clients or other unmanaged pieces, it is not a 100% managed data access solution. Only a few vendors produce true managed code providers that implement their entire stack as a managed component.