In the first half of this two-part podcast, Rob Steward explains the impact that reducing network traffic has on database application performance. The podcast runs for 4:57.
Click on the following link to listen to the podcast: http://blogs.datadirect.com/media/RobSteward_June3_ReducingNetworkTraffic_1.mp3
Well, network traffic is in fact one of the biggest places where you can see bottlenecks in the overall performance and scalability of your applications. This is one of the areas in which the code that you write and the database middleware, those JDBC, ODBC, or ADO.NET drivers and providers, can make a huge impact through how they handle the amount of network traffic. So for example, with some databases you can actually ask the database to not return all the data in every row. For example, say I have a row where column one is ‘Rob’ and column two is ‘Steward,’ and then in row number two, column one is ‘Rob’ and column two is ‘Simpson.’ I can ask the database not to return successive values in the same column that are identical. So in the example that I just gave, the first row may come back as ‘Rob Steward,’ but the second row may come back with some marker in the first column that tells the driver, ‘hey, the data in this column is the same as it was in the previous row’; and then it may return ‘Simpson’ in column two.
So think about that on a much larger scale. Say I were to return 10,000 records; typically in real-world data some of those columns are going to be repeating. For example, you may have something sorted by address; you may use a clustered index on your database that physically orders the records by address. One of the address fields may be city and another one may be state. Well, instead of sending the string ‘North Carolina’ back 100 times for those 100 addresses that are in North Carolina, I may be able to eliminate 99 of those and return some 2- or 4-byte marker that tells me that the string for that particular column is exactly the same as it was in the previous row.
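The marker scheme described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the wire protocol of any particular database or driver: the `SAME` sentinel and the `encode_rows` and `decode_rows` helpers are hypothetical names made up for this example.

```python
# Hypothetical sentinel standing in for the small "same as previous row" marker.
SAME = object()

def encode_rows(rows):
    """Replace a value with SAME when it equals the value in the
    same column of the previous row (what the server would send)."""
    encoded = []
    prev = None
    for row in rows:
        if prev is None:
            encoded.append(list(row))  # first row is always sent in full
        else:
            encoded.append([SAME if v == p else v for v, p in zip(row, prev)])
        prev = row
    return encoded

def decode_rows(encoded):
    """Rebuild the full rows from the marked-up rows (what the driver would do)."""
    rows = []
    prev = None
    for row in encoded:
        if prev is None:
            decoded = list(row)
        else:
            decoded = [p if v is SAME else v for v, p in zip(row, prev)]
        rows.append(decoded)
        prev = decoded
    return rows

rows = [["Rob", "Steward"], ["Rob", "Simpson"]]
encoded = encode_rows(rows)
# The second row carries the marker in column one and 'Simpson' in column two.
assert encoded[1][0] is SAME and encoded[1][1] == "Simpson"
assert decode_rows(encoded) == rows
```

The application never sees the marker; in this sketch the decoding step hands back exactly the rows that were queried.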
So again, do the math on that: if I have a string field that is 20 characters, or 20 bytes, and I return it 100 times, that’s 2,000 bytes. But if I use the compression technique that I just talked about, which some databases can do, then I reduce that 2,000 bytes down to 20 bytes + 2 bytes * 99 – I’ll let you do the math, because I need a calculator to do it that fast. But the idea here is that the amount of data and the amount of network traffic is dramatically reduced. Take that on up to a bigger scale where you’ve got bigger string columns, or many more records – it’s not uncommon to have 1 million records in a table, and it’s very common to have a 10,000-row result set – and you start to think about gains of a couple hundred bytes per row, and you’re getting into a very serious reduction in the number of bytes that actually flow across the network. And of course, it’s very intuitive to say that the less data we actually have to transfer, the faster everything is going to be.
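For anyone who wants the arithmetic spelled out, here is a small Python sketch. It assumes a 20-byte value, a 2-byte marker, and that all 100 rows carry the same value in that column; the function name `bytes_on_wire` is made up for this example.

```python
def bytes_on_wire(value_bytes, marker_bytes, rows):
    """Return (uncompressed, compressed) byte counts for one column,
    assuming every row repeats the same value."""
    uncompressed = value_bytes * rows
    # The full value is sent once; every later row sends only the marker.
    compressed = value_bytes + marker_bytes * (rows - 1)
    return uncompressed, compressed

uncompressed, compressed = bytes_on_wire(value_bytes=20, marker_bytes=2, rows=100)
print(uncompressed, compressed)   # 2000 218
print(uncompressed - compressed)  # 1782 bytes saved for this one column
```

Under those assumptions the column shrinks from 2,000 bytes to 218 bytes, a saving of roughly 89 percent.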
So there are techniques like that that are available on some databases and in some drivers. But another thing that can happen is that a driver or an application may reduce the number of actual network round trips. What I mean is that your application or the driver that you’re using sends a packet across to the database server, and the database processes that and sends you back some response. That back-and-forth travel takes some amount of time. Now it depends on your network, but I’ve seen, particularly over a wide area network, or if you’re using a VPN or something like that across a wide area network, that the actual network traffic time is much more significant. It’s a much higher percentage of the overall time than what you spend in your application processing the data, or what the database itself spends processing the data. That network in the middle can actually be the lion’s share of the time involved in the data access. So if you can reduce the amount of data that flows across, or if you can reduce the number of trips across there, you save time, because each round trip takes a certain amount of time even if the packets are smaller. So there are a lot of techniques that we talk about in The Data Access Handbook to reduce the number of round trips, and to reduce the actual amount of data that comes across.
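To get a feel for why round trips matter, here is a deliberately simple Python model. The numbers are hypothetical (a 10,000-row result set and a 50 ms round trip, as you might see over a WAN or VPN), and the model ignores transfer time and server processing so that only the per-trip latency shows up. The function name `round_trip_cost_ms` is made up for this example.

```python
import math

def round_trip_cost_ms(rows, rows_per_round_trip, latency_ms):
    """Time spent purely on network round trips, in milliseconds,
    when a result set is fetched in batches."""
    round_trips = math.ceil(rows / rows_per_round_trip)
    return round_trips * latency_ms

# Fetching 10 rows per trip: 1,000 round trips.
print(round_trip_cost_ms(10_000, 10, 50))     # 50000
# Fetching 1,000 rows per trip: 10 round trips.
print(round_trip_cost_ms(10_000, 1_000, 50))  # 500
```

In this model the same data costs 50 seconds of latency in one case and half a second in the other, which is the sense in which the network in the middle can take the lion's share of the time.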