One of the most important and useful new capabilities in the upcoming OpenEdge 11 release (planned for December 2011) is direct support for multi-tenancy in the database. What is multi-tenancy, and why would I want it? Read on and I'll tell you.
The notion of multi-tenancy arises in the field of Software-as-a-Service (aka SaaS). When a vendor offers an application as a "service", its customers do not have to buy a computer system to run the application on, nor do they need staff trained in the care and feeding of the system or in backing up its data. Instead, customers subscribe to the service and the vendor does all of that. The customer simply uses the application over the Internet and has no idea where it is or what computer it runs on. When you use a search engine, you don't know where it is, and it doesn't matter. It's the same with SaaS applications. A SaaS application you are probably already familiar with is email. Google, Microsoft, Yahoo, and most ISPs run email servers for their subscribers. You don't have to know anything except where to log in. Someone else is responsible for everything to do with operating and maintaining the service. You just receive, read, compose, and send your mail.
All this is very easy, even trivial, for the Software-as-a-Service customer. But what about the poor vendor? The vendor has to do all that messy IT stuff. Is the vendor going to dedicate a computer to each customer, as each customer would if they ran the application themselves? No, of course not. The vendor is going to do everything possible to keep operating costs as low as possible. That is where multi-tenancy comes in. Each of the SaaS vendor's customers is called a "tenant", a word taken from the rental housing market. An apartment block can have many tenants in the same building, all living in separate spaces. Similarly, we can put many application tenants on the same computer.
With a number of tenants sharing the same computer, the SaaS vendor has fewer machines to buy, fewer machines to take care of, and less work to do. No tenant can see the other tenants or their data; in fact, tenants know nothing about one another, not even that the others exist. Quite a number of SaaS vendors do this. Since it is very simple to do with OpenEdge, many of them have created a separate database for each tenant. But that means backups, schema changes, and other maintenance functions have to be done individually for each tenant's database. It would be much better if tenants could share the database too. We call this database multi-tenancy.
In OpenEdge 10 and before, with quite a bit of work, you can achieve database multi-tenancy. Some of our partners have rolled up their sleeves and done it. What you need to do is this:
Once you do all those things, you can have database multi-tenancy. But in addition to the obvious fact that this approach is labor-intensive and invasive, it has a number of other disadvantages. I will list just a few here:
0) It is error prone. If you make a mistake when you change the code to do multi-tenancy, the wrong tenant's data will be returned. Or, if you or another developer forgets the tenant check while fixing a bug later, the wrong tenant's data will be returned.
1) Even if you use Type II storage areas, rows from multiple tenants will be commingled in the same data blocks and in the same table's allocation clusters. This negates many of the advantages of Type II storage areas. You get lower I/O efficiency because reading one tenant's rows means reading data blocks that also contain other tenants' data. Your customers will probably also perceive (whether rightly or not) that commingling their data with other tenants' data reduces its security.
2) You can't do per-tenant maintenance easily. How do you reindex just one tenant's data?
3) You can't easily restore just one tenant's data. How do you do it when that tenant does something foolish like running end-of-month processing in the middle of the month?
4) You can't do per-tenant disk space allocation or disk space usage tracking very easily, if at all.
5) There is lock interference among tenants. A table lock taken on behalf of one tenant can lock out all the other tenants.
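To make disadvantage 0 concrete, here is a minimal sketch in Python, with an in-memory SQLite database standing in for the real one. It assumes a common do-it-yourself design in which every shared table carries a tenant id column that every query must filter on; the table, column, tenant, and function names are all hypothetical. The second function shows what happens when a single query forgets the filter.

```python
import sqlite3

# Hypothetical DIY design: one shared table for all tenants,
# with a tenant_id column that every query must remember to filter on.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (tenant_id TEXT, cust_num INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [
        ("acme", 1, "Acme Hardware"),
        ("acme", 2, "Acme Tools"),
        ("globex", 1, "Globex Foods"),
    ],
)


def customers_for(tenant_id):
    """Correct: the tenant filter is applied explicitly."""
    rows = conn.execute(
        "SELECT name FROM customer WHERE tenant_id = ? ORDER BY cust_num",
        (tenant_id,),
    )
    return [name for (name,) in rows]


def customer_by_number_buggy(cust_num):
    """Buggy: written (or 'fixed') without the tenant filter.

    Customer numbers are only unique within a tenant, so this
    returns rows belonging to every tenant that has that number.
    """
    rows = conn.execute(
        "SELECT name FROM customer WHERE cust_num = ? ORDER BY tenant_id",
        (cust_num,),
    )
    return [name for (name,) in rows]


print(customers_for("acme"))        # ['Acme Hardware', 'Acme Tools']
print(customer_by_number_buggy(1))  # ['Acme Hardware', 'Globex Foods']
```

Nothing in the schema stops the unfiltered query from running; the only safeguard is that every developer remembers the filter in every query.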
Despite these disadvantages, I think the advantages far outweigh them, and this approach is worth considering. But what if you could eliminate all the disadvantages? What if you could have your cake and eat it too? That's where OpenEdge 11 comes in. All that work I said you have to do? Gone. All those disadvantages I listed? All gone. OpenEdge 11 does all the hard work.
With the OpenEdge 11 RDBMS, database multi-tenancy is a built-in feature. The database knows what tenants are, who they are, and where their data are. It knows where to put new data and where to get existing data for each and every tenant. You do not have to modify all of the data access parts of your application. In fact, you shouldn't have to change much of anything! Most of your code should just work.
Well, all right, maybe you do have to make a few changes. Those changes have to do with how a user logs in to the application and the database and how the user's identity is verified. As I said, the database knows about tenants, but you have to tell it which tenant a user belongs to. In the 4GL we use something called the CLIENT-PRINCIPAL to help determine that.
The CLIENT-PRINCIPAL (aka the "cp") is a built-in and extensible security token that we added to OpenEdge a few years ago, in the 10.1 release. The cp encapsulates a user's identity once it has been validated. In OpenEdge 11 we use the cp (with some enhancements) to encapsulate both user identity and tenant identity. The database uses the tenant id in whichever cp is currently in effect in the 4GL runtime to decide what data to return for a query. For code running in AppServers and accessing the database on behalf of different users at different times, the AppServer can easily switch the cp in effect to that of the user who made the AppServer call.
To get ready for OpenEdge 11, you should learn about the CLIENT-PRINCIPAL. The name may sound a bit intimidating but it is really very easy to use. It takes only 3 lines of code to make one and to validate the user's identity. Go and watch the video of Sarah Marshall's Exchange Online 2010 talk over on PSDN.
In the OpenEdge 11 RDBMS, each tenant gets a separate data partition for each multi-tenant table (and not every table has to be made multi-tenant), and each data partition has its own associated index partitions. The tenant id in the cp is used to control which data partition to fetch table rows from, and a tenant only gets to see their own data (and data in regular shared tables). We also have a special tenant called the "super tenant", conceptually similar to the UNIX root user, that is allowed to see all the data.
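A toy model of that partitioning, in Python and purely illustrative: TenantDatabase, SUPER_TENANT, and the table and tenant names are hypothetical, rows are plain strings, and the per-partition indexes are not modeled. Multi-tenant tables keep one row list per tenant, regular shared tables keep one list for everyone, and the super tenant reads across all partitions. Here the tenant id is passed in explicitly, whereas in OpenEdge 11 it comes from the cp in effect. Refusing inserts from the super tenant into a multi-tenant table is a simplifying assumption of the sketch.

```python
from collections import defaultdict

SUPER_TENANT = "super"  # hypothetical id for the special super tenant


class TenantDatabase:
    """Toy model: per-tenant partitions for multi-tenant tables,
    one shared partition for regular tables."""

    def __init__(self, multi_tenant_tables):
        self.multi_tenant = set(multi_tenant_tables)
        self.partitions = defaultdict(dict)  # table -> {tenant_id: [rows]}
        self.shared = defaultdict(list)      # table -> [rows]

    def insert(self, tenant_id, table, row):
        if table not in self.multi_tenant:
            self.shared[table].append(row)
        elif tenant_id == SUPER_TENANT:
            # Sketch assumption: the super tenant must pick a target tenant.
            raise ValueError("super tenant must choose a tenant partition")
        else:
            self.partitions[table].setdefault(tenant_id, []).append(row)

    def query(self, tenant_id, table):
        if table not in self.multi_tenant:
            return list(self.shared.get(table, []))  # shared: everyone sees it
        parts = self.partitions.get(table, {})
        if tenant_id == SUPER_TENANT:
            # The super tenant sees every tenant's partition.
            return [row for t in sorted(parts) for row in parts[t]]
        return list(parts.get(tenant_id, []))  # only this tenant's partition


db = TenantDatabase(["order"])
db.insert("acme", "order", "A-1")
db.insert("globex", "order", "G-1")
db.insert("acme", "country", "Norway")  # regular shared table
print(db.query("acme", "order"))        # ['A-1']
print(db.query("globex", "country"))    # ['Norway']
print(db.query(SUPER_TENANT, "order"))  # ['A-1', 'G-1']
```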
This scheme works really well, is very efficient, and requires very few application changes. There are of course a lot of other things in OpenEdge 11. But I don't have space to talk about them just now and we will have to do that another time.
I hope you will like the new release. It is really cool.
View all posts from Gus Bjorklund on the Progress blog.
Copyright © 2018 Progress Software Corporation and/or its subsidiaries or affiliates.
All Rights Reserved.
Progress, Telerik, and certain product names used herein are trademarks or registered trademarks of Progress Software Corporation and/or one of its subsidiaries or affiliates in the U.S. and/or other countries. See Trademarks for appropriate markings.