Cool Stuff: OpenEdge 11 Multi-tenancy

October 25, 2011

One of the most important and useful new capabilities in the upcoming OpenEdge 11 release (planned for December 2011) is direct support for multi-tenancy in the database. What is multi-tenancy and why would I want it? Read on and I'll tell you.

The notion of multi-tenancy arises in the field of Software-as-a-Service (aka SaaS). When a vendor offers an application to be used as a "service", its customers do not have to buy a computer system to run the application on, nor do they have to have staff trained in the care and feeding of the system or backing up the data. Instead, customers subscribe to the service and the vendor does all of that. The customer simply uses the application over the Internet and has no idea where it is or which computer it runs on. When you use a search engine, you don't know where it is and it doesn't matter. It's the same with SaaS applications. A SaaS application you are probably already familiar with is email. Google, Microsoft, Yahoo, and most ISPs run email servers for their subscribers. You don't have to know anything except where to log in. Someone else is responsible for everything to do with operating and maintaining the service. You just receive, read, compose and send your mail.

All this is very easy, even trivial, for the Software-as-a-Service customer. But what about the poor vendor? The vendor has to do all that messy IT stuff. Is the vendor going to have a dedicated computer for each customer, as the customers would if they were running the application themselves? No, of course not. The vendor is going to do everything possible to hold operating costs as low as possible. That is where multi-tenancy comes in. Each of the SaaS vendor's customers is called a "tenant", a word taken from the rental housing market. In an apartment block we can have many tenants in the same building, all living in separate spaces. Similarly, we can put many application tenants into the same computer.

With a number of tenants sharing the same computer, the SaaS vendor has fewer machines to buy, fewer machines to take care of, and less work to do. No tenant can see the other tenants or their data; in fact, a tenant does not know anything about the others or even that they exist. Quite a number of SaaS vendors do this. Since it is very simple to do with OpenEdge, many of them have created a separate database for each tenant. But that means you have to do backups, schema changes, and other maintenance functions individually for each tenant's database. It would be much better if tenants could share the database too. We call this database multi-tenancy.

In OpenEdge 10 and before, with quite a bit of work, you can achieve database multi-tenancy. Some of our partners have rolled up their sleeves and done it. What you need to do is this:

  • First, add a "tenant identifier" column to every table. This tenant id column is a column that contains a unique identification number, perhaps an integer, assigned to each tenant. The value indicates which tenant owns the data in each row of the table.
  • Next, add the tenant id column to every index as the leading key component.
  • Create a table to store the tenant names and their tenant ids and assign an id to each tenant.
  • Then, go through all the code in your application and everywhere that a new table row is created, assign the correct value to the row's tenant id column.
  • You also have to invent a way to keep track of which tenant id is currently in effect.
  • Finally, go through all the code in the application again and find all the queries. Modify each WHERE clause to add a term that says "(tenantId = currentTenant) and ". Don't forget CAN-FIND. And make sure to add the tenant id term for each table in a multi-table query.
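To make the steps above concrete, here is a minimal sketch of the do-it-yourself approach. It uses Python with an in-memory SQLite database rather than OpenEdge and the 4GL, purely so it can be run anywhere. The table names, column names, and helper functions (`set_current_tenant`, `create_customer`, `find_customers`) are all made up for illustration.

```python
import sqlite3

# Hypothetical schema: every application table carries a tenant_id column,
# and a tenant table maps tenant names to their ids.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tenant (tenant_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE customer (
        tenant_id INTEGER NOT NULL REFERENCES tenant(tenant_id),
        cust_num  INTEGER NOT NULL,
        name      TEXT NOT NULL
    );
    -- tenant_id is the leading key component of the index
    CREATE UNIQUE INDEX customer_num ON customer (tenant_id, cust_num);
""")
conn.executemany("INSERT INTO tenant VALUES (?, ?)", [(1, "acme"), (2, "globex")])

current_tenant = None  # the tenant id currently in effect


def set_current_tenant(name):
    """Look up the tenant id by name and make it the one in effect."""
    global current_tenant
    row = conn.execute(
        "SELECT tenant_id FROM tenant WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise ValueError("unknown tenant: " + name)
    current_tenant = row[0]


def create_customer(cust_num, name):
    """Every create must stamp the row with the current tenant id."""
    conn.execute(
        "INSERT INTO customer VALUES (?, ?, ?)", (current_tenant, cust_num, name)
    )


def find_customers():
    """Every query must carry the tenant id term in its WHERE clause."""
    return conn.execute(
        "SELECT cust_num, name FROM customer WHERE tenant_id = ? ORDER BY cust_num",
        (current_tenant,),
    ).fetchall()


set_current_tenant("acme")
create_customer(1, "Alice")
set_current_tenant("globex")
create_customer(1, "Bob")      # same cust_num, different tenant: allowed
create_customer(2, "Carol")

set_current_tenant("acme")
acme_rows = find_customers()
set_current_tenant("globex")
globex_rows = find_customers()

# A query written without the tenant id term sees every tenant's rows.
unfiltered_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```

Note that the isolation here lives entirely in the application code: nothing in the database itself stops a query that leaves out the tenant id term.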

Once you do all those things, you can have database multi-tenancy. But in addition to the obvious fact that this approach is labor-intensive and invasive, there are a number of other disadvantages. I will list just a few here:

0) It is error prone. If you make a mistake when you change the code to do multi-tenancy, the wrong tenant's data will be returned. Or if you forget when you or another developer is fixing a bug, the wrong tenant's data will be returned.

1) Even if you use Type II data areas, rows from multiple tenants will be commingled in the same data blocks and the same table's allocation clusters. This negates many of the advantages of using Type II data areas. You get lower I/O efficiency because one tenant will have to read a data block that contains other tenants' data. Your customers will probably have the perception (whether true or not) that commingling their data reduces its security.

2) You can't do per-tenant maintenance easily. How do you reindex just one tenant's data?

3) How do you restore one tenant's data when they do something foolish like run end of month processing in the middle of the month?

4) You can't do per-tenant disk space allocation or disk space usage tracking very easily, if at all.

5) There is lock interference among tenants. Table-locks can lock out all the other tenants.

In spite of the disadvantages, I think the advantages far outweigh them and it is worth considering the use of this approach. But what if you could eliminate all the disadvantages? What if you could have your cake and eat it too? That's where OpenEdge 11 comes in. All that work I said you have to do? Gone. All those disadvantages I listed? All gone. OpenEdge 11 does all the hard work.

With the OpenEdge 11 RDBMS, database multi-tenancy is an inbuilt feature. The database knows what tenants are, who they are, and where their data are. It knows where to put new data and where to get existing data for each and every tenant. You do not have to modify all of the data access parts of your application. In fact, you shouldn't have to change much of anything! Most of your code should just work.

Well, all right, maybe you do have to make a few changes. Those changes have to do with how a user logs in to the application and the database and how the user's identity is verified. As I said, the database knows about tenants. But you will have to tell it which tenant a user belongs to. In the 4GL we use something called the CLIENT-PRINCIPAL to help in determining that.

The CLIENT-PRINCIPAL (aka the "cp") is an inbuilt and extensible security token that we added to OpenEdge a few years ago, in the 10.1 release. The cp encapsulates a user's identity once it has been validated. In OpenEdge 11 we use the cp (with some enhancements) to encapsulate both user identity and tenant identity. Depending on which cp token is currently in effect in the 4GL runtime, the database uses the tenant id to decide what data to return for a query. For code running in AppServers and accessing the database on behalf of different users at different times, the AppServer can easily switch the cp that is in effect to that of the user that made the AppServer call.

To get ready for OpenEdge 11, you should learn about the CLIENT-PRINCIPAL. The name may sound a bit intimidating but it is really very easy to use. It takes only 3 lines of code to make one and to validate the user's identity. Go and watch the video of Sarah Marshall's Exchange Online 2010 talk over on PSDN.

In the OpenEdge 11 RDBMS, each tenant gets a separate data partition for each multi-tenant table (and not every table has to be made multi-tenant), and each data partition has its own associated index partitions. The tenant id in the cp is used to control which data partition to fetch table rows from and a tenant only gets to see their own data (and data in regular shared tables). We also have a special tenant called the "super tenant", conceptually similar to the UNIX root user, that is allowed to see /all/ the data.
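As a way to picture this, here is a toy model in Python. It is not OpenEdge code and says nothing about how the RDBMS is actually implemented. The class `ToyMultiTenantTable`, the dictionary standing in for the cp, and the tenant name `"super"` are all invented for illustration. The point it tries to show is that the application code creates and queries rows without ever mentioning a tenant id, and the "database" routes each request to a partition based on the identity in effect.

```python
SUPER_TENANT = "super"  # hypothetical name for the super tenant


class ToyMultiTenantTable:
    """Toy stand-in for a multi-tenant table: one row list per tenant."""

    def __init__(self):
        self._partitions = {}  # tenant name -> list of rows

    def create(self, principal, row):
        # The partition is chosen from the principal, not by the caller.
        self._partitions.setdefault(principal["tenant"], []).append(row)

    def query(self, principal):
        tenant = principal["tenant"]
        if tenant == SUPER_TENANT:
            # The super tenant is allowed to see all the data.
            return [
                row
                for name in sorted(self._partitions)
                for row in self._partitions[name]
            ]
        # A regular tenant only ever sees its own partition.
        return list(self._partitions.get(tenant, []))


# Stand-ins for validated identity tokens carrying user and tenant.
alice = {"user": "alice", "tenant": "acme"}
bob = {"user": "bob", "tenant": "globex"}
admin = {"user": "admin", "tenant": SUPER_TENANT}

customer = ToyMultiTenantTable()
customer.create(alice, "Acme customer 1")
customer.create(bob, "Globex customer 1")
customer.create(bob, "Globex customer 2")

alice_sees = customer.query(alice)
bob_sees = customer.query(bob)
admin_sees = customer.query(admin)
```

Compared with the do-it-yourself approach, there is no tenant id term in any query; switching the identity in effect is the only thing that changes which rows come back.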

This scheme works really well, is very efficient, and requires very few application changes. There are of course a lot of other things in OpenEdge 11. But I don't have space to talk about them just now and we will have to do that another time.

I hope you will like the new release. It is really cool.

Gus Bjorklund
