Resource Handling Behind Reusable Services in Cloud Foundry
In one of my previous articles I explained what a reusable service in Cloud Foundry is and how a reusable service can be created and made available in the marketplace. The central piece is the Service Manager, which serves the requests from the Cloud Controller when users create, update, delete or bind a reusable service instance in Cloud Foundry. The natural question that arises is: "What does the Service Broker do when it gets these requests?"
What does the Service Broker do when it gets these requests?
Only the team that developed a Service Broker knows the exact answer. In general, creating a service instance is a way to get hold of some kind of resource that the developer can use. So whenever we create a service instance, a resource is created behind it. The service instance represents the resource and hides the actual resource. The Service Broker hides the complexity of provisioning such resources, and their operational details, from the end user. The next question in line is "What is a resource?"
What is a resource?
A resource can be anything; it depends on the service being offered. It can be a user account or a database entry registered to represent the end user. When the reusable service offers a database, the resource can be the database itself or a schema in a database management system. The resource can also be a full-fledged SaaS application along with its own database. In this case the reusable service is called a multi-tenant solution, because it provides a full-fledged application for each service instance. Since the reusable service has to maintain an application with its own database for every service instance, each of these applications can be viewed as a tenant hosted by the reusable service. Each of them will have its own user database along with user management and an application login mechanism. Finally, a resource for a service instance can be a few licenses or a subscription to use shared software or a shared service.
As you can see, the lifecycle of the resource is tightly coupled with the lifecycle of the service instance. However, the lifecycle of the service instance is managed by the Cloud Controller or a Service Manager, while the lifecycle of the resource is managed by the Service Broker. The Service Manager is a layer between the Cloud Controller and the Service Brokers, so that the Cloud Controller does not have to talk to each Service Broker directly. That is why it's crucial that the Cloud Controller and the Service Manager are always in sync with the Service Brokers, and it is primarily the responsibility of the Service Brokers to ensure this. Together they must produce the same result without any inconsistency. That being said, there is still some scope for the Service Broker to manage the resource efficiently, especially when the resource is costly in terms of CPU, storage and memory, and overall in terms of Total Cost of Ownership (TCO). So, depending on how costly the resource is, the Service Broker has to adopt a few tactics to reduce the underlying cost.
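To make the coupling concrete, the sketch below maps the instance-level endpoints of the Open Service Broker API v2 (PUT, PATCH and DELETE on /v2/service_instances/{instance_id}, plus GET on its last_operation) to the resource lifecycle step each one triggers. The action names ("create", "update", "delete", "poll") and the route function are hypothetical, and binding endpoints are left out; this is an illustration, not a real broker framework.

```python
import re
from typing import Optional, Tuple

# Instance-level endpoints of the Open Service Broker API v2. Each one maps
# to a step in the lifecycle of the resource behind the service instance.
_ROUTES = [
    ("GET", re.compile(r"^/v2/service_instances/([^/]+)/last_operation$"), "poll"),
    ("PUT", re.compile(r"^/v2/service_instances/([^/]+)$"), "create"),
    ("PATCH", re.compile(r"^/v2/service_instances/([^/]+)$"), "update"),
    ("DELETE", re.compile(r"^/v2/service_instances/([^/]+)$"), "delete"),
]


def route(method: str, url: str) -> Optional[Tuple[str, str]]:
    """Return (lifecycle action, instance ID) for an OSB request, or None."""
    path = url.split("?", 1)[0]  # drop query parameters such as accepts_incomplete
    for route_method, pattern, action in _ROUTES:
        match = pattern.match(path)
        if method == route_method and match:
            return action, match.group(1)
    return None  # not an instance lifecycle endpoint handled here
```

Every request the Service Manager forwards carries the instance ID, which is the key the broker later uses to find the resource it created.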
Handling Resource Creation
At the beginning, when a service instance creation is triggered, the Service Manager passes the request to the Service Broker along with the service instance ID and the service plan ID. When the resource is lightweight (e.g. a database entry), the Service Broker can handle this request synchronously: it creates the resource immediately and returns a success status to the Service Manager. Until then, the Service Manager and the Cloud Controller show the creation of the service instance as in progress. When the Service Manager gets the confirmation from the Service Broker that the resource has been created successfully, the service instance changes to the created state. When resource creation takes time, it is better to handle the request asynchronously. The Open Service Broker (OSB) API, which Service Brokers have to implement, supports this: the platform signals that it accepts asynchronous operations (the accepts_incomplete=true query parameter), and the broker answers with HTTP 202 Accepted. The Service Manager then periodically polls the Service Broker for the status of the resource creation. While creation is in progress, the Service Broker replies with an intermediate state, and once the instance is created it replies with success. The initial reply is conveyed by the HTTP status code (for example 201 Created or 202 Accepted), while the polling endpoint reports progress in the state field of its response body ("in progress", "succeeded" or "failed").
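A minimal in-memory sketch of the two provisioning styles is shown below. It assumes a hypothetical plan ID "db-entry" for the lightweight, synchronous case; every other plan is treated as slow and provisioned asynchronously. The status codes (201, 202, 422 with error "AsyncRequired") and the last_operation states follow the OSB API, while the class and method names are illustrative, and the actual resource creation is left out.

```python
from typing import Dict, Tuple

# Hypothetical plan IDs whose resource is cheap enough to create synchronously.
LIGHTWEIGHT_PLANS = {"db-entry"}

Response = Tuple[int, dict]


class ProvisioningBroker:
    def __init__(self) -> None:
        # instance ID -> OSB last_operation state: "in progress" | "succeeded" | "failed"
        self.states: Dict[str, str] = {}

    def provision(self, instance_id: str, plan_id: str, accepts_incomplete: bool) -> Response:
        """Handle PUT /v2/service_instances/{instance_id}."""
        if plan_id in LIGHTWEIGHT_PLANS:
            # Synchronous: create the lightweight resource here (omitted), then answer.
            self.states[instance_id] = "succeeded"
            return 201, {}
        if not accepts_incomplete:
            # This plan needs time, but the platform did not allow an async reply.
            return 422, {"error": "AsyncRequired"}
        # Asynchronous: hand the work to a background worker and answer immediately.
        self.states[instance_id] = "in progress"
        return 202, {"operation": f"provision-{instance_id}"}

    def last_operation(self, instance_id: str) -> Response:
        """Handle GET /v2/service_instances/{instance_id}/last_operation."""
        return 200, {"state": self.states[instance_id]}

    def on_background_job_done(self, instance_id: str, ok: bool) -> None:
        # Called by the (not shown) worker that actually creates the resource.
        self.states[instance_id] = "succeeded" if ok else "failed"
```

The Service Manager keeps calling last_operation until the state leaves "in progress", so the broker only needs to update the stored state when the background work finishes.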
The Service Broker also has to associate the resource with the service instance for lifecycle management. It is therefore mandatory that the Service Broker creates a unique identifier, or a combination of attributes, that uniquely identifies the resource. The Service Broker should link the resource to the service instance only after the resource has been created successfully, and then report success for the service instance in its reply to the Service Manager. It should not report success before the resource is fully functional, because users may start to access it as soon as they see that the service instance was created successfully. Lazy creation, where the resource is only created when the user first accesses it, is affordable only when resource creation completes within a few milliseconds. In that case the Service Broker needs to remember that the creation of such a resource has already been requested.
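The sketch below shows one way to keep that bookkeeping, assuming an in-memory registry in place of the broker's database. The ID scheme in new_resource_id (instance ID plus a random suffix) and the is_functional health probe are hypothetical stand-ins. The resource is linked and success is reported only once the probe passes; the lazy variant reports success immediately and creates the resource on first access.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class InstanceRecord:
    instance_id: str
    plan_id: str
    resource_id: Optional[str] = None  # linked only once the resource works
    state: str = "in progress"


def new_resource_id(instance_id: str) -> str:
    # Hypothetical scheme: instance ID plus a random suffix, unique per attempt.
    return f"{instance_id}-{uuid.uuid4().hex[:8]}"


class InstanceRegistry:
    def __init__(self) -> None:
        self.records: Dict[str, InstanceRecord] = {}

    def start(self, instance_id: str, plan_id: str) -> None:
        self.records[instance_id] = InstanceRecord(instance_id, plan_id)

    def try_complete(self, instance_id: str, resource_id: str,
                     is_functional: Callable[[str], bool]) -> str:
        """Link the resource and report success only if it is usable."""
        record = self.records[instance_id]
        if is_functional(resource_id):
            record.resource_id = resource_id
            record.state = "succeeded"
        return record.state

    def request_lazy(self, instance_id: str, plan_id: str) -> None:
        # Lazy variant for resources created in milliseconds: report success
        # now and remember that creation was requested.
        self.records[instance_id] = InstanceRecord(instance_id, plan_id, state="succeeded")

    def resource_for(self, instance_id: str, create: Callable[[str], str]) -> str:
        record = self.records[instance_id]
        if record.resource_id is None:  # first access creates the resource
            record.resource_id = create(instance_id)
        return record.resource_id
```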
Handling Resource Update
If the service instance is allowed to be updated, the resource also has to be updated. In general, updating means changing a few configuration settings of the resource. This can also be implemented synchronously or asynchronously by the Service Broker. Users may expect the resource to stay operational during the update; if that is not the case, you need to manage their expectations.
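One way to decide between the two styles is to look at what actually changes. The sketch below assumes a resource whose configuration is a plain dict and a hypothetical set RESTART_REQUIRED_KEYS of settings that force a restart. Cheap changes are applied synchronously (200 OK); changes that cause downtime are accepted asynchronously (202 Accepted), which is also the point where the broker can tell users what to expect.

```python
from typing import Dict, Tuple

# Hypothetical: configuration keys whose change forces the resource to restart.
RESTART_REQUIRED_KEYS = {"memory_mb", "disk_gb"}


def plan_update(current: Dict, requested: Dict) -> Tuple[Dict, bool]:
    """Return the keys that actually change and whether a restart is needed."""
    changes = {k: v for k, v in requested.items() if current.get(k) != v}
    needs_restart = any(k in RESTART_REQUIRED_KEYS for k in changes)
    return changes, needs_restart


def handle_update(config: Dict, requested: Dict) -> Tuple[int, Dict]:
    """Sketch of PATCH /v2/service_instances/{instance_id} for one resource."""
    changes, needs_restart = plan_update(config, requested)
    if not changes:
        return 200, {}
    if not needs_restart:
        config.update(changes)  # cheap change: apply synchronously, no downtime
        return 200, {}
    # Slow change with downtime: run it asynchronously (not shown). The broker
    # could explain the expected downtime in the last_operation "description" field.
    return 202, {"operation": "update-with-restart"}
```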
Handling Resource Deletion
When the service instance is deleted, the request again reaches the Service Broker. The Service Broker could first finish deleting the resource and only then signal the Service Manager to delete the service instance. A better approach is not to keep users waiting until the resource is deleted. The Service Broker can detach the linked resource from the service instance, remove the service instance entry from its internal database, signal the Service Manager to delete the service instance, and mark the resource so that it is recognized as pending deletion. This way users do not have to wait for their service instance to be deleted.
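The detach-and-mark approach can be sketched as follows, assuming an in-memory store in place of the broker's database and a plain list named marked_for_deletion as the (hypothetical) cleanup marker. The DELETE handler answers 200 OK right away; 410 Gone is the OSB response for an instance the broker does not know.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class BrokerStore:
    # instance ID -> linked resource ID (None if no resource is linked)
    instances: Dict[str, Optional[str]] = field(default_factory=dict)
    # resources detached from their instance and waiting for deletion
    marked_for_deletion: List[str] = field(default_factory=list)

    def deprovision(self, instance_id: str) -> Tuple[int, dict]:
        """Sketch of DELETE /v2/service_instances/{instance_id}."""
        if instance_id not in self.instances:
            return 410, {}  # OSB: the instance does not exist (any more)
        resource_id = self.instances.pop(instance_id)  # detach and forget the instance
        if resource_id is not None:
            self.marked_for_deletion.append(resource_id)  # deleted later, not now
        return 200, {}
```

The expensive part, actually tearing down the resource, happens outside the request, so the user-facing deletion is as fast as a database update.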
Handling Failed Resource Creation
Resource creation might fail during service instance creation. The Service Broker needs retry logic that restarts resource creation from scratch and marks any unsuccessful, half-created resource so that it can be cleaned up later. If the resource is still not created after a certain number of retries, the Service Broker needs to reply to the Service Manager that the service instance creation has failed. At this stage the service instance still exists but is marked as failed, and the users who created it need to delete it. Since the Service Broker has no influence over when the service instance is deleted, it is wise not to keep the resource linked to the failed service instance. The Service Broker still needs to keep the reference to the service instance in its database, but it can remove the link between the service instance and the resource and then mark the resource for cleanup. This is possible because once a service instance has failed there is no way to retry its creation; the user has to delete it and create a new service instance. This reduces wasted resources and leads to a better TCO. The deletion of the failed service instance is triggered by the user who created it. If the user is reluctant to do so, the Service Broker at most has to keep a database entry that references the failed service instance. When the user deletes the failed service instance, the request reaches the Service Broker via the Service Manager. The Service Broker sees that no resource is linked to the service instance, so it can instantly remove the entry for the service instance from its database and report successful deletion to the Service Manager and the Cloud Controller.
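Here is a minimal sketch of the retry flow, under the assumption that a creation attempt signals failure by raising a hypothetical ResourceCreationError that may carry the ID of a half-created resource. Each failed attempt marks its leftover for cleanup; after max_attempts the instance is kept with no resource linked, so deleting it later costs nothing but removing the entry.

```python
from typing import Callable, Dict, List, Optional


class ResourceCreationError(Exception):
    """Raised by a creation attempt; may carry a half-created resource ID."""

    def __init__(self, partial_resource_id: Optional[str] = None) -> None:
        super().__init__("resource creation failed")
        self.partial_resource_id = partial_resource_id


def provision_with_retries(instance_id: str,
                           create: Callable[[str, int], str],
                           max_attempts: int,
                           instances: Dict[str, Optional[str]],
                           marked_for_cleanup: List[str]) -> str:
    """Return the OSB state to report: "succeeded" or "failed"."""
    for attempt in range(1, max_attempts + 1):
        try:
            resource_id = create(instance_id, attempt)  # each attempt starts from scratch
        except ResourceCreationError as err:
            if err.partial_resource_id is not None:
                marked_for_cleanup.append(err.partial_resource_id)
            continue
        instances[instance_id] = resource_id  # link only a working resource
        return "succeeded"
    # Keep a reference to the failed instance, but with no resource linked.
    instances[instance_id] = None
    return "failed"


def deprovision(instance_id: str,
                instances: Dict[str, Optional[str]],
                marked_for_cleanup: List[str]) -> int:
    """Delete an instance; a failed instance has no resource to clean up."""
    resource_id = instances.pop(instance_id)
    if resource_id is not None:
        marked_for_cleanup.append(resource_id)
    return 200
```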
Handling Failed Resource Update
When a resource update fails, the resource should not end up in an inconsistent state. It should remain in the state it was in before the update request arrived. This ensures that the resource is always consistent and usable by the user, and users can retry the update as many times as they like.
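For a resource whose configuration can be snapshotted, the all-or-nothing behaviour can be sketched as below. The dict-based resource and the apply_change callback are assumptions for illustration; a real resource (a database, a tenant application) would need compensating actions rather than a simple in-memory copy. The function returns the OSB state the broker would report.

```python
import copy
from typing import Any, Callable, Dict


def apply_update(resource: Dict[str, Any], changes: Dict[str, Any],
                 apply_change: Callable[[Dict[str, Any], str, Any], None]) -> str:
    """Apply all changes or none; return the OSB state "succeeded" or "failed"."""
    snapshot = copy.deepcopy(resource)  # state before the update request
    try:
        for key, value in changes.items():
            apply_change(resource, key, value)
    except Exception:
        # Any failure restores the previous, known-good configuration.
        resource.clear()
        resource.update(snapshot)
        return "failed"
    return "succeeded"
```

Because a failed update leaves the resource exactly as it was, a retry starts from the same state as the first attempt.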
Handling Failed Resource Deletion
As I stated earlier, when a service instance is deleted or its resource creation fails, the resource is detached from the service instance and marked so that it can be deleted. This makes the cleanup independent of the OSB API requests the Service Broker has to serve for the Service Manager, so a failed deletion never blocks a service instance operation and can simply be retried. The deletion can, for example, be done by a periodic job, and this task can also be delegated to a service or backend application other than the Service Broker.
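A periodic cleanup job could look roughly like the sketch below. The names are hypothetical: load_marked stands in for a query against the broker's database, and delete is assumed to remove the resource and clear its mark in storage, so anything that fails is simply picked up again on the next pass. The job could run inside the broker or in a separate service.

```python
import logging
import time
from typing import Callable, List

logger = logging.getLogger("resource-cleanup")


def run_cleanup_pass(marked: List[str], delete: Callable[[str], None]) -> List[str]:
    """Try to delete every marked resource; return the ones still pending."""
    still_pending = []
    for resource_id in marked:
        try:
            delete(resource_id)
        except Exception:
            # A failed deletion affects no service instance; retry on the next pass.
            logger.warning("deletion of %s failed, will retry", resource_id)
            still_pending.append(resource_id)
    return still_pending


def run_forever(load_marked: Callable[[], List[str]],
                delete: Callable[[str], None],
                interval_seconds: float = 300.0) -> None:
    """Hypothetical periodic job: re-read the marked resources on every pass."""
    while True:
        run_cleanup_pass(load_marked(), delete)
        time.sleep(interval_seconds)
```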
In my opinion, these are ways in which Service Brokers can handle resources efficiently and keep their state consistent with the Service Manager and the Cloud Controller.