eZPublish - User account limits and solutions

Tags : user , authentication , solution , ezpublish , sso
Maxime THOMAS on 12 of Sep 2011 - 09am

While working on one of my projects with eZ Systems French consultant Jérôme Cohonner, we had an excellent conversation about how users are handled in eZPublish and how this can sometimes lead to trouble. This post gives you some clues about how important user management can be, what the limits are, and some common solutions for doing things the best way. I will not talk about SSO or procurement systems, either because I have already covered them or because they are out of scope.

Users management, some concepts

Let's get back to the roots and have a look at the main concepts of AAA :

  • Authentication : the fact that someone can prove who they are by some form of proof : password, certificate, tokens, fingerprint...
  • Authorization : the fact that someone has been given some rights, credentials or abilities to do something.
  • Accounting : the fact that someone's activities can be observed, monitored and audited to collect data that can be exploited afterwards.

Generally, in all IT systems, these concepts are implemented in different ways, together or alone, merged with other systems or not. When an IT solution becomes complex, you will need to provide a strong user management strategy to be sure that all will work together. The strategy is defined by combining different approaches that could be listed like this :

  • Authentication : handle authentication ways, simple to complex, available everywhere in the system
  • Data : handle user data and make it available everywhere in the system
  • Organization : handle an organization of people, available everywhere in the system
  • Access : handle rights, what people are allowed to do

Moreover, all those approaches are subject to the centralization dilemma : do we need to centralize all those things in one service or not ? If one of these approaches is not centralized, is every piece of software in our solution able to do the job on its own ?

On each IT project, choices are made, sometimes depending on the software capabilities, sometimes not. The most important thing is to know where you want the maximum flexibility.

 

eZ Publish and the limits

In eZ Publish, users are stored and treated as content objects, which is a choice in itself. It means that the emphasis has been put on data management before everything else. The cool thing is that you can handle your users as pages (as they are nodes) and that you can add and remove attributes as you want. Even better, data can be versioned. The only thing you have to do is to ensure that the user content class has the User account datatype.

You can also plug in the LDAP Login Handler to access a remote directory. The mechanism is quite good. At authentication, the user provides a login and password. eZ Publish tries to log this user in against the LDAP and, if it succeeds, creates an eZ user if none exists or updates the existing one. The user is then authenticated and receives eZ Publish credentials.
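The login flow described above can be sketched as follows. This is only an illustration in Python, not the actual eZ Publish handler: `FakeDirectory`, `LocalUser` and `login_via_directory` are hypothetical names, and the directory is simulated with an in-memory dictionary instead of a real LDAP bind.

```python
from dataclasses import dataclass


@dataclass
class LocalUser:
    """Hypothetical stand-in for the local (eZ-side) user account."""
    login: str
    email: str


@dataclass
class FakeDirectory:
    """Hypothetical stand-in for an LDAP server: login -> (password, email)."""
    entries: dict

    def bind(self, login, password):
        # Return the directory attributes on success, None on failure.
        entry = self.entries.get(login)
        if entry is None or entry[0] != password:
            return None
        return {"email": entry[1]}


def login_via_directory(directory, local_users, login, password):
    """Authenticate remotely, then create or update the local user."""
    attrs = directory.bind(login, password)
    if attrs is None:
        return None  # remote authentication failed: nothing is created locally
    user = local_users.get(login)
    if user is None:
        # First successful login: create the local user from directory data.
        user = LocalUser(login=login, email=attrs["email"])
        local_users[login] = user
    else:
        # Known user: refresh the local copy with the directory data.
        user.email = attrs["email"]
    return user
```

Note that the local copy is only refreshed at login time, which is one reason why separate synchronization scripts come up later in this post.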

It's also possible to use the multi-location mechanism to get some flexibility in role assignment. For example, as you can place a user in several groups, you can give each group a different role, so that multi-located users inherit all the roles from all their parents. You can look at one of my former posts about content design, which explains how to organize your content in eZPublish.
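As a rough illustration of that inheritance rule, a user's effective roles are the union of the roles assigned to every group the user is located under. The function and the group and role names below are hypothetical and do not come from the eZ Publish API.

```python
def effective_roles(user_groups, group_roles):
    """Union of the roles of every parent group of a multi-located user."""
    roles = set()
    for group in user_groups:
        # A group without any assigned role simply contributes nothing.
        roles |= set(group_roles.get(group, ()))
    return roles


# Hypothetical role assignment, one role per group.
group_roles = {
    "Editors": {"editor"},
    "Partners": {"partner_reader"},
    "Administrators": {"admin"},
}

# A user located under both "Editors" and "Partners" gets both roles.
roles = effective_roles(["Editors", "Partners"], group_roles)
```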

The limits of this model are :

  • Data and user account are in the same place, and the data container is not efficient when there are a lot of users.
  • If user data has to be shared, the size and the amount of data really matter, as it has to be managed either locally or remotely.
  • The remote model imposes a direction on the way data is managed. Data needs a reference that is unique at any given time and to which all other software must refer. It also implies that you need a simple model in your directory, as rights must be managed locally.

Some examples :

  • Try to have a user class with 50 class attributes, which is possible if you store all the information about your users in the same place. Then create 100 000 users, which is quite normal for a big website. Requests made against the generic model of eZ Publish are simply too slow for standard fetches. Having a lot of attributes on a user is quite common and results from very strong business or technical needs. For example, you may need an attribute in order to avoid using a directory. This has been explained in one of my former posts about content design.
  • A very big LDAP with a lot of users and a lot of attributes can take a long time to synchronize with regular scripts.
  • The LDAP Login Handler is very powerful but a bit tricky to master. If you have a complex LDAP, your LDAP configuration will be crazy. Moreover, there is no script to push user updates made in eZ Publish back to the LDAP.

eZ Systems is gradually refactoring eZ Publish's model so that, little by little, everything is split up and becomes highly efficient.

 

Solutions to common issues

Case 1 - It's too late : there's too much data in the user !

This can cause performance trouble once you reach a high number of users. The main issue is that user data lives inside eZ Publish and not elsewhere. The first point is to know whether you need some customization or whether, in the end, you just need the user to be logged in to access some private area.

Solution 1.1 - Store it elsewhere

Make a datatype or extend the eZ User type so that eZ Publish only manages what it needs to authenticate the user, I mean the eZ User Account. OK, it's cool to have users as pages, but in real life customers don't really want a picture directory of all the members of the web site. This is not done today in eZ, but the global approach of un-content-ization has begun. More recently, the eZ Comments extension provides comments for everything in eZPublish but does not rely on the classic content mechanism.

The datatype approach may not be the right one. You may expect some trouble, depending on your storage mechanism. I was thinking about several containers, like LDAP (of course), a custom SQL table, or even a file (XML or whatever).

Solution 1.2 - You don't (really) need that

Sometimes trouble comes from a bad interpretation of the customer's need. Sometimes people don't want the data to be held by the system: it's just a helper for them that saves them from looking up the information elsewhere. Sometimes people, I mean the main population of the website, don't even know that data about them is loaded in the system. The point is that, in the end, you don't need the data and you can do without it.

An example : your customer is telling you that their eZ Publish instance is holding 400K users with a lot of data. This data is not shown on the front side because it's an institutional web site with a poor logged in section. The data is shown in the back office to the webmaster in the user section (so for 1 guy).

One good approach is the following : ask why the customer needs all this data and try to figure out whether the data is stored elsewhere and whether it can be accessed in an asynchronous way, by querying an LDAP or another external data source.

The solution that will work is to set up one meta user per major business role you are using. For example, define an HR user, an IT user, a Board user and so on. When people log in, check their access against the external source and then log them in as the predefined eZ meta user. You will get a severe reduction of your member count : 400K to 4 !
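A minimal sketch of the meta user idea, under stated assumptions: the external source is simulated with a dictionary, passwords are compared in plain text only to keep the example short, and `META_USERS` and `resolve_meta_user` are hypothetical names. A real implementation would delegate the credential check to the directory itself.

```python
# Hypothetical mapping: one predefined local account per business role.
META_USERS = {
    "hr": "meta_hr",
    "it": "meta_it",
    "board": "meta_board",
}


def resolve_meta_user(external_source, login, password):
    """Check credentials against the external source, then map the person
    to the meta user of their business role (None when access is refused)."""
    entry = external_source.get(login)
    if entry is None or entry["password"] != password:
        return None
    # Nobody gets an individual local account: only the role matters.
    return META_USERS.get(entry["business_role"])


# Simulated external source (assumption: it exposes a business role).
source = {
    "jdoe": {"password": "pw1", "business_role": "hr"},
    "asmith": {"password": "pw2", "business_role": "it"},
}
meta_login = resolve_meta_user(source, "jdoe", "pw1")
```

The trade-off is that anything done locally is attributed to the meta user, so per-person accounting has to happen in the external source.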

Case 2 - It's not too late : how to share users data ?

The second point is the way you can share users between the different applications of your IT system. We can look at it from two angles : sharing data (first name, last name and so on) and authenticating people. These are different things and can be implemented in different ways.

Solution 2.1 - Define a reference

The most important thing in your IT system is to define an architecture block that handles a centralized reference for users, both for the data and for the authentication. From a strict architecture point of view, directories can provide both features, and that's not good. However, as the password is generally the cheapest and easiest way to authenticate people, architects do not recommend two services and prefer to have only one layer for this.

So, the most important thing is to use an external service to hold the data and the password mechanism.

Solution 2.2 - Purely share user data

Sometimes it's a bit difficult to decide how to split your data between the remote side (I mean the reference user data) and the local side (data from your application). At this point, you have three choices, each with drawbacks and advantages :

  • Full local : so why set up a remote reference ? :-)
  • Full remote : all your data is held in a remote system and you need to query it each time you want a piece of information.
  • Half remote - half local : data is stored on the remote side and some of it is synchronized (or not) with the local side.

As your system needs some consistency, you can choose to centralize everything in the remote reference, but this creates a bottleneck. Moreover, you will need a fine-tuned strategy to synchronize remote and local and to replicate data from one to the other.

Questions to ask yourself :

  • Does all the data have to be in the reference ? Can the business data that I manage in my application be shared with others ?
  • What is more important, performance or consistency ? Do I have to store all the data inside the directory ? How do I synchronize all this ? What if data is updated on the local side ?

For eZ Publish, it's quite simple, as the User mechanism is not so efficient with a lot of users. So the best way, if possible, is to :

  • reduce the user attributes to the minimum, i.e. the user account datatype attribute,
  • store everything that can be shared in the directory,
  • store all other attributes in another place (for example a custom datatype that writes a list of fields in a table).
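To make this three-way split concrete, here is a small sketch that partitions a user record into the account part, the shared part meant for the directory, and the remaining local part. The field names and the `split_user_record` helper are illustrative assumptions, not eZ Publish identifiers.

```python
# Assumption: the account is reduced to login, email and password.
ACCOUNT_FIELDS = {"login", "email", "password_hash"}


def split_user_record(record, shared_fields):
    """Partition a flat user record into (account, shared, local) parts."""
    account, shared, local = {}, {}, {}
    for name, value in record.items():
        if name in ACCOUNT_FIELDS:
            account[name] = value   # stays with the user account
        elif name in shared_fields:
            shared[name] = value    # goes to the directory
        else:
            local[name] = value     # goes to a custom local table
    return account, shared, local


record = {
    "login": "jdoe",
    "email": "jdoe@example.com",
    "password_hash": "<hash>",
    "first_name": "John",
    "last_name": "Doe",
    "newsletter_opt_in": True,
}
account, shared, local = split_user_record(record, {"first_name", "last_name"})
```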

Conclusion

User management is not so easy : there are a lot of things to think about, and some intertwined concepts that make it difficult to decide how to manage users in an IT system. This is a common issue shared by companies all over the world, and it has led to interesting solutions like OAuth or OpenID.

Regarding eZ Publish, the difficulty comes from the technical lock imposed by the eZ User Account type, which forces you to have a user data instance in eZ Publish. My recommendation is to quickly kick the user account (login, email and password) out of the user class. This will separate data from authentication, and it will then be possible to authenticate someone without any data inside eZ Publish. Then, if we actually need a node, we can use a synchronization or a post-account-creation trigger to generate the node. It must be possible to switch this mechanism off.

eZPublish SSO Handler or not...

Tags : sso , ezpublish
Maxime THOMAS on 30 of Jan 2010 - 12pm

On my last eZPublish project, I developed an SSO handler whose aim was to automatically log out a user if a session cookie is set. This experience allowed me to clarify the process, which is barely documented, if at all, on ez.no.

Back to the roots

As a quick reminder, SSO (Single Sign On) is a very powerful feature that lets the user of a heterogeneous set of applications log in once and be authenticated across the whole set without entering a login / password pair again.

This need appeared quite early in companies which were installing applications for their intranet without thinking about technology or authentication management. The consequence was disastrous, because the user had to handle a different login and password for each application. To ensure a minimum of consistency, directories were set up to centralize connection data, but the user still had to enter it for each application.

This type of mechanism inevitably leads to one or more exchanges between a client (the application where the user is located in) and a server (the server that determines whether the user is authenticated or not). Many tools exist to manage the server side with a more or less similar approach, but I shall mention only one: CAS.

The advantage of such a service is to manage a set of authentication sources (backends) on one side (e.g. a MySQL database, a flat file, an LDAP, a web service ...) and a set of applications (frontends) on the other side. The CAS server only ensures that the application has the right to authenticate and makes all or part of the authentication sources available to it.

Once established, SSO can do two things :

  • First, authenticate. The user arrives at website A, authenticates, then goes to site B, where he is automatically authenticated.
  • Second, disconnect. The user disconnects from website A; when he returns to site B, where he was authenticated, he is automatically disconnected.

In summary, for a company concerned about the comfort of its users, SSO is a real revolution because it gives access to all resources with a single pair username / password and enables unified management of authentication methods.

eZPublish SSO Handler

Out of respect for my client, I will not disclose their source code, even though it is a very good example of SSO management in eZPublish. We will study the integration of a CAS-type SSO or equivalent.

eZPublish has a handler that is integrated with the traditional authentication mechanism which is triggered at two points:

  • When the user is not logged in, on any page.
  • Only when the user has authenticated through the user module and its login function, after authentication via a Login Handler.

I think that this feature was designed to make eZPublish act as an SSO server and not as a client of an SSO server. This means that eZPublish serves as the reference for other applications and that, once the user is logged in to eZPublish, the SSO Handler must make it possible to authenticate the user directly within each application.
The SSO mechanism sets an SSO token that allows third party applications to detect whether they should contact the SSO server to check if the user is already authenticated or not. After analysis, the handler turns out to deal with the following 4 cases:

Action             SSO token existing       SSO token not existing or invalid
Logged in user     Nothing is done          Case never reached
Logged out user    The user is logged in    Nothing is done

As you can see, the case of automatic disconnection is never reached in eZPublish since the SSO handler is hidden behind the Login Handler in the case of the connection and is not called when the user is already authenticated. The problem is that even if you have a token out of date or even better invalid in the current application, you're never disconnected.
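The four cases can also be written as a small decision function. This is only a model of the behaviour described here, not eZ Publish code, and `sso_handler_outcome` is a hypothetical name. It makes the gap visible: a logged in user with a missing or invalid token is never handled.

```python
def sso_handler_outcome(user_logged_in: bool, token_valid: bool) -> str:
    """Model of the four cases: what happens for each combination."""
    if user_logged_in:
        # The SSO handler is not called for an already authenticated user.
        if token_valid:
            return "nothing is done"
        # This is where an automatic disconnection would be expected.
        return "case never reached"
    if token_valid:
        return "the user is logged in"
    return "nothing is done"


# Enumerate the whole table.
cases = {
    (logged_in, valid): sso_handler_outcome(logged_in, valid)
    for logged_in in (True, False)
    for valid in (True, False)
}
```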

A palliative

The implemented solution is quite simple: disconnect the user from eZPublish so that he triggers the SSO Handler again. This is done simply by setting a SessionTimeout below the validity of the SSO token in the configuration file site.ini. This maneuver forces eZPublish to disconnect the user. At the next page display, the user triggers the SSO handler, which evaluates the invalid token and removes it.
The idea is that, in later versions of eZPublish, eZ Systems will correct this point functionally.