Database

Install Scripts/Philosophy

The database is configured to be 100% scripted. It can be built and rebuilt entirely from the command line. SQL is code. Code is organized into related modules. Modules are version controlled.

The directory, InstallationAndConversion, contains scripts that create both the app control and CMS databases, create database users and password, create tables, load the initial data, load all of the stored procedures and then set security on those stored procedures.

The batch file, refreshdb.bat, is used for updates and performs the last 2 steps, loading the stored procedures and resetting security.

Stored Procedures

Titan is a data intensive application. Everything in Titan revolves around being able to quickly retrieve the necessary data from the database. As such, all data work is performed inside the database. This reduces the amount of data transported throughout the system to only that which is required by the applications while keeping the transaction locks as short as possible.

The stored procedures and functions use a naming convention that both denote how the procedure is used as well as the security settings on the stored procedure (see GrantExecutePermissions.sql). During load, only the default database user ([dbName]User) is granted execute rights to stored procedures that begin with the strings “usp_” or “userDef_”. Only these stored procedures can be called from the component layer; all other stored procedures (usually named “internal_*”) may only be called from other stored procedures.

In this manner, we create a semblance of public and private stored procedures. Public-facing stored procedures are responsible for verifying non-document access rights (e.g., is the user requesting the list of users an uber-user ?) as well as starting and controlling transactions. Internal stored procedures can assume that non-document access rights have been verified and can assume that they are appropriately nested inside of transactions, if needed.

Note: Despite attempts to organize the SQL source, T-SQL is not a language that supports well the concept of code reuse. As such, there are several places in the source code base where queries are repeated because the alternative is just too expensive or ungainly.

Coding Patterns

There are a few other standard coding patterns within the stored procedures:

@temp tables. Titan makes extensive use of @temp tables to simplify queries. In some cases, we have pulled out an inline-view into a temp table to enforce an index constraint that got lost by SQL Server’s optimizer. In that situation, pulling the inline view into a @temp table with a primary key resulted in significant performance gains.
XML is the primary input and output format of Titan. However, per their own admission, Microsoft’s xquery implementation is weak. Thus, the common pattern for input is to parse into a @temp table and work off that table. In some cases we will also bundle up tabular data and pass it into an internal_* stored procedure (prior to the concept of table parameters). However, Microsoft has a well-known bug where parsing out large xml can go quadratic. In those cases we resort to #temp tables and, possibly, OPENXML.
#temp tables are used for many of the display-side queries for nav and/or filtering requests. Fast. No parsing. Keep it simple, albeit in the presence of evil, global variables.
Dynamic SQL occurs in two forms:
- Internal to the stored procedures. When internal to the stored procedure we just use it and make the assumption that our inputs are clean.
- Generated from the display or workstation interaction. As just a minor hedge, we don’t accept arbitrary dynamic SQL from the components. We’ll take parts and strcat them together. Yes, this isn’t really any more secure; we just raised the height of the hurdle.

Group By/Having Dance

Initially, the most complex processing was in DisplaySupport.sql. Here the stored procedure internal_BuildActiveDocIDs was once the ugly beast inside the database code, performing the group by/having dance(*) to calculate display side security. Over time, the functionality of Titan has evolved and it is now relatively simple, but don’t underestimate the room for error. Before tackling any of the And/Or, Or/Or functions within BlockSupport.sql, it is best to completely understand the basics of the group by/having logic of internal_BuildActiveDocIDs.

Consider only security filters. Before a user can see a document we need to determine if a user has rights. A user has rights if and only if, for every security tag, one of three conditions applies: the document says “ALL”; the user says “ALL” or there is a matching attribute between the user and the document.

Translated to SQL, we can look at each case as a query between the document and user tags (aka classifications). Now if we union those 3 cases together and group by the tag ID, we will get a table of classifications where the user and the document agree.

But, we need to know that the document and the user agree for all of the classifications. So we take the table of matching tag IDs and count them up. If the count matches the number of security classifications for the document, the user can view the document.

But, why call it group by/having? The query doesn’t work on a single document at a time. It has to work over a set of documents. Thus, the generalized solution is to use the group by over the tags and DocID and the documents to only those having the right count.

Features that add to the calculation fun (both in DisplaySupport.sql and BlockAndPageSupport.Sql):

Security rights are not the same for all sites. Thus, to calculate rights for document sets that span a single site (e.g., filter block results), we spin a cursor and process one site at a time. Perhaps the cursor could have been avoided, but the cross-site operation is not the primary usage so the overhead is acceptable.
A malicious or poorly written client may try to configure the system to use a security tag as if it were a filter (security are required; filters are optional) or vice versa. Before application, the code must make sure that the tag is being used in the correct manner or documents may be returned that are prohibited to the user.
“Or” became “And/Or”. With security and the initial implementation of the filter block, the matching of attributes between the document and the user was always an “or”. If one attribute matched, then the tag matched. With the second generation of the filter block, the option was added that the author could specify that all attributes had to match. Thus, the having clause becomes more complex than just counting the number of tags that had a match; now it has to keep track of the number of matching attributes and whether we need to match just one or all of them.
The Required Flag and Subsets. With the second generation of the filter block we added the concept of filter tag subsets. That alone wouldn’t have really mattered except that we added a flag that said “only show the documents that have one of the attributes for this tag”. This added the necessary step of pre-filtering the documents to only those with the proper settings for the tag.
User Preferences. At 5.0, we added the ability for users to be assigned filter tags. This could have added a 3rd variable to the calculation of matches. However, user preferences are used only as a pre-filter. That is, user preferences are only used the first time a filtering block is displayed. The user preferences are combined with any pre-selections the author may have made for the display-side filter.
Product Folders as documents. Products broke a lot of things. Injected nav behavior, the ability to inject classifications and data folders into your nav forced a lot of changes through the system. At times we want the supporting functions to only consider folders; at other times we need to work only with documents. Thus, we have a 3-way flag that indicates documents only, folders only or both. We pass this through to our existing functions instead of post processing so that our data set can be reduced to the smallest possible size as soon as possible.
Linked documents. Products have the concept of linked documents. Again, the supporting stored procedures need to know whether or not they need to go over the linked document table to find additional documents.
Returning matching document counts. See the extensive comment in internal_BuildDisplayFilterXml2Detail for more information
File Types. Added by Segmented Search, the display treats these as a pseudo tag, but internally we must process separately, but in the same area as standard tags.
Date Ranges. Added by Segmented Search, date ranges are also supported. Unlike file types, these cannot be treated as pseudo classifications. In particular, since the meaning of the date field can vary, we are forced, in places, to resort to dynamic SQL.

A small bright spot for viewing and handling the complexity of the group by/having dance is to look at SegSearch.sql. With this block, we tried to create a grand unification theorem for filtering blocks. All of the segments make use of internal_SegSearchCommonFilter. This stored procedure is a good starting point for understanding the steps required to go from “I’d like this set of documents” to “these are the documents you may have”.

Workstation Security

The Group By/Having Dance controls display-side security. Display security is strictly tag-based and a document’s visibility is controlled only by the user settings and the document’s settings.

Workstation security is controlled by workflow settings and a user’s ability to interact with a document is based both on their rights to the current document as well as their rights in child documents. For example, a user may have author rights to the selected document, but if they do not have author rights to a document below the selected document, they will not have move or copy rights.

Workstation security takes on an added twist as rights are hierarchical. A user with Step-1 editor rights also has author rights; an Item Admin can approve and author as well as perform Item Admin duties. This hierarchy can obfuscate the query.

But it is the need to search a document’s sub-tree to determine full workstation document rights makes all multi-document workstation queries inherently O(N^2). Thus, before calling into the queries that will determine the full workstation rights for each document in the set, it is important to have attempted to prune your results set as much as possible.

Note that Titan, having been based on the database structure of CMSv2, has the concept of 4 different types of workflow: Create Self; Update Self; Create Children; Update Children. Many of the calls still support this although not one customer has made use of it in more than 10 years. In some cases, the SQL code fully supports the concept; in other cases, for performance issues, the code has been stubbed out. In the latter case, the code will retain comments that the support for multiple workflow types has been removed.