3.10.2010

SQL Challenge #2

Let's take a look at some information requirements and how the expected output information could be generated using SQL. For this challenge, let's use the snapshot of a database design (shown below)



Suppose we were asked to generate this information -- "List the employees who are 1-2 years older/younger than their spouse. The report should contain the name of the employee and the name of the spouse"

When we work on the SQL statement that will generate this requirement, what is the first thing we generate mentally? Imagine the output? That's right. Imagining the output end in mind will help you in writing SQL statements. For more details about the steps that we will be doing here, you can read the blog post on Writing SQL statements.

Going back to our problem, if you are imagining the output information shown below, then you are doing just fine so far.


How did you get that mentally? Definitely it involved you extracting, comparing and all sorts of processing in your head to get this output information. That logic needs to be translated to some structure like SQL.

Let's take a look at some information requirements and how the expected output information could be generated using SQL. For this challenge, let's use the snapshot of a database design (shown below)


Suppose we were asked to generate this information -- "List the secretaries (names only) that are currently working on projects. The report should contain the name of the employee and the name of the project he/she is working at"

When we work on the SQL statement that will generate this requirement, what is the first thing we generate mentally? Imagine the output? That's right. Imagining the output end in mind will help you in writing SQL statements. For more details about the steps that we will be doing here, you can read the blog post on Writing SQL statements.

Going back to our problem, if you are imagining the output information shown below, then you are doing just fine so far.


How did you get that mentally? Definitely it involved you extracting, comparing and all sorts of processing in your head to get this output information. That logic needs to be translated to some structure like SQL.

Let's do it structurally. The first thing we know from the output information we have is that the data (NAME and PNAME) will not come from one single table. That is NAME is data coming from the EMPLOYEE table, and PNAME is data coming from the PROJECT table. If you have read the blog post about using joins, you will realize that we will need an SQL statement that will combine two or more tables for this problem. So far, we know that our SQL statement will have this skeleton structure:

SELECT NAME, PNAME
FROM   EMPLOYEE JOIN PROJECT

But looking at the table design EMPLOYEE and PROJECT are not directly related together. There are no data fields in both tables that actually relates the two tables together. Therefore, they cannot be joined directly. If we will look at the table design, EMPLOYEE and PROJECT may not be directly related together, but they can be indirectly related together through EMPLOYEE_PROJECT table. This gives you a clue that the skeleton structure of our SQL statement will look like this:'

SELECT NAME, PNAME
FROM   EMPLOYEE JOIN EMPLOYEE_PROJECT JOIN PROJECT


In order for us not to keep on using the whole table name, we can place table aliases:

SELECT E.NAME, P.PNAME
FROM   EMPLOYEE E JOIN EMPLOYEE_PROJECT EP JOIN PROJECT P

Now that we know that we need to join three tables together in order to get the date we need, let's complete our join by indicating to what data fields would they need to merge. EMPLOYEE relates to EMPLOYEE_PROJECT through EMPLOYEE'S ENUM and EMPLOYEE_PROJECT'S ENUM. This is translated in SQL as:

SELECT E.NAME, P.PNAME
FROM   (EMPLOYEE E JOIN EMPLOYEE_PROJECT EP ON E.ENUM=EP.ENUM)
        JOIN PROJECT P

Looking at our database design, we could also see that the PROJECT table relates to EMPLOYEE_PROJECT through PROJECT's PNO and EMPLOYEE_PROJECT'S PNO. This is translated in SQL as:

SELECT E.NAME, P.PNAME
FROM   (EMPLOYEE E JOIN EMPLOYEE_PROJECT EP ON E.ENUM=EP.ENUM)
        JOIN PROJECT P ON P.PNO=EP.PNO

What have we accomplished at this point? 
We have written the data fields where our output will come from (E.NAME, P.PNAME) and the tables where these data will be extracted from (EMPLOYEE JOIN EMPLOYEE_PROJECT JOIN PROJECT). If we look back at the requirement, we don't need all the employees, we only need those that are:
  1. Secretaries
  2. Working currently on projects
How do we filter the data to only involve secretaries?
Investigating our database design, if an employee is in the EMPLOYEE_SECRETARY table, that means, that employee is a secretary right? And the same is what we have to place in our SQL statement. We wanted only those employees who can be found in the EMPLOYEE_SECRETARY table. This is translated in SQL as:

SELECT E.NAME, P.PNAME
FROM   (EMPLOYEE E JOIN EMPLOYEE_PROJECT EP ON E.ENUM=EP.ENUM)
        JOIN   PROJECT P ON P.PNO=EP.PNO
WHERE  E.ENUM IN (SELECT ENUM FROM EMPLOYEE_SECRETARY)

How do we filter further only those who are working on current projects?
If we will go back to our table design, we will know that an employee is currently working on a project if the DATELEFT of the record in the EMPLOYEE_PROJECT is empty OR the DATELEFT is after today. This is translated in SQL as:

EP.DATELEFT IS NULL OR EP.DATELEFT > SYSDATE()

SYSDATE() is a pre-defined function in MYSQL that actually returns the date today. 
Adding this to our SQL statement, we will have:

SELECT E.NAME, P.PNAME
FROM   (EMPLOYEE E JOIN EMPLOYEE_PROJECT EP ON E.ENUM=EP.ENUM)
        JOIN PROJECT P ON P.PNO=EP.PNO
WHERE  E.ENUM IN (SELECT ENUM FROM EMPLOYEE_SECRETARY)
AND    ( EP.DATELEFT IS NULL OR EP.DATELEFT > SYSDATE() )

And we are done!


Transforming forms to table design is not really difficult if you will follow a set of structured steps without doing immediate shortcuts. For me to illustrate that process, let us have this sample form shown left.

Any business form will have the following parts - Header data, body, footer. Identifying these parts of the form will allow you to create a skeleton table design which you will adjust later. As a rule to follow, header and footer data are all in one table called the parent table, and the data in the body is in another table called the child table.

Investigating our sample form, what are the data in the form that forms the header, footer and the body? You can seek the assistance of the company to help you identify these parts if you are not accustomed to it yet.


It would help if you underline the data items with red (header and footer) or blue (body). Before we design the tables, each form is unique and there is a particular data that identifies a form (this is typically the form #). This identifier help us identify one form from another. In our example, we can obviously identify the Purchase Order # as our form identifier.

The image on the left shows the same form marked with the footer and header data (in red) and the body data (in blue).

What's next?
After identifying header and footer data, and body data, we can directly do a draft table design from this following the basic rule we have stated above (parent and child table). See the image below.



If you have noticed, the primary key of the parent table (PURCHASE ORDER) is the identifier we decided for the form early on. The primary key of the child table (PURCHASE ORDER DETAILS) is the combined identifier of the form and the identifier for each of the records in the body (which in our sample is obviously the item#).

At this point, you have a draft table design. You are ready to go for design refinements through normalization and other splitting the table methods. To refer to the possible situations that will signal us that splitting tables are necessary, check this blog post.