3.10.2010

SQL Challenge #1

Let's take a look at some information requirements and how the expected output information could be generated using SQL. For this challenge, let's use the snapshot of a database design (shown below)


Suppose we were asked to generate this information -- "List the secretaries (names only) that are currently working on projects. The report should contain the name of the employee and the name of the project he/she is working at"

When we work on the SQL statement that will generate this requirement, what is the first thing we generate mentally? Imagine the output? That's right. Imagining the output end in mind will help you in writing SQL statements. For more details about the steps that we will be doing here, you can read the blog post on Writing SQL statements.

Going back to our problem, if you are imagining the output information shown below, then you are doing just fine so far.


How did you get that mentally? Definitely it involved you extracting, comparing and all sorts of processing in your head to get this output information. That logic needs to be translated to some structure like SQL.

Let's do it structurally. The first thing we know from the output information we have is that the data (NAME and PNAME) will not come from one single table. That is NAME is data coming from the EMPLOYEE table, and PNAME is data coming from the PROJECT table. If you have read the blog post about using joins, you will realize that we will need an SQL statement that will combine two or more tables for this problem. So far, we know that our SQL statement will have this skeleton structure:

SELECT NAME, PNAME
FROM   EMPLOYEE JOIN PROJECT

But looking at the table design EMPLOYEE and PROJECT are not directly related together. There are no data fields in both tables that actually relates the two tables together. Therefore, they cannot be joined directly. If we will look at the table design, EMPLOYEE and PROJECT may not be directly related together, but they can be indirectly related together through EMPLOYEE_PROJECT table. This gives you a clue that the skeleton structure of our SQL statement will look like this:'

SELECT NAME, PNAME
FROM   EMPLOYEE JOIN EMPLOYEE_PROJECT JOIN PROJECT


In order for us not to keep on using the whole table name, we can place table aliases:

SELECT E.NAME, P.PNAME
FROM   EMPLOYEE E JOIN EMPLOYEE_PROJECT EP JOIN PROJECT P

Now that we know that we need to join three tables together in order to get the date we need, let's complete our join by indicating to what data fields would they need to merge. EMPLOYEE relates to EMPLOYEE_PROJECT through EMPLOYEE'S ENUM and EMPLOYEE_PROJECT'S ENUM. This is translated in SQL as:

SELECT E.NAME, P.PNAME
FROM   (EMPLOYEE E JOIN EMPLOYEE_PROJECT EP ON E.ENUM=EP.ENUM)
        JOIN PROJECT P

Looking at our database design, we could also see that the PROJECT table relates to EMPLOYEE_PROJECT through PROJECT's PNO and EMPLOYEE_PROJECT'S PNO. This is translated in SQL as:

SELECT E.NAME, P.PNAME
FROM   (EMPLOYEE E JOIN EMPLOYEE_PROJECT EP ON E.ENUM=EP.ENUM)
        JOIN PROJECT P ON P.PNO=EP.PNO

What have we accomplished at this point? 
We have written the data fields where our output will come from (E.NAME, P.PNAME) and the tables where these data will be extracted from (EMPLOYEE JOIN EMPLOYEE_PROJECT JOIN PROJECT). If we look back at the requirement, we don't need all the employees, we only need those that are:
  1. Secretaries
  2. Working currently on projects
How do we filter the data to only involve secretaries?
Investigating our database design, if an employee is in the EMPLOYEE_SECRETARY table, that means, that employee is a secretary right? And the same is what we have to place in our SQL statement. We wanted only those employees who can be found in the EMPLOYEE_SECRETARY table. This is translated in SQL as:

SELECT E.NAME, P.PNAME
FROM   (EMPLOYEE E JOIN EMPLOYEE_PROJECT EP ON E.ENUM=EP.ENUM)
        JOIN   PROJECT P ON P.PNO=EP.PNO
WHERE  E.ENUM IN (SELECT ENUM FROM EMPLOYEE_SECRETARY)

How do we filter further only those who are working on current projects?
If we will go back to our table design, we will know that an employee is currently working on a project if the DATELEFT of the record in the EMPLOYEE_PROJECT is empty OR the DATELEFT is after today. This is translated in SQL as:

EP.DATELEFT IS NULL OR EP.DATELEFT > SYSDATE()

SYSDATE() is a pre-defined function in MYSQL that actually returns the date today. 
Adding this to our SQL statement, we will have:

SELECT E.NAME, P.PNAME
FROM   (EMPLOYEE E JOIN EMPLOYEE_PROJECT EP ON E.ENUM=EP.ENUM)
        JOIN PROJECT P ON P.PNO=EP.PNO
WHERE  E.ENUM IN (SELECT ENUM FROM EMPLOYEE_SECRETARY)
AND    ( EP.DATELEFT IS NULL OR EP.DATELEFT > SYSDATE() )

And we are done!

0 comments: