3.10.2010

SQL Challenge #2

Let's take a look at some information requirements and how the expected output information could be generated using SQL. For this challenge, let's use the snapshot of a database design (shown below)



Suppose we were asked to generate this information -- "List the employees who are 1-2 years older/younger than their spouse. The report should contain the name of the employee and the name of the spouse"

When we work on the SQL statement that will generate this requirement, what is the first thing we generate mentally? Imagine the output? That's right. Imagining the output end in mind will help you in writing SQL statements. For more details about the steps that we will be doing here, you can read the blog post on Writing SQL statements.

Going back to our problem, if you are imagining the output information shown below, then you are doing just fine so far.


How did you get that mentally? Definitely it involved you extracting, comparing and all sorts of processing in your head to get this output information. That logic needs to be translated to some structure like SQL.

Let's take a look at some information requirements and how the expected output information could be generated using SQL. For this challenge, let's use the snapshot of a database design (shown below)


Suppose we were asked to generate this information -- "List the secretaries (names only) that are currently working on projects. The report should contain the name of the employee and the name of the project he/she is working at"

When we work on the SQL statement that will generate this requirement, what is the first thing we generate mentally? Imagine the output? That's right. Imagining the output end in mind will help you in writing SQL statements. For more details about the steps that we will be doing here, you can read the blog post on Writing SQL statements.

Going back to our problem, if you are imagining the output information shown below, then you are doing just fine so far.


How did you get that mentally? Definitely it involved you extracting, comparing and all sorts of processing in your head to get this output information. That logic needs to be translated to some structure like SQL.

Let's do it structurally. The first thing we know from the output information we have is that the data (NAME and PNAME) will not come from one single table. That is NAME is data coming from the EMPLOYEE table, and PNAME is data coming from the PROJECT table. If you have read the blog post about using joins, you will realize that we will need an SQL statement that will combine two or more tables for this problem. So far, we know that our SQL statement will have this skeleton structure:

SELECT NAME, PNAME
FROM   EMPLOYEE JOIN PROJECT

But looking at the table design EMPLOYEE and PROJECT are not directly related together. There are no data fields in both tables that actually relates the two tables together. Therefore, they cannot be joined directly. If we will look at the table design, EMPLOYEE and PROJECT may not be directly related together, but they can be indirectly related together through EMPLOYEE_PROJECT table. This gives you a clue that the skeleton structure of our SQL statement will look like this:'

SELECT NAME, PNAME
FROM   EMPLOYEE JOIN EMPLOYEE_PROJECT JOIN PROJECT


In order for us not to keep on using the whole table name, we can place table aliases:

SELECT E.NAME, P.PNAME
FROM   EMPLOYEE E JOIN EMPLOYEE_PROJECT EP JOIN PROJECT P

Now that we know that we need to join three tables together in order to get the date we need, let's complete our join by indicating to what data fields would they need to merge. EMPLOYEE relates to EMPLOYEE_PROJECT through EMPLOYEE'S ENUM and EMPLOYEE_PROJECT'S ENUM. This is translated in SQL as:

SELECT E.NAME, P.PNAME
FROM   (EMPLOYEE E JOIN EMPLOYEE_PROJECT EP ON E.ENUM=EP.ENUM)
        JOIN PROJECT P

Looking at our database design, we could also see that the PROJECT table relates to EMPLOYEE_PROJECT through PROJECT's PNO and EMPLOYEE_PROJECT'S PNO. This is translated in SQL as:

SELECT E.NAME, P.PNAME
FROM   (EMPLOYEE E JOIN EMPLOYEE_PROJECT EP ON E.ENUM=EP.ENUM)
        JOIN PROJECT P ON P.PNO=EP.PNO

What have we accomplished at this point? 
We have written the data fields where our output will come from (E.NAME, P.PNAME) and the tables where these data will be extracted from (EMPLOYEE JOIN EMPLOYEE_PROJECT JOIN PROJECT). If we look back at the requirement, we don't need all the employees, we only need those that are:
  1. Secretaries
  2. Working currently on projects
How do we filter the data to only involve secretaries?
Investigating our database design, if an employee is in the EMPLOYEE_SECRETARY table, that means, that employee is a secretary right? And the same is what we have to place in our SQL statement. We wanted only those employees who can be found in the EMPLOYEE_SECRETARY table. This is translated in SQL as:

SELECT E.NAME, P.PNAME
FROM   (EMPLOYEE E JOIN EMPLOYEE_PROJECT EP ON E.ENUM=EP.ENUM)
        JOIN   PROJECT P ON P.PNO=EP.PNO
WHERE  E.ENUM IN (SELECT ENUM FROM EMPLOYEE_SECRETARY)

How do we filter further only those who are working on current projects?
If we will go back to our table design, we will know that an employee is currently working on a project if the DATELEFT of the record in the EMPLOYEE_PROJECT is empty OR the DATELEFT is after today. This is translated in SQL as:

EP.DATELEFT IS NULL OR EP.DATELEFT > SYSDATE()

SYSDATE() is a pre-defined function in MYSQL that actually returns the date today. 
Adding this to our SQL statement, we will have:

SELECT E.NAME, P.PNAME
FROM   (EMPLOYEE E JOIN EMPLOYEE_PROJECT EP ON E.ENUM=EP.ENUM)
        JOIN PROJECT P ON P.PNO=EP.PNO
WHERE  E.ENUM IN (SELECT ENUM FROM EMPLOYEE_SECRETARY)
AND    ( EP.DATELEFT IS NULL OR EP.DATELEFT > SYSDATE() )

And we are done!


Transforming forms to table design is not really difficult if you will follow a set of structured steps without doing immediate shortcuts. For me to illustrate that process, let us have this sample form shown left.

Any business form will have the following parts - Header data, body, footer. Identifying these parts of the form will allow you to create a skeleton table design which you will adjust later. As a rule to follow, header and footer data are all in one table called the parent table, and the data in the body is in another table called the child table.

Investigating our sample form, what are the data in the form that forms the header, footer and the body? You can seek the assistance of the company to help you identify these parts if you are not accustomed to it yet.


It would help if you underline the data items with red (header and footer) or blue (body). Before we design the tables, each form is unique and there is a particular data that identifies a form (this is typically the form #). This identifier help us identify one form from another. In our example, we can obviously identify the Purchase Order # as our form identifier.

The image on the left shows the same form marked with the footer and header data (in red) and the body data (in blue).

What's next?
After identifying header and footer data, and body data, we can directly do a draft table design from this following the basic rule we have stated above (parent and child table). See the image below.



If you have noticed, the primary key of the parent table (PURCHASE ORDER) is the identifier we decided for the form early on. The primary key of the child table (PURCHASE ORDER DETAILS) is the combined identifier of the form and the identifier for each of the records in the body (which in our sample is obviously the item#).

At this point, you have a draft table design. You are ready to go for design refinements through normalization and other splitting the table methods. To refer to the possible situations that will signal us that splitting tables are necessary, check this blog post.

I have a simple rule in deciding when to use a join or not.
If the information that you are getting will come from two tables, then you should use a join, otherwise other strategies like a sub-query would do just fine. Let's take a look at two cases below using the sample  database design we have used in my previous post.

  1. Generate a list of employees (name and employee number only) that are married.
  2. Generate a list of employees (name and employee number) that are married with their date of marriage.

If we will look at the output information of these two cases, we will have these shown below:


If you will look at the difference between the two output information, in the first case, all the fields of our output are all coming from one table -- the EMPLOYEE table. In the other case, we can see that DOM is a field that is coming from another table -- the EMPLOYEE_SPOUSE table. We can already predict at this point that case#1 will not use a join, but case #2 will use a join.

When using a join, we are actually combining two tables together to form a bigger and/or wider table. Let us see what happens when we join EMPLOYEE and EMPLOYEE_SPOUSE tables. 



There are things that happened when you join two tables together:
  1. The fields of each table are also combined together. The four (4) columns of the EMPLOYEE table and the three (3) columns were merged together in the result.
  2. Only records in the EMPLOYEE TABLE that has a match on ENUM (e.enum=es.enum) with the EMPLOYEE_SPOUSE table is part of the result. Investigate this, INVENTADO, PAUL is a record in the EMPLOYEE table but since his ENUM has no match with an ENUM in the EMPLOYEE_SPOUSE table, INVENTADO, PAUL is not part of our result. Logically, you can also say that he is not included primarily because he is not married and he is not listed in the other table -- EMPLOYEE_SPOUSE.
And going back to our case, since we don't want all the columns to be part of the output information, we only select the columns we wanted, given us the final SQL statement of

SELECT E.ENUM, E.NAME, ES.DOM
FROM   EMPLOYEE E JOIN EMPLOYEE_SPOUSE ES 
ON     E.ENUM=ES.ENUM

It is that easy.





Structured Query Language (SQL) is a language intended to interact with the database (whether a software or a user directly interacting with the database). SQL is not a programming language. This means that it cannot be used for other purposes aside from extracting, manipulation of information from databases and storing data into databases. SQL is also used to construct and manipulate structures inside databases where all the data will be stored.

Learning how to write SQL statement is not that difficult. Writing SQL statement is about translating your data extraction logic into a structured statement that the DBMS would understand. Let us illustrate this point with the given example below.

What we have is a snapshot of the database design made for a company. The Employee table holds important data and information about employees and the EMPLOYEE_SPOUSE table holds data about the spouse of married employees (including the date of their marriage).


The two tables above are related to each other (through the foreign keys declared in the EMPLOYEE_SPOUSE table). These relationships are important to anyone who writes SQL. For the meantime, let us concentrate on practical tips to writing SQL statements a lot easier even for beginners.

As we have said earlier, SQL statements are written with purpose of extracting information from databases. Let us say that a typical report that the software you will build is this --  LIST OF EMPLOYEES THAT ARE SINGLE.

What do we need to think of first for us to write SQL statement?
Definitely, it will be how we see the output information to be like. Just like in programming, not having the output in mind will surely not set a good and defined direction to which we will code a software. So, let us see how do we see the output information to be like.

If you are thinking of an output like what is shown below, then you are doing fine so far. When thinking of the output information, two things you have to take note of

  1. Structure of the output (columns that are part of the output)
  2. Information itself (assuming that a snapshot of the data is available to you)

Given that we have a snapshot of data with us and you were able to think of the output information shown above, this shows that you can actually do SQL. Mentally, you were able to extract the data you needed and produce the information we wanted. This mental process just simply needs to be translated into a logic conducive for SQL writing.

Analyzing how you have probably extracted the data you needed to produce the information, you may have observed that you actually looked for employees recorded in the EMPLOYEE table that does not exist in the list of employees in the EMPLOYEE_SPOUSE table. This is quite logical, if I am an employee recorded in the EMPLOYEE_SPOUSE table, that means I have a spouse and married. So if I am not in this list, then I am single.

This logic will form the cornerstone of your SQL statement. Let's go back to your logic and see how easy it is to translate to SQL. First, we looked for employees in the EMPLOYEE table. In SQL, this is simply

SELECT ENUM, NAME, DOB, DNAME
FROM   EMPLOYEE

But we know from our discussion that we don't need all the employees, only those that are not in the EMPLOYEE_SPOUSE table. How is this written?

SELECT ENUM, NAME, DOB, DNAME
FROM   EMPLOYEE
WHERE  ENUM NOT IN (      )

SQL is like English if you have noticed. If you will read our SQL so far, it reads -- "We are getting the ENUM, NAME, DOB, DNAME from the EMPLOYEE table BUT we are only interested in those whose ENUM is not in a list." What will list be? This is the list of employee numbers in the EMPLOYEE_SPOUSE table. How do we get this list?

SELECT ENUM
FROM   EMPLOYEE_SPOUSE

Now, plugging it in our draft SQL statement, we will have:
SELECT ENUM, NAME, DOB, DNAME
FROM   EMPLOYEE
WHERE  ENUM NOT IN ( SELECT ENUM
                       FROM EMPLOYEE_SPOUSE )

See, writing SQL is not really that difficult.
If you have a mental picture of what the output will look like, and have an extraction logic with you, writing it's SQL statement will just be a breeze. 

SQL (Structured Query Language) is a special computer language intended for users to interact with databases either in getting or manipulating data (SQL-Data Manipulation Language or SQL-DML) or developing structures (SQL-Data Definition Language or SQL-DDL) where data will eventually be stored.

How is SQL used?
SQL can be used either in an interactive tool or embedded (or constructed dynamically) in a programming language. When used interactively by most IT professionals, they are normally in order to test statements, dynamically and instantly making changes in the database.

SQL used dynamically using a database tool like MySQL Query Browser

















SQL used inside a Java Program











Databases store data about what happened in a transaction (or any event) that happened in an organization. A form is a reflection that such transaction (or event) actually happened. The data in the form may not necessarily be complete. There are important data in most of the forms used that are not recorded but are important to completely record what happened in the transaction. What can we use as a guide to what's the minimum amount of information to capture and store in the database?
  1. What is the transaction?
  2. Who did the transaction? Who were the people involved in the transaction?
  3. When did the transaction happen? When was it completed? When was it started?
  4. Where did the transaction happen?
  5. What were done in the transaction?
  6. How was the transaction done (phone, email, form), should other forms of transacting exists.
Groups who cannot show me the reports in which these data to be recorded will be used, only signals that they have not done thorough data requirements analysis, and the design of the database is no convincingly complete. It is also through simulated forms that are incomplete and lacks the vital details to record the transaction that we can tell whether the group is doing the database project well or not.


We know that we can dissect a table into parts (creating new tables). But this is done with a solid set of reason. What may these valid reasons be?
  1. We wanted to optimize spaces by separating fields that a less commonly field up to those that are always filled-up. This is specially if these fields less commonly field up are many.
  2. We notice fields that are only applicable to a specific set of records, and there are other fields that are only applicable to other set of records. In this, we can create a separate table containing the applicable fields for one set of records and another table for the fields applicable to another set of records. This is called specialization/generalization
  3. We wanted to reduce redundancy. This is called normalization
  4. We wanted a field to have a controlled value set. This is called mechanisms for domain constraints.
  5. A field is a multi-valued field, and each multi-valued data is atomically important to affect other possible records. This is called data flattening.
See book: Database Systems (Connoly and Begg), Chapter 3


What is the relationship of data flows with Databases?
To understand this relationship, let us examine what data flows really are first.

A data flow has three components:
  1. the data that is transfered from one process to another
  2. what process did this data come from?
  3. to what process will use this data.
These components tell us that organizational processes is about using data and exchanging information across functions and units of the organization. As an information systems student, this is a critical part of the analysis activity you perform to determine the problems and identify the solution.

When data flowed from source to destination, this is a transactional event. As a transactional event, data is recorded to ensure that the transaction is not lost or completely forgotten. For a system that will able to improve the flow of information in an organization, the first to ensure is that the data about transactions are completely and accurately recorded This tells us that when we do analysis in the objective of developing Information Systems, we also perform transactional analysis. Transactional analysis means to dissect the elements of the transaction and evaluate whether the pre-requisite complete capture is present - Are the data needed to generate the information that should flow to destination complete?

Important elements of a transaction is something that we already learned in our early years - WWWWH (Who, What, Where, When, How).
  1. What is the transaction?
  2. Who transacted it?
  3. How was it transacted?
  4. When did it happen?
  5. Where did it happen?
  6. What exactly was transacted?
  7. What completed the transaction?
  8. Who completed it?
  9. When was it completed?
  10. How was it completed?
This discussion actually leads us to the role of Databases in Data Flows. Since Data Flows contains data about the transaction, such is important to be recorded in Databases.