Could I get some help with JOINs?
I see a lot of questions out there that suggest a lack of basic understanding of how tables in a relational database, well, relate to each other. Let's say we have the following schema of authors and the stories they have written:
Notice that we have at least one author with no stories, and we also have at least one story with no author.
First, let's handle the basic INNER JOIN. This returns every row from the "left" table with at least one matching row in the "right" table. (By "left" I mean the table mentioned on the left side of the JOIN expression.)
This returns all of the authors and each of their stories, one row per story. Notice that multiple authors have multiple rows in the result, and that there are no rows for any author without a story, or any story without an author.
A common question is, "Should I use this ANSI-style JOIN, or should I use the old style with the JOIN criteria in the WHERE clause?" Here is the old-style syntax:
You will note that this produces the exact same result as the above query. Personally, I prefer the ANSI-style JOIN because it will last longer (the old style is deprecated) and because keeping the JOIN criteria separate from the filter criteria is much more intuitive and readable, IMHO. And while you should see no difference in the performance or in the results from INNER JOIN queries using either syntax, this is not always true for OUTER JOINs (I will provide examples later). So, since you should definitely use ANSI-style JOIN syntax for all OUTER JOINs, it would be consistent to use this syntax throughout all of your queries.
Another common request I see for this kind of schema is, "How do I return one row per author, including the name of a story they wrote?" To which I always respond, "Which story do you want?" See, SQL Server doesn't know which story you want, so you have to be specific. There is no ANY() function in SQL Server, so you have to tell the engine whether you want the latest story they wrote, the first story title alphabetically, or something else that can be quantified — rather than expecting an arbitrary / non-deterministic result. And each of these results can be derived in a slightly different way. You must also account for cases where an author has not yet written any stories; do you include such authors in the result? If so, what do you want in the story column?
This query list authors and their first title alphabetically:
This query lists authors and their latest story, including dt (this assumes that authorID + dt is unique).
What if you want to include authors that don't have stories? You'll need to replace the INNER JOINs with LEFT OUTER JOINs so that story-less authors aren't left out:
What if you want to return all stories, and their authors (if available)?
Note that the only difference between a LEFT and RIGHT JOIN is the order of the tables that are being JOINed. Basically, the table on the mentioned side of the JOIN expression (e.g. the table mentioned first in a LEFT JOIN and last in a RIGHT JOIN) is the table that is going to produce real data for all rows, and the other table will be populated with NULLs when no matching row is found.
The other types of JOINs are FULL OUTER and CROSS.
FULL OUTER returns the every row from the left table *and* every row from the right table. If no match exists on one side, that side contains NULL values in all columns. To put it simply, a FULL OUTER JOIN is the UNION of a LEFT JOIN and RIGHT JOIN.
A CROSS JOIN takes every row in the left table, and matches it with every row in the right table. A CROSS JOIN is generally referred to as the Cartesian product.
I don't know of many practical uses for this type of query, in terms of actually looking at the results, except a quick way to generate many menial rows from two tables and not trying to form any relationship between the two sides.
I've probably left out quite a bit of information, but I'm tired. Please feel free to leave feedback about any errors above or about any JOIN topics you think should be included here. And don't forget to clean up:
Related ArticlesCan I fix this mm/dd/yyyy <-> dd/mm/yyyy confusion once and for all?
How can I tell which version of MDAC I'm running?
How do I access MIN, MAX, SUM, COUNT values from SQL statements?
How do I change column order in a table structure?
How do I change the order of columns in a table?
How do I concatenate strings from a column into a single row?
How do I convert columns of values into a single list?
How do I determine if a database exists?
How do I document / compare my SQL Server database(s)?
How do I get the IDENTITY / AUTONUMBER value for the row I inserted?
How do I solve 'ADO Could Not Find The Specified Provider'?
Should I use BETWEEN in my database queries?
Why can't I use the * wildcard in a database search?
Why do I get 'Syntax Error in INSERT INTO Statement' with Access?
Why do I get weird results when using both AND and OR in a query?
Why do some SQL strings have an 'N' prefix?
Why does AbsolutePosition return as -1?
Why doesn't SQL Server allow me to separate DATE and TIME?
Why is Query Analyzer only returning 255 characters?
Why should I avoid NULLs in my database?
How do I deal with an apostrophe (') in a SQL statement?