Dear readers of our blog, we'd like to recommend you to visit the main page of our website, where you can learn about our product SQLS*Plus and its advantages.
 
SQLS*Plus - best SQL Server command line reporting and automation tool! SQLS*Plus is several orders of magnitude better than SQL Server sqlcmd and osql command line tools.
 

REQUEST COMPLIMENTARY SQLS*PLUS LICENCE

Enteros UpBeat offers a patented database performance management SaaS platform. It proactively identifies root causes of complex revenue-impacting database performance issues across a growing number of RDBMS, NoSQL, and deep/machine learning database platforms. We support Oracle, SQL Server, IBM DB2, MongoDB, Casandra, MySQL, Amazon Aurora, and other database systems.

SQL Joins

8 June 2020

SQL Joins

 

The SQL JOIN language operator is intended for connecting two or more database tables by a matching condition. This operator exists only in relational databases.

It is thanks to JOIN relational databases have such powerful functionality that allows you to store not only data, but also their, at least the simplest, analysis by queries.

Let’s analyze the main nuances of writing SQL queries with the JOIN operator, which are common for all DBMS (database management systems). To connect two tables, the SQL JOIN operator has the following syntax:

SELECT THE COLUMN_NAMES (1..N)
FROM TABLE_NAME_1 JOIN TABLE_NAME_2
ON CONDITION

One or more links with the JOIN operator can be followed by an optional WHERE or HAVING section, which, like a simple SELECT query, sets the sampling condition. Common for all DBMS is that in this construction instead of JOIN INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN, CROSS JOIN (or optionally a comma) may be specified.

INNER JOIN (internal connection)

The INNER JOIN operator query is intended for connecting tables and outputting the resulting table, in which data are completely overlapped by the condition specified after ON.

INNER JOIN (internal connection)

Simply JOIN does the same. Thus, the word INNER is optional.

There is a database of the ad portal – 2. It has a table Categories (categories of ads) and Parts (parts, or otherwise – headings, which belong to categories). For example, parts of Apartments, Villas belong to the Real Estate category and parts of Cars, Motorcycles – to the Transport category.

Tables of this database with filled in data have the following form.

Table Categories:

CatnumbCat_namePrice
10Building materials105,00
505Real Estate210,00
205Transport160,00
30Furniture77,00
45Technique65,00

 

Table Parts:

Part_IDPartCat
1Apartments505
2Cars205
3Boards10
4Cabinets30
5Books160

 

Note that in the Parts table, Books have Cat, a reference to a category that is not in the Categories table, and in the Categories table, Technique has Catnumb, a reference to a category that is not in the Parts table.

Example 1. It is necessary to combine the data of these two tables so that the resulting table has the fields Part (Part), Cat (Category) and Price (Announcement submission price) and so that the data completely overlap on the condition. Condition – coincidence of the category number (Catnumb) in the Categories table and references to the category in the Parts table. To do this, we write the following query:

SELECT Parts.Part, Categories.Catnumb AS Cat, Categories.Price FROM Parts INNER JOIN Categories ON Parts.Cat = Categories.Catnumb

The result of the query is the following table:

PartCatPrice
Apartments505210,00
Cars205160,00
Boards10105,00
Cabinets3077,00

 

There are no Books in the resulting table, as this entry refers to a category that is not in the Categories table, and Techniques, as this entry has a foreign key in the Categories table that is not referenced in the Parts table.

In some cases, when connecting tables, you can create less cumbersome queries using the EXISTS predicate and without using JOIN.

Write SQL queries with JOIN yourself and then see the solutions

There’s a database called “Theatre”. The Play table contains data on productions. The Team table is about the roles of actors. Table Actor – about actors. Table Director – about the directors. The table fields, primary and external keys can be seen in the figure below.

Write SQL queries with JOIN yourself and then see the solutions

Example 2. Identify the most popular actor in the last 5 years.

Use the JOIN operator 2 times. Use COUNT(), CURDATE(), LIMIT 1.

Example 2 solution for an SQL query with JOIN

SELECT
act.fname, act.lname, COUNT(tea.play_id)
FROM play pl JOIN team tea
ON tea.play_id=pl.play_id
JOIN
actor act
ON act.actor_id=tea.actor_id
WHERE TIMESTAMPDIFF(YEAR, pl.premieredate, CURDATE()) < 5
GROUP BY act.fname, act.lname
ORDER BY COUNT(tea.play_id)
DESC LIMIT 1

The resulting table will contain FNAME, LNAME, COUNT fields. With the first JOIN, the TEAM and PLAY table data are crossed by the Play_ID condition. With the second JOIN, the ACTOR and TEAM table data are crossed by the Actor_ID condition. It should be noted that when making a query to calculate the difference between time points, the function TIMESTAMPDIFF() was used, which is present in MySQL, but which is not in many other SQL dialects. They most often use DATEFIFF() fuktsziya, and to set the interval instead of YEAR other words are used.

Example 3. Print a list of actors who play more than one role in one performance, and the number of their roles.

Use the JOIN operator once. Use HAVING, GROUP BY. Hint. The HAVING operator applies to the number of roles counted by the COUNT aggregate function.

Example 3 solution for SQL query with JOIN

Request will be next:

SELECT act.fname, act.lname, count(tea.play_id) FROM actor act
JOIN
team tea ON tea.actor_id=act.actor_id GROUP BY
act.fname, act.lname,tea.play_id HAVING
count(tea.play_id) > 1 ORDER BY count(tea.play_id) DESC

The resulting table will contain FNAME, LNAME, COUNT fields. With JOIN, the data in the TEAM and ACTOR tables are crossed by the Actor_ID condition.

LEFT OUTER JOIN (left external connection)

The query with the LEFT OUTER JOIN operator is intended for connecting tables and outputting the resulting table, in which data are completely overlapped by the condition specified after ON and are supplemented by records from the first (left) table even if they do not meet the condition. For left table entries that do not meet the condition, the column value from the right table will be NULL (undefined).

LEFT OUTER JOIN (left external connection)

Example 4. The database and tables are the same as in Example 1.

To get the resulting table, in which data from two tables are completely overlapped by the condition and complemented by all data from the Parts table that do not meet the condition, write the following query:

SELECT Parts.Part, Categories.Catnumb AS Cat, Categories.Price
FROM Parts LEFT OUTER JOIN Categories
ON Parts.Cat = Categories.Catnumb

The result of the query is the following table:

PartCatPrice
Apartments505210,00
Cars205160,00
Boards10105,00
Cabinets3077,00
Books160NULL

 

In the resulting table, unlike the table from example 1, there are Books, but the value of the column Price (Price) they have – NULL, as this record has a category identifier, which is not in the table Categories.

RIGHT OUTER JOIN (right external connection)

The query with the RIGHT OUTER JOIN operator is intended for connecting tables and outputting the resulting table, in which data are completely overlapped according to the condition specified after ON and are supplemented with records from the second (right) table even if they do not meet the condition. For the right table entries that do not meet the condition, the column value from the left table will be NULL (undefined).

RIGHT OUTER JOIN (right external connection)

Example 5. The database and tables are the same as in the previous examples.

To get the resulting table, in which the data from two tables completely overlapping on the condition and complemented by all data from the Categories table that do not meet the condition, write the following query:

SELECT Parts.Part, Categories.Catnumb AS Cat, Categories.Price
FROM Parts RIGHT OUTER JOIN Categories
ON Parts.Cat = Categories.Catnumb

The result of the query is the following table:

PartCatPrice
Apartments505210,00
Cars205160,00
Boards10105,00
Cabinets3077,00
NULL4565,00

 

In the resulting table, in contrast to the table from Example 1, there is a record with a category 45 and a price of 65.00, but the value of the column Part (Part) is NULL, because this record has an identifier of the category to which there are no references in the table Parts.

FULL OUTER JOIN (full external connection)

The query with the FULL OUTER JOIN operator is intended for connecting tables and outputting the resulting table, in which the data are completely overlapped according to the condition specified after ON and complemented by records from the first (left) and second (right) tables, even if they do not meet the condition. For records that do not meet the condition, the column value from another table will be NULL (undefined).

FULL OUTER JOIN (full external connection)

Example 6: Database and tables are the same as in the previous examples.

To get the resulting table, in which data from two tables are completely overlapping on condition and complemented by all data from both the Parts table and the Categories table, which do not meet the condition, write the following query:

SELECT Parts.Part, Categories.Catnumb AS Cat, Categories.Price
FROM Parts FULL OUTER JOIN Categories
ON Parts.Cat = Categories.Catnumb

The result of the query is the following table:

PartCatPrice
Apartments505210,00
Cars205160,00
Boards10105,00
Cabinets3077,00
Books160NULL
NULL4565,00

 

The resulting table has Book entries (from the left table) and 45 category entries (from the right table), with the first having an undefined price (column from the right table) and the second having an undefined part (column from the left table).

Aliases of the tables to be connected

In previous queries we have specified full names of the columns to be extracted from different tables. Such queries look cumbersome: the same word is repeated several times. Is there any way to simplify the design? Well, it turns out you can. To do this, you should use table aliases – their abbreviated names.

An alias can also consist of one letter. Any number of letters in an alias is possible, the main thing is that the request after the abbreviation is clear to you. General rule: In the query section defining the connection, i.e., the full table names must be specified around the word JOIN, and each name must be followed by a table alias.

Example 7. Write a query from Example 1 using the table aliases to be connected.

The query is as follows:

SELECT P.Part, C.Catnumb AS Cat, C.Price
FROM Parts P INNER JOIN Categories C
ON P.Cat = C.Catnumb

The request will return the same as the request in Example 1, but it is much more compact.

JOIN and connection of more than two tables

Relational databases must comply with the requirements of data integrity and redundancy, so that data on one business process can be contained not only in one, but also in three or more tables.

In these cases, chains of linked tables are used for data analysis: for example, one (first) table contains some quantitative index, the second table is linked with the first and third external keys – data intersect, but only the third table contains a condition, depending on which quantitative index can be derived from the first table.

And there can be even more tables. Using the SQL JOIN statement in one query you can connect a large number of tables. In such queries, one section of the connection is followed by another, and each successive JOIN connects to the next table the table that was the second in the previous chain link.

Thus, the SQL query syntax for connecting more than two tables is as follows:

SELECT NAMES OF COLUMNS (1...N)
FROM TABLE_NAME_1 JOIN TABLE_NAME_2
ON CONDITION
JOIN TABLE_NAME_3
ON CONDITION
...
JOIN NAME_TABLE_M
ON CONDITION

Example 8. The database is the same as in the previous examples.

To the Categories and Parts tables in this example, we will add an Ads table containing data on the ads published in the portal. Here is a snippet of the Ads table, which contains records about ads that expire 2020-04-02.

A_IdPart_IDDate_startDate_endText
211‘2020-02-11’‘2020-04-20’“Selling…”
221‘2020-02-11’‘2020-05-12’“Selling…”
271‘2020-02-11’‘2020-04-02’“Selling…”
282‘2020-02-11’‘2020-04-21’“Selling…”
292‘2020-02-11’‘2020-04-02’“Selling…”
303‘2020-02-11’‘2020-04-22’“Selling…”
314‘2020-02-11’‘2020-05-02’“Selling…”
324‘2020-02-11’‘2020-04-13’“Selling…”
333‘2020-02-11’‘2020-04-12’“Selling…”
344‘2020-02-11’‘2020-04-23’“Selling…”

 

Suppose that today is ‘2020-04-02’, i.e. this value is taken by the CURDATE() function – the current date. We need to find out to which categories the ads whose publication term expires today belong. The names of categories are only in the CATEGORIES table, and the expiration dates of ads are only in the ADS table.

The PARTS table contains parts of the categories (or more simply, subcategories) of the published ads. But with the Cat_ID external key, the PARTS table is linked to the CATEGORIES table and the ADS table is linked with the Part_ID external key to the PARTS table. So we connect three tables in one query and this query can be called a chain with maximum correctness.

The query will be next:

SELECT C.Cat_name FROM Categories C JOIN Parts P
ON P.Cat=C.Catnumb JOIN ads A ON A.Part_id=P.Part_id
WHERE A.Date_end=CURDATE()

Request result – a table containing the names of two categories – “Real Estate” and “Transport”:

Cat_name
Real Estate
Transport

CROSS JOIN (cross-connection)

CROSS JOIN (cross-connection)

Using the SQL CROSS JOIN operator in the simplest form – without the condition of connection – implements the operation Cartesian work in relational algebra. The result of such a connection is the binding of each row of the first table to each row of the second table. The tables can be written in the query either through the CROSS JOIN operator or by a comma between them.

Example 9. The database is still the same, the tables are Categories and Parts. To implement the Cartesian operation of these two tables.

Request will be next:

SELECT (*) Categories CROSS JOIN Parts

Or without an explicit CROSS JOIN – by a comma:

SELECT (*) Categories, Parts

The request will return a table of 5 * 5 = 25 rows, a fragment of which is shown below:

CatnumbCat_namePricePart_IDPartCat
10Building materials105,001Apartments505
10Building materials105,002Cars205
10Building materials105,003Boards10
10Building materials105,004Cabinets30
10Building materials105,005Books160
45Technique65,001Apartments505
45Technique65,002Cars205
45Technique65,003Boards10
45Technique65,004Cabinets30
45Technique65,005Books160

 

As you can see from the example, if the result of such a query has any value, it may be a demonstrative value in some cases when it is not necessary to derive structured information, especially the simplest analytical sample. By the way, you can specify the output columns from each table, but even then the information value of such a query will not increase.

But for CROSS JOIN you can specify a connection condition! The result will be quite different. When using the comma operator instead of explicitly specifying CROSS JOIN, the connection condition is set not by the word ON but by the word WHERE.

Example 10. Same database of the ad portal, Categories and Parts tables. Using a cross-connection, connect the tables so that the data are completely overlapping on the condition. Condition – match the category identifier in the Categories table and the category reference in the Parts table.

The request is as follows:

SELECT P.Part, C.Catnumb AS Cat, C.Price
FROM Parts P, Categories C
WHERE P.Cat = C.Cat_ID

The request will return the same as the request in example 1:

PartCatPrice
Apartments505210,00
Cars205160,00
Boards10105,00
Cabinets3077,00

 

And it’s not a coincidence. The request with cross-connection is completely similar to the request with internal connection – INNER JOIN – or, considering that the word INNER is not mandatory, just JOIN.

Thus, which variant of query to use is a question of style or even habit of the database specialist. Perhaps a cross-connection with a condition for two tables may seem more compact. But the advantage of a cross-connection for more than two tables (it’s also possible) is quite controversial.

In this case, the WHERE intersection conditions are enumerated by the word AND. This design can be cumbersome and difficult to read if there is also a WHERE section with sampling conditions at the end of the query.

SQL Joins Explained. Joins in SQL. SQL Tutorial (video)

 
Tags: , , ,

MORE NEWS

 

Preamble​​NoSql is not a replacement for SQL databases but is a valid alternative for many situations where standard SQL is not the best approach for...

Preamble​​MongoDB Conditional operators specify a condition to which the value of the document field shall correspond.Comparison Query Operators $eq...

5 Database management trends impacting database administrationIn the realm of database management systems, moreover half (52%) of your competitors feel...

The data type is defined as the type of data that any column or variable can store in MS SQL Server. What is the data type? When you create any table or...

Preamble​​MS SQL Server is a client-server architecture. MS SQL Server process starts with the client application sending a query.SQL Server accepts,...

First the basics: what is the master/slave?One database server (“master”) responds and can do anything. A lot of other database servers store copies of all...

Preamble​​Atom Hopper (based on Apache Abdera) for those who may not know is an open-source project sponsored by Rackspace. Today we will figure out how to...

Preamble​​MongoDB recently introduced its new aggregation structure. This structure provides a simpler solution for calculating aggregated values rather...

FlexibilityOne of the most advertised features of MongoDB is its flexibility.  Flexibility, however, is a double-edged sword. More flexibility means more...

Preamble​​SQLShell is a cross-platform command-line tool for SQL, similar to psql for PostgreSQL or MySQL command-line tool for MySQL.Why use it?If you...

Preamble​​Writing an application on top of the framework on top of the driver on top of the database is a bit like a game on the phone: you say “insert...

Preamble​​Oracle Coherence is a distributed cache that is functionally comparable with Memcached. In addition to the basic function of the API cache, it...

Preamble​​IBM pureXML, a proprietary XML database built on a relational mechanism (designed for puns) that offers both relational ( SQL / XML ) and...

  What is PostgreSQL array? In PostgreSQL we can define a column as an array of valid data types. The data type can be built-in, custom or enumerated....

Preamble​​If you are a Linux sysadmin or developer, there comes a time when you need to manage an Oracle database that can work in your environment.In this...

Preamble​​Starting with Microsoft SQL Server 2008, by default, the group of local administrators is no longer added to SQL Server administrators during the...