Dear readers of our blog, we'd like to recommend you to visit the main page of our website, where you can learn about our product SQLS*Plus and its advantages.
 
SQLS*Plus - best SQL Server command line reporting and automation tool! SQLS*Plus is several orders of magnitude better than SQL Server sqlcmd and osql command line tools.
 

REQUEST COMPLIMENTARY SQLS*PLUS LICENCE

Enteros UpBeat offers a patented database performance management SaaS platform. It proactively identifies root causes of complex revenue-impacting database performance issues across a growing number of RDBMS, NoSQL, and deep/machine learning database platforms. We support Oracle, SQL Server, IBM DB2, MongoDB, Casandra, MySQL, Amazon Aurora, and other database systems.

SQL for data analysis

15 June 2020

If you want to become a data processing and analysis specialist, you cannot do without SQL. This language can be used not only by analysts or scientists, but also by anyone who can spend several days studying SQL queries:

SQL for data analysis

SQL for data analysis? What is it? What good does it do?

SQL stands for Structured Query Language, which translates to “the language of structured queries”.

But I like another description: it is Excel on steroids. This is what the same data set in Excel and SQL looks like:

 I like another description it is Excel on steroids

As we can see, SQL and Excel are quite similar. Both represent data as a two-dimensional table with columns and rows. Both are extremely structured and easy to understand.

But there are also some differences between them:

  • The first is the performance. Excel works great with small amounts of data, but when you have 100,000 rows, using formulas becomes a long and inefficient exercise.
  • The second is the way you interact with the database. Excel is primarily a graphical user interface (GUI). It allows you to scroll through a table, enter formulas, and cursor the necessary areas. There is no GUI in SQL – so-called SQL queries are used here.

There is no GUI in SQL - so-called SQL queries are used here

SUM in Excel

SUM in SQLSUM in SQL

When you master SQL queries for beginners, you will understand that working with it is more transparent and efficient than with Excel. In addition, it is much more convenient to automate your work in SQL and use the previous developments.

When analyzing data you will use SQL for quite simple tasks: summing up, combining arrays of data, simple statistical and mathematical methods. But with SQL your work will be more efficient, and you will be able to use larger arrays of data than before.

And what about Python, R or bash?

When you start using these languages to analyze data, you will realize that Python and R are good for one and SQL good for another. The main difference between these languages is in syntax, “functions” and performance. Now let’s move on to the practical part!

Step 0 – install the SQL database management system.

I will use postgreSQL. There are other types of SQL DBMS. But they are all similar, so if you learn postgreSQL, it will only take a few hours (or even minutes) to adapt to another language.

At this point, you should have three things:

  • A data server with access via terminal or iTerm;
  • PostgreSQL installed on your server;
  • Installed on your Pgadmin4 (or SQL Workbench) computer.

Note: I will use SQL Workbench as a tool for working with SQL.

Step 1 – Go to your SQL database via command line

First, you need to access the database from the command line. I will show you how to do this and in the future you will need to do the same:

  • Open the Terminal (or iTerm):
  • Connect to the server via SSH.

In my case, I will enter:

ssh tomi@[my_ip_address]

Go to your SQL database via command line

By logging on to the server, I access the postgreSQL database. Since I have already allowed access to my user, I enter the following command:

psql -U tomi -d postgres

psql is the command itself, -U specifies the username (in my case it’s “tomi”) and -d specifies the database name (in my case postgres, like yours). The command line invitation should change to the following:

postgres=>

The command line invitation should change to the following

Here we go! Now you have full access to your SQL database.

You can enter an SQL query to check it:

dt

A list of all data tables will be displayed. There’s only one so far. But that will change soon:

A list of all data tables will be displayed

Note: SQL databases are also called “relational databases”.

Step 2 – Downloading data

In this section we will work with a small data set called zoo. You can download it from here in the text format .tsv.

But let’s continue with the SQL query example:

1) Create a table:

CREATE TABLE zoo (
animal varchar(10),
uniq_id integer PRIMARY KEY,
water_needed integer
);

SQL query example

2) Check that the table has been accurately created:

dt

We have a new data table: zoo!

We have a new data table: zoo!

3) Downloading data!

Copy the request:

INSERT INTO zoo (animal,uniq_id,water_need) VALUES
('elephant', 1001,500),
('elephant', 1002,600),
('elephant', 1003,550),
('tiger', 1004,300),
('tiger', 1005, 320),
('tiger', 1006, 330),
('tiger', 1007, 290),
('tiger', 1008, 310),
('zebra', 1009,200),
('zebra',1010,220),
('zebra', 1011, 240),
('zebra', 1012, 230),
('zebra',1013,220),
('zebra', 1014,100),
('zebra', 1015, 80),
('lion', 1016, 420),
('lion', 1017,600),
('lion', 1018,500),
('lion', 1019, 390),
('kangaroo', 1020,410),
('kangaroo',1021,430),
('kangaroo',1022,410);

If all goes well, you will receive the following message:

INSERT 0 22

you will receive the following message

The most important SQL statement: SELECT

It’s time to study the main SQL query language operator. Here it is:

SELECT * FROM table_name;

We will use SELECT every time we need to read, filter, convert and sum up data. First of all, we will select (the English word select translates to “select”) everything in the zoo table.

SELECT * FROM zoo;

The most important SQL statement: SELECT

The operator gave me back the whole table.

SELECT * FROM zoo;

  • SELECT is the main operator that tells SQL that we want to read data from the table.
  • * – indicates that we want to select all columns.
  • FROM tells SQL that we are going to specify a table to read the data.

zoo is the name of the table. You can replace it with any other name if you have more than one table.
; this is the syntax required for SQL. Each query must end with a semicolon. If you accidentally skip it, SQL will wait for the query to continue, and nothing will show up.

SQL will wait for the query to continue, and nothing will show up

We will use the above expression regularly when working with complex SQL queries.

Selecting columns

You can replace the * sign with the column names. Let us consider the following query as an example:

SELECT animal, water_need FROM zoo;

Selecting columns

We get exactly what we expected: there are “animal” and “water_need” columns on the screen, but “uniq_id” is gone.

In the same way, you can select any column, just specify their names separated by a comma. You can do something like that as well:

SELECT animal, animal, animal FROM zoo;

The same column will be repeated several times… But since it makes little sense, it’s not worth doing.

Show the first few lines of data – LIMIT instruction

At the moment we are working with a data table consisting of 22 rows. To limit the selection to the first few rows, we will use the LIMIT instruction – a small “extension” to the base query:

SELECT * FROM zoo LIMIT 10;

Only the first 10 lines will be displayed on the screen.

Only the first 10 lines will be displayed on the screen

Line filtering – WHERE manual

Using the WHERE manual, you can select specific rows based on their values. SQL query example:

SELECT * FROM zoo WHERE animal = 'elephant';

Line filtering - WHERE manual

  • SELECT * FROM zoo -“ is “base request”.
  • WHERE” says SQL that you want to filter something out.
  • animal = ‘elephant’ -“ animal is the name of the column where we look for the desired value, and elephant is the value itself. In SQL you should always add the column where we are looking for the right value.
  • ;” Do not forget about the semicolon!

Self-test #1

This is an introductory article, so the first task will also be quite simple:

Select the first three zebras from the zoo table.

This task will essentially be the result of today’s article.

Are you ready?

Here is my solution:

SELECT * FROM zoo WHERE animal = 'zebra' LIMIT 3;

You have to admit, SQL is easy

And one more thing: the syntax…
All SQL queries must end with a semicolon (;). If you accidentally miss it, SQL will assume that the query is not complete and nothing will be displayed. For example:

postgres=> SELECT * FROM zoo
postgres->

That’s not very… That’s better:

postgres=> SELECT * FROM zoo;

The SQL language is case insensitive when it comes to keywords (SELECT, WHERE, LIMIT, etc.). For example:

SELECT * FROM zoo;

works in the same way as

select * from zoo;

Sensitivity to the case of table names, columns and their values depends on the settings. In our situation (when using postgreSQL), table and column names are case insensitive, unlike field values. For example:

SELECT * FROM zoo WHERE = 'elephant'; -" WORKING
SELECT * FROM ZOO WHERE ANIMAL = 'elephant'; -" WORKS.
SELECT * FROM ZOO WHERE ANIMAL = 'ELEPHANT'; -" DOES NOT WORK.

Note that most people in the SQL query language write keywords in capital letters (SELECT, WHERE, LIMIT, etc…), which improves code readability.

Conclusion
We have taken the first step towards using SQL for data analysis! Congratulations! Now you can write basic queries. But this is just the beginning!

SQL & Data Analytics for Beginners: Introduction

 
Tags: , ,

MORE NEWS

 

Preamble​​NoSql is not a replacement for SQL databases but is a valid alternative for many situations where standard SQL is not the best approach for...

Preamble​​MongoDB Conditional operators specify a condition to which the value of the document field shall correspond.Comparison Query Operators $eq...

5 Database management trends impacting database administrationIn the realm of database management systems, moreover half (52%) of your competitors feel...

The data type is defined as the type of data that any column or variable can store in MS SQL Server. What is the data type? When you create any table or...

Preamble​​MS SQL Server is a client-server architecture. MS SQL Server process starts with the client application sending a query.SQL Server accepts,...

First the basics: what is the master/slave?One database server (“master”) responds and can do anything. A lot of other database servers store copies of all...

Preamble​​Atom Hopper (based on Apache Abdera) for those who may not know is an open-source project sponsored by Rackspace. Today we will figure out how to...

Preamble​​MongoDB recently introduced its new aggregation structure. This structure provides a simpler solution for calculating aggregated values rather...

FlexibilityOne of the most advertised features of MongoDB is its flexibility.  Flexibility, however, is a double-edged sword. More flexibility means more...

Preamble​​SQLShell is a cross-platform command-line tool for SQL, similar to psql for PostgreSQL or MySQL command-line tool for MySQL.Why use it?If you...

Preamble​​Writing an application on top of the framework on top of the driver on top of the database is a bit like a game on the phone: you say “insert...

Preamble​​Oracle Coherence is a distributed cache that is functionally comparable with Memcached. In addition to the basic function of the API cache, it...

Preamble​​IBM pureXML, a proprietary XML database built on a relational mechanism (designed for puns) that offers both relational ( SQL / XML ) and...

  What is PostgreSQL array? In PostgreSQL we can define a column as an array of valid data types. The data type can be built-in, custom or enumerated....

Preamble​​If you are a Linux sysadmin or developer, there comes a time when you need to manage an Oracle database that can work in your environment.In this...

Preamble​​Starting with Microsoft SQL Server 2008, by default, the group of local administrators is no longer added to SQL Server administrators during the...