Dear readers of our blog, we'd like to recommend you to visit the main page of our website, where you can learn about our product SQLS*Plus and its advantages.
 
SQLS*Plus - best SQL Server command line reporting and automation tool! SQLS*Plus is several orders of magnitude better than SQL Server sqlcmd and osql command line tools.
 

REQUEST COMPLIMENTARY SQLS*PLUS LICENCE

Enteros UpBeat offers a patented database performance management SaaS platform. It proactively identifies root causes of complex revenue-impacting database performance issues across a growing number of RDBMS, NoSQL, and deep/machine learning database platforms. We support Oracle, SQL Server, IBM DB2, MongoDB, Casandra, MySQL, Amazon Aurora, and other database systems.

10 major SQL programming errors and how to avoid them

17 September 2020

10 major SQL programming errors and how to avoid them

SQL (Structured Query Language) is a simple language, but it is not as simple when you work with datasets that contain millions of records.

When you work with medium and large tables, writing the most efficient SQL statements is a priority. Before you start writing SQL statements for any platform – Oracle, SQL Server, MySQL, or any other – these are the 10 most common SQL programming errors you should avoid.

Forgotten primary keys

Each table requires a primary key for performance. If you don’t have a primary key in any table, your tables don’t meet standard SQL requirements and suffer from performance problems. Primary keys are automatically set as cluster indexes, which speeds up queries. They are always unique, so you can use an auto-incremental numeric value if you do not have another column in a table that meets the unique requirement.

Primary keys are your first step towards relational databases. They refer to the external keys in the relational tables. For example, if you have a table with a list of clients, the “CustomerId” column shall be unique for each client. This may be your column of the primary key. Your CustomerId value will then be placed in the “Order” table to link the two tables together. Always use the primary key in every table you create, regardless of its size.

Poorly managed data redundancy

Data redundancy is good for backups, but not for tabular data. Each table must contain a unique set of data that does not repeat the data elsewhere in the table. This is one of the most difficult ideas for a new SQL developer. It is very easy to forget the normalization rules and repeat data in several tables for convenience, but it is not necessary and represents a bad design of a table.

For example, suppose you have a client table that contains the client’s address. Since the address refers to the client, it is in the right place. Then you create an “Order” table and add the client address to the “Order” table. This type of data redundancy is poorly designed.

The Customer table and the “Order” table can communicate with each other using the primary and external keys. What happens if you forget to update the customer address in the order table? As a result, you now have two addresses for the customer, and you do not know which one is the exact one.

Bottom line: always store data in one place and use the relationship between primary and external keys to request data.

Avoid NOT IN or IN and use JOIN instead

NOT IN and IN operators are poorly optimized. They are convenient, but you can usually replace them with a simple JOIN operator. Take a look at a sample request.

SELECT *FROM Customer
WHERE NOT IN (SELECT CustomerId FROM Order)

The above application returns the data set of all customers who do not have an order. In this statement, the SQL database extracts all orders from the Order table and then filters the set of records based on the main external query in the Customer table. If you have millions of orders, this is an extremely slow query.

An alternative, more efficient option is as follows.

SELECT * FROM Customer c
LEFT JOIN Order o on c.CustomerId = o.CustomerId
WHERE o.CustomerId IS NULL

The LEFT JOIN operator returns the same data set as the previous operator, but it is much more optimized. It combines two tables by primary and external key, which increases the query speed and avoids NOT IN and IN sentences.

Forgotten NULL values and empty string values

Discussions between empty and empty lines between database administrators continued for decades. You can use NULL values if no values are available, or you can use actual literal values such as zero-length strings or 0 integer values.

What you use in the database must be the same for all tables, otherwise, queries may become chaotic. Remember that NULL values do not match, for example, a zero-length string, so your queries should take these values into account if there is no standard in the table design.

When you determine what you want to use, make sure that your queries take these values into account. For example, if you enable NULL for a username, you must execute a query using the NULL filter (NOT NULL or IS NULL) in your queries to include or exclude these entries.

The asterisk symbol in SELECT operators

Always define the columns that you want to return to your requests. This standard is for performance and security. Take the following request for example.

SELECT * FROM Customer

The request returns all customer values, including any confidential data you may have stored in the table. What if you keep a customer password or social security number on the table? We hope these values are encrypted, but even having a hashed value can help hackers. It is also a performance problem if you have dozens of columns in the table.

Instead of the above query, always define each column. The following statement is an example.

SELECT CustomerId, FirstName, LastName FROM Customer

The above operator defines each column and limits the size of the recordset and what the hacker can see in case of data hacking.

A cycle with too many cursors

Cursors, cyclic structures in SQL language, is the basis of database performance. They allow you to go through millions of records and run operators for each of them individually. While this may seem like an advantage, it can reduce database performance.

Circles are common in programming languages, but they are inefficient in SQL programming. Most database administrators reject SQL procedures with cursors embedded.

It is best to write the procedure differently to avoid a negative impact on database performance if possible. You can replace most cursors with a well-written SQL statement. If you cannot avoid this, the cursors should be saved for scheduled tasks that run during off-peak hours.

Cursors are used in query reports and data conversion jobs, so they cannot always be avoided. Simply limit them to as many as possible in the production databases that perform daily queries to your database.

Data inconsistencies in-field assignment procedures

When you declare the columns of a table, you must assign each column a data type. Make sure that this data type covers all the necessary values to be saved. Once you have defined the data type, you can only store this type of value in a column.

For example, you probably need a decimal precision of 2-3 points in the column that stores the total order value. If you assign this column as an integer, your database can only store integers without decimal values. What happens to decimal places depends on your database platform.

It may automatically crop the values or generate an error. Any alternative may cause a serious error in your application. Always keep in mind what you need to keep when developing your tables.

This includes writing queries – when you write your queries and pass parameter values to the stored procedure, the variable must be declared with the correct data type. Variables that do not represent the column data type will also generate errors or crop data in the process.

Logical OR and AND operations

It’s easy to miss a simple logical order when writing your requests. Operators AND and OR can change the data set significantly. You can avoid common SQL errors by using parentheses or by organizing your operators to represent the logic to be applied.

Let’s take a look at an SQL statement that mixes AND and OR.

SELECT CustomerId
FROM Customer
WHERE FirstName = 'Jack' AND LastName = 'Destroyer' OR CustomerId > 0

The purpose of the above statement is to obtain any customer with the name “Jack” and “Destroyer” and the customer ID is greater than zero.

However, since we have mixed the AND operator with OR, all records where the client ID is above zero are returned. We can overcome this logical obstacle by using parentheses. Let us add them to the above statement.

SELECT CustomerId
FROM Customer
WHERE (FirstName = 'Jack' OR LastName = 'Destroyer') AND CustomerId > 0

We changed the logic for this statement. Now the first set of brackets returns all entries where the client’s name is Jack or Destroyer. With this filter, we tell SQL to return only those values where CustomerId is greater than zero.

These types of logical statements must be well verified before they are released into production.

The subqueries should return a single record

Subqueries are not the optimal way to write SQL statements, but sometimes they cannot be avoided. When you use subqueries, they must always return one record, otherwise, your query will not be executed.

Let’s have a look at an example.

SELECT CustomerId,
(SELECT OrderId FROM Order o WHERE c.CustomerId = o.CustomerId)
FROM Customer c

In the above query, we get a list of customer IDs from the Customer table. Please note that we get the order identifier from the order table where the customer identifier matches. If there is only one order, this query works fine. However, if there is more than one order for a customer, the subquery returns more than one entry and the request is not executed. You can avoid this problem by using the “Top 1” operator.

Let’s change the request to the next one.

SELECT CustomerId,
(SELECT Top 1 OrderId FROM Order o WHERE c.CustomerId = o.CustomerId ORDER BY OrderDate)
FROM Customer c

In the above query, we extract only one record and organize the records by date. This request receives the first order placed by the customer.

JOIN to indexes

The table should always be well indexed and you can take advantage of these performance improvements by using the JOIN operator for the columns assigned to the index. In the previous examples, we used the primary and external keys of the table.

The column of the primary key is always an index, so you don’t have to worry about performance. However, the external key must also have an index.

Any JOIN operators that you use must have an index for the column. If you do not have an index, consider adding it to the table.

Conclusion

Relational databases are ideal for most internal procedures, but you need to create the right SQL statement and optimize tables and queries for maximum performance.

Avoid these ten SQL errors and you will be on your way to creating a fast and efficient database for any small, medium, or large online business.

 
Tags:

MORE NEWS

 

Preamble​​NoSql is not a replacement for SQL databases but is a valid alternative for many situations where standard SQL is not the best approach for...

Preamble​​MongoDB Conditional operators specify a condition to which the value of the document field shall correspond.Comparison Query Operators $eq...

5 Database management trends impacting database administrationIn the realm of database management systems, moreover half (52%) of your competitors feel...

The data type is defined as the type of data that any column or variable can store in MS SQL Server. What is the data type? When you create any table or...

Preamble​​MS SQL Server is a client-server architecture. MS SQL Server process starts with the client application sending a query.SQL Server accepts,...

First the basics: what is the master/slave?One database server (“master”) responds and can do anything. A lot of other database servers store copies of all...

Preamble​​Atom Hopper (based on Apache Abdera) for those who may not know is an open-source project sponsored by Rackspace. Today we will figure out how to...

Preamble​​MongoDB recently introduced its new aggregation structure. This structure provides a simpler solution for calculating aggregated values rather...

FlexibilityOne of the most advertised features of MongoDB is its flexibility.  Flexibility, however, is a double-edged sword. More flexibility means more...

Preamble​​SQLShell is a cross-platform command-line tool for SQL, similar to psql for PostgreSQL or MySQL command-line tool for MySQL.Why use it?If you...

Preamble​​Writing an application on top of the framework on top of the driver on top of the database is a bit like a game on the phone: you say “insert...

Preamble​​Oracle Coherence is a distributed cache that is functionally comparable with Memcached. In addition to the basic function of the API cache, it...

Preamble​​IBM pureXML, a proprietary XML database built on a relational mechanism (designed for puns) that offers both relational ( SQL / XML ) and...

  What is PostgreSQL array? In PostgreSQL we can define a column as an array of valid data types. The data type can be built-in, custom or enumerated....

Preamble​​If you are a Linux sysadmin or developer, there comes a time when you need to manage an Oracle database that can work in your environment.In this...

Preamble​​Starting with Microsoft SQL Server 2008, by default, the group of local administrators is no longer added to SQL Server administrators during the...