Pixy: An Open-Source Vulnerability Scanner for PHP Applications

January 9, 2010 · Leave a Comment
Filed under: Databases, DB Tool, Featured, MS SQL Server, MySql, Oracle, PHP, Postgres, SQL 

An Open-Source Vulnerability Scanner for PHP Applications.

The Secure Systems Lab at the Technical University of Vienna has released the newest version of Pixy, an open-source vulnerability scanner. Here are some of the highlights:

  1. Detection of SQL injection and XSS vulnerabilities in PHP source code
  2. Automatic resolution of file inclusions
  3. Computation of dependence graphs that help you understand the causes of reported vulnerabilities
  4. Static analysis engine (flow-sensitive, interprocedural, context-sensitive)
  5. Platform-independent (written in Java)

http://pixybox.seclab.tuwien.ac.at




List SQL injection tools for exploiting

January 9, 2010 · Leave a Comment
Filed under: Databases, DB Tool, Featured, MS SQL Server, MySql, Oracle, Postgres, SQL 

A list of SQL injection tools for exploitation:

sqlmap

SQLInjector

Bobcat

Automagic

HacmeBank

Absinthe

Many different SQL injection variations

January 9, 2010 · Leave a Comment
Filed under: Databases, Featured, MS SQL Server, MySql, Oracle, SQL 

SQL injection variations from my collection:

admin'--

' or 0=0 --

" or 0=0 --

or 0=0 --

' or 0=0 #

" or 0=0 #

or 0=0 #

' or 'x'='x

" or "x"="x

') or ('x'='x

' or 1=1--

" or 1=1--

or 1=1--

' or a=a--

" or "a"="a

') or ('a'='a

") or ("a"="a

hi" or "a"="a

hi" or 1=1 --

hi' or 1=1 --

hi' or 'a'='a

hi') or ('a'='a

hi") or ("a"="a

How to retrieve stored procedure return values from TableAdapter

If you’ve been wondering why you are not able to access stored procedure return values from TableAdapter, here’s the solution for you.

I will use the tbTasks table and the spInsertTask stored procedure to demonstrate the solution. You can see the definitions for both below.

CREATE TABLE dbo.tbTasks
(
    intID INT NOT NULL IDENTITY(1,1),
    strName VARCHAR(100) NOT NULL,
    intPriority INT NOT NULL,
    dtDueDate DATETIME NOT NULL
)

----------------------------------------

CREATE PROCEDURE dbo.spInsertTask
(
    @strName VARCHAR(200),
    @intPriority INT,
    @dtDueDate DATETIME
)
AS
INSERT INTO tbTasks (strName, intPriority, dtDueDate)
VALUES (@strName, @intPriority, @dtDueDate)
RETURN SCOPE_IDENTITY()
GO

Notice that tbTasks has an identity column named intID. Also, stored procedure spInsertTask returns the new identity column value using SCOPE_IDENTITY(). Knowing this new identity value is extremely useful on the client side.
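As an aside, the same pattern (insert a row, then read the new identity value back on the client) exists in most database APIs. Here is a minimal sketch using Python's built-in sqlite3 module and a simplified tbTasks; this is an illustration of the concept, not the article's SQL Server/ADO.NET stack:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Simplified tbTasks; INTEGER PRIMARY KEY plays the role of IDENTITY here.
con.execute("""
    CREATE TABLE tbTasks (
        intID       INTEGER PRIMARY KEY AUTOINCREMENT,
        strName     TEXT NOT NULL,
        intPriority INTEGER NOT NULL,
        dtDueDate   TEXT NOT NULL
    )
""")

cur = con.execute(
    "INSERT INTO tbTasks (strName, intPriority, dtDueDate) VALUES (?, ?, ?)",
    ("Test", 1, "2010-01-09"),
)
new_id = cur.lastrowid  # analogous to the value RETURN SCOPE_IDENTITY() sends back
print(new_id)  # 1 for the first row inserted
```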

Create a new Typed Dataset called TasksDataset and add tbTasks. Also, add a new query to tbTasksTableAdapter using the spInsertTask stored procedure. When adding the new query, choose the ‘A single value’ option.

At this point, you would probably expect the following code to assign the new identity value returned by the spInsertTask stored procedure to the returnValue variable.

[ VB ]

Dim taTasks As New TasksDatasetTableAdapters.tbTasksTableAdapter
Dim TaskName As String
Dim TaskPriority As Integer
Dim TaskDueDate As Date
Dim returnValue As Integer

TaskName = "Test"
TaskPriority = 1
TaskDueDate = Now()
returnValue = taTasks.InsertTask(TaskName, TaskPriority, TaskDueDate)

[ C# ]

TasksDatasetTableAdapters.tbTasksTableAdapter taCustomers = new WindowsApplication1.TasksDatasetTableAdapters.tbTasksTableAdapter();
String taskName;
int taskPriority;
DateTime taskDueDate;
int returnValue;

taskName = "Test";
taskPriority = 1;
taskDueDate = System.DateTime.Now;
returnValue = taCustomers.InsertTask(taskName, taskPriority, taskDueDate);

However, running the above code results in a System.InvalidOperationException at run time for VB, and a "Cannot implicitly convert type 'int?' to 'int'" compile error for C#. If you look at what actually gets returned by the tbTasksTableAdapter.InsertTask() function, you will understand why the above code does not work. You can find the function in the generated Typed Dataset code, TasksDataset.Designer.vb / TasksDataset.Designer.cs in this case.

[ VB ]

Public Overridable Overloads Function InsertTask(...) As System.Nullable(Of Integer)
    Dim returnValue As Object
    Try
        returnValue = command.ExecuteScalar
    Finally
    End Try
    If ((returnValue Is Nothing) _
        OrElse (returnValue.GetType Is GetType(System.DBNull))) Then
        Return New System.Nullable(Of Integer)
    Else
        Return New System.Nullable(Of Integer)(CType(returnValue, Integer))
    End If
End Function

* C# version omitted since there’s no significant difference.

As you can see from the above, what gets returned from the InsertTask function is actually the return value of System.Data.SqlClient.SqlCommand.ExecuteScalar(), which is the first column of the first row in the result set (or a null reference if the result set is empty), not the return value of the stored procedure. In this case, InsertTask returns null since the stored procedure does not return any result set.

If you choose ‘No value’ option, System.Data.SqlClient.SqlCommand.ExecuteNonQuery() is used instead. And the return value of ExecuteNonQuery() is the number of rows affected. Again, this is not the stored procedure return value.

So, how do you retrieve the stored procedure return value? Although it’s not immediately obvious, there’s an easy way to access it. Let’s look at the definition of the command object for the stored procedure. You can see it from tbTasksTableAdapter.InitCommandCollection() in TasksDataset.Designer.vb / TasksDataset.Designer.cs file.

[ VB ]

Private Sub InitCommandCollection()
    Me._commandCollection = New System.Data.SqlClient.SqlCommand(1) {}
    Me._commandCollection(0) = New System.Data.SqlClient.SqlCommand
    Me._commandCollection(0).Connection = Me.Connection
    Me._commandCollection(0).CommandText = "SELECT intID, strName, intPriority, dtDueDate FROM dbo.tbTasks"
    Me._commandCollection(0).CommandType = System.Data.CommandType.Text
    Me._commandCollection(1) = New System.Data.SqlClient.SqlCommand
    Me._commandCollection(1).Connection = Me.Connection
    Me._commandCollection(1).CommandText = "dbo.spInsertTask"
    Me._commandCollection(1).CommandType = System.Data.CommandType.StoredProcedure
    Me._commandCollection(1).Parameters.Add(New System.Data.SqlClient.SqlParameter("@RETURN_VALUE", System.Data.SqlDbType.Int, 4, System.Data.ParameterDirection.ReturnValue, 10, 0, Nothing, System.Data.DataRowVersion.Current, False, Nothing, "", "", ""))
    Me._commandCollection(1).Parameters.Add(New System.Data.SqlClient.SqlParameter("@strName", System.Data.SqlDbType.VarChar, 200, System.Data.ParameterDirection.Input, 0, 0, Nothing, System.Data.DataRowVersion.Current, False, Nothing, "", "", ""))
    Me._commandCollection(1).Parameters.Add(New System.Data.SqlClient.SqlParameter("@intPriority", System.Data.SqlDbType.Int, 4, System.Data.ParameterDirection.Input, 10, 0, Nothing, System.Data.DataRowVersion.Current, False, Nothing, "", "", ""))
    Me._commandCollection(1).Parameters.Add(New System.Data.SqlClient.SqlParameter("@dtDueDate", System.Data.SqlDbType.DateTime, 8, System.Data.ParameterDirection.Input, 23, 3, Nothing, System.Data.DataRowVersion.Current, False, Nothing, "", "", ""))
End Sub

* C# version omitted since there’s no significant difference.

You can see from the above that the parameters collection does actually include a @RETURN_VALUE parameter. It’s the first parameter in the collection. When the stored procedure is executed, the return value from the stored procedure is stored in this item in the collection. So, all we need to do is retrieve this value after executing the InsertTask() function. To do that, I will add some code to the partial class defined in TasksDataset.vb / TasksDataset.cs.

[ VB ]

Namespace TasksDatasetTableAdapters
    Partial Public Class tbTasksTableAdapter
        Public Function GetReturnValue(ByVal commandIndex As Integer) As Object
            Return Me.CommandCollection(commandIndex).Parameters(0).Value
        End Function
    End Class
End Namespace

[ C# ]

namespace WindowsApplication1.TasksDatasetTableAdapters {
    public partial class tbTasksTableAdapter
    {
        public object GetReturnValue(int commandIndex)
        {
            return this.CommandCollection[commandIndex].Parameters[0].Value;
        }
    }
}

Since Dataset Designer does not generate a partial class structure for TableAdapters, you will have to add the above code yourself to the partial class file. The commandIndex parameter is the index of the command object in _commandCollection from which to retrieve the return value. You can get that information by looking at tbTasksTableAdapter.InitCommandCollection(). Now, let’s modify the code that was not running to use this new function.

[ VB ]

Dim taTasks As New TasksDatasetTableAdapters.tbTasksTableAdapter
Dim TaskName As String
Dim TaskPriority As Integer
Dim TaskDueDate As Date
Dim returnValue As Integer

TaskName = "Test"
TaskPriority = 1
TaskDueDate = Now()
taTasks.InsertTask(TaskName, TaskPriority, TaskDueDate)
returnValue = taTasks.GetReturnValue(1)

[ C# ]

TasksDatasetTableAdapters.tbTasksTableAdapter taCustomers = new WindowsApplication1.TasksDatasetTableAdapters.tbTasksTableAdapter();
String taskName;
int taskPriority;
DateTime taskDueDate;
int returnValue;

taskName = "Test";
taskPriority = 1;
taskDueDate = System.DateTime.Now;
taCustomers.InsertTask(taskName, taskPriority, taskDueDate);
returnValue = (int)taCustomers.GetReturnValue(1);

We pass in 1 as the parameter value to GetReturnValue() since our stored procedure is located at index 1 in _commandCollection. The above code will correctly retrieve the return value from the stored procedure, which is the new identity value of the intID column. If you have more than one stored procedure that returns something, you can retrieve those return values by calling GetReturnValue() with the correct index.

Typed Dataset simplifies data access layer development significantly by generating the necessary code for you based on the information you provide via Dataset Designer. Although the generated code covers a large number of scenarios, I suggest that you take a close look at it and find out how you can extend the functionality of the default Typed Dataset. And definitely let us know how we can improve it to make Typed Dataset more powerful and flexible.

Also, don’t forget to let me know if you have better ways to retrieve return values from stored procedures. What I suggest here is just one solution and I am sure that you have other solutions that might be more elegant than this.

Young Joo

Sql Injection PHP MySql example

October 9, 2009 · Leave a Comment
Filed under: Databases, Featured, MySql, MySql 5.0, MySQL 5.1, PHP, SQL 

What is SQL Injection

SQL injection refers to the act of someone inserting a MySQL statement to be run on your database without your knowledge. Injection usually occurs when you ask a user for input, like their name, and instead of a name they give you a MySQL statement that you will unknowingly run on your database.

SQL Injection Example

Below is a sample string that has been gathered from a normal user and a bad user trying to use SQL Injection. We asked the users for their login, which will be used to run a SELECT statement to get their information.

MySQL & PHP Code:

// a good user's name
$name = "timmy";
$query = "SELECT * FROM customers WHERE username = '$name'";
echo "Normal: " . $query . "<br />";

// user input that uses SQL Injection
$name_bad = "' OR 1'"; 

// our MySQL query builder, however, not a very safe one
$query_bad = "SELECT * FROM customers WHERE username = '$name_bad'";

// display what the new query will look like, with injection
echo "Injection: " . $query_bad;

Display:

Normal: SELECT * FROM customers WHERE username = 'timmy'
Injection: SELECT * FROM customers WHERE username = '' OR 1''

The normal query is no problem, as our MySQL statement will just select everything from customers that has a username equal to timmy.

However, the injection attack has actually made our query behave differently than we intended. By using a single quote (') they have ended the string part of our MySQL query

  • username = ''

and then added on to our WHERE statement with an OR clause of 1 (always true).

  • username = '' OR 1

This OR clause of 1 will always be true and so every single entry in the “customers” table would be selected by this statement!
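You can watch this happen in a few lines of code. The sketch below uses Python's built-in sqlite3 instead of MySQL/PHP, and a slight variant of the payload (`' OR 1 --`, where `--` comments out the trailing quote), but the mechanics are the same:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (username TEXT)")
con.executemany("INSERT INTO customers VALUES (?)",
                [("timmy",), ("alice",), ("bob",)])

# Hypothetical malicious input: closes the string, adds OR 1, comments the rest.
name_bad = "' OR 1 --"
query_bad = "SELECT * FROM customers WHERE username = '%s'" % name_bad
# The WHERE clause is now:  username = '' OR 1 --'  ... which is always true.
rows = con.execute(query_bad).fetchall()
print(len(rows))  # 3 -- every customer is returned, not just one
```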

More Serious SQL Injection Attacks

Although the above example displayed a situation where an attacker could possibly get access to a lot of information they shouldn’t have, the attacks can be a lot worse. For example an attacker could empty out a table by executing a DELETE statement.

MySQL & PHP Code:

$name_evil = "'; DELETE FROM customers WHERE 1 or username = '"; 

// our MySQL query builder really should check for injection
$query_evil = "SELECT * FROM customers WHERE username = '$name_evil'";

// the new evil injection query would include a DELETE statement
echo "Injection: " . $query_evil;

Display:

SELECT * FROM customers WHERE username = ''; DELETE FROM customers WHERE 1 or username = ''

If you were to run this query, the injected DELETE statement would completely empty your “customers” table. Now that you know this is a problem, how can you prevent it?

Injection Prevention: mysql_real_escape_string()

Lucky for you, this problem has been known for a while and PHP has a specially-made function to prevent these attacks. All you need to do is use the mouthful of a function mysql_real_escape_string.

What mysql_real_escape_string does is take a string that is going to be used in a MySQL query and return the same string with all SQL injection attempts safely escaped. Basically, it will replace those troublesome quotes (') a user might enter with a MySQL-safe substitute, an escaped quote \'.

Let's try out this function on our two previous injection attacks and see how it works.

MySQL & PHP Code:

//NOTE: you must be connected to the database to use this function!
// connect to MySQL

$name_bad = "' OR 1'"; 

$name_bad = mysql_real_escape_string($name_bad);

$query_bad = "SELECT * FROM customers WHERE username = '$name_bad'";
echo "Escaped Bad Injection: <br />" . $query_bad . "<br />";

$name_evil = "'; DELETE FROM customers WHERE 1 or username = '"; 

$name_evil = mysql_real_escape_string($name_evil);

$query_evil = "SELECT * FROM customers WHERE username = '$name_evil'";
echo "Escaped Evil Injection: <br />" . $query_evil;

Display:

Escaped Bad Injection:
SELECT * FROM customers WHERE username = '\' OR 1\''
Escaped Evil Injection:
SELECT * FROM customers WHERE username = '\'; DELETE FROM customers WHERE 1 or username = \''

Notice that those evil quotes have been escaped with a backslash \, preventing the injection attack. Now all these queries will do is try to find a username that is just completely ridiculous:

  • Bad: \' OR 1\'
  • Evil: \'; DELETE FROM customers WHERE 1 or username = \'

And I don’t think we have to worry about those silly usernames getting access to our MySQL database. So please do use the handy mysql_real_escape_string() function to help prevent SQL Injection attacks on your websites. You have no excuse not to use it after reading this lesson!
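Escaping is one defense; an even more robust habit is to keep user input out of the SQL text entirely with parameterized queries (prepared statements, available in PHP via mysqli or PDO). Here is a sketch of the idea using Python's built-in sqlite3, which is an assumption for illustration rather than the article's MySQL/PHP stack:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (username TEXT)")
con.execute("INSERT INTO customers VALUES ('timmy')")

# The two malicious inputs from this article, now passed as *data* through a
# placeholder instead of being spliced into the SQL text.
payloads = ["' OR 1 --", "'; DELETE FROM customers WHERE 1 or username = '"]
results = []
for payload in payloads:
    rows = con.execute(
        "SELECT * FROM customers WHERE username = ?", (payload,)).fetchall()
    results.append(len(rows))

print(results)  # [0, 0] -- each payload only matches a literal username
```

The DELETE never runs and the quotes never terminate the string, because the driver treats the payload as a value, not as SQL.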

What Is SQL Injection? Understanding This Important Threat Vector

October 8, 2009 · 3 Comments
Filed under: Featured, MySql, MySql 5.0, MySQL 5.1, SQL 

In August 2009 three people were charged with the theft of 130 million credit card numbers from Heartland Payment Systems, 7-Eleven and a supermarket chain called Hannaford Brothers. It was reportedly the biggest case of identity theft in U.S. history. This theft was carried out using SQL injection techniques. What is SQL injection and why should you be bothered about it?

We all log into Web sites on a daily basis, be it for shopping, banking or accessing protected content. In most cases usernames are stored in a relational database along with other user details and related information.

The proliferation of databases created to support this growth in websites with protected content has led to many insecure and vulnerable sites, hastily put online with little thought to the underlying security requirements. This gaping hole in the security of Web sites is being aggressively exploited by SQL injection attacks.

Even more worrying, this attack can be carried out against fully patched databases. It is not a problem with patches, but rather an issue with the way databases work. This attack can be made against search pages, feedback forms, customer comment forms and any other Web site pages that rely on a database engine. All databases can be vulnerable to this attack, including MySQL, SQL Server and Oracle. Despite its name, SQL injection is not specific to Microsoft SQL Server.

Anatomy of a typical SQL injection attack

Probably the most vulnerable page for a SQL injection attack would be the one used to log in to a site. The database table that stores a user’s details would normally have at least two columns: one for the username and one for the password. The table would probably be called users or something very similar.

The application takes the username and password typed into the logon screen and builds a string of SQL to send to the database, so the database engine receives a line of SQL similar to this:

SELECT * FROM users WHERE username = 'Chris Date' AND password = 'userspassword'

This is a pretty standard SQL statement and would look more or less the same whichever database you were using to store the user’s data. Hackers are interested in SQL injection attacks as they can manipulate the database by sending it some duff data. Instead of giving a valid username they could type something like this:

SELECT * FROM users WHERE username = ' ' ' AND password = ' '

The hacker has supplied a ' character as the username and a blank as the password. The database, being well behaved, would parse this query but return an error message informing the user that they have made a mistake in the line of code they have just typed in.

Using this chink in the database security armour, the hacker will start to dig around a bit more. The error messages returned from the database will become increasingly helpful and begin to point out the structure of the underlying tables in the database. We can start to fool the database further by adding other strings to the username.

For example, if we type in ' or email='345 as our username it will be resolved by the query parser as:

SELECT * FROM users
WHERE username = ' ' or email='345' AND password = ' '

At this point the database may give an error if there is no such column in the users table as email. On the other hand it may not give an error message, suggesting that there is a column called email after all. All the hacker needs to do is guess the email address for a legitimate user. As most organisations have a standard email address structure it is very easy to get an employee name and then use that as your username in the SQL injection attack.

As you can see it is scarily easy to build up a picture of a database structure based on some simple fooling of the text entry screen and some intelligent guesses as to the likely structure of a database.

How can SQL injection be avoided?

It is possible to filter out single quotes, double quotes, slashes, backslashes, semi-colons and extended characters from user input strings, cookie values and parameters from a URL. In addition, numeric values can be converted into integers before being passed to the database.

Database administrators can take further measures in case they are still attacked. For example, the simple measure of deleting all of the extended stored procedures that are not needed, such as xp_sendmail and xp_cmdshell, can immediately start to secure the database.

SQL injection is not a new attack and is avoidable if you take some basic security measures—whatever database you use.

SQL Index Types

Index Types

In addition to an index being clustered or nonclustered, it can be configured in other ways:

  • Composite index: An index that contains more than one column. In both SQL Server 2005 and 2008, you can include up to 16 columns in an index, as long as the index doesn’t exceed the 900-byte limit. Both clustered and nonclustered indexes can be composite indexes.
  • Unique Index: An index that ensures the uniqueness of each value in the indexed column. If the index is a composite, the uniqueness is enforced across the columns as a whole, not on the individual columns. For example, if you were to create an index on the FirstName and LastName columns in a table, the names together must be unique, but the individual names can be duplicated.

A unique index is automatically created when you define a primary key or unique constraint:

    • Primary key: When you define a primary key constraint on one or more columns, SQL Server automatically creates a unique, clustered index if a clustered index does not already exist on the table or view. However, you can override the default behavior and define a unique, nonclustered index on the primary key.
    • Unique: When you define a unique constraint, SQL Server automatically creates a unique, nonclustered index. You can specify that a unique clustered index be created if a clustered index does not already exist on the table.

  • Covering index: A type of index that includes all the columns that are needed to process a particular query. For example, your query might retrieve the FirstName and LastName columns from a table, based on a value in the ContactID column. You can create a covering index that includes all three columns.

Index Design

As beneficial as indexes can be, they must be designed carefully. Because they can take up significant disk space, you don’t want to implement more indexes than necessary. In addition, indexes are automatically updated when the data rows themselves are updated, which can lead to additional overhead and can affect performance. As a result, index design should take into account a number of considerations.

Database

As mentioned above, indexes can enhance performance because they can provide a quick way for the query engine to find data. However, you must also take into account whether and how much you’re going to be inserting, updating, and deleting data. When you modify data, the indexes must also be modified to reflect the changed data, which can significantly affect performance. You should consider the following guidelines when planning your indexing strategy:

  • For tables that are heavily updated, use as few columns as possible in the index, and don’t over-index the tables.
  • If a table contains a lot of data but data modifications are low, use as many indexes as necessary to improve query performance. However, use indexes judiciously on small tables because the query engine might take longer to navigate the index than to perform a table scan.
  • For clustered indexes, try to keep the length of the indexed columns as short as possible. Ideally, try to implement your clustered indexes on unique columns that do not permit null values. This is why the primary key is often used for the table’s clustered index, although query considerations should also be taken into account when determining which columns should participate in the clustered index.
  • The uniqueness of values in a column affects index performance. In general, the more duplicate values you have in a column, the more poorly the index performs. On the other hand, the more unique each value, the better the performance. When possible, implement unique indexes.
  • For composite indexes, take into consideration the order of the columns in the index definition. Columns that will be used in comparison expressions in the WHERE clause (such as WHERE FirstName = 'Charlie') should be listed first. Subsequent columns should be listed based on the uniqueness of their values, with the most unique listed first.
  • You can also index computed columns if they meet certain requirements. For example, the expression used to generate the values must be deterministic (which means it always returns the same result for a specified set of inputs). For more details about indexing computed columns, see the topic “Creating Indexes on Computed Columns” in SQL Server Books Online.
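The guideline about column order in composite indexes is easy to see in a query planner. Below is a minimal sketch using Python's built-in sqlite3 (not SQL Server, so the planner output differs, and the contacts table and idx_name index are hypothetical names): a composite index on (FirstName, LastName) serves a lookup on the leading column, but not on the trailing column alone.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE contacts (ContactID INTEGER, FirstName TEXT, LastName TEXT)")
con.execute("CREATE INDEX idx_name ON contacts (FirstName, LastName)")

def plan(sql):
    # The fourth column of EXPLAIN QUERY PLAN output describes the strategy.
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

first = plan("SELECT * FROM contacts WHERE FirstName = 'Charlie'")
last_only = plan("SELECT * FROM contacts WHERE LastName = 'Brown'")

print(first)      # mentions idx_name: the leading column can use the index
print(last_only)  # a full table scan: the trailing column alone cannot
```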

Queries

Another consideration when setting up indexes is how the database will be queried. As mentioned above, you must take into account the frequency of data modifications. In addition, you should consider the following guidelines:

  • Try to insert or modify as many rows as possible in a single statement, rather than using multiple queries.
  • Create nonclustered indexes on columns used frequently in your statement’s predicates and join conditions.
  • Consider indexing columns used in exact-match queries.

 

SQL JOINS

February 13, 2009 · Leave a Comment
Filed under: Featured, MySql, MySQL 5.1, Oracle, SQL, SQL JOINS 

 

The SQL JOIN clause is used whenever we have to select data from 2 or more tables.

To be able to use SQL JOIN clause to extract data from 2 (or more) tables, we need a relationship between certain columns in these tables.

We are going to illustrate our SQL JOIN example with the following 2 tables:

Customers:

CustomerID | FirstName | LastName | Email                 | DOB        | Phone
---------- | --------- | -------- | --------------------- | ---------- | ------------
1          | John      | Smith    | John.Smith@yahoo.com  | 2/4/1968   | 626 222-2222
2          | Steven    | Goldfish | goldfish@fishhere.net | 4/4/1974   | 323 455-4545
3          | Paula     | Brown    | pb@herowndomain.org   | 5/24/1978  | 416 323-3232
4          | James     | Smith    | jim@supergig.co.uk    | 20/10/1980 | 416 323-8888

Sales:

CustomerID | Date      | SaleAmount
---------- | --------- | ----------
2          | 5/6/2004  | $100.22
1          | 5/7/2004  | $99.95
3          | 5/7/2004  | $122.95
3          | 5/13/2004 | $100.00
4          | 5/22/2004 | $555.55

As you can see, those 2 tables have a common field called CustomerID, and thanks to that we can extract information from both tables by matching their CustomerID columns.

Consider the following SQL statement:

SELECT Customers.FirstName, Customers.LastName, SUM(Sales.SaleAmount) AS SalesPerCustomer
FROM Customers, Sales
WHERE Customers.CustomerID = Sales.CustomerID
GROUP BY Customers.FirstName, Customers.LastName

The SQL expression above will select all distinct customers (their first and last names) and the total respective amount of dollars they have spent.
The SQL JOIN condition has been specified after the SQL WHERE clause and says that the 2 tables have to be matched by their respective CustomerID columns.

Here is the result of this SQL statement:

FirstName | LastName | SalesPerCustomer
--------- | -------- | ----------------
John      | Smith    | $99.95
Steven    | Goldfish | $100.22
Paula     | Brown    | $222.95
James     | Smith    | $555.55

The SQL statement above can be re-written using the SQL JOIN clause like this:

SELECT Customers.FirstName, Customers.LastName, SUM(Sales.SaleAmount) AS SalesPerCustomer
FROM Customers JOIN Sales
ON Customers.CustomerID = Sales.CustomerID
GROUP BY Customers.FirstName, Customers.LastName

There are 2 types of SQL JOINS – INNER JOINS and OUTER JOINS. If you don’t put INNER or OUTER keywords in front of the SQL JOIN keyword, then INNER JOIN is used. In short “INNER JOIN” = “JOIN” (note that different databases have different syntax for their JOIN clauses).

The INNER JOIN will select all rows from both tables as long as there is a match between the columns we are matching on. In case we have a customer in the Customers table, which still hasn’t made any orders (there are no entries for this customer in the Sales table), this customer will not be listed in the result of our SQL query above.

If the Sales table has the following rows:

CustomerID | Date     | SaleAmount
---------- | -------- | ----------
2          | 5/6/2004 | $100.22
1          | 5/6/2004 | $99.95

And we use the same SQL JOIN statement from above:

SELECT Customers.FirstName, Customers.LastName, SUM(Sales.SaleAmount) AS SalesPerCustomer
FROM Customers JOIN Sales
ON Customers.CustomerID = Sales.CustomerID
GROUP BY Customers.FirstName, Customers.LastName

We’ll get the following result:

FirstName | LastName | SalesPerCustomer
--------- | -------- | ----------------
John      | Smith    | $99.95
Steven    | Goldfish | $100.22

Even though Paula and James are listed as customers in the Customers table they won’t be displayed because they haven’t purchased anything yet.

But what if you want to display all the customers and their sales, no matter if they have ordered something or not? We’ll do that with the help of SQL OUTER JOIN clause.

The second type of SQL JOIN is called SQL OUTER JOIN and it has 2 sub-types called LEFT OUTER JOIN and RIGHT OUTER JOIN.

The LEFT OUTER JOIN or simply LEFT JOIN (you can omit the OUTER keyword in most databases), selects all the rows from the first table listed after the FROM clause, no matter if they have matches in the second table.

If we slightly modify our last SQL statement to:

SELECT Customers.FirstName, Customers.LastName, SUM(Sales.SaleAmount) AS SalesPerCustomer
FROM Customers LEFT JOIN Sales
ON Customers.CustomerID = Sales.CustomerID
GROUP BY Customers.FirstName, Customers.LastName

and the Sales table still has the following rows:

CustomerID | Date     | SaleAmount
---------- | -------- | ----------
2          | 5/6/2004 | $100.22
1          | 5/6/2004 | $99.95

The result will be the following:

FirstName | LastName | SalesPerCustomer
--------- | -------- | ----------------
John      | Smith    | $99.95
Steven    | Goldfish | $100.22
Paula     | Brown    | NULL
James     | Smith    | NULL

As you can see we have selected everything from the Customers (first table). For all rows from Customers, which don’t have a match in the Sales (second table), the SalesPerCustomer column has amount NULL (NULL means a column contains nothing).

The RIGHT OUTER JOIN or just RIGHT JOIN behaves exactly as SQL LEFT JOIN, except that it returns all rows from the second table (the right table in our SQL JOIN statement).
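To make the INNER vs. LEFT JOIN difference concrete, here is a small sketch using Python's built-in sqlite3 with the Customers data and the two-row Sales data from the example above (sqlite3 is an assumption for illustration; the SQL itself matches the article's queries):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER, FirstName TEXT, LastName TEXT);
    CREATE TABLE Sales (CustomerID INTEGER, SaleAmount REAL);
    INSERT INTO Customers VALUES (1, 'John', 'Smith'), (2, 'Steven', 'Goldfish'),
                                 (3, 'Paula', 'Brown'), (4, 'James', 'Smith');
    -- the two-row Sales table from the example
    INSERT INTO Sales VALUES (2, 100.22), (1, 99.95);
""")

inner = con.execute("""
    SELECT c.FirstName, c.LastName, SUM(s.SaleAmount) AS SalesPerCustomer
    FROM Customers c JOIN Sales s ON c.CustomerID = s.CustomerID
    GROUP BY c.FirstName, c.LastName
""").fetchall()

outer = con.execute("""
    SELECT c.FirstName, c.LastName, SUM(s.SaleAmount) AS SalesPerCustomer
    FROM Customers c LEFT JOIN Sales s ON c.CustomerID = s.CustomerID
    GROUP BY c.FirstName, c.LastName
""").fetchall()

print(len(inner))  # 2 -- Paula and James are dropped by the INNER JOIN
print(len(outer))  # 4 -- the LEFT JOIN keeps them, with NULL totals
```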

Further reading: http://www.devshed.com/c/a/MySQL/Understanding-SQL-Joins/

Three easy ways to optimize your MySQL queries

December 15, 2008 · Leave a Comment
Filed under: Featured, MySql, MySql 5.0, MySQL 5.1, Query, SQL 

Any database programmer will tell you that in high-traffic database-driven applications, a single badly-designed SQL query can significantly impact the overall performance of your application. Not only does such a query consume more database time than it ideally should, but it can have an exponential effect on the performance of other application components.

Optimizing query performance is as much a black art as a science, as heavily dependent on the developer’s intuition as on hard statistical performance data. Fortunately, databases like MySQL come with some tools to aid the process, and this article discusses three of them briefly: using indexes, analyzing queries with EXPLAIN, and adjusting MySQL’s internal configuration.

#1: Using indexes

MySQL allows you to index database tables, making it possible to quickly seek to records without performing a full table scan first and thus significantly speeding up query execution. You can have up to 16 indexes per table, and MySQL also supports multi-column indexes and full-text search indexes.

Adding an index to a table is as simple as calling the CREATE INDEX command and specifying the field(s) to index. Listing A shows you an example:

Listing A


mysql> CREATE INDEX idx_username ON users(username);
Query OK, 1 row affected (0.15 sec)
Records: 1 Duplicates: 0 Warnings: 0

Here, indexing the username field of the users table ensures that SELECT queries which reference this field in their WHERE or HAVING clause will run a little faster than in the pre-indexed state. You can check that the index was created (Listing B) with the SHOW INDEX command:

Listing B


mysql> SHOW INDEX FROM users;
+-------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| users | 1          | idx_username | 1            | username    | A         | NULL        | NULL     | NULL   | YES  | BTREE      |         |
+-------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
1 row in set (0.00 sec)

It’s important to note that indexes are a double-edged sword. Indexing every field of a table is usually unnecessary, and is quite likely to slow things down significantly when inserting or updating data because of the additional work MySQL has to do to rebuild the index each time. On the other hand, avoiding indexes altogether isn’t such a great idea either, because while this will speed up INSERTs, it will cause SELECT operations to slow down. There is thus always a trade-off to be made, and it’s wise to consider what the primary function of the table will be (data retrieval or data edit) when designing the indexing system.
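A quick way to see the before/after effect of an index like idx_username is to ask the planner for its plan. The sketch below uses Python's built-in sqlite3 and its EXPLAIN QUERY PLAN, a rough analogue of MySQL's EXPLAIN, as an illustration rather than MySQL itself:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (username TEXT)")

def plan():
    # The fourth column of EXPLAIN QUERY PLAN output describes the strategy.
    rows = con.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM users WHERE username = 'timmy'")
    return " ".join(r[3] for r in rows)

before = plan()  # a full table scan, e.g. "SCAN users"
con.execute("CREATE INDEX idx_username ON users(username)")
after = plan()   # an index lookup mentioning idx_username

print(before)
print(after)
```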

#2: Analyzing queries with EXPLAIN

When analyzing query performance, it’s also useful to consider the EXPLAIN keyword. This keyword, when placed in front of a SELECT query, describes how MySQL intends to execute the query and the number of rows it will need to process to successfully deliver a result set. To illustrate, consider the following simple example (Listing C):

Listing C


mysql> EXPLAIN SELECT city.name, city.district FROM city, country WHERE city.countrycode = country.code AND country.code = 'IND';
+----+-------------+---------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table   | type  | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+-------------+---------+-------+---------------+---------+---------+-------+------+-------------+
|  1 | SIMPLE      | country | const | PRIMARY       | PRIMARY | 3       | const |    1 | Using index |
|  1 | SIMPLE      | city    | ALL   | NULL          | NULL    | NULL    | NULL  | 4079 | Using where |
+----+-------------+---------+-------+---------------+---------+---------+-------+------+-------------+
2 rows in set (0.00 sec)

Here, the query is structured as a join between two tables, and the EXPLAIN output describes how MySQL will process that join. It shows that the current design requires MySQL to process only one record in the country table (which is indexed) but all 4079 records in the city table (which isn’t). This suggests an obvious optimization: adding an index to the city table, as follows (Listing D):

Listing D


mysql> CREATE INDEX idx_ccode ON city(countrycode);
Query OK, 4079 rows affected (0.15 sec)
Records: 4079 Duplicates: 0 Warnings: 0

And now, when you re-run the query with EXPLAIN, you’ll see a noticeable improvement (Listing E):

Listing E


mysql> EXPLAIN SELECT city.name, city.district FROM city, country WHERE city.countrycode = country.code AND country.code = 'IND';
+----+-------------+---------+-------+---------------+-----------+---------+-------+------+-------------+
| id | select_type | table   | type  | possible_keys | key       | key_len | ref   | rows | Extra       |
+----+-------------+---------+-------+---------------+-----------+---------+-------+------+-------------+
|  1 | SIMPLE      | country | const | PRIMARY       | PRIMARY   | 3       | const |    1 | Using index |
|  1 | SIMPLE      | city    | ref   | idx_ccode     | idx_ccode | 3       | const |  333 | Using where |
+----+-------------+---------+-------+---------------+-----------+---------+-------+------+-------------+
2 rows in set (0.01 sec)

As this illustrates, MySQL now only needs to scan 333 records in the city table to produce a result set — a reduction of almost 90 percent! Naturally, this translates into faster query execution time and more efficient usage of database resources.

#3: Adjusting internal variables

MySQL exposes many internal configuration variables, so it’s fairly easy to fine-tune its default settings for greater performance and stability. Some of the key variables worth examining are listed below.

  • Altering Index Buffer Size (key_buffer_size)
    This variable controls the size of the buffer used when handling table indices (both read and write operations). The MySQL manual recommends that this variable be increased “to as much as you can afford” to ensure you get the best performance on indexed tables, and suggests a value equivalent to about 25 percent of total system memory. This is one of the more important MySQL configuration variables, and if you’re interested in improving performance, trying different values for key_buffer_size is a good place to start.
  • Altering Table Buffer Size (read_buffer_size)
    When a query requires a table to be scanned sequentially, MySQL allocates a memory buffer to this query. The read_buffer_size variable controls the size of this buffer. If you find that sequential scans are proceeding slowly, you can improve performance by increasing this value, and hence the size of the memory buffer.
  • Setting The Number Of Maximum Open Tables (table_cache)
    This variable controls the maximum number of tables MySQL can have open at any one time, and thus affects the server’s ability to respond to incoming requests. It is closely related to the max_connections variable — increasing this value allows MySQL to keep a larger number of tables open, just as increasing max_connections increases the number of allowed connections. Consider raising this value if you have a high-volume server that receives queries against many different databases and tables.
  • Deciding A Time Limit For Long Queries (long_query_time)
    MySQL comes with a so-called “slow query log”, which automatically logs all queries that do not complete within a particular time limit. This log is useful for tracking down inefficient or misbehaving queries and identifying targets for optimization. The long_query_time variable sets this time limit, in seconds.
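All of the variables above can be set in MySQL’s configuration file (my.cnf on Unix systems). The fragment below is only a sketch — the values and the log path are illustrative assumptions, not recommendations, and should be tuned to your own hardware and workload:

```ini
# Illustrative [mysqld] settings -- values are assumptions, tune for your server
[mysqld]
key_buffer_size  = 256M                      # index buffer; manual suggests ~25% of RAM
read_buffer_size = 1M                        # per-connection sequential-scan buffer
table_cache      = 512                       # maximum number of concurrently open tables
long_query_time  = 2                         # log queries slower than 2 seconds
log_slow_queries = /var/log/mysql/slow.log   # hypothetical slow-query log path
```

After editing the file, restart the MySQL server for the new values to take effect; you can confirm them at the console with SHOW VARIABLES.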

The previous discussion should give you some insight into three tools you can use to analyze and optimize your SQL queries, and help you squeeze better performance out of your application. Go on and try them out — and happy optimizing!