There’s a harsh reality web application developers need to face up to; we don’t do security very well. A report from WhiteHat Security last year reported “83% of websites have had a high, critical or urgent issue”. That is, quite simply, a staggeringly high number and it’s only once you start to delve into to depths of web security that you begin to understand just how easy it is to inadvertently produce vulnerable code.
Inevitably a large part of the problem is education. Oftentimes developers are simply either not aware of common security risks at all or they’re familiar with some of the terms but don’t understand the execution and consequently how to secure against them.
Of course none of this should come as a surprise when you consider only 18 percent of IT security budgets are dedicated to web application security yet in 86% of all attacks, a weakness in a web interface was exploited. Clearly there is an imbalance leaving the software layer of web applications vulnerable.
OWASP and the Top 10
Enter OWASP, the Open Web Application Security Project, a non-profit charitable organisation established with the express purpose of promoting secure web application design. OWASP has produced some excellent material over the years, not least of which is The Ten Most Critical Web Application Security Risks – or “Top 10” for short - whose users and adopters include a who’s who of big business.
The Top 10 is a fantastic resource for the purpose of identification and awareness of common security risks. However it’s abstracted slightly from the technology stack in that it doesn’t contain a lot of detail about the execution and required countermeasures at an implementation level. Of course this approach is entirely necessary when you consider the extensive range of programming languages potentially covered by the Top 10.
What I’ve been finding when directing .NET developers to the Top 10 is some confusion about how to comply at the coalface of development so I wanted to approach the Top 10 from the angle these people are coming from. Actually, .NET web applications are faring pretty well in the scheme of things. According to the WhiteHat Security Statistics Report released last week, the Microsoft stack had fewer exploits than the likes of PHP, Java and Perl. But it still had numerous compromised sites so there is obviously still work to be done.
Moving on, this is going to be a 10 part process. In each post I’m going to look at the security risk in detail, demonstrate – where possible – how it might be exploited in a .NET web application and then detail the countermeasures at a code level. Throughout these posts I’m going to draw as much information as possible out of the OWASP publication so each example ties back into an open standard.
Here’s what I’m going to cover:
Some secure coding fundamentals
Before I start getting into the Top 10 it’s worth making a few fundamentals clear. Firstly, don’t stop securing applications at just these 10 risks. There are potentially limitless exploit techniques out there and whilst I’m going to be talking a lot about the most common ones, this is not an exhaustive list. Indeed the OWASP Top 10 itself continues to evolve; the risks I’m going to be looking at are from the 2010 revision which differs in a few areas from the 2007 release.
Secondly, applications are often compromised by applying a series of these techniques so don’t get too focussed on any single vulnerability. Consider the potential to leverage an exploit by linking vulnerabilities. Also think about the social engineering aspects of software vulnerabilities, namely that software security doesn’t start and end at purely technical boundaries.
Thirdly, the practices I’m going to write about by no means immunise code from malicious activity. There are always new and innovative means of increasing sophistication being devised to circumvent defences. The Top 10 should be viewed as a means of minimising risk rather than eliminating it entirely.
Finally, start thinking very, very laterally and approach this series of posts with an open mind. Experienced software developers are often blissfully unaware of how many of today’s vulnerabilities are exploited and I’m the first to put my hand up and say I’ve been one of these and continue to learn new facts about application security on a daily basis. This really is a serious discipline within the software industry and should not be approached casually.
Worked examples
I’m going to provide worked examples of both exploitable and secure code wherever possible. For the sake of retaining focus on the security concepts, the examples are going to be succinct, direct and as basic as possible.
So here’s the disclaimer: don’t expect elegant code, this is going to be elemental stuff written with the sole intention of illustrating security concepts. I’m not even going to apply basic practices such as sorting SQL statements unless it illustrates a security concept. Don’t write your production ready code this way!
Defining injection
Let’s get started. I’m going to draw directly from the OWASP definition of injection:
Injection flaws, such as SQL, OS, and LDAP injection, occur when untrusted data is sent to an interpreter as part of a command or query. The attacker’s hostile data can trick the interpreter into executing unintended commands or accessing unauthorized data.
The crux of the injection risk centres on the term “untrusted”. We’re going to see this word a lot over coming posts so let’s clearly define it now:
Untrusted data comes from any source – either direct or indirect – where integrity is not verifiable and intent may be malicious. This includes manual user input such as form data, implicit user input such as request headers and constructed user input such as query string variables. Consider the application to be a black box and any data entering it to be untrusted.
OWASP also includes a matrix describing the source, the exploit and the impact to business:
Threat Agents | Attack Vectors | Security Weakness | Technical Impacts | Business Impact | |
Exploitability EASY | Prevalence COMMON | Detectability AVERAGE | Impact SEVERE | ||
Consider anyone who can send untrusted data to the system, including external users, internal users, and administrators. | Attacker sends simple text-based attacks that exploit the syntax of the targeted interpreter. Almost any source of data can be an injection vector, including internal sources. | Injection flaws occur when an application sends untrusted data to an interpreter. Injection flaws are very prevalent, particularly in legacy code, often found in SQL queries, LDAP queries, XPath queries, OS commands, program arguments, etc. Injection flaws are easy to discover when examining code, but more difficult via testing. Scanners and fuzzers can help attackers find them. | Injection can result in data loss or corruption, lack of accountability, or denial of access. Injection can sometimes lead to complete host takeover. | Consider the business value of the affected data and the platform running the interpreter. All data could be stolen, modified, or deleted. Could your reputation be harmed? |
Most of you are probably familiar with the concept (or at least the term) of SQL injection but the injection risk is broader than just SQL and indeed broader than relational databases. As the weakness above explains, injection flaws can be present in technologies like LDAP or theoretically in any platform which that constructs queries from untrusted data.
Anatomy of a SQL injection attack
Let’s jump straight into how the injection flaw surfaces itself in code. We’ll look specifically at SQL injection because it means working in an environment familiar to most .NET developers and it’s also a very prevalent technology for the exploit. In the SQL context, the exploit needs to trick SQL Server into executing an unintended query constructed with untrusted data.
For the sake of simplicity and illustration, let’s assume we’re going to construct a SQL statement in C# using a parameter passed in a query string and bind the output to a grid view. In this case it’s the good old Northwind database driving a product page filtered by the beverages category which happens to be category ID 1. The web application has a link directly to the page where the CategoryID parameter is passed through in a query string. Here’s a snapshot of what the Products and Customers (we’ll get to this one) tables look like:
Here’s what the code is doing:
var catID = Request.QueryString["CategoryID"];
var sqlString = "SELECT * FROM Products WHERE CategoryID = " + catID; var connString = WebConfigurationManager.ConnectionStrings ["NorthwindConnectionString"].ConnectionString; using (var conn = new SqlConnection(connString)) { var command = new SqlCommand(sqlString, conn); command.Connection.Open(); grdProducts.DataSource = command.ExecuteReader(); grdProducts.DataBind(); }
And here’s what we’d normally expect to see in the browser:
In this scenario, the CategoryID query string is untrusted data. We assume it is properly formed and we assume it represents a valid category and we consequently assume the requested URL and the sqlString variable end up looking exactly like this (I’m going to highlight the untrusted data in red and show it both in the context of the requested URL and subsequent SQL statement):
Products.aspx?CategoryID=1
SELECT * FROM Products WHERE CategoryID = 1
Of course much has been said about assumption. The problem with the construction of this code is that by manipulating the query string value we can arbitrarily manipulate the command executed against the database. For example:
Products.aspx?CategoryID=1 or 1=1
SELECT * FROM Products WHERE CategoryID = 1 or 1=1
Obviously 1=1 always evaluates to true so the filter by category is entirely invalidated. Rather than displaying only beverages we’re now displaying products from all categories. This is interesting, but not particularly invasive so let’s push on a bit:
Products.aspx?CategoryID=1 or name=''
SELECT * FROM Products WHERE CategoryID = 1 or name=''
When this statement runs against the Northwind database it’s going to fail as the Products table has no column called name. In some form or another, the web application is going to return an error to the user. It will hopefully be a friendly error message contextualised within the layout of the website but at worst it may be a yellow screen of death. For the purpose of where we’re going with injection, it doesn’t really matter as just by virtue of receiving some form of error message we’ve quite likely disclosed information about the internal structure of the application, namely that there is no column called name in the table(s) the query is being executed against.
Let’s try something different:
Products.aspx?CategoryID=1 or productname=''
SELECT * FROM Products WHERE CategoryID = 1 or productname=''
This time the statement will execute successfully because the syntax is valid against Northwind so we have therefore confirmed the existence of the ProductName column. Obviously it’s easy to put this example together with prior knowledge of the underlying data schema but in most cases data models are not particularly difficult to guess if you understand a little bit about the application they’re driving. Let’s continue:
Products.aspx?CategoryID=1 or 1=(select count(*) from products)
SELECT * FROM Products WHERE CategoryID = 1 or 1=(select count(*) from products)
With the successful execution of this statement we have just verified the existence of the Products tables. This is a pretty critical step as it demonstrates the ability to validate the existence of individual tables in the database regardless of whether they are used by the query driving the page or not. This disclosure is starting to become serious information leakage we could potentially leverage to our advantage.
So far we’ve established that SQL statements are being arbitrarily executed based on the query string value and that there is a table called Product with a column called ProductName. Using the techniques above we could easily ascertain the existence of the Customers table and the CompanyName column by fairly assuming that an online system facilitating ordering may contain these objects. Let’s step it up a notch:
Products.aspx?CategoryID=1;update products set productname = productname
SELECT * FROM Products WHERE CategoryID = 1;update products set productname = productname
The first thing to note about the injection above is that we’re now executing multiple statements. The semicolon is terminating the first statement and allowing us to execute any statement we like afterwards. The second really important observation is that if this page successfully loads and returns a list of beverages, we have just confirmed the ability to write to the database. It’s about here that the penny usually drops in terms of understanding the potential ramifications of injection vulnerabilities and why OWASP categorises the technical impact as “severe”.
All the examples so far have been non-destructive. No data has been manipulated and the intrusion has quite likely not been detected. We’ve also not disclosed any actual data from the application, we’ve only established the schema. Let’s change that.
Products.aspx?CategoryID=1;insert into products(productname) select companyname from customers
SELECT * FROM Products WHERE CategoryID = 1;insert into products (productname) select companyname from customers
So as with the previous example, we’re terminating the CategoryID parameter then injecting a new statement but this time we’re populating data out of the Customers table. We’ve already established the existence of the tables and columns we’re dealing with and that we can write to the Products table so this statement executes beautifully. We can now load the results back into the browser:
Products.aspx?CategoryID=500 or categoryid is null
SELECT * FROM Products WHERE CategoryID = 500 or categoryid is null
The unfeasibly high CategoryID ensures existing records are excluded and we are making the assumption that the ID of new records defaults to null (obviously no default value on the column in this case). Here’s what the browser now discloses – note the company name of the customer now being disclosed in the ProductName column:
Bingo. Internal customer data now disclosed.
What made this possible?
The above example could only happen because of a series of failures in the application design. Firstly, the CategoryID query string parameter allowed any value to be assigned and executed by SQL Server without any parsing whatsoever. Although we would normally expect an integer, arbitrary strings were accepted.
Secondly, the SQL statement was constructed as a concatenated string and executed without any concept of using parameters. The CategoryID was consequently allowed to perform activities well outside the scope of its intended function.
Finally, the SQL Server account used to execute the statement had very broad rights. At the very least this one account appeared to have data reader and data writer rights. Further probing may have even allowed the dropping of tables or running of system commands if the account had the appropriate rights.
Validate all input against a whitelist
This is a critical concept not only this post but in the subsequent OWASP posts that will follow so I’m going to say it really, really loud:
All input must be validated against a whitelist of acceptable value ranges.
As per the definition I gave for untrusted data, the assumption must always be made that any data entering the system is malicious in nature until proven otherwise. The data might come from query strings like we just saw, from form variables, request headers or even file attributes such as the Exif metadata tags in JPG images.
In order to validate the integrity of the input we need to ensure it matches the pattern we expect. Blacklists looking for patterns such as we injected earlier on are hard work both because the list of potentially malicious input is huge and because it changes as new exploit techniques are discovered.
Validating all input against whitelists is both far more secure and much easier to implement. In the case above, we only expected a positive integer and anything outside that pattern should have been immediate cause for concern. Fortunately this is a simple pattern that can be easily validated against a regular expression. Let’s rewrite that first piece of code from earlier on with the help of whitelist validation:
var catID = Request.QueryString["CategoryID"]; var positiveIntRegex = new Regex(@"^0*[1-9][0-9]*$"); if(!positiveIntRegex.IsMatch(catID)) { lblResults.Text = "An invalid CategoryID has been specified."; return; }
Just this one piece of simple validation has a major impact on the security of the code. It immediately renders all the examples further up completely worthless in that none of the malicious CategoryID values match the regex and the program will exit before any SQL execution occurs.
An integer is a pretty simple example but the same principal applies to other data types. A registration form, for example, might expect a “first name” form field to be provided. The whitelist rule for this field might specify that it can only contain the letters a-z and common punctuation characters (be careful with this – there are numerous characters outside this range that commonly appear in names), plus it must be within 30 characters of length. The more constraints that can be placed around the whitelist without resulting in false positives, the better.
Regular expression validators in ASP.NET are a great way to implement field level whitelists as they can easily provide both client side (which is never sufficient on its own) and server side validation plus they tie neatly into the validation summary control. MSDN has a good overview of how to use regular expressions to constrain input in ASP.NET so all you need to do now is actually understand how to write a regex.
Finally, no input validation story is complete without the infamous Bobby Tables:
Parameterised stored procedures
One of the problems we had above was that the query was simply a concatenated string generated dynamically at runtime. The account used to connect to SQL Server then needed broad permissions to perform whatever action was instructed by the SQL statement.
Let’s take a look at the stored procedure approach in terms of how it protects against SQL injection. Firstly, we’ll put together the SQL to create the procedure and grant execute rights to the user.
CREATE PROCEDURE GetProducts @CategoryID INT AS SELECT * FROM dbo.Products WHERE CategoryID = @CategoryID GO GRANT EXECUTE ON GetProducts TO NorthwindUser GO
There are a couple of native defences in this approach. Firstly, the parameter must be of integer type or a conversion error will be raised when the value is passed. Secondly, the context of what this procedure – and by extension the invoking page – can do is strictly defined and secured directly to the named user. The broad reader and writer privileges which were earlier granted in order to execute the dynamic SQL are no longer needed in this context.
Moving on the .NET side of things:
var conn = new SqlConnection(connString); using (var command = new SqlCommand("GetProducts", conn)) { command.CommandType = CommandType.StoredProcedure; command.Parameters.Add("@CategoryID", SqlDbType.Int).Value = catID; command.Connection.Open(); grdProducts.DataSource = command.ExecuteReader(); grdProducts.DataBind(); }
This is a good time to point out that parameterised stored procedures are an additional defence to parsing untrusted data against a whitelist. As we previously saw with the INT data type declared on the stored procedure input parameter, the command parameter declares the data type and if the catID string wasn’t an integer the implicit conversion would throw a System.FormatException before even touching the data layer. But of course that won’t do you any good if the type is already a string!
Just one final point on stored procedures; passing a string parameter and then dynamically constructing and executing SQL within the procedure puts you right back at the original dynamic SQL vulnerability. Don’t do this!
Named SQL parameters
One of problems with the code in the original exploit is that the SQL string is constructed in its entirety in the .NET layer and the SQL end has no concept of what the parameters are. As far as it’s concerned it has just received a perfectly valid command even though it may in fact have already been injected with malicious code.
Using named SQL parameters gives us far greater predictability about the structure of the query and allowable values of parameters. What you’ll see in the following code block is something very similar to the first dynamic SQL example except this time the SQL statement is a constant with the category ID declared as a parameter and added programmatically to the command object.
const string sqlString = "SELECT * FROM Products WHERE CategoryID = @CategoryID"; var connString = WebConfigurationManager.ConnectionStrings ["NorthwindConnectionString"].ConnectionString; using (var conn = new SqlConnection(connString)) { var command = new SqlCommand(sqlString, conn); command.Parameters.Add("@CategoryID", SqlDbType.Int).Value = catID; command.Connection.Open(); grdProducts.DataSource = command.ExecuteReader(); grdProducts.DataBind(); }
What this will give us is a piece of SQL that looks like this:
exec sp_executesql N'SELECT * FROM Products WHERE CategoryID = @CategoryID',N'@CategoryID int',@CategoryID=1
There are two key things to observe in this statement:
- The sp_executesql command is invoked
- The CategoryID appears as a named parameter of INT data type
This statement is only going to execute if the account has data reader permissions to the Products table so one downside of this approach is that we’re effectively back in the same data layer security model as we were in the very first example. We’ll come to securing this further shortly.
The last thing worth noting with this approach is that the sp_executesql command also provides some query plan optimisations which although are not related to the security discussion, is a nice bonus.
LINQ to SQL
Stored procedures and parameterised queries are a great way of seriously curtailing the potential damage that can be done by SQL injection but they can also become pretty unwieldy. The case for using ORM as an alternative has been made many times before so I won’t rehash it here but I will look at this approach in the context of SQL injection. It’s also worthwhile noting that LINQ to SQL is only one of many ORMs out there and the principals discussed here are not limited purely to one of Microsoft’s interpretation of object mapping.
Firstly, let’s assume we’ve created a Northwind DBML and the data layer has been persisted into queryable classes. Things are now pretty simple syntax wise:
var dc = new NorthwindDataContext(); var catIDInt = Convert.ToInt16(catID); grdProducts.DataSource = dc.Products.Where(p => p.CategoryID == catIDInt); grdProducts.DataBind();
From a SQL injection perspective, once again the query string should have already been assessed against a whitelist and we shouldn’t be at this stage if it hasn’t passed. Before we can use the value in the “where” clause it needs to be converted to an integer because the DBML has persisted the INT type in the data layer and that’s what we’re going to be performing our equivalency test on. If the value wasn’t an integer we’d get that System.FormatException again and the data layer would never be touched.
LINQ to SQL now follows the same parameterised SQL route we saw earlier, it just abstracts the query so the developer is saved from being directly exposed to any SQL code. The database is still expected to execute what from its perspective, is an arbitrary statement:
exec sp_executesql N'SELECT [t0].[ProductID], [t0].[ProductName], [t0]. [SupplierID], [t0].[CategoryID], [t0].[QuantityPerUnit], [t0].[UnitPrice], [t0].[UnitsInStock], [t0].[UnitsOnOrder], [t0].[ReorderLevel], [t0]. [Discontinued]FROM [dbo].[Products] AS [t0]
WHERE [t0].[CategoryID] = @p0'
,N'@p0 int',@p0=1
There was some discussion about the security model in the early days of LINQ to SQL and concern expressed in terms of how it aligned to the prevailing school of thought regarding secure database design. Much of the reluctance related to the need to provide accounts connecting to SQL with reader and writer access at the table level. Concerns included the risk of SQL injection as well as from the DBA’s perspective, authority over the context a user was able to operate in moved from their control – namely within stored procedures – to the application developer’s control. However with parameterised SQL being generated and the application developers now being responsible for controlling user context and access rights it was more a case of moving cheese than any new security vulnerabilities.
Applying the principle of least privilege
The final flaw in the successful exploit above was that the SQL account being used to browse products also had the necessary rights to read from the Customers table and write to the Products table, neither of which was required for the purposes of displaying products on a page. In short, the principle of least privilege had been ignored:
In information security, computer science, and other fields, the principle of least privilege, also known as the principle of minimal privilege or just least privilege, requires that in a particular abstraction layer of a computing environment, every module (such as a process, a user or a program on the basis of the layer we are considering) must be able to access only such information and resources that are necessary to its legitimate purpose.
This was achievable because we took the easy way out and used a single account across the entire application to both read and write from the database. Often you’ll see this happen with the one SQL account being granted db_datareader and db_datawriter roles:
This is a good case for being a little more selective about the accounts we’re using and the rights they have. Quite frequently, a single SQL account is used by the application. The problem this introduces is that the one account must have access to perform all the functions of the application which most likely includes reading and writing data from and to tables you simply don’t want everyone accessing.
Let’s go back to the first example but this time we’ll create a new user with only select permissions to the Products table. We’ll call this user NorthwindPublicUser and it will be used by activities intended for the general public, i.e. not administrative activates such as managing customers or maintaining products.
Now let’s go back to the earlier request attempting to validate the existence of the Customers table:
Products.aspx?CategoryID=1 or 1=(select count(*) from customers)
In this case I’ve left custom errors off and allowed the internal error message to surface through the UI for the purposes of illustration. Of course doing this in a production environment is never a good thing not only because it’s information leakage but because the original objective of verifying the existence of the table has still been achieved. Once custom errors are on there’ll be no external error message hence there will be no verification the table exists. Finally – and most importantly - once we get to actually trying to read or write unauthorised data the exploit will not be successful.
This approach does come with a cost though. Firstly, you want to be pragmatic in the definition of how many logons are created. Ending up with 20 different accounts for performing different functions is going to drive the DBA nuts and be unwieldy to manage. Secondly, consider the impact on connection pooling. Different logons mean different connection strings which mean different connection pools.
On balance, a pragmatic selection of user accounts to align to different levels of access is a good approach to the principle of least privilege and shuts the door on the sort of exploit demonstrated above.
Getting more creative with HTTP request headers
On a couple of occasions above I’ve mentioned parsing input other than just the obvious stuff like query strings and form fields. You need to consider absolutely anything which could be submitted to the server from an untrusted source.
A good example of the sort of implicit untrusted data submission you need to consider is the accept-language attribute in the HTTP request headers. This is used to specify the spoken language preference of the user and is passed along with every request the browser makes. Here’s how the headers look after inspecting them with Fiddler:
Note the preference Firefox has delivered in this case is “en-gb”. The developer can now access this attribute in code:
var language = HttpContext.Current.Request.UserLanguages[0];
lblLanguage.Text = "The browser language is: " + language;
And the result:
The language is often used to localise content on the page for applications with multilingual capabilities. The variable we’ve assigned above may be passed to SQL Server – possibly in a concatenated SQL string - should language variations be stored in the data layer.
But what if a malicious request header was passed? What if, for example, we used the Fiddler Request Builder to reissue the request but manipulated the header ever so slightly first:
It’s a small but critical change with a potentially serious result:
We’ve looked enough at where an exploit can go from here already, the main purpose of this section was to illustrate how injection can take different attack vectors in its path to successful execution. In reality, .NET has far more efficient ways of doing language localisation but this just goes to prove that vulnerabilities can be exposed through more obscure channels.
Summary
The potential damage from injection exploits is indeed, severe. Data disclosure, data loss, database object destruction and potentially limitless damage to reputation.
The thing is though, injection is a really easy vulnerability to apply some pretty thorough defences against. Fortunately it’s uncommon to see dynamic, parameterless SQL strings constructed in .NET code these days. ORMs like LINQ to SQL are very attractive from a productivity perspective and the security advantages that come with it are eradicating some of those bad old practices.
Input parsing, however, remains a bit more elusive. Often developers are relying on type conversion failures to detect rogue values which, of course, won’t do much good if the expected type is already a string and contains an injection payload. We’re going to come back to input parsing again in the next part of the series on XSS. For now, let’s just say that not parsing input has potential ramifications well beyond just injection vulnerabilities.
I suspect securing individual database objects to different accounts is not happening very frequently at all. The thing is though, it’s the only defence you have at the actual data layer if you’ve moved away from stored procedures. Applying the least privilege principle here means that in conjunction with the other measures, you’ve now erected injection defences on the input, the SQL statement construction and finally at the point of its execution. Ticking all these boxes is a very good place to be indeed.
References
- SQL Injection Attacks by Example
- SQL Injection Cheat Sheet
- The Curse and Blessings of Dynamic SQL
- LDAP Injection Vulnerabilities