Why do some SQL strings have an 'N' prefix?
You may have seen Transact-SQL code that passes strings around using an N prefix. This denotes that the subsequent string is in Unicode (the N actually stands for National language character set), which means that you are passing an NCHAR, NVARCHAR or NTEXT value, as opposed to CHAR, VARCHAR or TEXT. See Article #2354 for a comparison of these data types.
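One quick way to see the difference for yourself is to ask SQL Server what type it assigns to each literal. This is only a sketch using the built-in SQL_VARIANT_PROPERTY function; the column aliases are made up for illustration.

```sql
-- Compare the base type of a plain literal with that of an N-prefixed literal.
SELECT SQL_VARIANT_PROPERTY('abc',  'BaseType') AS without_prefix,  -- varchar
       SQL_VARIANT_PROPERTY(N'abc', 'BaseType') AS with_prefix;     -- nvarchar
```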
Unicode is typically used in database applications that must handle character sets beyond the English and Western European code pages (Erland Sommarskog, a native of Sweden, refers to this set as "Germanic and Romance languages"), for example Chinese. Unicode is designed so that extended character sets can still "fit" into database columns. The cost is that Unicode character data types hold only half as many characters in the same space, because each character actually takes two bytes (Unicode is sometimes referred to as "double-wide"). For more information on Unicode, see Unicode.org. Note that there are many encoding schemes in the Unicode standard, but SQL Server only supports one: UCS-2.
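The two-bytes-per-character cost is easy to check with DATALENGTH, which returns the number of bytes a value occupies. A minimal sketch, with illustrative column aliases:

```sql
-- The same three characters take twice the storage as a Unicode string.
SELECT DATALENGTH('abc')  AS varchar_bytes,   -- 3
       DATALENGTH(N'abc') AS nvarchar_bytes;  -- 6
```

This is also why, for the same byte limit, an NVARCHAR column can be declared with only half as many characters as a VARCHAR column.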
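While using Unicode is a design choice you can make in building your own applications, some facilities in SQL Server require it. One example is sp_executesql. If you pass it a non-Unicode string (a VARCHAR variable, or a literal without the N prefix), you will get error 214: Procedure expects parameter '@statement' of type 'ntext/nchar/nvarchar'.
You can get around this in two ways: (a) pass the statement directly as an N-prefixed literal, or (b) declare the variable that holds the statement as NVARCHAR.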
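A minimal sketch of the failing call and the two workarounds, labeled (a) and (b) to match the paragraph that follows. The query against sysobjects and the variable names @sql and @nsql are assumptions chosen for illustration, and GO is the batch separator used by Query Analyzer.

```sql
-- Fails: the statement is VARCHAR, but sp_executesql requires a Unicode type.
DECLARE @sql VARCHAR(100);
SET @sql = 'SELECT TOP 1 name FROM sysobjects';
EXEC sp_executesql @sql;
-- Msg 214: Procedure expects parameter '@statement' of type 'ntext/nchar/nvarchar'.
GO

-- (a) Pass an N-prefixed literal directly.
EXEC sp_executesql N'SELECT TOP 1 name FROM sysobjects';
GO

-- (b) Hold the statement in an NVARCHAR variable.
DECLARE @nsql NVARCHAR(100);
SET @nsql = N'SELECT TOP 1 name FROM sysobjects';
EXEC sp_executesql @nsql;
```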
Note that implicit conversion makes the N prefix optional in case (b); however, for legibility and consistency, you should always use the prefix when defining Unicode strings. One reason is that leaving it off can actually change your data if it contains Unicode characters (losing the additional information), as in the following example:
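A minimal sketch of such an example, assuming a hypothetical variable @s and a database whose default collation uses a Western code page (the exact substitution you see may differ with other collations):

```sql
DECLARE @s NVARCHAR(10);

-- No N prefix: the literal is VARCHAR, so it passes through the
-- database's code page before it ever reaches the NVARCHAR variable.
SET @s = 'ā';
SELECT @s;

-- N prefix: the literal is NVARCHAR from the start, so nothing is lost.
SET @s = N'ā';
SELECT @s;
```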
The first assignment, which didn't use the N prefix, gets printed as a regular a. Only the second maintains the character that was actually supposed to be represented. As you can imagine, if you are intending to support data entry in foreign languages and code pages, you will likely need to test for Unicode support (making sure that such columns support Unicode, and that data won't be lost when passed into stored procedures, functions, etc.). Note that your application will need to handle Unicode as well; for example, when you try to print this character from ASP...
...it actually prints out the string aa. (This result might depend on your codepage and regional settings.) So you might consider translating your data into an ASCII-safe equivalent, such as the HTML numeric character reference &#257; for ā.
Another reason you want to avoid implicit conversion is that there are some potentially serious performance issues. Consider the following quite simple repro:
Paste the code into Query Analyzer, turn execution plan on, and let her rip. You'll observe the following breakdown of percentage of work (roughly, depending on your hardware):
Now, that's not the whole story; we all know that there are many other factors, such as I/O, that will impact the actual time each portion of the query takes. The key is that implicit conversion *can* cause a table scan instead of an index seek, and on larger tables this can really hurt. While it's important to understand why this happens and in which scenarios, my recommendation is to match your character-based datatypes as explicitly as possible.
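A sketch of the kind of repro this describes, not the original script. The table name dbo.NPrefixDemo, the column name, and the row count are all hypothetical, and whether you see a scan or a seek for the second query can depend on your version and collation.

```sql
-- An indexed VARCHAR column to search against.
CREATE TABLE dbo.NPrefixDemo
(
    code VARCHAR(10) NOT NULL PRIMARY KEY CLUSTERED
);
GO

-- Populate with enough rows for the plans to be worth comparing.
DECLARE @i INT;
SET @i = 1;
WHILE @i <= 10000
BEGIN
    INSERT dbo.NPrefixDemo (code) VALUES ('A' + CAST(@i AS VARCHAR(9)));
    SET @i = @i + 1;
END;
GO

-- Types match: the literal is VARCHAR, like the column.
SELECT code FROM dbo.NPrefixDemo WHERE code = 'A100';

-- Types differ: NVARCHAR has higher precedence, so the column side is
-- implicitly converted, which can prevent a plain index seek.
SELECT code FROM dbo.NPrefixDemo WHERE code = N'A100';
GO

DROP TABLE dbo.NPrefixDemo;
```

Comparing the two SELECT statements in the execution plan is the point of the exercise; the mismatch here runs in the opposite direction from the sp_executesql case, which is why matching the literal to the column type matters either way.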
One other thing to watch out for: your database may be using Unicode without your knowledge. If you upsize from Access to SQL Server, for example, character-based text columns might be translated to Unicode (I believe this is a catch-all technique; in case Access was storing Unicode strings, or if you might be storing Unicode strings later, you won't lose data or require changes). I think the Access upsizing tools should be updated to force a conscious choice, so that you aren't wasting space for nothing, and so that you know that you made a decision at all.
For a more thorough discussion of Unicode and the N prefix, please see KB #239530, this MSDN article, and this Google thread.
In other RDBMS platforms, or in the ANSI and/or ISO specifications, you might see prefixes other than N being used against values. (Of these string prefixes, current versions of SQL Server only support N, for Unicode.) Here are the additional monikers I am aware of: