Hacker News
C# strings silently kill your SQL Server indexes in Dapper
wvenable
I'm not sure why anyone would choose varchar for a column in 2026 unless you have some sort of ancient backwards-compatibility situation.
dspillett
The same string takes roughly half the storage space, meaning more rows per page, a smaller working set in memory for the same queries, and less IO. Any indexes on those columns will be similarly smaller. So if you are storing things that you know won't break out of the standard ASCII set⁰, stick with [VAR]CHARs¹; otherwise use N[VAR]CHARs.
Of course, if you can guarantee that your stuff will only run on SQL Server versions recent enough to be configured with UTF-8 collations, then default to that instead, unless you expect data in a character set where UTF-8 might increase the data size over UTF-16. You'll get the same size benefit for pure ASCII without losing wider character-set support.
Furthermore, if you are using row or page compression it doesn't really matter: your wide-character strings will effectively be UTF-8 encoded anyway. But be aware that there is a CPU hit for processing compressed rows and pages on every access, because they remain compressed in memory as well as on disk.
--------
[0] Codes with fixed ranges, etc.
[1] Some would put it the other way around ("use NVARCHAR if you think there might be any non-ASCII characters"), but defaulting to NVARCHAR and moving to VARCHAR only when you are confident is the safer approach IMO.
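For the UTF-8 option, a minimal sketch (assumes SQL Server 2019+, where UTF-8 collations are available; table and column names are hypothetical):

```sql
-- varchar + a UTF-8 collation: 1 byte per ASCII character, full Unicode support.
CREATE TABLE dbo.Products (
    Sku         varchar(20)  COLLATE Latin1_General_100_BIN2          NOT NULL, -- ASCII-only key
    Description varchar(400) COLLATE Latin1_General_100_CI_AS_SC_UTF8 NOT NULL  -- full Unicode, stored as UTF-8
);
```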
beart
As to your second point: VARCHAR uses n + 2 bytes of storage, whereas NVARCHAR uses n*2 + 2 bytes (at least on SQL Server). The vast majority of character fields in databases I've worked with do not need to store Unicode values.
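You can see the difference with DATALENGTH (a sketch; the 2-byte length prefix is per-value overhead and isn't included in what DATALENGTH reports):

```sql
DECLARE @a varchar(50)  = 'hello';
DECLARE @n nvarchar(50) = N'hello';
SELECT DATALENGTH(@a) AS varchar_bytes,   -- 5:  one byte per character
       DATALENGTH(@n) AS nvarchar_bytes;  -- 10: two bytes per character (UTF-16)
```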
wvenable
This has not been my experience at all. Exactly the opposite, in fact. ASCII is dead.
SigmundA
Text fields that users can type into directly, especially multiline ones, tend to need Unicode, but they are far fewer.
simonask
Unicode is a requirement everywhere human language is used, from Earth to the Boötes Void.
SigmundA
Taking double the space for this stuff is a waste of resources, and nobody usually cares about extended characters here, at least in English-language systems; they just want something more readable than integers when querying and debugging the data. End users will see longer descriptions joined from code tables or from app caches, which can have Unicode.
SigmundA
I have avoided it and have not followed whether the issues are fully resolved; I would hope they are.
kstrauser
Their insistence on making the rest of the world go along with their obsolete pet scheme would be annoying if I ever had to use their stuff for anything. UTF-8 was conceived in 1992, and here we are in 2026 with a reasonably popular database still considering it the new thing.
SigmundA
https://learn.microsoft.com/en-us/sql/relational-databases/d...
Also, UTF-8 is actually just a varchar collation, so you don't use nvarchar with that, lol?
SigmundA
If we simply went to a UTF-8 collation using varchar then this wouldn't be an issue either, which is why you would use varchar in 2026: best of both worlds, so to speak.
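As a sketch of that "best of both worlds" (assumes SQL Server 2019+; the temp table is illustrative):

```sql
-- A varchar column with a UTF-8 collation stores ASCII at 1 byte/char
-- and still accepts the full Unicode range.
CREATE TABLE #demo (s varchar(100) COLLATE Latin1_General_100_CI_AS_SC_UTF8);
INSERT INTO #demo VALUES ('hello'), (N'héllo');
SELECT s, DATALENGTH(s) AS bytes FROM #demo;  -- 'hello' = 5, 'héllo' = 6 (é takes 2 bytes in UTF-8)
DROP TABLE #demo;
```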
ziml77
And it's not like I don't care about performance. If I see a small query taking more than a fraction of a second when testing in SSMS, or if I see a larger query taking more than a few seconds, I will dig into the query plan and try to make changes to improve it. But for code that I tested in SSMS and then moved into a Dapper query, I wouldn't have noticed a performance problem from that move unless the slowdown was particularly large.
cosmez
Most people are not aware of how Dapper maps types under the hood; once you know, you start being careful about it.
Nothing to do with LLMs, just plain old learning through mistakes.
paulsutter
Utf16 is brain dead and an embarrassment
wvenable
So many problems could be solved with a time machine.
kstrauser
The time machine would've involved Microsoft saying "it's clear now that UCS-2 was a bad idea, so let's start migrating to something genuinely better".
briHass
Dapper has a static configuration for things like TypeMappers, and you can change the default mapping for string to use varchar with: Dapper.SqlMapper.AddTypeMap(typeof(string), System.Data.DbType.AnsiString). I typically set that in the app startup, because I avoid NVARCHAR almost entirely (to save the extra byte per character, since I rarely need anything outside of ANSI).
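A minimal sketch of that setup, plus Dapper's per-parameter DbString alternative for when only some columns are varchar (the table, column, and class names are hypothetical):

```csharp
using System.Data;
using Dapper;

public static class DapperConfig
{
    public static void Configure()
    {
        // Global: send every string parameter as varchar (AnsiString)
        // instead of Dapper's default nvarchar (String).
        SqlMapper.AddTypeMap(typeof(string), DbType.AnsiString);
    }
}

// Per-parameter alternative, when a global remap is too blunt:
// connection.Query<Product>(
//     "SELECT * FROM dbo.Products WHERE Sku = @Sku",
//     new { Sku = new DbString { Value = sku, IsAnsi = true, Length = 20 } });
```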
Or, one could use stored procedures. Assuming the procedure takes a parameter of the correct type for your indexed predicate, the conversion happens once when the sproc is called, rather than being applied by the optimizer inside the query.
I still have mixed feelings about the overuse of SQL stored procedures, but this is a classic example of where one of their benefits is revealed: they are a defined interface to the database, where DB-specific types can be handled instead of polluting your code with specifics of your DB.
(This is also a problem for other type mismatches like DateTime/Date, numeric types, etc.)
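A sketch of that stored-procedure interface, with hypothetical object names:

```sql
-- The parameter is declared varchar to match the indexed column exactly,
-- so any nvarchar value from the caller is converted once at call time,
-- not against the column row-by-row inside the plan.
CREATE PROCEDURE dbo.GetProductBySku
    @Sku varchar(20)  -- same type/length as dbo.Products.Sku
AS
BEGIN
    SET NOCOUNT ON;
    SELECT ProductId, Sku, Description
    FROM dbo.Products
    WHERE Sku = @Sku;  -- seekable: no implicit CONVERT on the column
END;
```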
jiggawatts
It ought to be smart enough to convert a constant parameter to the target column type in a predicate constraint and then check for the availability of a covering index.
valiant55
0: https://learn.microsoft.com/en-us/sql/t-sql/data-types/data-...