SQL Server Advanced Cheat Sheet

📚 Table of Contents

  1. Built-in Functions
  2. Aggregate Functions
  3. Window Functions
  4. Table-Valued Functions & CTEs
  5. T-SQL Specific Features
  6. Indexing Strategies
  7. Query Optimization
  8. Stored Procedures & Functions
  9. Transactions & Isolation Levels
  10. Performance Tuning
  11. Security & Permissions
  12. System Views & DMVs

🔧 Built-in Functions

String Functions

-- String manipulation
CONCAT(str1, str2, ...)           -- Concatenate strings
CONCAT_WS(separator, str1, str2)  -- Concatenate with separator
STRING_AGG(expression, separator) -- Aggregate concatenation (2017+)
STUFF(string, start, length, replacement) -- Replace substring
REPLICATE(string, count)          -- Repeat string
REVERSE(string)                   -- Reverse string
QUOTENAME(string, [quote_char])   -- Add delimiters for identifiers
FORMAT(value, format, [culture])  -- Format values (2012+)
STRING_SPLIT(string, separator)   -- Split to table (2016+)
TRIM([characters FROM] string)    -- Remove chars (2017+)
TRANSLATE(string, chars, replacements) -- Replace multiple chars (2017+)

Date/Time Functions

-- Date manipulation
DATEADD(datepart, number, date)   -- Add to date
DATEDIFF(datepart, start, end)    -- Difference between dates
DATEPART(datepart, date)          -- Extract part
DATENAME(datepart, date)          -- Get name of part
EOMONTH(date, [months])           -- End of month (2012+)
DATEFROMPARTS(year, month, day)   -- Construct date (2012+)
SWITCHOFFSET(datetimeoffset, offset) -- Change timezone
AT TIME ZONE                      -- Convert timezone (2016+)

📌 Date Parts Reference

Common dateparts: year, quarter, month, day, week, hour, minute, second, millisecond
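
A couple of quick, illustrative examples (the literal values are hypothetical):

SELECT DATEADD(day, 7, '2023-06-15');            -- 2023-06-22
SELECT EOMONTH('2023-06-15');                    -- 2023-06-30
SELECT SYSDATETIMEOFFSET() AT TIME ZONE 'UTC';   -- Convert current time to UTC
SELECT CONVERT(datetime2, '2023-06-15 12:00')
       AT TIME ZONE 'UTC'                                     -- Interpret as UTC...
       AT TIME ZONE 'Central European Standard Time';         -- ...then convert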

Mathematical Functions

CEILING(n)    -- Round up
FLOOR(n)      -- Round down
ROUND(n, d)   -- Round to d decimals
POWER(n, p)   -- n to power p
SQRT(n)       -- Square root
ABS(n)        -- Absolute value
SIGN(n)       -- Sign (-1, 0, 1)
RAND([seed])  -- Random 0-1
LOG(n, [base]) -- Logarithm
EXP(n)        -- e^n

Conversion Functions

CAST(expression AS datatype)      -- Convert type
CONVERT(datatype, expression, [style]) -- Convert with style
TRY_CAST(expression AS datatype)  -- Safe cast (2012+)
TRY_CONVERT(datatype, expression) -- Safe convert (2012+)
PARSE(string AS datatype [USING culture]) -- Parse string (2012+)
TRY_PARSE(string AS datatype)     -- Safe parse (2012+)

JSON Functions (2016+)

ISJSON(expression)                 -- Check if valid JSON
JSON_VALUE(json, path)             -- Extract scalar
JSON_QUERY(json, path)             -- Extract object/array
JSON_MODIFY(json, path, value)    -- Update JSON
OPENJSON(json)                     -- Parse to table
FOR JSON PATH/AUTO                 -- Format as JSON
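
A minimal OPENJSON sketch (the @json value is made up; OPENJSON needs database compatibility level 130+):

DECLARE @json NVARCHAR(MAX) = N'[{"id":1,"name":"Ann"},{"id":2,"name":"Bob"}]';

SELECT id, name
FROM OPENJSON(@json)
WITH (id INT '$.id', name NVARCHAR(50) '$.name');  -- Shred JSON into typed columns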

📊 Aggregate Functions

Standard Aggregates

COUNT(*) / COUNT(column)           -- Count rows/non-nulls
COUNT(DISTINCT column)             -- Count unique values
SUM(column)                        -- Sum values
AVG(column)                        -- Average
MIN(column) / MAX(column)          -- Min/Max
STDEV(column) / STDEVP(column)    -- Sample/Population std dev
VAR(column) / VARP(column)         -- Sample/Population variance
STRING_AGG(column, separator)     -- Concatenate (2017+)
APPROX_COUNT_DISTINCT(column)     -- Approximate distinct (2019+)

Statistical Aggregates

CHECKSUM_AGG(column)              -- Checksum for change detection
GROUPING(column)                   -- Is NULL from ROLLUP/CUBE?
GROUPING_ID(col1, col2, ...)      -- Grouping level identifier
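
How GROUPING and GROUPING_ID behave with ROLLUP, sketched against a hypothetical Sales(region, product, amount) table:

SELECT region, product, SUM(amount) AS total,
       GROUPING(region) AS region_rolled_up,     -- 1 on grand-total rows
       GROUPING_ID(region, product) AS lvl       -- 0=detail, 1=product rollup, 3=grand total
FROM Sales
GROUP BY ROLLUP (region, product);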

🪟 Window Functions

Ranking Functions

ROW_NUMBER() OVER (ORDER BY col)  -- Sequential number
RANK() OVER (ORDER BY col)        -- Rank with gaps
DENSE_RANK() OVER (ORDER BY col)  -- Rank without gaps
NTILE(n) OVER (ORDER BY col)      -- Divide into n groups
PERCENT_RANK() OVER (ORDER BY col) -- Percentile rank (0-1)
CUME_DIST() OVER (ORDER BY col)   -- Cumulative distribution

Analytic Functions

Analytic functions allow you to access data from multiple rows without using a self-join. They're powerful for comparing values across rows, calculating running totals, or finding previous/next values in a sequence. Unlike aggregate functions, they don't collapse rows - each row retains its identity while gaining access to related data.

LAG(col, offset, default) OVER (ORDER BY...)    -- Previous row
LEAD(col, offset, default) OVER (ORDER BY...)   -- Next row
FIRST_VALUE(col) OVER (ORDER BY...)             -- First in window
LAST_VALUE(col) OVER (ORDER BY...)              -- Last in window (needs an explicit frame; the default frame stops at CURRENT ROW)
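
A concrete sketch, assuming a hypothetical DailySales(sale_date, amount) table:

SELECT sale_date, amount,
       amount - LAG(amount, 1, 0) OVER (ORDER BY sale_date) AS change_vs_prev_day,
       LAST_VALUE(amount) OVER (ORDER BY sale_date
           ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS final_amount  -- explicit frame
FROM DailySales;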

Window Frame Specification

Window frames define exactly which rows are included in the window function's calculation for each row. This gives you precise control over calculations like moving averages or cumulative sums. ROWS considers physical rows, while RANGE considers logical values (useful for ties).

OVER (
    PARTITION BY column
    ORDER BY column
    ROWS/RANGE BETWEEN 
        UNBOUNDED PRECEDING / n PRECEDING / CURRENT ROW
        AND
        CURRENT ROW / n FOLLOWING / UNBOUNDED FOLLOWING
)
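
For example, a 7-row moving average per product (hypothetical DailySales(product, sale_date, amount) table):

SELECT product, sale_date, amount,
       AVG(amount) OVER (
           PARTITION BY product
           ORDER BY sale_date
           ROWS BETWEEN 6 PRECEDING AND CURRENT ROW   -- Current row plus the 6 before it
       ) AS moving_avg_7
FROM DailySales;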

📋 Table-Valued Functions & CTEs

Common Table Expressions (CTEs)

CTEs are temporary named result sets that exist within the scope of a single statement. They make complex queries more readable by breaking them into logical building blocks. Recursive CTEs can traverse hierarchical data like organizational charts or bill-of-materials. CTEs are often more readable than subqueries and can be referenced multiple times in the same query.

-- Basic CTE
WITH CTE_Name (col1, col2) AS (
    SELECT col1, col2 FROM table
)
SELECT * FROM CTE_Name;

-- Recursive CTE
WITH RecursiveCTE AS (
    -- Anchor member
    SELECT initial_query
    UNION ALL
    -- Recursive member
    SELECT recursive_query FROM RecursiveCTE
    WHERE termination_condition
)
SELECT * FROM RecursiveCTE
OPTION (MAXRECURSION n);  -- Default 100, 0 = unlimited
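
A concrete recursive example, assuming a hypothetical Employees(id, name, manager_id) table:

WITH OrgChart AS (
    SELECT id, name, manager_id, 0 AS lvl
    FROM Employees
    WHERE manager_id IS NULL                    -- Anchor: top of the hierarchy
    UNION ALL
    SELECT e.id, e.name, e.manager_id, o.lvl + 1
    FROM Employees e
    JOIN OrgChart o ON e.manager_id = o.id      -- Recursive member: direct reports
)
SELECT * FROM OrgChart
OPTION (MAXRECURSION 0);                        -- Remove the default 100-level cap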

Table-Valued Functions

Table-valued functions return a table data type that can be used like a regular table in queries. Inline TVFs are essentially parameterized views that get optimized as part of the calling query, making them very efficient. Multi-statement TVFs build a table variable step-by-step but can't be optimized as well, so they're slower for large datasets.

-- Inline TVF (better performance)
CREATE FUNCTION fnInline(@param INT)
RETURNS TABLE
AS RETURN (
    SELECT * FROM table WHERE id = @param
);

-- Multi-statement TVF
CREATE FUNCTION fnMulti(@param INT)
RETURNS @result TABLE (col1 INT, col2 VARCHAR(50))
AS BEGIN
    INSERT INTO @result
    SELECT id, name FROM dbo.SomeTable WHERE id = @param;  -- Illustrative body (hypothetical table)
    RETURN;
END;

✅ Best Practice

Prefer inline TVFs over multi-statement TVFs for better performance, as they can be optimized by the query processor.
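
Calling TVFs, reusing the fnInline example above (the Orders table is hypothetical):

SELECT * FROM dbo.fnInline(5);                  -- Direct call

SELECT o.id, f.*
FROM Orders o
CROSS APPLY dbo.fnInline(o.id) AS f;            -- Per-row call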

🎯 T-SQL Specific Features

Variables & Control Flow

-- Variables
DECLARE @var INT = 10;
SET @var = 20;
SELECT @var = column FROM table;  -- Assign from query (picks an arbitrary row if several match)

-- Control structures
IF EXISTS (SELECT 1 FROM table)
    BEGIN ... END
ELSE
    BEGIN ... END

WHILE @counter < 10
BEGIN
    -- Loop body
    SET @counter += 1;
END

-- Error handling
BEGIN TRY
    -- Code that might fail
END TRY
BEGIN CATCH
    SELECT ERROR_NUMBER(), ERROR_MESSAGE();
    THROW;  -- Re-throw error
END CATCH
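
A fuller error-handling sketch combining TRY/CATCH with a transaction (the Accounts table is hypothetical):

BEGIN TRY
    BEGIN TRANSACTION;
    UPDATE Accounts SET balance = balance - 100 WHERE id = 1;
    UPDATE Accounts SET balance = balance + 100 WHERE id = 2;
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;   -- Undo everything on any failure
    THROW;                      -- Re-raise the original error to the caller
END CATCH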

MERGE Statement

MERGE combines INSERT, UPDATE, and DELETE operations into a single atomic statement. It's ideal for synchronizing tables, data warehousing ETL, and upsert scenarios. The statement compares source and target data, then performs different actions based on match conditions, avoiding separate passes over the data. For concurrent upserts, pair it with a WITH (HOLDLOCK) hint or a unique constraint to prevent race conditions.

MERGE target_table AS target
USING source_table AS source ON target.id = source.id
WHEN MATCHED THEN
    UPDATE SET target.col = source.col
WHEN NOT MATCHED BY TARGET THEN
    INSERT (col1, col2) VALUES (source.col1, source.col2)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE
OUTPUT $action, inserted.*, deleted.*;

OUTPUT Clause

The OUTPUT clause captures data affected by INSERT, UPDATE, DELETE, or MERGE operations. It's invaluable for audit trails, logging changes, or returning generated values to applications. You can access both the old values (deleted) and new values (inserted) in the same statement, making it perfect for tracking what changed.

-- Capture modified data
INSERT INTO table (col1, col2)
OUTPUT inserted.* INTO @audit_table
VALUES (val1, val2);

UPDATE table SET col1 = val1
OUTPUT deleted.col1 AS old_val, inserted.col1 AS new_val
WHERE condition;

PIVOT/UNPIVOT

PIVOT transforms row data into columns, turning unique values from one column into multiple columns in the output - perfect for creating crosstab reports or matrices. UNPIVOT does the reverse, normalizing columnar data back into rows. These operations are common in reporting scenarios where you need to reshape data for presentation.

-- PIVOT
SELECT * FROM (
    SELECT category, year, sales FROM sales_data
) AS src
PIVOT (
    SUM(sales) FOR year IN ([2021], [2022], [2023])
) AS pvt;

-- UNPIVOT
SELECT category, year, sales
FROM pivoted_table
UNPIVOT (
    sales FOR year IN ([2021], [2022], [2023])
) AS unpvt;

🔍 Indexing Strategies

Indexes are the single most important tool for query performance. Think of them like a book's index - instead of reading every page to find a topic, you look it up in the index. Indexes create sorted data structures that allow SQL Server to find rows quickly. However, they come with trade-offs: while reads become faster, writes become slower because indexes must be maintained. The key is finding the right balance for your workload.

Index Types

Each index type serves different purposes. Clustered indexes define the physical order of data (like arranging books on a shelf), while non-clustered indexes are separate structures pointing back to the data (like a library card catalog). Columnstore indexes revolutionize analytics by storing data column-wise instead of row-wise, achieving massive compression and performance gains for aggregate queries.

-- Clustered Index (1 per table, defines physical order)
CREATE CLUSTERED INDEX IX_name ON table(column);

-- Non-clustered Index (multiple allowed)
CREATE NONCLUSTERED INDEX IX_name ON table(column)
INCLUDE (col1, col2)  -- Covering index
WHERE condition;      -- Filtered index

-- Unique Index
CREATE UNIQUE INDEX IX_name ON table(column);

-- Columnstore Index (for analytics)
CREATE NONCLUSTERED COLUMNSTORE INDEX IX_name ON table(col1, col2);
CREATE CLUSTERED COLUMNSTORE INDEX IX_name ON table;

-- Full-text Index
CREATE FULLTEXT INDEX ON table(column)
KEY INDEX PK_name;

Index Management

Indexes fragment over time as data changes, degrading performance. Regular maintenance keeps them efficient. REBUILD recreates the index from scratch (offline by default, removing fragmentation completely), while REORGANIZE defragments online but less thoroughly. Monitor fragmentation levels: reorganize at 5-30%, rebuild above 30%. Index hints force the optimizer to use specific indexes but should be used sparingly as they prevent adaptive optimization.

-- Rebuild/Reorganize
ALTER INDEX IX_name ON table REBUILD;
ALTER INDEX IX_name ON table REORGANIZE;
ALTER INDEX ALL ON table REBUILD;

-- Disable/Enable
ALTER INDEX IX_name ON table DISABLE;
ALTER INDEX IX_name ON table REBUILD;  -- Re-enables

-- Index hints
SELECT * FROM table WITH (INDEX(IX_name));
SELECT * FROM table WITH (FORCESEEK);
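
To decide between REORGANIZE and REBUILD, check fragmentation first; a minimal sketch using sys.dm_db_index_physical_stats:

SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name AS index_name,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ips
JOIN sys.indexes i ON ips.object_id = i.object_id AND ips.index_id = i.index_id
WHERE ips.avg_fragmentation_in_percent > 5      -- Below 5%: usually leave alone
ORDER BY ips.avg_fragmentation_in_percent DESC;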

✅ Indexing Best Practices

  1. Keep clustered index keys narrow, unique, static, and ever-increasing
  2. Cover frequent queries with INCLUDE columns rather than widening the key
  3. Don't over-index: every index adds write and maintenance cost
  4. Use filtered indexes for queries that always target a small, well-defined subset
  5. Review sys.dm_db_index_usage_stats and drop indexes that are written but never read

⚡ Query Optimization

Query Hints

Query hints override the optimizer's decisions, forcing specific execution behaviors. Use them as a last resort when you're certain the optimizer is making poor choices. Table hints control locking and access methods, while query hints affect the entire execution plan. Remember: today's optimal hint might become tomorrow's performance problem as data distributions change.

-- Join hints
INNER HASH JOIN / INNER LOOP JOIN / INNER MERGE JOIN

-- Table hints
WITH (NOLOCK)        -- Read uncommitted
WITH (READPAST)      -- Skip locked rows
WITH (UPDLOCK)       -- Update lock
WITH (TABLOCK)       -- Table lock
WITH (FORCESEEK)     -- Force index seek

-- Query hints
OPTION (RECOMPILE)   -- Recompile query
OPTION (OPTIMIZE FOR (@var = value))
OPTION (MAXDOP n)    -- Max parallelism
OPTION (FAST n)      -- Optimize for first n rows
OPTION (USE PLAN N'xml_plan')  -- Force plan

Execution Plan Analysis

Execution plans are your window into the optimizer's thinking. They show exactly how SQL Server executes your query - which indexes it uses, join algorithms chosen, and where performance bottlenecks occur. Look for thick arrows (high row counts), hash joins on large tables (memory intensive), key lookups (consider covering indexes), and missing index suggestions. The actual plan shows real runtime statistics, while estimated plans show what the optimizer expects.

SET STATISTICS IO ON;   -- Show I/O statistics
SET STATISTICS TIME ON; -- Show time statistics
SET SHOWPLAN_XML ON;    -- Return estimated plan XML (query is not executed)

-- Plan cache queries
SELECT * FROM sys.dm_exec_query_stats;
SELECT * FROM sys.dm_exec_cached_plans;
SELECT * FROM sys.dm_exec_query_plan(plan_handle);

Common Optimization Patterns

-- Prefer EXISTS over IN for correlated subqueries (and avoid NOT IN against nullable columns)
SELECT * FROM orders o
WHERE EXISTS (SELECT 1 FROM customers c WHERE c.id = o.customer_id);

-- Avoid functions on indexed columns
-- Bad: WHERE YEAR(date) = 2023
-- Good: WHERE date >= '2023-01-01' AND date < '2024-01-01'

-- Use UNION ALL instead of UNION when possible
SELECT col FROM table1 UNION ALL SELECT col FROM table2;

-- Parameterize queries to reuse plans
EXEC sp_executesql N'SELECT * FROM table WHERE id = @id', 
                   N'@id INT', @id = 1;

💾 Stored Procedures & Functions

Stored Procedures

CREATE PROCEDURE sp_name  -- Note: avoid the sp_ prefix in real code; it's reserved for system procedures
    @param1 INT,
    @param2 VARCHAR(50) = 'default',
    @output INT OUTPUT
AS
BEGIN
    SET NOCOUNT ON;  -- Suppress row count messages
    
    -- Procedure body
    SET @output = @param1 * 2;
    
    RETURN 0;  -- Return code
END;

-- Execute
DECLARE @result INT;
EXEC sp_name @param1 = 10, @output = @result OUTPUT;

Scalar Functions

CREATE FUNCTION fn_name(@param INT)
RETURNS INT
WITH SCHEMABINDING  -- Improves performance
AS
BEGIN
    RETURN @param * 2;
END;

System Stored Procedures

Procedure                Description
sp_who / sp_who2         Current connections
sp_lock                  Current locks
sp_helpindex 'table'     Show indexes
sp_helptext 'object'     Show object definition
sp_depends 'object'      Show dependencies
sp_recompile 'object'    Mark for recompilation
sp_updatestats           Update all statistics

🔐 Transactions & Isolation Levels

Transaction Control

BEGIN TRANSACTION;
    -- DML operations
    SAVE TRANSACTION savepoint_name;
    -- More operations
    IF @@ERROR <> 0
        ROLLBACK TRANSACTION savepoint_name;
COMMIT TRANSACTION;

-- Implicit transactions
SET IMPLICIT_TRANSACTIONS ON;

-- Transaction info
SELECT @@TRANCOUNT;  -- Current transaction level

Isolation Levels

Isolation levels control how transactions interact with each other, balancing data consistency against performance. Lower isolation allows more concurrency but risks reading inconsistent data. Higher isolation ensures consistency but can cause blocking and deadlocks. SNAPSHOT isolation (using row versioning) provides consistency without blocking readers, making it ideal for reporting queries against OLTP systems.

Level             Description                  Prevents
READ UNCOMMITTED  Dirty reads allowed          Nothing
READ COMMITTED    Default level                Dirty reads
REPEATABLE READ   Holds read locks             Dirty reads, non-repeatable reads
SERIALIZABLE      Highest isolation            All read anomalies, including phantoms
SNAPSHOT          Row versioning (optimistic)  All read anomalies; writers don't block readers, but update conflicts raise errors

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Table-level hints
SELECT * FROM table WITH (NOLOCK);      -- READ UNCOMMITTED
SELECT * FROM table WITH (HOLDLOCK);    -- SERIALIZABLE
SELECT * FROM table WITH (READPAST);    -- Skip locked rows
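
Enabling SNAPSHOT isolation (the database-level setting is required once; db is a placeholder name):

ALTER DATABASE db SET ALLOW_SNAPSHOT_ISOLATION ON;   -- Enable the feature
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;            -- Opt in per session

ALTER DATABASE db SET READ_COMMITTED_SNAPSHOT ON;    -- Or: make READ COMMITTED use row versioning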

⚠️ Deadlock Prevention

  1. Access tables in the same order in every transaction
  2. Keep transactions short and never wait on user input inside one
  3. Use the lowest isolation level (or row versioning) that meets your consistency needs
  4. Catch deadlock error 1205 and retry, as in the sketch below
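
A retry-on-deadlock sketch (the transaction body is a placeholder):

DECLARE @retries INT = 3;
WHILE @retries > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;
        -- contended work goes here
        COMMIT TRANSACTION;
        BREAK;                              -- Success: leave the loop
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
        IF ERROR_NUMBER() = 1205 AND @retries > 1
            SET @retries -= 1;              -- Deadlock victim: try again
        ELSE
            THROW;                          -- Anything else (or out of retries): re-raise
    END CATCH
END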

🚀 Performance Tuning

Dynamic Management Views (DMVs)

DMVs are SQL Server's performance monitoring goldmine, providing real-time insights into server health, query performance, and resource usage. They expose internal metadata about the system's operation, from currently executing queries to historical performance patterns. DMVs reset when SQL Server restarts, so consider persisting important metrics for trend analysis. They're essential for troubleshooting performance issues and understanding workload patterns.

-- Performance related
sys.dm_exec_requests           -- Current requests
sys.dm_exec_sessions           -- Current sessions
sys.dm_exec_connections        -- Current connections
sys.dm_os_wait_stats          -- Wait statistics
sys.dm_io_virtual_file_stats  -- I/O statistics
sys.dm_db_index_usage_stats   -- Index usage
sys.dm_db_missing_index_*     -- Missing indexes

-- Query performance
SELECT TOP 10
    qs.total_elapsed_time / qs.execution_count AS avg_time,
    SUBSTRING(st.text, (qs.statement_start_offset/2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)   -- -1 means "to end of batch"
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset)/2) + 1) AS query
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
ORDER BY avg_time DESC;

✅ Performance Best Practices

  1. Parameter Sniffing: Use OPTION (RECOMPILE) or OPTIMIZE FOR UNKNOWN (see the sketch after this list)
  2. Statistics: Keep updated, consider auto-update settings
  3. Tempdb: Multiple data files (1 per 4 cores, max 8)
  4. Memory: Set max server memory appropriately
  5. Parallelism: Configure MAXDOP and cost threshold
  6. Query Store: Enable for performance history (2016+)
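
A sketch of the parameter-sniffing options from item 1 (Orders and @cust are hypothetical):

DECLARE @cust INT = 42;

SELECT * FROM Orders WHERE customer_id = @cust
OPTION (RECOMPILE);                          -- Fresh plan per execution

SELECT * FROM Orders WHERE customer_id = @cust
OPTION (OPTIMIZE FOR UNKNOWN);               -- Plan for an "average" value

SELECT * FROM Orders WHERE customer_id = @cust
OPTION (OPTIMIZE FOR (@cust = 42));          -- Plan for a known representative value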

Query Store (2016+)

Query Store is SQL Server's built-in performance monitoring solution, often called "flight recorder for queries." It automatically captures query plans, runtime statistics, and wait stats over time, persisting them across restarts. This allows you to identify performance regressions, force good plans when the optimizer makes poor choices, and analyze workload trends. It's particularly valuable for troubleshooting parameter sniffing issues and plan regression after upgrades.

-- Enable Query Store
ALTER DATABASE db SET QUERY_STORE = ON;

-- Query Store views
sys.query_store_query           -- Query text and metadata
sys.query_store_query_text      -- Actual query text
sys.query_store_plan            -- Execution plans
sys.query_store_runtime_stats   -- Runtime performance metrics

-- Force a plan (useful when optimizer chooses poorly)
EXEC sp_query_store_force_plan @query_id = 1, @plan_id = 1;
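
A minimal sketch for finding the slowest queries in Query Store (it averages interval averages, so treat the numbers as approximate):

SELECT TOP 10
    q.query_id,
    qt.query_sql_text,
    AVG(rs.avg_duration) AS avg_duration_us      -- Microseconds
FROM sys.query_store_query q
JOIN sys.query_store_query_text qt ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan p ON q.query_id = p.query_id
JOIN sys.query_store_runtime_stats rs ON p.plan_id = rs.plan_id
GROUP BY q.query_id, qt.query_sql_text
ORDER BY avg_duration_us DESC;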

🔒 Security & Permissions

Permission Management

SQL Server uses a hierarchical permission system where permissions can be granted at multiple levels (server, database, schema, object). GRANT gives permissions, DENY explicitly blocks them (overriding GRANTs), and REVOKE removes previous GRANT or DENY statements. Always follow the principle of least privilege - grant only the minimum permissions necessary for users to perform their tasks.

-- Grant permissions
GRANT SELECT, INSERT ON schema.table TO user;
GRANT EXECUTE ON procedure TO role;

-- Deny/Revoke
DENY DELETE ON table TO user;
REVOKE INSERT ON table FROM user;

-- Check permissions
SELECT * FROM fn_my_permissions('schema.table', 'OBJECT');
SELECT HAS_PERMS_BY_NAME('schema.table', 'OBJECT', 'SELECT');

Row-Level Security (2016+)

Row-Level Security (RLS) automatically filters rows based on user characteristics, implementing data access policies at the database level rather than in application code. Security predicates are inline table-valued functions that return which rows a user can access. This is perfect for multi-tenant applications where different users should see different subsets of data in the same table.

-- Create security policy
CREATE FUNCTION fn_security(@user_id INT)
RETURNS TABLE
WITH SCHEMABINDING
AS RETURN SELECT 1 AS result
WHERE @user_id = USER_ID();

CREATE SECURITY POLICY policy_name
ADD FILTER PREDICATE fn_security(user_id) ON table
WITH (STATE = ON);
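
To verify a policy, impersonate a user (assumes a database user such as 'app_user' exists):

EXECUTE AS USER = 'app_user';
SELECT * FROM table;        -- Returns only rows the filter predicate allows
REVERT;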

Dynamic Data Masking (2016+)

Dynamic Data Masking (DDM) obfuscates sensitive data in query results for non-privileged users without changing the actual stored data. It's a presentation-layer security feature that helps prevent unauthorized exposure of sensitive data. Users with UNMASK permission see the real data. Common use cases include masking credit card numbers, SSNs, and email addresses in development/test environments.

-- Add masking
ALTER TABLE table
ALTER COLUMN ssn ADD MASKED WITH (FUNCTION = 'partial(0,"XXX-XX-",4)');

ALTER TABLE table
ALTER COLUMN email ADD MASKED WITH (FUNCTION = 'email()');

-- Masking functions: default(), email(), random(start, end), partial()
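
Controlling who sees real data (analyst_role is a hypothetical role):

GRANT UNMASK TO analyst_role;       -- Members now see unmasked values
REVOKE UNMASK FROM analyst_role;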

📊 System Views & DMVs

Catalog Views

Catalog views provide metadata about database objects and are the recommended way to access system metadata. They're more comprehensive and consistent than older INFORMATION_SCHEMA views. These views are essential for building administrative scripts, documenting databases, and understanding object relationships. They return information specific to the current database context.

View              Description
sys.tables        All tables
sys.columns       All columns
sys.indexes       All indexes
sys.foreign_keys  Foreign keys
sys.procedures    Stored procedures
sys.views         Views
sys.triggers      Triggers
sys.stats         Statistics objects
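
A typical metadata join, listing columns and types for one table (dbo.Employee as an example):

SELECT c.name AS column_name, ty.name AS type_name, c.max_length, c.is_nullable
FROM sys.columns c
JOIN sys.types ty ON c.user_type_id = ty.user_type_id
WHERE c.object_id = OBJECT_ID('dbo.Employee');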

Useful Metadata Queries

These queries combine system views to provide actionable insights about your database. The table size query helps identify large tables for optimization, while the dependency query is crucial for understanding the impact of schema changes. Regular monitoring of these metrics helps maintain database health and performance.

-- Table sizes
SELECT 
    t.name AS TableName,
    SUM(CASE WHEN i.index_id IN (0, 1) THEN p.rows ELSE 0 END) AS RowCounts,  -- Count rows once (heap or clustered index only)
    SUM(a.total_pages) * 8 AS TotalSpaceKB
FROM sys.tables t
INNER JOIN sys.indexes i ON t.object_id = i.object_id
INNER JOIN sys.partitions p ON i.object_id = p.object_id AND i.index_id = p.index_id
INNER JOIN sys.allocation_units a ON p.partition_id = a.container_id
GROUP BY t.name;

-- Find dependencies
SELECT 
    referencing_entity_name = o.name,
    referenced_entity_name = r.referenced_entity_name
FROM sys.sql_expression_dependencies r
JOIN sys.objects o ON r.referencing_id = o.object_id;

💡 Additional Tips & Tricks

Temporal Tables (2016+)

Temporal tables automatically track the full history of data changes, providing "time travel" capabilities to query data as it existed at any point in time. SQL Server manages a history table automatically, moving old versions there whenever data changes. This is invaluable for audit trails, recovering from accidental updates, analyzing trends, and meeting compliance requirements. No trigger code needed!

-- Create temporal table
CREATE TABLE Employee (
    ID INT PRIMARY KEY,
    Name VARCHAR(100),
    ValidFrom DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo DATETIME2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
) WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.EmployeeHistory));

-- Query historical data
SELECT * FROM Employee
FOR SYSTEM_TIME AS OF '2023-01-01';

Useful System Functions

These system functions provide crucial runtime information about your queries and environment. Understanding the difference between @@IDENTITY, SCOPE_IDENTITY(), and IDENT_CURRENT() is critical for correctly retrieving inserted identity values. NEWSEQUENTIALID() generates GUIDs optimized for indexing, reducing page splits compared to random NEWID() values.

Function                 Description
@@IDENTITY               Last identity in session, any scope (dangerous with triggers)
SCOPE_IDENTITY()         Last identity in current scope (safer than @@IDENTITY)
IDENT_CURRENT('table')   Current identity for a specific table (any session)
@@ROWCOUNT               Rows affected by last statement
@@ERROR                  Error number of last statement (0 = success)
DB_NAME()                Current database name
NEWID()                  Random GUID (causes fragmentation in indexes)
NEWSEQUENTIALID()        Sequential GUID (better for clustered indexes)
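
Why SCOPE_IDENTITY() is usually the right call, as a sketch (dbo.Orders is hypothetical and assumed to have an identity column):

INSERT INTO dbo.Orders (customer_id) VALUES (42);
SELECT SCOPE_IDENTITY() AS new_order_id;    -- Unaffected by identity inserts in triggers
SELECT @@IDENTITY;                          -- May return an identity generated by a trigger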
-comment">-- Query historical data SELECT * FROM Employee FOR SYSTEM_TIME AS OF '2023-01-01';

Useful System Functions

Function Description
@@IDENTITY Last identity in session
SCOPE_IDENTITY() Last identity in scope
IDENT_CURRENT('table') Current identity for table
@@ROWCOUNT Rows affected by last statement
@@ERROR Error number of last statement
DB_NAME() Current database
NEWID() New GUID
NEWSEQUENTIALID() Sequential GUID

📝 Quick Reference - Common Tasks

Finding Duplicates

This pattern uses ROW_NUMBER() to identify and remove duplicates while keeping one instance of each duplicate group. The CTE makes it easy to preview what will be deleted before executing. Always backup data before bulk deletes! The PARTITION BY defines what constitutes a duplicate, while ORDER BY determines which duplicate to keep.

WITH CTE AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY col ORDER BY id) AS rn
    FROM table
)
DELETE FROM CTE WHERE rn > 1;  -- Delete duplicates, keeping first occurrence

Running Totals

Window functions with frame clauses efficiently calculate running aggregates without self-joins. The ROWS UNBOUNDED PRECEDING frame includes all rows from the partition start through the current row. This approach is much faster than correlated subqueries and works with any aggregate function (SUM, AVG, COUNT, etc.).

SELECT 
    date,
    amount,
    SUM(amount) OVER (ORDER BY date ROWS UNBOUNDED PRECEDING) AS running_total
FROM transactions;

Gaps and Islands

The gaps and islands problem involves finding missing values (gaps) or consecutive ranges (islands) in sequences. This pattern uses LAG() to compare each row with its predecessor, identifying where the sequence breaks. Common uses include finding missing invoice numbers, available time slots, or consecutive date ranges.

-- Find gaps in sequence
WITH CTE AS (
    SELECT id, LAG(id) OVER (ORDER BY id) AS prev_id
    FROM table
)
SELECT prev_id + 1 AS gap_start, id - 1 AS gap_end
FROM CTE
WHERE id - prev_id > 1;
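
The islands variant: rows whose id minus their row number share a value belong to one consecutive run:

-- Find islands (consecutive ranges)
WITH CTE AS (
    SELECT id, id - ROW_NUMBER() OVER (ORDER BY id) AS grp   -- Constant within a run
    FROM table
)
SELECT MIN(id) AS island_start, MAX(id) AS island_end
FROM CTE
GROUP BY grp;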

Dynamic SQL

Dynamic SQL builds and executes SQL statements at runtime, enabling flexible queries where table names, column names, or complex conditions are determined programmatically. Always use sp_executesql with parameters to prevent SQL injection. QUOTENAME() safely escapes object names. Use sparingly as it prevents plan reuse and complicates debugging.

DECLARE @sql NVARCHAR(MAX);
SET @sql = N'SELECT * FROM ' + QUOTENAME(@tableName);  -- QUOTENAME prevents injection
EXEC sp_executesql @sql;

-- With parameters (safer)
SET @sql = N'SELECT * FROM Users WHERE Age > @age';
EXEC sp_executesql @sql, N'@age INT', @age = 18;

📌 Note

Features marked with version numbers (e.g., 2016+) require that SQL Server version or later.