SQL Basics
SQL Basics
PostgreSQL, MySQL, SQLite, SQL Server, Oracle — they all share similar core syntax. Learn it once and we keep using it for life.
1. About SQL
SQL stands for Structured Query Language. In 1974, Donald Chamberlin and Raymond Boyce of IBM published it under the name SEQUEL, which was later renamed to SQL due to a trademark conflict. ANSI standardized it as SQL-86 in 1986, ISO followed in 1987, and updates have continued since.
| Standard | Year | Key additions |
|---|---|---|
| SQL-86 / 89 | 1986–89 | Basic syntax |
| SQL-92 | 1992 | JOIN types, tightened standardization |
| SQL:1999 | 1999 | OLAP, triggers, CTE |
| SQL:2003 | 2003 | XML, window functions |
| SQL:2008 | 2008 | TRUNCATE, MERGE |
| SQL:2016 | 2016 | JSON |
DBMSes differ in standard compliance and dialect. Some places diverge in syntax itself, like PostgreSQL's ON CONFLICT versus MySQL's ON DUPLICATE KEY UPDATE.
2. DDL, DML, DCL, TCL
| Category | Meaning | Commands |
|---|---|---|
| DDL (Data Definition) | Schema definition | CREATE, ALTER, DROP, TRUNCATE |
| DML (Data Manipulation) | Data manipulation | SELECT, INSERT, UPDATE, DELETE |
| DCL (Data Control) | Permissions | GRANT, REVOKE |
| TCL (Transaction Control) | Transactions | BEGIN, COMMIT, ROLLBACK |
Some DDL behavior inside transactions varies by DBMS. PostgreSQL allows almost all DDL to be rolled back inside a transaction, while MySQL and Oracle trigger implicit commits for some DDL.
3. CREATE TABLE
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
email TEXT NOT NULL UNIQUE,
name TEXT NOT NULL,
age INT CHECK (age >= 0),
active BOOLEAN DEFAULT TRUE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_users_email ON users (email);
Data types differ slightly across DBMSes. The frequent ones are INT, BIGINT, TEXT, VARCHAR(n), BOOLEAN, DATE, TIMESTAMP, TIMESTAMPTZ, JSONB.
4. The clauses of SELECT
SELECT name, COUNT(*) AS cnt
FROM orders
WHERE status = 'paid'
AND created_at >= '2025-01-01'
GROUP BY name
HAVING COUNT(*) >= 5
ORDER BY cnt DESC
LIMIT 10
OFFSET 0;
Logical processing order differs from writing order.
FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT/OFFSET
Knowing this order resolves traps like why aliases cannot be used in WHERE but work in ORDER BY.
Common shapes for the WHERE clause.
WHERE age BETWEEN 18 AND 30
WHERE name LIKE 'Lee%' -- starts with 'Lee'
WHERE name ILIKE '%lee%' -- case-insensitive (Postgres)
WHERE id IN (1, 2, 3)
WHERE id NOT IN (SELECT user_id FROM banned)
WHERE email IS NULL
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = users.id)
5. The five JOINs
Table A Table B
+---+------+ +-----+------+
| 1 | foo | | 1 | x |
| 2 | bar | | 3 | y |
+---+------+ +-----+------+
| JOIN | Result |
|---|---|
INNER JOIN |
Matches on both sides only. (1, foo, x) |
LEFT JOIN |
All of left, NULL where right is missing. (1, foo, x), (2, bar, NULL) |
RIGHT JOIN |
All of right, NULL where left is missing. (1, foo, x), (NULL, NULL, y) |
FULL OUTER JOIN |
All of both |
CROSS JOIN |
Cartesian product. m × n rows |
SELECT u.name, o.amount
FROM users u
INNER JOIN orders o ON o.user_id = u.id
WHERE o.status = 'paid';
USING (col) is a short form when column names match.
6. CTE (WITH)
WITH paid_users AS (
SELECT DISTINCT user_id FROM orders WHERE status = 'paid'
)
SELECT u.name
FROM users u
JOIN paid_users p ON p.user_id = u.id;
Recursive CTEs are useful for trees and hierarchies.
WITH RECURSIVE org AS (
SELECT id, name, manager_id, 1 AS level
FROM employees WHERE manager_id IS NULL
UNION ALL
SELECT e.id, e.name, e.manager_id, o.level + 1
FROM employees e JOIN org o ON e.manager_id = o.id
)
SELECT * FROM org ORDER BY level;
CTE is part of the SQL:1999 standard, and MySQL supports it from 8.0+.
7. INSERT, UPDATE, DELETE, UPSERT
INSERT INTO users (email, name) VALUES ('a@b.com', 'lee');
UPDATE users SET active = FALSE WHERE last_login < '2024-01-01';
DELETE FROM users WHERE active = FALSE;
-- UPSERT (Postgres)
INSERT INTO users (email, name) VALUES ('a@b.com', 'lee')
ON CONFLICT (email) DO UPDATE SET name = EXCLUDED.name;
Forgetting WHERE on UPDATE or DELETE affects every row. This is the most frequent accident on production DBs. The habit of running the same condition as a SELECT first inside a transaction (BEGIN; UPDATE ...; COMMIT;) helps.
8. Transactions and isolation levels
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT; -- or ROLLBACK
ACID — Atomicity, Consistency, Isolation, Durability. Isolation has four levels.
READ UNCOMMITTED < READ COMMITTED < REPEATABLE READ < SERIALIZABLE
Defaults differ by DBMS. Postgres uses READ COMMITTED, MySQL/InnoDB uses REPEATABLE READ.
9. NULL's three-valued logic
NULL means "no value." A comparison with NULL yields UNKNOWN rather than TRUE or FALSE.
SELECT NULL = NULL; -- NULL (UNKNOWN)
SELECT NULL = 1; -- NULL
SELECT NULL <> NULL; -- NULL
SELECT 1 IS NULL; -- FALSE
SELECT NULL IS NULL; -- TRUE
The WHERE clause only passes rows where the condition is TRUE. WHERE x = NULL will never return a row. Always use IS NULL or IS NOT NULL.
COUNT(*) counts NULL too, but COUNT(col) excludes rows where that column is NULL. AVG(col) also excludes NULL rows from both numerator and denominator.
10. Common pitfalls
UPDATE/DELETE without WHERE — a frequent operational accident. Always validate with SELECT first.
Missing column in GROUP BY — non-aggregated columns in SELECT must also appear in GROUP BY.
LIMIT without ORDER BY — different rows may come back each run.
N+1 queries — when application code repeats the same query pattern inside a loop. Fetch in one go via JOIN or IN clause.
Queries that bypass the index — operations like WHERE func(col) = ... neutralize indexes. Build an expression index or change the condition.
SELECT * — spell out columns in production code so schema changes do not break things.
Dialect differences — NOW() vs CURRENT_TIMESTAMP, LIMIT vs TOP. Search the first time we meet them.
11. Try it once
To try it from the command line, Docker spins up PostgreSQL in one line.
docker run --name pg -e POSTGRES_PASSWORD=secret -p 5432:5432 -d postgres:16
docker exec -it pg psql -U postgres
SQLite has almost no installation overhead.
sqlite3 test.db
Closing thoughts
SQL is over 30 years old and still the first choice for new systems. The standard is well defined and the tools and resources are deep. Once we get used to it, 80% of the knowledge carries over even when the DBMS changes.
Next
- postgres-first
- postgres-deep
References: PostgreSQL official tutorial, Use The Index, Luke!, SQLBolt, Codd's 1970 paper.