summaryrefslogtreecommitdiff
path: root/sql-indent.org
blob: b93d80433a35ac3053e0e9153e94453f93007854 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634

    sql-indent.el -- syntax based indentation for SQL files for GNU Emacs

*NOTE* This file is formatted as an Emacs Org file.  If you open it in GNU
Emacs, you will be able to open-close sections of the file, which will make
navigation easier.

* Overview

sql-indent.el is a GNU Emacs minor mode which adds support for syntax-based
indentation when editing SQL code: TAB indents the current line based on the
syntax of the SQL code on previous lines.  This works like the indentation for
C and C++ code.

The package also defines align rules so that the ~align~ function works for
SQL statements, see ~sqlind-align-rules~ for which rules are defined.  This
can be used to align multiple lines around equal signs or "as" statements.
Here is an example of alignment rules:

#+BEGIN_SRC sql
  update my_table
     set col1_has_a_long_name = value1,
         col2_is_short        = value2,
         col3_med             = v2,
         c4                   = v5
   where cond1 is not null;

  select long_colum as lcol,
         scol       as short_column,
         mcolu      as mcol,
    from my_table;
#+END_SRC

To use this feature, select the region you want to align and type:

#+BEGIN_SRC text
  M-x align RET
#+END_SRC

~sqlind-minor-mode~ together with the align rules can assist in writing tidy
SQL code or formatting existing SQL code.  The indentation rules are
customizable, so you can adapt it to match your own indentation preferences.

* Installation

To install this package, open the file ~sql-indent.el~ in Emacs and type

#+BEGIN_SRC text
  M-x package-install-from-buffer RET
#+END_SRC

The syntax-based indentation of SQL code can be turned ON/OFF at any time by
enabling or disabling ~sqlind-minor-mode~:

#+BEGIN_SRC text
  M-x sqlind-minor-mode RET
#+END_SRC

To enable syntax-based indentation for every SQL buffer, you can add
~sqlind-minor-mode~ to ~sql-mode-hook~.  First, bring up the customization
buffer using the command:

#+BEGIN_SRC text
  M-x customize-variable RET sql-mode-hook RET
#+END_SRC
    
Than, click on the "INS" button to add a new entry and put "sqlind-minor-mode"
in the text field.

* Customization

The sql-indent.el package allows customizing the indentation rules to suit
your personal preferences or site specific coding styles.  To create a set of
customized rules, you will need to have basic familiarity with how indentation
works.  See also the "A Simple Example" section below and the
~sql-indent-left.el~ file, which demonstrate how to set up custom indentation
rules.  The sections below contain the rest of the details.

The indentation process happens in two separate phases: first syntactic
information is determined about the line, than the line is indented based on
that syntactic information.  The syntactic parse is not expected to change
often, since it deals with the structure of the SQL code, however, indentation
is a personal preference, and can be easily customized.

Two variables control the way code is indented: ~sqlind-basic-offset~ and
~sqlind-indentation-offsets-alist~.  To customize these variables, you need to
create a function that sets custom values and add it to ~sql-mode-hook~.

** A Simple Example

The default indentation rules will align to the right all the keywords in a
SELECT statement, like this:

#+BEGIN_SRC sql
  select c1, c2
    from t1
   where c3 = 2
#+END_SRC

If you prefer to have them aligned to the left, like this:

#+BEGIN_SRC sql
  select c1, c2
  from t1
  where c3 = 2
#+END_SRC

You can add the following code to your init file:

#+BEGIN_SRC emacs-lisp
  (require 'sql-indent)

  ;; Update indentation rules, select, insert, delete and update keywords
  ;; are aligned with the clause start

  (defvar my-sql-indentation-offsets-alist
    `((select-clause 0)
      (insert-clause 0)
      (delete-clause 0)
      (update-clause 0)
      ,@sqlind-default-indentation-offsets-alist))

  (add-hook 'sqlind-minor-mode-hook
      (lambda ()
         (setq sqlind-indentation-offsets-alist
               my-sql-indentation-offsets-alist)))
#+END_SRC

** Customization Basics

To customize indentation, you will need to provide a new value for the
~sqlind-indentation-offsets-alist~ variable.  The variable is made buffer
local each time it is set, so you need to set it inside the ~sql-mode-hook~.
The variable specifies how each syntactic symbol should be indented.  Since
only a few symbols need to be updated, the usual way to update it is to
"extend" the value of ~sqlind-default-indentation-offsets-alist~, like so:

#+BEGIN_SRC emacs-lisp
  (defvar my-sql-indentation-offsets-alist
    `( ;; put new syntactic symbols here, and add the default ones at the end.
       ;; If there is no value specified for a syntactic symbol, the default
       ;; will be picked up.
      ,@sqlind-default-indentation-offsets-alist))

  ;; Arrange for the new indentation offset to be set up for each SQL buffer.
  (add-hook 'sqlind-minor-mode-hook
            (lambda ()
              (setq sqlind-indentation-offsets-alist
                    my-sql-indentation-offsets-alist)))
#+END_SRC

The simplest way to adjust the indentation is to explore the syntactic
information using ~sqlind-show-syntax-of-line~.  To use it, move the cursor to
the line you would like to indent and type:

#+BEGIN_SRC text
M-x sqlind-show-syntax-of-line RET
#+END_SRC

A message like the one below will be shown in the messages buffer:

#+BEGIN_SRC text
((select-clause . 743) (statement-continuation . 743))
#+END_SRC

The first symbol displayed is the syntactic symbol used for indentation, in
this case ~select-clause~.  The syntactic symbols are described in a section
below, however, for now, this is the symbol that will need to be updated in
~sqlind-indentation-offsets-alist~.  The number next to it represents the
anchor, or reference position in the buffer where the current statement
starts.  The anchor and is useful if you need to write your own indentation
functions.

To customize indentation for this type of statement, add an entry in the
~sqlind-indentation-offsets-alist~, for the syntactic symbol shown, with
information about how it should be indented.  This information is a list
containing *indentation control items* (these are described below).

For example, to indent keyword in SELECT clauses at the same level as the
keyword itself, we use a number which is added to the indentation level of the
anchor, in this case, 0:

#+BEGIN_SRC text
(select-clause 0)
#+END_SRC

To indent it at ~sqlind-basic-offset~ plus one more space, use:

#+BEGIN_SRC text
(select-clause + 1)
#+END_SRC

To right-justify the keyword w.r.t the SELECT keyword, use:

#+BEGIN_SRC text
(select-clause sqlind-right-justify-clause)
#+END_SRC

The default value for ~sqlind-indentation-offsets-alist~ contains many
examples for indentation setup rules.

** Indentation control items

~sqlind-calculate-indentation~ is the function that calculates the indentation
offset to use, based on the contents of ~sqlind-indentation-offsets-alist~.
The indentation offset starts with the indentation column of the ANCHOR point
and it is adjusted based on the following items:

 * a ~NUMBER~ -- the NUMBER will be added to the indentation offset.

 * ~+~ -- the current indentation offset is incremented by
   ~sqlind-basic-offset~

 * ~++~ -- the current indentation offset is indentation by ~2 *
   sqlind-basic-offset~

 * ~-~ -- the current indentation offset is decremented by
   ~sqlind-basic-offset~

 * ~--~ -- the current indentation offset is decremented by 2 *
   ~sqlind-basic-offset~

 * a ~FUNCTION~ -- the syntax and current indentation offset is passed to the
   function and its result is used as the new indentation offset.  This can be
   used to further customize indentation.

*** Indentation Helper Functions

The following helper functions are available as part of the package and can be
used as the FUNCTION part in the ~sqlind-indentation-offsets-alist~

**** sqlind-use-anchor-indentation

discard the current offset and returns the indentation column of the ANCHOR
  
**** sqlind-lineup-to-anchor

discard the current offset and returns the column of the anchor point, which
may be different than the indentation column of the anchor point.

**** sqlind-use-previous-line-indentation

discard the current offset and returns the indentation column of the previous
line

**** sqlind-lineup-open-paren-to-anchor

if the line starts with an open parenthesis, discard the current
offset and return the column of the anchor point.

**** sqlind-lone-semicolon

if the line contains a single semicolon ';', use the value of
~sqlind-use-anchor-indentation~

**** sqlind-adjust-operator

if the line starts with an arithmetic operator (like ~+~ , ~-~, or ~||~), line
it up so that the right hand operand lines up with the left hand operand of
the previous line.  For example, it will indent the ~||~ operator like this:

#+BEGIN_SRC sql
select col1, col2
          || col3 as composed_column, -- align col3 with col2
       col4
    || col5 as composed_column2
from   my_table
where  cond1 = 1
and    cond2 = 2;
#+END_SRC

**** sqlind-left-justify-logical-operator

If the line starts with a logic operator (AND, OR NOT), line the operator with
the start of the WHERE clause.  This rule should be added to the
~in-select-clause~ syntax after the ~sqlind-lineup-to-clause-end~ rule.

**** sqlind-right-justify-logical-operator

If the line starts with a logic operator (AND, OR NOT), line the operator with
the end of the WHERE clause. This rule should be added to the
~in-select-clause~ syntax.
  
#+BEGIN_SRC sql
select *
  from table
 where a = b
   and c = d; -- AND clause sits under the where clause
#+END_SRC

**** sqlind-adjust-comma

if the line starts with a comma, adjust the current offset so that the line is
indented to the first word character.  For example, if added to a
~select-column~ syntax indentation rule, it will indent as follows:

#+BEGIN_SRC sql
select col1
   ,   col2 -- align "col2" with "col1"
from my_table;
#+END_SRC

**** sqlind-lineup-into-nested-statement

discard the current offset and return the column of the first word inside a
nested statement.  This rule should be added to
~nested-statement-continuation~ syntax indentation rule, and will indent as
follows:

#+BEGIN_SRC sql
(    a,
     b  -- b is aligned with a
)
#+END_SRC

*** More Indentation Helper Functions
The following function contain indentation code specific to various SQL
statements.  Have a look at their doc strings for what they do:

 * ~sqlind-indent-comment-start~, ~sqlind-indent-comment-continuation~

 * ~sqlind-indent-select-column~

 * ~sqlind-indent-select-table~

 * ~sqlind-lineup-to-clause-end~

 * ~sqlind-right-justify-clause~

 * ~sqlind-lineup-joins-to-anchor~

** Syntactic Symbols

The SQL parsing code returns a syntax definition (either a symbol or a
list) and an anchor point, which is a buffer position.  The syntax symbols can
be used to define how to indent each line.  The following syntax symbols are
defined for SQL code:

 * ~(syntax-error MESSAGE START END)~ -- this is returned when the parse
   failed.  MESSAGE is an informative message, START and END are buffer
   locations denoting the problematic region.  ANCHOR is undefined for this
   syntax info

 * ~in-comment~ -- line is inside a multi line comment, ANCHOR is the start of
   the comment.

 * ~comment-start~ -- line starts with a comment.  ANCHOR is the start of the
   enclosing block.

 * ~in-string~ -- line is inside a string, ANCHOR denotes the start of the
   string.

 * ~toplevel~ -- line is at toplevel (not inside any programming construct).
   ANCHOR is usually (point-min).

 * ~(in-block BLOCK-KIND LABEL)~ -- line is inside a block construct.
   BLOCK-KIND (a symbol) is the actual block type and can be one of "if",
   "case", "exception", "loop" etc.  If the block is labeled, LABEL contains
   the label.  ANCHOR is the start of the block.

 * ~(in-begin-block KIND LABEL)~ -- line is inside a block started by a begin
   statement.  KIND (a symbol) is "toplevel-block" for a begin at toplevel,
   "defun" for a begin that starts the body of a procedure or function,
   \"package\" for a begin that starts the body of a package, nil for a begin
   that is none of the previous.  For a "defun" or "package", LABEL is the
   name of the procedure, function or package, for the other block types LABEL
   contains the block label, or the empty string if the block has no label.
   ANCHOR is the start of the block.

 * ~(block-start KIND)~ -- line begins with a statement that starts a block.
   KIND (a symbol) can be one of "then", "else" or "loop".  ANCHOR is the
   reference point for the block start (the corresponding if, case, etc).

 * ~(block-end KIND LABEL)~ -- the line contains an end statement.  KIND (a
   symbol) is the type of block we are closing, LABEL (a string) is the block
   label (or procedure name for an end defun).

 * ~declare-statement~ -- line is after a declare keyword, but before the
   begin.  ANCHOR is the start of the declare statement.

 * ~(package NAME)~ -- line is inside a package definition.  NAME is the name
   of the package, ANCHOR is the start of the package.

 * ~(package-body NAME)~ -- line is inside a package body.  NAME is the name
   of the package, ANCHOR is the start of the package body.

 * ~(create-statement WHAT NAME)~ -- line is inside a CREATE statement (other
   than create procedure or function).  WHAT is the thing being created, NAME
   is its name.  ANCHOR is the start of the create statement.

 * ~(defun-start NAME)~ -- line is inside a procedure of function definition
   but before the begin block that starts the body.  NAME is the name of the
   procedure/function, ANCHOR is the start of the procedure/function
   definition.

The following SYNTAX-es are for SQL statements.  For all of them ANCHOR points
to the start of a statement itself.

 * ~labeled-statement-start~ -- line is just after a label.

 * ~statement-continuation~ -- line is inside a statement which starts on a
   previous line.

 * ~nested-statement-open~ -- line is just inside an opening bracket, but the
  actual bracket is on a previous line.

 * ~nested-statement-continuation~ -- line is inside an opening bracket, but
   not the first element after the bracket.

 * ~nested-statement-close~ line is inside an opening bracket and the line
   contains the closing bracket as the first character.

The following SYNTAX-es are for statements which are SQL code (DML
statements).  They are specializations on the previous statement syntaxes and
for all of them a previous generic statement syntax is present earlier in the
SYNTAX list.  Unless otherwise specified, ANCHOR points to the start of the
clause (select, from, where, etc) in which the current point is.

 * ~with-clause~ -- line is inside a WITH clause, but before the main SELECT
   clause.

 * ~with-clause-cte~ -- line is inside a with clause before a CTE (common
   table expression) declaration

 * ~with-clause-cte-cont~ -- line is inside a with clause before a CTE
   definition

 * ~case-clause~ -- line is on a CASE expression (WHEN or END clauses).
   ANCHOR is the start of the CASE expression.

 * ~case-clause-item~ -- line is on a CASE expression (THEN and ELSE clauses).
   ANCHOR is the position of the case clause.

 * ~case-clause-item-cont~ -- line is on a CASE expression but not on one of
   the CASE sub-keywords.  ANCHOR points to the case keyword that this line is
   a continuation of.

 * ~select-clause~ -- line is inside a select statement, right before one of
   its clauses (from, where, order by, etc).

 * ~select-column~ -- line is inside the select column section, after a full
   column was defined (and a new column definition is about to start).

 * ~select-column-continuation~ -- line is inside the select column section,
   but in the middle of a column definition.  The defined column starts on a
   previous like.  Note that ANCHOR still points to the start of the select
   statement itself.

 * ~select-join-condition~ -- line is right before or just after the ON clause
   for an INNER, LEFT or RIGHT join.  ANCHOR points to the join statement for
   which the ON is defined.

 * ~select-table~ -- line is inside the from clause, just after a table was
   defined and a new one is about to start.

 * ~select-table-continuation~ -- line is inside the from clause, inside a
   table definition which starts on a previous line. ANCHOR still points to
   the start of the table definition.

 * ~(in-select-clause CLAUSE)~ -- line is inside the select CLAUSE, which can
   be "where", "order by", "group by" or "having".  Note that CLAUSE can never
   be "select" and "from", because we have special syntaxes inside those
   clauses.

 * ~insert-clause~ -- line is inside an insert statement, right before one of
   its clauses (values, select).

 * ~(in-insert-clause CLAUSE)~ -- line is inside the insert CLAUSE, which can
   be "insert into" or "values".

 * ~delete-clause~ -- line is inside a delete statement right before one of
   its clauses.

 * ~(in-delete-clause CLAUSE)~ -- line is inside a delete CLAUSE, which can be
   "delete from" or "where".

 * ~update-clause~ -- line is inside an update statement right before one of
   its clauses.

 * ~(in-update-clause CLAUSE)~ -- line is inside an update CLAUSE, which can
   be "update", "set" or "where"
* Limitations

The sql-indent package does not contain a full SQL parser, instead it relies
on various heuristics to determine the context of each line in a SQL program.
Relying on heuristics means that sometimes valid SQL code is not detected
correctly, and therefore the indentation will be wrong.

This section contains some of the known cases that are incorrectly detected,
and provides some workarounds for them.

** Parsing expressions

There is no support for parsing SQL expressions, so if an expression is broken
over several lines, sql-indent.el will consider all lines to be
~statement-continuation~ lines.  The exception is that bracketed expressions
are identified correctly so they can be used for indentation.

The examples below summarize what is supported and what is not, as well as the
workarounds:

#+BEGIN_SRC sql
  -- SUPPORTED: case expression immediately after assignment
  var := case ind
         when 1 then 'Guy'
         when 2 then 'Abc'
         when 3 then 'Def'
         else 'World'
         end case;

  -- NOT SUPPORTED: any complex expression involving a case expression.  entire
  -- expression is a 'statement-continuation
  var := 'abc'
    || case ind
    when 1 then 'Guy'
    when 2 then 'Abc'
    when 3 then 'Def'
    else 'World'
    end case;

  -- WORKAROUND: use brackets instead
  var := 'abc'
    || (case ind
        when 1 then 'Guy'
        when 2 then 'Abc'
        when 3 then 'Def'
        else 'World'
        end case);

  -- SUPPORTED: case expression as select column
  select col1,
         case ind
         when 1 then 'Guy'
         when 2 then 'Abc'
         when 3 then 'Def'
         else 'World'
         end case,
         col2,
    from some_table;

  -- NOT SUPPORTED: any complex expression involving a case expression in a
  -- select column.  Entire column is a 'select-column-continuation
  select col1,
         'abc' || case ind
           when 1 then 'Guy'
           when 2 then 'Abc'
           when 3 then 'Def'
           else 'World'
           end case,
         col2,
    from some_table;

  -- WORKAROUND: use brackets instead
  select col1,
         'abc' || (case ind
                   when 1 then 'Guy'
                   when 2 then 'Abc'
                   when 3 then 'Def'
                   else 'World'
                   end case),
         col2,
    from some_table;
#+END_SRC

** DECLARE statements in Postgres

The DECLARE statement in Postgres can start a block of declarations or declare
a single item.  Unfortunately, the sql-indent package is not always able to
differentiate between the two.  Curently, DECLARE blocks are only recognized
at the start of another enclosing block, such as $$, BEGIN, THEN or ELSE, but
they can occur on other situations.  The workaround is to enclose the DECLARE
block inside a BEGIN/END block or to use individual DECLARE statements.

For more info see https://github.com/alex-hhh/emacs-sql-indent/issues/92

DECLARE blocks are not recognized when the follow normal statements,
sql-indent considers them single statements instead.  In the example below,
DECLARE is considered a statement, and the ~local_b~ declaration is anchored
off the previous BEGIN statement:

#+BEGIN_SRC sql
  -- NOT SUPPORTED: `local_b` is not recognized as a declaration
  create function less_than(a text, b text) returns boolean as $$
    begin
      raise debug 'less_than(%, %)', a, b;
      declare
        local_a text := a;
      local_b text := b;
      begin
        return local_a < local_b;
      end;
    end;
  end;
  $$ language plpgsql;
#+END_SRC sql

The workaround is to either surround the DECLARE block with a BEGIN/END
statement, or to prefix each variable declaration with DECLARE, as in the two
examples below:

#+BEGIN_SRC sql
  -- WORKAROUND 1: surround the DECLARE/BEGIN/END with another BEGIN/END block
  create function less_than(a text, b text) returns boolean as $$
    begin
      raise debug 'less_than(%, %)', a, b;
      begin
        declare
          local_a text := a;
          local_b text := b;
        begin
          return local_a < local_b;
        end;
      end;
    end;
  end;
  $$ language plpgsql;

  -- WORKAROUND 2: declare each variable individually
  create function less_than(a text, b text) returns boolean as $$
    begin
      raise debug 'less_than(%, %)', a, b;
      declare
        local_a text := a;
      declare
        local_b text := b;
      begin
        return local_a < local_b;
      end;
    end;
  end;
  $$ language plpgsql;
#+END_SRC sql

.