summaryrefslogtreecommitdiff
path: root/scanner.texi
blob: bb99c901e51e02a50dfc8737dd7d919a7c587f59 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
\input texinfo   @c -*-texinfo-*-
@c %**start of header
@setfilename scanner.info
@c try the time-stamp package for version stuff
@set VERSION 0.3
@settitle Scanner Manual @value{VERSION}
@syncodeindex vr cp
@syncodeindex fn cp
@documentencoding UTF-8
@c %**end of header
@copying
This is the @emph{Scanner Manual}, corresponding to version @value{VERSION}.

Copyright @copyright{} 2021 Free Software Foundation, Inc.

@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the section entitled ``GNU
Free Documentation License''.

A copy of the license is also available from the Free Software
Foundation Web site at @url{https://www.gnu.org/licenses/fdl.html}.

@end quotation

The document was typeset with
@uref{http://www.texinfo.org/, GNU Texinfo}.

@end copying

@dircategory Emacs
@direntry
* Scanner: (scanner)Document and image scanning in GNU Emacs
@end direntry


@titlepage
@title Scanner
@subtitle Document and image scanning in GNU Emacs, version @value{VERSION}
@author Raffael Stocker
@page
@vskip 0pt plus 1filll
@insertcopying
@end titlepage

@c Output the table of the contents at the beginning.
@contents

@ifnottex
@node Top
@top Scanner


@insertcopying
@end ifnottex

@c TODO Organize the manual not as a list of features or options, but
@c instead logically by subtopics.  Address the user's questions, say
@c why a feature is there, what it is good for and how to use it to do a
@c certain job.

@c Generate the nodes for this menu with `C-c C-u C-a'.
@menu
* Overview::
* User Options::
* Improving Scan Quality::
* News::
* Reporting Bugs::
* GNU Free Documentation License::
* Index::
@end menu

@c Update all node entries with `C-c C-u C-n'.
@c Insert new nodes with `C-c C-c n'.
@node Overview
@chapter Overview
@cindex overview

This chapter gives provides you with the most important information to
get started using Scanner.

@menu
* Introduction::
* Basic Setup::
* Scanning Documents and Images::
@end menu

@node Introduction
@section Introduction
@cindex introduction

If you want to scan a document at high quality with @acronym{OCR,
optical character recognition} and not use one of the available free
GUI programs, there are several things you might have to do:
@itemize

@item
use a program like @command{scanimage} to obtain an image file from
your scanner,

@item
enhance the image quality using a post-processing tool like
@command{unpaper}, and

@item
generate a PDF or text file with OCR software like @command{tesseract}.
@end itemize

Although this is not difficult to do in principle, each of these
programs requires an elaborate incantation to produce adequate output:
@itemize

@item
the scan resolution must be set to something appropriate for later OCR
(usually 300 to 600 dpi,)

@item
the page size must be defined,

@item
perhaps some offsets must be added to page borders,

@item
the document may have to be rotated,

@item
some scan artifacts, like shadows, may have to be removed,

@item
the page may need de-skewing,

@item
the language for OCR must be selected,

@item
@dots{}
@end itemize

Luckily, many of these items change rarely or not at all.  Scanner
uses the customization system of GNU Emacs
(@pxref{Customization,,,emacs,The GNU Emacs Manual}) to remember the
necessary settings and takes care of processing using the
abovementioned programs.

@node Basic Setup
@section Basic Setup
@cindex basic setup
@cindex setup, basic
@cindex configuration, basic
@cindex installation

To get started with Scanner, make sure the following programs are
installed:
@table @command
@item scanimage
Scanimage comes with the sane-backends distribution, see
@url{http://sane-project.org/}. 

@item tesseract
Tesseract is used for OCR and PDF generation in document scans.  The
source is available at @url{https://github.com/tesseract-ocr/tesseract}.

@item unpaper
Unpaper is used for post-processing the scans obtained from
@command{scanimage} before feeding them into @command{tesseract}.  This
is optional, but highly recommended.  The source is available at
@url{https://github.com/unpaper/unpaper}. 
@end table

Tesseract is usually provided without the language data files as they
are very large.  The full set of language files is over 4@dmn{GB}.  Some
GNU/Linux distributions offer individual language packages; if yours
does not, you can download the language data files from
@url{https://github.com/tesseract-ocr/tessdata}.

Make sure the options @code{scanner-scanimage-program},
@code{scanner-tesseract-program}, and @code{scanner-unpaper-program} are
set correctly.  Also, the options @code{scanner-tessdata-dir} and
@code{scanner-tesseract-configdir} must be set correctly so
@command{tesseract} can find the language data files and output
configurations. 

Customize the basic options like @code{scanner-doc-papersize},
@code{scanner-resolution}, @code{scanner-tesseract-languages}, and
@code{scanner-tesseract-outputs}.  See @ref{User Options} for a detailed
discussion of all the available options.


@node Scanning Documents and Images
@section Scanning Documents and Images
@cindex scanning documents and images

The Scanner package provides two commands for scanning documents and
images.  These are described below.

@table @kbd
@item M-x scanner-scan-document
@itemx C-u M-x scanner-scan-document
@itemx C-u N M-x scanner-scan-document
@findex scanner-scan-document
Scan a document.  When called without a prefix argument, this command
will scan only one page.  When called with the default prefix argument
(as @kbd{C-u M-x scanner-scan-document}), it will ask after each scanned
page whether another pages should be scanned.  With a numeric prefix
argument, it will scan that many pages, waiting a number of seconds
between each page, as configured in @code{scanner-scan-delay}.

The scan will use the resolution configured in
@code{scanner-resolution} with the @code{:doc} key.

This command interactively reads a file name that will
be used as the base name of the output file(s).  The extension of the
file name is ignored as it is instead specified by the
@command{tesseract} output formats as configured with the option
@code{scanner-tesseract-outputs} or the command
@code{scanner-select-outputs}.
If the specified file already exists, @code{scanner-scan-document} will
ask for confirmation to overwrite it.

This command will trigger auto-detection if no device has been
configured.  If more than one device are available, it will ask
you to select one.

If you configured Scanner to use @command{unpaper}, this command will
post-process the scans obtained from @command{scanimage} using
@command{unpaper} before feeding the results to @command{tesseract}.
See @ref{User Options} to find out how to configure scan and
post-processing.

The scanning and conversion processes are run asynchronously.  If you
want to monitor progress, bring up the @code{*Scanner*} buffer which
collects the outputs of the backend programs.

This command is also available from the Scanner menu as@*
@clicksequence{Tools @click{} Scanner @click{} Scan a document}@*
for a single-page scan and@*
@clicksequence{Tools @click{} Scanner @click{} Scan a multi-page document}@*
for a multi-page scan.

@item M-x scanner-scan-image
@itemx C-u M-x scanner-scan-image
@itemx C-u n M-x scanner-scan-image
@findex scanner-scan-image
Scan an image.  When called without a prefix argument, this command
will scan only one image.  When called with the default prefix argument
(as @kbd{C-u M-x scanner-scan-image}), it will ask after each scanned
image whether another image should be scanned.  With a numeric prefix
argument, it will scan that many images, waiting a number of seconds
between each image, as configured in @code{scanner-scan-delay}.

The scan will use the resolution configured in
@code{scanner-resolution} with the @code{:image} key.

This command interactively reads a file name.  The extension of the file
name specifies the output file format.  If no extension is provided, the
default image format, as configured in @code{scanner-image-format} will
be used.  In a multi-image scan, this command will extend the given file
name base by @var{-number}, where @var{number} is the number of the
scanned image.  For example, if the file name is @file{image.jpeg}, a
multi-image scan of @var{n} images will produce the files
@file{image-1.jpeg}, @file{image-2.jpeg} @dots{} @file{image-n.jpeg}.
If one of these files already exists, @code{scanner-scan-image} will ask
for confirmation to overwrite it.

No post-processing with @command{unpaper} or @command{tesseract} is
done.  See @ref{User Options} to find out how to configure scanning.

This command is also available from the Scanner menu as@*
@clicksequence{Tools @click{} Scanner @click{} Scan an image}@*
for a single-page scan and@*
@clicksequence{Tools @click{} Scanner @click{} Scan multiple images}@*
for a multi-image scan.

@item M-x scanner-scan-preview
Make a preview scan.  This command makes a preview scan assuming
document scan settings.  The resolution of the scan is changed to
@code{scanner-preview-resolution} and the scan mode is set to ``Gray'',
otherwise all options stay in effect.  If @code{scanner-use-unpaper} is
non-@code{nil}, post-processing with unpaper is done as well.  The
resulting scan is shown in an image window, unless Emacs can't display
images, in which case a Dired buffer is created showing the files
generated by the scan.

This command is also available from the Scanner menu as@*
@clicksequence{Tools @click{} Scanner @click{} Make a preview scan}@*
@end table


@node User Options
@chapter User Options
@cindex user options

This chapter lists all the available user options.  All of these
options can be edited using the customization system of GNU Emacs,
which is advisable as then basic sanity checks are carried out.  For a
number of options, interactive commands are available that simplify
the customization at run time, but don't save the changed values
between Emacs sessions.  These functions are also available from the
Scanner menu (@clicksequence{Tools @click{} Scanner}).

@menu
* Configuration Commands::
* General Options::
* Configuring scanimage::
* Configuring unpaper::
* Configuring tesseract::
@end menu

@node Configuration Commands
@section Configuration Commands
@cindex configuration commands

The following commands help you configure some of the more-often used
options.  They only change the options for the running session; if you
want to permanently set an option, so Emacs will remember them between
sessions, use the customization interface.

@table @kbd
@item M-x scanner-set-image-resolution
@item M-x scanner-set-document-resolution
@findex scanner-set-document-resolution
@findex scanner-set-image-resolution
These commands interactively asks for a resolution (in @acronym{DPI,
dots per inch}) to be used in subsequent image and document scans,
respectively.  The corresponding user options is
@code{scanner-resolution}.

These commands are available in the Scanner menu as@*
@clicksequence{Tools @click{} Scanner @click{} Select image
resolution}@*
and@*
@clicksequence{Tools @click{} Scanner @click{} Select
document resolution}.

@item M-x scanner-select-papersize
@findex scanner-select-papersize
Select a paper size from @code{scanner-paper-sizes} or
@code{:whatever}.  See also @code{scanner-doc-papersize}.

This command is available in the Scanner menu as@*
@clicksequence{Tools @click{} Scanner @click{} Select paper size}.

@item M-x scanner-select-image-size
@findex scanner-select-image-size
Select an image size.  This command interactively reads x and y
dimensions in millimeter from the minibuffer and sets
@code{scanner-image-size} accordingly.

This command is also available in the Scanner menu as@*
@clicksequence{Tools @click{} Scanner @click{} Select image size}.

@item M-x scanner-select-outputs
@findex scanner-select-outputs
Select the document outputs.  This command reads a list of document
output formats.  See also @code{scanner-tesseract-outputs}.

This command is also available in the Scanner menu as@*
@clicksequence{Tools @click{} Scanner @click{} Select document outputs}.

@item M-x scanner-select-languages
@findex scanner-select-languages
Select the languages assumed for OCR.  This command reads a list of
languages used for OCR.  The necessary @command{tesseract} data files
must be available.  See @code{scanner-tesseract-languages}.

This command is also available in the Scanner menu as@*
@clicksequence{Tools @click{} Scanner @click{} Select OCR languages}.

@item M-x scanner-select-device
@itemx C-u M-x scanner-select-device
@findex scanner-select-device
Select a device, possibly triggering auto-detection.  Normally, manual
device selection is not necessary as @command{scanimage} will
auto-detect.  However, if you have multiple devices and want to change
between them, you can use this command to do so.

When called with a prefix argument, auto-detection is forced even when
devices have already been detected before.

This command is also available in the Scanner menu as@*
@clicksequence{Tools @click{} Scanner @click{} Select scanning device}

@item M-x scanner-set-scan-delay
Set the delay in seconds between scans in multi-page mode.  This
commands sets the variable @code{scanner-scan-delay}.

This command is also available in the Scanner menu as@*
@clicksequence{Tools @click{} Scanner @click{} Set delay between scans}

@item M-x scanner-set-brightness
Set the brightness of the scans.  The available range is
device-specific.

This command is also available in the Scanner menu as@*
@clicksequence{Tools @click{} Scanner @click{} Set brightness}

@item M-x scanner-set-contrast
Set the contrast of the scans.  The available range is
device-specific.

This command is also available in the Scanner menu as@*
@clicksequence{Tools @click{} Scanner @click{} Set contrast}


@end table

The following commands can be found in the ``Scan Enhancement'' submenu
of the Scanner menu (@clicksequence{Tools @click{} Scanner @click{} Scan
Enhancement}).  They require @command{unpaper} to be installed.  Scan
enhancement allows such post-processing operations as rotation,
de-noising, and deskewing, among others.  It is highly recommended as a
preparatory step before OCR.  The descriptions of the commands below
give a few hints on the usage of @command{unpaper}.  For more details,
see its man-page or web-site.

@table @kbd
@item M-x scanner-toggle-use-unpaper
@findex scanner-toggle-use-unpaper
Toggle the use of @command{unpaper} for scan enhancement.  This command
changes the option @code{scanner-use-unpaper} during the session.  Only
when this option is non-@code{nil} will @command{unpaper} be used and
the other items in the ``Scan Enhancement'' menu be available.

This command is also available in the Scanner menu as@*
@clicksequence{Tools @click{} Scanner @click{} Scan Enhancement @click{}
Use unpaper for scan enhancement}

The following commands configure some important processing steps; see
@ref{Configuring unpaper} for all the options.

@item M-x scanner-select-page-layout
@findex scanner-select-page-layout
This command interactively asks for the page layout of the pages to be
scanned.  Available options are ``single'', ``double'', and ``none''
(the default).  If you scan a sheet with two pages, for example as with
a book, you can choose ``double'' here so @command{unpaper} will divide
the sheet into two output pages.  If you use ``single'', it will try to
identify the actual (single-)page contents on the sheet and stretch
these to fit the output page size.  If you don't want any rearrangement,
choose ``none''.  Note that ``double'' page layout implies a landscape
orientation.  This command sets the option
@code{scanner-unpaper-page-layout} accordingly.  If you want to split up
an input page into two output pages, you must also use the
@command{scanner-select-output-pages} command.

This command is also available in the Scanner menu as@*
@clicksequence{Tools @click{} Scanner @click{} Scan Enhancement @click{}
Select page layout}

@item M-x scanner-select-input-pages
@findex scanner-select-input-pages
This command allows you to select the number of input pages.  Available
options are @code{1} and @code{2}.  It sets the option
@code{scanner-unpaper-input-pages}.  If you wanted to combine two
scanned input pages into one page, for example, to have left and right
sides on one sheet, you would select two input pages and one output
page, together with a ``single'' (or ``none'') page layout.

This command is also available in the Scanner menu as@*
@clicksequence{Tools @click{} Scanner @click{} Scan Enhancement @click{}
Select number of input pages}

@item M-x scanner-select-output-pages
@findex scanner-select-output-pages
This command allows you to select the number of output pages.  Available
options are @code{1} and @code{2}.  It sets the option
@code{scanner-unpaper-output-pages}.  If you wanted to split one scanned
input page into two output pages, for example, to have left and right
sides from a book on separate pages, you would select one input page and
two output pages, together with a ``double'' page layout.

This command is also available in the Scanner menu as@*
@clicksequence{Tools @click{} Scanner @click{} Scan Enhancement @click{}
Select number of output pages}

@item M-x scanner-select-pre-rotation
@findex scanner-select-pre-rotation
This command asks for the rotation to be applied before any further
processing.  Available values are ``clockwise'', ``counter-clockwise'',
and ``none''.  It sets the @code{scanner-unpaper-pre-rotation} option.
You should use this option if you have a landscape-oriented document
scanned as portrait.  Rotating before further processing is especially
relevant for scanning double-page documents, as it ensures that the
document is in the correct orientation before @command{unpaper} tries to
split pages.

This command is also available in the Scanner menu as@*
@clicksequence{Tools @click{} Scanner @click{} Scan Enhancement @click{}
Select page rotation before processing}

@item M-x scanner-select-post-rotation
@findex scanner-select-post-rotation
This command asks for the rotation to be applied after all the 
processing.  Available values are ``clockwise'', ``counter-clockwise'',
and ``none''.  It sets the @code{scanner-unpaper-post-rotation} option.

This command is also available in the Scanner menu as@*
@clicksequence{Tools @click{} Scanner @click{} Scan Enhancement @click{}
Select page rotation after processing}

@item M-x scanner-select-pre-size
@findex scanner-select-pre-size
This command interactively asks for the page size to set before further
processing.  The scanned sheets will be scaled to this size.  Available
options are ``a5'', ``a4'', ``a3'', ``a5-landscape'', ``a4-landscape'',
``a3-landscape'', ``letter'', ``legal'', ``letter-landscape'',
``legal-landscape'', ``none'', and direct width and height
specifications as in ``21cm,29.7cm''.  See the documentation for
@command{unpaper} for the understood units.  If you choose ``none'', no
size will be specified in the invocation of @command{unpaper} and it
will select the size based on the input data.

This command is also available in the Scanner menu as@*
@clicksequence{Tools @click{} Scanner @click{} Scan Enhancement @click{}
Select page size before processing}

@item M-x scanner-select-post-size
@findex scanner-select-post-size
This command interactively asks for the page size to set after all the
processing.  The processed sheets will be scaled to this size.  Available
options are ``a5'', ``a4'', ``a3'', ``a5-landscape'', ``a4-landscape'',
``a3-landscape'', ``letter'', ``legal'', ``letter-landscape'',
``legal-landscape'', ``none'', and direct width and height
specifications as in ``21cm,29.7cm''.  See the documentation for
@command{unpaper} for the understood units.  If you choose ``none'', no
size will be specified in the invocation of @command{unpaper} and it
will select the size based on the processed data.

This command is also available in the Scanner menu as@*
@clicksequence{Tools @click{} Scanner @click{} Scan Enhancement @click{}
Select page size after processing}
@end table


@node General Options
@section General Options
@cindex general options

@defopt scanner-resolution
This option specifies the resolution in DPI used for image and
document scans as a property list with the keys @code{:image} and
@code{:doc}, respectively, and integers as values.  The default is:
@lisp
(:image 600 :doc 300)
@end lisp
The available resolutions depend on your device.

This option can be set per-session with the commands
@code{scanner-select-image-resolution} and
@code{scanner-select-document-resolution}.
@end defopt

@defopt scanner-preview-resolution
This option specifies the resolution in DPI used in preview scans.  The
default is 75.
@end defopt

@defopt scanner-brightness
This option specifies the brightness setting for scans.  The default is
20.  This option assumes the @command{scanimage} switch
@option{--brightness} is available, which is device-specific.

This option can be set with the command @code{scanner-set-brightness}.
@end defopt

@defopt scanner-contrast
This option specifies the contrast setting for scans.  The default is
50.  This option assumes the @command{scanimage} switch
@option{--contrast} is available, which is device-specific.

This option can be set with the command @code{scanner-set-contrast}.
@end defopt

@defopt scanner-paper-sizes
This option holds paper sizes for document scans as a property list with
the name of the page format as the key (e.g. @code{:a4}) and a list of
width/height pairs in millimeters as value.  The default is:
@lisp
(:a3
 (297 420)
 :a4
 (210 297)
 :a5
 (148 210)
 :a6
 (105 148)
 :tabloid
 (279.4 431.8)
 :legal
 (215.9 355.6)
 :letter
 (215.9 279.4))
@end lisp
@end defopt

@anchor{scanner-doc-papersize}
@defopt scanner-doc-papersize
Use this option to select the paper size for the document scans.  The
value must be one of the keys from @code{scanner-paper-sizes}, or the
special value @code{:whatever} that lets @command{scanimage} select the
paper size (usually the available scan area).  The default is
@code{:a4}.

This option can be set per-session with the command
@code{scanner-select-papersize}.
@end defopt

@defopt scanner-image-size
This option specifies the size used in image scans as a list of width
and height values in millimeters.  The default is
@lisp
(200 250)
@end lisp
for an image of 200@dmn{mm} width and 250@dmn{mm} height.  If set to
nil, the size is determined by @command{scanimage} (usually the available scan
area.)

This option can be set per-session with the command
@code{scanner-select-image-size}.
@end defopt

@defopt scanner-scan-delay
This option specifies the delay in seconds to wait between pages in a
multi-page scan.  Set this to something large enough so you can feed
the next sheet to your scanner before it starts scanning the next
page.  The default is 3.
@end defopt

@defopt scanner-reverse-pages
This option, when set to t, causes Scanner to reverse the order of the
scanned pages in a document scan.  The default is nil.
@end defopt

@node Configuring scanimage
@section Configuring scanimage
@cindex configuring scanimage

Some of the options @command{scanimage} accepts (and Scanner uses) are
device-dependent.  To find out which options your scanner hardware
offers, run @command{scanimage --help} with your scanner plugged in.
This incantation should print a list of general and device-dependent
options.

@defopt scanner-scanimage-program
This option specifies the path of @command{scanimage}.  The default is given by
@lisp
(executable-find "scanimage")
@end lisp
@end defopt

@defopt scanner-scan-mode
This option specifies the scan modes for document and image scans.  It
is a property list with the keys @code{:image} and @code{:doc}, for
images and documents, respectively, and strings naming the scan modes
as values.  For example,
@lisp
(:image "Color" :doc "Gray")
@end lisp
sets ``Color'' mode for image scans and ``Gray'' mode for document
scans.  The default is to use ``Color'' for both image and document
scans.

The available scan modes depend on your device.  Usually, ``Lineart'',
``Gray'', and ``Color'' are available.  For images you probably want
``Color'', and for good OCR results in document scans, you should
choose either ``Gray'' or ``Color''.
@end defopt

@defopt scanner-image-format
This option sets the default format used by @command{scanimage} for image and
document scans.  It is a property list similar to
@code{scanner-scan-mode}.  For example, the default
@lisp
(:image "jpeg" :doc "pnm")
@end lisp
configures Scanner to use the JPEG format for image scans and the PNM
format for document scans.  While document scans will always use the
format specified with this option, you can override the format used in
image scans with the appropriate file extension, see
@ref{Scanning Documents and Images}.

The supported formats are documented in the @command{scanimage} manual page.
For example, version 1.0.31 of @command{scanimage} supports PNM, TIFF, PNG and
JPEG. 

Note that the document scan format specified with this option is an
intermediate format, not the document format generated at the end of
the whole process.  With the PNM format used in the example above, you
can still have a PDF output, see @ref{scanner-tesseract-outputs}.

If you use @command{unpaper} for post-processing before OCR in document
scans (@pxref{Configuring unpaper}), the format will silently be forced
to PNM, as this is required by @command{unpaper}.
@end defopt

@defopt scanner-device-name
The device name of the scanner as reported by @command{scanimage}.  The default
is nil, which prompts Scanner to use @command{scanimage} for automatic
detection.  The detected device will be stored in this variable and
used for all subsequent scans, until a new detection is forced either
by calling @code{scanner-select-device} with a prefix argument, or by
this device becoming unavailable.

Usually you need not customize this option as auto-detection should
work just fine.
@end defopt

@defopt scanner-scanimage-switches
You may find that additional switches to @command{scanimage} not covered by any
of the above user options are necessary.  You can use
@code{scanner-scanimage-switches} for these.  Specify the switches as a
list of switch/value pairs, such as:
@lisp
("--switch1" "value1" "-s" "2")
@end lisp
The default is nil.
@end defopt


@node Configuring unpaper
@section Configuring unpaper
@cindex configuring unpaper

@defopt scanner-unpaper-program
This variable contains the path of the @command{unpaper} program.
@end defopt

@defopt scanner-use-unpaper
If this option is non-@code{nil}, scan enhancement using
@command{unpaper} is activated.  Although using @command{unpaper} is
highly recommended, its configuration is a bit elaborate and might be
confusing at first.  The default is therefore @code{nil}.
@end defopt

@defopt scanner-unpaper-page-layout
This option specifies the page layout of the scanned sheets.  Allowed
values are ``single'', ``double'', and ``none'', setting
@command{unpaper} up for detection of the page extent.  Note that
``double'' implies a landscape orientation.  This option corresponds to
the @option{--layout} option of @command{unpaper}.  See its
documentation for details on the implications of the values.  The
default is ``none''.
@end defopt

@defopt scanner-unpaper-input-pages
This option selects the number of pages per scanned sheet of input.
Allowed values are @code{1} and @code{2}.  This variable corresponds to
the @option{--input-pages} option of @command{unpaper}.  If set to two
input pages, @command{unpaper} will pairwise combine input sheets.  The
default is @code{1}.
@end defopt

@defopt scanner-unpaper-output-pages
This option selects the number of pages per sheet of processed output.
Allowed values are @code{1} and @code{2}.  This variable corresponds to
the @option{--output-pages} option of @command{unpaper}.  If set to two
output pages, @command{unpaper} will split up every page of processed
output into two pages.  The default is @code{1}.
@end defopt

@defopt scanner-unpaper-pre-rotation
This option specifies the rotation to be applied before further
processing.  Allowed values are ``clockwise'', ``counter-clockwise'',
and ``none''.  This variable corresponds to the @option{--pre-rotation}
option of @command{unpaper}.  If you choose ``none'', no rotation is
specified in the invocation of @command{unpaper}.  The default is
``none.
@end defopt

@defopt scanner-unpaper-post-rotation
This option specifies the rotation to be applied after all the
processing.  Allowed values are ``clockwise'', ``counter-clockwise'',
and ``none''.  This variable corresponds to the @option{--post-rotation}
option of @command{unpaper}.  If you choose ``none'', no rotation is
specified in the invocation of @command{unpaper}.  The default is
``none.
@end defopt

@defopt scanner-unpaper-pre-size
This option specifies the page size to assume before further processing.
The scanned input will be scaled to this size.  Allowed values are
``a5'', ``a4'', ``a3'', ``a5-landscape'', ``a4-landscape'',
``a3-landscape'', ``letter'', ``legal'', ``letter-landscape'',
``legal-landscape'', ``none'', and direct width and height
specifications as in ``21cm,29.7cm''.  This variable corresponds to the
@option{--size} option of @command{unpaper}.  The default is ``a4''.
@end defopt

@defopt scanner-unpaper-post-size
This option specifies the page size to assume after all the processing.
The processed output will be scaled to this size.  Allowed values are
``a5'', ``a4'', ``a3'', ``a5-landscape'', ``a4-landscape'',
``a3-landscape'', ``letter'', ``legal'', ``letter-landscape'',
``legal-landscape'', ``none'', and direct width and height
specifications as in ``21cm,29.7cm''.  This variable corresponds to the
@option{--post-size} option of @command{unpaper}.  The default is ``a4''.
@end defopt

@defopt scanner-unpaper-border
This option allows you to force a border of white pixels at the four
edges of a scanned sheet.  Allowed is any list of four integers, for
example, @code{(10 10 10 10)} (the default).  This is very useful to
remove black or gray scan artefacts at the edges of a sheet.  Even if
this is not specified, @command{unpaper} will try to detect any such
artefacts and remove them.  However, forcing a border usually leads to
better results.  This variable corresponds to the @option{--border}
option of @command{unpaper}.
@end defopt

@defopt scanner-unpaper-switches
Any additional parameters to @command{unpaper} can be specified using
this option.  Allowed is any list comprising valid @command{unpaper}
options as strings.
@end defopt

@node Configuring tesseract
@section Configuring tesseract
@cindex configuring tesseract

@defopt scanner-tesseract-program
This option specifies the path of the @command{tesseract} program.
@end defopt

@defopt scanner-tessdata-dir
This option specifies the @file{tessdata} directory.  This directory is
supposed to contain the language data files for @command{tesseract}.
The default is @file{/usr/share/tessdata/}.
@end defopt

@defopt scanner-tesseract-configdir
This option specifies the @command{tesseract} @file{configs} directory.
This directory is supposed to contain the language data files for
@command{tesseract}.  The default is
@file{/usr/share/tessdata/configs/}.
@end defopt

@defopt scanner-tesseract-languages
This option lists the languages passed to @command{tesseract} as a list
of strings.  The default is:
@lisp
("eng")
@end lisp
It is possible to pass more than one language to @command{tesseract},
which can be useful if you have a multi-language document.  For
instance,
@lisp
("eng" "deu")
@end lisp
sets @command{tesseract} up for recognizing english and german language.
However, for single-language documents, the best results are usually
obtained when setting only one language.

This option can be set per-session with the command
@code{scanner-select-languages}.
@end defopt

@anchor{scanner-tesseract-outputs}
@defopt scanner-tesseract-outputs
This option lists the output formats to produce.  The available output
formats are provided as configuration files in the
@file{/usr/share/tessdata/configs/} directory.  The default
@lisp
("pdf" "txt")
@end lisp
causes @command{tesseract} to output both a PDF and a text file.

This option can be set per-session with the command
@code{scanner-select-outputs}.
@end defopt

@defopt scanner-tesseract-switches
You can use this option to specify any additional switches for
@command{tesseract} not covered by the above options.  Use the same
format as for @code{scanner-scanimage-switches}.  The default is nil.
@end defopt

@node Improving Scan Quality
@chapter Improving Scan Quality
@cindex improving scan quality
@cindex scan quality, improving
@cindex quality, improving

This chapter comprises recommendations for improving the scan or OCR
quality.  If you know about any additional tips and tricks to improve
quality, please let the author know about them.

Besides checking the following sections, you might also want to consult
the documentation for @command{tesseract},
@url{https://tesseract-ocr.github.io/tessdoc/}, and @command{unpaper},
@url{https://github.com/unpaper/unpaper/blob/main/doc/basic-concepts.md}
(basics) and
@url{https://github.com/unpaper/unpaper/blob/main/doc/image-processing.md}
(details).


@menu
* Improving General Scan Quality::
* Improving OCR::
@end menu

@node Improving General Scan Quality
@section Improving General Scan Quality
@cindex improving general scan quality

@table @asis
@item Image format
As a lossy format, JPEG is not a good basis for later OCR.  Therefore,
use PNG, PNM, or TIFF for document scanning.  If you also use
@command{unpaper}, the image format is forced to PNM, as required by
this tool.

@item Scan area
Besides setting the size of the scan area, @command{scanimage} allows
you to specify offsets to the top and left edges.  The device-specific
switches @option{-l} and @option{-t}, if available, allow you to specify
the top-left x and y positions, respectively.  This can be used to get
rid of some blacked out parts in corners due to the mis-alignment of
scan area and scanned sheet.

@item Resolution
For document scans, use at least 300 DPI to achieve acceptable OCR
results.  A resolution above 600 DPI will not enhance OCR quality any
further and only leads to larger files.  For most documents, 300 DPI
should be ok.

@item Brightness and contrast
Good document quality (and especially good OCR results) require
sufficient contrast and a good reproduction of the document's background
color.  If the defaults of your device are inadequate, use the
brightness and contrast settings of @command{scanimage} to provide
sensible values.  See @ref{Configuring scanimage} and the options
@code{scanner-brightness} and @code{scanner-contrast} there.  You may
want to try a low brightness setting (for example, 20) and a medium
contrast setting (for example, 50) as a start.

Note that the underlying parameters to @command{scanimage} are
device-specific.  If the two mentioned options are not supported by your
device, you may be able to use @code{scanner-scanimage-switches} to
supply the specific switches to @command{scanimage}.

@item Dark areas and shadows
If your scan shows dark (black/gray) areas or shadows, for example in
the fold when scanning a book, use @command{unpaper} to remove them.  If
it cannot remove these areas automatically, you can manually specify an
area to be wiped out using the @option{--wipe} switch of
@command{unpaper}.  If you scan a page with the ``double'' layout and
want to remove the shadow of a book fold, use the @option{--middle-wipe}
switch.  You can put these switches into the
@code{scanner-unpaper-switches} option.  See also the @command{unpaper}
documentation.

@item Page borders
If you use @command{unpaper}, it will try to remove dark areas around
the edges of the page.  If this does not work automatically, use the
@code{scanner-unpaper-border} option to specify a border (in pixels)
around the edges of the page that is to be wiped, see @ref{Configuring
unpaper}.

@end table

@node Improving OCR
@section Improving OCR
@cindex improving ocr

@table @asis
@item Tesseract version
Use version 4 or higher of @command{tesseract}.  This version includes a
new OCR engine that delivers better results than the previous one.
Also, @command{tesseract} is multithreaded starting from version 4 and
is therefore faster on multi-core machines.

@item Language setting
Tesseract allows you to use multiple languages.  For single-language
documents, however, this doesn't seem to be optimal.  It's best to
choose a single language when possible.

@item Page deskewing
OCR is quite sensitive to any skew of a page.  Use @command{unpaper} to
deskew the pages.  See @ref{Configuring unpaper}.

In some cases, @command{unpaper} may not be able to deskew a page
automatically.  If so, have a look at the deskewing switches of
@command{unpaper}.  Especially @option{--deskew-scan-step},
@option{--deskew-scan-deviation}, and @option{--deskew-scan-range} can
be helpful.  You can put those switches into
@code{scanner-unpaper-switches}.  See the @command{unpaper}
documentation for details.

@end table






@node News
@chapter News
@cindex news

This chapter lists the changes made in new releases of Scanner.

@menu
* Changes in Version 0.3::
@end menu

@node Changes in Version 0.3
@section Changes in Version 0.3
@cindex changes in version 0.3

@itemize
@item
@command{unpaper} has been added as a new backend; it allows scan
enhancement in document scans; see @ref{Configuring unpaper} for
options.  There is a new ``Scan Enhancement'' sub-menu for this backend.
@item
A command for making a preview scan has been added.
@item
Menu items and user options for brightness and contrast have been added.
@item
A menu item for setting the scan delay has been added.
@item
Older @command{scanimage} versions (before 1.0.28) are now supported as well.
@item
A new page size @code{:whatever} allows @command{scanimage} to select
the page size based on the available scan area.
@item
Scanner now comes with a manual.
@end itemize



@node Reporting Bugs
@chapter Reporting Bugs
@cindex reporting bugs

Refer to @uref{https://www.gitlab.com/rstocker/scanner/}
mention *Scanner* log buffer


@node GNU Free Documentation License
@chapter GNU Free Documentation License
@c Get fdl.texi from https://www.gnu.org/licenses/fdl.html
@include fdl.texi


@node Index
@unnumbered Index

@printindex cp

@c combine indices

@bye

@c scanner.texi ends here