.Dd September 25, 2024
.Dt CHARLES 1
.Sh NAME
.Nm charles
.Nd Manuscript Formatter for FreeBSD
.Sh SYNOPSIS
.Nm charles Oo -a Oc Oo -n Oc Oo -i Oc Oo -c | -e | -x Oc Oo <infile> Oo <outfile> Oc Oc
.Sh DESCRIPTION
Charles converts a marked-up text document into a fiction manuscript
formatted as an RTF, a PDF, or an EPUB.
.Pp
Charles reads from stdin or from a single file specified on the
command line and writes to stdout or to another file specified on the
command line.  To specify an output file without an input file, specify an
input file of '-'.  If the input document is malformed, charles exits
without explanation.
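.Pp
The input and output arrangements described above can be sketched as small
shell functions.  This is only an illustration: the function names and file
arguments are hypothetical and are not part of charles.
.Bd -literal -offset left
```shell
# tsml_to_pdf IN OUT: read the file IN, write the PDF to the file OUT.
tsml_to_pdf() {
    charles "$1" "$2"
}

# stdin_to_pdf OUT: read TSML from stdin, write the PDF to the file OUT.
# The '-' argument stands in for the input file.
stdin_to_pdf() {
    charles - "$1"
}

# filter_to_pdf: read TSML from stdin, write the PDF to stdout.
filter_to_pdf() {
    charles
}
```
.Ed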
.Pp
Charles expects its input to be encoded as UTF-8.
.Pp
Charles embeds an open-source Courier typeface into PDFs.
Charles's PDFs support the Unicode characters that the embedded fonts
cover.  These are sufficient for European languages that use
alphabets derived from the Latin alphabet.
.Pp
When producing EPUB files, charles accepts all Unicode code points,
converts typewriter quotes and double hyphens to their typographical
equivalents, and represents break paragraphs as 2 typographical em-dashes.
Sections and chapters start new pages, and both appear in the TOC.
.Pp
When invoked without options, charles reads TSML and writes
a PDF with 8.5in x 11in pages.
.Ss OPTIONS
.Bl -tag -width '-x'
.It -a
causes charles to use A4 paper size instead of the default US Letter.
.It -c
causes charles to convert text input to TSML. The text format
recognized is described in the section titled TEXT INPUT.
.It -e
causes charles to convert TSML input into an ebook hierarchy in a
directory named "ebook" that is a sub-directory of the current directory or
a sub-directory of the directory specified by the second command line
argument.
.It -i
causes charles to add markup for cover images in EPUBs. See the section
titled EPUB for details.
.It -n
causes 'c' headings to start on a new page.
.It -x
causes charles to produce RTF output instead of PDF.
.El
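.Pp
As a hedged sketch of how the options combine, the following shell
functions wrap a few invocations.  The function names are hypothetical; the
option letters are the ones listed above.
.Bd -literal -offset left
```shell
# to_rtf IN OUT: write RTF instead of PDF (-x).
to_rtf() {
    charles -x "$1" "$2"
}

# to_a4_pdf IN OUT: A4 pages (-a), each 'c' heading on a new page (-n).
to_a4_pdf() {
    charles -a -n "$1" "$2"
}

# to_ebook IN: write the "ebook" directory under the current directory (-e).
to_ebook() {
    charles -e "$1"
}
```
.Ed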
.Ss TSML
TSML is a generic markup language described in tsml(5).  TSML is designed
to be machine-friendly rather than human-friendly.  Charles uses a simple
TSML scheme to classify paragraphs by type and to define document metadata.
.Pp
An example TSML manuscript is included in the source distribution in the
file document.tsml. A concise example TSML manuscript follows.
.Bd -literal -offset left
[y[
[m[
  [t[ TITLE 1;;TITLE 2 ]]
  [a[ Author 1; Author 2 ]]
  [c[ Contact 1; Contact 2 ]]
  [r[ Revision 1; Revision 2 ]]
  [h[ TITLE; Author ]]
  [w[ WATERMARK1; WATERMARK2 ]]
  [l[ en ]]
  [y[ 2018, Author 1 and Author 2]]
]m]

[p[ THIS IS PART $P ]]
[s[ THIS IS SECTION $S ]]
[c[ THIS IS CHAPTER $C ]]

[b[
   _Lorem_ _ipsum_ _dolor_ _sit_ _amet_, consectetur adipiscing elit. Quis est
   enim, in quo sit cupiditas, quin recte cupidus dici possit? Estne, quaeso,
   inquam, sitienti in bibendo voluptas? Quasi ego id curem, quid ille aiat
   aut neget.  Illi enim inter se dissentiunt. Ne amores quidem sanctos a
   sapiente alienos esse arbitrantur. Duo Reges: constructio interrete.
]]
]y]
.Ed
.Ss ELEMENTS
All elements have single-letter names.  The top-level element is named 'y'.
It contains one or more instances of the following child elements.
.Pp
.Bl -tag -width 't'
.It 'm'
Charles extracts from the children of the 'm' element the data to use in the
manuscript's title page, running header, and in the PDF Info dictionary.
To specify that the text in these elements be formatted into multiple
lines, separate the data for each line from the others with a semicolon.
To insert a blank line between lines, insert 2 contiguous semicolons.  To insert a
semicolon into your data, insert a tilde '~' instead.
.Bd -literal -offset left
[t[Title;;Subtitle]]
.Ed
.Bl -tag -width 't'
.It 't'
specifies the title of the manuscript.  Titles are formatted in boldface.
.It 'a'
specifies the authors. If you want a 'By' line, add it as the first line of
the author data.
.It 'c'
specifies contact information.  All occurrences of the 3 characters "(C)"
are replaced with the copyright symbol.
.It 'r'
specifies revision information.  To insert the current date, make one of
the revision lines $D by itself.
.Pp
To insert the manuscript word count, start one of the revision lines with
$W. If you use $W by itself, the word count will be followed by "WORDS"
in English. If you prefer different text, follow $W with the desired text.
.Pp
To insert the manuscript page count, start one of the revision lines with
$P. If you use $P by itself, the page count will be followed by "PAGES"
in English. If you prefer different text, follow $P with the desired text.
Charles ignores the $P variable when generating RTF files.  RTF
readers perform pagination.  The number of pages of an RTF document may
vary from the PDF version and from reader to reader.
.It 'h'
specifies the content to be added to the running header of the manuscript.
This element should contain 2 lines.  The first line should be your last
name.  The second line should be one or two words from the title of the
manuscript.  Subsequent lines are ignored.
.It 'w'
specifies watermark text to be added underneath the text on each page.  All
occurrences of the 3 characters "(C)" are replaced with the copyright
symbol.  The specified lines are centered, rotated left, and rendered at 60
points with 90% opacity. The watermark is not a PDF annotation and cannot
be removed or edited by tools that operate on annotations.
.It 'l'
specifies the language of the input document as a BCP 47 language tag.
Charles only accesses this element when generating ebooks. The value
defaults to 'en' (English).
.It 'y'
specifies a copyright message to be added to the title pages of EPUBs. It
should contain the year, a comma, and the name of the copyright owner. You
can include an email address or web URL after another comma, but beware
that all text is formatted onto a single line. Charles will precede
the copyright information with the text ' Copyright' and the c-in-circle
copyright symbol. If the 'y' element is missing from the metadata, no
copyright message is added to EPUBs.
.El
.Pp
The 'm' element should appear only once in a document.  If you provide
multiple instances, each instance's data supersedes the data in preceding
instances.  Each child element should appear only once inside 'm'.  If you
provide multiple instances, the data in each instance supersedes the data
in preceding instances.  If you omit the 't', 'a', 'c', 'h', or 'r'
element, placeholder data is used.
.It 'p'
provides the content of a centered, boldface part title. The lines of the
paragraph are centered on a separate page.  Any occurrence of $P in a part
is replaced with the part number.  Parts are numbered from 1.  Force
line breaks with the bar character '|' (UTF-8 124).
.It 's'
provides the content of a centered, boldface section title. The lines of
the paragraph are centered on a separate page.  Any occurrence of $S in a
section is replaced with the section number. Sections are numbered from 1.
Force line breaks with the bar character '|' (UTF-8 124).
.It 'c'
provides the content of a centered, boldface chapter heading separated from
the rest of the manuscript with 2 blank lines before and 1 blank line after
the centered line.  Any occurrence of $C in a chapter is replaced with the
number of the chapter.  Chapters are numbered from 1.  To force the content
of 'c' elements to appear at the top of new pages, include the -n option on
the command line.  Force line breaks with the bar character '|' (UTF-8
124).
.It 'b'
.It 'e'
Both provide the content of a body text paragraph. 'b' paragraphs are set
entirely in roman.  'e' paragraphs are set entirely in italics.  Force
line breaks with the bar character '|' (UTF-8 124).
.It 'i'
marks a break in the narrative that is less significant than a chapter
break.  Only the first character of the content is used.  The paragraph is
rendered as a single line containing the centered character.
.El
.Ss CONTENT
Charles discards data outside of terminal elements.  You can place
comments in between terminal elements.
.Pp
Inside of terminal elements, charles normalizes whitespace by converting all
whitespace characters into space characters, collapsing multiple contiguous
spaces into single spaces, and removing leading and trailing spaces.
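.Pp
The effect of this normalization can be approximated in the shell.  The
sketch below is an illustration only, not the code charles uses, and the
function name normalize_ws is hypothetical.
.Bd -literal -offset left
```shell
# normalize_ws TEXT: print TEXT with every whitespace character turned into
# a space, runs of spaces squeezed to one, and both ends trimmed.
normalize_ws() {
    printf '%s' "$1" | tr -s '[:space:]' ' ' | sed 's/^ //; s/ $//'
}

# Prints: Lorem ipsum dolor
normalize_ws '  Lorem
   ipsum   dolor  '
```
.Ed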
.Pp
Charles underlines words bookended with underscores.  If you want to
underline a section of text, you must _underline_ _each_ _word_
_individually_.  _Underline_-_each_-_word_-_in_-_a_-_hyphenate._ You cannot
underline metadata for the title page.
.Pp
You cannot embolden or italicize individual words.  Boldface is
intrusive outside of headings, and Courier Oblique is not oblique enough to
attract the eye.  To emphasize words in a line, underline them. In EPUBs,
underlined text is rendered in italics without the underline.
.Pp
To insert [, ], or \\ into element content, escape each with a backslash:
.Bd -literal -offset left
[b[He looks up. \\[At what?\\]]]
.Ed
.Pp
To indicate elision at the front of words, use a backtick: in the `hood.
In PDFs, the backtick is replaced with a typewriter apostrophe.  In EPUBs,
the backtick is replaced with a single typographical apostrophe.
.Pp
Non-ASCII characters can be entered as ASCII backslash escape sequences.
To insert the following escape sequences into TSML elements, escape the
backslash with another backslash: \\\\'e.  You do not have to do this in
the text format described in the section titled TEXT INPUT.
.Pp
\\' followed by an upper or lowercase vowel inserts the UTF-8 character
sequence for that vowel with an acute accent.
.Pp
\\` inserts grave-accented vowels.
.Pp
\\^ inserts circumflex-accented vowels.
.Pp
\\: inserts diaeresis/trema/umlaut-accented vowels and upper and lowercase Ys.
.Pp
\\~ inserts only upper and lowercase Ns with tildes.
.Pp
\\, inserts only upper and lowercase Cs with cedillas.
.Pp
\\u followed by a decimal number inserts a Unicode character by code point.
If the code point is immediately followed by digits that are part of the
document text, those digits must be represented by \\u escapes, or the
digits will be interpreted as part of the code point.
.Ss TEXT INPUT
When the -c option is invoked, charles converts text input to TSML.
Metadata are specified in a paragraph at the start of a manuscript:
.Bd -literal -offset left
Title: FIMBLETHWICKE TEST DOCUMENT
Author:
  By

  Carnaby Charles
Contact:
  C. Charles
  123 Ocean View Terrace
  Coastal City
  Nowhere, Noplace
  555 ZZZ

  charles@nowhere.net
Revision:
  $W
  $D
Header:
  Charles
  Test
Language: en
Copyright: 2018, Carnaby Charles
Watermark:
   Copyright (C) 2018,
   Carnaby Charles
.Ed
.Pp
To specify multiple lines for a keyword, leave the rest of the keyword
line blank and place the content on the lines that follow it.
Start the content lines with whitespace.  A content line of only whitespace
will be replaced with a blank line in the output.
.Pp
To insert editorial content that is not included in the formatted output,
wrap paragraphs in '[[' and ']]'.  Multiple paragraphs of text may be
embedded in a single note "paragraph":
.Bd -literal -offset left
[[Multi-paragraph note.

Second paragraph of note.]]
.Ed
.Pp
To specify a part paragraph, start the paragraph with an asterisk:
.Bd -literal -offset left
*PART $P
.Ed
.Pp
To specify a section paragraph, start the paragraph with an equals sign:
.Bd -literal -offset left
=SECTION $S
.Ed
.Pp
To specify a chapter paragraph, start the paragraph with a period:
.Bd -literal -offset left
\&.CHAPTER $C
.Ed
.Pp
To indicate a break paragraph, put '#' by itself in the paragraph:
.Bd -literal -offset left
This is the paragraph before the break.

#

This is the paragraph after the break.
.Ed
.Pp
To indicate an italic paragraph, put '%' at the start of the paragraph.
Whitespace after '%' is removed by the formatter:
.Bd -literal -offset left
% This paragraph is set entirely in italics. _Underlined_ _text_ is
permissible in italic paragraphs. In EPUBs, the underlined text is
emboldened to distinguish it from the surrounding italic text.
.Ed
.Pp
All other paragraphs are body paragraphs.
.Pp
Separate paragraphs with blank lines.
.Pp
Line breaks are ignored in paragraphs.  Paragraphs are reflowed by
charles to fill the output page.  Force line breaks with '|'
characters.
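.Pp
To try the text format, a small sample can be written from the shell.  The
following is a minimal sketch: the function name write_sample and the
sample content are hypothetical, and a chapter paragraph is left out for
brevity.
.Bd -literal -offset left
```shell
# write_sample FILE: write a tiny text-format manuscript to FILE.
# The quoted 'EOF' keeps the shell from expanding $P and $S.
write_sample() {
    cat > "$1" <<'EOF'
Title: SAMPLE
Author: By A. Writer
Header:
  Writer
  Sample

*PART $P

=SECTION $S

First body paragraph, with an _underlined_ word.

#

% This paragraph is set entirely in italics.
EOF
}
```
.Ed
.Pp
A file written this way, for example with "write_sample manuscript.txt",
could then be converted with "charles -c manuscript.txt manuscript.tsml".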
.Pp
The process of producing a PDF or RTF file from a text file can be
automated with a Makefile:
.Bd -literal -offset left
all: manuscript.pdf

manuscript.pdf: manuscript.tsml
   charles manuscript.tsml manuscript.pdf

manuscript.tsml: manuscript.txt
   charles -c manuscript.txt manuscript.tsml
.Ed
.Ss EPUB
If you use -i, you must manually copy two PNG images into the ebook/OEBPS
directory.
.Pp
The first image file must be named cover1.png.  This is the external cover
image.  I suggest that you create an image of 2560px height by 1600px width.
.Pp
The second image must be named cover2.png and contain no more than 4
megapixels.  Note that this limit is specified in pixels not bytes.  The
internal image should be a scaled-down version of the external image.  I
suggest that you create an image of 1600px height by 1000px width.
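.Pp
Before zipping, it can help to confirm that both cover images are in
place.  A minimal shell sketch follows; the function name covers_present is
hypothetical, and it checks only that the files exist, not their
dimensions.
.Bd -literal -offset left
```shell
# covers_present DIR: succeed only if both cover images are in DIR/OEBPS.
covers_present() {
    [ -f "$1/OEBPS/cover1.png" ] && [ -f "$1/OEBPS/cover2.png" ]
}
```
.Ed
.Pp
For example, "covers_present ebook || echo missing cover image" reports a
problem when either file is absent.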
.Pp
Zip up the ebook directory to form an .epub file:
.Bd -literal -offset left
cd ebook
zip -X ../book.epub mimetype
zip -r ../book.epub META-INF OEBPS
cd ..
rm -r ebook
.Ed
.Pp
The process of converting a text format document into TSML and then into an
ebook can be automated with a Makefile:
.Bd -literal -offset left
all: book.epub

book.epub: manuscript.tsml
   charles -e manuscript.tsml
   cd ebook && zip -X ../book.epub mimetype && zip -r ../book.epub META-INF OEBPS
   rm -r ebook

manuscript.tsml: manuscript.txt
   charles -c manuscript.txt manuscript.tsml
.Ed
.Sh AUTHORS
.An James Bailie Aq bailie9@icloud.com
.br
mammothcheese.ca
