Quoted Values and Tokenization


The lexer transforms raw user input into semantic tokens before parsing starts. Understanding this lexical layer is essential when values include spaces, punctuation, or apostrophes.

If a value must remain one atomic unit across spaces or significant punctuation, quote it.

Token categories

A compliant lexer distinguishes four functional categories:

  • WORD for ordinary runs of non-whitespace characters that contain no recognized delimiter.
  • STRING for values enclosed in single quotes.
  • DELIMITER for recognized punctuation and operator tokens.
  • EOF for the synthetic end-of-input marker appended by the lexer.
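
The four categories can be modeled as a small enumeration plus a token record. This is a minimal sketch, not the lexer's actual data model: the names TokenKind and Token, and the position field, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto


class TokenKind(Enum):
    WORD = auto()       # ordinary unquoted text
    STRING = auto()     # single-quoted value, outer quotes already removed
    DELIMITER = auto()  # recognized punctuation or operator
    EOF = auto()        # synthetic end-of-input marker


@dataclass(frozen=True)
class Token:
    kind: TokenKind
    value: str     # bound semantic value
    position: int  # hypothetical field: offset of the token's first character


# Hand-built token stream for the input: SET OWNER 'O''Reilly' ;
example = [
    Token(TokenKind.WORD, "SET", 0),
    Token(TokenKind.WORD, "OWNER", 4),
    Token(TokenKind.STRING, "O'Reilly", 10),
    Token(TokenKind.DELIMITER, ";", 22),
    Token(TokenKind.EOF, "", 23),
]
```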

Whitespace rules

Whitespace separates tokens, but it has no standalone semantic meaning outside quoted strings. Spaces, tabs, and newlines matter only insofar as they influence token boundaries.

Inside a quoted string, whitespace is preserved as part of the semantic value.

Quoted values

A quoted value is written with single quotes. The outer quotes are not part of the bound semantic value.

'hello world'
'db-backup'
'March 2026 maintenance window'

Quoted values are especially important when a domain value contains spaces, punctuation, or delimiter characters that would otherwise be split into multiple tokens.

Escaped apostrophes

Inside a quoted string, a doubled apostrophe represents one literal apostrophe in the semantic value.

'O''Reilly'

The bound semantic value is O'Reilly.
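
The conversion from a quoted literal to its bound value can be sketched as follows. The helper name unescape_quoted is hypothetical, and the function assumes it receives one complete literal including both outer quotes.

```python
def unescape_quoted(literal: str) -> str:
    """Return the semantic value of one complete single-quoted literal."""
    if len(literal) < 2 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError("not a single-quoted literal")
    body = literal[1:-1]
    # After removing every doubled apostrophe, any apostrophe that is left
    # was unescaped and would have terminated the string early.
    if "'" in body.replace("''", ""):
        raise ValueError("unescaped apostrophe inside literal")
    # Collapse each doubled apostrophe into one literal apostrophe.
    return body.replace("''", "'")


print(unescape_quoted("'O''Reilly'"))  # prints O'Reilly
```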

Delimiters and atomicity

The lexer recognizes delimiters such as . , ; : ( ) and ... only outside quoted strings. Inside a quoted string, the same characters are ordinary content.

Unquoted input: report . pdf
Quoted input:   'report.pdf'

The first input yields three separate tokens (WORD, DELIMITER, WORD). The second yields a single STRING token whose value is report.pdf.

When quoting is mandatory

  • When the value contains spaces and must stay one parameter.
  • When the value contains significant delimiter characters that must not be split.
  • When the value begins with text that the parser could mistake for a keyword expected later in the command.
  • When you want the bound value to preserve embedded punctuation exactly as one token.
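
A tool that generates commands can apply these rules mechanically. The sketch below is one possible approach, with hypothetical helper names (needs_quoting, quote_value, render). It assumes the delimiter characters listed earlier, assumes that keywords are matched case-insensitively, and also quotes empty values, which the rules above do not address.

```python
DELIMITER_CHARS = set(".,;:()")  # assumed delimiter characters


def quote_value(value: str) -> str:
    """Wrap value in single quotes, doubling any embedded apostrophe."""
    return "'" + value.replace("'", "''") + "'"


def needs_quoting(value: str, keywords: frozenset[str] = frozenset()) -> bool:
    """Decide whether value would be split or misread if left unquoted."""
    if value == "":
        return True  # assumption: an empty value must be written as ''
    if any(ch.isspace() or ch in DELIMITER_CHARS or ch == "'" for ch in value):
        return True
    # Assumption: keywords are compared case-insensitively.
    return value.upper() in keywords


def render(value: str, keywords: frozenset[str] = frozenset()) -> str:
    """Return value in the form it should take in command text."""
    return quote_value(value) if needs_quoting(value, keywords) else value


print(render("db01"))                  # db01
print(render("Quarterly Ops Review"))  # 'Quarterly Ops Review'
print(render("O'Reilly"))              # 'O''Reilly'
print(render("COMPLEX.TASK"))          # 'COMPLEX.TASK'
print(render("on", frozenset({"ON"}))) # 'on'
```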

Unclosed strings

An opening quote without a matching closing quote is a syntax error. The rich diagnostic should list a closing quote, or an equivalent entry, among the tokens it expected at the point of failure.
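
One way to surface that expectation is to carry it on the error object. The following is a hedged sketch: LexError, its position and expected fields, and read_quoted are hypothetical names, and the real diagnostic format may differ.

```python
class LexError(Exception):
    """Lexical error that records where it occurred and what was expected."""

    def __init__(self, message: str, position: int, expected: list[str]):
        super().__init__(message)
        self.position = position
        self.expected = expected


def read_quoted(text: str, start: int) -> tuple[str, int]:
    """Read the quoted string opening at text[start].

    Returns the bound value and the index just past the closing quote.
    """
    if start >= len(text) or text[start] != "'":
        raise LexError("expected opening quote", start, ["'"])
    i = start + 1
    chars = []
    while i < len(text):
        if text[i] == "'":
            if text.startswith("''", i):
                chars.append("'")  # doubled apostrophe: one literal apostrophe
                i += 2
                continue
            return "".join(chars), i + 1  # closing quote found
        chars.append(text[i])
        i += 1
    # Input ended before a closing quote appeared.
    raise LexError("unclosed string", i, ["'"])


try:
    read_quoted("'Quarterly Ops Review", 0)
except LexError as err:
    print(f"{err} at offset {err.position}; expected one of {err.expected}")
```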

Examples

SET TITLE 'Quarterly Ops Review' ;
SET OWNER 'O''Reilly' ;
EXECUTE 'COMPLEX.TASK' ON db01 ;
SET DATE 07 . 03 . 2026 ;

Practical guidance

Grammar authors should test both quoted and unquoted input for any command that accepts free-form values. This is where most subtle lexical bugs appear.

Do not assume that punctuation-heavy business values will remain atomic when unquoted. Make the quoting contract explicit in your examples and tests.

See also