The lexer transforms raw user input into semantic tokens before parsing starts. Understanding this lexical layer is essential when values include spaces, punctuation, or apostrophes.
If a value must remain one atomic unit across spaces or significant punctuation, quote it.
A compliant lexer distinguishes four functional categories:
WORD for ordinary non-whitespace sequences that do not hit a recognized delimiter.
STRING for values enclosed in single quotes.
DELIMITER for recognized punctuation and operator tokens.
EOF for the synthetic end-of-input marker appended by the lexer.
Whitespace separates tokens, but it has no standalone semantic meaning outside quoted strings. Spaces, tabs, and newlines matter only insofar as they determine token boundaries.
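The four categories and the whitespace rule can be illustrated with a minimal Python sketch that handles only input without quotes. The names Kind, Token, DELIMITERS, and split_unquoted are hypothetical. The delimiter tuple assumes that the punctuation listed later in this section is the complete set.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Kind(Enum):
    """The four token categories a compliant lexer distinguishes."""
    WORD = auto()
    STRING = auto()  # produced only by a quote-aware lexer; never emitted here
    DELIMITER = auto()
    EOF = auto()


@dataclass(frozen=True)
class Token:
    kind: Kind
    text: str


# Assumed delimiter set, longest first so "..." is matched before ".".
DELIMITERS = ("...", ".", ",", ";", ":", "(", ")")


def split_unquoted(src: str) -> list[Token]:
    """Tokenize quote-free input: whitespace only marks token boundaries."""
    tokens: list[Token] = []
    i = 0
    while i < len(src):
        if src[i].isspace():
            i += 1  # whitespace separates tokens but never becomes one
            continue
        delim = next((d for d in DELIMITERS if src.startswith(d, i)), None)
        if delim is not None:
            tokens.append(Token(Kind.DELIMITER, delim))
            i += len(delim)
            continue
        start = i
        # A WORD runs until whitespace or the start of a recognized delimiter.
        while (i < len(src) and not src[i].isspace()
               and not any(src.startswith(d, i) for d in DELIMITERS)):
            i += 1
        tokens.append(Token(Kind.WORD, src[start:i]))
    tokens.append(Token(Kind.EOF, ""))  # synthetic end-of-input marker
    return tokens
```

Because whitespace only marks boundaries, SET DATE 07 . 03 . 2026 ; and SET DATE 07.03.2026; produce the same token list in this sketch.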
Inside a quoted string, whitespace is preserved as part of the semantic value.
A quoted value is written with single quotes. The outer quotes are not part of the bound semantic value.
'hello world'
'db-backup'
'March 2026 maintenance window'
Quoted values are especially important when a domain value contains spaces, punctuation, or delimiter characters that would otherwise be split into multiple tokens.
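The basic quoting rule can be sketched as a scanner that starts at an opening quote and stops at the next quote. This is a simplified Python sketch. The function name read_quoted_basic is hypothetical, and the function deliberately ignores the doubled-apostrophe escape, so it would misread a value such as 'O''Reilly'.

```python
def read_quoted_basic(src: str, start: int) -> tuple[str, int]:
    """Return the value of the quoted string at src[start] and the index after it.

    The outer quotes are dropped and inner whitespace is kept verbatim.
    Simplified: doubled apostrophes ('') are NOT decoded by this function.
    """
    if start >= len(src) or src[start] != "'":
        raise ValueError(f"no opening quote at offset {start}")
    end = src.find("'", start + 1)  # the first quote after the opening one closes it
    if end == -1:
        raise ValueError(f"unterminated quoted string at offset {start}")
    return src[start + 1:end], end + 1
```

For example, read_quoted_basic("'March 2026 maintenance window' ;", 0) returns ("March 2026 maintenance window", 31). The returned index points just past the closing quote, which is where a lexer would resume scanning.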
Inside a quoted string, a doubled apostrophe represents one literal apostrophe in the semantic value.
'O''Reilly'
The bound semantic value is O'Reilly.
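A scanner that honors the doubled-apostrophe rule must check for '' before treating a quote as the closing one. The following is a minimal Python sketch; decode_quoted is a hypothetical name.

```python
def decode_quoted(src: str, start: int) -> tuple[str, int]:
    """Decode the quoted string at src[start], turning each '' into one apostrophe.

    Returns the bound semantic value and the index just past the closing quote.
    """
    if start >= len(src) or src[start] != "'":
        raise ValueError(f"no opening quote at offset {start}")
    chars: list[str] = []
    i = start + 1
    while i < len(src):
        if src.startswith("''", i):
            chars.append("'")  # doubled apostrophe -> one literal apostrophe
            i += 2
        elif src[i] == "'":
            return "".join(chars), i + 1  # a lone quote closes the string
        else:
            chars.append(src[i])
            i += 1
    raise ValueError(f"unterminated quoted string at offset {start}")
```

The order of the two quote checks is the key design choice. Testing for '' first ensures that 'O''Reilly' decodes to O'Reilly instead of closing after O.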
Outside quoted strings, the lexer recognizes delimiters such as the period (.), comma (,), semicolon (;), colon (:), opening and closing parentheses ( and ), and the ellipsis (...). Inside a quoted string, these characters are ordinary content.
Unquoted input: report . pdf
Quoted input: 'report.pdf'
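The first input yields three tokens before EOF: WORD report, DELIMITER ., and WORD pdf. Removing the spaces does not change this, because the period is still a delimiter. The second input yields a single STRING token whose semantic value is report.pdf.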
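The difference between the two inputs depends on whether the scanner is inside a quoted string when it reaches a delimiter character. The following Python sketch reports only the delimiters that lie outside quotes. The function name delimiter_positions and the delimiter tuple are assumptions made for illustration.

```python
# Assumed delimiter set, longest first so "..." is matched before ".".
DELIMITERS = ("...", ".", ",", ";", ":", "(", ")")


def delimiter_positions(src: str) -> list[tuple[int, str]]:
    """Return (offset, delimiter) pairs for delimiters that lie outside quotes."""
    found: list[tuple[int, str]] = []
    in_string = False
    i = 0
    while i < len(src):
        if src[i] == "'":
            if in_string and src.startswith("''", i):
                i += 2  # doubled apostrophe: still inside the string
                continue
            in_string = not in_string  # opening or closing quote
            i += 1
            continue
        if not in_string:
            delim = next((d for d in DELIMITERS if src.startswith(d, i)), None)
            if delim is not None:
                found.append((i, delim))
                i += len(delim)
                continue
        i += 1  # ordinary content, or any character inside a string
    return found
```

With these assumptions, report . pdf reports one delimiter at offset 7, while 'report.pdf' reports none.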
An opening quote without a matching closing quote is a syntax error. The rich diagnostic should list a closing quote among the expected possibilities.
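A diagnostic of this kind can carry both the error position and the set of tokens that would have been accepted there. In the following Python sketch, LexError, its offset and expected attributes, and the label "closing quote (')" are all hypothetical and do not come from a real parser library.

```python
class LexError(Exception):
    """Hypothetical rich diagnostic: an offset plus the tokens that would have been accepted."""

    def __init__(self, message: str, offset: int, expected: list[str]) -> None:
        super().__init__(f"{message}; expected {', '.join(expected)} at offset {offset}")
        self.offset = offset
        self.expected = expected


def read_quoted(src: str, start: int) -> tuple[str, int]:
    """Read the quoted string opened at src[start]; raise LexError if it never closes."""
    if start >= len(src) or src[start] != "'":
        raise ValueError(f"no opening quote at offset {start}")
    chars: list[str] = []
    i = start + 1
    while i < len(src):
        if src.startswith("''", i):
            chars.append("'")  # doubled apostrophe -> literal apostrophe
            i += 2
        elif src[i] == "'":
            return "".join(chars), i + 1  # matching closing quote found
        else:
            chars.append(src[i])
            i += 1
    # Input ended inside the string: a closing quote is what was needed here.
    raise LexError(
        f"unterminated quoted string opened at offset {start}",
        offset=len(src),
        expected=["closing quote (')"],
    )
```

This sketch stores the expected possibilities as readable labels. A parser that already has token kinds might store those kinds instead and render the labels only when it reports the error.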
The following commands combine quoted and unquoted values:
SET TITLE 'Quarterly Ops Review' ;
SET OWNER 'O''Reilly' ;
EXECUTE 'COMPLEX.TASK' ON db01 ;
SET DATE 07 . 03 . 2026 ;
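To see how the example commands break into tokens, the rules in this section can be combined into one small lexer. The following Python sketch does that. The names Kind, Token, DELIMITERS, and tokenize are hypothetical, the delimiter tuple is assumed, and the sketch also assumes that a quote always starts a STRING, even in the middle of a word.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Kind(Enum):
    WORD = auto()
    STRING = auto()
    DELIMITER = auto()
    EOF = auto()


@dataclass(frozen=True)
class Token:
    kind: Kind
    text: str  # for STRING, the decoded value without the outer quotes


# Assumed delimiter set, longest first so "..." is matched before ".".
DELIMITERS = ("...", ".", ",", ";", ":", "(", ")")


def _read_string(src: str, start: int) -> tuple[str, int]:
    """Decode the quoted string at src[start]; '' stands for one apostrophe."""
    chars: list[str] = []
    i = start + 1
    while i < len(src):
        if src.startswith("''", i):
            chars.append("'")
            i += 2
        elif src[i] == "'":
            return "".join(chars), i + 1
        else:
            chars.append(src[i])
            i += 1
    raise ValueError(f"unterminated quoted string opened at offset {start}")


def tokenize(src: str) -> list[Token]:
    tokens: list[Token] = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            i += 1  # boundary only
        elif ch == "'":
            value, i = _read_string(src, i)  # delimiters inside are plain content
            tokens.append(Token(Kind.STRING, value))
        else:
            delim = next((d for d in DELIMITERS if src.startswith(d, i)), None)
            if delim is not None:
                tokens.append(Token(Kind.DELIMITER, delim))
                i += len(delim)
            else:
                start = i
                # Assumption: a quote ends the WORD and starts a STRING.
                while (i < len(src) and not src[i].isspace() and src[i] != "'"
                       and not any(src.startswith(d, i) for d in DELIMITERS)):
                    i += 1
                tokens.append(Token(Kind.WORD, src[start:i]))
    tokens.append(Token(Kind.EOF, ""))  # synthetic end-of-input marker
    return tokens


if __name__ == "__main__":
    for command in (
        "SET TITLE 'Quarterly Ops Review' ;",
        "SET OWNER 'O''Reilly' ;",
        "EXECUTE 'COMPLEX.TASK' ON db01 ;",
        "SET DATE 07 . 03 . 2026 ;",
    ):
        print(command)
        for tok in tokenize(command):
            print(f"  {tok.kind.name:<9} {tok.text!r}")
```

In this sketch, the third command yields WORD EXECUTE, STRING COMPLEX.TASK, WORD ON, WORD db01, DELIMITER ;, and EOF. The unquoted date in the fourth command becomes three WORD tokens separated by two DELIMITER tokens.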
Grammar authors should test both quoted and unquoted input for any command that accepts free-form values. Free-form values are a common source of subtle lexical bugs.
Do not assume that punctuation-heavy business values will remain atomic when unquoted. Make the quoting contract explicit in your examples and tests.
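One way to make the quoting contract explicit in tests is to pair a helper that renders any value as a literal with its inverse, then check that every value survives a round trip. The following is a minimal Python sketch. The names quote_value, unquote_literal, and SAMPLE_VALUES are hypothetical, and the sample values are illustrative rather than taken from a real test suite.

```python
def quote_value(value: str) -> str:
    """Hypothetical helper: render a value as a literal, doubling every apostrophe."""
    return "'" + value.replace("'", "''") + "'"


def unquote_literal(literal: str) -> str:
    """Hypothetical inverse of quote_value for one complete literal such as 'O''Reilly'."""
    if len(literal) < 2 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError(f"not a quoted literal: {literal!r}")
    body = literal[1:-1]
    # Apostrophes in the body must come in pairs; a lone one would have
    # closed the string early.
    if body.replace("''", "").count("'"):
        raise ValueError(f"stray apostrophe in {literal!r}")
    return body.replace("''", "'")


# Illustrative punctuation-heavy values worth covering in grammar tests.
SAMPLE_VALUES = ["report.pdf", "O'Reilly", "COMPLEX.TASK", "Q1: ops, review (draft)..."]

if __name__ == "__main__":
    for value in SAMPLE_VALUES:
        literal = quote_value(value)
        assert unquote_literal(literal) == value  # the quoting contract holds
        print(f"{value!r:32} -> {literal}")
```

Each sample value can also be passed to the command unquoted, so that a test asserts the expected split into several tokens alongside the single STRING produced by the quoted form.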