@file:DependsOn("/antlr-4.11.1-complete.jar")
@file:DependsOn(".")
Advanced Lexical Analysis
1 Introduction to ANTLR
1.1 What is ANTLR?
ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It’s widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build and walk parse trees.
Terence Parr, the author of ANTLR
1.2 Lexical analysis using ANTLR
ANTLR can generate a lexer class for us. It takes a lexer grammar file (extension .g4) as input and generates Java source code for the lexer, which we then compile to a Java class.

2 Example
2.1 The grammar file
// SampleLexer.g4
lexer grammar SampleLexer;

WHITESPACE : [ \t]+;
NEWLINE : [\r\n]+;
NUMBER : [0-9]+;
WORD : [a-zA-Z]+;
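To make the meaning of the four rules in the grammar above more concrete, here is a minimal hand-written sketch in plain Kotlin of how a lexer might apply them. It is not the code ANTLR generates. It only imitates two rules of thumb that ANTLR lexers follow: the longest match wins, and on a tie the rule defined first wins. The names `SketchToken`, `rules`, and `tokenize` are hypothetical and introduced only for this illustration, and the sketch emits no end-of-file token.

```kotlin
// Hand-written illustration of the rules in SampleLexer.g4 (NOT ANTLR-generated code).
// The 1-based position of a rule in this list is used as its token type,
// mirroring the order in which the rules appear in the grammar.
val rules: List<Pair<String, Regex>> = listOf(
    "WHITESPACE" to Regex("[ \\t]+"),
    "NEWLINE" to Regex("[\\r\\n]+"),
    "NUMBER" to Regex("[0-9]+"),
    "WORD" to Regex("[a-zA-Z]+"),
)

// Hypothetical token type for this sketch: rule number, rule name, matched text, start offset.
data class SketchToken(val type: Int, val name: String, val text: String, val start: Int)

fun tokenize(input: String): List<SketchToken> {
    val result = mutableListOf<SketchToken>()
    var pos = 0
    while (pos < input.length) {
        var best: SketchToken? = null
        var bestLength = 0
        for ((index, rule) in rules.withIndex()) {
            val (name, regex) = rule
            // Accept only a match that starts exactly at the current position.
            val match = regex.find(input, pos)?.takeIf { it.range.first == pos } ?: continue
            // Only a strictly longer match replaces the current best,
            // so on a tie the earlier rule is kept.
            if (match.value.length > bestLength) {
                bestLength = match.value.length
                best = SketchToken(index + 1, name, match.value, pos)
            }
        }
        val token = best ?: error("no rule matches at offset $pos")
        result += token
        pos += token.text.length
    }
    return result
}

fun main() {
    for (t in tokenize("hello 123")) {
        println("${t.name} <${t.type}> '${t.text}' @${t.start}")
    }
}
```

In a notebook cell the body of `main` can be run directly. For the input "hello 123" the sketch yields a WORD, a WHITESPACE, and a NUMBER token with types 4, 1, and 3, in that order.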
2.2 ANTLR Toolchain
$ java -jar /antlr-4.11.1-complete.jar
ANTLR Parser Generator Version 4.11.1
-o ___ specify output directory where all output is generated
-lib ___ specify location of grammars, tokens files
-atn generate rule augmented transition network diagrams
-encoding ___ specify grammar file encoding; e.g., euc-jp
-message-format ___ specify output style for messages in antlr, gnu, vs2005
-long-messages show exception details when available for errors and warnings
-listener generate parse tree listener (default)
-no-listener don't generate parse tree listener
-visitor generate parse tree visitor
-no-visitor don't generate parse tree visitor (default)
-package ___ specify a package/namespace for the generated code
-depend generate file dependencies
-D<option>=value set/override a grammar-level option
-Werror treat warnings as errors
-XdbgST launch StringTemplate visualizer on generated code
-XdbgSTWait wait for STViz to close before continuing
-Xforce-atn use the ATN simulator for all predictions
-Xlog dump lots of logging info to antlr-timestamp.log
-Xexact-output-dir all output goes into -o dir regardless of paths/package
Let’s generate the lexer Java class.
$ java -jar /antlr-4.11.1-complete.jar ./SampleLexer.g4
$ tree .
.
├── SampleLexer.g4
├── SampleLexer.interp <-- new
├── SampleLexer.java <-- new
└── SampleLexer.tokens <-- new
Next, compile the generated Java source into a class file:
$ javac -cp /antlr-4.11.1-complete.jar:. ./SampleLexer.java
$ tree .
.
├── SampleLexer.class <-- new
├── SampleLexer.g4
├── SampleLexer.interp
├── SampleLexer.java
└── SampleLexer.tokens
2.3 Using the lexer in Kotlin
import org.antlr.v4.runtime.*

val input: CharStream = CharStreams.fromString("hello 123")
val lexer = SampleLexer(input)
val stream: CommonTokenStream = CommonTokenStream(lexer)
val tokens: List<Token> = stream.apply {
    this.fill()
}.getTokens()

tokens.joinToString("\n")
[@0,0:4='hello',<4>,1:0]
[@1,5:5=' ',<1>,1:5]
[@2,6:8='123',<3>,1:6]
[@3,9:8='<EOF>',<-1>,1:9]
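The number in angle brackets in each line of the output above is the token type. To show rule names instead of numbers, the lexer's vocabulary can be consulted. The cell below is a sketch that continues the notebook session: it assumes `lexer` and `tokens` from the previous cell are still in scope, and `describe` is a hypothetical helper name introduced here. `getVocabulary()` and `getSymbolicName(int)` are part of the ANTLR 4 Java runtime.

```kotlin
import org.antlr.v4.runtime.Token
import org.antlr.v4.runtime.Vocabulary

// Hypothetical helper for this sketch: renders a token as "<rule name> '<text>'".
fun describe(token: Token, vocabulary: Vocabulary): String {
    // getSymbolicName may return null for a type without a symbolic name,
    // so fall back to the raw type number in that case.
    val name = vocabulary.getSymbolicName(token.type) ?: "<${token.type}>"
    return "$name '${token.text}'"
}

// Assumes `lexer` and `tokens` were defined in the previous notebook cell.
tokens.joinToString("\n") { describe(it, lexer.vocabulary) }
```

With the grammar from section 2.1, types 4, 1, and 3 should map to WORD, WHITESPACE, and NUMBER, since ANTLR numbers lexer rules in the order they are defined.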
To be completed