title: Parser error messages don't identify source file
state: OPEN
author: dboreham
labels:
comments: 0
assignees:
projects:
milestone:
number: 41
--
I noticed while testing which of the kroc course module example programs transpile and run ok that the parser's error output is not very useful. For example: ``` $ ./occam2go -I kroc/modules/course/libsrc -D TARGET.BITS.PER.WORD=32 -o test.go kroc/modules/course/examples/test_utils.occ Parse errors: line 4335: expected type after VAL, got [ line 4335: unexpected token: INT line 4335: unexpected token: ] line 4391: expected ), got STRING line 4391: unexpected token: STRING line 4391: unexpected token: * ``` Since the test_utils.occ program is quite short, these messages about line 4335 are presumably to do with the course library itself? Actually I'm a bit confused by this because the course library parses ok in other cases, when I try to transpile double_echo.occ for example it works ok. The same #include is used in both programs. Anyway it would be useful to get a better error message that identifies the source file. This would aid in diagnosis.
Entered plan mode. You should now focus on exploring the codebase and designing an implementation approach. In plan mode, you should: 1. Thoroughly explore the codebase to understand existing patterns 2. Identify similar features and architectural approaches 3. Consider multiple approaches and their trade-offs 4. Use AskUserQuestion if you need to clarify the approach 5. Design a concrete implementation strategy 6. When ready, use ExitPlanMode to present your plan for approval Remember: DO NOT write or edit any files yet. This is a read-only exploration and planning phase.
The issue is about parser error messages not identifying the source file. When using #INCLUDE, the line numbers refer to the expanded source (after preprocessing), not the original file. This makes it hard to diagnose errors.
Let me explore how the preprocessor, lexer, and parser handle source locations.
{
"subagent_type": "Explore",
"prompt": "I need to understand how source file locations are tracked through the occam2go transpiler pipeline (preproc → lexer → parser → codegen). The issue is that parser error messages show line numbers from the preprocessed/expanded source, not from the original source files. When `#INCLUDE` pulls in a large library, errors say \"line 4335\" even though the original file is small.\n\nPlease investigate thoroughly:\n\n1. **preproc/preproc.go**: How does the preprocessor handle `#INCLUDE`? Does it track which file each line comes from? Does it produce any source mapping?\n\n2. **lexer/lexer.go** and **lexer/token.go**: How does the lexer track line numbers? Does Token have a file field? How is position tracked?\n\n3. **parser/parser.go**: How does the parser report errors? Search for all error formatting/reporting (look for \"error\", \"Errorf\", \"errors\", \"parseError\", etc.). How are line numbers included in error messages?\n\n4. **ast/ast.go**: Do AST nodes carry source location info?\n\n5. **main.go**: How are errors displayed to the user?\n\nFor each file, show the relevant code snippets with line numbers."
}
[
{
"type": "text",
"text": "Perfect! Now I have all the information I need. Let me provide a comprehensive investigation report.\n\n## Investigation Report: Source File Location Tracking in occam2go\n\nI've thoroughly analyzed how source file locations are tracked through the transpiler pipeline. Here are my findings:\n\n---\n\n### **1. PREPROCESSOR (preproc/preproc.go) — NO SOURCE MAPPING**\n\n**Key Functions:**\n- **`ProcessFile()` (lines 64-82)**: Reads a file and calls `processSource()`\n- **`processSource()` (lines 92-180)**: Line-by-line expansion that processes `#INCLUDE`, `#IF`/`#ELSE`/`#ENDIF`, `#DEFINE`, etc.\n\n**Critical Issue:**\n```\nLines 97-173: The preprocessor iterates through input lines with:\n for i, line := range lines {\n if i > 0 {\n out.WriteByte('\\n') // Line 99\n }\n ...\n }\n```\n\n**The preprocessor produces a single flattened string with NO source mapping information:**\n- It expands `#INCLUDE` directives inline (lines 144-154)\n- When including a file: `included, err := pp.resolveAndInclude(rest, baseDir)` (line 146) and `out.WriteString(included)` (line 150)\n- **No tracking of which file each line came from**\n- **No comment markers or metadata** to indicate include boundaries\n- Output is just expanded text with blank lines preserving line numbers\n\n**Result:** When course.module includes float_io.occ (a large library), the expanded output combines all lines. Line numbers reflect the expanded output only, not the original sources.\n\n---\n\n### **2. 
LEXER (lexer/token.go, lexer/lexer.go) — ONLY LINE & COLUMN, NO FILE**\n\n**Token Structure (token.go, lines 249-254):**\n```go\ntype Token struct {\n\tType TokenType\n\tLiteral string\n\tLine int // ← LINE ONLY\n\tColumn int // ← COLUMN ONLY\n\t// NO FILE FIELD\n}\n```\n\n**Lexer Tracking (lexer.go):**\n- **Line tracking** (lines 12, 106, 225-226, 244):\n ```go\n type Lexer struct {\n ...\n line int // ← Incremented on '\\n' (line 225)\n column int // ← Reset on '\\n' (line 226)\n ...\n }\n ```\n\n- **Token creation** (line 88, 106-107, 244):\n ```go\n tok := Token{Type: INDENT, Literal: \"\", Line: l.line, Column: 1}\n tok.Line = l.line\n tok.Column = l.column\n tok = Token{Type: NEWLINE, Literal: \"\\\\n\", Line: l.line, Column: l.column}\n ```\n\n**Critical Issue:**\n- Tokens only capture `Line` and `Column`, no `Filename` field\n- Lexer has no concept of which source file it's processing\n- **Line numbers are absolute in the expanded source**, not relative to original files\n\n---\n\n### **3. PARSER (parser/parser.go) — ERROR REPORTING USES TOKEN LINE ONLY**\n\n**Error Collection (lines 52-96):**\n```go\ntype Parser struct {\n\tl *lexer.Lexer\n\terrors []string // Stores error messages\n\t...\n\tcurToken lexer.Token // Contains only Line/Column\n\tpeekToken lexer.Token // Contains only Line/Column\n}\n\nfunc (p *Parser) Errors() []string {\n\treturn p.errors\n}\n\nfunc (p *Parser) addError(msg string) {\n\tp.errors = append(p.errors, fmt.Sprintf(\"line %d: %s\", p.curToken.Line, msg))\n\t// ^^^^^^^^^^^^^^^^\n\t// Uses token's Line field directly\n}\n```\n\n**Error Reporting Pattern:**\n- **Line 95** — `fmt.Sprintf(\"line %d: %s\", p.curToken.Line, msg)`\n- **Line 123** — `p.addError(fmt.Sprintf(\"expected %s, got %s\", t, p.peekToken.Type))`\n- Used throughout parser (40+ locations) for all syntax errors\n\n**All Parser Errors Include Only:**\n1. **Line number** (from `p.curToken.Line`)\n2. 
**Error message** (describing what went wrong)\n\n**No File Information** — Even if available, parser has no way to include it.\n\n---\n\n### **4. AST (ast/ast.go) — NODES CARRY TOKENS WITH LIMITED INFO**\n\n**AST Node Structure (examples):**\n```go\n// Lines 36-44: VarDecl\ntype VarDecl struct {\n\tToken lexer.Token // ← Stores the type token (INT, BYTE, etc.)\n\tType string // ← Type name as string\n\tNames []string\n}\n\n// Lines 86-95: SeqBlock\ntype SeqBlock struct {\n\tToken lexer.Token // ← The SEQ token\n\tStatements []Statement\n\tReplicator *Replicator\n}\n```\n\n**AST carries the original Token from the lexer, but:**\n- That Token only has `Type`, `Literal`, `Line`, and `Column`\n- **No file information**\n- When codegen accesses AST nodes, it can only use token position info\n\n---\n\n### **5. MAIN.GO — ERROR DISPLAY TO USER**\n\n**Pipeline (lines 74-104):**\n```go\n// Line 79: Preprocess\nexpanded, err := pp.ProcessFile(inputFile)\nif err != nil {\n\tfmt.Fprintf(os.Stderr, \"Preprocessor error: %s\\n\", err)\n\tos.Exit(1)\n}\n\n// Line 92: Lex (no filename passed)\nl := lexer.New(expanded)\n\n// Line 95: Parse\np := parser.New(l)\nprogram := p.ParseProgram()\n\n// Lines 98-104: Print parser errors\nif len(p.Errors()) > 0 {\n\tfmt.Fprintf(os.Stderr, \"Parse errors:\\n\")\n\tfor _, err := range p.Errors() {\n\t\tfmt.Fprintf(os.Stderr, \" %s\\n\", err) // Just prints the error string\n\t}\n\tos.Exit(1)\n}\n```\n\n**Error output format:**\n```\nParse errors:\n line 4335: expected :=, !, or ? 
after x[...], got NEWLINE\n```\n\n**Problem:** User sees \"line 4335\" in the expanded source, not \"float_io.occ, line 42\" in the original file.\n\n---\n\n## Summary: The Root Cause\n\n| Component | Information Available | Information Missing |\n|-----------|----------------------|----------------------|\n| **Preprocessor** | Input filenames, line-by-line expansion | **Source mapping** (which file each output line came from) |\n| **Lexer** | Line and column numbers (in expanded source) | **Filename field in Token struct** |\n| **Parser** | Line numbers from tokens | **File information** to pair with line numbers |\n| **AST** | Token with line/column | **File information in AST nodes** |\n| **Main.go** | Preprocessed/lexed/parsed data | **Context to map expanded lines back to original files** |\n\n## The Problem in Action\n\nWhen transpiling `course.module` which `#INCLUDE`s `float_io.occ`:\n\n1. **Preprocessor** expands both files into one string: `course_expanded.txt`\n2. **Lexer** numbers lines 1, 2, 3, ... in this expanded string\n3. **Parser** reports error: `\"line 4335: ...\"`\n4. **User sees** line 4335, which is actually in `float_io.occ` line 42\n5. **User confused** — the main file is small, but error reports huge line numbers\n\n---\n\n## What Would Be Needed to Fix This\n\n1. **Add filename field to Token**: `Filename string` in `lexer/token.go` line 249\n2. **Preprocess with source mapping**: Track `#INCLUDE` boundaries in preprocessor output (e.g., with markers or a separate mapping)\n3. **Pass filename through lexer**: Lexer needs to know which file it's currently processing\n4. **Update AST**: Optionally add filename to AST nodes\n5. **Format errors with filename**: `fmt.Sprintf(\"%s:%d: %s\", filename, line, msg)` in parser error reporting\n6. **Update main.go**: Resolve the filename mapping when displaying errors to users"
},
{
"type": "text",
"text": "agentId: aab9a7debfb9b09eb (for resuming to continue this agent's work if needed)\n<usage>total_tokens: 50088\ntool_uses: 13\nduration_ms: 77616</usage>"
}
]
Now I have a comprehensive understanding of the problem. Let me plan the approach.
The key challenge is that the preprocessor flattens all included files into one big string, and the lexer/parser only track line numbers in that flattened string.
There are a few approaches:
Approach A: Source map in preprocessor
- The preprocessor emits a source map (array of {filename, originalLine} indexed by expanded line number)
- Pass this to the lexer, which stamps each token with the original filename and line
- Parser errors then show file:line
Approach B: Line markers in expanded source
- Insert special marker comments/lines like #line 1 "filename" (C-style) in the preprocessor output
Approach C: Post-hoc mapping
- Keep a source map from the preprocessor
- When displaying errors in main.go, map the expanded line numbers back to original file:line
- Doesn't require lexer/parser/AST changes
Approach C seems cleanest since the preprocessor already processes lines sequentially, so building a mapping is straightforward, and main.go can just translate line numbers when displaying errors without touching the lexer or parser. The preprocessor could return both the expanded string and a source map that tracks which original file each line came from.
For the implementation, I'd define a struct to hold the original filename and line number, then build a slice indexed by output line number during preprocessing. In main.go, I'd retrieve this map and use it to translate error locations back to their source before displaying them to the user.
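A minimal sketch of that idea. The names here (`SourceLoc`, `translate`) are my own working names, not anything in the codebase yet:

```go
package main

import "fmt"

// SourceLoc records where one line of preprocessor output came from.
// (Assumed name; this type does not exist in the codebase yet.)
type SourceLoc struct {
	File string // original source file
	Line int    // 1-based line number within that file
}

// translate maps a 1-based line number in the expanded output back to
// the original file:line, falling back to the expanded position when
// no mapping entry exists for it.
func translate(sourceMap []SourceLoc, expandedLine int) string {
	idx := expandedLine - 1 // sourceMap is indexed by 0-based output line
	if idx < 0 || idx >= len(sourceMap) {
		return fmt.Sprintf("line %d", expandedLine)
	}
	return fmt.Sprintf("%s:%d", sourceMap[idx].File, sourceMap[idx].Line)
}

func main() {
	// Lines 1-2 of the expanded output came from test_utils.occ;
	// line 3 was pulled in by an #INCLUDE.
	m := []SourceLoc{
		{File: "test_utils.occ", Line: 1},
		{File: "test_utils.occ", Line: 2},
		{File: "float_io.occ", Line: 1},
	}
	fmt.Println(translate(m, 3)) // float_io.occ:1
}
```

The fallback path matters: if the map is ever shorter than the expanded output (a bug, or source fed via ProcessSource with no file context), errors degrade to the current behavior instead of panicking.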
I'm realizing that modifying ProcessFile()'s return signature would break existing callers, so storing the source map as a field on the Preprocessor struct with a getter method is cleaner. I could also add a Filename field to the Token struct for a more comprehensive solution, but that's more invasive than needed right now.
Adding the source map to the Preprocessor struct...
Now I need to think through how the recursive include processing works. Since processSource() calls itself recursively through resolveAndInclude(), the source map needs to accumulate across all those nested calls. Each line of output needs to track which file it originated from and what the original line number was.

I see the key insight now: each newline or string written to the output represents a line that needs a corresponding source location entry. The simplest approach is to have processSource accept a filename parameter and append a SourceLoc entry to the map for each line it outputs, whether that's actual source code or a blank placeholder for skipped directives, with included files building their own map entries through recursive calls.
The challenge is that processSource currently only receives the content and base directory, not the filename itself. I need to trace through the call chain: ProcessFile reads the file and calls processSource with just the content and directory, then processSource handles includes by calling resolveAndInclude, which calls back to ProcessFile for the resolved path. So I need to thread the filename through processSource as an additional parameter, then add a SourceLoc struct to track file and line number, and maintain a sourceMap slice on the Preprocessor to build up the mapping as we process each line.
For included content that spans multiple lines, the recursive ProcessFile call will already populate the map entries for those lines, so I just need to track which output line corresponds to which source location as we write content to the output buffer.
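To make the per-line bookkeeping concrete, here is a stripped-down sketch, assuming processSource gains a filename parameter and the Preprocessor gains a sourceMap field (both hypothetical at this point; directive and include handling omitted):

```go
package main

import (
	"fmt"
	"strings"
)

// SourceLoc is the assumed mapping entry discussed in the plan.
type SourceLoc struct {
	File string
	Line int
}

// Preprocessor sketch: only the field needed to show the bookkeeping.
type Preprocessor struct {
	sourceMap []SourceLoc // indexed by 0-based output line number
}

// processSource is a reduced sketch: every line it writes to the output
// gets exactly one SourceLoc entry, whether that line is real source or
// a blank placeholder. In the real code, the #INCLUDE branch would not
// append an entry itself; the recursive ProcessFile call appends entries
// for each line of the included file instead.
func (pp *Preprocessor) processSource(source, filename string) string {
	lines := strings.Split(source, "\n")
	var out strings.Builder
	for i, line := range lines {
		if i > 0 {
			out.WriteByte('\n')
		}
		out.WriteString(line)
		// Output line len(pp.sourceMap)+1 came from filename, line i+1.
		pp.sourceMap = append(pp.sourceMap, SourceLoc{File: filename, Line: i + 1})
	}
	return out.String()
}

func main() {
	pp := &Preprocessor{}
	pp.processSource("PROC p ()\n  SKIP\n:", "test_utils.occ")
	fmt.Printf("%+v\n", pp.sourceMap[1]) // {File:test_utils.occ Line:2}
}
```

The invariant to maintain is one map entry per output line, in output order; as long as every branch of the real switch preserves that, the nested include calls interleave their entries correctly.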
The tricky part is handling the #IF/#ELSE/#ENDIF blocks: when lines are skipped, blank lines get emitted to preserve output line numbers, but I need to figure out whether those blank lines should map back to the skipped source lines or somewhere else.
Actually, let me step back. The core mechanism is straightforward: for each line of output, I record which source file and line number it came from. The map is indexed by output line number, and when processing includes, the recursive calls fill in their portions. Then given any expanded line number, I can trace it back to the original source.
The newline handling works by not prefixing the first line with a newline, then adding one before each subsequent line. The included content brings its own newlines, so when I insert it at a particular line, those internal newlines get preserved in the output. The leading newline before the include acts as a separator between the previous line and the included content, which means the source map entries line up correctly with the actual output lines.
Now I'm thinking through how to track source locations during preprocessing. I could maintain a counter that increments as we write content, but that feels fragile. Or I could return source location metadata alongside the expanded string and let the caller assemble the map afterward. The marker comment approach would work but it's inelegant. I think the cleanest solution is to have processSource return both the expanded source and a list of source location entries, then build the final map from those entries once preprocessing completes.

I'm going with the struct-based approach where the Preprocessor holds a sourceMap slice that gets populated as we process each line. I need to modify processSource to accept a filename parameter and append SourceLoc entries whenever we produce output, being careful to track which source file each output line came from, especially when handling recursive #INCLUDE directives.
Still writing the preprocessor logic...
Actually, I'm noticing a potential issue with how newlines are being handled in the skipped blocks: the code might be writing duplicate newlines, since it adds one as a separator and then another for the blank line itself.
Let me read through the actual preprocessor file to understand the exact logic and verify whether this is really a problem.
Good, I have a thorough understanding of the problem. Let me read the key files to understand the exact implementation details.
1→// Package preproc implements a textual preprocessor for occam source files.
2→// It handles #IF/#ELSE/#ENDIF conditional compilation, #DEFINE symbols,
3→// #INCLUDE file inclusion, and ignores #COMMENT/#PRAGMA/#USE directives.
4→// The output is a single expanded string suitable for feeding into the lexer.
5→package preproc
6→
7→import (
8→ "fmt"
9→ "os"
10→ "path/filepath"
11→ "strings"
12→)
13→
14→// Option configures a Preprocessor.
15→type Option func(*Preprocessor)
16→
17→// WithIncludePaths sets the search paths for #INCLUDE resolution.
18→func WithIncludePaths(paths []string) Option {
19→ return func(pp *Preprocessor) {
20→ pp.includePaths = paths
21→ }
22→}
23→
24→// WithDefines sets predefined symbols.
25→func WithDefines(defs map[string]string) Option {
26→ return func(pp *Preprocessor) {
27→ for k, v := range defs {
28→ pp.defines[k] = v
29→ }
30→ }
31→}
32→
33→// Preprocessor performs textual preprocessing of occam source.
34→type Preprocessor struct {
35→ defines map[string]string
36→ includePaths []string
37→ errors []string
38→ processing map[string]bool // absolute paths currently being processed (circular include detection)
39→ included map[string]bool // absolute paths already included (prevent duplicate inclusion)
40→}
41→
42→// New creates a new Preprocessor with the given options.
43→func New(opts ...Option) *Preprocessor {
44→ pp := &Preprocessor{
45→ defines: map[string]string{},
46→ processing: map[string]bool{},
47→ included: map[string]bool{},
48→ }
49→ // Predefined symbols
50→ pp.defines["TARGET.BITS.PER.WORD"] = "64"
51→
52→ for _, opt := range opts {
53→ opt(pp)
54→ }
55→ return pp
56→}
57→
58→// Errors returns any errors accumulated during processing.
59→func (pp *Preprocessor) Errors() []string {
60→ return pp.errors
61→}
62→
63→// ProcessFile reads and processes a file, resolving #INCLUDE directives.
64→func (pp *Preprocessor) ProcessFile(filename string) (string, error) {
65→ absPath, err := filepath.Abs(filename)
66→ if err != nil {
67→ return "", fmt.Errorf("cannot resolve path %q: %w", filename, err)
68→ }
69→
70→ if pp.processing[absPath] {
71→ return "", fmt.Errorf("circular include detected: %s", filename)
72→ }
73→ pp.processing[absPath] = true
74→ defer delete(pp.processing, absPath)
75→
76→ data, err := os.ReadFile(absPath)
77→ if err != nil {
78→ return "", fmt.Errorf("cannot read %q: %w", filename, err)
79→ }
80→
81→ return pp.processSource(string(data), filepath.Dir(absPath))
82→}
83→
84→// ProcessSource processes occam source text with no file context.
85→// #INCLUDE directives will only resolve against includePaths.
86→func (pp *Preprocessor) ProcessSource(source string) (string, error) {
87→ return pp.processSource(source, "")
88→}
89→
90→// processSource performs line-by-line preprocessing.
91→// baseDir is the directory of the current file (for relative #INCLUDE resolution).
92→func (pp *Preprocessor) processSource(source string, baseDir string) (string, error) {
93→ lines := strings.Split(source, "\n")
94→ var out strings.Builder
95→ var condStack []condState
96→
97→ for i, line := range lines {
98→ if i > 0 {
99→ out.WriteByte('\n')
100→ }
101→
102→ trimmed := strings.TrimSpace(line)
103→
104→ if strings.HasPrefix(trimmed, "#") {
105→ directive, rest := parseDirectiveLine(trimmed)
106→
107→ switch directive {
108→ case "DEFINE":
109→ if isActive(condStack) {
110→ sym := strings.TrimSpace(rest)
111→ if sym != "" {
112→ pp.defines[sym] = ""
113→ }
114→ }
115→ out.WriteString("") // blank line preserves line numbers
116→
117→ case "IF":
118→ val := pp.evalExpr(rest)
119→ condStack = append(condStack, condState{active: val, seenTrue: val})
120→ out.WriteString("")
121→
122→ case "ELSE":
123→ if len(condStack) == 0 {
124→ pp.errors = append(pp.errors, fmt.Sprintf("line %d: #ELSE without matching #IF", i+1))
125→ } else {
126→ top := &condStack[len(condStack)-1]
127→ if top.seenTrue {
128→ top.active = false
129→ } else {
130→ top.active = true
131→ top.seenTrue = true
132→ }
133→ }
134→ out.WriteString("")
135→
136→ case "ENDIF":
137→ if len(condStack) == 0 {
138→ pp.errors = append(pp.errors, fmt.Sprintf("line %d: #ENDIF without matching #IF", i+1))
139→ } else {
140→ condStack = condStack[:len(condStack)-1]
141→ }
142→ out.WriteString("")
143→
144→ case "INCLUDE":
145→ if isActive(condStack) {
146→ included, err := pp.resolveAndInclude(rest, baseDir)
147→ if err != nil {
148→ return "", fmt.Errorf("line %d: %w", i+1, err)
149→ }
150→ out.WriteString(included)
151→ } else {
152→ out.WriteString("")
153→ }
154→
155→ case "COMMENT", "PRAGMA", "USE":
156→ out.WriteString("") // no-op, blank line
157→
158→ default:
159→ // Unknown directive — pass through if active
160→ if isActive(condStack) {
161→ out.WriteString(line)
162→ } else {
163→ out.WriteString("")
164→ }
165→ }
166→ } else {
167→ if isActive(condStack) {
168→ out.WriteString(line)
169→ } else {
170→ out.WriteString("") // blank line preserves line numbers
171→ }
172→ }
173→ }
174→
175→ if len(condStack) > 0 {
176→ pp.errors = append(pp.errors, fmt.Sprintf("unterminated #IF (missing %d #ENDIF)", len(condStack)))
177→ }
178→
179→ return out.String(), nil
180→}
181→
182→// condState tracks one level of #IF/#ELSE nesting.
183→type condState struct {
184→ active bool // currently emitting lines?
185→ seenTrue bool // has any branch been true?
186→}
187→
188→// isActive returns true if all condition stack levels are active.
189→func isActive(stack []condState) bool {
190→ for _, s := range stack {
191→ if !s.active {
192→ return false
193→ }
194→ }
195→ return true
196→}
197→
198→// parseDirectiveLine splits "#DIRECTIVE rest" into (directive, rest).
199→func parseDirectiveLine(trimmed string) (string, string) {
200→ // trimmed starts with "#"
201→ s := trimmed[1:] // skip '#'
202→ s = strings.TrimSpace(s)
203→
204→ idx := strings.IndexAny(s, " \t")
205→ if idx == -1 {
206→ return strings.ToUpper(s), ""
207→ }
208→ return strings.ToUpper(s[:idx]), strings.TrimSpace(s[idx+1:])
209→}
210→
211→// resolveAndInclude resolves an #INCLUDE filename and processes the included file.
212→func (pp *Preprocessor) resolveAndInclude(rest string, baseDir string) (string, error) {
213→ filename := stripQuotes(rest)
214→ if filename == "" {
215→ return "", fmt.Errorf("#INCLUDE with empty filename")
216→ }
217→
218→ // Try to find the file
219→ resolved := pp.resolveIncludePath(filename, baseDir)
220→ if resolved == "" {
221→ return "", fmt.Errorf("cannot find included file %q", filename)
222→ }
223→
224→ // Skip files that have already been included (prevent duplicate definitions)
225→ absPath, err := filepath.Abs(resolved)
226→ if err == nil && pp.included[absPath] {
227→ return "", nil
228→ }
229→ if err == nil {
230→ pp.included[absPath] = true
231→ }
232→
233→ return pp.ProcessFile(resolved)
234→}
235→
236→// resolveIncludePath searches for a file: first relative to baseDir, then in includePaths.
237→func (pp *Preprocessor) resolveIncludePath(filename string, baseDir string) string {
238→ // First: relative to current file's directory
239→ if baseDir != "" {
240→ candidate := filepath.Join(baseDir, filename)
241→ if _, err := os.Stat(candidate); err == nil {
242→ return candidate
243→ }
244→ }
245→
246→ // Then: each include path
247→ for _, dir := range pp.includePaths {
248→ candidate := filepath.Join(dir, filename)
249→ if _, err := os.Stat(candidate); err == nil {
250→ return candidate
251→ }
252→ }
253→
254→ return ""
255→}
256→
257→// stripQuotes removes surrounding double quotes from a string.
258→func stripQuotes(s string) string {
259→ s = strings.TrimSpace(s)
260→ if len(s) >= 2 && s[0] == '"' && s[len(s)-1] == '"' {
261→ return s[1 : len(s)-1]
262→ }
263→ return s
264→}
265→
266→// evalExpr evaluates a preprocessor conditional expression.
267→// Supports: TRUE, FALSE, DEFINED (SYMBOL), NOT (expr), (SYMBOL = value)
268→func (pp *Preprocessor) evalExpr(expr string) bool {
269→ expr = strings.TrimSpace(expr)
270→ if expr == "" {
271→ return false
272→ }
273→
274→ // TRUE / FALSE
275→ if expr == "TRUE" {
276→ return true
277→ }
278→ if expr == "FALSE" {
279→ return false
280→ }
281→
282→ // NOT (expr) or NOT DEFINED (...)
283→ if strings.HasPrefix(expr, "NOT ") || strings.HasPrefix(expr, "NOT(") {
284→ inner := strings.TrimPrefix(expr, "NOT")
285→ inner = strings.TrimSpace(inner)
286→ return !pp.evalExpr(inner)
287→ }
288→
289→ // DEFINED (SYMBOL)
290→ if strings.HasPrefix(expr, "DEFINED") {
291→ inner := strings.TrimPrefix(expr, "DEFINED")
292→ inner = strings.TrimSpace(inner)
293→ sym := stripParens(inner)
294→ _, ok := pp.defines[sym]
295→ return ok
296→ }
297→
298→ // Parenthesized expression
299→ if strings.HasPrefix(expr, "(") && strings.HasSuffix(expr, ")") {
300→ inner := expr[1 : len(expr)-1]
301→ inner = strings.TrimSpace(inner)
302→
303→ // Check for equality: SYMBOL = value
304→ if eqIdx := strings.Index(inner, "="); eqIdx >= 0 {
305→ lhs := strings.TrimSpace(inner[:eqIdx])
306→ rhs := strings.TrimSpace(inner[eqIdx+1:])
307→ lhsVal, ok := pp.defines[lhs]
308→ if !ok {
309→ return false
310→ }
311→ return lhsVal == rhs
312→ }
313→
314→ // Otherwise recurse
315→ return pp.evalExpr(inner)
316→ }
317→
318→ // Bare symbol — treat as DEFINED
319→ _, ok := pp.defines[expr]
320→ return ok
321→}
322→
323→// stripParens removes surrounding parentheses and whitespace.
324→func stripParens(s string) string {
325→ s = strings.TrimSpace(s)
326→ if len(s) >= 2 && s[0] == '(' && s[len(s)-1] == ')' {
327→ return strings.TrimSpace(s[1 : len(s)-1])
328→ }
329→ return s
330→}
331→
  1→package main
2→
3→import (
4→ "flag"
5→ "fmt"
6→ "os"
7→ "strings"
8→
9→ "github.com/codeassociates/occam2go/codegen"
10→ "github.com/codeassociates/occam2go/lexer"
11→ "github.com/codeassociates/occam2go/modgen"
12→ "github.com/codeassociates/occam2go/parser"
13→ "github.com/codeassociates/occam2go/preproc"
14→)
15→
16→const version = "0.1.0"
17→
18→// multiFlag allows a flag to be specified multiple times (e.g. -I path1 -I path2).
19→type multiFlag []string
20→
21→func (f *multiFlag) String() string { return strings.Join(*f, ", ") }
22→func (f *multiFlag) Set(value string) error {
23→ *f = append(*f, value)
24→ return nil
25→}
26→
27→func main() {
28→ // Check for subcommand before parsing flags
29→ if len(os.Args) >= 2 && os.Args[1] == "gen-module" {
30→ genModuleCmd(os.Args[2:])
31→ return
32→ }
33→
34→ showVersion := flag.Bool("version", false, "Print version and exit")
35→ outputFile := flag.String("o", "", "Output file (default: stdout)")
36→ var includePaths multiFlag
37→ flag.Var(&includePaths, "I", "Include search path (repeatable)")
38→ var defines multiFlag
39→ flag.Var(&defines, "D", "Predefined symbol (repeatable)")
40→
41→ flag.Usage = func() {
42→ fmt.Fprintf(os.Stderr, "occam2go - An Occam to Go transpiler\n\n")
43→ fmt.Fprintf(os.Stderr, "Usage: %s [options] <input.occ>\n", os.Args[0])
44→ fmt.Fprintf(os.Stderr, " %s gen-module [-o output] <SConscript>\n\n", os.Args[0])
45→ fmt.Fprintf(os.Stderr, "Options:\n")
46→ flag.PrintDefaults()
47→ }
48→
49→ flag.Parse()
50→
51→ if *showVersion {
52→ fmt.Printf("occam2go version %s\n", version)
53→ os.Exit(0)
54→ }
55→
56→ args := flag.Args()
57→ if len(args) < 1 {
58→ flag.Usage()
59→ os.Exit(1)
60→ }
61→
62→ inputFile := args[0]
63→
64→ // Build defines map
65→ defs := map[string]string{}
66→ for _, d := range defines {
67→ if idx := strings.Index(d, "="); idx >= 0 {
68→ defs[d[:idx]] = d[idx+1:]
69→ } else {
70→ defs[d] = ""
71→ }
72→ }
73→
74→ // Preprocess
75→ pp := preproc.New(
76→ preproc.WithIncludePaths(includePaths),
77→ preproc.WithDefines(defs),
78→ )
79→ expanded, err := pp.ProcessFile(inputFile)
80→ if err != nil {
81→ fmt.Fprintf(os.Stderr, "Preprocessor error: %s\n", err)
82→ os.Exit(1)
83→ }
84→ if len(pp.Errors()) > 0 {
85→ fmt.Fprintf(os.Stderr, "Preprocessor warnings:\n")
86→ for _, e := range pp.Errors() {
87→ fmt.Fprintf(os.Stderr, " %s\n", e)
88→ }
89→ }
90→
91→ // Lex
92→ l := lexer.New(expanded)
93→
94→ // Parse
95→ p := parser.New(l)
96→ program := p.ParseProgram()
97→
98→ if len(p.Errors()) > 0 {
99→ fmt.Fprintf(os.Stderr, "Parse errors:\n")
100→ for _, err := range p.Errors() {
101→ fmt.Fprintf(os.Stderr, " %s\n", err)
102→ }
103→ os.Exit(1)
104→ }
105→
106→ // Generate Go code
107→ gen := codegen.New()
108→ output := gen.Generate(program)
109→
110→ // Write output
111→ if *outputFile != "" {
112→ err := os.WriteFile(*outputFile, []byte(output), 0644)
113→ if err != nil {
114→ fmt.Fprintf(os.Stderr, "Error writing file: %s\n", err)
115→ os.Exit(1)
116→ }
117→ } else {
118→ fmt.Print(output)
119→ }
120→}
121→
122→func genModuleCmd(args []string) {
123→ fs := flag.NewFlagSet("gen-module", flag.ExitOnError)
124→ outputFile := fs.String("o", "", "Output file (default: stdout)")
125→ moduleName := fs.String("name", "", "Module guard name (default: derived from library name)")
126→ fs.Parse(args)
127→
128→ if fs.NArg() < 1 {
129→ fmt.Fprintf(os.Stderr, "Usage: occam2go gen-module [-o output] [-name GUARD] <SConscript>\n")
130→ os.Exit(1)
131→ }
132→
133→ sconscriptFile := fs.Arg(0)
134→ data, err := os.ReadFile(sconscriptFile)
135→ if err != nil {
136→ fmt.Fprintf(os.Stderr, "Error reading SConscript: %s\n", err)
137→ os.Exit(1)
138→ }
139→
140→ libs := modgen.ParseSConscript(string(data))
141→ if len(libs) == 0 {
142→ fmt.Fprintf(os.Stderr, "No OccamLibrary found in %s\n", sconscriptFile)
143→ os.Exit(1)
144→ }
145→
146→ // Use first library by default
147→ lib := libs[0]
148→
149→ // Derive module name from library name if not specified
150→ guard := *moduleName
151→ if guard == "" {
152→ // course.lib → COURSE.MODULE
153→ name := lib.Name
154→ if idx := strings.LastIndex(name, "."); idx >= 0 {
155→ name = name[:idx]
156→ }
157→ guard = strings.ToUpper(name) + ".MODULE"
158→ }
159→
160→ output := modgen.GenerateModule(lib, guard)
161→
162→ if *outputFile != "" {
163→ err := os.WriteFile(*outputFile, []byte(output), 0644)
164→ if err != nil {
165→ fmt.Fprintf(os.Stderr, "Error writing file: %s\n", err)
166→ os.Exit(1)
167→ }
168→ } else {
169→ fmt.Print(output)
170→ }
171→}
172→
{
"file_path": "/home/david/projects/code-associates/occam2go/lexer/token.go",
"offset": 245,
"limit": 15
}
 245→ }
246→ return IDENT
247→}
248→
249→type Token struct {
250→ Type TokenType
251→ Literal string
252→ Line int
253→ Column int
254→}
255→Now let me also check the parser's error handling to understand the format of error messages.
{
"pattern": "addError|p\\.errors = append",
"path": "/home/david/projects/code-associates/occam2go/parser/parser.go",
"output_mode": "content",
"-n": true
}
94:func (p *Parser) addError(msg string) {
95: p.errors = append(p.errors, fmt.Sprintf("line %d: %s", p.curToken.Line, msg))
123: p.addError(fmt.Sprintf("expected %s, got %s", t, p.peekToken.Type))
229: p.addError(fmt.Sprintf("unexpected token: %s", p.curToken.Type))
370: p.addError(fmt.Sprintf("expected type after VAL, got %s", p.curToken.Type))
436: p.addError(fmt.Sprintf("expected type after INITIAL, got %s", p.curToken.Type))
570: p.addError(fmt.Sprintf("expected type after CHAN, got %s", p.curToken.Type))
603: p.addError(fmt.Sprintf("expected type after array size, got %s", p.curToken.Type))
781: p.addError(fmt.Sprintf("expected :=, !, or ? after %s[...], got %s", name, p.peekToken.Type))
818: p.addError(fmt.Sprintf("expected type after CHAN, got %s", p.curToken.Type))
874: p.addError("expected IS or CASE in protocol declaration")
930: p.addError(fmt.Sprintf("expected type name in protocol, got %s", p.curToken.Type))
945: p.addError("expected indented block after CASE in protocol")
981: p.addError(fmt.Sprintf("expected variant tag name, got %s", p.curToken.Type))
1027: p.addError("expected indented block after RECORD declaration")
1068: p.addError(fmt.Sprintf("expected type in record field, got %s", p.curToken.Type))
1281: p.addError("expected indented block after ? CASE")
1324: p.addError(fmt.Sprintf("expected variant tag name, got %s", p.curToken.Type))
1383: p.addError("expected indented block after ? CASE")
1421: p.addError(fmt.Sprintf("expected variant tag name, got %s", p.curToken.Type))
1476: p.addError("unexpected identifier after SEQ")
1488: p.addError("expected indented block after SEQ")
1511: p.addError("unexpected identifier after PAR")
1523: p.addError("expected indented block after PAR")
1578: p.addError("expected indented block after ALT")
1649: p.addError(fmt.Sprintf("expected channel name or guard in ALT case, got %s", p.curToken.Type))
1703: p.addError("expected & after guard in ALT case")
1711: p.addError(fmt.Sprintf("expected channel name after guard, got %s", p.curToken.Type))
1742: p.addError("expected indented body after ALT case")
1888: p.addError("expected indented body after PROC declaration")
2027: p.addError(fmt.Sprintf("expected type after []CHAN, got %s", p.curToken.Type))
2040: p.addError(fmt.Sprintf("expected type after [], got %s", p.curToken.Type))
2047: p.addError(fmt.Sprintf("expected array size, got %s", p.curToken.Type))
2060: p.addError(fmt.Sprintf("expected type after [%s], got %s", param.ArraySize, p.curToken.Type))
2075: p.addError(fmt.Sprintf("expected type after CHAN, got %s", p.curToken.Type))
2086: p.addError(fmt.Sprintf("expected type in parameter, got %s", p.curToken.Type))
2095: p.addError(fmt.Sprintf("expected parameter name, got %s", p.curToken.Type))
2203: p.addError("expected indented body after FUNCTION declaration")
2248: p.addError(fmt.Sprintf("expected VALOF or IS in function body, got %s", p.curToken.Type))
2258: p.addError("expected indented block after VALOF")
2429: p.addError("expected indented block after WHILE condition")
2449: p.addError("unexpected identifier after IF")
2461: p.addError("expected indented block after IF")
2546: p.addError("expected indented block after CASE")
2637: p.addError(fmt.Sprintf("could not parse %q as integer", p.curToken.Literal))
2650: p.addError(err.Error())
2754: p.addError(fmt.Sprintf("expected type after %s, got %s", token.Literal, p.peekToken.Type))
2772:	p.addError(fmt.Sprintf("unexpected token in expression: %s", p.curToken.Type))
Now I have a clear picture. Let me also look at preproc_test.go to understand the testing patterns.
{
"file_path": "/home/david/projects/code-associates/occam2go/preproc/preproc_test.go",
"limit": 50
}
 1→package preproc
2→
3→import (
4→ "os"
5→ "path/filepath"
6→ "strings"
7→ "testing"
8→)
9→
10→func TestDefineAndIfDefined(t *testing.T) {
11→ pp := New()
12→ src := `#DEFINE FOO
13→#IF DEFINED (FOO)
14→hello
15→#ENDIF
16→`
17→ out, err := pp.ProcessSource(src)
18→ if err != nil {
19→ t.Fatal(err)
20→ }
21→ lines := strings.Split(out, "\n")
22→ if strings.TrimSpace(lines[2]) != "hello" {
23→ t.Errorf("expected 'hello' on line 3, got %q", lines[2])
24→ }
25→}
26→
27→func TestIfFalseExcludes(t *testing.T) {
28→ pp := New()
29→ src := `#IF FALSE
30→visible
31→#ENDIF
32→`
33→ out, err := pp.ProcessSource(src)
34→ if err != nil {
35→ t.Fatal(err)
36→ }
37→ if strings.Contains(out, "visible") {
38→ t.Error("expected #IF FALSE to exclude content")
39→ }
40→}
41→
42→func TestIfTrue(t *testing.T) {
43→ pp := New()
44→ src := `#IF TRUE
45→visible
46→#ENDIF
47→`
48→ out, err := pp.ProcessSource(src)
49→ if err != nil {
 50→		t.Fatal(err)
# Plan: Add source file tracking to parser error messages (Issue #41)
## Context
Parser error messages currently show line numbers relative to the preprocessed (expanded) source. When `#INCLUDE` pulls in a large library like course.module, errors say "line 4335" even though the user's source file is small. This makes errors very hard to diagnose. We need to map expanded line numbers back to original file:line pairs.
## Approach
Build a **source map** in the preprocessor during expansion, then use it in `main.go` to translate error line numbers before display. This is the least invasive approach — no changes to lexer, parser, or AST.
## Changes
### 1. `preproc/preproc.go` — Build source map during expansion
- Add `SourceLoc` struct: `{File string, Line int}`
- Add `sourceMap []SourceLoc` field to `Preprocessor`
- Add `SourceMap() []SourceLoc` getter method
- Pass filename through `processSource` (new parameter) and `ProcessFile`
- `ProcessSource` passes `"<input>"` as the filename (no file context)
- For each output line produced, append a `SourceLoc{filename, originalLine}` entry
- For `#INCLUDE` lines: the recursive `ProcessFile` → `processSource` call populates entries for included content; the current line does NOT get an entry
- For directives that emit blank lines (`#DEFINE`, `#IF`, `#ELSE`, `#ENDIF`, `#COMMENT`, etc.): append `{filename, lineNum}`
- For skipped lines (inactive condStack): append `{filename, lineNum}`
- For regular source lines: append `{filename, lineNum}`
- Source map is 0-indexed (entry 0 = expanded line 1)
### 2. `main.go` — Translate error line numbers using source map
- After parsing, if there are errors, get `pp.SourceMap()`
- For each error string matching `"line NNN: ..."`, look up NNN in the source map
- Replace with `"filename:line: ..."` format (standard compiler error format)
- For the top-level input file, the `"filename:line:"` prefix can use either just the base filename or the full path
- If lookup fails (out of bounds), fall through to original message
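A sketch of the translation step in `main.go`, assuming the `"line %d: %s"` error format visible in the grep results above; the regex and the `translate` function name are illustrative, not the final code:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// SourceLoc mirrors the type the plan adds to the preprocessor.
type SourceLoc struct {
	File string
	Line int
}

var lineErrRe = regexp.MustCompile(`^line (\d+): (.*)$`)

// translate rewrites "line NNN: msg" as "file:line: msg" using the
// source map (entry 0 = expanded line 1). Errors that do not match
// the pattern, or whose line is out of range, pass through unchanged.
func translate(errs []string, sm []SourceLoc) []string {
	out := make([]string, len(errs))
	for i, e := range errs {
		m := lineErrRe.FindStringSubmatch(e)
		if m == nil {
			out[i] = e
			continue
		}
		n, _ := strconv.Atoi(m[1])
		if n < 1 || n > len(sm) {
			out[i] = e // lookup failed: keep the original message
			continue
		}
		out[i] = fmt.Sprintf("%s:%d: %s", sm[n-1].File, sm[n-1].Line, m[2])
	}
	return out
}

func main() {
	sm := []SourceLoc{{"test_utils.occ", 1}, {"course.module", 7}}
	for _, e := range translate([]string{"line 2: unexpected token: INT"}, sm) {
		fmt.Println(e) // course.module:7: unexpected token: INT
	}
}
```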
### 3. `preproc/preproc_test.go` — Test source map
- Test that `ProcessFile` with `#INCLUDE` produces correct source map entries
- Test that non-included file maps lines 1:1
- Test that `ProcessSource` uses `"<input>"` as filename
## Files to modify
- `preproc/preproc.go` — add SourceLoc, sourceMap field, pass filename, build map
- `main.go` — translate error messages using source map
- `preproc/preproc_test.go` — add source map tests
## Verification
```bash
go test ./... # all tests pass
# Manual test with course module:
./occam2go -I kroc/modules/course/libsrc -D TARGET.BITS.PER.WORD=32 -o test.go kroc/modules/course/examples/test_utils.occ
# Error messages should now show filename:line instead of just line NNN
```
{
  "plan": "# Plan: Add source file tracking to parser error messages (Issue #41)\n\n…(the full plan text rendered above)…"
}
The user doesn't want to proceed with this tool use. The tool use was rejected (eg. if it was a file edit, the new_string was NOT written to the file). STOP what you are doing and wait for the user to tell you how to proceed.