title: Support dotted protocol tags (e.g., bar.data)
state: OPEN
author: associate-1
labels:
comments: 0
assignees:
projects:
milestone:
number: 68
--
**Difficulty: Easy** — Allow dotted identifiers in protocol tag positions. `goIdent()` already converts dots to underscores, so codegen should need minimal changes.
## Summary
Occam variant protocol tags can contain dots (e.g., `bar.data`, `bar.terminate`, `bar.blank`). Currently the parser treats the dot as a field access operator rather than part of the tag name.
## Occam syntax
```occam
PROTOCOL BAR.PROTO
CASE
bar.data ; INT ; BYTE
bar.terminate
bar.blank ; INT
:
```
## Proposed Go mapping
Dots in tag names would be replaced with underscores (consistent with `goIdent()`):
```go
type _proto_BAR_PROTO interface{ _tag_BAR_PROTO() }
type _proto_BAR_PROTO_bar_data struct { _0 int; _1 byte }
type _proto_BAR_PROTO_bar_terminate struct{}
type _proto_BAR_PROTO_bar_blank struct { _0 int }
```
## Affected files
- `bar.occ` (protocol declaration and usage throughout)
## Work required
1. **Parser**: When parsing protocol tag names, allow dotted identifiers (consume `IDENT.IDENT` as a single tag name)
2. **Codegen**: Use `goIdent()` to convert dots to underscores in generated Go identifiers
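A minimal sketch of the codegen mapping in step 2, assuming `goIdent()` behaves as described above (dots replaced with underscores). The reserved-word set is abbreviated for illustration, and `variantTypeName` is a hypothetical helper, not a function from the codebase.

```go
package main

import (
	"fmt"
	"strings"
)

// goReserved is an abbreviated, illustrative set of names that would
// clash with Go builtins in generated code.
var goReserved = map[string]bool{"byte": true, "int": true, "string": true, "len": true}

// goIdent replaces dots with underscores and prefixes reserved names with "_".
func goIdent(name string) string {
	name = strings.ReplaceAll(name, ".", "_")
	if goReserved[name] {
		return "_" + name
	}
	return name
}

// variantTypeName is a hypothetical helper that builds the Go type name
// for one variant tag of a protocol.
func variantTypeName(proto, tag string) string {
	return fmt.Sprintf("_proto_%s_%s", goIdent(proto), goIdent(tag))
}

func main() {
	for _, tag := range []string{"bar.data", "bar.terminate", "bar.blank"} {
		fmt.Println(variantTypeName("BAR.PROTO", tag))
	}
}
```

Run as written, this prints `_proto_BAR_PROTO_bar_data`, `_proto_BAR_PROTO_bar_terminate`, and `_proto_BAR_PROTO_bar_blank`, matching the struct names in the proposed mapping.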
Part of #43.

Entered plan mode. You should now focus on exploring the codebase and designing an implementation approach.

In plan mode, you should:
1. Thoroughly explore the codebase to understand existing patterns
2. Identify similar features and architectural approaches
3. Consider multiple approaches and their trade-offs
4. Use AskUserQuestion if you need to clarify the approach
5. Design a concrete implementation strategy
6. When ready, use ExitPlanMode to present your plan for approval

Remember: DO NOT write or edit any files yet. This is a read-only exploration and planning phase.
{
"subagent_type": "Explore",
"prompt": "I need to understand how dotted identifiers and variant protocol tags are handled in this occam-to-Go transpiler. Please explore thoroughly:\n\n1. **Parser - variant protocol parsing**: In `parser/parser.go`, find how variant protocol declarations are parsed (look for `parseProtocolDecl` or similar). I need to see where tag names are consumed and how they currently handle identifiers (do they support dots?). Show the relevant code.\n\n2. **Parser - protocol tag usage**: Find where protocol tags are used in send/receive expressions (variant send `c ! tag ; val`, variant receive `c ? CASE ...`). Show how tag names are parsed in those contexts.\n\n3. **Codegen - variant protocol generation**: In `codegen/codegen.go`, find how variant protocol types and their tags are generated into Go code. Show the relevant generation functions.\n\n4. **AST - protocol nodes**: In `ast/ast.go`, show the AST node structures for variant protocols (ProtocolDecl, VariantProtocolCase, etc.).\n\n5. **goIdent() function**: Show the `goIdent()` function in codegen that converts occam identifiers to Go identifiers (replacing dots with underscores).\n\n6. **Lexer - DOT token**: In `lexer/token.go` and `lexer/lexer.go`, show how the DOT token is defined and lexed.\n\nFor each area, show the actual code snippets with line numbers."
}[
{
"type": "text",
"text": "Perfect! Now I have all the information needed. Let me provide a comprehensive report with all the findings organized by the 6 areas requested:\n\n## Comprehensive Analysis: Dotted Identifiers and Variant Protocols in occam2go\n\nI've thoroughly explored the codebase. Here are my findings organized by the six areas you requested:\n\n---\n\n### 1. Lexer - DOT Token Handling\n\n**File:** `/home/david/projects/code-associates/occam2go/lexer/lexer.go`\n\nThe lexer **treats dots as part of identifier names**, not as separate tokens:\n\n```go\n// Line 278-284: readIdentifier() function\nfunc (l *Lexer) readIdentifier() string {\n\tposition := l.position\n\tfor isLetter(l.ch) || isDigit(l.ch) || l.ch == '.' { // <-- dots included\n\t\tl.readChar()\n\t}\n\treturn l.input[position:l.position]\n}\n```\n\nThis means `out.repeat`, `print.int`, and `data.field` are all **lexed as single IDENT tokens** with the dot included in the literal string. There is **no separate DOT token type** defined in `token.go` (lines 1-103). The lexer's `isLetter()` function (line 404) allows underscore but dots are handled specially during identifier reading.\n\n---\n\n### 2. 
Parser - Variant Protocol Declaration\n\n**File:** `/home/david/projects/code-associates/occam2go/parser/parser.go`\n\n#### **parseProtocolDecl()** (lines 844-912)\n\n```go\nfunc (p *Parser) parseProtocolDecl() *ast.ProtocolDecl {\n\tdecl := &ast.ProtocolDecl{Token: p.curToken}\n\n\t// Expect protocol name\n\tif !p.expectPeek(lexer.IDENT) {\n\t\treturn nil\n\t}\n\tdecl.Name = p.curToken.Literal // <-- Tag name from IDENT token\n\t\n\t// Check if this is IS form or CASE form (variant)\n\tif p.peekTokenIs(lexer.NEWLINE) || p.peekTokenIs(lexer.INDENT) {\n\t\t// Could be variant: PROTOCOL NAME \\n INDENT CASE ...\n\t\tif p.peekTokenIs(lexer.INDENT) {\n\t\t\tp.nextToken() // consume INDENT\n\t\t\tp.nextToken() // move into block\n\n\t\t\tif p.curTokenIs(lexer.CASE) {\n\t\t\t\t// Variant protocol\n\t\t\t\tdecl.Kind = \"variant\"\n\t\t\t\tdecl.Variants = p.parseProtocolVariants()\n\t\t\t\tp.protocolNames[decl.Name] = true\n\t\t\t\tp.protocolDefs[decl.Name] = decl\n\t\t\t\treturn decl\n\t\t\t}\n\t\t}\n\t}\n\t// IS form: PROTOCOL NAME IS TYPE [; TYPE]*\n\t...\n}\n```\n\n#### **parseProtocolVariants()** (lines 936-1010)\n\n```go\nfunc (p *Parser) parseProtocolVariants() []ast.ProtocolVariant {\n\tvar variants []ast.ProtocolVariant\n\t// ... skip to block ...\n\t\n\tfor !p.curTokenIs(lexer.EOF) {\n\t\t// ... 
handle DEDENT ...\n\t\t\n\t\t// Parse variant: tag [; TYPE]*\n\t\tif !p.curTokenIs(lexer.IDENT) {\n\t\t\tp.addError(fmt.Sprintf(\"expected variant tag name, got %s\", p.curToken.Type))\n\t\t\treturn variants\n\t\t}\n\n\t\tv := ast.ProtocolVariant{\n\t\t\tTag: p.curToken.Literal, // <-- Raw identifier with dots (if present)\n\t\t}\n\n\t\t// Parse optional types after semicolons\n\t\tfor p.peekTokenIs(lexer.SEMICOLON) {\n\t\t\tp.nextToken() // move to ;\n\t\t\tp.nextToken() // move past ;\n\t\t\ttypeName := p.parseProtocolTypeName()\n\t\t\tif typeName == \"\" {\n\t\t\t\treturn variants\n\t\t\t}\n\t\t\tv.Types = append(v.Types, typeName)\n\t\t}\n\n\t\tvariants = append(variants, v)\n\t\t// ...\n\t}\n\treturn variants\n}\n```\n\n**Key finding:** Variant tag names are **captured as raw IDENT literals** (line 987). If a tag is `data.field`, it will be stored as the string `\"data.field\"` with the dot intact.\n\n---\n\n### 3. Parser - Protocol Tag Usage in Send/Receive\n\n**File:** `/home/david/projects/code-associates/occam2go/parser/parser.go`\n\n#### **Variant Send Detection** (lines 1185-1220)\n\n```go\n// Line 1185-1207: In parseStatement after c ! is recognized\nif p.curTokenIs(lexer.IDENT) && p.peekTokenIs(lexer.SEMICOLON) {\n\t// Could be variant send (tag ; values) or expression ; values\n\tpossibleTag := p.curToken.Literal // <-- Gets \"data.field\" if that's the token\n\tif p.isVariantTag(possibleTag) { // <-- Checks if tag is known\n\t\tstmt.VariantTag = possibleTag\n\t\tp.nextToken() // move to ;\n\t\tfor p.curTokenIs(lexer.SEMICOLON) {\n\t\t\tp.nextToken() // move past ;\n\t\t\tval := p.parseExpression(LOWEST)\n\t\t\tstmt.Values = append(stmt.Values, val)\n\t\t}\n\t\treturn stmt\n\t}\n}\n\nstmt.Value = p.parseExpression(LOWEST)\n// Check for sequential send: c ! 
expr ; expr ; ...\n```\n\n#### **isVariantTag() Helper** (lines 1222-1233)\n\n```go\nfunc (p *Parser) isVariantTag(name string) bool {\n\tfor _, proto := range p.protocolDefs {\n\t\tif proto.Kind == \"variant\" {\n\t\t\tfor _, v := range proto.Variants {\n\t\t\t\tif v.Tag == name { // <-- Direct string comparison\n\t\t\t\t\treturn true // Compares \"data.field\" == \"data.field\"\n\t\t\t\t}\n\t\t\t}\n\t\t}\n\t}\n\treturn false\n}\n```\n\n**Key finding:** Tag comparison is **exact string matching** (line 1226). If a tag is defined as `data.field`, the parser will correctly match a send with `c ! data.field ; ...` because both are the exact same string literal.\n\n#### **Variant Receive Parsing** (lines 1269-1340)\n\n```go\nfunc (p *Parser) parseVariantReceive(channel string, token lexer.Token) *ast.VariantReceive {\n\tstmt := &ast.VariantReceive{\n\t\tToken: token,\n\t\tChannel: channel,\n\t}\n\t// ... skip to block ...\n\t\n\tfor !p.curTokenIs(lexer.EOF) {\n\t\t// ... handle DEDENT ...\n\t\t\n\t\t// Parse a variant case: tag [; var]* \\n INDENT body\n\t\tvc := ast.VariantCase{}\n\n\t\tif !p.curTokenIs(lexer.IDENT) {\n\t\t\tp.addError(fmt.Sprintf(\"expected variant tag name, got %s\", p.curToken.Type))\n\t\t\tp.nextToken()\n\t\t\tcontinue\n\t\t}\n\t\tvc.Tag = p.curToken.Literal // <-- Tag stored as-is, including dots\n\t\t\n\t\t// Parse optional variables after semicolons: tag ; x ; y\n\t\tfor p.peekTokenIs(lexer.SEMICOLON) {\n\t\t\tp.nextToken() // move to ;\n\t\t\tif !p.expectPeek(lexer.IDENT) {\n\t\t\t\treturn stmt\n\t\t\t}\n\t\t\tvc.Variables = append(vc.Variables, p.curToken.Literal)\n\t\t}\n\t\t// ...\n\t}\n\treturn stmt\n}\n```\n\n**Key finding:** In variant receive (`c ? CASE`), tag names are **stored as raw strings** from the IDENT token (line 1329), preserving any dots.\n\n---\n\n### 4. 
AST - Protocol Node Structures\n\n**File:** `/home/david/projects/code-associates/occam2go/ast/ast.go`\n\n#### **ProtocolDecl** (lines 428-443)\n\n```go\ntype ProtocolDecl struct {\n\tToken lexer.Token // the PROTOCOL token\n\tName string // protocol name\n\tKind string // \"simple\", \"sequential\", or \"variant\"\n\tTypes []string // element types (simple: len=1, sequential: len>1)\n\tVariants []ProtocolVariant // only for Kind=\"variant\"\n}\n\ntype ProtocolVariant struct {\n\tTag string // tag name (e.g., \"text\", \"quit\") or \"data.field\" with dots\n\tTypes []string // associated types (empty for no-payload tags)\n}\n```\n\n#### **Send Statement** (lines 360-371)\n\n```go\ntype Send struct {\n\tToken lexer.Token // the ! token\n\tChannel string // channel name\n\tChannelIndex Expression // non-nil for cs[i] ! value\n\tValue Expression // value to send (simple send, backward compat)\n\tValues []Expression // additional values for sequential sends (c ! x ; y)\n\tVariantTag string // variant tag name for variant sends (c ! tag ; x)\n}\n```\n\n**Key finding:** `Send.VariantTag` is a **plain string field** that stores the tag name as-is from the source, which can include dots like `\"data.field\"`.\n\n#### **VariantReceive** (lines 445-460)\n\n```go\ntype VariantReceive struct {\n\tToken lexer.Token // the ? token\n\tChannel string\n\tChannelIndex Expression // non-nil for cs[i] ? CASE ...\n\tCases []VariantCase\n}\n\ntype VariantCase struct {\n\tTag string // variant tag name (can contain dots)\n\tVariables []string // variables to bind payload fields\n\tBody Statement\n}\n```\n\n**Key finding:** `VariantCase.Tag` is a **string field** that preserves the tag name as lexed, including any dots.\n\n---\n\n### 5. 
Codegen - goIdent() Function\n\n**File:** `/home/david/projects/code-associates/occam2go/codegen/codegen.go` (lines 74-91)\n\n```go\n// goIdent converts an occam identifier to a valid Go identifier.\n// Occam allows dots in identifiers (e.g., out.repeat); Go does not.\n// goReserved is a set of Go keywords and predeclared identifiers that cannot be\n// used as variable names when they also appear as type conversions in the generated code.\nvar goReserved = map[string]bool{\n\t\"byte\": true, \"int\": true, \"string\": true, \"len\": true, \"cap\": true,\n\t\"make\": true, \"new\": true, \"copy\": true, \"close\": true, \"delete\": true,\n\t\"panic\": true, \"recover\": true, \"print\": true, \"println\": true,\n\t\"error\": true, \"rune\": true, \"bool\": true, \"true\": true, \"false\": true,\n}\n\nfunc goIdent(name string) string {\n\tname = strings.ReplaceAll(name, \".\", \"_\") // <-- Replace ALL dots with underscores\n\tif goReserved[name] {\n\t\treturn \"_\" + name // Escape Go reserved words\n\t}\n\treturn name\n}\n```\n\n**Key finding:** `goIdent()` is a **universal conversion function** that:\n1. **Replaces all dots with underscores** (`\"data.field\"` → `\"data_field\"`)\n2. **Escapes Go reserved words** by prefixing with `_` (e.g., `\"int\"` → `\"_int\"`)\n\nThis ensures all occam identifiers (which can contain dots) become valid Go identifiers.\n\n---\n\n### 6. 
Codegen - Variant Protocol Generation\n\n**File:** `/home/david/projects/code-associates/occam2go/codegen/codegen.go`\n\n#### **Protocol Type Declaration** (lines 1246-1291)\n\n```go\nfunc (g *Generator) generateProtocolDecl(proto *ast.ProtocolDecl) {\n\tgName := goIdent(proto.Name)\n\tswitch proto.Kind {\n\tcase \"simple\":\n\t\tgoType := g.occamTypeToGoBase(proto.Types[0])\n\t\tg.writeLine(fmt.Sprintf(\"type _proto_%s = %s\", gName, goType))\n\t\tg.writeLine(\"\")\n\tcase \"sequential\":\n\t\tg.writeLine(fmt.Sprintf(\"type _proto_%s struct {\", gName))\n\t\tg.indent++\n\t\tfor i, t := range proto.Types {\n\t\t\tgoType := g.occamTypeToGoBase(t)\n\t\t\tg.writeLine(fmt.Sprintf(\"_%d %s\", i, goType))\n\t\t}\n\t\tg.indent--\n\t\tg.writeLine(\"}\")\n\t\tg.writeLine(\"\")\n\tcase \"variant\":\n\t\t// Interface type\n\t\tg.writeLine(fmt.Sprintf(\"type _proto_%s interface {\", gName))\n\t\tg.indent++\n\t\tg.writeLine(fmt.Sprintf(\"_is_%s()\", gName))\n\t\tg.indent--\n\t\tg.writeLine(\"}\")\n\t\tg.writeLine(\"\")\n\t\t// Concrete types for each variant\n\t\tfor _, v := range proto.Variants {\n\t\t\tgTag := goIdent(v.Tag) // <-- CRITICAL: goIdent() converts tag names!\n\t\t\tif len(v.Types) == 0 {\n\t\t\t\t// No-payload variant: empty struct\n\t\t\t\tg.writeLine(fmt.Sprintf(\"type _proto_%s_%s struct{}\", gName, gTag))\n\t\t\t} else {\n\t\t\t\tg.writeLine(fmt.Sprintf(\"type _proto_%s_%s struct {\", gName, gTag))\n\t\t\t\tg.indent++\n\t\t\t\tfor i, t := range v.Types {\n\t\t\t\t\tgoType := g.occamTypeToGoBase(t)\n\t\t\t\t\tg.writeLine(fmt.Sprintf(\"_%d %s\", i, goType))\n\t\t\t\t}\n\t\t\t\tg.indent--\n\t\t\t\tg.writeLine(\"}\")\n\t\t\t}\n\t\t\tg.writeLine(fmt.Sprintf(\"func (_proto_%s_%s) _is_%s() {}\", gName, gTag, gName))\n\t\t\tg.writeLine(\"\")\n\t\t}\n\t}\n}\n```\n\n**Key finding:** At line 1273, `goIdent(v.Tag)` is called on variant tag names. 
If a tag is `\"data.field\"`, it becomes `\"data_field\"` in the generated code.\n\n#### **Variant Send Generation** (lines 1171-1200)\n\n```go\nif send.VariantTag != \"\" && proto != nil && proto.Kind == \"variant\" {\n\t// Variant send with explicit tag: c <- _proto_NAME_tag{values...}\n\tg.write(fmt.Sprintf(\"_proto_%s_%s{\", gProtoName, goIdent(send.VariantTag)))\n\t// ^^^^^^^^^^^^^^\n\t// Converts \"data.field\" → \"data_field\"\n\tfor i, val := range send.Values {\n\t\tif i > 0 {\n\t\t\tg.write(\", \")\n\t\t}\n\t\tg.generateExpression(val)\n\t}\n\tg.write(\"}\")\n} else if proto != nil && proto.Kind == \"variant\" && send.Value != nil && len(send.Values) == 0 {\n\t// Check if the send value is a bare identifier matching a variant tag\n\tif ident, ok := send.Value.(*ast.Identifier); ok && g.isVariantTag(protoName, ident.Value) {\n\t\tg.write(fmt.Sprintf(\"_proto_%s_%s{}\", gProtoName, goIdent(ident.Value)))\n\t\t// ^^^^^^^^^^^^^^\n\t\t// Also converts here for bare tags\n\t}\n}\n```\n\n**Key finding:** When generating variant send code (line 1173), `goIdent()` is called on `send.VariantTag`, converting dots to underscores. Example:\n- Occam: `c ! data.field ; 42`\n- Generated Go: `c <- _proto_MSG_data_field{42}`\n\n#### **Variant Receive Generation** (lines 1293-1322)\n\n```go\nfunc (g *Generator) generateVariantReceive(vr *ast.VariantReceive) {\n\tprotoName := g.chanProtocols[vr.Channel]\n\tgProtoName := goIdent(protoName)\n\tchanRef := goIdent(vr.Channel)\n\t// ... 
build chanRef ...\n\t\n\tg.writeLine(fmt.Sprintf(\"switch _v := (<-%s).(type) {\", chanRef))\n\tfor _, vc := range vr.Cases {\n\t\tg.writeLine(fmt.Sprintf(\"case _proto_%s_%s:\", gProtoName, goIdent(vc.Tag)))\n\t\t// ^^^^^^^^^^^^^^\n\t\t// Converts \"data.field\" → \"data_field\"\n\t\tg.indent++\n\t\tfor i, v := range vc.Variables {\n\t\t\tg.writeLine(fmt.Sprintf(\"%s = _v._%d\", goIdent(v), i))\n\t\t}\n\t\tif vc.Body != nil {\n\t\t\tg.generateStatement(vc.Body)\n\t\t}\n\t\tg.indent--\n\t}\n\tg.writeLine(\"}\")\n}\n```\n\n**Key finding:** In variant receive (line 1311), `goIdent(vc.Tag)` converts variant case tags to valid Go identifiers. Example:\n- Occam: `c ? CASE data.field ; x ...`\n- Generated Go: `case _proto_MSG_data_field:`\n\n#### **isVariantTag() in Codegen** (lines 1324-1335)\n\n```go\nfunc (g *Generator) isVariantTag(protoName, tagName string) bool {\n\tproto := g.protocolDefs[protoName]\n\tif proto == nil {\n\t\treturn false\n\t}\n\tfor _, v := range proto.Variants {\n\t\tif v.Tag == tagName { // <-- String comparison on raw tag names\n\t\t\treturn true\n\t\t}\n\t}\n\treturn false\n}\n```\n\n**Key finding:** This checks the **raw tag names** stored in the AST (which can contain dots). It does **not** call `goIdent()` on the comparison. 
The comparison is `v.Tag == tagName`, so both sides must match exactly.\n\n---\n\n## Summary\n\n| Component | How Dots Are Handled |\n|-----------|----------------------|\n| **Lexer** | Dots are **part of IDENT tokens**; `\"out.repeat\"` is one token, not three |\n| **Parser - Tags in declaration** | Tag names are **captured as-is** from IDENT literals, preserving dots |\n| **Parser - Tag matching** | Uses **exact string comparison**: `\"data.field\" == \"data.field\"` |\n| **AST** | Stores tag names as **plain strings** in `ProtocolVariant.Tag` and `Send.VariantTag` |\n| **Codegen - goIdent()** | **Replaces all dots with underscores** and escapes reserved words |\n| **Codegen - Tags** | **Calls goIdent() on all tag names** when generating Go identifiers |\n\n**Critical Example:**\n```occam\nPROTOCOL MSG\n CASE\n data.field; INT\n\nSEQ\n c ! data.field ; 42\n```\n\nBecomes:\n```go\ntype _proto_MSG_data_field struct { _0 int }\n\nc <- _proto_MSG_data_field{42}\n```\n\nThe parser accepts `data.field` as a tag name, stores it in the AST as `\"data.field\"`, and the codegen converts it to the valid Go identifier `data_field` when generating type and switch case names."
},
{
"type": "text",
"text": "agentId: aa663ff4b06da486e (for resuming to continue this agent's work if needed)\n<usage>total_tokens: 55443\ntool_uses: 26\nduration_ms: 70834</usage>"
}
]
The exploration reveals something interesting: the lexer already includes dots as part of identifiers (readIdentifier reads while isLetter(l.ch) || isDigit(l.ch) || l.ch == '.'). And the codegen already calls goIdent() on tag names, which converts dots to underscores.
So if dots are already part of IDENT tokens, and codegen already handles them... what's actually broken? Let me re-read the issue:
> Currently the parser treats the dot as a field access operator rather than part of the tag name.
Wait, but the lexer includes dots in identifiers.
I need to trace through this more carefully: the lexer's readIdentifier function does consume dots, but maybe something in the parser is intercepting them before they get grouped into a single token, or perhaps the issue is about how the parser interprets the token stream rather than how the lexer produces it. Either way, dots are consumed as part of identifiers, which means bar.data would be a single IDENT token "bar.data". So what's the actual issue?
Wait, maybe the issue is that bar.data when used in a context like sending or receiving, gets parsed differently? Like maybe the expression parser parses bar as an identifier, then sees . and tries to parse it as a field access?
Actually wait - the expression parser uses parseExpression which has infix operators. If . is part of the identifier lexed by the lexer, then the expression parser would never see it separately. The dot is consumed as part of the IDENT token.
Hmm, but maybe the issue is something else. Let me look at the actual test or try to reproduce. Let me look at what happens when we try to parse a protocol with dotted tags. Maybe the issue is that the protocol name itself has a dot, and then when defining tags with dots, something goes wrong.
Actually, let me re-read the issue more carefully. It says the parser treats the dot as a field access operator rather than part of the tag name. But if the lexer includes dots in identifiers, this shouldn't happen unless there's a specific context where something breaks. Let me check if there's an existing test case for this, or look at the record field access parsing to see if it might interfere.

I'm wondering if the issue is more subtle: maybe in certain contexts, like after c !, the parser calls parseExpression, which parses bar as an IDENT, and then if . is NOT actually part of the IDENT token in that context, it would be treated as a separate operator. But the lexer includes dots in identifiers, so bar.data becomes a single IDENT token without any separate DOT token. This means the parser should already handle it correctly, unless the real issue is just that we need a test case to verify it works, or there's a specific edge case I'm missing.

Occam also uses bracket syntax for field access, not dots, so there shouldn't be a dot operator interfering with parsing at all. The issue might be that everything already works fine because the lexer treats dots as part of identifiers, but there's no test coverage for it. Or there could be a real bug hiding in some edge case of the variant send parsing logic that I need to examine more closely. It's also possible the issue author didn't fully verify the current state. Let me examine the actual parser and lexer code to confirm whether dotted identifiers are properly handled in protocol tag positions.
The exploration shows the lexer already treats dots as part of identifiers, and codegen already calls goIdent() on tag names. Let me verify the actual code and check if there's really a bug or if tests are just missing.
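One way to pin down the expected behaviour before reading further is a standalone check of the tag lookup. This is a sketch only: the types are trimmed copies of the AST shapes quoted in the exploration report, `isVariantTag` mirrors the parser helper's exact string comparison but takes the definitions as a parameter, and the `BAR.PROTO` data comes from the issue text rather than from a real source file.

```go
package main

import "fmt"

// ProtocolVariant and ProtocolDecl are trimmed copies of the AST structures
// shown in the exploration report (Token and other fields omitted).
type ProtocolVariant struct {
	Tag   string
	Types []string
}

type ProtocolDecl struct {
	Name     string
	Kind     string
	Variants []ProtocolVariant
}

// isVariantTag mirrors the parser helper: it compares the raw tag string,
// dots included, against every variant of every variant protocol.
func isVariantTag(defs map[string]*ProtocolDecl, name string) bool {
	for _, proto := range defs {
		if proto.Kind != "variant" {
			continue
		}
		for _, v := range proto.Variants {
			if v.Tag == name {
				return true
			}
		}
	}
	return false
}

func main() {
	// Protocol contents taken from the issue's example.
	defs := map[string]*ProtocolDecl{
		"BAR.PROTO": {Name: "BAR.PROTO", Kind: "variant", Variants: []ProtocolVariant{
			{Tag: "bar.data", Types: []string{"INT", "BYTE"}},
			{Tag: "bar.terminate"},
			{Tag: "bar.blank", Types: []string{"INT"}},
		}},
	}
	for _, name := range []string{"bar.data", "bar", "bar_data"} {
		fmt.Printf("%-10s %v\n", name, isVariantTag(defs, name))
	}
}
```

Only the exact dotted spelling matches; `bar` and `bar_data` do not. If the real parser stores and compares tags the same way, a dotted tag in a send should be recognised as long as the lexer delivers it as one IDENT.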
{
"file_path": "/home/david/projects/code-associates/occam2go/lexer/lexer.go",
"offset": 275,
"limit": 15
}
275→ return Token{Type: tokenType, Literal: string(ch), Line: l.line, Column: l.column}
276→}
277→
278→func (l *Lexer) readIdentifier() string {
279→ position := l.position
280→ for isLetter(l.ch) || isDigit(l.ch) || l.ch == '.' {
281→ l.readChar()
282→ }
283→ return l.input[position:l.position]
284→}
285→
286→func (l *Lexer) readNumber() string {
287→ position := l.position
288→ for isDigit(l.ch) {
289→ l.readChar()
{
"file_path": "/home/david/projects/code-associates/occam2go/kroc/modules/course/examples/bar.occ"
}
1→---------------------------------------------------------------------------
2→--
3→-- bar.occ -- animated fair ALTing
4→-- Copyright (C) 2000 Fred Barnes
5→-- Some fixes/mods Oct 2001, Peter Welch
6→--
7→-- This program is free software; you can redistribute it and/or modify
8→-- it under the terms of the GNU General Public License as published by
9→-- the Free Software Foundation; either version 2 of the License, or
10→-- (at your option) any later version.
11→--
12→-- This program is distributed in the hope that it will be useful,
13→-- but WITHOUT ANY WARRANTY; without even the implied warranty of
14→-- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
15→-- GNU General Public License for more details.
16→--
17→-- You should have received a copy of the GNU General Public License
18→-- along with this program; if not, write to the Free Software
19→-- Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
20→--
21→
22→#INCLUDE "course.module"
23→--#INCLUDE "time.module"
24→
25→--{{{ FUNCTION seconds
26→--* Convert seconds to timer ticks.
27→-- @param s Time in seconds
28→-- @return Time in timer ticks
29→INT INLINE FUNCTION seconds (VAL INT s)
30→ INT ticks:
31→ VALOF
32→ #IF TARGET.BITS.PER.WORD = 16
33→ ticks := s * 1000
34→ #ELSE
35→ ticks := s * 1000000
36→ #ENDIF
37→ RESULT ticks
38→:
39→--}}}
40→
41→--{{{ FUNCTION milliseconds
42→--* Convert milliseconds to timer ticks.
43→-- @param ms Time in milliseconds
44→-- @return Time in timer ticks
45→INT INLINE FUNCTION milliseconds (VAL INT ms)
46→ INT ticks:
47→ VALOF
48→ #IF TARGET.BITS.PER.WORD = 16
49→ ticks := ms
50→ #ELSE
51→ ticks := ms * 1000
52→ #ENDIF
53→ RESULT ticks
54→:
55→--}}}
56→
57→--{{{ FUNCTION microseconds
58→--* Convert microseconds to timer ticks.
59→-- @param us Time in microseconds
60→-- @return Time in timer ticks
61→INT INLINE FUNCTION microseconds (VAL INT us)
62→ INT ticks:
63→ VALOF
64→ #IF TARGET.BITS.PER.WORD = 16
65→ ticks := us / 1000
66→ #ELSE
67→ ticks := us
68→ #ENDIF
69→ RESULT ticks
70→:
71→--}}}
72→
73→--{{{ O.REQ protocol
74→PROTOCOL O.REQ
75→ CASE
76→ movecursor; BYTE; BYTE
77→ string; INT::[]BYTE
78→ number; INT
79→ thirsty; INT
80→ happy; INT
81→ sad; INT
82→ blank
83→ bar.blank
84→ bar.happy
85→ terminate
86→:
87→--}}}
88→
89→--{{{ constants
90→
91→VAL INT PRI.ALT IS 0:
92→VAL INT FAIRISH.ALT IS 1:
93→VAL INT FAIR.ALT IS 2:
94→
95→VAL INT num.punters IS 20:
96→
97→VAL INT drunk.pints IS 8:
98→VAL INT limit.pints IS 16:
99→
100→VAL BYTE bar.left IS 1:
101→VAL BYTE punter.left IS 14:
102→VAL BYTE punter.top IS 2:
103→VAL BYTE punter.right IS 30:
104→VAL BYTE punter.bcount IS 35:
105→VAL BYTE barkeep.x IS 6:
106→VAL BYTE help.y IS ((BYTE num.punters) + punter.top) + 2:
107→
108→VAL []BYTE punter.blank IS " ":
109→VAL [][]BYTE punter.happy IS [":-)", "%-]", "X-|"]:
110→VAL [][]BYTE punter.thirsty IS [":-O", "%-P", "X-|"]:
111→VAL [][]BYTE punter.sad IS [":-(", "%-[", "X-|"]:
112→VAL []BYTE barkeep.blank IS " ":
113→VAL []BYTE barkeep.happy IS "=:-]":
114→
115→--}}}
116→
117→--{{{ PROC draw.bar (SHARED CHAN O.REQ out!)
118→PROC draw.bar (SHARED CHAN O.REQ out!)
119→ VAL []BYTE bar.end IS "----------||":
120→ VAL []BYTE bar.bit IS " ||":
121→ VAL []BYTE help IS "keys: 0 = normal (arbitrary) alt, 1 = fair(ish) alt, 2 = fair alt":
122→ CLAIM out!
123→ SEQ
124→ out ! movecursor; 1; 1
125→ out ! string; (SIZE bar.end)::bar.end
126→ SEQ i = INT punter.top FOR INT num.punters
127→ SEQ
128→ out ! movecursor; 1; BYTE i
129→ out ! string; (SIZE bar.bit)::bar.bit
130→ out ! movecursor; 1; (BYTE num.punters) + punter.top
131→ out ! string; (SIZE bar.end)::bar.end
132→ out ! movecursor; 1; help.y
133→ out ! string; (SIZE help)::help
134→:
135→--}}}
136→
137→--{{{ PROC punter (VAL INT me, seed, CHAN BYTE request!, response?, SHARED CHAN O.REQ out!)
138→PROC punter (VAL INT me, seed, CHAN INT request!, response?, SHARED CHAN O.REQ out!)
139→ VAL INT speed.const IS milliseconds (20):
140→
141→ BYTE my.x, my.y:
142→ INT num.pints:
143→ VAL INT FINE IS 0:
144→ VAL INT DRUNK IS 1:
145→ VAL INT DEAD IS 2:
146→ INITIAL INT state IS FINE:
147→ INITIAL INT my.seed IS seed + me:
148→ TIMER tim:
149→ INT t:
150→ SEQ
151→ --{{{ Initialise
152→ num.pints := 0
153→ my.x := punter.right
154→ my.y := (BYTE me) + punter.top
155→ CLAIM out!
156→ SEQ
157→ out ! movecursor; punter.bcount; my.y
158→ out ! number; num.pints
159→ --}}}
160→ --{{{ Shake up random number seed
161→ SEQ i = 0 FOR 1000
162→ t, my.seed := random (1<<6, my.seed)
163→ --}}}
164→ --{{{ Main loop
165→ WHILE (state <> DEAD)
166→ INT tdelay:
167→ SEQ
168→ --{{{ Think
169→ SEQ
170→ CLAIM out!
171→ SEQ
172→ out ! movecursor; my.x; my.y
173→ out ! happy; state
174→ tdelay, my.seed := random (seconds (2), my.seed)
175→ tim ? t
176→ tim ? AFTER (t PLUS tdelay)
177→ --}}}
178→ --{{{ Get thirsty and go to the bar
179→ SEQ i = 0 FOR INT (punter.right - punter.left)
180→ SEQ
181→ CLAIM out!
182→ SEQ
183→ out ! movecursor; my.x; my.y
184→ out ! blank
185→ my.x := (my.x - 1)
186→ out ! movecursor; my.x; my.y
187→ out ! thirsty; state
188→ tim ? t
189→ tim ? AFTER (t PLUS speed.const)
190→ --}}}
191→ --{{{ Get unhappy and order drink
192→ SEQ
193→ CLAIM out!
194→ SEQ
195→ out ! movecursor; my.x; my.y
196→ out ! sad; state
197→ request ! me
198→ INT any:
199→ response ? any
200→ --}}}
201→ --{{{ Update beer count and move back to the right
202→ SEQ
203→ num.pints := num.pints + 1
204→ IF
205→ num.pints = limit.pints
206→ state := DEAD
207→ num.pints >= drunk.pints
208→ state := DRUNK
209→ TRUE
210→ SKIP
211→ CLAIM out!
212→ SEQ
213→ out ! movecursor; punter.bcount; my.y
214→ out ! number; num.pints
215→ SEQ i = 0 FOR INT (punter.right - punter.left)
216→ SEQ
217→ CLAIM out!
218→ SEQ
219→ out ! movecursor; my.x; my.y
220→ out ! blank
221→ my.x := (my.x + 1)
222→ out ! movecursor; my.x; my.y
223→ out ! happy; state
224→ tim ? t
225→ tim ? AFTER (t PLUS speed.const)
226→ --}}}
227→ --}}}
228→:
229→--}}}
230→
231→--{{{ PROC punters ([]CHAN BYTE req!, rsp?, SHARED CHAN O.REQ d.chan!, CHAN BOOL kill.keyboard!, kill.bar!)
232→PROC punters ([]CHAN INT req!, rsp?, SHARED CHAN O.REQ d.chan!,
233→ CHAN BOOL kill.keyboard!, kill.bar!)
234→
235→ INT seed: -- random number seed
236→
237→ SEQ
238→
239→ --{{{ Set random seed
240→ TIMER tim:
241→ SEQ
242→ tim ? seed
243→ seed := (seed >> 2) + 42
244→ --}}}
245→
246→ PAR i = 0 FOR num.punters
247→ punter (i, seed, req[i]!, rsp[i]?, d.chan!)
248→
249→ kill.keyboard ! TRUE -- kill off the others
250→ kill.bar ! TRUE -- carefully ...
251→ CLAIM d.chan!
252→ d.chan ! terminate -- order!
253→
254→:
255→--}}}
256→
257→--{{{ PROC display (CHAN O.REQ in?, CHAN BYTE out!)
258→PROC display (CHAN O.REQ in?, CHAN BYTE out!)
259→ [256]BYTE str:
260→ INITIAL BOOL running IS TRUE:
261→ SEQ
262→ cursor.invisible (out!)
263→ erase.screen (out!)
264→ WHILE running
265→ SEQ
266→ in ? CASE
267→ BYTE x, y:
268→ movecursor; x; y
269→ cursor.x.y (x, y, out!)
270→ INT len:
271→ string; len::str
272→ out.string ([str FOR len], 0, out!)
273→ INT n:
274→ number; n
275→ out.int (n, 0, out!)
276→ blank
277→ out.string (punter.blank, 0, out!)
278→ INT s:
279→ happy; s
280→ SEQ
281→ out.string (punter.happy[s], 0, out!)
282→ out ! '*c'
283→ INT s:
284→ thirsty; s
285→ SEQ
286→ out.string (punter.thirsty[s], 0, out!)
287→ out ! '*c'
288→ INT s:
289→ sad; s
290→ SEQ
291→ out.string (punter.sad[s], 0, out!)
292→ out ! '*c'
293→ bar.blank
294→ SEQ
295→ out.string (barkeep.blank, 0, out!)
296→ out ! '*c'
297→ bar.happy
298→ SEQ
299→ out.string (barkeep.happy, 0, out!)
300→ out ! '*c'
301→ terminate
302→ SEQ
303→ running := FALSE
304→ cursor.x.y (1, help.y - 1, out!)
305→ erase.eos (out!)
306→ out ! FLUSH
307→ cursor.visible (out!)
308→:
309→--}}}
310→
311→--{{{ PROC bar.brain ([]CHAN BYTE in?, CHAN BYTE out!, CHAN INT mode.change?, CHAN BOOL terminate?)
312→PROC bar.brain ([]CHAN INT in?, CHAN INT out!,
313→ CHAN INT mode.change?, CHAN BOOL terminate?)
314→ VAL INT s IS SIZE in:
315→ INITIAL INT local.alt.mode IS PRI.ALT:
316→ INITIAL INT favourite IS 0:
317→ INITIAL BOOL running IS TRUE:
318→ WHILE running
319→ CASE local.alt.mode
320→ PRI.ALT
321→ PRI ALT
322→ BOOL any:
323→ terminate ? any
324→ running := FALSE
325→ mode.change ? local.alt.mode
326→ SKIP
327→ PRI ALT i = 0 FOR s
328→ INT any:
329→ in[i] ? any
330→ out ! any
331→ FAIRISH.ALT
332→ PRI ALT
333→ BOOL any:
334→ terminate ? any
335→ running := FALSE
336→ mode.change ? local.alt.mode
337→ SKIP
338→ PRI ALT j = favourite FOR s
339→ VAL INT X IS (j \ s):
340→ INT any:
341→ in[X] ? any
342→ SEQ
343→ out ! any
344→ favourite := ((favourite + 1) \ s)
345→ FAIR.ALT
346→ PRI ALT
347→ BOOL any:
348→ terminate ? any
349→ running := FALSE
350→ mode.change ? local.alt.mode
351→ SKIP
352→ PRI ALT i = favourite FOR s
353→ VAL INT X IS (i \ s):
354→ INT any:
355→ in[X] ? any
356→ SEQ
357→ out ! any
358→ favourite := (X + 1)
359→:
360→--}}}
361→
362→--{{{ PROC bar.body (CHAN BYTE in?, []CHAN BYTE responses!, SHARED CHAN O.REQ out!, CHAN BOOL terminate?)
363→PROC bar.body (CHAN INT in?, []CHAN INT responses!, SHARED CHAN O.REQ out!,
364→ CHAN BOOL terminate?)
365→ VAL INT bar.delay IS milliseconds (20):
366→ VAL INT bar.anim.delay IS milliseconds (20):
367→
368→ PROC barkeep.from.to (VAL BYTE start.y, end.y)
369→ TIMER tim:
370→ INT t:
371→ SEQ
372→ tim ? t
373→ tim ? AFTER (t PLUS bar.anim.delay)
374→ CLAIM out!
375→ SEQ
376→ out ! movecursor; barkeep.x; start.y
377→ out ! bar.blank
378→ out ! movecursor; barkeep.x; end.y
379→ out ! bar.happy
380→ :
381→
382→ INITIAL BYTE barkeep.y IS punter.top:
383→ INITIAL BOOL running IS TRUE:
384→ INITIAL BOOL serving IS FALSE:
385→ TIMER tim:
386→ INT wake.up:
387→ SEQ
388→ CLAIM out!
389→ SEQ
390→ out ! movecursor; barkeep.x; barkeep.y
391→ out ! bar.happy
392→ tim ? wake.up
393→ wake.up := wake.up PLUS bar.delay
394→ WHILE running
395→ PRI ALT
396→ BOOL any:
397→ terminate ? any
398→ running := FALSE
399→ (NOT serving) & tim ? AFTER wake.up
400→ serving := TRUE
401→ INT n:
402→ serving & in ? n
403→ BYTE new.y:
404→ SEQ
405→ new.y := ((BYTE n) + punter.top)
406→ IF
407→ new.y < barkeep.y
408→ SEQ i = 0 FOR INT (barkeep.y - new.y)
409→ VAL BYTE bi IS BYTE i:
410→ barkeep.from.to (barkeep.y - bi,
411→ barkeep.y - (bi + 1))
412→ new.y > barkeep.y
413→ SEQ i = INT barkeep.y FOR INT (new.y - barkeep.y)
414→ VAL BYTE bi IS BYTE i:
415→ barkeep.from.to (bi, bi + 1)
416→ TRUE
417→ SKIP
418→ barkeep.y := new.y
419→ responses[n] ! 0
420→ serving := FALSE
421→ tim ? wake.up
422→ wake.up := wake.up PLUS bar.delay
423→:
424→--}}}
425→
426→--{{{ PROC bar.tender ([]CHAN BYTE req?, rsp!, CHAN INT mode.change?, SHARED CHAN O.REQ d.chan!, CHAN BOOL kill?)
427→PROC bar.tender ([]CHAN INT req?, rsp!, CHAN INT mode.change?,
428→ SHARED CHAN O.REQ d.chan!, CHAN BOOL kill?)
429→ CHAN INT spinal.chord:
430→ CHAN BOOL kill.brain, kill.body:
431→ PAR
432→ --{{{ terminator
433→ BOOL any:
434→ SEQ
435→ kill ? any
436→ kill.brain ! any -- order is
437→ kill.body ! any -- important
438→ --}}}
439→ bar.brain (req?, spinal.chord!, mode.change?, kill.brain?)
440→ bar.body (spinal.chord?, rsp!, d.chan!, kill.body?)
441→:
442→--}}}
443→
444→--{{{ PROC keyboard.reader (CHAN BYTE in?, CHAN INT out!, CHAN BOOL terminate?)
445→PROC keyboard.reader (CHAN BYTE in?, CHAN INT out!, CHAN BOOL terminate?)
446→ INITIAL BOOL running IS TRUE:
447→ WHILE running
448→ PRI ALT
449→ BOOL any:
450→ terminate ? any
451→ running := FALSE
452→ BYTE ch:
453→ in ? ch
454→ IF
455→ (ch >= '0') AND (ch <= '2')
456→ CASE ch - '0'
457→ 0
458→ out ! PRI.ALT
459→ 1
460→ out ! FAIRISH.ALT
461→ 2
462→ out ! FAIR.ALT
463→ TRUE
464→ SKIP
465→:
466→--}}}
467→
468→--{{{ PROC bar (CHAN BYTE keyboard?, screen!, error!)
469→PROC bar (CHAN BYTE keyboard?, screen!, error!)
470→
471→ SHARED ! CHAN O.REQ d.chan:
472→ [num.punters]CHAN INT req:
473→ [num.punters]CHAN INT rsp:
474→ CHAN INT mode.change:
475→ CHAN BOOL kill.keyboard:
476→ CHAN BOOL kill.bar:
477→ PAR
478→ display (d.chan?, screen!)
479→ SEQ
480→ draw.bar (d.chan!)
481→ PAR
482→ punters (req!, rsp?, d.chan!, kill.keyboard!, kill.bar!)
483→ keyboard.reader (keyboard?, mode.change!, kill.keyboard?)
484→ bar.tender (req?, rsp!, mode.change?, d.chan!, kill.bar?)
485→:
486→--}}}
487→
488→
Now I can see the actual usage. In bar.occ, the protocol O.REQ has dotted tags like bar.blank and bar.happy. And these are used in both sends (like out ! bar.blank) and receives (in ? CASE ... bar.blank).
Since the lexer already treats dots as part of identifiers, bar.blank would be a single IDENT token "bar.blank". So the parsing of protocol declarations, variant sends, and variant receives should already work in terms of tokenization.
But wait - looking at the variant sends more carefully:
out ! bar.blank
out ! bar.happy
These are sends without a semicolon after the tag. So the variant send detection code checks if the current token is an identifier followed by a semicolon to recognize a variant tag. For out ! bar.blank, there's no semicolon - it's bar.blank followed by a newline instead.
So it would fall through to the expression parsing path, where bar.blank gets parsed as a regular expression and becomes an ast.Identifier. Then in the codegen phase, there's a check that sees if that identifier value matches a variant tag for the protocol, which should handle this case correctly.
Looking at the actual sends and receives in the file, I see dotted tags like bar.blank and bar.happy appearing without payloads in both the send statements and the receive cases. The question is whether the parser correctly handles these dotted identifiers when they appear after the ! operator in a send statement.
When the parser encounters out ! bar.blank, it reads bar.blank as a single identifier token and then checks if a semicolon follows—but since there's a newline instead, it falls through to parsing it as a regular expression. The codegen then needs to recognize that this expression value is actually a variant tag by checking if it matches a known variant in the protocol definition.
For the receive side with dotted tags, the parser validates that the current token is an identifier before proceeding.
But wait—if this is already working, what's the actual problem? Let me trace through what happens when I run the parser on this file. Actually, I'm realizing there are several other unimplemented features in bar.occ beyond just dotted protocol tags: shared channels, claim statements, priority alt, counted arrays, and multidimensional arrays. So I can't test directly with that file. I need to figure out what subset of features I can actually test with.
Let me think about this differently—maybe the issue is simpler than I'm making it. The lexer treats bar.blank as a single IDENT token, the parser stores it as-is, and the codegen replaces dots with underscores. So theoretically it should work. Let me create a minimal test case with just a dotted protocol tag and see what happens.
Actually, looking at the send parsing logic, when we have out ! bar.blank ; 42, the parser sees an IDENT followed by a SEMICOLON, checks if it's a valid variant tag, and if so, assigns it to stmt.VariantTag. For sends without a semicolon, it falls through to expression parsing and creates an identifier node. The codegen should handle both cases correctly, so I think the existing implementation already supports dotted protocol tags.
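The two send paths described above can be sketched in isolation. This is a minimal model, not the project's parser: the `token` type, the `classifySend` helper, and the string token kinds are hypothetical stand-ins for `lexer.IDENT`, `lexer.SEMICOLON`, and `isVariantTag`.

```go
package main

import "fmt"

// token is a hypothetical, simplified stand-in for the lexer's token type.
type token struct{ kind, literal string }

// classifySend reports which path the right-hand side of a send would take:
// "variant-tag" when a known tag is followed by a semicolon, otherwise
// "expression" (a bare tag such as `out ! bar.blank` lands here and is
// left for codegen to recognise).
func classifySend(rhs []token, knownTags map[string]bool) string {
	if len(rhs) >= 2 &&
		rhs[0].kind == "IDENT" &&
		rhs[1].kind == "SEMICOLON" &&
		knownTags[rhs[0].literal] {
		return "variant-tag"
	}
	return "expression"
}

func main() {
	tags := map[string]bool{"bar.data": true, "bar.blank": true}

	withPayload := []token{{"IDENT", "bar.data"}, {"SEMICOLON", ";"}, {"INT", "42"}}
	bare := []token{{"IDENT", "bar.blank"}}

	fmt.Println(classifySend(withPayload, tags)) // variant-tag
	fmt.Println(classifySend(bare, tags))        // expression
}
```

Because the dotted tag arrives as one IDENT literal, neither path needs to know about the dot.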
But the issue reporter says it doesn't work, so let me reconsider. Looking at the lexer again, readIdentifier includes dots in its character set, so bar.blank is definitely tokenized as a single IDENT. For protocol declarations with dotted names like PROTOCOL O.REQ, the parser stores the full name including the dot, then converts it to O_REQ when generating Go code. The protocol name mapping also stores the original dotted name as the key.
The real question is whether the channel protocol lookup mechanism correctly matches these dotted names when variant sends try to find the protocol for a given channel. Let me trace through how chanProtocols gets populated and how variant sends resolve their protocol references.
1. Verify it works (it likely already does for the basic case)
2. Add tests for dotted protocol tags
3. Maybe fix any edge cases that arise
Actually wait, let me look at this more carefully. In bar.occ the protocol name is O.REQ. When used in a channel declaration:
SHARED ! CHAN O.REQ d.chan:
The parser needs to recognize O.REQ as a protocol name. Let me check how channel declarations with protocol types work.
Actually, SHARED CHAN isn't implemented. But even for regular usage:
CHAN O.REQ in?
The parser parses this as CHAN <type> <name>. The type would be O.REQ (single IDENT token). This should work.
Let me think about what changes are actually needed. Looking at the issue again, the author suggests the parser needs to explicitly handle dotted identifiers when parsing protocol tag names, but since the lexer already includes dots in identifiers, this should already work. Similarly, codegen already uses goIdent() to convert dots to underscores in generated Go identifiers.
So the real work here is just adding comprehensive tests to verify this functionality works correctly, then fixing any edge cases that surface. I should write unit tests for parsing variant protocols with dotted tags, unit tests for codegen with dotted protocol tags, and an end-to-end test with dotted protocol tags to ensure everything integrates properly.
Before I start writing tests though, I need to look at the existing protocol tests to understand the testing patterns and conventions used in this codebase.
Now I see the real-world usage in bar.occ. The lexer already handles dots in identifiers, so bar.blank is a single IDENT token. Let me verify if there's actually a bug by checking the existing protocol tests and the send parsing code more carefully.
{
"pattern": "protocol.*variant|dotted|bar\\.blank|PROTOCOL.*CASE",
"path": "/home/david/projects/code-associates/occam2go/codegen",
"output_mode": "files_with_matches"
}
{
"pattern": "variant.*tag|VariantTag|isVariantTag|parseProtocolVariants",
"path": "/home/david/projects/code-associates/occam2go/parser/parser.go",
"output_mode": "content"
}
719: // Check if this is a variant send: first token is an identifier that is a variant tag
722: if p.isVariantTag(possibleTag) {
723: stmt.VariantTag = possibleTag
868: decl.Variants = p.parseProtocolVariants()
936:func (p *Parser) parseProtocolVariants() []ast.ProtocolVariant {
980: // Parse variant: tag [; TYPE]*
982: p.addError(fmt.Sprintf("expected variant tag name, got %s", p.curToken.Type))
1185: // Check if this is a variant send: first token is an identifier that is a variant tag
1189: // Could be variant send (tag ; values) or expression ; values
1190: // Check if the identifier is a known protocol variant tag
1192: // not followed by an operator, treat it as a variant tag
1195: // Check if this identifier is a protocol variant tag
1196: if p.isVariantTag(possibleTag) {
1197: stmt.VariantTag = possibleTag
1222:func (p *Parser) isVariantTag(name string) bool {
1321: // Parse a variant case: tag [; var]* \n INDENT body
1325: p.addError(fmt.Sprintf("expected variant tag name, got %s", p.curToken.Type))
1422: p.addError(fmt.Sprintf("expected variant tag name, got %s", p.curToken.Type))
{
"file_path": "/home/david/projects/code-associates/occam2go/parser/parser.go",
"offset": 1180,
"limit": 60
}
1180→ p.nextToken() // move to !
1181→ stmt.Token = p.curToken
1182→
1183→ p.nextToken() // move past !
1184→
1185→ // Check if this is a variant send: first token is an identifier that is a variant tag
1186→ // We detect this by checking if the identifier is followed by SEMICOLON
1187→ // and the identifier is NOT followed by an operator (i.e., it's a bare tag name)
1188→ if p.curTokenIs(lexer.IDENT) && p.peekTokenIs(lexer.SEMICOLON) {
1189→ // Could be variant send (tag ; values) or expression ; values
1190→ // Check if the identifier is a known protocol variant tag
1191→ // For simplicity, if IDENT is followed by SEMICOLON and the ident is
1192→ // not followed by an operator, treat it as a variant tag
1193→ // We save the ident and check further
1194→ possibleTag := p.curToken.Literal
1195→ // Check if this identifier is a protocol variant tag
1196→ if p.isVariantTag(possibleTag) {
1197→ stmt.VariantTag = possibleTag
1198→ p.nextToken() // move to ;
1199→ // Parse remaining values after the tag
1200→ for p.curTokenIs(lexer.SEMICOLON) {
1201→ p.nextToken() // move past ;
1202→ val := p.parseExpression(LOWEST)
1203→ stmt.Values = append(stmt.Values, val)
1204→ }
1205→ return stmt
1206→ }
1207→ }
1208→
1209→ stmt.Value = p.parseExpression(LOWEST)
1210→
1211→ // Check for sequential send: c ! expr ; expr ; ...
1212→ for p.peekTokenIs(lexer.SEMICOLON) {
1213→ p.nextToken() // move to ;
1214→ p.nextToken() // move past ;
1215→ val := p.parseExpression(LOWEST)
1216→ stmt.Values = append(stmt.Values, val)
1217→ }
1218→
1219→ return stmt
1220→}
1221→
1222→func (p *Parser) isVariantTag(name string) bool {
1223→ for _, proto := range p.protocolDefs {
1224→ if proto.Kind == "variant" {
1225→ for _, v := range proto.Variants {
1226→ if v.Tag == name {
1227→ return true
1228→ }
1229→ }
1230→ }
1231→ }
1232→ return false
1233→}
1234→
1235→func (p *Parser) parseReceive() ast.Statement {
1236→ channel := p.curToken.Literal
1237→
1238→ p.nextToken() // move to ?
1239→ recvToken := p.curToken
{
"file_path": "/home/david/projects/code-associates/occam2go/parser/parser.go",
"offset": 710,
"limit": 40
}
710→ sendToken := p.curToken
711→ p.nextToken() // move past !
712→
713→ stmt := &ast.Send{
714→ Token: sendToken,
715→ Channel: name,
716→ ChannelIndex: index,
717→ }
718→
719→ // Check if this is a variant send: first token is an identifier that is a variant tag
720→ if p.curTokenIs(lexer.IDENT) && p.peekTokenIs(lexer.SEMICOLON) {
721→ possibleTag := p.curToken.Literal
722→ if p.isVariantTag(possibleTag) {
723→ stmt.VariantTag = possibleTag
724→ p.nextToken() // move to ;
725→ for p.curTokenIs(lexer.SEMICOLON) {
726→ p.nextToken() // move past ;
727→ val := p.parseExpression(LOWEST)
728→ stmt.Values = append(stmt.Values, val)
729→ }
730→ return stmt
731→ }
732→ }
733→
734→ stmt.Value = p.parseExpression(LOWEST)
735→
736→ // Check for sequential send
737→ for p.peekTokenIs(lexer.SEMICOLON) {
738→ p.nextToken() // move to ;
739→ p.nextToken() // move past ;
740→ val := p.parseExpression(LOWEST)
741→ stmt.Values = append(stmt.Values, val)
742→ }
743→
744→ return stmt
745→ }
746→
747→ if p.peekTokenIs(lexer.RECEIVE) {
748→ // Indexed channel receive: cs[i] ? x or cs[i] ? CASE ...
749→ p.nextToken() // move to ?
Now I have a clear picture. Since the lexer already handles dots as part of identifiers, the parser and codegen should already work with dotted protocol tags. The question is whether there are any edge cases or if this simply needs tests.
Let me look at existing protocol tests to understand the testing patterns.
{
"pattern": "PROTOCOL.*CASE|variant.*protocol|TestVariant|TestProtocol",
"path": "/home/david/projects/code-associates/occam2go/codegen",
"output_mode": "content",
"-n": true
}
{
"pattern": "TestProtocol|TestVariant",
"path": "/home/david/projects/code-associates/occam2go/parser",
"output_mode": "content",
"-n": true
}
{
"file_path": "/home/david/projects/code-associates/occam2go/parser/parser_test.go",
"offset": 1629,
"limit": 80
}
1629→func TestVariantProtocolDecl(t *testing.T) {
1630→ input := `PROTOCOL MSG
1631→ CASE
1632→ text; INT
1633→ number; INT; INT
1634→ quit
1635→`
1636→ l := lexer.New(input)
1637→ p := New(l)
1638→ program := p.ParseProgram()
1639→ checkParserErrors(t, p)
1640→
1641→ if len(program.Statements) != 1 {
1642→ t.Fatalf("expected 1 statement, got %d", len(program.Statements))
1643→ }
1644→
1645→ proto, ok := program.Statements[0].(*ast.ProtocolDecl)
1646→ if !ok {
1647→ t.Fatalf("expected ProtocolDecl, got %T", program.Statements[0])
1648→ }
1649→
1650→ if proto.Name != "MSG" {
1651→ t.Errorf("expected name 'MSG', got %s", proto.Name)
1652→ }
1653→
1654→ if proto.Kind != "variant" {
1655→ t.Errorf("expected kind 'variant', got %s", proto.Kind)
1656→ }
1657→
1658→ if len(proto.Variants) != 3 {
1659→ t.Fatalf("expected 3 variants, got %d", len(proto.Variants))
1660→ }
1661→
1662→ // text; INT
1663→ if proto.Variants[0].Tag != "text" {
1664→ t.Errorf("expected tag 'text', got %s", proto.Variants[0].Tag)
1665→ }
1666→ if len(proto.Variants[0].Types) != 1 || proto.Variants[0].Types[0] != "INT" {
1667→ t.Errorf("expected types [INT] for text, got %v", proto.Variants[0].Types)
1668→ }
1669→
1670→ // number; INT; INT
1671→ if proto.Variants[1].Tag != "number" {
1672→ t.Errorf("expected tag 'number', got %s", proto.Variants[1].Tag)
1673→ }
1674→ if len(proto.Variants[1].Types) != 2 {
1675→ t.Errorf("expected 2 types for number, got %d", len(proto.Variants[1].Types))
1676→ }
1677→
1678→ // quit (no payload)
1679→ if proto.Variants[2].Tag != "quit" {
1680→ t.Errorf("expected tag 'quit', got %s", proto.Variants[2].Tag)
1681→ }
1682→ if len(proto.Variants[2].Types) != 0 {
1683→ t.Errorf("expected 0 types for quit, got %d", len(proto.Variants[2].Types))
1684→ }
1685→}
1686→
1687→func TestChanDeclWithProtocol(t *testing.T) {
1688→ input := `PROTOCOL SIGNAL IS INT
1689→CHAN OF SIGNAL c:
1690→`
1691→ l := lexer.New(input)
1692→ p := New(l)
1693→ program := p.ParseProgram()
1694→ checkParserErrors(t, p)
1695→
1696→ if len(program.Statements) != 2 {
1697→ t.Fatalf("expected 2 statements, got %d", len(program.Statements))
1698→ }
1699→
1700→ chanDecl, ok := program.Statements[1].(*ast.ChanDecl)
1701→ if !ok {
1702→ t.Fatalf("expected ChanDecl, got %T", program.Statements[1])
1703→ }
1704→
1705→ if chanDecl.ElemType != "SIGNAL" {
1706→ t.Errorf("expected element type 'SIGNAL', got %s", chanDecl.ElemType)
1707→ }
1708→
{
"file_path": "/home/david/projects/code-associates/occam2go/codegen/codegen_test.go",
"offset": 504,
"limit": 80
}
504→func TestVariantProtocolType(t *testing.T) {
505→ input := `PROTOCOL MSG
506→ CASE
507→ text; INT
508→ quit
509→`
510→ output := transpile(t, input)
511→
512→ if !strings.Contains(output, "type _proto_MSG interface {") {
513→ t.Errorf("expected interface declaration in output, got:\n%s", output)
514→ }
515→ if !strings.Contains(output, "_is_MSG()") {
516→ t.Errorf("expected marker method in output, got:\n%s", output)
517→ }
518→ if !strings.Contains(output, "type _proto_MSG_text struct {") {
519→ t.Errorf("expected text struct in output, got:\n%s", output)
520→ }
521→ if !strings.Contains(output, "type _proto_MSG_quit struct{}") {
522→ t.Errorf("expected quit struct in output, got:\n%s", output)
523→ }
524→}
525→
526→func TestRecordType(t *testing.T) {
527→ input := `RECORD POINT
528→ INT x:
529→ INT y:
530→`
531→ output := transpile(t, input)
532→
533→ if !strings.Contains(output, "type POINT struct {") {
534→ t.Errorf("expected 'type POINT struct {' in output, got:\n%s", output)
535→ }
536→ if !strings.Contains(output, "x int") {
537→ t.Errorf("expected 'x int' field in output, got:\n%s", output)
538→ }
539→ if !strings.Contains(output, "y int") {
540→ t.Errorf("expected 'y int' field in output, got:\n%s", output)
541→ }
542→}
543→
544→func TestRecordFieldAssignmentCodegen(t *testing.T) {
545→ input := `RECORD POINT
546→ INT x:
547→ INT y:
548→SEQ
549→ POINT p:
550→ p[x] := 5
551→`
552→ output := transpile(t, input)
553→
554→ if !strings.Contains(output, "p.x = 5") {
555→ t.Errorf("expected 'p.x = 5' in output, got:\n%s", output)
556→ }
557→}
558→
559→func TestChanArrayDeclGen(t *testing.T) {
560→ input := `[5]CHAN OF INT cs:
561→`
562→ output := transpile(t, input)
563→
564→ if !strings.Contains(output, "cs := make([]chan int, 5)") {
565→ t.Errorf("expected 'cs := make([]chan int, 5)' in output, got:\n%s", output)
566→ }
567→ if !strings.Contains(output, "for _i := range cs { cs[_i] = make(chan int) }") {
568→ t.Errorf("expected init loop in output, got:\n%s", output)
569→ }
570→}
571→
572→func TestIndexedSendGen(t *testing.T) {
573→ input := `cs[0] ! 42
574→`
575→ output := transpile(t, input)
576→
577→ if !strings.Contains(output, "cs[0] <- 42") {
578→ t.Errorf("expected 'cs[0] <- 42' in output, got:\n%s", output)
579→ }
580→}
581→
582→func TestIndexedReceiveGen(t *testing.T) {
583→ input := `cs[0] ? x
Let me also look at the existing e2e tests for protocols to understand the test patterns.
{
"pattern": "variant|PROTOCOL.*CASE",
"path": "/home/david/projects/code-associates/occam2go/codegen/e2e_test.go",
"output_mode": "content",
"-n": true,
"context": 3
}
<tool_use_error>Path does not exist: /home/david/projects/code-associates/occam2go/codegen/e2e_test.go. Note: your current working directory is /home/david/projects/code-associates/occam2go.</tool_use_error>
{
"pattern": "TestE2E",
"path": "/home/david/projects/code-associates/occam2go/codegen",
"output_mode": "files_with_matches"
}
Found 17 files
codegen/e2e_array_test.go
codegen/e2e_types_test.go
codegen/e2e_concurrency_test.go
codegen/e2e_intrinsics_test.go
codegen/e2e_misc_test.go
codegen/e2e_params_test.go
codegen/e2e_retypes_test.go
codegen/e2e_strings_test.go
codegen/e2e_course_test.go
codegen/e2e_phase2_test.go
codegen/e2e_control_test.go
codegen/e2e_basic_test.go
codegen/e2e_replicator_test.go
codegen/e2e_proc_func_test.go
codegen/e2e_preproc_test.go
codegen/e2e_protocol_test.go
codegen/e2e_record_test.go
{
"file_path": "/home/david/projects/code-associates/occam2go/codegen/e2e_protocol_test.go"
}
1→package codegen
2→
3→import "testing"
4→
5→func TestE2E_SimpleProtocol(t *testing.T) {
6→ // Simple protocol: just a named type alias
7→ occam := `PROTOCOL SIGNAL IS INT
8→
9→SEQ
10→ CHAN OF SIGNAL c:
11→ INT result:
12→ PAR
13→ c ! 42
14→ c ? result
15→ print.int(result)
16→`
17→ output := transpileCompileRun(t, occam)
18→ expected := "42\n"
19→ if output != expected {
20→ t.Errorf("expected %q, got %q", expected, output)
21→ }
22→}
23→
24→func TestE2E_SequentialProtocol(t *testing.T) {
25→ // Sequential protocol: send/receive multiple values
26→ occam := `PROTOCOL PAIR IS INT ; INT
27→
28→SEQ
29→ CHAN OF PAIR c:
30→ INT x, y:
31→ PAR
32→ c ! 10 ; 20
33→ c ? x ; y
34→ print.int(x)
35→ print.int(y)
36→`
37→ output := transpileCompileRun(t, occam)
38→ expected := "10\n20\n"
39→ if output != expected {
40→ t.Errorf("expected %q, got %q", expected, output)
41→ }
42→}
43→
44→func TestE2E_VariantProtocol(t *testing.T) {
45→ // Variant protocol: tagged union with CASE receive
46→ occam := `PROTOCOL MSG
47→ CASE
48→ data; INT
49→ quit
50→
51→SEQ
52→ CHAN OF MSG c:
53→ INT result:
54→ result := 0
55→ PAR
56→ c ! data ; 42
57→ c ? CASE
58→ data ; result
59→ print.int(result)
60→ quit
61→ print.int(0)
62→`
63→ output := transpileCompileRun(t, occam)
64→ expected := "42\n"
65→ if output != expected {
66→ t.Errorf("expected %q, got %q", expected, output)
67→ }
68→}
69→
70→func TestE2E_VariantProtocolNoPayload(t *testing.T) {
71→ // Variant protocol with no-payload tag
72→ occam := `PROTOCOL MSG
73→ CASE
74→ data; INT
75→ quit
76→
77→SEQ
78→ CHAN OF MSG c:
79→ INT result:
80→ result := 0
81→ PAR
82→ c ! quit
83→ c ? CASE
84→ data ; result
85→ print.int(result)
86→ quit
87→ print.int(99)
88→`
89→ output := transpileCompileRun(t, occam)
90→ expected := "99\n"
91→ if output != expected {
92→ t.Errorf("expected %q, got %q", expected, output)
93→ }
94→}
95→
96→func TestE2E_ProtocolWithProc(t *testing.T) {
97→ // Protocol channel passed as PROC parameter
98→ occam := `PROTOCOL PAIR IS INT ; INT
99→
100→PROC sender(CHAN OF PAIR out)
101→ out ! 3 ; 7
102→
103→SEQ
104→ CHAN OF PAIR c:
105→ INT a, b:
106→ PAR
107→ sender(c)
108→ c ? a ; b
109→ print.int(a + b)
110→`
111→ output := transpileCompileRun(t, occam)
112→ expected := "10\n"
113→ if output != expected {
114→ t.Errorf("expected %q, got %q", expected, output)
115→ }
116→}
117→
Now I have a complete picture. The code already supports dotted identifiers at the lexer level (dots are part of IDENT tokens). Let me verify my understanding:
- The lexer tokenizes bar.data as a single IDENT token
- The parser stores the dotted name as-is in ProtocolVariant.Tag and Send.VariantTag
- The codegen calls goIdent() on tag names, converting dots to underscores

What I should do:
1. Add tests with dotted protocol tags (parser unit test, codegen unit test, e2e test)
2. Verify it actually works by running the tests
I'm fairly confident this already works, so I'll write the tests to verify and document the behavior. I should also check if there's a TODO.md file that needs updating.
# Plan: Support dotted protocol tags (Issue #68)
## Context
Occam variant protocol tags can contain dots (e.g., `bar.data`, `bar.terminate`, `bar.blank`), as seen in real-world code like `kroc/modules/course/examples/bar.occ`. The issue requests support for this.
After investigation, the lexer already handles dots as part of identifiers (`lexer.go:280` — `readIdentifier()` loops while `isLetter || isDigit || ch == '.'`), so `bar.data` is already a single IDENT token. The codegen already calls `goIdent()` on tag names, converting dots to underscores. The core functionality likely already works, but has no test coverage.
## Changes
### 1. Add parser unit test for dotted variant protocol tags
**File:** `parser/parser_test.go`
Add `TestVariantProtocolDeclDottedTags` — a protocol with dotted tag names (`bar.data ; INT`, `bar.terminate`, `bar.blank ; INT`). Verify each `ProtocolVariant.Tag` preserves the dotted name (e.g., `"bar.data"`).
### 2. Add codegen unit test for dotted variant protocol types
**File:** `codegen/codegen_test.go`
Add `TestVariantProtocolDottedTags` — verify generated Go output contains underscore-converted names:
- `type _proto_BAR_PROTO_bar_data struct { ... }`
- `type _proto_BAR_PROTO_bar_terminate struct{}`
- etc.
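
The expected names in this test can be derived mechanically from the mapping proposed in the issue. The sketch below shows that derivation under the assumption that the struct name is `_proto_` plus the converted protocol name plus the converted tag; `variantTypeName` and this `goIdent` are hypothetical helpers, not functions from the codebase.

```go
package main

import (
	"fmt"
	"strings"
)

// goIdent is a hypothetical stand-in that only replaces dots with underscores.
func goIdent(name string) string {
	return strings.ReplaceAll(name, ".", "_")
}

// variantTypeName builds the Go struct name for one variant of a protocol,
// following the naming shown in the issue's proposed mapping.
func variantTypeName(proto, tag string) string {
	return "_proto_" + goIdent(proto) + "_" + goIdent(tag)
}

func main() {
	for _, tag := range []string{"bar.data", "bar.terminate", "bar.blank"} {
		fmt.Println(variantTypeName("BAR.PROTO", tag))
	}
	// Prints:
	// _proto_BAR_PROTO_bar_data
	// _proto_BAR_PROTO_bar_terminate
	// _proto_BAR_PROTO_bar_blank
}
```

The codegen test can then assert with `strings.Contains` on these names, in the same way the existing `TestVariantProtocolType` does for undotted tags.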
### 3. Add e2e test for dotted protocol tags (send + receive)
**File:** `codegen/e2e_protocol_test.go`
Add `TestE2E_VariantProtocolDottedTags` — a full transpile→compile→run test:
- Protocol with dotted tags (with and without payloads)
- Variant send with dotted tag + payload (`c ! bar.data ; 42`)
- Variant send with dotted tag + no payload (`c ! bar.blank`)
- Variant receive with dotted tags in CASE arms
- Verify expected output
### 4. Fix any issues found during testing
If the tests reveal bugs, fix them in the relevant parser/codegen code.
### 5. Update TODO.md
Document that dotted protocol tags are now supported.
## Verification
```bash
go test ./parser -run TestVariantProtocolDeclDottedTags -v
go test ./codegen -run TestVariantProtocolDottedTags -v
go test ./codegen -run TestE2E_VariantProtocolDottedTags -v
go test ./... # full suite to check for regressions
```
{
"allowedPrompts": [
{
"tool": "Bash",
"prompt": "run tests"
}
],
"plan": "# Plan: Support dotted protocol tags (Issue #68)\n\n## Context\n\nOccam variant protocol tags can contain dots (e.g., `bar.data`, `bar.terminate`, `bar.blank`), as seen in real-world code like `kroc/modules/course/examples/bar.occ`. The issue requests support for this.\n\nAfter investigation, the lexer already handles dots as part of identifiers (`lexer.go:280` — `readIdentifier()` loops while `isLetter || isDigit || ch == '.'`), so `bar.data` is already a single IDENT token. The codegen already calls `goIdent()` on tag names, converting dots to underscores. The core functionality likely already works, but has no test coverage.\n\n## Changes\n\n### 1. Add parser unit test for dotted variant protocol tags\n**File:** `parser/parser_test.go`\n\nAdd `TestVariantProtocolDeclDottedTags` — a protocol with dotted tag names (`bar.data ; INT`, `bar.terminate`, `bar.blank ; INT`). Verify each `ProtocolVariant.Tag` preserves the dotted name (e.g., `\"bar.data\"`).\n\n### 2. Add codegen unit test for dotted variant protocol types\n**File:** `codegen/codegen_test.go`\n\nAdd `TestVariantProtocolDottedTags` — verify generated Go output contains underscore-converted names:\n- `type _proto_BAR_PROTO_bar_data struct { ... }`\n- `type _proto_BAR_PROTO_bar_terminate struct{}`\n- etc.\n\n### 3. Add e2e test for dotted protocol tags (send + receive)\n**File:** `codegen/e2e_protocol_test.go`\n\nAdd `TestE2E_VariantProtocolDottedTags` — a full transpile→compile→run test:\n- Protocol with dotted tags (with and without payloads)\n- Variant send with dotted tag + payload (`c ! bar.data ; 42`)\n- Variant send with dotted tag + no payload (`c ! bar.blank`)\n- Variant receive with dotted tags in CASE arms\n- Verify expected output\n\n### 4. Fix any issues found during testing\n\nIf the tests reveal bugs, fix them in the relevant parser/codegen code.\n\n### 5. 
Update TODO.md\n\nDocument that dotted protocol tags are now supported.\n\n## Verification\n\n```bash\ngo test ./parser -run TestVariantProtocolDeclDottedTags -v\ngo test ./codegen -run TestVariantProtocolDottedTags -v\ngo test ./codegen -run TestE2E_VariantProtocolDottedTags -v\ngo test ./... # full suite to check for regressions\n```\n"
}
The user doesn't want to proceed with this tool use. The tool use was rejected (eg. if it was a file edit, the new_string was NOT written to the file). STOP what you are doing and wait for the user to tell you how to proceed.