Claude Code transcript

6 prompts · 99 messages · 32 tool calls · 0 commits · 2 pages

#1

The current project is an occam compiler. I'd like you to explain one thing about its functionality: there is an #include keyword allowed in source files input to the compiler. For example you can see this used in line 1 of the file ./modules/course/examples/hello_world.occ. I'd like to learn how that include feature is implemented. I've looked at one of the "modules" (which are directories with occam source files and subdirectories) to see how it might work. For example the "course" module is at ./modules/course. I don't think the include feature simply processes all source files under the module root directory. I suspect it is more complex than that. I see files in the directory structure that might be involved. They look like Python source files. For example I see one at ./modules/course/SConscript . My hunch is that the compiler looks for these files and perhaps executes them to get the actual set of source files to include for a given module, but I'm not sure. Can you look into this and let me know how it works please? By the way the compiler itself is written in C.

15 read · 9 grep · 2 glob · 1 task

Now I have the complete picture. Here's the explanation:


How #INCLUDE "course.module" Works

The include mechanism has three layers: the C compiler's file resolution, the occbuild Python tool that generates .module files and manages search paths, and the build system (SCons/autoconf) that orchestrates everything.

Layer 1: The C Compiler (occ21)

When the compiler parses #INCLUDE "course.module" in syn2.c:6086 (rfile()), it treats it as a plain source include (LEX_SOURCE mode). It calls open_file() at lex1.c:1234, which calls:

fptr = popen_relative(name, current_fe_data->fe_pathname, NULL, "r", ...);

The second argument, fe_pathname, defaults to the string "ISEARCH" (misc/harness.c:331). This tells popen_relative() (in fnilib/popen.c:130) to read the ISEARCH environment variable, split it into a colon-separated list of directories, and search each one for the file. The compiler itself knows nothing about modules, build systems, or Python scripts -- it just searches directories listed in ISEARCH.
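To make that search concrete, here is a hypothetical Python equivalent of what popen_relative() does in C (the function name find_in_search_path is mine, not from the codebase):

```python
import os

def find_in_search_path(filename, var="ISEARCH"):
    """Return the first existing <dir>/<filename>, trying each directory
    in the colon-separated environment variable named by `var`."""
    for d in os.environ.get(var, "").split(":"):
        if not d:
            continue  # skip empty entries from leading/trailing/double colons
        candidate = os.path.join(d, filename)
        if os.path.exists(candidate):
            return candidate
    return None  # the C code reports "file not found" in this case
```

This is the whole extent of the compiler's module awareness: a linear directory scan, first match wins.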

Layer 2: occbuild -- The Module Builder

The real intelligence lives in tools/kroc/occbuild.in, a Python script. It does two critical things:

A) Generates .module files. The make_module() function (occbuild.in:290) creates course.module as a real occam source file on disk. For the course library, the generated file looks roughly like:

-- Include file for "course" module
#IF NOT (DEFINED (COURSE.MODULE))
#DEFINE COURSE.MODULE

#INCLUDE "forall.module"
#INCLUDE "consts.inc"
... toolchain-specific PRAGMA directives ...
#ENDIF

The content is controlled by --need (adds #INCLUDE "X.module" for dependencies), --include (adds custom #INCLUDE lines), and a toolchain-specific callback that adds #PRAGMA directives to link the compiled library.
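As a rough sketch of that generation step (a hypothetical make_module_text helper, simplified from what occbuild.in actually does), the structure is:

```python
def make_module_text(base, needs=(), includes=(), pragmas=()):
    """Emit a guarded .module file: needs -> #INCLUDE "X.module" lines,
    includes -> custom #INCLUDE lines, pragmas -> toolchain directives."""
    guard = base.replace("_", ".").upper() + ".MODULE"
    lines = ['-- Include file for "%s" module' % base,
             "#IF NOT (DEFINED (%s))" % guard,
             "#DEFINE %s" % guard]
    lines += ['#INCLUDE "%s.module"' % n for n in needs]   # from --need
    lines += ['#INCLUDE "%s"' % i for i in includes]       # from --include
    lines += list(pragmas)                                 # toolchain callback
    lines.append("#ENDIF")
    return "\n".join(lines) + "\n"
```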

B) Sets the ISEARCH environment variable. Before invoking occ21, occbuild builds a search path from multiple sources and calls set_isearch() (occbuild.in:260) to export it:

  1. In-tree module directories -- resolved from the in-tree-modules file (see Layer 3)
  2. --search flags -- passed by the build system (occbuild.in:1071)
  3. Existing ISEARCH -- any user-set value is appended (occbuild.in:256)
  4. Toolchain defaults -- the installed library path (occbuild.in:254)
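The assembly of those four sources could be sketched like this (hypothetical build_isearch, mirroring the ordering of the list above):

```python
import os

def build_isearch(in_tree_dirs, search_flags, toolchain_default):
    parts = list(in_tree_dirs)        # 1. in-tree module directories
    parts += search_flags             # 2. --search flags
    existing = os.environ.get("ISEARCH")
    if existing:
        parts.append(existing)        # 3. any user-set ISEARCH, appended
    parts.append(toolchain_default)   # 4. installed library path
    return ":".join(parts)
```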

Layer 3: The Build System Wiring

SCons (site_scons/site_tools/occbuild.py): Each INCPATH directory in the SConscript environment is converted to a --search flag for occbuild (line 113-115). So when libsrc/SConscript sets INCPATH='.', the libsrc/ directory ends up in the search path.

Autoconf (m4/occbuild.m4): The OCCAM_PROVIDE macro (occbuild.m4:218) registers modules in a file called $KROC_BUILD_ROOT/in-tree-modules. Each line maps a module name to its directory:

course.module /path/to/modules/course/libsrc
shared_screen.module /path/to/modules/course/libsrc course.module

When occbuild runs with --in-tree, it reads this file to discover where each dependency's .module file lives and adds those directories to the search path.
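Parsing that file is simple whitespace splitting; a hypothetical reader (names mine, not occbuild's) would be:

```python
def parse_in_tree_modules(text):
    """Map each module name to (directory, [dependency modules]) from
    lines of the form: name.module /dir [dep.module ...]"""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        name, directory, deps = fields[0], fields[1], fields[2:]
        table[name] = (directory, deps)
    return table
```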

The Complete Chain

hello_world.occ
  │  #INCLUDE "course.module"
  ▼
Build system (SCons) invokes:
  occbuild --search /path/to/libsrc --need course --object hello_world.occ
  │
  ▼
occbuild:
  1. Builds search_path from --search flags, in-tree-modules, ISEARCH env
  2. Sets ISEARCH="/path/to/libsrc:/path/to/forall:..."
  3. Invokes occ21 (the C compiler)
  │
  ▼
occ21:
  1. Parses #INCLUDE "course.module"
  2. Calls popen_relative("course.module", "ISEARCH", ...)
  3. Reads ISEARCH env var, searches each directory
  4. Finds and opens /path/to/libsrc/course.module
  5. Parses it as occam source (it's a generated .occ-like file)

Key Insight

Your hunch was partly right -- Python scripts are involved, but they don't dynamically discover source files at include-resolution time. Instead, occbuild is invoked at build time to (a) generate the .module file from the library's declared dependencies, and (b) set up the ISEARCH environment so the C compiler can find it. The SConscript files orchestrate which sources go into each library and what flags get passed to occbuild. The compiler itself is just doing a simple path-based file search.

#2

Can you show me what a generated course.module file looks like?

3 read · 2 grep

Now I have all the pieces. For the course library build, the SCons command expands to something like:

occbuild --search . --include consts.inc --library course.lib utils.tce string.tce ...

So:
- needs = [] (no --need flags)
- includes_before = ["consts.inc"] (from --include consts.inc)
- includes_after = []
- std_libs = True
- base = "course"
- occam_objs is non-empty (the .tce files)
- native_objs is empty (no native C objects)
- linker_opts = []

Walking through make_module("course", custom) at occbuild.in:290:

guard = "COURSE.MODULE"     # (line 300: ofn="course.module", replace _ with ., upper())
needs = []                  # so do_list on line 311 outputs nothing
includes_before = ["consts.inc"]  # line 312 outputs: #INCLUDE "consts.inc"
custom(f):                  # line 313 — the KRoCToolchain closure at line 473:
  occam_objs != [] → True   → writes: #USE "course.lib"
  occam_objs != [] → True   → writes: #PRAGMA COMMENT ".USELIB course"
                             → writes: -- occbuild:has-native-library
  linker_opts == []          → skips LDFLAGS line
includes_after = []         # line 314 outputs nothing

So the generated course.module file would be:

-- Include file for "course" module
-- Automatically generated by occbuild ...; do not edit

#IF NOT (DEFINED (COURSE.MODULE))
#DEFINE COURSE.MODULE

#INCLUDE "consts.inc"
#USE "course.lib"
#PRAGMA COMMENT ".USELIB course"
-- occbuild:has-native-library

#ENDIF

That's it. It's a small generated occam source fragment that:

1. Has an include guard (#IF NOT DEFINED / #DEFINE) to prevent double-inclusion
2. Includes consts.inc (the constants like NULL, BELL, etc. we saw earlier)
3. #USE "course.lib" — tells the compiler to read the compiled library's type descriptors (this is the TCOFF binary path via open_descfile)
4. #PRAGMA COMMENT ".USELIB course" — embeds a linker directive so the final link step knows to pull in the course native library
5. -- occbuild:has-native-library — a comment marker that occbuild itself scans for when building downstream libraries that depend on course (to know it needs -lcourse linker flags)

#3

Thank you, and apologies: I hit enter accidentally above. It turns out there are no generated .module files in the git repository, so they must be built at install time, or sometime between installation and build time.

Right, they're generated at build time — specifically as part of the library build step. You can see this in site_scons/site_tools/occbuild.py:58-67: the occbuild_library_emitter adds the .module file as a build target alongside the .lib:

def occbuild_library_emitter(target, source, env):
    target_name = str(target[0])
    precomp     = target_name.replace(target[0].suffix, '.precomp')
    module      = target_name.replace(target[0].suffix, '.module')
    ...
    target.append(module)

So when SCons builds course.lib, it also produces course.module as an output artifact in the same step. The dependency graph ensures this happens before any downstream .occ files that #INCLUDE "course.module" are compiled.

#4

Thank you. If I were working on making another occam compiler and didn't yet want to implement all of this, do you think it would be feasible for me to run occbuild once to generate the .module and .lib files, then copy them into my project and have my compiler just do a simple path include? Would that work?

Mostly yes, but with one significant complication. The textual #INCLUDE chain would work fine — .module files are plain occam source, and things like consts.inc are just VAL declarations. Your compiler would need to handle the preprocessor directives (#IF NOT (DEFINED ...), #DEFINE, #ENDIF) but those are straightforward.
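To illustrate just how straightforward, here is a minimal sketch (my own, not occ21's implementation) of handling exactly the three guard directives a generated .module file uses:

```python
import re

def preprocess(lines, defined=None):
    """Handle #IF NOT (DEFINED (X)), #DEFINE X, and #ENDIF; return the
    lines that survive. `defined` persists across files for guards."""
    defined = set() if defined is None else defined
    out, skipping = [], 0
    for line in lines:
        s = line.strip()
        m = re.match(r'#IF NOT \(DEFINED \((\S+)\)\)', s)
        if m:
            if skipping or m.group(1) in defined:
                skipping += 1     # guard already defined: skip the body
            continue
        if s == "#ENDIF":
            if skipping:
                skipping -= 1
            continue
        if skipping:
            continue
        m = re.match(r'#DEFINE (\S+)', s)
        if m:
            defined.add(m.group(1))
            continue
        out.append(line)
    return out
```

On a second pass over the same module file (with the same `defined` set), the guard is already present and the whole body is skipped, which is the double-inclusion protection in action.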

The problem is the #USE "course.lib" line inside the generated .module file. That's not a textual include — in occ21 it triggers open_descfile() which reads course.lib as a TCOFF binary containing type descriptors, procedure signatures, channel protocol definitions, etc. That's how the compiler learns the types of everything exported by the library without re-parsing all the source. Implementing a TCOFF reader is a non-trivial piece of work.

A couple of workarounds you could consider:

  1. Replace #USE with source-level declarations. Instead of using the generated .module files as-is, create your own hand-written .module files that #INCLUDE the actual occam source headers (like consts.inc) and contain the PROC / PROTOCOL / CHAN TYPE declarations directly as occam source. Essentially flatten everything into text your compiler can parse.

  2. Start with self-contained programs. For something like hello_world.occ, you could inline the few declarations it actually needs (the VAL BYTE constants, the out.string signature) and skip the module system entirely while you get the core language working.

The .lib files would still be needed later at link time regardless, but that's a separate concern from compilation.

#5

Ok thanks, now I see. I will try the textual include idea, as compilers are now so fast that the performance benefit of the binary precompile approach is probably not necessary. Your idea of inlining the declarations required for simple programs is also something I considered, but at present my goal is to compile actual Occam programs from long ago. Those all have #use "course.module" and I'd like to be able to compile them as-is.

That makes sense. So you'd create your own course.module that replaces the #USE "course.lib" line with #INCLUDEs of the actual occam source files — keeping the existing programs completely unmodified.

One thing to be aware of: #USE imports only the interface (procedure signatures, type declarations) without the implementation bodies, whereas #INCLUDE-ing the full .occ source files would pull in the complete procedure bodies. Depending on how your compiler handles separate compilation and linking, you might either:

  • Include the full source (simplest — the library code gets compiled into each program), or
  • Extract just the declarations into a header-like .inc file and include that, then link the library separately

Either way, it should let you compile those old programs with #INCLUDE "course.module" unchanged.

#6

I think that will work as at present my compiler only handles single compilation units anyway. It doesn't understand linking.