- Dec 2023
-
eddieantonio.ca
-
Python is both a compiled and interpreted language
The CPython interpreter really is an interpreter. But it also is a compiler. Python must go through a few stages before ever running the first line of code:
- scanning
- parsing
Older versions of Python added an additional stage:
- scanning
- parsing
- checking for valid assignment targets
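One way to observe that these stages happen before any code runs is to hand CPython a file whose first line is fine and whose second line assigns to a literal. The sketch below uses the built-in `compile()`; the helper name `try_compile` and the sample source are made up for illustration, and the exact error wording (and which internal stage reports it) varies between Python versions.

```python
def try_compile(source: str):
    """Compile source without running it; return (code_object, error)."""
    try:
        return compile(source, "<example>", "exec"), None
    except SyntaxError as err:
        return None, err


# Line 1 is valid on its own; line 2 has an invalid assignment target.
source = "print('hello')\n1 = x\n"
code, error = try_compile(source)

if error is not None:
    # 'hello' was never printed: the source was rejected as a whole,
    # before its first line had a chance to execute.
    print(f"rejected at line {error.lineno}: {error.msg}")
```

If CPython were a pure line-by-line interpreter, `hello` would appear before the error; here the error is reported for line 2 and nothing is executed.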
Let’s compare this to the stages of compiling a C program:
- ~~preprocessing~~
- lexical analysis (another term for “scanning”)
- syntactic analysis (another term for “parsing”)
- ~~semantic analysis~~
- ~~linking~~
-
mathspp.com
-
The tokenizer takes your source code and chunks it into “tokens”. Tokens are just small pieces of source code that you can identify in isolation. As examples, there will be tokens for numbers, mathematical operators, variable names, and keywords (like if or for). The parser will take that linear sequence of tokens and essentially reshape them into a tree structure (that's what the T in AST stands for: Tree). This tree is what gives meaning to your tokens, providing a nice structure that is easier to reason about and work on. As soon as we have that tree structure, our compiler can go over the tree and figure out what bytecode instructions represent the code in the tree. For example, if part of the tree represents a function, we may need a bytecode for the return statement of that function. Finally, the interpreter takes those bytecode instructions and executes them, producing the results of our original program.
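CPython exposes each of these stages through the standard library, so the pipeline can be watched on a real one-line program. This is only an illustrative sketch: the sample source and the variable names are assumptions, and the exact instruction names differ between Python versions, so no instruction listing is shown.

```python
import ast
import dis
import io
import tokenize

source = "x = a + b\n"

# 1. Tokenizer: source text -> linear sequence of tokens.
tokens = tokenize.generate_tokens(io.StringIO(source).readline)
# Drop the whitespace-only tokens (NEWLINE, ENDMARKER) for readability.
token_strings = [tok.string for tok in tokens if tok.string.strip()]

# 2. Parser: source -> tree (ast.parse runs the tokenizer internally).
tree = ast.parse(source)
assignment = tree.body[0]

# 3. Compiler: tree -> bytecode, stored in a code object.
code = compile(tree, "<example>", "exec")
opnames = [ins.opname for ins in dis.get_instructions(code)]

# 4. Interpreter: execute the bytecode with some starting variables.
namespace = {"a": 3, "b": 5}
exec(code, namespace)

print(token_strings)   # ['x', '=', 'a', '+', 'b']
print(type(assignment).__name__, type(assignment.value).__name__)  # Assign BinOp
print(opnames)         # version-dependent list of instruction names
print(namespace["x"])  # 8
```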
-
Recap
In this article you started implementing your own version of Python. To do so, you needed to create four main components:
- A tokenizer:
  - accepts strings as input (supposedly, source code);
  - chunks the input into atomic pieces called tokens;
  - produces tokens regardless of whether their sequence makes sense.
- A parser:
  - accepts tokens as input;
  - consumes the tokens one at a time, while making sure they come in an order that makes sense;
  - produces a tree that represents the syntax of the original code.
- A compiler:
  - accepts a tree as input;
  - traverses the tree to produce bytecode operations.
- An interpreter:
  - accepts bytecode as input;
  - traverses the bytecode and performs the operation that each one represents;
  - uses a stack to help with the computations.
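The last two components can be sketched in a few lines for integer addition and subtraction. Everything here is hypothetical and chosen for illustration: the node classes `Int` and `BinOp`, the `PUSH`/`BINOP` opcode names, and the hand-built tree are not taken from the article's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Int:
    value: int


@dataclass
class BinOp:
    op: str               # "+" or "-"
    left: "Int | BinOp"
    right: "Int | BinOp"


def compile_tree(node):
    """Traverse the tree post-order: operands first, then the operator."""
    if isinstance(node, Int):
        return [("PUSH", node.value)]
    if isinstance(node, BinOp):
        return (
            compile_tree(node.left)
            + compile_tree(node.right)
            + [("BINOP", node.op)]
        )
    raise TypeError(f"unknown node: {node!r}")


def interpret(bytecode):
    """Run the bytecode one operation at a time, using a stack."""
    stack = []
    for opcode, arg in bytecode:
        if opcode == "PUSH":
            stack.append(arg)
        elif opcode == "BINOP":
            right = stack.pop()   # operands come off in reverse order
            left = stack.pop()
            stack.append(left + right if arg == "+" else left - right)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack.pop()


# Hand-built tree for (3 + 5) - 2; a parser would normally produce this.
tree = BinOp("-", BinOp("+", Int(3), Int(5)), Int(2))
bytecode = compile_tree(tree)
result = interpret(bytecode)
print(bytecode)
print(result)  # 6
```

Emitting the operands before the operator is what lets the interpreter get by with a plain stack: by the time a `BINOP` is reached, both of its inputs are already on top.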
-
The tokenizer
The tokenizer is the part of your program that accepts the source code and produces a linear sequence of tokens – bits of source code that you identify as being relevant.
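A small regex-based tokenizer gives a feel for this. It is a sketch under assumptions, not the article's code: the token kinds, the `KEYWORDS` set, and the supported operators are invented for the example.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


KEYWORDS = {"if", "for"}  # illustrative subset

TOKEN_PATTERN = re.compile(
    r"""
    (?P<NUMBER>\d+)
  | (?P<NAME>[A-Za-z_]\w*)
  | (?P<OP>[+\-*/=])
  | (?P<SKIP>\s+)
  | (?P<ERROR>.)
    """,
    re.VERBOSE,
)


def tokenize(source):
    """Chunk source into tokens; no check that their order makes sense."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not one itself
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {text!r}")
        if kind == "NAME" and text in KEYWORDS:
            kind = "KEYWORD"
        tokens.append(Token(kind, text))
    return tokens


print([tok.kind for tok in tokenize("x = 3 + 5")])
# ['NAME', 'OP', 'NUMBER', 'OP', 'NUMBER']

# Nonsense order is still tokenized; rejecting it is the parser's job.
print([tok.kind for tok in tokenize("+ + 3 if")])
# ['OP', 'OP', 'NUMBER', 'KEYWORD']
```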
-
The four parts of our program
- Tokenizer takes source code as input and produces tokens;
- Parser takes tokens as input and produces an AST;
- Compiler takes an AST as input and produces bytecode;
- Interpreter takes bytecode as input and produces program results.
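Of the four parts, the parser is the one that turns a flat sequence into a nested structure. The sketch below handles only chains of integer additions and subtractions; the `(kind, text)` token tuples and the nested-tuple tree shape are assumptions made for the example rather than the article's data structures.

```python
def parse(tokens):
    """Parse NUMBER (OP NUMBER)* into a left-nested tuple tree."""
    position = 0

    def expect(kind):
        """Consume the next token, insisting that it has the given kind."""
        nonlocal position
        if position >= len(tokens) or tokens[position][0] != kind:
            found = tokens[position] if position < len(tokens) else "end of input"
            raise SyntaxError(f"expected {kind}, found {found}")
        token = tokens[position]
        position += 1
        return token

    tree = ("int", int(expect("NUMBER")[1]))
    while position < len(tokens):
        op = expect("OP")[1]
        right = ("int", int(expect("NUMBER")[1]))
        # Nest to the left so that 3 + 5 - 2 means (3 + 5) - 2.
        tree = ("binop", op, tree, right)
    return tree


tokens = [("NUMBER", "3"), ("OP", "+"), ("NUMBER", "5"), ("OP", "-"), ("NUMBER", "2")]
tree = parse(tokens)
print(tree)
# ('binop', '-', ('binop', '+', ('int', 3), ('int', 5)), ('int', 2))
```

A sequence such as `[("OP", "+"), ("OP", "+")]` is rejected with a `SyntaxError`, which is the "order that makes sense" check the tokenizer deliberately skips.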
-