基于JAVA实现的Tiger编译器

Sunshine

发布日期: 2019-05-05 15:14:41 浏览量: 1378
评分:
star star star star star star star star star star_border
*转载请注明来自write-bug.com

一、项目要求及完成情况

  • 正确的词法/语法分析,给出对应的文法文件

  • 输出正确的抽象语法树(testcases中50个测试文件,testcase-result中对应的抽象语法树)

  • 错误处理功能

    • 提示错误类型(词法错误、语法错误、语义错误等)、出错位置等(词法错误,语法错误ANTLR默认行为,部分语义错误:变量作用域及基本类型检查)
    • 错误修复(ANTLR默认行为 single-token insertion and single-token deletion)
  • 发挥想象力,使编译器尽善尽美(实现了变量作用域检查和基本类型检查)

  • Project文档

    • 使用语法(文法类型,有无改动语法、如何改动等),错误处理说明等(对实现的变量作用变量作用域检查和基本类型检查的错误处理代码进行了介绍)
    • 对本项目语法的使用工具的体会(ANTLR简述部分)

二、项目总览

使用ANTLR工具,为Tiger构造一个编译器前端,将输入的Tigger语言转化为抽象语法树,实现了变量作用域检查和简单的Type Checking。

三、ANTLR简述

ANTLR(ANother Tool for language Recognition),作者Terence Parr(University of San Francisco)。

ANTLR是一个可以接受含有语法描述的语言描述符并且生成程序能够识别这些语言所产生的句子。作为一个翻译程序的 一部分,你可以给你的语法附上简单的操作符和行为并且告诉ANTLR如何构造AST并且如何输出它们。ANTLR知道如何使用Java,C++,C#或者Python来生成它们。
注意到词法错误和语法错误ANTLR有默认的处理行为。

Here is how ANTLR uses those ideas together in a nutshell: parsers perform single-token insertion and single-token deletion upon mismatched token errors if possible. If not, parsers gobble up tokens until they find a token that could reasonably follow the current rule and then return, continuing as if nothing had happened.

如上图所示,把源文件转化成AST,需要一个Lexer和Parser。Lexer把源文件读入,分成一个个token。然后Parser读入Lexer产生的token生成AST。在ANTLR提供了Lisenter和Visitor两种方式来遍历抽象语法树。本项目利用这些接口来实现变量声明检查。和基本的类型检查。

四、AST的生成

ANTLR是一个比较成熟的工具,The Definitive ANTLR 4 Reference 详细介绍了工具的使用说明。为了生成AST,需要完成grammer定义文件Tiger.g4 下面定义的语法,基本与官方语法文档指定的语法相同。 为了简化之后的Type Checking,把Ojbect相关的new class等语法去掉了。

  1. java org.antlr.v4.Tool Tiger.g4

然后会产生TigerLexer.java, TigerParser.java, Tiger.tokens , TiegerLexder.tokens, TigerListener.java, TigerBaseLisetener.java。

其中TigerLexer.java是ANTLR根据g4为我生成的Lexer,TigerParser.java是对应的Parser,TigerListener.java是前面提到的Listener方式的抽象语法树的接口,TigerBaseListener.java是这个接口的一个基本实现。我们的实现通过继承TigerBaseListener来重载中间的放来实现期望的行为。

  1. java org.antlr.v4.runtime.misc.TestRig Tiger program -tree test.in # 输出文本信息
  2. java org.antlr.v4.runtime.misc.TestRig Tiger program -gui test.in # 生成图形

4.1 TigerLexer.java节选

  1. // Generated from Tiger.g4 by ANTLR 4.2.1
  2. import org.antlr.v4.runtime.Lexer;
  3. import org.antlr.v4.runtime.CharStream;
  4. import org.antlr.v4.runtime.Token;
  5. import org.antlr.v4.runtime.TokenStream;
  6. import org.antlr.v4.runtime.*;
  7. import org.antlr.v4.runtime.atn.*;
  8. import org.antlr.v4.runtime.dfa.DFA;
  9. import org.antlr.v4.runtime.misc.*;
  10. @SuppressWarnings({"all", "warnings", "unchecked", "unused", "cast"})
  11. public class TigerLexer extends Lexer {
  12. protected static final DFA[] _decisionToDFA;
  13. protected static final PredictionContextCache _sharedContextCache = new PredictionContextCache();
  14. public static final int T__39=1, T__38=2, T__37=3, T__36=4, T__35=5, T__34=6, T__33=7, T__32=8, T__31=9, T__30=10, T__29=11, T__28=12, T__27=13, T__26=14, T__25=15, T__24=16, T__23=17, T__22=18, T__21=19, T__20=20, T__19=21, T__18=22, T__17=23, T__16=24, T__15=25, T__14=26, T__13=27, T__12=28, T__11=29, T__10=30, T__9=31, T__8=32, T__7=33, T__6=34, T__5=35, T__4=36, T__3=37, T__2=38, T__1=39, T__0=40, INTEGER=41, STRING=42, ID=43, COMMENT=44, LINE_COMMENT=45, WS=46;
  15. public static String[] modeNames = {
  16. "DEFAULT_MODE"
  17. };
  18. ...

4.2 TigerParser.java节选

  1. // Generated from Tiger.g4 by ANTLR 4.2.1
  2. import org.antlr.v4.runtime.atn.*;
  3. import org.antlr.v4.runtime.dfa.DFA;
  4. import org.antlr.v4.runtime.*;
  5. import org.antlr.v4.runtime.misc.*;
  6. import org.antlr.v4.runtime.tree.*;
  7. import java.util.List;
  8. import java.util.Iterator;
  9. import java.util.ArrayList;
  10. @SuppressWarnings({"all", "warnings", "unchecked", "unused", "cast"})
  11. public class TigerParser extends Parser {
  12. protected static final DFA[] _decisionToDFA;
  13. protected static final PredictionContextCache _sharedContextCache = new PredictionContextCache();
  14. public static final int
  15. T__39=1, T__38=2, T__37=3, T__36=4, T__35=5, T__34=6, T__33=7, T__32=8, T__31=9, T__30=10, T__29=11, T__28=12, T__27=13, T__26=14, T__25=15, T__24=16, T__23=17, T__22=18, T__21=19, T__20=20, T__19=21, T__18=22, T__17=23, T__16=24, T__15=25, T__14=26, T__13=27, T__12=28, T__11=29, T__10=30, T__9=31, T__8=32, T__7=33, T__6=34, T__5=35, T__4=36, T__3=37, T__2=38, T__1=39, T__0=40, INTEGER=41, STRING=42, ID=43, COMMENT=44, LINE_COMMENT=45, WS=46;
  16. public static final String[] tokenNames = {
  17. "<INVALID>", "']'", "'&'", "'in'", "'of'", "','", "'['", "'-'", "'*'", "'while'", "'('", "':'", "'<'", "'<='", "'var'", "'array'", "'nil'", "'to'", "'{'", "'break'", "'let'", "'else'", "'}'", "'do'", "')'", "'function'", "'.'", "'+'", "'for'", "'<>'", "'='", "';'", "'if '", "'>'", "'type'", "':='", "'then'", "'/'", "'>='", "'|'", "'end'", "INTEGER", "STRING", "ID", "COMMENT", "LINE_COMMENT", "WS" };
  18. ...

4.3 Tiger.g4语法文件

  1. /**
  2. May,28 2014
  3. author: whimsycwd
  4. compiler project
  5. */
  6. grammar Tiger;
  7. program : exp
  8. | decs
  9. ;
  10. exp : 'nil' #Nil
  11. | INTEGER #Integer
  12. | STRING #String
  13. | type_id '[' exp ']' 'of' exp #Array
  14. | type_id '{' ( ID '=' exp (',' ID '=' exp )* )? '}' #Record
  15. //| 'new' type_id #New
  16. | lvalue #LeftValue
  17. | ID '(' ( exp (',' exp)*)? ')' #Call
  18. | '-' exp #UnaryMinus
  19. | exp ('*' | '/') exp #Mul
  20. | exp ('+' | '-') exp #Add
  21. | exp ('<>' | '=' | '>=' | '<=' | '>' | '<') exp #Cmp
  22. | exp ('&' | '|') exp #Logical
  23. | '(' exps ')' #ParenExprs
  24. | lvalue ':=' exp #Assign
  25. | 'if ' exp 'then' exp ('else' exp)? #IfStmt
  26. | 'while' exp 'do' exp #WhileStmt
  27. | 'for' ID ':=' exp 'to' exp 'do' exp #ForStmt
  28. | 'break' #Break
  29. | 'let' decs 'in' exps 'end' #LET
  30. ;
  31. exps : ( exp (';' exp)* )?;
  32. decs : dec*;
  33. dec : 'type' ID '=' ty #TypeDec
  34. | vardec #VarDecNothing
  35. | 'function' ID '(' tyfields? ')' (':' type_id)? '=' exp #FuncDec
  36. //| 'primitive' ID '(' tyfields? ')' (':' type_id)?
  37. //| 'import' STRING
  38. //| 'class' ID ('extends' type_id)? '{' classfields '}'
  39. ;
  40. vardec : 'var' ID (':' type_id)? ':=' exp #VarDecInner
  41. ;
  42. //classfields : classfield*
  43. // ;
  44. //classfield : vardec
  45. // | 'method' ID '(' tyfields ')' (':' type_id) '=' exp
  46. // ;
  47. ty : type_id
  48. | '{' tyfields? '}'
  49. | 'array' 'of' type_id
  50. // | 'class' ('extends' type_id)? '{' classfields '}'
  51. ;
  52. tyfields : ( ID ':' type_id (',' ID ':' type_id)*);
  53. type_id : ID;
  54. lvalue : ID #SimpleVar
  55. | lvalue '.' ID #DotVar
  56. | lvalue '[' exp ']' #BracketVar
  57. ;
  58. INTEGER : DIGIT+;
  59. STRING : '"' (ESC | .)*? '"';
  60. ID : LETTER (LETTER | DIGIT) *;
  61. fragment
  62. ESC : '\\' [btnr"\\];
  63. fragment
  64. LETTER : [a-zA-Z] | '_';
  65. fragment
  66. DIGIT : [0-9];
  67. COMMENT : '/*' .*? '*/' ->skip;
  68. LINE_COMMENT : '//' .*? '\n' ->skip;
  69. WS : [ \n\r\t]+ ->skip;

五、AST测试

5.1 配置测试环境 config.profile

  1. BASE="/Users/whimsycwd/Repo/ANTLR/Tiger/"
  2. export CLASSPATH=".:./lib/antlr-4.2.1-complete.jar:${BASE}lib/antlr-4.2.1-complete.jar:${BASE}classes"
  3. alias antlr4='java org.antlr.v4.Tool'
  4. alias grun='java org.antlr.v4.runtime.misc.TestRig'
  5. alias runTiger='grun Tiger program'

上述代码为Terminal配置了环境变量。只需改变BASE到实际的地址。在Terminal中source config.profile,就可以使用antlr4,grun,runTiger的指令的简称。

5.2 下载测试代码 fetch.sh

  1. URL=http://www.computing.dcu.ie/~hamilton/teaching/CA448/testcases/
  2. rm *.tig
  3. for i in {1..49}
  4. do
  5. curl -O "${URL}test${i}.tig"
  6. done
  7. for name in 'queens.tig' 'merge.tig'
  8. do
  9. curl -O "${URL}$name"
  10. done

新建文件夹testcases, 在该目录下执行上述的代码可以将测试文件批量下载下来。

5.3 编译运行 run.sh

  1. # clean the generated file and recompile
  2. # May 31,2014 whimsycwd
  3. #source config.profile
  4. rm Tiger*.java 2>/dev/null
  5. #rm Tiger*.class 2>/dev/null
  6. rm Tiger*.tokens 2>/dev/null
  7. rm ./classes/* 2>/dev/null
  8. echo "File deleted"
  9. echo "ANTLR Generating"
  10. java org.antlr.v4.Tool Tiger.g4
  11. echo "Compile Start"
  12. #javac Tiger*.java -d ./classes/
  13. javac *.java -d ./classes/

5.4 输出测试AST结果 runAllTest.sh

  1. # RUN ALL TEST
  2. # May 31, 2014 whimsycwd
  3. echo "deleting testcases-result"
  4. rm testcases-result/*
  5. for i in {1..49}
  6. do
  7. echo "test${i}.tig is processing"
  8. java org.antlr.v4.runtime.misc.TestRig Tiger program -tree "testcases/test${i}.tig" > "testcases-result/test${i}.tig.res"
  9. done
  10. for name in 'queens.tig' 'merge.tig'
  11. do
  12. echo "${name} is processing"
  13. java org.antlr.v4.runtime.misc.TestRig Tiger program -tree "testcases/${name}" > "testcases-result/${name}.res"
  14. done

新建testcases-result,利用上述代码能够批量输出测试代码到testcases-result/。

5.5 结果示例 1 testcases/test1.tig

  1. /* an array type and an array variable */
  2. let
  3. type arrtype = array of int
  4. var arr1:arrtype := arrtype [10] of 0
  5. in
  6. arr1
  7. end

代码输出了AST结果文件:testcases-result/test.tig.res。

  1. (program (exp let (decs (dec type arrtype = (ty array of (type_id int))) (dec (vardec var arr1 : (type_id arrtype) := (exp (type_id arrtype) [ (exp 10) ] of (exp 0))))) in (exps (exp (lvalue arr1))) end))

对应的图形化的结果为:

5.6 结果示例 2 mytest/error.in 错误提示和行号

  1. let
  2. var a:== 1
  3. var b:= 2
  4. in
  5. let
  6. var c := 3
  7. in
  8. d
  9. end
  10. end

代码输出结果

  1. whimsyMini:Tiger whimsycwd$ runTiger -tree error.in
  2. line 2:11 no viable alternative at input '='
  3. (program (exp let (decs (dec (vardec var a := (exp = 1))) (dec (vardec var b := (exp 2)))) in (exps (exp let (decs (dec (vardec var c := (exp 3)))) in (exps (exp (lvalue d))) end)) end))

对应的图形化的结果,追到错误的地方,被高亮出来了,并且跳过继续解析。

六、变量作用域检查

为了实现变量声明的检查,两次遍历抽象语法树,第一次遍历抽象语法树,简历变量表,注意在遍历的过程中,还要考虑作用域,进入一个作用域的时候需要创建新的作用域。作用域构成一个链表,resolve一个变量的时候需要跟往上层去查找。

6.1 CheckSymbols.java

  1. ...
  2. public void process(String[] args) throws Exception {
  3. String inputFile = null;
  4. if ( args.length>0 ) inputFile = args[0];
  5. InputStream is = System.in;
  6. if ( inputFile!=null ) {
  7. is = new FileInputStream(inputFile);
  8. }
  9. ANTLRInputStream input = new ANTLRInputStream(is);
  10. TigerLexer lexer = new TigerLexer(input);
  11. CommonTokenStream tokens = new CommonTokenStream(lexer);
  12. TigerParser parser = new TigerParser(tokens);
  13. parser.setBuildParseTree(true);
  14. ParseTree tree = parser.program();
  15. ParseTreeWalker walker = new ParseTreeWalker();
  16. DefPhase def = new DefPhase();
  17. walker.walk(def, tree);
  18. // create next phase and feed symbol table info from def to ref phase
  19. RefPhase ref = new RefPhase(def.globals, def.scopes);
  20. walker.walk(ref, tree);
  21. }
  22. ...

具体的代码 CheckSymbols.java、DefPhase.java、Scope.java、BaseScope.java、GlobalScope.java、LocalScope.java、RefPhase.java。

第一阶段DefPhase,如下所示,比如在进入Function的时候currentScope = new LocalScope(currentScope); 新建一个作用域。在退出Function的时候 currentScope = currentScope.getEnclosingScope(); 往链表上层去找。

6.2 DefPhase.java

  1. ...
  2. public void enterFuncDec(TigerParser.FuncDecContext ctx){
  3. currentScope = new LocalScope(currentScope);
  4. TigerParser.TyfieldsContext tyfields = ctx.tyfields();
  5. if (tyfields != null){
  6. List<TerminalNode> list = tyfields.ID();
  7. for (TerminalNode e : list){
  8. currentScope.define(new VariableSymbol( e.getSymbol().getText() ) ); // 把变量添加到当前作用域
  9. }
  10. }
  11. saveScope(ctx, currentScope);
  12. }
  13. public void exitFuncDec(TigerParser.FuncDecContext ctx){
  14. currentScope = currentScope.getEnclosingScope(); // 退出作用域
  15. }
  16. ...

检查变量varName是否在存在。解析变量代码如下。

6.3 BaseScope.java

  1. public Symbol resolve(String name) {
  2. Symbol s = symbols.get(name);
  3. if ( s!=null ) return s;
  4. if ( enclosingScope != null ) return enclosingScope.resolve(name);
  5. return null; // not found
  6. }

在第二阶段RefPhase当变量解析不到该变量,输出该类型的错误。

6.4 RefPhase.java

  1. ...
  2. public void exitSimpleVar(TigerParser.SimpleVarContext ctx){
  3. String name = ctx.ID().getSymbol().getText();
  4. Symbol var = currentScope.resolve(name);
  5. if (var == null){
  6. CheckSymbols.error(ctx.ID().getSymbol(),"no such variable: " + name);
  7. }
  8. ...

七、变量声明测试

7.1 结果示例 mytest/test.in

  1. let
  2. var a:= 1
  3. var b:= 2
  4. in
  5. let
  6. var c := 3
  7. in
  8. d
  9. end
  10. end

注意到这个代码中 d并找不到此变量。可得到输出结果

  1. whimsyMini:Tiger whimsycwd$ java CheckSymbols mytest/test.in
  2. line 8:8 no such variable: d

八、基本的类型检查

为了实现基本的类型检查,对于每一个抽象语法树上的节点都需要给他一个类型。定义了一个AST节点到Type的映射typeSys.

  1. 'for' ID ':=' exp 'to' exp 'do' exp

下面的代检查了三个exp的类型。 exp1 和 exp2必须是Type.tInt。而对于整个语句的类型定义为exp3。

RefPhase.java

  1. ...
  2. Type t1 = typeSys.get(ctx.exp(0));
  3. Type t2 = typeSys.get(ctx.exp(1));
  4. if (t1 != Type.tInt){
  5. CheckSymbols.error( ctx.getStart(), "loop pd expression must be tInt" );
  6. typeSys.put(ctx, Type.tVoid);
  7. }
  8. else {
  9. typeSys.put(ctx, t2);
  10. }
  11. print("In WhileStmt : " + typeSys.get(ctx) );
  12. }
  13. public void exitForStmt(TigerParser.ForStmtContext ctx){
  14. Type t1 = typeSys.get(ctx.exp(0));
  15. Type t2 = typeSys.get(ctx.exp(1));
  16. Type t3 = typeSys.get(ctx.exp(2));
  17. if (t1 != Type.tInt || t2 != Type.tInt){
  18. CheckSymbols.error( ctx.getStart(), "loop start and end must both be tInt" );
  19. typeSys.put(ctx, Type.tVoid);
  20. }
  21. else {
  22. typeSys.put(ctx, t3);
  23. }
  24. print("In ForStmt " + typeSys.get(ctx));
  25. currentScope = currentScope.getEnclosingScope();
  26. }
  27. public void exitLET(TigerParser.LETContext ctx){
  28. typeSys.put(ctx, typeSys.get( ctx.exps() ) );
  29. currentScope = currentScope.getEnclosingScope(); // exit a scope
  30. print("In LET: "+typeSys.get(ctx) );
  31. }
  32. public void exitExps(TigerParser.ExpsContext ctx){
  33. List<TigerParser.ExpContext> list = ctx.exp();
  34. typeSys.put(ctx, typeSys.get( list.get(list.size() -1 ) ) );
  35. print("In exps: "+ typeSys.get(ctx) );
  36. }
  37. ...

九、基本的类型检查测试

结果示例 mytest/add.in

  1. 1 let
  2. 2 var a := nil
  3. 3 var b := 12
  4. 4 var c := "adfaf"
  5. 5 var d := 133
  6. 6 in
  7. 7 -c;
  8. 8 -b;
  9. 9 b+d;
  10. 10 b-d;
  11. 11 a+c;
  12. 12 b/d;
  13. 13 b*d;
  14. 14 b*(d+b);
  15. 15 d := b+d
  16. 16
  17. 17 end

这个代码的输出结果,发现了其类型的错误。注意到line 7 -c,其中c是一个String错误。line 11 a + c, 只有Int有定义 加法。

  1. whimsyMini:mytest whimsycwd$ java CheckSymbols add.in
  2. line 7:4 Expected - Int, got - tString
  3. line 11:4 Expected Int +/- Int got tVoid +/- tString
上传的附件 cloud_download 基于JAVA实现的Tiger编译器.7z ( 1.60mb, 2次下载 )
error_outline 下载需要8点积分

发送私信

这个世界上我只相信两个人,一个是我,另一个不是你

14
文章数
15
评论数
最近文章
eject