Tree-sitter

Type	Incremental parser generator
Original developer	Max Brunsfeld
Stable version	v0.25.2
Language	C, Rust
License	MIT License

1. Overview
2. History
3. Features
4. Usage
4.1. Supported platforms
4.2. Derived software
5. External links
6. Related documents

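1. Overview

Tree-sitter is an incremental parser generator: it generates parsers that can update an existing syntax tree as the source text is edited.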

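2. History

The original developer, Max Brunsfeld, joined GitHub in 2013 and began working on the team behind the Atom editor, which was one of GitHub's most ambitious projects at the time.[1][2] In parallel, starting on November 6, 2013[3], he developed Tree-sitter, an incremental parsing system written in C and C++, as a side project[4], and while Atom was preparing for its 1.0 release it was brought in as an official GitHub project.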

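On January 8, 2018, Brunsfeld opened a PR that added optional Tree-sitter support to Atom's syntax highlighting and code folding, which were CoffeeScript-based at the time.[5] The feature first shipped as an experimental option behind the core.useTreeSitterParsers flag in Atom v1.25.0-beta0 on February 14, 2018 (the stable release followed on March 16, 2018)[6][7], and it was enabled by default starting with Atom v1.31 on October 31, 2018.[8]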

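On January 5, 2019, the parser generator implementation was rewritten from C++ to Rust. The CLI for parser development was rewritten in Rust as well, and tree-sitter-cli, which had been developed in a separate repository using node-gyp, was deprecated.[9]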

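Tree-sitter plays a central role in GitHub's new code search and code navigation features, alongside stack graphs, which GitHub also developed. Not every language in linguist is supported. Currently, code navigation for languages such as C#, Python, Ruby, Elixir, Rust, and TypeScript is implemented on top of Tree-sitter parsers.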

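3. Features

It is very fast. Parsing finishes within a few milliseconds, so there is little practical difference even compared with lexer-based syntax highlighters, which do no backtracking and therefore run in close to O(n) time. Queries are also fairly fast. People sometimes assume that the generated parser is JavaScript, but JavaScript is used only for the grammar definition file; the core logic of the generated parser is written entirely in C.
It supports incremental parsing. When only part of the code is modified, only the affected subtree is updated and the rest of the existing parse tree is reused, which is where the speedup comes from. This is especially attractive for linters and LSP implementations, which, more than compilers, have to show diagnostics in real time. Most parser generators only produce complete (batch) parsers, so supporting incremental parsing normally means writing the whole parser by hand, and hand-writing an LR parser that also parses incrementally is far from easy.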
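
The reuse idea described above can be illustrated with a toy sketch. This is not Tree-sitter's algorithm, which reuses subtrees inside an LR parse; the sketch merely keeps one node per source line and reparses only lines whose text changed. All names here (`LineNode`, `parse_line`, `IncrementalDoc`) are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LineNode:
    """A toy 'subtree': the whitespace-separated tokens of one source line."""
    text: str
    tokens: tuple


def parse_line(text: str) -> LineNode:
    """Stand-in for real parsing work; splits a line into tokens."""
    return LineNode(text=text, tokens=tuple(text.split()))


@dataclass
class IncrementalDoc:
    """Keeps one node per line and counts how many lines were actually parsed."""
    nodes: list = field(default_factory=list)
    parse_count: int = 0

    def update(self, source: str) -> None:
        # Index the previous nodes by their text so unchanged lines can be reused.
        old_by_text = {node.text: node for node in self.nodes}
        new_nodes = []
        for line in source.splitlines():
            node = old_by_text.get(line)
            if node is None:
                # Changed or new line: this is the only place parsing happens.
                node = parse_line(line)
                self.parse_count += 1
            new_nodes.append(node)
        self.nodes = new_nodes


doc = IncrementalDoc()
doc.update("let a = 1\nlet b = 2\nlet c = 3")
first_pass = doc.parse_count                 # all three lines parsed
doc.update("let a = 1\nlet b = 20\nlet c = 3")
second_pass = doc.parse_count - first_pass   # only the edited line reparsed
print(first_pass, second_pass)
```

The second update touches a single line, so only one call to `parse_line` is made while the other two nodes are carried over unchanged. A real incremental parser works on nested subtrees and byte ranges rather than whole lines, but the cost model is the same: work is proportional to the edit, not to the file.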

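Solid error recovery is built in. When code is edited in real time inside an editor, it inevitably passes through unparseable states, such as a bracket that has been opened but not yet closed. A typical compiler simply reports an error diagnostic at that position every time parsing fails, and information obtained from other static analysis is lost along with it. An incremental implementation good enough for a production editor therefore needs an intelligent error recovery strategy. With a parser built by Tree-sitter, a parse failure only places an error node in the affected subtree, and the rest of the tree is unaffected.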
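
As a rough illustration of keeping a failure local, the sketch below parses `;`-terminated statements and marks only the broken one as an error. It is a deliberately simplistic stand-in, assuming that statements can be split on `;`; it does not reproduce Tree-sitter's actual recovery strategy, and the names `balanced` and `parse_program` are hypothetical.

```python
def balanced(text: str) -> bool:
    """Return True if every '(' in text has a matching ')'."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def parse_program(source: str) -> list:
    """Parse ';'-terminated statements, isolating failures per statement."""
    tree = []
    for chunk in source.split(";"):
        stmt = chunk.strip()
        if not stmt:
            continue
        # A broken statement becomes an ERROR node; its siblings are unaffected.
        kind = "statement" if balanced(stmt) else "ERROR"
        tree.append((kind, stmt))
    return tree


# The middle statement has an unclosed parenthesis.
tree = parse_program("a = f(1); b = g(2; c = h(3);")
for node in tree:
    print(node)
```

Only the middle entry is tagged `ERROR`; the statements before and after it still produce ordinary nodes, so any analysis that depends on them can keep working while the user finishes typing.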

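Grammars are easy to define. Unlike other generators such as YACC and Bison, which use a separate metasyntax or their own grammar definition language, Tree-sitter grammars are written in JavaScript. More precisely, a grammar.js file is written in JS using a handful of predefined functions, Node.js executes it to produce the final object in JSON form, and the parser is generated from that JSON. Anyone familiar with JS will therefore find grammars fairly easy to write.
The parsed tree can be queried with TSQuery, Tree-sitter's own query language with a Scheme-like syntax. Its main uses are grouping elements and highlighting, but depending on the need it can also be used for static analysis or indexing. Because queries are separate from the parser, changing the highlighting does not require rebuilding the parser; only the highlight query has to be edited.
Bindings are easy to build. Because the core library itself is written in C, any language with a C-compatible FFI can use it through bindings with reasonable performance. The best-known example is Atom, which runs on Node.js and Electron.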
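
The grammar workflow described above (helper functions build rule objects, and the result is handed to the generator as JSON) can be mimicked in a few lines. The sketch below is written in Python rather than JavaScript purely for illustration. The helper names `seq`, `choice`, and `repeat` mirror functions from Tree-sitter's grammar DSL, while `string` and `sym` are hypothetical stand-ins for string literals and rule references. The JSON shape is an assumption loosely modeled on a generated grammar.json and has not been checked against any particular Tree-sitter version.

```python
import json


def string(value):
    """A literal token (hypothetical helper; the JS DSL uses plain strings)."""
    return {"type": "STRING", "value": value}


def sym(name):
    """A reference to another rule (hypothetical helper)."""
    return {"type": "SYMBOL", "name": name}


def seq(*members):
    """Members must appear one after another."""
    return {"type": "SEQ", "members": list(members)}


def choice(*members):
    """Exactly one of the members must match."""
    return {"type": "CHOICE", "members": list(members)}


def repeat(content):
    """Zero or more occurrences of content."""
    return {"type": "REPEAT", "content": content}


# A toy grammar for nested parenthesized lists of lowercase atoms.
grammar = {
    "name": "toy_lists",
    "rules": {
        "source_file": repeat(sym("list")),
        "list": seq(
            string("("),
            repeat(choice(sym("list"), sym("atom"))),
            string(")"),
        ),
        "atom": {"type": "PATTERN", "value": "[a-z]+"},
    },
}

# The plain-data form is what a generator would consume.
grammar_json = json.dumps(grammar, indent=2)
print(grammar_json)
```

The point of the design is that the grammar author works in an ordinary programming language, with functions and variables available for factoring out repetition, while the generator itself only ever sees plain data.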

4. Usage

4.1. Supported platforms

Atom
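Neovim - Began actively adopting Tree-sitter in 2018. The built-in Lua API is currently still somewhat unstable, and most end users rely on nvim-treesitter, where running :TSInstall is enough to get syntax highlighting, code folding, symbol search, and similar features, or to build further extensions on top.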
Helix
Zed - A Rust-based editor that Brunsfeld left GitHub to start and is now developing.
Emacs

4.2. Derived software

ast-grep
rust-sitter - A layer that lets grammars be defined in Rust instead of JavaScript.
syncat - Works like the cat command, but adds syntax highlighting in the terminal based on Tree-sitter grammars.

5. External links

tree-sitter-cli (npm)

6. Related documents

[1] Max joined the Atom team in 2013 after working at Pivotal Labs. While driving Atom towards its 1.0 launch during the day, Max spent nights and weekends building Tree-sitter, a blazing-fast and expressive incremental parsing framework that currently powers all code analysis at GitHub. Before leaving to start Zed, Max helped GitHub's semantic analysis team integrate Tree-sitter to support syntax highlighting and code navigation on github.com.
[2] Max Brunsfeld is an engineer on GitHub's Atom team. Tree-sitter - a new parsing system for programming tools
[3] Commit 84c5bceb818127fd7728655fb209a1be92e53fde - Nov 6, 2013
[4] Today, I'm gonna be talking about a piece of software I've been working on for almost 4 years now, called Tree-sitter. I worked on it as a side project for a long time and now I'm working on it as part of some project of GitHub. Tree-sitter: a new parsing system for programming tools - GitHub Universe 2017 - Dec 21, 2017
[5] This pull request adds a new config setting to Atom: core.useTreeSitterParsers, which defaults to false. If you set that setting to true, when editing files written in certain languages, Atom will use a new parsing system called Tree-sitter to provide improved syntax highlighting and code folding. Allow Tree-sitter parsers to be used for syntax highlighting and code folding #16299 - Jan 8, 2018
[6] Support greatly improved syntax highlighting and code folding with a next-generation parsing system called tree-sitter. See the pull request for details about opting in to try it out. 1.25.0-beta0 release - Feb 14, 2018
[7] For syntax highlighting and code-folding, an incremental parsing system, called tree-sitter, is available in beta form. Tree-sitter is a C library used via bindings to higher-level languages. Tree-sitter currently is disabled by default but can be turned on via the User Tree Sitter Parsers setting. What's new in GitHub's Atom text editor - Mar 16, 2018
[8] At GitHub, we want to explore new ways of making programming intuitive and delightful, so we've developed a parsing system called Tree-sitter that will serve as a new foundation for code analysis in Atom. Tree-sitter makes it possible for Atom to parse your code while you type—maintaining a syntax tree at all times that precisely describes the structure of your code. We've enabled the new system by default in Atom, bringing a number of improvements. Atom understands your code better than ever before - Oct 31, 2018
[9] In this PR, I'm moving the functionality of tree-sitter-cli into this repository. Instead of implementing the CLI in JavaScript like before, it will now be in Rust. In addition, I'm porting the C++ compiler library (used by the CLI) to Rust, and consolidating it into the CLI crate. Include CLI functionality in the main repo, using Rust instead of C++ #260 - Jan 5, 2019