Kotlin Parser in Kotlin?


#1

TLDR: Is there a *.kt parser written in Kotlin that I can use?

Long version:

  1. I have some numerical algorithms written in a restricted subset of Kotlin.

  2. I want to automatically convert these *.kt files to *.c (to compile to wasm) and to *.cuda (the restricted subset of kotlin I am using makes the parallelism explicit).

  3. My main problem right now is: how do I parse *.kt files (in Kotlin) in the first place?
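
To make the pipeline in the list above concrete, here is a minimal sketch of the step that comes after parsing: turning a tiny expression tree into C source. Everything in it (`Expr`, `NumericFun`, `emitC`) is a hypothetical name invented for illustration, and the "restricted subset" is assumed to be single-expression functions over doubles. It is not the Kotlin compiler's AST, only a stand-in for whatever tree the parser ends up producing.

```kotlin
// Hypothetical mini-AST for a restricted numeric subset (illustration only).
sealed class Expr {
    data class Num(val value: Double) : Expr()
    data class Var(val name: String) : Expr()
    data class BinOp(val op: Char, val left: Expr, val right: Expr) : Expr()
}

// A single-expression function whose parameters and result are all doubles.
data class NumericFun(val name: String, val params: List<String>, val body: Expr)

// Render an expression as C, fully parenthesised so operator precedence never matters.
fun emitC(e: Expr): String = when (e) {
    is Expr.Num -> e.value.toString()
    is Expr.Var -> e.name
    is Expr.BinOp -> "(${emitC(e.left)} ${e.op} ${emitC(e.right)})"
}

// Render a whole function as a one-line C definition.
fun emitC(f: NumericFun): String {
    val params = f.params.joinToString(", ") { "double $it" }
    return "double ${f.name}($params) { return ${emitC(f.body)}; }"
}
```

For example, building `NumericFun("axpy", listOf("a", "x", "y"), ...)` with a body of `a * x + y` and passing it to `emitC` yields a C function definition as a string. A CUDA emitter could follow the same shape with a different function header.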

====

Question: which Kotlin library will help me go from *.kt to an abstract syntax tree?

====

Please ignore the following off-topic discussions:
Q: Kotlin is not parallel. You cannot output parallel CUDA.
A: I’m using a restricted subset, which expresses individual threads.

Q: Kotlin is not low level. You cannot output C.
A: Not a problem.

Q: Why not write in C/CUDA directly?
A: 1. I want to have three versions: JVM, JS (via C to wasm), CUDA
2. I prefer to just write in a restricted subset of Kotlin.


#2

I don’t think there is a Kotlin parser outside of the compiler, so I’d start there. As far as I know it is based on IntelliJ’s PSI system, but the parts specific to Kotlin are part of the compiler.


#3

@Wasabi375 : Thank you! This drastically narrows down the search.

I was searching around for sample code.

Does https://github.com/vektory79/kotlin-script-parser-test/blob/master/src/main/java/hello/CompileTest.kt look like it’s importing all the right classes?


#4

I’ve been thinking about tackling a similar project. I am thinking of a lightweight set of data classes that represent the AST (I can’t find a good one; the compiler lib is too heavy for me and the psi package is not very ergonomic to use). I could see a separate piece compiling and then transforming into this model. My use case is a more powerful code writer (kotlinpoet has many problems).
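
A rough sketch of what such a lightweight model plus code writer might look like, assuming only top-level properties and single-expression functions. All names here (`Decl`, `SourceFile`, `write`) are hypothetical and made up for this illustration; they are not taken from any existing library.

```kotlin
// Hypothetical declaration model: plain data classes, no compiler dependency.
sealed class Decl {
    data class Property(
        val name: String,
        val type: String,
        val initializer: String,
        val mutable: Boolean = false
    ) : Decl()

    data class Func(
        val name: String,
        val params: List<Pair<String, String>>, // (parameter name, type name)
        val returnType: String,
        val body: String                        // single expression, kept as text
    ) : Decl()
}

data class SourceFile(val pkg: String, val decls: List<Decl>)

// Writer: turn one declaration back into Kotlin source text.
fun write(decl: Decl): String = when (decl) {
    is Decl.Property -> {
        val keyword = if (decl.mutable) "var" else "val"
        "$keyword ${decl.name}: ${decl.type} = ${decl.initializer}"
    }
    is Decl.Func -> {
        val ps = decl.params.joinToString(", ") { (n, t) -> "$n: $t" }
        "fun ${decl.name}($ps): ${decl.returnType} = ${decl.body}"
    }
}

// Writer: package header, a blank line, then one declaration per line.
fun write(file: SourceFile): String =
    (listOf("package ${file.pkg}", "") + file.decls.map { write(it) }).joinToString("\n")
```

Because the model is just data classes, `copy(...)` gives cheap tree rewrites, which is the kind of ergonomics the raw psi package does not offer.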

Also note that Kotlin Native can compile to WASM (that wasn’t specifically one of the off-topic points you asked us to avoid). I’m not sure about CUDA files, but if there is something that will translate LLVM IR, KN can write LLVM bitcode too.


#5

@cretz: lol, good catch; problem is: from what I have read so far, Kotlin native is currently not focused on performance, and thus is not generating fast code


#6

I can’t say because I’m not on the project, but I can say that anything targeting LLVM gets many optimizations included (if they enable them), and in my uses of KN I have seen quality performance.


#7

  1. I have never benchmarked Kotlin/Native -> WASM vs Kotlin/JS myself, so all my information is from rumors.

  2. Either way, this is irrelevant, as I need to also output Cuda, and for that, I have to parse the code.


#8

Based on that link you gave, I was able to whittle the code down to this for parsing code at the file level:

package somepkg

import com.intellij.openapi.util.Disposer
import com.intellij.psi.PsiManager
import com.intellij.testFramework.LightVirtualFile
import org.jetbrains.kotlin.cli.jvm.compiler.EnvironmentConfigFiles
import org.jetbrains.kotlin.cli.jvm.compiler.KotlinCoreEnvironment
import org.jetbrains.kotlin.config.CompilerConfiguration
import org.jetbrains.kotlin.idea.KotlinFileType
import org.jetbrains.kotlin.psi.KtFile

open class Parser {
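    fun parse(code: String): KtFile {
        // Parent disposable for the compiler environment; disposed in the finally block.
        val disposable = Disposer.newDisposable()
        try {
            val env = KotlinCoreEnvironment.createForProduction(
                disposable, CompilerConfiguration(), EnvironmentConfigFiles.JVM_CONFIG_FILES)
            // Wrap the source string in an in-memory file so nothing is read from disk.
            val file = LightVirtualFile("temp.kt", KotlinFileType.INSTANCE, code)
            // Ask PsiManager for the PSI file backing the virtual file; for Kotlin it is a KtFile.
            return PsiManager.getInstance(env.project).findFile(file) as KtFile
        } finally {
            disposable.dispose()
        }
    }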

    companion object : Parser() {
        init {
            // To hide annoying warning on Windows
            System.setProperty("idea.use.native.fs.for.win", "false")
        }
    }
}

Of course it doesn’t catch any errors or anything, but it at least shows how to create a virtual file and use findFile to parse it.

EDIT: I’m starting a project at https://github.com/cretz/kastree which is hopefully just a simplified syntax tree that converts from the psi model. It contains no type info like https://github.com/arturbosch/detekt might, so you might want to use that project instead.


#9

And now I realize I could have just used PsiFileFactory.createFileFromText. Also note, if you’re parsing a bunch of files, I am unsure if the env should be held across parsings.


#10

I’m missing something very obvious.

Your code returns a KtFile: https://github.com/JetBrains/kotlin/blob/master/compiler/psi/src/org/jetbrains/kotlin/psi/KtFile.kt

How do we go from that to an abstract syntax tree?


#11

It’s a PSI object. The PSI (Program Structure Interface) is basically IntelliJ’s version of an AST with some additional features. As far as I know, the KtFile is the top element representing a file of Kotlin source code.
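
To illustrate what "the top element of a tree" means in practice, here is a small sketch of a depth-first walk that prints every node indented by its depth. `TreeNode` is a hypothetical stand-in for a PSI element (it is not a real IntelliJ class), used so the example runs without the compiler on the classpath. The same recursion would apply to a real tree by following each element's children.

```kotlin
// Hypothetical stand-in for a PSI-like element: a label plus child nodes.
data class TreeNode(val label: String, val children: List<TreeNode> = emptyList())

// Pre-order walk: emit this node, then recurse into each child one level deeper.
fun dump(root: TreeNode, depth: Int = 0): List<String> =
    listOf("  ".repeat(depth) + root.label) +
        root.children.flatMap { dump(it, depth + 1) }

// Collect the labels of all nodes that have no children (the leaves).
fun leaves(root: TreeNode): List<String> =
    if (root.children.isEmpty()) listOf(root.label)
    else root.children.flatMap { leaves(it) }
```

Calling `dump(file).forEach(::println)` on a root node prints one line per node, with two spaces of indentation per tree level.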


#12

@Wasabi375 @cretz

I have skimmed https://www.jetbrains.org/intellij/sdk/docs/basics/architectural_overview/psi_elements.html a bit

My current understanding is:

  1. PsiElement is an IDE construct
  2. PsiElement may or may not be part of the Kotlin compiler internals
  3. We can attach colors / attributes to a PsiElement, which is useful for syntax coloring, tool tips over keywords, etc.
  4. PsiElement applies to languages besides Kotlin

Assuming most of the above is true: why, for the task of parsing Kotlin files, do we want PsiElement (an IDE-specific class) instead of some Kotlin-compiler-specific class?

[One possible explanation is that the Kotlin compiler also happens to use PsiElement as its internal AST representation, but I have no way to verify that, and it would be very surprising to me.]


#13

Not surprising to me. I think that is the case, and I think it’s the one and only true parser. (Well, the compiler has an IR and there are some other details, but that PSI stuff is the only place I’m aware of that has the proper AST.) I’m working with the code in the org.jetbrains.kotlin.psi package now and it’s not that bad at all; there are just a lot of helpers and utils cluttering it. I am transforming it into my own AST that will be easier to use, if you want to wait a few days.
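
As a rough idea of what "transforming it into my own AST" could involve, the sketch below converts a generic, loosely typed tree into a small sealed hierarchy. `RawNode` is a hypothetical stand-in for a PSI element and the node kinds (`NAME`, `INT`, `CALL`) are invented for this example. A real converter would instead branch on the concrete classes in org.jetbrains.kotlin.psi.

```kotlin
// Hypothetical stand-in for a generic parse-tree node: kind tag, text, children.
data class RawNode(
    val kind: String,
    val text: String = "",
    val children: List<RawNode> = emptyList()
)

// The clean, typed AST we want to work with afterwards.
sealed class Ast {
    data class Name(val id: String) : Ast()
    data class IntLit(val value: Int) : Ast()
    data class Call(val callee: String, val args: List<Ast>) : Ast()
}

// Convert recursively; anything outside the supported subset is rejected loudly.
fun convert(n: RawNode): Ast = when (n.kind) {
    "NAME" -> Ast.Name(n.text)
    "INT" -> Ast.IntLit(n.text.toInt())
    "CALL" -> Ast.Call(n.text, n.children.map { convert(it) })
    else -> throw IllegalArgumentException("unsupported node kind: ${n.kind}")
}
```

Rejecting unknown node kinds, rather than skipping them, fits the restricted-subset use case discussed earlier in the thread: source that strays outside the subset fails at conversion time instead of producing a silently incomplete tree.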


#14

@cretz looks like you are right

The following code dumps out the AST of the file located at data/tmp/sample.kt:

package foobar

import com.intellij.openapi.util.Disposer
import com.intellij.psi.PsiElement
import com.intellij.psi.PsiManager
import com.intellij.testFramework.LightVirtualFile
import org.jetbrains.kotlin.cli.jvm.compiler.EnvironmentConfigFiles
import org.jetbrains.kotlin.cli.jvm.compiler.KotlinCoreEnvironment
import org.jetbrains.kotlin.config.CompilerConfiguration
import org.jetbrains.kotlin.idea.KotlinFileType
import org.jetbrains.kotlin.psi.KtFile
import java.io.File

open class Parser {
    fun parse(code: String): KtFile {
        val disposable = Disposer.newDisposable()
        try {
            val env = KotlinCoreEnvironment.createForProduction(
                    disposable, CompilerConfiguration(), EnvironmentConfigFiles.JVM_CONFIG_FILES)
            val file = LightVirtualFile("temp.kt", KotlinFileType.INSTANCE, code)
            return PsiManager.getInstance(env.project).findFile(file) as KtFile
        } catch (e: Exception) {
            println("parse error: $e")
            throw e
        } finally {
            disposable.dispose()
        }
    }

    companion object : Parser()
}


class Text_margin() {
    var margins: MutableList<String> = mutableListOf()
    var cache = ""

    private fun update_cache() {
        cache = margins.joinToString(separator = ".")
    }
    fun push(s: String) {
        margins.add(s)
        update_cache()
    }
    fun pop() {
        margins.removeAt(margins.size-1)
        update_cache()
    }
}

class Text_margin_console() {
    var text_margin = Text_margin()

    fun push(s: String) {
        text_margin.push(s)
    }
    fun pop() {
        text_margin.pop()
    }
    fun write(text: String) {
        for (line in text.lines()) {
            println("${text_margin.cache}: $line")
        }
    }
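    fun write(psi_element: PsiElement) {
        val children = psi_element.children

        if (children.isNotEmpty()) {
            // Print each child under a margin extended by its index, then recurse into it.
            children.forEachIndexed { i, c ->
                push("$i")
                write("Node: $c")
                write(c)
                pop()
            }
        } else {
            // No child elements: print the element's source text instead.
            write(psi_element.text)
        }
    }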
}




fun main(args: Array<String>) {
    val tmc = Text_margin_console()

    val parser = Parser()
    val file_content = File("data/tmp/sample.kt").readText()
    val ans = parser.parse(file_content)
    println("$ans = ${ans.text}")

    tmc.write(ans)
}

#15

Ok, the code I put at https://github.com/cretz/kastree will let you parse a string of code into a clean AST. While there are some integration tests against the PSI corpus, I don’t have any tests on the validity just yet. It’s not quite a library yet (my needs require that I make a mutating Visitor and Writer), but the AST and the PSI-to-AST converter are sound.


#16

Looks great. I don’t really know why I would ever need this myself, but you never know :wink:
Maybe just add a short readme and a license to your repository. That way it should be easier (well, or legal) for other people to use :stuck_out_tongue_winking_eye:


#17

Yup, I usually do (see my other repos), but I just hacked this out and posted here for that user. I have now added the license and will add a README when the lib is complete.