Kotlin regex force start at index

ikt · July 26, 2018, 4:35am

I am aware of find - Kotlin Programming Language,
which says: the match must start AT OR AFTER the start index

I want a way to say: regex MUST start at PRECISELY start_index.

I have tried the following:

fun main (args: Array<String>) {
    val s = "1a2"
    val r1 = Regex("""\d+""")
    println( r1.find(s, 0) )
    println( r1.find(s, 1) )
    println( r1.find(s, 2) )

    println("=====")
    val r2 = Regex("""^\d+""")
    println( r2.find(s, 0) )
    println( r2.find(s, 1) )
    println( r2.find(s, 2) )

}

which returns: match, match, match (for r1)
and match, null, null (for r2)

I want something that returns (match, null, match)

I want to say: match \d+, but you must start PRECISELY at start_index.

This would mean 1a2 → matches when index = 0 or 2; fails when index = 1 (since the character at that index is ‘a’)

Is this possible? To say: search for regex starting PRECISELY at start_index

I tried “^”, hoping it would match at start_index, but it looks like it is hardcoded to mean start-of-line

Thanks!

noo.blaster · July 27, 2018, 12:34pm

    // r is assumed to be an anchored regex, e.g. Regex("""^\d+""")
    println( r.find(s.substring(0)) )
    println( r.find(s.substring(1)) )
    println( r.find(s.substring(2)) )

Is this OK?

noo.blaster · July 27, 2018, 12:35pm

Or compile your regex with a prefix that skips the first n characters:

    println( Regex("""^.{0}\d+""").find(s) )
    println( Regex("""^.{1}\d+""").find(s) )
    println( Regex("""^.{2}\d+""").find(s) )

ikt · July 27, 2018, 12:55pm

I failed to state that I am using the regex to tokenize a very long string. Neither of the above works, for the following reasons:

substring:
may copy the rest of the string for every token

recompiling regex:
creates a new regex for every token

Varia · July 27, 2018, 3:31pm

If your use case is tokenization, maybe split - Kotlin Programming Language is what you are looking for?

ikt · July 27, 2018, 6:16pm

Split would be great if my tokenization were context-free. Unfortunately, my tokenization is context-sensitive. The tokenization is something like:

there is a current “state”
this state defines a list of valid regexes to try
depending on which regex we match on, we go to a new “state”
… and so forth …

split would require that there be no state, and that all regexes be valid to use at all times
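
For readers following along, the state-driven loop described above could be sketched roughly as follows. This is only an illustration: the states, rules and the names State, Rule, Token and tokenize are invented here, and it assumes the JVM, where java.util.regex.Matcher can be restricted to a region and asked (via lookingAt) whether the pattern matches starting exactly at the region start.

```kotlin
import java.util.regex.Pattern

// Hypothetical states and rules, invented purely for illustration.
enum class State { EXPECT_NUMBER, EXPECT_OPERATOR }

// A rule pairs a precompiled pattern with the state to enter after it matches.
data class Rule(val pattern: Pattern, val next: State)

data class Token(val text: String, val start: Int)

val rules: Map<State, List<Rule>> = mapOf(
    State.EXPECT_NUMBER to listOf(Rule(Pattern.compile("""\d+"""), State.EXPECT_OPERATOR)),
    State.EXPECT_OPERATOR to listOf(Rule(Pattern.compile("""[+*-]"""), State.EXPECT_NUMBER))
)

fun tokenize(input: String): List<Token> {
    val tokens = mutableListOf<Token>()
    var state = State.EXPECT_NUMBER
    var pos = 0
    while (pos < input.length) {
        var matched = false
        for (rule in rules.getValue(state)) {
            // Restrict the matcher to [pos, end); lookingAt() only succeeds
            // if the match begins exactly at pos.
            val m = rule.pattern.matcher(input).region(pos, input.length)
            // Require a non-empty match so the loop always advances.
            if (m.lookingAt() && m.end() > pos) {
                tokens += Token(m.group(), pos)
                pos = m.end()
                state = rule.next
                matched = true
                break
            }
        }
        require(matched) { "No rule for state $state matches at index $pos" }
    }
    return tokens
}

fun main() {
    println(tokenize("12+3*45").map { it.text })
}
```

Each pattern is compiled once, and the input string is never copied; only a Matcher object is created per attempt.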

noo.blaster · July 27, 2018, 6:34pm

not sure if you really need it… still

^ means start of line…
maybe use a negative lookbehind on \d : (?<!\d)\d+

Varia · July 27, 2018, 11:37pm

Java's java.util.regex.Matcher (created from a java.util.regex.Pattern) seems to have a bit more utility than Kotlin's Regex. Link: Matcher (Java Platform SE 8 )
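
A minimal sketch of what this might look like for the "1a2" example from the question, assuming the JVM target. The helper name matchExactlyAt is made up for illustration; region and lookingAt are the Matcher methods that do the work.

```kotlin
import java.util.regex.Pattern

// Hypothetical helper: returns the matched text if `pattern` matches
// starting exactly at `start`, or null otherwise.
fun matchExactlyAt(pattern: Pattern, input: CharSequence, start: Int): String? {
    val m = pattern.matcher(input)
    // Limit matching to [start, input.length) without copying the input.
    m.region(start, input.length)
    // lookingAt() anchors the match at the beginning of the region only.
    return if (m.lookingAt()) m.group() else null
}

fun main() {
    val s = "1a2"
    val digits = Pattern.compile("""\d+""")  // compiled once, reused for every index
    println(matchExactlyAt(digits, s, 0))  // 1
    println(matchExactlyAt(digits, s, 1))  // null
    println(matchExactlyAt(digits, s, 2))  // 2
}
```

The indices reported by m.start() and m.end() refer to the original string, so no offset correction is needed.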

ikt · July 28, 2018, 3:01am

@Varia : I will look into Java/Matcher. Thanks!

pbuchsbaum · July 10, 2019, 9:46pm

I agree with many here. It’s not nice to be forced to use Java/Matcher just for that reason. Using substring combined with “^” in the regex is just a cheap trick (with useless string handling), and you still have to fix the positions of MatchResult.range.
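
For anyone reading this later: newer versions of the Kotlin standard library add Regex.matchAt and Regex.matchesAt, which appear to cover exactly this case. The sketch below assumes Kotlin 1.7 or later, where these functions are stable; on older versions they may be missing or require an opt-in.

```kotlin
// Assumes Kotlin 1.7+ (Regex.matchAt is stable there).
val digits = Regex("""\d+""")

fun main() {
    val s = "1a2"
    // matchAt returns a MatchResult only if the match starts exactly at the index.
    println(digits.matchAt(s, 0)?.value)  // 1
    println(digits.matchAt(s, 1)?.value)  // null
    println(digits.matchAt(s, 2)?.value)  // 2
    // The range is reported relative to the original string.
    println(digits.matchAt(s, 2)?.range)  // 2..2
}
```

matchesAt(input, index) is the Boolean counterpart for when only a yes/no answer is needed.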
