ikt
July 26, 2018, 4:35am
1
I am aware of find - Kotlin Programming Language
which says: regex must start at OR AFTER the start index
I want a way to say: regex MUST start at PRECISELY start_index.
I have tried the following:
fun main(args: Array<String>) {
    val s = "1a2"
    val r1 = Regex("""\d+""")
    println(r1.find(s, 0))
    println(r1.find(s, 1))
    println(r1.find(s, 2))
    println("=====")
    val r2 = Regex("""^\d+""")
    println(r2.find(s, 0))
    println(r2.find(s, 1))
    println(r2.find(s, 2))
}
which returns: match, match, match (for r1)
and match, null, null (for r2)
I want something that returns (match, null, match)
I want to say: match \d+, but you must start PRECISELY at start_index.
For "1a2" this would mean: a match when index = 0 or 2, and a failure when index = 1 (since the character at index 1 is 'a').
Is this possible? To say: search for regex starting PRECISELY at start_index
I tried "^" hoping it would match at start_index, but it looks like it is hardcoded to mean start-of-line.
Thanks!
println( r2.find(s.substring(0)) )
println( r2.find(s.substring(1)) )
println( r2.find(s.substring(2)) )
Is this OK?
Or compile your regex with the offset built in:
println( Regex("""^.{0}\d+""").find(s) )
println( Regex("""^.{1}\d+""").find(s) )
println( Regex("""^.{2}\d+""").find(s) )
ikt
July 27, 2018, 12:55pm
4
I failed to state: I am using regex to tokenize a very long string. Neither of the above works, for the following reasons:
substring: may copy the remaining string once per token
recompiling regex: creates a new regex once per token
Varia
July 27, 2018, 3:31pm
5
If your use case is tokenization, maybe split - Kotlin Programming Language is what you are looking for?
ikt
July 27, 2018, 6:16pm
6
Split would be great if my tokenization were context-free. Unfortunately, my tokenization is context-sensitive. The tokenization is something like:
there is a current "state"
this state defines a list of valid regexes to try
depending on which regex we match on, we go to a new "state"
… and so forth …
split would require that there be no state, and that all regexes be valid to use at all times.
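The state-driven loop described above could be sketched roughly as follows. This is only an illustration under assumptions: Rule, Token, tokenize, and the state names are hypothetical and not from the thread, and anchoring at the current position is done with java.util.regex.Matcher (region plus lookingAt) rather than Kotlin's Regex.

```kotlin
import java.util.regex.Pattern

// Hypothetical types for illustration; none of these names come from the thread.
data class Rule(val pattern: Pattern, val tokenType: String, val nextState: String)
data class Token(val type: String, val text: String, val start: Int)

fun tokenize(input: CharSequence, rules: Map<String, List<Rule>>, initialState: String): List<Token> {
    val tokens = mutableListOf<Token>()
    var state = initialState
    var pos = 0
    // One matcher is reused for the whole input: usePattern swaps the regex
    // without recompiling it, and region avoids copying the input.
    val matcher = Pattern.compile("").matcher(input)
    while (pos < input.length) {
        val candidates = rules[state] ?: error("Unknown state: $state")
        var matched = false
        for (rule in candidates) {
            matcher.usePattern(rule.pattern)
            matcher.region(pos, input.length)
            // lookingAt() succeeds only if the match begins exactly at pos.
            // Empty matches are skipped so the loop always advances.
            if (matcher.lookingAt() && matcher.end() > pos) {
                tokens += Token(rule.tokenType, matcher.group(), pos)
                pos = matcher.end()
                state = rule.nextState
                matched = true
                break
            }
        }
        if (!matched) error("No rule matches at index $pos in state $state")
    }
    return tokens
}

fun main() {
    // Toy grammar (an assumption): digits and letters must alternate.
    val rules = mapOf(
        "expectNumber" to listOf(Rule(Pattern.compile("""\d+"""), "NUM", "expectLetter")),
        "expectLetter" to listOf(Rule(Pattern.compile("""[a-z]+"""), "WORD", "expectNumber"))
    )
    println(tokenize("1a2", rules, "expectNumber").map { it.text }) // [1, a, 2]
}
```

Each rule's Pattern is compiled once up front, so neither of the per-token costs mentioned earlier (string copies, regex recompilation) should apply here.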
not sure if you really need it… still
^ means start of line…
maybe use a negative lookbehind on \d : (?<!\d)\d+
Varia
July 27, 2018, 11:37pm
8
Java's java.util.regex.Matcher (created from a java.util.regex.Pattern) seems to have a bit more utility than Kotlin's Regex. Link: Matcher (Java Platform SE 8)
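For reference, a minimal sketch of how Matcher might be used for this on the JVM: region limits matching to the part of the input from start_index onward, and lookingAt requires the match to begin at the start of that region. The helper name matchExactlyAt is made up for illustration.

```kotlin
import java.util.regex.Pattern

// Hypothetical helper (not part of any library): returns the matched text
// only if the pattern matches starting exactly at `start`, otherwise null.
fun matchExactlyAt(pattern: Pattern, input: CharSequence, start: Int): String? {
    val m = pattern.matcher(input)
    // Restrict matching to [start, input.length); no substring copy is made.
    m.region(start, input.length)
    // lookingAt() anchors the match at the beginning of the region.
    return if (m.lookingAt()) m.group() else null
}

fun main() {
    val s = "1a2"
    val digits = Pattern.compile("""\d+""")
    println(matchExactlyAt(digits, s, 0)) // 1
    println(matchExactlyAt(digits, s, 1)) // null
    println(matchExactlyAt(digits, s, 2)) // 2
}
```

This gives the (match, null, match) behaviour asked for in the first post, and m.start() / m.end() are reported as indices into the original string.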
ikt
July 28, 2018, 3:01am
9
@Varia : I will look into Java/Matcher. Thanks!
I agree with many here. It's not nice to be forced to use Java/Matcher just for that reason. Using substring combined with "^" in the regex is just a cheap trick (with useless string handling), and you still have to fix the positions of MatchResult.range.
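For later readers: newer Kotlin standard library versions include Regex.matchAt (and Regex.matchesAt), which appear to cover exactly this case. The sketch below assumes Kotlin 1.7.20 or later, where these functions are stable. The wrapper tokenAt is a hypothetical name used only for illustration.

```kotlin
// Hypothetical wrapper: the matched text if `regex` matches exactly at `index`, else null.
// Assumes Kotlin 1.7.20+ for Regex.matchAt.
fun tokenAt(regex: Regex, input: CharSequence, index: Int): String? =
    regex.matchAt(input, index)?.value

fun main() {
    val s = "1a2"
    val digits = Regex("""\d+""")
    println(tokenAt(digits, s, 0)) // 1
    println(tokenAt(digits, s, 1)) // null
    println(tokenAt(digits, s, 2)) // 2
    // The range refers to positions in the original string, so no offset fix-up is needed.
    println(digits.matchAt(s, 2)?.range) // 2..2
}
```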