Sequence chunk or window with predicate?


#1

Say I have a file containing a long list of family data. Each family starts with an F (father) line and an M (mother) line, followed by an optional, unknown number of S (son) and D (daughter) lines, like this:

F ...
M ...
S ...
F ...
M ...
S ...
D ...
F ...
M ...
S ...
S ...
S ...
F ...
M ...
D ...
D ...

I want to parse this file into a data structure containing the father, the mother, and a list of children…

val iStream: InputStream = javaClass.getResourceAsStream(filename)
iStream.bufferedReader(Charsets.UTF_8).useLines { sequence -> 
}

I found it very hard to do this with a sequence or an iterator. Because the number of children is unknown, I cannot use seq.chunked(size: Int) or seq.windowed().

With an iterator, I would have to peek at the next line and check whether it starts with 'F', which means a new family. That breaks the usual while (iterator.hasNext()) loop.

I tried to implement a custom Iterator, but I don't know how to feed it into a sequence or an iterator.

Any ideas? (Other than reading everything into a List, which consumes memory.)


#2

With the help of coroutines you can chunk a sequence by families the same way as you do it with lists, but producing a lazy sequence instead, for example:

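// `sequence` is the stable name (Kotlin 1.3+) of the formerly experimental `buildSequence`.
// startsNewFamily(name) is assumed to return true for a line that begins a new family.
fun chunkByFamily(names: Sequence<String>): Sequence<List<String>> = sequence {
    val currentFamily = mutableListOf<String>()
    for (name in names) {
        // Emit the finished family when the next one begins; skip the empty first chunk.
        if (startsNewFamily(name) && currentFamily.isNotEmpty()) {
            yield(currentFamily.toList())
            currentFamily.clear()
        }
        currentFamily += name
    }
    // Emit the last family, if there is one.
    if (currentFamily.isNotEmpty()) yield(currentFamily)
}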
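
To get from chunks to the father / mother / children structure asked for in the question, something like the following might work. It is only a sketch: the `Family` class, the `parseFamilies` name, and the rule that a line starting with "F" opens a new family are assumptions for illustration, not something the thread specifies. It also assumes every family has both an F and an M line, in that order.

```kotlin
// Hypothetical record type; the thread does not define one.
data class Family(val father: String, val mother: String, val children: List<String>)

// Assumption: a line starting with "F" opens a new family.
fun startsNewFamily(line: String): Boolean = line.startsWith("F")

// Lazily groups lines into one list per family.
fun chunkByFamily(lines: Sequence<String>): Sequence<List<String>> = sequence {
    val current = mutableListOf<String>()
    for (line in lines) {
        if (startsNewFamily(line) && current.isNotEmpty()) {
            yield(current.toList())
            current.clear()
        }
        current += line
    }
    if (current.isNotEmpty()) yield(current.toList())
}

// Assumption: each chunk holds the F line, then the M line, then zero or more S/D lines.
fun parseFamilies(lines: Sequence<String>): Sequence<Family> =
    chunkByFamily(lines).map { chunk ->
        require(chunk.size >= 2) { "Expected F and M lines, got: $chunk" }
        Family(father = chunk[0], mother = chunk[1], children = chunk.drop(2))
    }

fun main() {
    val input = """
        F Adam
        M Beth
        S Carl
        F Dave
        M Emma
    """.trimIndent()
    // lineSequence() is lazy, like the sequence passed to useLines.
    parseFamilies(input.lineSequence()).forEach(::println)
}
```

When used with useLines, consume the resulting sequence inside the lambda (for example with forEach or toList), because the sequence is lazy and the reader is closed as soon as the block returns.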

#3

Wow, what a fantastic solution :+1::+1::+1:
Thanks, it works!
(I was stuck with a nested iterator like this: https://stackoverflow.com/a/5475086/298430)