I am using a regular expression (regex) in the column split function of Paxata. My guess is that it returns the shortest match rather than the longest one. Let me explain with an example.
Intention: to match either the pattern "AAABBB" or the pattern "BBB" anywhere in a string (i.e. in the values of a column).
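For reference, this intention can be written as the alternation AAABBB|BBB. Below is a minimal sketch of what a standard backtracking engine returns for it. It assumes Python's re module as a stand-in, because Paxata's own regex engine is not documented in this post and may behave differently. The function name first_match is hypothetical.

```python
import re

# The intended alternation: "AAABBB" or "BBB", anywhere in the value.
# Python's re is only a stand-in here; Paxata's engine may differ.
PATTERN = re.compile(r"AAABBB|BBB")

def first_match(value: str):
    """Return the text of the leftmost match in value, or None if there is none."""
    m = PATTERN.search(value)
    return m.group(0) if m else None

for value in ["xxAAABBByy", "xxBBByy", "xxAAAyy"]:
    print(value, "->", first_match(value))
```

In this engine, the value "xxAAABBByy" yields "AAABBB", because the search stops at the leftmost position where any alternative matches.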
Outcome: a value containing "AAABBB" is matched as "BBB", and "BBB" is what ends up in the new split column.
I was hoping that a longer match would take priority over a shorter one. I even tried ordering the alternatives in the regex by descending length, but it did not help.
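Most backtracking engines (Java, PCRE, Python) do not prefer the longest match. They pick the leftmost starting position, and at that position they take the first alternative that succeeds. The sketch below illustrates this, again assuming Python's re as a stand-in for Paxata's engine. The .* prefix is hypothetical, since the actual regex is not shown. It demonstrates one common way to get "BBB" back from "AAABBB": a greedy prefix backtracks from the end of the value.

```python
import re

value = "xxAAABBByy"

# Alternatives start at different positions: the leftmost position wins,
# so reordering the alternatives makes no difference.
print(re.search(r"BBB|AAABBB", value).group(0))  # AAABBB
print(re.search(r"AAABBB|BBB", value).group(0))  # AAABBB

# Order only decides between alternatives that match at the same position.
print(re.search(r"AAA|AAABBB", value).group(0))  # AAA
print(re.search(r"AAABBB|AAA", value).group(0))  # AAABBB

# Hypothetical greedy prefix: .* first consumes the whole value, then gives
# characters back from the end, so the group first succeeds on "BBB".
print(re.search(r".*(AAABBB|BBB)", value).group(1))   # BBB
# A lazy prefix tries the shortest prefix first, so the group gets "AAABBB".
print(re.search(r".*?(AAABBB|BBB)", value).group(1))  # AAABBB
```

If the split pattern includes something like a leading .*, switching it to a lazy .*? (or removing it) may be worth trying. Whether Paxata's split function honors this is an assumption to verify.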
Is this the expected behavior, or is there a setting somewhere that I missed?