Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

regex - Ignore part of a string when splitting using regular expression in R

I'm trying to split a string in R (using strsplit) at specific points (dashes, -), but not where the dash is inside square brackets ([...]).

Example:

xx <- c("Radio Stations-Listened to Past Week-Toronto [FM-CFXJ-93.5 (93.5 The Move)]","Total Internet-Time Spent Online-Past 7 Days")
xx
  [1] "Radio Stations-Listened to Past Week-Toronto [FM-CFXJ-93.5 (93.5 The Move)]"
  [2] "Total Internet-Time Spent Online-Past 7 Days" 

should give me something like:

list(c("Radio Stations","Listened to Past Week","Toronto [FM-CFXJ-93.5 (93.5 The Move)]"), c("Total Internet","Time Spent Online","Past 7 Days"))
  [[1]]
  [1] "Radio Stations"                         "Listened to Past Week"                 
  [3] "Toronto [FM-CFXJ-93.5 (93.5 The Move)]"

  [[2]]
  [1] "Total Internet"    "Time Spent Online" "Past 7 Days"  

Is there a way to do this with a regular expression? The position and number of dashes vary from element to element of the vector, and not every element has brackets. However, when there are brackets, they are always at the end.

I've tried different things, but none are working:

## Trying to match a "-" directly before "[" (Perl-style lookahead)
strsplit(xx, split = "-(?=\\[)", perl = TRUE)
# returns each string unsplit: no "-" is immediately followed by "["

## Trying to first split off what follows "[", then split what precedes it
temp <- strsplit(xx, "[", fixed = TRUE)
temp <- lapply(temp, function(yy) strsplit(head(yy, -1), "-"))
# doesn't work: some elements have no brackets, so head(yy, -1) returns nothing for them
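The bracket-first idea can be made to cope with elements that have no brackets by locating the first "[" in each string rather than splitting on it. This is a minimal sketch, assuming (as stated above) that a bracketed part, if present, sits at the end of the string after at least one dash-separated field; split_before_bracket is a hypothetical helper name, not an existing function.

```r
# Sample data from the question
xx <- c("Radio Stations-Listened to Past Week-Toronto [FM-CFXJ-93.5 (93.5 The Move)]",
        "Total Internet-Time Spent Online-Past 7 Days")

# Hypothetical helper (illustration only): split one string on "-",
# leaving a trailing "[...]" part untouched. Assumes at most one bracket
# group, placed at the end, with at least one field before it.
split_before_bracket <- function(s) {
  pos <- regexpr("[", s, fixed = TRUE)[1]        # -1 when there is no "["
  if (pos == -1) {
    return(strsplit(s, "-", fixed = TRUE)[[1]])  # no brackets: plain split
  }
  head_part <- substr(s, 1, pos - 1)             # text before "["
  tail_part <- substr(s, pos, nchar(s))          # "[" through end of string
  parts <- strsplit(head_part, "-", fixed = TRUE)[[1]]
  # Re-attach the bracketed part to the last field
  parts[length(parts)] <- paste0(parts[length(parts)], tail_part)
  parts
}

res <- lapply(xx, split_before_bracket)
res
```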

Any help would be appreciated.

He who fights dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss will gaze back into you…

1 Answer

Based on: Regex for matching a character, but not when it's enclosed in square bracket

You can use:

strsplit(xx, "-(?![^\\[]*\\])", perl = TRUE)
[[1]]
[1] "Radio Stations"                         "Listened to Past Week"                 
[3] "Toronto [FM-CFXJ-93.5 (93.5 The Move)]"

[[2]]
[1] "Total Internet"    "Time Spent Online" "Past 7 Days" 
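A small sketch (not part of the original answer) that runs the pattern next to a plain split on every dash may make the lookahead easier to follow. The variable names and the two short test strings are illustrative. The pattern treats a dash as bracketed whenever the next square bracket after it is a closing one, so it assumes brackets are balanced and not nested; the last call shows that a stray "]" also suppresses the split.

```r
xx <- c("Radio Stations-Listened to Past Week-Toronto [FM-CFXJ-93.5 (93.5 The Move)]",
        "Total Internet-Time Spent Online-Past 7 Days")

# Pattern from the answer; in an R string literal every regex backslash is doubled
pat <- "-(?![^\\[]*\\])"
# -           a literal dash ...
# (?! ... )   ... that is NOT followed by:
# [^\\[]*     a run of characters containing no "[" ...
# \\]         ... and then a "]"
# In other words: skip a dash when the next square bracket after it is "]".

split_all   <- strsplit(xx[1], "-", fixed = TRUE)[[1]]  # splits at every dash (5 pieces)
split_smart <- strsplit(xx[1], pat, perl = TRUE)[[1]]   # only dashes outside [...]

small <- strsplit("a-b [c-d]", pat, perl = TRUE)[[1]]   # c("a", "b [c-d]")
stray <- strsplit("a-b]", pat, perl = TRUE)[[1]]        # "a-b]": stray "]" blocks the split
```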
