r - Minus values from columns relative to year

Question

Welcome To Ask or Share your Answers For Others

r - Minus values from columns relative to year

asked Jan 27, 2021 in Technique[技术] by 深蓝 (71.8m points)

r - Minus values from columns relative to year

I'm trying to minus values for each habitat covariate relative to year 2019 and 2010. So, something that can assign by ID those values belonging to each habitat for 2010 and 2019, minus them, otherwise, those that aren't grouped by ID are left as is in the dataframe.

Here's an example of the dataset and what I expect for the output:

#dataset example

# A tibble: 30 x 18
      id year  pland_00_water pland_01_evergr~ pland_02_evergr~ pland_03_decidu~ pland_04_decidu~ pland_05_mixed_~ pland_06_closed~
   <int> <chr>          <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>
 1   267 2019          0.0833                0                0                0                0                0                0
 2   268 2019          0.2                   0                0                0                0                0                0
 3   362 2019          0.1                   0                0                0                0                0                0
 4   420 2019          0.0556                0                0                0                0                0                0
 5   421 2019          0.0667                0                0                0                0                0                0
 6   484 2019          0.125                 0                0                0                0                0                0
 7   492 2010          0.1                   0                0                0                0                0                0
 8   492 2019          0.1                   0                0                0                0                0                0
 9   719 2010          0.0769                0                0                0                0                0                0
10   719 2019          0.0769                0                0                0                0                0                0


#output example

# A tibble: 30 x 18
      id year  pland_00_water pland_01_evergr~ pland_02_evergr~ pland_03_decidu~ pland_04_decidu~ pland_05_mixed_~ pland_06_closed~
   <int> <chr>          <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>
 1   267 2019          0.0833                0                0                0                0                0                0
 2   268 2019          0.2                   0                0                0                0                0                0
 3   362 2019          0.1                   0                0                0                0                0                0
 4   420 2019          0.0556                0                0                0                0                0                0
 5   421 2019          0.0667                0                0                0                0                0                0
 6   484 2019          0.125                 0                0                0                0                0                0
 7   492 changed        0                     0                0                0                0                0                0

 9   719 changed        0                     0                0                0                0                0                0

I can imagine this working with a function and boolean operators such that, if year 2010 & 2019 match by id then minus the next row by the previous (assuming that they're ordered by id then this should work), otherwise, if they do not match by id then leave them as is.

I'm trying to wrap my head around which code to use for this, I can see this working within a function and using lapply to apply across the entire dataset.

Here's a reproducible code:

structure(list(id = c(267L, 268L, 362L, 420L, 421L, 484L, 492L, 
492L, 719L, 719L, 986L, 986L, 1071L, 1071L, 1303L, 1303L, 1306L, 
1399L, 1399L, 1400L, 1400L, 2007L, 2083L, 2083L, 2134L, 2135L, 
2136L, 2213L, 2213L, 2214L), year = c(2019, 2019, 2019, 2019, 
2019, 2019, 2010, 2019, 2010, 2019, 2010, 2019, 2010, 2019, 2010, 
2019, 2010, 2010, 2019, 2010, 2019, 2019, 2010, 2019, 2019, 2019, 
2019, 2010, 2019, 2010), pland_00_water = c(0.0833333333333333, 
0.2, 0.1, 0.0555555555555556, 0.0666666666666667, 0.125, 0.1, 
0.1, 0.0769230769230769, 0.0769230769230769, 0.0588235294117647, 
0.0588235294117647, 0.0714285714285714, 0.0714285714285714, 0.0769230769230769, 
0.0769230769230769, 0.0588235294117647, 0.05, 0.05, 0.111111111111111, 
0.111111111111111, 0.0526315789473684, 0.142857142857143, 0.142857142857143, 
0.0666666666666667, 0.0588235294117647, 0.1, 0.142857142857143, 
0.142857142857143, 0.25), pland_01_evergreen_needleleaf = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0588235294117647, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), pland_02_evergreen_broadleaf = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0), pland_03_deciduous_needleleaf = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0714285714285714, 0, 0, 
0, 0, 0.05, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), pland_04_deciduous_broadleaf = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0714285714285714, 0.0714285714285714, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), pland_05_mixed_forest = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0), pland_06_closed_shrubland = c(0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0), pland_07_open_shrubland = c(0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0), pland_08_woody_savanna = c(0, 0, 0, 0, 0, 0, 
0, 0, 0.0769230769230769, 0.0769230769230769, 0.0588235294117647, 
0.0588235294117647, 0.0714285714285714, 0.0714285714285714, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), pland_09_savanna = c(0, 
0, 0, 0, 0, 0, 0, 0, 0.0769230769230769, 0.0769230769230769, 
0.0588235294117647, 0.0588235294117647, 0, 0, 0, 0.0769230769230769, 
0.0588235294117647, 0.05, 0.05, 0.111111111111111, 0.111111111111111, 
0, 0, 0, 0, 0, 0, 0, 0, 0), pland_10_grassland = c(0.0833333333333333, 
0.2, 0.1, 0.0555555555555556, 0.0666666666666667, 0.125, 0.1, 
0.1, 0.0769230769230769, 0.0769230769230769, 0.0588235294117647, 
0.0588235294117647, 0.0714285714285714, 0.0714285714285714, 0.0769230769230769, 
0.0769230769230769, 0.0588235294117647, 0.05, 0.05, 0.111111111111111, 
0.111111111111111, 0.0526315789473684, 0.142857142857143, 0.142857142857143, 
0.0666666666666667, 0.0588235294117647, 0.1, 0.142857142857143, 
0.142857142857143, 0.25), pland_11_wetland = c(0.0833333333333333, 
0.2, 0.1, 0.0555555555555556, 0, 0, 0.1, 0.1, 0.0769230769230769, 
0.0769230769230769, 0.0588235294117647, 0.0588235294117647, 0.0714285714285714, 
0.0714285714285714, 0.0769230769230769, 0.0769230769230769, 0.0588235294117647, 
0.05, 0.05, 0.111111111111111, 0, 0.0526315789473684, 0.142857142857143, 
0.142857142857143, 0.0666666666666667, 0.0588235294117647, 0.1, 
0.142857142857143, 0.142857142857143, 0), pland_12_cropland = c(0.0833333333333333, 
0.2, 0.1, 0.0555555555555556, 0.0666666666666667, 0.125, 0.1, 
0.1, 0.0769230769230769, 0.0769230769230769, 0.0588235294117647, 
0, 0, 0, 0.0769230769230769, 0.0769230769230769, 0.0588235294117647, 
0.05, 0.05, 0.111111111111111, 0.111111111111111, 0.0526315789473684, 
0.142857142857143, 0.142857142857143, 0.0666666666666667, 0, 
0, 0.142857142857143, 0.142857142857143, 0.25), pland_13_urban = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0), pland_14_mosiac = c(0, 0, 0, 0, 0, 0, 
0, 0, 0.0769230769230769, 0.0769230769230769, 0, 0.0588235294117647, 
0, 0, 0, 0, 0, 0.05, 0.05, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
    pland_15_barren = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, 
-30L), class = c("tbl_df", "tbl", "data.frame"))

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-01-27T04:00:45+0000

Here's a tidyverse version:

library(dplyr)

x %>%
  arrange(year) %>%
      # can add 'id' if desired, minimum 'year' required for below
  group_by(id) %>%
  filter(
    all(c("2010", "2019") %in% year),
    year %in% c("2010", "2019")
  ) %>%
  summarize_at(vars(-year), diff) %>%
  mutate(year = "changed") %>%
  ungroup() %>%
  bind_rows(x, .) %>%
  arrange(id, year)           # just to show id=492
# # A tibble: 39 x 18
#       id year  pland_00_water pland_01_evergr~ pland_02_evergr~ pland_03_decidu~ pland_04_decidu~ pland_05_mixed_~
#    <int> <chr>          <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>
#  1   267 2019          0.0833                0                0                0                0                0
#  2   268 2019          0.2                   0                0                0                0                0
#  3   362 2019          0.1                   0                0                0                0                0
#  4   420 2019          0.0556                0                0                0                0                0
#  5   421 2019          0.0667                0                0                0                0                0
#  6   484 2019          0.125                 0                0                0                0                0
#  7   492 2010          0.1                   0                0                0                0                0
#  8   492 2019          0.1                   0                0                0                0                0
#  9   492 chan~         0                     0                0                0                0                0
# 10   719 2010          0.0769                0                0                0                0                0
# # ... with 29 more rows, and 10 more variables: pland_06_closed_shrubland <dbl>, pland_07_open_shrubland <dbl>,
# #   pland_08_woody_savanna <dbl>, pland_09_savanna <dbl>, pland_10_grassland <dbl>, pland_11_wetland <dbl>,
# #   pland_12_cropland <dbl>, pland_13_urban <dbl>, pland_14_mosiac <dbl>, pland_15_barren <dbl>

Explanation:

the first arrange(year) is so that the diff later will have values in an expected order (assuming all years are year-like that sort lexicographically the same as a numerical sort);
the filter first removes any ids that do not have both years, and then ensures we have only those two years; while your data only contains "2010" and "2019", I didn't want to assume that ... it's a harmless filter if that's all you have, remove year %in% c("2010","2019") if desired and safe;
I assume that columns other than id and year are numeric/integer, so summarize_at(vars(-year), diff) is safe (id is out of the picture since it is a grouping variable); if there are non-numerical values, you might be able to use summarize_if(is.numeric, diff) which also works here ... but will silently NA-ize non-numeric fields if present;
bind_rows(x, .) is needed because the filter removed many rows we want/need to retain; and
the last arrange(id,year) is solely demonstrative for this answer.

Categories

r - Minus values from columns relative to year

r - Minus values from columns relative to year

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags