Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


datetime - How to convert a data frame of character strings to the corresponding dates?

I have noticed a couple of times already that working with dates doesn't allow for the usual tricks in R. Say I have a data frame Data with dates stored as character strings (see below), and I want to convert the complete data frame to the Date class. The only solution I have come up with so far is:

for (i in 1:ncol(Data)){
    Data[,i] <- as.Date(Data[,i],format="%d %B %Y")
}

This gives a data frame with the correct structure:

> str(Data)
'data.frame':   6 obs. of  4 variables:
 $ Rep1:Class 'Date'  num [1:6] 12898 12898 13907 13907 13907 ...
 $ Rep2:Class 'Date'  num [1:6] 13278 13278 14217 14217 14217 ...
 $ Rep3:Class 'Date'  num [1:6] 13600 13600 14340 14340 14340 ...
 $ Rep4:Class 'Date'  num [1:6] 13831 13831 14669 14669 14669 ...

Using the usual apply-family approach gives something completely different. Although all columns start out with the same class and are converted to the same class, I can't get a data frame or matrix of the correct class as output:

> str(sapply(Data,as.Date,format="%d %B %Y"))
 num [1:6, 1:4] 12898 12898 13907 13907 13907 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:4] "Rep1" "Rep2" "Rep3" "Rep4"
> str(apply(Data,2,as.Date,format="%d %B %Y"))
 num [1:6, 1:4] 12898 12898 13907 13907 13907 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:4] "Rep1" "Rep2" "Rep3" "Rep4"

If you want to turn these matrices back into Date objects, you need to supply an origin (R's Date class counts days since 1970-01-01). Even then, calling as.Date() or another function after apply() doesn't help much: once you supply the origin, you get a vector again.
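A minimal sketch of why the class disappears, using a small hypothetical list x rather than the full Data: sapply() (and likewise apply()) combines the per-column results with unlist() before adding dimensions, and unlist() drops the "Date" class attribute, so only the underlying day counts remain. The example assumes an English or C locale, since %B matches month names in the current locale.

```r
# Hypothetical columns of date strings in the question's format
# (assumes an English/C locale so that %B recognises "April", "May", ...)
x <- list(a = c("25 April 2005", "29 January 2008"),
          b = c("10 May 2006", "4 December 2008"))

# sapply() simplifies the list of Date vectors into a matrix via unlist(),
# which strips the "Date" class and keeps only the numbers
m <- sapply(x, as.Date, format = "%d %B %Y")
print(class(m))   # a plain numeric matrix, not Date
print(m[1, "a"])  # days since 1970-01-01 (12898 for 25 April 2005)

# Supplying the origin turns a column of day counts back into dates
d <- as.Date(m[, "a"], origin = "1970-01-01")
print(d)
```

The day counts match the num values shown by str() above (12898 for 25 April 2005, 13907 for 29 January 2008).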

Does anybody have a clean solution for this kind of data? Below is the data frame I used in the examples.

Data <- structure(list(Rep1 = c(" 25 April 2005 ", " 25 April 2005 ", 
" 29 January 2008 ", " 29 January 2008 ", " 29 January 2008 ", 
" 29 January 2008 "), Rep2 = c(" 10 May 2006 ", " 10 May 2006 ", 
" 4 December 2008 ", " 4 December 2008 ", " 4 December 2008 ", 
" 4 December 2008 "), Rep3 = c(" 28 March 2007 ", " 28 March 2007 ", 
" 6 April 2009 ", " 6 April 2009 ", " 6 April 2009 ", " 6 April 2009 "
), Rep4 = c(" 14 November 2007 ", " 14 November 2007 ", " 1 March 2010 ", 
" 1 March 2010 ", " 1 March 2010 ", " 1 March 2010 ")), .Names = c("Rep1", 
"Rep2", "Rep3", "Rep4"), row.names = c("1", "2", "3", "4", "5", 
"6"), class = "data.frame")

He who fights dragons for too long becomes a dragon himself; gaze into the abyss for too long, and the abyss gazes back…

1 Answer


I think the most succinct way to do this is:

# Data[] <- keeps Data a data frame; plain Data <- lapply(...) would give a list
Data[] <- lapply(Data, as.Date, format = "%d %B %Y")
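To see the idea end to end, here is a minimal sketch on a small hypothetical two-column version of the question's data (same padded strings, English/C locale assumed for %B). lapply() returns a plain list of Date vectors; assigning that list into Data[] replaces the column contents while keeping the data.frame class, row names and column names.

```r
# Small hypothetical data frame in the same format as the question's Data
Data <- data.frame(Rep1 = c(" 25 April 2005 ", " 29 January 2008 "),
                   Rep2 = c(" 10 May 2006 ", " 4 December 2008 "),
                   stringsAsFactors = FALSE)

# Convert every column; Data[] <- keeps the data.frame wrapper intact
Data[] <- lapply(Data, as.Date, format = "%d %B %Y")

str(Data)
```

The surrounding spaces in the strings are tolerated here because the numeric %d field skips leading blanks and anything after the parsed %Y is ignored.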

This also nicely generalises to the case where not all columns are dates:

# date_col holds the names or indices of the date columns, e.g. c("Rep1", "Rep2")
Data[date_col] <- lapply(Data[date_col], as.Date, format = "%d %B %Y")
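A minimal sketch of the mixed case, assuming a hypothetical data frame with a non-date id column next to the date columns. Selecting the date columns by the "^Rep" name pattern is only an assumption for this example; any character or integer index vector works as date_col.

```r
# Hypothetical data frame: one integer column plus two date-string columns
Data <- data.frame(id = 1:2,
                   Rep1 = c(" 25 April 2005 ", " 29 January 2008 "),
                   Rep2 = c(" 10 May 2006 ", " 4 December 2008 "),
                   stringsAsFactors = FALSE)

# Pick the date columns by name (assumed naming scheme "Rep...")
date_col <- grep("^Rep", names(Data))

# Convert only those columns; id is left untouched
Data[date_col] <- lapply(Data[date_col], as.Date, format = "%d %B %Y")

sapply(Data, class)
```

Because Data[date_col] is itself a data frame, lapply() iterates over just the selected columns, and the subset assignment writes them back without touching the rest.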

You can also simplify the date parsing with a couple of other packages, stringr and lubridate:

library(stringr)
library(lubridate)
Data[] <- lapply(Data, function(x) dmy(str_trim(x)))

which is a little more verbose, but has the advantage that you don't need to work out the exact date format yourself; dmy() only needs to know that the order is day, month, year.



...