Columns used in a function in by are not available in j #1427

@renkun-ken

It seems that any column referenced in the expression that produces the by column is included neither in the scope of j nor in .SD.

require(data.table)
dt <- data.table(a=c(1001,1002,1011,1012), x = c(1,2,3,4))
dt[, .SD, by = .(i = substr(a, 3, 4))]
#     i x
#1: 01 1
#2: 02 2
#3: 11 3
#4: 12 4
dt[, .SD, by = .(i = substr(a, 1, 3))]
#      i x
#1: 100 1
#2: 100 2
#3: 101 3
#4: 101 4

For example, I have a data table with a long list of yyyyMMdd dates from 20150101 to 20151001, and I use by = substr(date, 1, 6) to group the data into year-months. But in each group, whether accessed through .SD or directly in the scope of j, the date column disappears, so I cannot get the original dates this way. I am not sure whether previous versions had this problem (as I remember it, the behavior was different before, but I may be wrong).

To work around this, I have to create the new column year_month first and then use by = year_month.
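A minimal sketch of that workaround, for anyone hitting the same thing. The data here is made up for illustration: the `value` column and the four sample dates are assumptions, not taken from my real table.

```r
library(data.table)

# Hypothetical sample data: yyyyMMdd dates stored as integers,
# plus an illustrative `value` column
dt <- data.table(
  date  = c(20150101L, 20150102L, 20150201L, 20150202L),
  value = c(1, 2, 3, 4)
)

# Materialise the grouping key as a real column first
dt[, year_month := substr(date, 1, 6)]

# Then group by that column, so `date` is no longer part of the by expression
res <- dt[, .SD, by = year_month]
print(res)

# `date` is also usable directly in j, e.g. the first date in each month
first_dates <- dt[, .(first_date = min(date)), by = year_month]
print(first_dates)
```

The cost is an extra column in the table; it can be dropped afterwards with `dt[, year_month := NULL]` if it is not needed.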

I'm using the latest version of data.table (v1.9.6) on CRAN.
