rm_db_name

The rm_db_name() function was inspired from having column names prefixed by database names when importing the data into R.

One solution to this problem is to use “as variable_name” in you SQL code without giving the table an “as table_name” alias. This solution works fine when you have less than a dozen or so fields. What if you had several hundred? No one in a job with deadlines would rename every single field.

Example

For example, lets say that you have a HIVE table called fiscal_dates with 3 fields, greg_d, wk_end_d, and year.

When you query this table, you would expect this:

expected <- tibble::tribble(~greg_d,~wk_end_d,~year)
str(expected)
#> tibble [0 x 3] (S3: tbl_df/tbl/data.frame)
#>  $ greg_d  : 'vctrs_unspecified' logi(0) 
#>  $ wk_end_d: 'vctrs_unspecified' logi(0) 
#>  $ year    : 'vctrs_unspecified' logi(0)

Instead you get this:

incorrect <- tibble::tribble(~fiscal_dates.greg_d,~fiscal_dates.wk_end_d,~fiscal_dates.year)
str(incorrect)
#> tibble [0 x 3] (S3: tbl_df/tbl/data.frame)
#>  $ fiscal_dates.greg_d  : 'vctrs_unspecified' logi(0) 
#>  $ fiscal_dates.wk_end_d: 'vctrs_unspecified' logi(0) 
#>  $ fiscal_dates.year    : 'vctrs_unspecified' logi(0)

The database prefix is very inefficient moving forward in any data wrangling and machine learning project.

The rm_db_name() will remove the prefix and return a dataframe with the correct or desired column names.

corrected <- RToolShed::rm_db_name(incorrect,"fiscal_dates")
str(corrected)
#> tibble [0 x 3] (S3: tbl_df/tbl/data.frame)
#>  $ greg_d  : 'vctrs_unspecified' logi(0) 
#>  $ wk_end_d: 'vctrs_unspecified' logi(0) 
#>  $ year    : 'vctrs_unspecified' logi(0)

Alfredo G Marquez

2021-12-15

Example