The rm_db_name()
function was inspired from having column names prefixed by database names when importing the data into R.
One solution to this problem is to use “as variable_name
” in you SQL code without giving the table an “as table_name
” alias. This solution works fine when you have less than a dozen or so fields. What if you had several hundred? No one in a job with deadlines would rename every single field.
For example, lets say that you have a HIVE table called fiscal_dates with 3 fields, greg_d, wk_end_d, and year.
When you query this table, you would expect this:
expected <- tibble::tribble(~greg_d,~wk_end_d,~year)
str(expected)
#> tibble [0 x 3] (S3: tbl_df/tbl/data.frame)
#> $ greg_d : 'vctrs_unspecified' logi(0)
#> $ wk_end_d: 'vctrs_unspecified' logi(0)
#> $ year : 'vctrs_unspecified' logi(0)
Instead you get this:
incorrect <- tibble::tribble(~fiscal_dates.greg_d,~fiscal_dates.wk_end_d,~fiscal_dates.year)
str(incorrect)
#> tibble [0 x 3] (S3: tbl_df/tbl/data.frame)
#> $ fiscal_dates.greg_d : 'vctrs_unspecified' logi(0)
#> $ fiscal_dates.wk_end_d: 'vctrs_unspecified' logi(0)
#> $ fiscal_dates.year : 'vctrs_unspecified' logi(0)
The database prefix is very inefficient moving forward in any data wrangling and machine learning project.
The rm_db_name()
will remove the prefix and return a dataframe with the correct or desired column names.
corrected <- RToolShed::rm_db_name(incorrect,"fiscal_dates")
str(corrected)
#> tibble [0 x 3] (S3: tbl_df/tbl/data.frame)
#> $ greg_d : 'vctrs_unspecified' logi(0)
#> $ wk_end_d: 'vctrs_unspecified' logi(0)
#> $ year : 'vctrs_unspecified' logi(0)