A stable version 1.0.0, with a new tabyl API and with breaking changes to the output of clean_names().
This preserves the original functionality of janitor, but significantly changes the implementation.
tabyl
This is now a single function, tabyl(), that counts combinations of one, two, or three variables, à la base R’s table(). It replaces the crosstab() function. The resulting tabyl data.frames can be manipulated and formatted using a family of adorn_ functions. See the tabyls vignette for more.
The now-redundant legacy functions crosstab() and adorn_crosstab() have been deprecated, but remain in the package for now. Existing code that relies on tabyl will break if the sort argument is used, as that argument no longer exists in tabyl (use dplyr::arrange() instead).
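As a rough sketch of the workflow described above, the following assumes janitor 1.0.0 or later and dplyr are installed, and uses the built-in mtcars data purely for illustration. The particular adorn_ calls chosen here are one possible combination, not a prescribed recipe.

```r
library(janitor)
library(dplyr)

# One-variable tabyl: counts and percentages for each value of cyl
one_way <- mtcars %>% tabyl(cyl)

# tabyl() no longer has a sort argument, so order by count with dplyr::arrange()
one_way_sorted <- one_way %>% arrange(desc(n))

# Two-variable tabyl (the job crosstab() used to do), formatted with adorn_ functions
two_way <- mtcars %>%
  tabyl(cyl, gear) %>%
  adorn_totals("row") %>%            # append a totals row
  adorn_percentages("row") %>%       # convert counts to row-wise proportions
  adorn_pct_formatting(digits = 1) %>% # format the proportions as percentages
  adorn_ns()                         # show the underlying counts next to each percentage

print(one_way_sorted)
print(two_way)
```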
clean_names
clean_names() now detects and preserves camelCase inputs, allows multiple options for case outputs of the cleaned data.frame, and preserves whether there’s space between letters and numbers. It also transliterates accented letters and turns # into "number". This may cause old code to break. E.g., variableName as a raw column name is now converted to variable_name (or variableName, VariableName, etc. depending on your preference), where it would previously have been converted to variablename. To minimize this inconvenience, there’s a quick fix for compatibility: you can find-and-replace to insert the argument case = "old_janitor", which preserves the behavior of clean_names() as of janitor version 0.3.1, so you do not have to redo your scripts beyond that.
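To illustrate the compatibility fix, here is a minimal sketch that assumes janitor 1.0.0 or later is installed. The data.frame is hypothetical and exists only to show the two naming behaviors side by side.

```r
library(janitor)

# Hypothetical data.frame with a camelCase column name
df <- data.frame(variableName = 1:3)

# New default: camelCase input is detected and split into snake_case
new_names <- names(clean_names(df))

# Legacy behavior, kept available through case = "old_janitor"
old_names <- names(clean_names(df, case = "old_janitor"))

print(new_names)
print(old_names)
```

Per the description above, the first call should give variable_name and the second variablename.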
clean_names() transliterates accented letters, e.g., çãüœ becomes cauoe (#120). Thanks to @fernandovmacedo.
clean_names() offers multiple options for variable name styling. In addition to snake_case output you can select smallCamelCase, BigCamelCase, ALL_CAPS and others (#131).
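The styling options can be sketched as follows. This assumes janitor 1.0.0 or later; the column name is hypothetical, and the case strings "small_camel", "big_camel", and "all_caps" are the values I understand clean_names() to accept for the styles named above, so check ?clean_names for your installed version.

```r
library(janitor)

# Hypothetical data.frame whose column name contains a space
df <- data.frame(x = 1)
names(df) <- "first name"

snake       <- names(clean_names(df))                        # default snake_case
small_camel <- names(clean_names(df, case = "small_camel"))  # smallCamelCase
big_camel   <- names(clean_names(df, case = "big_camel"))    # BigCamelCase
all_caps    <- names(clean_names(df, case = "all_caps"))     # ALL_CAPS

print(c(snake, small_camel, big_camel, all_caps))
```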
clean_names(). Thanks also to @maelle for proposing this feature.
Launched the janitor documentation website: http://sfirke.github.io/janitor. Thanks to the pkgdown package!
Deprecated the functions remove_empty_rows() and remove_empty_cols(), which are replaced by the single function remove_empty(). (#100)
remove_empty() does not have a default value for the which argument, forcing more explicit and readable code, e.g., remove_empty("rows").
The new adorn_title() function shows the name of the 2nd tabyl variable (column name). This un-tidies the data.frame but makes the result clearer to readers (#77).
tabyl objects now print with row numbers suppressed.
clean_names() now retains the character # as "number" in the resulting names.
round_half_up() is now exported for public use. It’s an exact implementation of http://stackoverflow.com/questions/12688717/round-up-from-5-in-r/12688836#12688836, written by @mrdwab.
adorn_totals("row") handles quirky variable names in 1st column (#118).
get_dupes() returns the correct result when a variable in the input data.frame is already called "n" (#162).
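Two of the changes listed above, the explicit which argument of remove_empty() and the newly exported round_half_up(), can be sketched like this. The example assumes janitor 1.0.0 or later, and the data.frame is made up for illustration.

```r
library(janitor)

# Hypothetical data.frame with one all-NA row and one all-NA column
df <- data.frame(
  id    = c(1, NA, 3),
  label = c("x", NA, "z"),
  blank = c(NA, NA, NA)
)

# Naming the target explicitly keeps the intent readable
no_empty_rows <- remove_empty(df, which = "rows")  # drops the all-NA second row
no_empty_cols <- remove_empty(df, which = "cols")  # drops the all-NA "blank" column

# Base R rounds an exact half to the even neighbor; round_half_up() rounds it away from zero
base_rounded <- round(2.5)
half_up      <- round_half_up(2.5)

print(no_empty_rows)
print(no_empty_cols)
print(c(base_rounded, half_up))
```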
This is a bug-fix release with no new functionality or changes. It fixes a bug where adorn_crosstab() failed if the tibble package was version > 1.4.
Major changes to janitor are currently in development on GitHub and will be released soon. This is not that next big release.
The primary purpose of this release is to maintain accuracy given breaking changes to the dplyr package, upon which janitor is built, in dplyr version >0.6.0. This update also contains a number of minor improvements.
Critical: if you update the package dplyr to version >0.6.0, you must update janitor to version 0.3.0 to ensure accurate results from janitor’s tabyl() function. This is due to a change in the behavior of dplyr’s _join functions (discussed in #111).
janitor 0.3.0 is compatible with this new version of dplyr as well as old versions of dplyr back to 0.5.0. That is, updating janitor to 0.3.0 does not necessitate an update to dplyr >0.6.0.
add_totals_row and add_totals_col were combined into a single function, adorn_totals() (#57). The add_totals_ functions are now deprecated and should not be used.
The first argument of adorn_crosstab() is now “dat” instead of “crosstab” (indicating that the function can be called on any data.frame, not just a result of crosstab()).
Now exports the %>% pipe from magrittr (#107).
Deprecated the following functions:
use_first_valid_of() - use dplyr::coalesce() instead
convert_to_NA() - use dplyr::na_if() instead
add_totals_row() and add_totals_col() - replaced by the single function adorn_totals()
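The replacements named in the deprecation list can be sketched as follows. This assumes dplyr and a janitor version that provides adorn_totals() are installed; all vectors and the counts data.frame are hypothetical.

```r
library(dplyr)
library(janitor)

# Instead of use_first_valid_of(): take the first non-missing value across vectors
first_choice  <- c(NA, 2, NA)
second_choice <- c(1, 5, NA)
combined <- coalesce(first_choice, second_choice)

# Instead of convert_to_NA(): turn a sentinel value (here the empty string) into NA
codes   <- c("a", "", "b")
cleaned <- na_if(codes, "")

# Instead of add_totals_row() / add_totals_col(): a single adorn_totals() call
counts <- data.frame(group = c("a", "b"), x = c(1, 2), y = c(3, 4))
with_totals <- adorn_totals(counts, where = c("row", "col"))

print(combined)
print(cleaned)
print(with_totals)
```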
adorn_totals() and ns_to_percents() can now be called on data.frames that have non-numeric columns beyond the first one (those columns will be ignored) (#57)
adorn_totals("col") retains factor class in 1st column if 1st column in the input data.frame was a factor
clean_names() now handles leading spaces (#85)
adorn_crosstab() and ns_to_percents() work on a 2-column data.frame (#89)
adorn_totals() now works on a grouped tibble (#97)
tabyl() and crosstab() (#87)
The NA_ column in the result of a crosstab() will appear at the last column position (#109)
tabyl() and crosstab() now appear in the package manual (#65)
tabyl() and crosstab() failed to retain ill-formatted variable names only when using R 3.2.5 for Windows (#76)
add_totals_row() works on two-column data.frame (#69)
use_first_valid_of() returns POSIXct-class result when given POSIXct inputs
Submitted to CRAN!
mtcars %>% tabyl(mpg) %>% tabyl(n) (#54)
get_dupes() now works on variables with spaces in column names (#62)
Added the function adorn_crosstab(), which formats the results of a crosstab() for pretty printing. It shows % and N in the same cell, with the % symbol, user-specified rounding (method and number of digits), and the option to include a totals row and/or column. E.g., mtcars %>% crosstab(cyl, gear) %>% adorn_crosstab().
crosstab() can be called in a %>% pipeline, e.g., mtcars %>% crosstab(cyl, gear). Thanks to @chrishaid (#34)
tabyl() can also be called in a %>% pipeline, e.g., mtcars %>% tabyl(cyl) (#35)
use_first_valid_of() function (#32)
ns_to_percents(), add_totals_row(), add_totals_col()
crosstab() returns 0 instead of NA when there are no instances of a variable combination.
tabyl(df$vecname) retains the more-descriptive $ symbol in the column name of the result - if you want a legal R name in the result, call it as df %>% tabyl(vecname)
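The naming difference between the two calling styles can be checked with a short sketch. It assumes a current janitor installation (which also makes %>% available) and uses mtcars in place of the generic df above.

```r
library(janitor)

# Passing a vector with $: the first column name keeps the expression, including the $
from_vector <- tabyl(mtcars$cyl)

# Piping the data.frame in: the first column is named after the bare variable
from_pipe <- mtcars %>% tabyl(cyl)

print(names(from_vector)[1])
print(names(from_pipe)[1])
```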
clean_names()