Simple functions such as the base R math operators +,
/, abs, etc., are now internally marked as
group-unaware. This yields a significant speed improvement for large
grouped data frames.
This means that expressions containing only group-unaware functions,
e.g. (x + y) / abs(z), are evaluated on the entire data
frame instead of on a by-group basis.
If the expression contains any functions not marked as group-unaware,
e.g. x + cumsum(y) (as cumsum() is not
flagged as group-unaware), then usual evaluation applies except in the
case of other statistical functions which are optimised in a separate
way.
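Why the whole-frame shortcut is safe can be sketched in base R alone. The snippet below is an illustration, not fastplyr's internal implementation, and the data frame and its column names (g, x, y, z) are made up. It shows that an element-wise expression gives the same result whether it is evaluated once or per group, while an expression containing cumsum() does not.

```r
# Toy data; the column names g, x, y and z are illustrative only.
df <- data.frame(
  g = c("a", "a", "b", "b", "b"),
  x = c(1, 2, 3, 4, 5),
  y = c(5, 4, 3, 2, 1),
  z = c(-1, 2, -3, 4, -5)
)

# Element-wise expression evaluated once over the whole data frame.
whole <- with(df, (x + y) / abs(z))

# The same expression evaluated separately within each group,
# then put back into the original row order.
by_group <- unsplit(
  lapply(split(df, df$g), function(d) with(d, (x + y) / abs(z))),
  df$g
)

# Element-wise results do not depend on the grouping.
elementwise_same <- isTRUE(all.equal(whole, by_group))

# cumsum() depends on which rows belong to the group, so evaluating it
# over the whole data frame gives a different answer.
whole_cumsum <- with(df, x + cumsum(y))
group_cumsum <- with(df, x + ave(y, g, FUN = cumsum))
cumsum_same <- isTRUE(all.equal(whole_cumsum, group_cumsum))
```

Here elementwise_same is TRUE and cumsum_same is FALSE, which is why only expressions made up entirely of group-unaware functions can skip by-group evaluation.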
across, pick, etc.
Accessing columns through .data should now work correctly.
f_reframe would not recycle correctly in some cases
and has now been fixed.
An issue where f_arrange would add variables has
been fixed.
An issue where across was selecting grouped
variables has been fixed.
Fixed an issue where, in some cases, lists were not being handled
correctly in calls to across().
Many common expressions, such as sum() and mean(), have been
optimised in functions like f_summarise(). For a current list of
optimised functions, see ?f_summarise.
f_mutate as an alternative to
mutate
f_reframe as an alternative to
reframe
Fast group metadata helper functions f_group_data,
f_group_indices, f_group_keys,
f_group_rows, f_group_size and
f_n_groups.
Small bug fix when f_summarise calculates means and
medians for zero-row data frames with integer variables.
R 4.0.0 now required.
f_summarise returning results in the incorrect
order.
New function list_tidy as an alternative to
list that evaluates arguments dynamically with a focus on
setting precedence for objects created in the list over environment
objects.
new_tbl now evaluates its arguments dynamically.
f_expand also evaluates its arguments dynamically unless the
data is grouped and the expressions supplied aren’t simply column
selections.
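As a rough illustration of the dynamic evaluation described for list_tidy and new_tbl, the sketch below assumes fastplyr is installed and that later arguments can refer to earlier ones by name, as the notes above describe. The variable names are made up for the example.

```r
library(fastplyr)

# An object in the calling environment that shares a name with a list element.
x <- 100

# `y` refers to `x`. Per the note above, the `x` created inside the call
# is expected to take precedence over the `x` defined in the environment.
out <- list_tidy(x = 1:3, y = x * 2)

# new_tbl() is described as evaluating its arguments in the same way,
# so `b` can be built from the column `a` created in the same call.
tbl <- new_tbl(a = 1:3, b = a + 1L)
```

With plain list(), the second argument would be computed from the environment's x instead, which is the behaviour list_tidy is meant to avoid.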
New function f_pull as a fast convenience function
for extracting vectors from columns.
New functions remove_rows_if_any_na and
remove_rows_if_all_na.
f_arrange gains the .descending
argument to efficiently return data frames in descending order.
f_fill to fill NA values
forwards and backwards by group.
f_bind_rows sees a noticeable speed improvement.
f_summarise now returns results in the correct order
when both multiple columns and multiple optimised functions are
specified.
Joins were returning an error when x and
y were grouped_df objects.
The join by argument now accepts a partially named character vector without throwing an error.
tidy_quantiles would return an error when
probabilities were not sorted and has now been fixed.
The seed argument of f_slice_sample is
soft-deprecated. To sample, or to call any other RNG function, with a
local seed, use cheapr::with_local_seed().
tidy_quantiles gains dramatic speed and efficiency
improvements.
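The replacement for the soft-deprecated seed argument might look like the following sketch. It assumes fastplyr and cheapr are installed and that with_local_seed() takes the seed through an argument named .seed; check ?cheapr::with_local_seed before relying on that name. The data frame is made up for the example.

```r
library(fastplyr)

# Toy data frame; the column name `id` is illustrative only.
df <- new_tbl(id = 1:20)

# Wrap the sampling call so that the seed applies only inside the wrapper.
# `.seed` is assumed to be the name of the seed argument.
s1 <- cheapr::with_local_seed(f_slice_sample(df, n = 5), .seed = 123)
s2 <- cheapr::with_local_seed(f_slice_sample(df, n = 5), .seed = 123)

# The same local seed should reproduce the same sample.
same_rows <- identical(s1$id, s2$id)
```

Because the seed is local to the wrapped expression, the same pattern covers other RNG calls, such as sample() or rnorm().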
The order and sort arguments for data
frame functions have been superseded in favour of .order
and .sort.
New argument .order added to both
f_summarise and tidy_quantiles to allow for
controlling the order of groups.
rowwise_df is now explicitly unsupported. To group
by row, use f_rowwise.
New functions f_nest_by, f_rowwise and
add_consecutive_id.
A few bug fixes including:
f_bind_rows was not working when supplied with more
than 2 data frames in some cases.
f_summarise was not working when supplied with
non-function expressions.
f_bind_cols now recycles its arguments and converts
non-data frames to data frames to allow for joining variables as if they
were columns.
Fixed a bug in the f_join functions where incorrect
matches were occurring when the columns being joined on are ‘exotic’
variables, e.g. lists, lubridate ‘Intervals’, etc. Currently fastplyr
uses a proxy method to join these kinds of variables through the use of
group_id. This was not being applied correctly for joined
exotic variables and should now be fixed.
New function f_consecutive_id as an alternative to
dplyr::consecutive_id.