Downsample

Description

The Downsample operator lets you downsample a dataset.

Usage

Input data	Description
`column`	numeric, observation IDs (events, ...)
`colour`	factor (optional), grouping factor for downsampling

Output data	Description
`random_sequence`	Random sequence of integers, per group
`random_percentage`	Cumulative percentage of the sequence of integers, per group

Settings	Description
`seed`	Random seed

How to use the operator?

Downsampling is typically used to handle unbalanced sample sizes between groups, or to reduce the size of a dataset to improve performance.

Note that the operator does not return a downsampled dataset. Instead, it returns two sequences of numbers that can be used as filters in the next step:

- `random_sequence`: a sequence of integers randomly assigned to each observation, per group
- `random_percentage`: a percentage assigned to each observation, per group, relative to the total number of observations in the smallest group
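
The source does not show how the operator computes these two outputs. As an illustration only, here is a minimal Python sketch of one plausible reading, not the operator's own implementation. It assumes that `random_sequence` is a random permutation of 1..n within each group and that `random_percentage` is that rank divided by the size of the smallest group, times 100. The function name `downsample_factors` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def downsample_factors(groups, seed=42):
    """Return (random_sequence, random_percentage), aligned with `groups`.

    `groups` holds one group label per observation. Pass the same label for
    every observation when there is no grouping factor.
    Hypothetical helper: the formulas below are assumptions, see the text.
    """
    if not groups:
        return [], []

    rng = random.Random(seed)  # a fixed seed makes the result reproducible

    # Collect the row positions that belong to each group.
    positions = defaultdict(list)
    for row, label in enumerate(groups):
        positions[label].append(row)

    smallest = min(len(rows) for rows in positions.values())

    random_sequence = [0] * len(groups)
    random_percentage = [0.0] * len(groups)
    for rows in positions.values():
        ranks = list(range(1, len(rows) + 1))
        rng.shuffle(ranks)  # random permutation of 1..n within the group
        for row, rank in zip(rows, ranks):
            random_sequence[row] = rank
            # Assumed formula: rank relative to the smallest group size.
            random_percentage[row] = 100.0 * rank / smallest
    return random_sequence, random_percentage


# Six observations in group "a", three in group "b".
groups = ["a"] * 6 + ["b"] * 3
seq, pct = downsample_factors(groups, seed=1)
```

Under these assumptions, every group contains exactly as many values of `random_percentage` at or below 100 as the smallest group has observations, which is what makes the percentage usable for balancing.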

These two factors can be used in different ways.

Filtering down to a given number of observations

You can use the `random_sequence` factor as a filter in the next step. If you keep values less than or equal to 1000, 1000 observations are kept per group, provided that a `colour` factor has been specified. If no `colour` factor has been specified, a random subset of 1000 observations is kept.
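
To make the filter concrete, the following Python sketch applies the same rule to a small hand-written table, with a threshold of 2 instead of 1000. The row layout, the `id` field, and the helper `keep_first_n` are made up for illustration; in practice the filter is applied in the next data step rather than in code.

```python
from collections import Counter

# Hypothetical operator output: group "a" has four observations, "b" has two.
rows = [
    {"id": 1, "colour": "a", "random_sequence": 3},
    {"id": 2, "colour": "a", "random_sequence": 1},
    {"id": 3, "colour": "a", "random_sequence": 4},
    {"id": 4, "colour": "a", "random_sequence": 2},
    {"id": 5, "colour": "b", "random_sequence": 2},
    {"id": 6, "colour": "b", "random_sequence": 1},
]


def keep_first_n(rows, n):
    """Keep the observations whose random_sequence is less than or equal to n."""
    return [row for row in rows if row["random_sequence"] <= n]


kept = keep_first_n(rows, 2)
kept_ids = [row["id"] for row in kept]
per_group = Counter(row["colour"] for row in kept)
```

Because the integers are assigned per group, the same threshold keeps the same number of observations in every group that has at least that many.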

Filtering down to a given percentage of observations

If you wish to balance the dataset size among groups, apply a filter in the next data step on the `random_percentage` factor, keeping values less than or equal to 100.
If you set this threshold to 50 instead, each group is reduced to 50 percent of the smallest group's size.
If no `colour` factor has been specified as part of the input data, this filter simply subsamples the data to 50% of the total observations.
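
The arithmetic behind the two thresholds can be checked with a short Python sketch. It assumes, as an interpretation of the description above, that `random_percentage` equals an observation's random rank within its group divided by the smallest group size, times 100. The ranks, the row layout, and the helper `keep_up_to` are invented for the example.

```python
from collections import Counter

# Hypothetical random ranks: group "a" has eight observations, "b" has four.
ranks = {"a": [5, 2, 8, 1, 7, 3, 6, 4], "b": [2, 4, 1, 3]}
smallest = min(len(group_ranks) for group_ranks in ranks.values())  # 4

# Assumed formula: percentage relative to the smallest group size.
rows = [
    {"colour": group, "random_percentage": 100.0 * rank / smallest}
    for group, group_ranks in ranks.items()
    for rank in group_ranks
]


def keep_up_to(rows, percent):
    """Keep observations whose random_percentage is at most `percent`."""
    return [row for row in rows if row["random_percentage"] <= percent]


balanced = Counter(row["colour"] for row in keep_up_to(rows, 100))
halved = Counter(row["colour"] for row in keep_up_to(rows, 50))
```

With a threshold of 100 both groups end up with four observations, the size of the smallest group; with 50 both end up with two.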
