Attribute Attack should report confidence that training set is not more vulnerable than test #166

@jim-smith

At the moment we effectively run a worst-case attack in which a simulated attacker has the model, which outputs probabilities, and a record that includes the target label and is missing the value of just one feature.
A 'competent' published model may let an attacker estimate the missing value for a record more reliably than they could without the model.

So the question is: is this risk different for items that were in the training set than it is for the general population?

We assess this risk separately for each attribute - assuming the TRE may set a different risk appetite for each.

Procedure:

  1. Compute the number of vulnerable train and test records ($v_{tr}, v_{te}$ respectively)
  2. Assess the proportion $p_{tr}$ of 'vulnerable' training set items: $p_{tr} = v_{tr}/n_{tr}$
  3. Assess the proportion of 'vulnerable' test set items $p_{te} = v_{te}/n_{te}$

Currently we report the ratio of the two fractions, $\frac{p_{tr}}{p_{te}}$
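The three steps and the ratio that is currently reported could be sketched as below. This is only an illustration: the function name `vulnerability_proportions` and its arguments are hypothetical and are not taken from the existing code, and the handling of a zero test proportion is an assumption.

```python
def vulnerability_proportions(v_tr, n_tr, v_te, n_te):
    """Return (p_tr, p_te, ratio) for a single attribute.

    v_tr, v_te: number of vulnerable train / test records
    n_tr, n_te: total number of train / test records
    """
    if n_tr <= 0 or n_te <= 0:
        raise ValueError("both sets must contain at least one record")

    p_tr = v_tr / n_tr  # proportion of vulnerable training records
    p_te = v_te / n_te  # proportion of vulnerable test records

    # Assumption: when no test record is vulnerable the ratio is undefined,
    # so return inf (train vulnerable) or nan (neither vulnerable).
    if p_te == 0:
        ratio = float("inf") if p_tr > 0 else float("nan")
    else:
        ratio = p_tr / p_te
    return p_tr, p_te, ratio
```

The ratio on its own says nothing about sample size: the same ratio from 10 records and from 10,000 records carries very different evidence, which is the motivation for a significance test.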

We should also report whether the observed difference in proportions is statistically significant

  • using a one-tailed test, i.e. is the training data more vulnerable than the test data?

-- some code examples in metrics.py for pdf, or description here

  • Null hypothesis $H_0: p_{tr} \le p_{te}$ (alternative $H_1: p_{tr} > p_{te}$)
  • pooled proportion $p = \frac{v_{tr} + v_{te}}{n_{tr} + n_{te}}$
  • standard error $SE = \sqrt{p (1 - p) \left[ (1/n_{tr}) + (1/n_{te}) \right]}$
  • test statistic $z = (p_{tr} - p_{te}) / SE$
  • The probability that a standard normal score is less than $z$ is $\Phi(z)$; the one-tailed P-value is $1 - \Phi(z)$

using norm from scipy.stats,

probability = norm.cdf(z)  # z is already divided by SE, so keep the default loc=0, scale=1
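A minimal sketch of the whole test is given below, assuming the pooled two-proportion z-test described above. To keep it dependency-free it uses `statistics.NormalDist` from the standard library; `NormalDist().cdf(z)` gives the same value as `norm.cdf(z)` from `scipy.stats`. The function name `one_tailed_proportion_test` is hypothetical, and the behaviour when the standard error is zero is an assumption.

```python
from math import sqrt
from statistics import NormalDist


def one_tailed_proportion_test(v_tr, n_tr, v_te, n_te):
    """Return (z, p_value) for the alternative hypothesis p_tr > p_te."""
    p_tr = v_tr / n_tr
    p_te = v_te / n_te

    # Pooled proportion of vulnerable records across both sets.
    pooled = (v_tr + v_te) / (n_tr + n_te)
    se = sqrt(pooled * (1 - pooled) * (1 / n_tr + 1 / n_te))

    if se == 0:
        # Assumption: all records or no records are vulnerable, so the two
        # proportions are identical and there is no evidence either way.
        z = 0.0
    else:
        z = (p_tr - p_te) / se

    # One-tailed p-value: probability of a z-score at least this large
    # under a standard normal distribution (loc=0, scale=1).
    p_value = 1.0 - NormalDist().cdf(z)
    return z, p_value
```

For example, `one_tailed_proportion_test(120, 1000, 40, 500)` compares 12% vulnerable training records with 8% vulnerable test records and gives a z-score of roughly 2.37.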

Then for the report we have to decide whether to use 95% or 99% confidence.
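One way to avoid committing to a single level up front would be to report the outcome at both. The sketch below assumes a one-tailed p-value has already been computed; the name `training_more_vulnerable` and the returned dictionary layout are hypothetical.

```python
def training_more_vulnerable(p_value, confidence_levels=(0.95, 0.99)):
    """Map each confidence level to True if the null hypothesis is rejected.

    True means the training set is judged significantly more vulnerable
    than the test set at that confidence level.
    """
    return {level: p_value < (1.0 - level) for level in confidence_levels}
```

With a p-value of 0.03 this returns `{0.95: True, 0.99: False}`, so the choice of level changes the outcome and could be left to the TRE's risk appetite for that attribute.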

Labels

enhancement (New feature or request), waiting (This issue is waiting for something else to be completed; see issue for details)
