Achieving Fair Inference Using Error-Prone Outcomes.

Authors

L. Boeschoten, E. J. van Kesteren, A. Bagheri, D. L. Oberski

DOI:

https://doi.org/10.9781/ijimai.2021.02.007

Keywords:

Algorithmic Bias, Latent Variable Model, Error Analysis, Fair Machine Learning, Measurement Invariance

Abstract

An increasing amount of research has recently focused on methods to assess and account for fairness criteria when predicting ground-truth targets in supervised learning. However, the literature has shown that prediction unfairness can arise from measurement error when target labels are error prone. In this study we demonstrate that existing methods to assess and calibrate fairness criteria do not extend to the true target variable of interest when an error-prone proxy target is used instead. As a solution to this problem, we suggest a framework that combines two existing fields of research: fair ML methods, such as those found in the counterfactual fairness literature, and measurement models found in the statistical literature. First, we discuss these approaches and show how they can be combined to form our framework. We then show that, in a healthcare decision problem, a latent variable model that accounts for measurement error removes the unfairness detected previously.
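The core claim of the abstract can be illustrated with a minimal, self-contained simulation (not the paper's own code; the group sizes, error rates, and variable names below are illustrative assumptions). When the proxy label's error rate differs between groups, a predictor that performs identically across groups on the proxy performs unequally on the true target:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Protected group A (0/1) and true target Y, independent here for clarity.
a = rng.integers(0, 2, n)
y = rng.integers(0, 2, n)

# Error-prone proxy Y*: label noise differs by group (differential
# measurement error), e.g. 5% label flips in group 0 vs 25% in group 1.
flip = rng.random(n) < np.where(a == 0, 0.05, 0.25)
y_star = np.where(flip, 1 - y, y)

# A predictor calibrated on the proxy: here, simply predict the proxy
# itself, so its accuracy with respect to Y* is perfect in both groups.
y_hat = y_star

def accuracy(pred, target, group, g):
    m = group == g
    return (pred[m] == target[m]).mean()

# Equal performance on the proxy in both groups...
print(accuracy(y_hat, y_star, a, 0), accuracy(y_hat, y_star, a, 1))  # → 1.0 1.0
# ...but unequal performance on the true target (roughly 0.95 vs 0.75):
print(round(accuracy(y_hat, y, a, 0), 2), round(accuracy(y_hat, y, a, 1), 2))
```

This is why the paper turns to latent variable measurement models: they estimate the relationship to the unobserved true target rather than to the noisy proxy.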


References

[1] R. Berk, H. Heidari, S. Jabbari, M. Kearns, A. Roth, “Fairness in criminal justice risk assessments: The state of the art,” Sociological Methods & Research, p. 0049124118782533, 2018.

[2] S. Corbett-Davies, E. Pierson, A. Feller, S. Goel, A. Huq, “Algorithmic decision making and the cost of fairness,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017, pp. 797–806.

[3] C. Dwork, M. Hardt, T. Pitassi, O. Reingold, R. Zemel, “Fairness through awareness,” in Proceedings of the 3rd innovations in theoretical computer science conference, 2012, pp. 214–226.

[4] J. Kleinberg, S. Mullainathan, M. Raghavan, “Inherent trade-offs in the fair determination of risk scores,” arXiv preprint arXiv:1609.05807, 2016.

[5] M. J. Kusner, J. Loftus, C. Russell, R. Silva, “Counterfactual Fairness,” in Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett Eds., Curran Associates, Inc., 2017, pp. 4066–4076.

[6] S. Verma, J. Rubin, “Fairness definitions explained,” in Proceedings of the International Workshop on Software Fairness, FairWare ’18, Gothenburg, Sweden, May 2018, pp. 1–7, Association for Computing Machinery.

[7] Z. Obermeyer, B. Powers, C. Vogeli, S. Mullainathan, “Dissecting racial bias in an algorithm used to manage the health of populations,” Science, vol. 366, pp. 447–453, Oct. 2019, doi: 10.1126/science.aax2342.

[8] R. Nabi, I. Shpitser, “Fair Inference on Outcomes,” in Thirty-Second AAAI Conference on Artificial Intelligence, Apr. 2018.

[9] A. Chouldechova, “Fair prediction with disparate impact: A study of bias in recidivism prediction instruments,” Big data, vol. 5, no. 2, pp. 153–163, 2017.

[10] A. Z. Jacobs, H. Wallach, “Measurement and fairness,” arXiv preprint arXiv:1912.05511, 2019.

[11] A. P. Dawid, A. M. Skene, “Maximum likelihood estimation of observer error-rates using the EM algorithm,” Journal of the Royal Statistical Society: Series C (Applied Statistics), vol. 28, no. 1, pp. 20–28, 1979.

[12] V. C. Raykar, S. Yu, L. H. Zhao, G. H. Valadez, C. Florin, L. Bogoni, L. Moy, “Learning from crowds,” Journal of Machine Learning Research, vol. 11, no. 4, 2010.

[13] D. Borsboom, “When does measurement invariance matter?”, Medical care, vol. 44, no. 11, pp. S176–S181, 2006.

[14] J. Pearl, Causality: Models, Reasoning, and Inference. Cambridge: Cambridge University Press, 2013. OCLC: 956314447.

[15] P. Spirtes, C. N. Glymour, R. Scheines, D. Heckerman, Causation, prediction, and search. MIT press, 2000.

[16] T. B. Brakenhoff, M. Mitroiu, R. H. Keogh, K. G. Moons, R. H. Groenwold, M. van Smeden, “Measurement error is often neglected in medical literature: a systematic review,” Journal of clinical epidemiology, vol. 98, pp. 89–97, 2018.

[17] D. Borsboom, “Latent variable theory,” Measurement: Interdisciplinary Research and Perspectives, vol. 6, no. 1-2, pp. 25–53, 2008, doi: 10.1080/15366360802035497.

[18] N. Kilbertus, M. R. Carulla, G. Parascandolo, M. Hardt, D. Janzing, B. Schölkopf, “Avoiding discrimination through causal reasoning,” in Advances in Neural Information Processing Systems, 2017, pp. 656–666.

[19] W. A. Fuller, Measurement error models, vol. 305. John Wiley & Sons, 2009.

[20] H. M. Blalock, A. B. Blalock, “Methodology in social research,” 1968.

[21] A. L. McCutcheon, Latent class analysis. No. 64, Sage, 1987.

[22] G. Rasch, Probabilistic models for some intelligence and attainment tests. ERIC, 1993.

[23] G. J. McLachlan, K. E. Basford, Mixture models: Inference and applications to clustering, vol. 38. M. Dekker New York, 1988.

[24] F. M. Lord, Applications of item response theory to practical testing problems. Routledge, 2012.

[25] K. A. Bollen, Structural equations with latent variables. Wiley series in probability and mathematical statistics Applied probability and statistics, New York, NY Chichester Brisbane Toronto Singapore: Wiley, 1989. OCLC: 18834634.

[26] A. Skrondal, S. Rabe-Hesketh, Generalized latent variable modeling: Multilevel, longitudinal, and structural equation models. CRC Press, 2004.

[27] K. G. Jöreskog, “Testing structural equation models,” Sage focus editions, vol. 154, pp. 294–294, 1993.

[28] G. J. Mellenbergh, “Item bias and item response theory,” International Journal of Educational Research, vol. 13, pp. 127–143, Jan. 1989, doi: 10.1016/0883-0355(89)90002-5.

[29] P. W. Holland, H. Wainer, Differential Item Functioning. New York: Routledge, 1993.

[30] N. Schmitt, G. Kuljanin, “Measurement invariance: Review of practice and implications,” Human resource management review, vol. 18, no. 4, pp. 210–222, 2008.

[31] P. Flore, “Stereotype threat and differential item functioning: A critical assessment,” 2018.

[32] J.-B. E. Steenkamp, H. Baumgartner, “Assessing measurement invariance in cross-national consumer research,” Journal of consumer research, vol. 25, no. 1, pp. 78–90, 1998.

[33] R. J. Vandenberg, C. E. Lance, “A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research,” Organizational research methods, vol. 3, no. 1, pp. 4–70, 2000.

[34] B. M. Byrne, Structural equation modeling with Mplus: Basic concepts, applications, and programming. Routledge, 2013.

[35] Y. Rosseel, “lavaan: An R package for structural equation modeling,” Journal of Statistical Software, vol. 48, no. 2, pp. 1–36, 2012.

[36] K. G. Jöreskog, A. S. Goldberger, “Estimation of a model with multiple indicators and multiple causes of a single latent variable,” Journal of the American Statistical Association, vol. 70, no. 351a, pp. 631–639, 1975.

[37] R. Tibshirani, M. Saunders, S. Rosset, J. Zhu, K. Knight, “Sparsity and smoothness via the fused lasso,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 67, no. 1, pp. 91–108, 2005.

[38] T. Hastie, R. Tibshirani, J. Friedman, The elements of statistical learning: data mining, inference, and prediction. Springer Science & Business Media, 2009.

Published

2021-03-01

How to Cite

Boeschoten, L., van Kesteren, E. J., Bagheri, A., and Oberski, D. L. (2021). Achieving Fair Inference Using Error-Prone Outcomes. International Journal of Interactive Multimedia and Artificial Intelligence, 6(5), 9–15. https://doi.org/10.9781/ijimai.2021.02.007