Bayesian Statistics (4th ed)




  Contents

  Cover

  Title Page

  Copyright

  Dedication

  Preface

  Preface to the First Edition

  Chapter 1: Preliminaries

  1.1 Probability and Bayes’ Theorem

  1.2 Examples on Bayes’ Theorem

  1.3 Random variables

  1.4 Several random variables

  1.5 Means and variances

  1.6 Exercises on Chapter 1

  Chapter 2: Bayesian inference for the normal distribution

  2.1 Nature of Bayesian inference

  2.2 Normal prior and likelihood

  2.3 Several normal observations with a normal prior

  2.4 Dominant likelihoods

  2.5 Locally uniform priors

  2.6 Highest density regions

  2.7 Normal variance

  2.8 HDRs for the normal variance

  2.9 The role of sufficiency

  2.10 Conjugate prior distributions

  2.11 The exponential family

  2.12 Normal mean and variance both unknown

  2.13 Conjugate joint prior for the normal distribution

  2.14 Exercises on Chapter 2

  Chapter 3: Some other common distributions

  3.1 The binomial distribution

  3.2 Reference prior for the binomial likelihood

  3.3 Jeffreys’ rule

  3.4 The Poisson distribution

  3.5 The uniform distribution

  3.6 Reference prior for the uniform distribution

  3.7 The tramcar problem

  3.8 The first digit problem; invariant priors

  3.9 The circular normal distribution

  3.10 Approximations based on the likelihood

  3.11 Reference posterior distributions

  3.12 Exercises on Chapter 3

  Chapter 4: Hypothesis testing

  4.1 Hypothesis testing

  4.2 One-sided hypothesis tests

  4.3 Lindley’s method

  4.4 Point (or sharp) null hypotheses with prior information

  4.5 Point null hypotheses for the normal distribution

  4.6 The Doogian philosophy

  4.7 Exercises on Chapter 4

  Chapter 5: Two-sample problems

  5.1 Two-sample problems – both variances unknown

  5.2 Variances unknown but equal

  5.3 Variances unknown and unequal (Behrens–Fisher problem)

  5.4 The Behrens–Fisher controversy

  5.5 Inferences concerning a variance ratio

  5.6 Comparison of two proportions; the $2\times 2$ table

  5.7 Exercises on Chapter 5

  Chapter 6: Correlation, regression and the analysis of variance

  6.1 Theory of the correlation coefficient

  6.2 Examples on the use of the correlation coefficient

  6.3 Regression and the bivariate normal model

  6.4 Conjugate prior for the bivariate regression model

  6.5 Comparison of several means – the one way model

  6.6 The two way layout

  6.7 The general linear model

  6.8 Exercises on Chapter 6

  Chapter 7: Other topics

  7.1 The likelihood principle

  7.2 The stopping rule principle

  7.3 Informative stopping rules

  7.4 The likelihood principle and reference priors

  7.5 Bayesian decision theory

  7.6 Bayes linear methods

  7.7 Decision theory and hypothesis testing

  7.8 Empirical Bayes methods

  7.9 Exercises on Chapter 7

  Chapter 8: Hierarchical models

  8.1 The idea of a hierarchical model

  8.2 The hierarchical normal model

  8.3 The baseball example

  8.4 The Stein estimator

  8.5 Bayesian analysis for an unknown overall mean

  8.6 The general linear model revisited

  8.7 Exercises on Chapter 8

  Chapter 9: The Gibbs sampler and other numerical methods

  9.1 Introduction to numerical methods

  9.2 The EM algorithm

  9.3 Data augmentation by Monte Carlo

  9.4 The Gibbs sampler

  9.5 Rejection sampling

  9.6 The Metropolis–Hastings algorithm

  9.7 Introduction to WinBUGS and OpenBUGS

  9.8 Generalized linear models

  9.9 Exercises on Chapter 9

  Chapter 10: Some approximate methods

  10.1 Bayesian importance sampling

  10.2 Variational Bayesian methods: simple case

  10.3 Variational Bayesian methods: general case

  10.4 ABC: Approximate Bayesian Computation

  10.5 Reversible jump Markov chain Monte Carlo

  10.6 Exercises on Chapter 10

  Appendix A: Common statistical distributions

  A.1 Normal distribution

  A.2 Chi-squared distribution

  A.3 Normal approximation to chi-squared

  A.4 Gamma distribution

  A.5 Inverse chi-squared distribution

  A.6 Inverse chi distribution

  A.7 Log chi-squared distribution

  A.8 Student’s t distribution

  A.9 Normal/chi-squared distribution

  A.10 Beta distribution

  A.11 Binomial distribution

  A.12 Poisson distribution

  A.13 Negative binomial distribution

  A.14 Hypergeometric distribution

  A.15 Uniform distribution

  A.16 Pareto distribution

  A.17 Circular normal distribution

  A.18 Behrens’ distribution

  A.19 Snedecor’s F distribution

  A.20 Fisher’s z distribution

  A.21 Cauchy distribution

  A.22 The probability that one beta variable is greater than another

  A.23 Bivariate normal distribution

  A.24 Multivariate normal distribution

  A.25 Distribution of the correlation coefficient

  Appendix B: Tables

  Appendix C: R programs

  Appendix D: Further reading

  D.1 Robustness

  D.2 Nonparametric methods

  D.3 Multivariate estimation

  D.4 Time series and forecasting

  D.5 Sequential methods

  D.6 Numerical methods

  D.7 Bayesian networks

  D.8 General reading

  References

  Index

  This edition first published 2012

  © 2012 John Wiley and Sons Ltd

  Registered office

  John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

  For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.

  The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

  All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

  Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

  Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

  Library of Congress Cataloging-in-Publication Data

  Lee, Peter M.

  Bayesian statistics : an introduction / Peter M. Lee. – 4th ed.

  Includes bibliographical references and index.

  ISBN 978-1-118-33257-3 (pbk.)

  1. Bayesian statistical decision theory. I. Title.

  QA279.5.L44 2012

  519.5′42–dc23

  2012007007

  A catalogue record for this book is available from the British Library.

  ISBN: 9781118332573

  To The Memory of My Mother and of My Father

  Preface

  When I started writing this book in 1987 it never occurred to me that it would still be of interest nearly a quarter of a century later, but it appears that it is, and I am delighted to introduce a fourth edition. The subject moves ever onwards, with increasing emphasis on Monte-Carlo based techniques. With this in mind, Chapter 9 entitled ‘The Gibbs sampler’ has been considerably extended (including more numerical examples and treatments of OpenBUGS, R2WinBUGS and R2OpenBUGS) and a new Chapter 10 covering Bayesian importance sampling, variational Bayes, ABC (Approximate Bayesian Computation) and RJMCMC (Reversible Jump Markov Chain Monte Carlo) has been added. Mistakes and misprints in the third edition have been corrected and minor alterations made throughout.

  The basic idea of using Bayesian methods has become more and more popular, and a useful accessible account for the layman has been written by McGrayne (2011). There is every reason to believe that an approach to statistics which I began teaching in 1985 with some misgivings because of its unfashionability will continue to gain adherents. The fact is that the Bayesian approach produces results in a comprehensible form and with modern computational methods produces them quickly and easily.

  Useful comments for which I am grateful were received from John Burkett, Stephen Connor, Jacco Thijssen, Bo Wang and others; they, of course, have no responsibility for any deficiencies in the end result.

  The website associated with the book

  http://www-users.york.ac.uk/~pml1/bayes/book.htm

  (note that in the above pml are letters followed by the digit 1) works through all the numerical examples as well as giving solutions to all the exercises in the book (and some further exercises to which the solutions are not given).

  Peter M. Lee

  19 December 2011

  Preface to the First Edition

  When I first learned a little statistics, I felt confused, and others I spoke to confessed that they had similar feelings. Not because the mathematics was difficult – most of that was a lot easier than pure mathematics – but because I found it difficult to follow the logic by which inferences were arrived at from data. It sounded as if the statement that a null hypothesis was rejected at the 5% level meant that there was only a 5% chance of that hypothesis being true, and yet the books warned me that this was not a permissible interpretation. Similarly, the statement that a 95% confidence interval for an unknown parameter ran from −2 to +2 sounded as if the parameter lay in that interval with 95% probability and yet I was warned that all I could say was that if I carried out similar procedures time after time then the unknown parameters would lie in the confidence intervals I constructed 95% of the time. It appeared that the books I looked at were not answering the questions that would naturally occur to a beginner, and that instead they answered some rather recondite questions which no one was likely to want to ask.
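
  The repeated-sampling reading of a confidence interval described above can be checked by simulation. The following is a minimal sketch, not taken from the book: it assumes normal observations with a known standard deviation, and the function name `coverage`, the sample size, the seed and the number of repetitions are all illustrative choices.

```python
import random
from math import sqrt


def coverage(true_mean, sigma, n, repetitions, seed=0):
    """Fraction of 95% intervals (known sigma) that contain the true mean."""
    rng = random.Random(seed)               # fixed seed so the run is repeatable
    half_width = 1.96 * sigma / sqrt(n)     # 1.96 is the usual 97.5% normal quantile
    hits = 0
    for _ in range(repetitions):
        # Draw a fresh sample and form the interval sample_mean +/- half_width.
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / repetitions


# Hypothetical settings: true mean 0, standard deviation 1, samples of size 25.
rate = coverage(true_mean=0.0, sigma=1.0, n=25, repetitions=10000)
print(f"fraction of intervals containing the true mean: {rate:.3f}")
```

  The printed fraction should be close to 0.95. What varies from run to run is the interval, not the parameter, which is why the statement is about the procedure rather than about any one interval.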

  Subsequently, I discovered that the whole theory had been worked out in very considerable detail in such books as Lehmann (1986). But attempts such as those that Lehmann describes to put everything on a firm foundation raised even more questions. I gathered that the usual t test could be justified as a procedure that was ‘uniformly most powerful unbiased’, but I could only marvel at the ingenuity that led to the invention of such criteria for the justification of the procedure, while remaining unconvinced that they had anything sensible to say about a general theory of statistical inference. Of course Lehmann and others with an equal degree of common sense were capable of developing more and more complicated constructions and exceptions so as to build up a theory that appeared to cover most problems without doing anything obviously silly, and yet the whole enterprise seemed reminiscent of the construction of epicycle upon epicycle in order to preserve a theory of planetary motion based on circular motion; there seemed to be an awful lot of ‘adhockery’.

  I was told that there was another theory of statistical inference, based ultimately on the work of the Rev. Thomas Bayes, a Presbyterian minister, who lived from 1702 to 1761 and whose key paper was published posthumously by his friend Richard Price as Bayes (1763) [more information about Bayes himself and his work can be found in Holland (1962), Todhunter (1865, 1949) and Stigler (1986a)]. However, I was warned that there was something not quite proper about this theory, because it depended on your personal beliefs and so was not objective. More precisely, it depended on taking some expression of your beliefs about an unknown quantity before the data was available (your ‘prior probabilities’) and modifying them in the light of the data (via the so-called ‘likelihood function’) to arrive at your ‘posterior probabilities’ using the formulation that ‘posterior is proportional to prior times likelihood’. The standard, or ‘classical’, theory of statistical inference, on the other hand, was said to be objective, because it does not refer to anything corresponding to the Bayesian notion of ‘prior beliefs’. Of course, the fact that in this theory, you sometimes looked for a 5% significance test and sometimes for a 0.1% significance test, depending on what you thought about the different situations involved, was said to be quite a different matter.
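
  The rule that ‘posterior is proportional to prior times likelihood’ can be illustrated with a small discrete calculation. This is a sketch under assumed values rather than an example from the book: the three candidate values of the unknown quantity, the equal prior probabilities and the data (7 heads in 10 tosses) are all hypothetical, as is the helper name `posterior`.

```python
from math import comb


def posterior(prior, likelihood):
    """Multiply prior by likelihood pointwise, then rescale to sum to 1."""
    unnormalised = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Hypothetical setup: the chance of heads, theta, is one of three values.
thetas = [0.25, 0.5, 0.75]
prior = [1 / 3, 1 / 3, 1 / 3]    # equal prior beliefs (an assumption)

# Hypothetical data: 7 heads in 10 tosses, with a binomial likelihood.
heads, tosses = 7, 10
likelihood = [
    comb(tosses, heads) * t**heads * (1 - t) ** (tosses - heads) for t in thetas
]

post = posterior(prior, likelihood)
for t, p in zip(thetas, post):
    print(f"theta = {t:.2f}: posterior probability {p:.3f}")
```

  The division by the total is the only step beyond multiplying prior by likelihood; it supplies the constant of proportionality so that the posterior probabilities sum to one.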

  I went on to discover that this theory could lead to the sorts of conclusions that I had naïvely expected to get from statistics when I first learned about it. Indeed, some lecture notes of Lindley’s [and subsequently his book, Lindley (1965)] and the pioneering book by Jeffreys (1961) showed that if the statistician had ‘personal probabilities’ that were of a certain conventional type then conclusions very like those in the elementary books I had first looked at could be arrived at, with the difference that a 95% confidence interval really did mean an interval in which the statistician was justified in thinking that there was a 95% probability of finding the unknown parameter. On the other hand, there was the further freedom to adopt other initial choices of personal beliefs and thus to arrive at different conclusions.

  Over a number of years I taught the standard, classical, theory of statistics to a large number of students, most of whom appeared to have similar difficulties to those I had myself encountered in understanding the nature of the conclusions that this theory comes to. However, the mere fact that students have difficulty with a theory does not prove it wrong. More importantly, I found the theory did not improve with better acquaintance, and I went on studying Bayesian theory. It turned out that there were real differences in the conclusions arrived at by classical and Bayesian statisticians, and so the former was not just a special case of the latter corresponding to a conventional choice of prior beliefs. On the contrary, there was a strong disagreement between statisticians as to the conclusions to be arrived at in certain standard situations, of which I will cite three examples for now. One concerns a test of a sharp null hypothesis (e.g. a test that the mean of a distribution is exactly equal to zero), especially when the sample size was large. A second concerns the Behrens–Fisher problem, that is, the inferences that can be made about the difference between the means of two populations when no assumption is made about their variances. Another is the likelihood principle, which asserts that you can only take account of the probability of events that have actually occurred under various hypotheses, and not of events that might have happened but did not; this principle follows from Bayesian statistics and is contradicted by the classical theory. A particular case concerns the relevance of stopping rules, that is to say whether or not you are entitled to take into account the fact that the experimenter decided when to stop experimenting depending on the results so far available rather than having decided to use a fixed sample size all along. The more I thought about all these controversies, the more I was convinced that the Bayesians were right on these disputed issues.
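
  The point about stopping rules can be made concrete with a small sketch, which is an illustration under assumed values and not an example from the book. Suppose 9 heads and 3 tails are observed. Under one hypothetical design the number of tosses was fixed at 12 in advance; under another, tossing continued until the third tail. The two likelihoods differ only by a constant factor, so a Bayesian calculation with the same prior gives the same posterior under either design. The grid of values, the uniform prior and all names below are assumptions made for the illustration.

```python
from math import comb


def normalise(weights):
    """Rescale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


# Grid of candidate values for theta (chance of heads) with a uniform prior.
grid = [i / 100 for i in range(1, 100)]
prior = [1 / len(grid)] * len(grid)

heads, tails = 9, 3
n = heads + tails

# Design 1: n = 12 tosses fixed in advance (binomial likelihood).
fixed_n = [comb(n, heads) * t**heads * (1 - t) ** tails for t in grid]

# Design 2: toss until the third tail (negative binomial likelihood);
# the last toss must be a tail, so only the first n - 1 tosses can be arranged.
stop_at_third_tail = [comb(n - 1, heads) * t**heads * (1 - t) ** tails for t in grid]

post_fixed = normalise([p * l for p, l in zip(prior, fixed_n)])
post_stop = normalise([p * l for p, l in zip(prior, stop_at_third_tail)])

max_gap = max(abs(a - b) for a, b in zip(post_fixed, post_stop))
print(f"largest difference between the two posteriors: {max_gap:.2e}")
```

  The difference is zero up to rounding error, because the constant factors cancel when the posterior is normalised. A classical significance test, by contrast, depends on the outcomes that might have occurred under each design and so can treat the two cases differently.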

  At long last, I decided to teach a third-year course on Bayesian statistics in the University of York, which I have now done for a few years. Most of the students who took the course did find the theory more coherent than the classical theory they had learned in the first course on mathematical statistics they had taken in their second year, and I became yet more clear in my own mind that this was the right way to view statistics. I do, however, admit that there are topics (such as non-parametric statistics) which are difficult to fit into a Bayesian framework.

  A particular difficulty in teaching this course was the absence of a suitable book for students who were reasonably well prepared mathematically and already knew some statistics, even if they knew nothing of Bayes apart from Bayes’ theorem. I wanted to teach them more, and to give more information about the incorporation of real as opposed to conventional prior information, than they could get from Lindley (1965), but I did not think they were well enough prepared to face books like Box and Tiao (1973) or Berger (1985), and so I found that in teaching the course I had to get together material from a large number of sources, and in the end found myself writing this book. It seems less and less likely that students in mathematics departments will be completely unfamiliar with the ideas of statistics, and yet they are not (so far) likely to have encountered Bayesian methods in their first course on statistics, and this book is designed with these facts in mind. It is assumed that the reader has a knowledge of calculus of one and two variables and a fair degree of mathematical maturity, but most of the book does not assume a knowledge of linear algebra. The development of the text is self-contained, but from time to time the contrast between Bayesian and classical conclusions is pointed out, and it is supposed that in most cases the reader will have some idea as to the conclusion that a classical statistician would come to, although no very detailed knowledge of classical statistics is expected. It should be possible to use the book as a course text for final year undergraduate or beginning graduate students or for self-study for those who want a concise account of the way in which the Bayesian approach to statistics develops and the contrast between it and the conventional approach. The theory is built up step by step, rather than doing everything in the greatest generality to start with, and important notions such as sufficiency are brought out of a discussion of the salient features of specific examples.