Last edited by Bob Pruzek 13 years, 3 months ago

Welcome to my Propensity Score Analysis wiki!

    Bob Pruzek, the University at Albany SUNY   (rmpruzek@yahoo.com)


This wiki aims to facilitate learning about Propensity Score Analysis (PSA). It provides a repository of links to articles and other sources that can help you learn what PSA is and how it works; it also aims to facilitate teaching about PSA methods, including links and aids for students who want to conduct PSA studies. For a list of articles and related sources, mostly pdfs (with brief descriptions), see: PSAcoursePDF-References09.pdf. I can provide each of these, perhaps on request. (This list will soon be updated.)


Note that in all my own recent teaching and research I have focused on R software, so R has become the focus of my computational approach. Many links below relate to this.


The following documents, and links below, are intended to be most useful as an introduction to PSA.


Since Donald Rubin is perhaps the 'father' of propensity score analysis, and has written many papers on this topic, you might well begin with the first item below to get an overview of what PSA is about. These files can also be found in the Files section of this wiki.

    1.  Rubinpsaexposit.pdf

    2.  BehStatObserv.study.rosenbaum05.pdf

    3.  ConniffeEtAl-PSA-00.pdf  

    4.  psa.applications.graphics.04.pdf

    5.  IntroPropensityScoreAnalysisBP.pdf

    6.  PSAcoursePDF-References09.pdf

    7.  PSAmatchngGuo05WkshopPpt.pdf

    8.  COX.ASPCTS OF CAUSALITY92A.pdf   <-- This and the next entries are notable scholarly articles, often referenced.

    9. holland_JASA_1986.pdf

    10. dRubinDesign-StatsInMedicine06.pdf


In addition, an excellent paper reviewing PSA, especially matching, was recently posted by Elizabeth Stuart (a former student of Don Rubin, and a PSA scholar); see the first paper in particular at: http://www.biostat.jhsph.edu/~estuart/papers.html


A new link pertaining to PSA software, again, with an emphasis on matching, is Stuart's:



Also see http://www-stat.wharton.upenn.edu/~rosenbap/match2control.pdf


Rosenbaum's new book, dated 2010, Design of Observational Studies (which I strongly recommend) is now available. See http://www.springer.com/statistics/statistical+theory+and+methods/book/978-1-4419-1212-1?detailsPage=samplePages


The book Propensity Score Analysis, by Guo and Fraser, was recently published:                      


For a preview of many of the authors' ideas and emphases, see Item 7. above.


Although they are not available for download, I also recommend the basic paper by Rubin (1974) on experiments and observational studies, where the two are discussed for the first time "in parallel"; also, Rubin (1978), while much deeper and more abstract, is an almost essential read for those wanting a comprehensive and in-depth understanding of what came to be known as the Rubin Causal Model (RCM), or the potential outcomes framework.


The link http://www.ncbi.nlm.nih.gov/sites/entrez?db=pmc is also notable; here you can enter a term (e.g. propensity score) and find numerous, often recently published, papers on that topic and, often, on related topics. Full texts are ordinarily available.


For links to seminars, including movies, on software and methods of a wide variety see http://www.ats.ucla.edu/stat/seminars/default.htm  


To see a rather long list of dietary and related issues for which causal claims are relevant go to:


(Click on 'Statistics' to see documents that cite empirical data and interpretations.)


A link that identifies numerous articles related especially to matching and PSA (largely in economics) is:



A related link (especially relevant to basic issues of experimental design) is: 

http://www.amstat.org/publications/jse/v17n1/helmreich.html for an introduction to what you may find to be useful graphics, where one of the examples concerns PSA explicitly (see Fig. 5). A key reason for including (actually for writing) this article, however, is that it contains what I see as especially powerful and largely untried methods for experimentally assessing treatment effects based on one of the oldest ideas in statistics, the simple two dependent sample design. Read the discussion section carefully too.


See http://www.medpagetoday.com/Cardiology/CHF/tb/9210 for an excellent example of the effects/value of blocking.


For help in learning R, see the links at my wiki, epym08learnr.pbwiki.com, as well as the pdf tutorial by Professor Bruce Dudek (Fall, 2008): http://www.albany.edu/psy/bcd/share/learningr/learningR_ver1.2.pdf


For current information and archives pertaining to R and related software applications see: http://r-statistics.com


My most recent post is a pdf of slides based on a recent PowerPoint talk: PSAintroBP.Mar11.ppt.pdf


You can view and download my recent IALSA conference paper on longitudinal applications of PSA here: IALSAtalk.BobP.JuneVictoria.pdf



** The following items relate mostly to class uses; you will (generally) need to have been invited to participate further in this wiki. Write to me at rmpruzek@yahoo.com if you want an invitation.


For Class-related PSA links and questions, go to:  Fall09 Class Links, Questions and Answers 


Here is YanWu's excellent MS project PSA presentation on survival analysis following radiation or surgery:  YanWuMSpptFinal.pdf

  (BTW, this dropbox is quite large now; you may request an invitation if you want access to lots of pdfs related to PSA.)


Regarding the dropbox, Jason Bryer has posted files related to uses of LaTeX and Sweave. Naturally, the more you study the various documents in advance of class, the more you are likely to get from the presentation.


I have posted a second illustration of PSA centered on the Berkeley birthwt data, this time with a different (better) logistic regression (LR) model and one more graphic.
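For readers who have not yet fit a propensity score model, here is a minimal base-R sketch of the general idea: estimate propensity scores with a logistic regression, form strata from the score quintiles, and check whether a covariate is better balanced within strata. It does not use the Berkeley birthwt data; the data are simulated and the names x1, x2, treat and dat are hypothetical, chosen only for illustration.

```r
# Hypothetical simulated data; x1, x2 and treat are illustrative names only.
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
# Treatment assignment depends on the covariates (i.e., selection bias).
treat <- rbinom(n, 1, plogis(0.8 * x1 - 0.5 * x2))
dat <- data.frame(treat, x1, x2)

# Logistic regression model for treatment assignment; the fitted
# probabilities are the estimated propensity scores.
ps.model <- glm(treat ~ x1 + x2, data = dat, family = binomial)
dat$ps <- fitted(ps.model)

# Five strata of roughly equal size from the quintiles of the scores.
breaks <- quantile(dat$ps, probs = seq(0, 1, by = 0.2))
dat$stratum <- cut(dat$ps, breaks = breaks, include.lowest = TRUE,
                   labels = FALSE)

# Treated-minus-control difference in the mean of x1,
# overall and averaged over the five strata.
mean.diff <- function(d) mean(d$x1[d$treat == 1]) - mean(d$x1[d$treat == 0])
raw.diff    <- mean.diff(dat)
within.diff <- mean(sapply(split(dat, dat$stratum), mean.diff))

round(c(raw = raw.diff, within.strata = within.diff), 3)
```

If the propensity model is reasonable, the within-strata difference should be much closer to zero than the raw difference; that is the sense in which stratifying on the score balances the covariates.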


In addition, I suggest you may find it useful to see http://imai.princeton.edu/research/files/matchit.pdf .


Both of the old Rubin papers are in the bDudek.Pdfs subdirectory of PSArelated in your dropbox. (I look forward to drafts of your PSA analyses of the Berkeley birthwt data ASAP!) I've downloaded a file here, and in the PSArelated dropbox folder, called berk.birthwt2.txt. This file contains the birthweights (bwtt) in ounces for all infants. BP (If you want access to my PSArelated dropbox, please write to me.)


I suggest you take a look at http://www.r-project.org/doc/Rnews/Rnews_2007-2.pdf . The optmatch package is generally relevant, and the examples there are especially helpful. This is a long issue of the Rnews periodical, and you should begin to examine it from time to time. Even granova is listed in this edition; but see all the listings near the end. Add this to your reading and you will have enough to do. BP
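To make concrete what matching on a propensity score does, here is a small base-R sketch. Please note that it is not optmatch and not optimal matching: it is a simple greedy 1:1 nearest-neighbor stand-in, without replacement, on the logit of the estimated score. The data are simulated and every name (dat, lps, pairs, etc.) is hypothetical.

```r
# Hypothetical simulated data; all names here are illustrative.
set.seed(3)
n <- 300
x <- rnorm(n)
treat <- rbinom(n, 1, plogis(-1 + x))   # fewer treated units than controls
dat <- data.frame(id = seq_len(n), treat, x)

# Match on the logit of the estimated propensity score
# (predict() on a glm returns the linear predictor by default).
dat$lps <- predict(glm(treat ~ x, data = dat, family = binomial))

treated  <- dat[dat$treat == 1, ]
controls <- dat[dat$treat == 0, ]
match.id <- rep(NA_integer_, nrow(treated))

# Greedy 1:1 matching without replacement: each treated unit takes the
# closest control that has not already been used.
for (i in seq_len(nrow(treated))) {
  j <- which.min(abs(controls$lps - treated$lps[i]))
  match.id[i] <- controls$id[j]
  controls <- controls[-j, ]
}
pairs <- data.frame(treated = treated$id, control = match.id)
head(pairs)
```

A greedy pass like this depends on the order in which treated units are processed; optimal matching, as discussed in the Rnews issue, instead minimizes the total distance over all pairs.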


Go to http://www.jstatsoft.org/v29/i06 to see our PSAgraphics article (March '09 issue of the J. of Statistical Software). This article concerns our R package PSAgraphics, now v. 2.0, which has recently been updated.


For students interested in reading more about ancova, which must be distinguished from PSA adjustment, see http://web.uccs.edu/lbecker/Psy590/ancova2.htm
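A brief base-R sketch may help show the distinction. ANCOVA models the outcome, with the covariate entering as a linear adjustment; a PSA adjustment first models treatment assignment (the outcome plays no role in that step) and only then compares outcomes among units with similar scores. The data below are simulated with a true treatment effect of 1, and all names are hypothetical.

```r
# Hypothetical simulated data; the true treatment effect is set to 1.
set.seed(2)
n <- 400
x <- rnorm(n)
treat <- rbinom(n, 1, plogis(x))
y <- 1 * treat + 2 * x + rnorm(n)

# ANCOVA: model the OUTCOME, adjusting linearly for the covariate.
ancova.effect <- unname(coef(lm(y ~ treat + x))["treat"])

# PSA, step 1: model TREATMENT ASSIGNMENT; y is not used here.
ps <- fitted(glm(treat ~ x, family = binomial))
stratum <- cut(ps, quantile(ps, probs = seq(0, 1, by = 0.2)),
               include.lowest = TRUE, labels = FALSE)

# PSA, step 2: compare outcomes within strata of similar propensity,
# here by treating the stratum as a blocking factor.
psa.effect <- unname(coef(lm(y ~ treat + factor(stratum)))["treat"])

round(c(ancova = ancova.effect, psa.strata = psa.effect), 3)
```

In this toy setup the ANCOVA model happens to be correctly specified, so it does well; the point of the sketch is only where the modeling effort goes, not which method is better in general.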


The following URL provides links to several recent and draft papers authored (or coauthored) by Gary King, Harvard. He is an author of R package MatchIt and has contributed wide-ranging articles to statistical science, especially related to counterfactuals, matching, ways of conceptualizing/doing social science research, etc. King is concerned about both theory and practice so it will often be helpful to read his work.   gking.harvard.edu/preprints.shtml







Comments (6)

harryxkn@... said

at 11:20 am on Oct 24, 2008

A quick solution to plotting rpart results: draw.tree function from maptree package. For example:
> library(rpart); library(maptree)  # rpart() fits the tree; draw.tree() plots it
> rpt.lalonde = rpart(formula = treat ~ re74 + re75 + u74 + u75 + educ + black + hispan + age, data = lalonde, control = rpart.control(cp = .015))
> draw.tree(rpt.lalonde)

bob pruzek said

at 2:44 pm on Oct 28, 2008

Note that numerous documents can be found on G. King's various sites at the link, here especially for package MatchIt:

bob pruzek said

at 2:48 pm on Oct 28, 2008

Note that I have added a new pdf to the files on this wiki (be sure to get it): GBMandBoasting04-Ridgeway.pdf

bob pruzek said

at 12:54 am on Oct 29, 2008

Also, see the new pdf: UsingMethodgbm4p-scoreEstim.pdf among new files.

Yi Sun said

at 3:42 pm on Oct 30, 2008

Hi, Dr Pruzek & Harry.
When I followed the "UsingMethodgbm4p-scoreEstim.pdf" , I couldn't get "ps.obj=ps(formula = treat ~ re75 + u74 + u75 + educ + black + re74 + hispan + age, data = lalnd)" to run,
the error information: "Diagnosis of unweighted analysis
Error in `[.data.frame`(data, , c(treat.var, var.names)) :
undefined columns selected" was always shown.
How can I deal with that? Thanks

bob pruzek said

at 10:04 pm on Oct 30, 2008

Yi, Please bring it along tomorrow. That's the only reasonable way at this point. BP
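A note for later readers who meet the same "undefined columns selected" message: one common cause is that the formula names a variable that is not a column of the data frame. Whether that was the cause in this case is an assumption; the sketch below only shows how one might check. The small lalnd data frame here is a hypothetical stand-in that deliberately lacks u74 and u75.

```r
# Hypothetical stand-in for the data frame in the question; it
# deliberately lacks the columns u74 and u75.
lalnd <- data.frame(treat = c(1, 0), re74 = c(0, 10), re75 = c(0, 5),
                    educ = c(10, 12), black = c(1, 0), hispan = c(0, 0),
                    age = c(25, 30))

ps.formula <- treat ~ re75 + u74 + u75 + educ + black + re74 + hispan + age

# Variables named in the formula that are not columns of the data frame.
missing.vars <- setdiff(all.vars(ps.formula), names(lalnd))
missing.vars
```

If missing.vars is empty, the column names are not the problem and the cause lies elsewhere.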
