A Practical Guide to Common Ecological Analyses in R

Note: This page is a first attempt to illustrate some R procedures of relevance to ecological analysis. These may not be the most efficient or effective ways of doing things, but they did work for me. Suggestions for modifications, corrections and additions are welcome. Comments in square brackets indicate intended additions.

Revision History:
17 Feb 2000: removed comparison of un-nestable regression models.
05 Feb 2000: First draft.


Contents:
What is R?
R Resources
Helpful Hints
Conventions In This Document
Starting R
Reading data into R
Correlation analysis
Scatterplot matrices
Regression Lines in R
Comparing Regression Lines in R
Adding R Libraries
Using the Leaps Library
Using the Xgobi Interface

What is R?

R is probably the most awesome statistics package going, if there is such a thing as an awesome stats program. It can do things that SAS cannot, it's far more flexible than most stats packages AND it's free to boot. It links to databases, runs batch programs well, and has links to visualization software. Have a look at the R Project web site for more info.


R Resources

Most active open source software projects have copious amounts of documentation. R is no exception. There is the R-FAQ as well as a file called R-notes.ps which details much of the basic R functionality. The program also has considerable online help that you can access using the help.start() command. Venables and Ripley have a good book on the use of S, S+ and R; I had no trouble finding it in the local university library. And finally, there is also the R-help mailing list. Subscribe, and read. If at all possible, go back through the archives to make sure that your question hasn't been asked before.


Helpful Hints

If you're running a recent Linux machine, then chances are that you already have all of the useful tools that you could possibly want. These include gv or Ghostview (a viewer for postscript files) and ps2pdf (a utility to convert postscript files to PDF so that you can share your results with those using Windows).

If you're one of these Windows-using souls, you'll want to find GSview or something similar to read postscript files. If you can afford the Adobe Acrobat program, then by all means use that to convert your graphs to a format that is easier for Windows users to read. And seriously consider using that old P100 clunker in the lab to run Linux, a window manager like fvwm, and R. If your lab is anything like the lab I was in, you can't really afford the latest and greatest hardware. Fortunately, Linux runs quite happily on hardware that isn't the latest and greatest, so there is no excuse not to benefit from the cornucopia of free, quality, open source research software that the Linux world has to offer.

Oh, one last thing. R is developing quickly, and bugs in earlier versions are usually found and corrected quickly. It makes sense, therefore, to ensure that the version of R you are running is as up to date as possible; you can check what the current version is via the R-help mailing list or on the main R site. If you keep using R, you will quickly realize how easy it is to compile and install R and the relevant R libraries.


Conventions In This Document

Bolding is used in the normal text to emphasize certain points. Text in a fixed-width font indicates output from the shell or from R. Bold fixed-width text indicates commands that I typed in as a user.


Starting R

What you see below is me starting R from my home directory on my Linux machine, using the R command. It should be quite similar for Windows users. Notice, however, that this is happening at the command line, and most of R works at this level. In my experience, many Windows users who 'just want to use the software' feel uncomfortable with a command line interface. I assure you, however, that the road (however long) is one worth travelling.

[pete@esox pete]$ R

R : Copyright 2000, The R Development Core Team
Version 0.99.0  (February 7, 2000)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type    "?license" or "?licence" for distribution details.

R is a collaborative project with many contributors.
Type    "?contributors" for a list.

Type    "demo()" for some demos, "help()" for on-line help, or
        "help.start()" for a HTML browser interface to help.
Type    "q()" to quit R.

[Previously saved workspace restored]

>

The 'greater than' arrow is the R prompt. Here you can issue interactive commands to R. Alternatively, you can send job files to R in batch mode...

[pete@esox pete]$ R BATCH job1.R job1.Rout

This is nice because not only will R give you the results, it will also put them below the relevant line of your batch file. I can think of a few reasons to do this off the top of my head:

  1. You are carrying out a lot of analyses, repetitive or not, and you would rather build a simple text file containing all of the necessary steps than sit at a terminal and enter them one by one. You can later reuse this file with your trusty text editor, so you don't have to keep re-typing the commands at the prompt.
  2. Your lab machine (say a P100) is needed during the day by other users, and running a large R analysis job would unfairly monopolize it. You could issue an at command to run the batch job sometime in the middle of the night, when nobody is using the machine.
  3. The Linux / UNIX machine running R is in another lab or elsewhere, so you can send the job to another machine, perhaps one better suited to this use than your desktop.
  4. R analyses can be called in batch mode from other programs, so large-scale analysis is possible, even practical, between systems like the GRASS GIS and R, or R and the POSTGRES database system.

Sure, the last two are unusual, to say the least, but as your analysis needs change, R is extremely flexible and can be used in even the most challenging analytical scenarios.
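
A job file is just a plain text file of R commands, one per line, written exactly as you would type them at the prompt. A minimal sketch of what a file like job1.R might contain (reusing the data files and variables that appear later on this page):

# job1.R -- read in the data and fit a simple regression in batch mode
y <- read.table("rev-work-y3.txt", header=T)
x <- read.table("rev-work-x3.txt", header=T)
summary(lm(y$LAHOD ~ x$LHypoVSA))

To run a job overnight, as in the second point above, you could hand the same batch command to the at scheduler, for example:

[pete@esox pete]$ echo "R BATCH job1.R job1.Rout" | at midnight
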

Reading data into R

There are a lot of spreadsheet formats out there. Fortunately, R can read the format that is easiest for all spreadsheets to produce: comma-separated values (CSV in Excel-speak, ASCII "comma delimited" in Quattro-land). And yes, you *can* have a title line in the first row of the data file (and I STRONGLY encourage you to do just that).

Here's how I read in two comma-delimited files with headers on the top row from the R prompt:

> y <- read.table("rev-work-y3.txt", header=T)
> x <- read.table("rev-work-x3.txt", header=T)
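
Note that read.table splits fields on whitespace by default; if your file really does use commas as the separator, you may need to say so explicitly with the sep argument (the file name here is just an example):

> d <- read.table("mydata.csv", header=T, sep=",")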

This normally needs to be repeated every time you start R. However, if you are going to be using the same data frequently, you can have the data held in memory by using the attach function.

> attach(y)
> attach(x)
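
Once a data frame is attached, its columns can also be referred to directly by name, without the x$ or y$ prefix. For example (mean is just one of many functions you might apply):

> mean(LTP)       # works after attach(x); otherwise you would write mean(x$LTP)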

Once this is done, you can keep a bunch of objects (datasets, functions, etc.) in memory, and they will be there every time you resume R, provided you save the workspace when you quit. The objects function shows you which objects are in memory.

> objects()
 [1] "cor.prob"     "corr"         "cpx"          "cpy"          "lahod4csub"
 [6] "lahodlm"      "lahodsub"     "lahodtst"     "last.warning" "lvod4csub"
[11] "lvodsub"      "nspred"       "nsresp"       "nsresp2"      "pred"
[16] "predtp"       "resp"         "rslahodlm"    "sublahod"     "sublahod4c"
[21] "sublvod"      "sublvod4c"    "temp"         "x"            "y"

In any case, it is a good idea to check that your data did in fact import properly, using the str function. Here's an example of what your imported data could look like...

> str(y)
`data.frame':   93 obs. of  4 variables:
 $ LAHOD  : num  -0.869 -1.254 -1.220 -0.820 -1.161 ...
 $ LAHOD4C: num  -1.011 -1.354 -1.405 -0.925 -1.265 ...
 $ LVOD   : num  -1.32 -1.64 -1.77 -1.45 -1.79 ...
 $ LVOD4C : num  -1.47 -1.74 -1.95 -1.55 -1.89 ...

Well, that looked good. All of the data in that dataset is numeric, and R sees it that way. Now I check the second dataset...

> str(x)
`data.frame':   93 obs. of  20 variables:
 $ LDOC      : num  0.613 0.672 0.556 0.906 0.958 ...
 $ LTP       : num  0.751 0.924 0.791 1.078 1.139 ...
 $ LChlA     : num  0.232 0.211 0.084 0.611 0.321 ...
 $ LZmax     : num  1.20 1.20 1.20 1.30 1.30 ...
 $ LDepth    : num  0.954 1.000 0.845 0.778 0.845 ...
 $ LDSAm     : num  6.13 6.13 6.13 6.79 6.79 ...
 $ LLSAm     : num  5.58 5.58 5.58 5.79 5.79 ...
 $ LCALA     : num  0.553 0.553 0.553 0.999 0.999 ...
 $ LQS1      : num  0.400 0.400 0.400 0.781 0.781 ...
 $ LRETENT1  : num  0.342 0.342 0.342 0.405 0.405 ...
 $ LHAvT     : num  0.941 0.865 1.007 0.874 0.872 ...
 $ LLakeVol  : num  6.36 6.36 6.36 6.46 6.46 ...
 $ LLakeVSA  : num  0.778 0.778 0.778 0.668 0.668 ...
 $ LHypoVol  : num  5.49 5.32 5.76 5.90 5.79 ...
 $ LHypoVSA  : num  0.455 0.389 0.549 0.625 0.629 ...
 $ LHypoThick: num  0.875 0.813 0.978 1.161 1.130 ...
 $ LVHVT     : num  -0.872 -1.035 -0.600 -0.565 -0.666 ...
 $ LWRT      : num  -0.293 -0.293 -0.293 -0.740 -0.740 ...
 $ LVTCA     : num   0.227  0.227  0.227 -0.331 -0.331 ...
 $ LVHCA     : num  -0.644 -0.807 -0.372 -0.896 -0.997 ...

Notice here that both datasets are 93 records long. I decided to separate my response variables (y) from my predictor variables (x) to make some of my tests easier.

IMPORTANT: Keep in mind that if you are writing batch jobs, you will have to put the read.table statements at the beginning of your job for R to work properly (yes, even if the data files are ALREADY attached).

To remove an item from the objects() list, use the rm function.
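
For example, to drop the temp object seen in the listing above:

> rm(temp)        # remove the object called "temp" from the workspace
> objects()       # "temp" no longer appears in the list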


Correlation analysis

The correlation matrix is frequently one of the first steps in ecological data exploration. We use the cor function to get the matrix of correlation coefficients (r values).

> cor(x)
                   LDOC          LTP       LChlA        LZmax      LDepth
LDOC        1.000000000  0.611361871  0.33407957 -0.260539456 -0.48375717
LTP         0.611361871  1.000000000  0.66288872 -0.201233999 -0.38236143
LChlA       0.334079570  0.662888724  1.00000000  0.045514262 -0.12229689
LZmax      -0.260539456 -0.201233999  0.04551426  1.000000000  0.26827900
LDepth     -0.483757167 -0.382361430 -0.12229689  0.268279000  1.00000000
LDSAm       0.449771802  0.351250434  0.24425539  0.009913158 -0.13082742
LLSAm      -0.120539033 -0.236122925 -0.15651418  0.198032581  0.14629193
LCALA       0.631619617  0.603163940  0.41348646 -0.141413134 -0.26981497
LQS1        0.621806457  0.603657096  0.41216664 -0.130307407 -0.25499195
LRETENT1    0.405044612  0.477506809  0.29981974 -0.029226060 -0.08223476
LHAvT       0.151358019  0.098025379 -0.18985805 -0.661892705 -0.32659125
LLakeVol   -0.294367536 -0.338707562 -0.13812244  0.614303072  0.26977751
LLakeVSA   -0.356056915 -0.205663884  0.06622402  0.867229544  0.29735711
LHypoVol   -0.143434460 -0.091133859  0.05478524  0.754926805 -0.11473921
LHypoVSA   -0.131927106 -0.005798852  0.16115870  0.859036308 -0.04839242
LHypoThick -0.149881840 -0.074237638  0.10047769  0.896063753 -0.08577070
LVHVT      -0.002804946  0.100735371  0.17042304  0.646741955 -0.34374559
LWRT       -0.631619617 -0.603163941 -0.41348647  0.141413134  0.26981497
LVTCA      -0.700468534 -0.638000303 -0.36222809  0.503506714  0.35769036
LVHCA      -0.402204724 -0.293345537 -0.08653418  0.744707852 -0.03853742

For my needs, though, I needed more than this: I also needed to know the probability that each of these r values is significantly different from 0. Well, I asked, and lo, I received. Open source works, and the proof is that when I sent a request to the R-help list asking how to get these probabilities, someone replied quickly. Dr. Venables' reply is a great explanation of not only how to calculate these probabilities, but also gives the R code needed to make it work:

cor.prob <- function(X, dfr = nrow(X) - 2) {
         R <- cor(X)                         # correlation matrix
         above <- row(R) < col(R)            # logical index of the upper triangle
         r2 <- R[above]^2                    # squared correlations
         Fstat <- r2 * dfr / (1 - r2)        # F statistic for each correlation
         R[above] <- 1 - pf(Fstat, 1, dfr)   # replace upper triangle with p-values
         R
}

All you have to do is to enter this, line by line, at the R command prompt (the > character). You can then call it much like any other function.

> cor.prob(x)
                   LDOC           LTP         LChlA        LZmax        LDepth
LDOC        1.000000000  7.568723e-11  1.065100e-03  0.011659249  9.001497e-07
LTP         0.611361871  1.000000e+00  4.539702e-13  0.053086103  1.554049e-04
LChlA       0.334079570  6.628887e-01  1.000000e+00  0.664860799  2.428764e-01
LZmax      -0.260539456 -2.012340e-01  4.551426e-02  1.000000000  9.321101e-03
LDepth     -0.483757167 -3.823614e-01 -1.222969e-01  0.268279000  1.000000e+00
LDSAm       0.449771802  3.512504e-01  2.442554e-01  0.009913158 -1.308274e-01
LLSAm      -0.120539033 -2.361229e-01 -1.565142e-01  0.198032581  1.462919e-01
LCALA       0.631619617  6.031639e-01  4.134865e-01 -0.141413134 -2.698150e-01
LQS1        0.621806457  6.036571e-01  4.121666e-01 -0.130307407 -2.549920e-01
LRETENT1    0.405044612  4.775068e-01  2.998197e-01 -0.029226060 -8.223476e-02
LHAvT       0.151358019  9.802538e-02 -1.898580e-01 -0.661892705 -3.265912e-01
LLakeVol   -0.294367536 -3.387076e-01 -1.381224e-01  0.614303072  2.697775e-01
LLakeVSA   -0.356056915 -2.056639e-01  6.622402e-02  0.867229544  2.973571e-01
LHypoVol   -0.143434460 -9.113386e-02  5.478524e-02  0.754926805 -1.147392e-01
LHypoVSA   -0.131927106 -5.798852e-03  1.611587e-01  0.859036308 -4.839242e-02
LHypoThick -0.149881840 -7.423764e-02  1.004777e-01  0.896063753 -8.577070e-02
LVHVT      -0.002804946  1.007354e-01  1.704230e-01  0.646741955 -3.437456e-01
LWRT       -0.631619617 -6.031639e-01 -4.134865e-01  0.141413134  2.698150e-01
LVTCA      -0.700468534 -6.380003e-01 -3.622281e-01  0.503506714  3.576904e-01
LVHCA      -0.402204724 -2.933455e-01 -8.653418e-02  0.744707852 -3.853742e-02

Lots more lines follow below this, representing the other columns analysed in the same way. Notice that the p-values now occupy the 'top' (upper) triangle of the matrix. Another interesting observation that I didn't make at the time: after I entered this function, it was saved as an object in my workspace. If you go back up and look at the output from the objects function above, you will see an object called "cor.prob".
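
As an aside, rather than typing the function in line by line, you can also save it in a plain text file and read it into R with the source function (the file name here is just an example):

> source("cor-prob.R")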


Scatterplot matrices

Often produced alongside the correlation matrix, the scatterplot matrix is a graphical way of looking for patterns in data. It's also useful for spotting patterns among your predictor variables so that you can decide which ones to remove from your dataset. The command is very simple, but beware: it will fill up a screen very, very quickly if you have a lot of columns in your dataset. I have a 17" monitor on the Linux machine, and running this function on my 20-variable dataset more than fills the entire screen.

Here, I use my small, four-variable dataset (the y dataset I mentioned earlier):

> pairs(y)

[TODO: explain how to send this to a postscript file, and not to the screen]
[TODO: somebody needs to write a general online tutorial on graphing.]
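
As a start on the first of these, one way to send the plot to a postscript file rather than to the screen is to open a postscript graphics device before plotting and close it again afterwards (the file name here is just an example):

> postscript("pairs-y.ps")   # open a postscript file as the current graphics device
> pairs(y)                   # the scatterplot matrix is now written to the file
> dev.off()                  # close the device so the file is complete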


Regression Lines in R

Regression is commonly used in ecology to express explanatory or predictive relationships; it helps us understand how one factor is affected by one or more variables. Here, I am interested in seeing what can predict chlorophyll A concentrations in our lakes. Papers like Dillon and Rigler [1974] have already detailed the relationship between total phosphorus concentration (LTP, a nutrient) and chlorophyll A concentration (LChlA, a measure of algal biomass). Our Shield lakes are somewhat different from those in the Dillon and Rigler paper, so let's see *how* different.

Here, all my data is in the x dataset, so I will preface my variable names with the data set identifier (x) and then the $ separator.

> summary(lm.chl <- lm(x$LChlA ~ x$LTP))

Call:
lm(formula = x$LChlA ~ x$LTP)

Residuals:
      Min        1Q    Median        3Q       Max
-0.335560 -0.100729 -0.001572  0.108871  0.273484

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.40315    0.08262  -4.880 4.50e-06 ***
x$LTP        0.77338    0.09157   8.446 4.54e-13 ***
---
Signif. codes:  0  `***'  0.001  `**'  0.01  `*'  0.05  `.'  0.1  ` '  1

Residual standard error: 0.1263 on 91 degrees of freedom
Multiple R-Squared: 0.4394,     Adjusted R-squared: 0.4333
F-statistic: 71.33 on 1 and 91 degrees of freedom,      p-value: 4.54e-13

What does this tell us? Looking under the "Pr(>|t|)" column, we see that TP is a significant predictor of chlorophyll concentration in our lakes (notice the three *'s at the end of the line; they indicate p < 0.001, or specifically p = 4.54e-13, but in any case it's significant and we don't have to worry much beyond that). Also notice that the intercept is highly significant.

Note that I could have easily done the above command in two steps:

> lm.chl <- lm(x$LChlA ~ x$LTP)
> summary(lm.chl)

Just use whichever way works for you.
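
If you want to see the fitted line over the raw data, one simple way (using the lm.chl object fitted above) is:

> plot(x$LTP, x$LChlA)   # scatterplot of chlorophyll A against total phosphorus
> abline(lm.chl)         # add the fitted regression line to the plot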

Knowing that TP is an important predictor of chlorophyll concentrations, we can then ask whether another variable would help predict chlorophyll A concentrations better. Our lakes lie on the Canadian Shield, and tend to have high dissolved organic carbon (DOC) concentrations. This brings up the question of whether DOC is a useful additional predictor of chlorophyll A concentrations in these lakes:

> lm.chladoc <- lm(LChlA ~ LTP + LDOC)
Error in eval(expr, envir, enclos) : Object "LChlA" not found

The error here lies in the fact that I didn't specify WHICH dataset the variables were from. Easily rectified...

> lm.chladoc <- lm(LChlA ~ LTP + LDOC, data=x)
> summary(lm.chladoc)

Call:
lm(formula = LChlA ~ LTP + LDOC, data = x)

Residuals:
     Min       1Q   Median       3Q      Max
-0.33431 -0.09577  0.01001  0.10762  0.26063

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.38433    0.08409  -4.571 1.54e-05 ***
LTP          0.85446    0.11551   7.397 6.99e-11 ***
LDOC        -0.12465    0.10857  -1.148    0.254
---
Signif. codes:  0  `***'  0.001  `**'  0.01  `*'  0.05  `.'  0.1  ` '  1

Residual standard error: 0.1261 on 90 degrees of freedom
Multiple R-Squared: 0.4475,     Adjusted R-squared: 0.4352
F-statistic: 36.45 on 2 and 90 degrees of freedom,      p-value: 2.538e-12

From these results, we glean that DOC is not a good additional predictor of chlorophyll A concentrations in our lakes.

Note that I could have also written the above command this way:

> summary(lm.chladoc <- lm(LChlA ~ LTP + LDOC, data=x))

[TODO: change this example to use an existing R dataset so that others can use the example exactly]


Comparing Regression Lines in R

When a number of potentially similar regression lines have been created, it is often important to be able to tell whether these lines are in fact significantly different from each other. Going through another example from my data, I have two models explaining oxygen depletion rates in my lakes. The first model uses morphometry (LHypoVSA), nutrients (LTP) and organic matter (LDOC); the second uses morphometry alone. I then test to see whether these models are different.

> summary(lm.lahod2 <- lm(y$LAHOD ~ LHypoVSA + LTP + LDOC, data=x))

Call:
lm(formula = y$LAHOD ~ LHypoVSA + LTP + LDOC, data = x)

Residuals:
     Min       1Q   Median       3Q      Max
-1.52635 -0.07084  0.09872  0.18400  0.39080

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -1.3833     0.2150  -6.434 6.07e-09 ***
LHypoVSA      0.5782     0.1132   5.109 1.83e-06 ***
LTP           0.7008     0.2871   2.441   0.0166 *
LDOC         -0.6935     0.2722  -2.548   0.0126 *
---
Signif. codes:  0  `***'  0.001  `**'  0.01  `*'  0.05  `.'  0.1  ` '  1

Residual standard error: 0.312 on 89 degrees of freedom
Multiple R-Squared: 0.3045,     Adjusted R-squared: 0.281
F-statistic: 12.99 on 3 and 89 degrees of freedom,      p-value: 4.124e-07

This is in fact a significant model. Each of the variables is significant, along with the intercept, and the overall regression is significant at the 0.05 level or better. What if we use a simpler model...

> summary(lm.lahod3 <- lm(y$LAHOD ~ x$LHypoVSA))

Call:
lm(formula = y$LAHOD ~ x$LHypoVSA)

Residuals:
     Min       1Q   Median       3Q      Max
-1.64386 -0.05429  0.09138  0.16604  0.37745

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.28446    0.05778 -22.231  < 2e-16 ***
x$LHypoVSA   0.62429    0.11514   5.422 4.81e-07 ***
---
Signif. codes:  0  `***'  0.001  `**'  0.01  `*'  0.05  `.'  0.1  ` '  1

Residual standard error: 0.3216 on 91 degrees of freedom
Multiple R-Squared: 0.2442,     Adjusted R-squared: 0.2359
F-statistic:  29.4 on 1 and 91 degrees of freedom,      p-value: 4.806e-07

This too is significant, and the fuller model gives us a slightly better residual standard error and a higher adjusted R-squared. So we expect these two models to be different, and we test them to make sure:

> anova(lm.lahod2, lm.lahod3)
Analysis of Variance Table

Model 1: y$LAHOD ~ LHypoVSA + LTP + LDOC
Model 2: y$LAHOD ~ x$LHypoVSA
  Res.Df Res.Sum Sq Df Sum Sq F value  Pr(>F)
1     89      8.663
2     91      9.414 -2 -0.751  3.8578 0.02473 *
---
Signif. codes:  0  `***'  0.001  `**'  0.01  `*'  0.05  `.'  0.1  ` '  1

Yes, these models are in fact different from each other at the p < 0.05 level: the extra terms in the fuller model significantly improve the fit. (Note that this anova comparison is only appropriate when one model is nested within the other, as is the case here.)

Thanks to Martin Maechler <maechler@stat.math.ethz.ch> for explaining it so clearly and mailing his explanation to the R-help list.

[TODO: Modify this example so that it uses an existing R dataset]


Adding R Libraries

Installing packages in R is a relatively straightforward procedure. Once you have downloaded the package of interest (in this case I downloaded the leaps package for all-subsets regression from the R site), the first step is to untar it into its own directory.

[pete@esox pkgs]$ tar xvzf leaps_2.0.tar.gz
leaps/
leaps/src/
leaps/src/leaps.f
leaps/src/leapshdr.f
leaps/man/
leaps/man/leaps.Rd
leaps/man/subsets.Rd
leaps/man/plot.subsets.Rd
leaps/man/leaps.setup.Rd
leaps/INDEX
leaps/TITLE
leaps/README
leaps/R/
leaps/R/leaps.R
leaps/R/plot.subsets.R
leaps/DESCRIPTION

The next step is to call R directly with the INSTALL command. Notice that I did not descend into the leaps directory after untarring it. This is important.

[pete@esox pkgs]$ R INSTALL leaps
ERROR: cannot write to or create directory `/usr/local/bin/R/lib/R/library'

Rather than set the permissions on /usr/local/bin/R/lib/R/library so that anyone can write to it, I decided to become root and install the library that way.

[pete@esox pkgs]$ su root
Password:
[root@esox pkgs]# R INSTALL leaps
Installing package `leaps' ...
 libs
g77 -fPIC -g -O2 -c leaps.f -o leaps.o
g77 -fPIC -g -O2 -c leapshdr.f -o leapshdr.o
gcc -shared -o /usr/local/bin/R/lib/R/library/leaps/libs/leaps.so leaps.o leapshdr.o -L/usr/lib/gcc-lib/i386-redhat-linux/egcs-2.91.66 -L/usr/i386-redhat-linux/lib -lg2c -lm
 R
 help
 >>> Building/Updating help pages for package `leaps'
     Formats: text html latex example
 DONE (leaps)

DONE (INSTALL)
[root@esox pkgs]#

R then called the relevant compiler (in this case g77, the Fortran 77 compiler) to compile the source, install the binary files, and update the help pages for this package.

For most R packages, this is all you have to do. Occasionally, however, when the library points to an external program (such as Xgobi) that you have already installed, you will need to specify where the binary files are located before you run R INSTALL. In the case of the Xgobi library, you'll need to edit two files after you untar the library: xgobi.R and xgvis.R, located in the xgobi/R/ directory. I installed my copy of Xgobi in /usr/local/bin/.

Open the files with a text editor and set the path appropriately; the comments in each file indicate where to put the path.


Using the Leaps Library

The leaps library contains functions to carry out all-subsets regression, which basically means finding the best possible models to describe or predict a particular variable. It can be an effective way to find the best predictors from amongst a group of variables. This is similar to the RSQUARE selection method in SAS.

In this example, I was looking for the best predictors of oxygen depletion. The first step was to load the leaps library (I had installed it earlier)...

> library(leaps)

> sublahod <- subsets(x, y$LAHOD, nvmax=8, nbest=10)

I was using the contents of the x dataset to regress against the oxygen depletion variable (y$LAHOD), and I wanted to run through the ten best models (nbest=10) for each model size, from one through eight predictor terms (nvmax=8).

> summary(sublahod, matrix.logical=T)
Subset selection object
20 Variables (and intercept)
           Forced in Forced out
LDOC           FALSE      FALSE
LTP            FALSE      FALSE
LChlA          FALSE      FALSE
LZmax          FALSE      FALSE
LDepth         FALSE      FALSE
LDSAm          FALSE      FALSE
LLSAm          FALSE      FALSE
LQS1           FALSE      FALSE
LRETENT1       FALSE      FALSE
LHAvT          FALSE      FALSE
LLakeVol       FALSE      FALSE
LLakeVSA       FALSE      FALSE
LHypoVol       FALSE      FALSE
LHypoVSA       FALSE      FALSE
LHypoThick     FALSE      FALSE
LCALA          FALSE      FALSE
LVHVT          FALSE      FALSE
LWRT           FALSE      FALSE
LVTCA          FALSE      FALSE
LVHCA          FALSE      FALSE
10 subsets of each size up to 9
Selection Algorithm: exhaustive
          LDOC    LTP LChlA LZmax LDepth LDSAm LLSAm LCALA  LQS1 LRETENT1 LHAvT LLakeVol
1  ( 1 )  FALSE FALSE FALSE FALSE  FALSE FALSE FALSE FALSE FALSE    FALSE FALSE    FALSE
1  ( 2 )  FALSE FALSE FALSE FALSE  FALSE FALSE FALSE FALSE FALSE    FALSE FALSE    FALSE
1  ( 3 )  FALSE FALSE FALSE FALSE  FALSE FALSE FALSE FALSE FALSE    FALSE FALSE    FALSE
1  ( 4 )  FALSE FALSE FALSE FALSE  FALSE FALSE FALSE FALSE FALSE    FALSE FALSE    FALSE
1  ( 5 )  FALSE FALSE FALSE FALSE  FALSE FALSE FALSE FALSE FALSE    FALSE FALSE    FALSE
1  ( 6 )  FALSE FALSE FALSE  TRUE  FALSE FALSE FALSE FALSE FALSE    FALSE FALSE    FALSE
1  ( 7 )  FALSE FALSE FALSE FALSE  FALSE FALSE FALSE FALSE FALSE    FALSE FALSE    FALSE
1  ( 8 )  FALSE FALSE FALSE FALSE  FALSE FALSE FALSE FALSE FALSE    FALSE  TRUE    FALSE
1  ( 9 )  FALSE FALSE  TRUE FALSE  FALSE FALSE FALSE FALSE FALSE    FALSE FALSE    FALSE
1  ( 10 ) FALSE FALSE FALSE FALSE  FALSE FALSE FALSE FALSE FALSE    FALSE FALSE     TRUE
2  ( 1 )  FALSE FALSE  TRUE FALSE  FALSE FALSE FALSE FALSE FALSE    FALSE FALSE    FALSE
2  ( 2 )  FALSE FALSE  TRUE FALSE  FALSE FALSE FALSE FALSE FALSE    FALSE FALSE    FALSE

and so on... This function produces a LOT of output. I used this output, particularly the lower matrix, to build a bunch of lm calls using a spreadsheet (where IF() and @IF() are your friends).
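
The package also includes a plot method for these objects (note plot.subsets.R in the file listing above), so something like the following should give a graphical summary of the selected subsets; I haven't explored it much yet:

> plot(sublahod)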

[TODO: I still don't know how to get leaps to do forward stepwise regression. Can anybody offer suggestions?]


Using the Xgobi Interface

Visualizing data often takes a LOT of time in ecology, partly because one is forced to make a lot of paper graphs. Being able to explore data visually and interactively not only saves time and trees, but also lets one locate trends in ecological data more effectively. Recently, I found the binaries for the Xgobi program, a free data exploration tool that can be used under Linux.

Xgobi's documentation explains how to install it; my only suggestion is to do the installation in the directory where you want it to reside permanently. As I explained above, you will have to edit two source files to indicate where the program is installed, so it just makes sense to put the program directory in a logical place.

I'm just starting to play with Xgobi now, so I don't have a lot of suggestions as to what exactly to do with it, other than to explore, read the documentation that comes with the program, and look at some of the papers mentioned on the Xgobi web site.
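
That said, once the library is installed and the paths are edited as described above, something like the following should launch Xgobi on a data frame from within R (I am assuming the library's main function is called xgobi, as the xgobi.R file in the package suggests):

> library(xgobi)
> xgobi(x)        # launch Xgobi for interactive exploration of the predictor dataset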

[TODO: I suspect that Xgobi could have a whole doc written about it. Any takers?]


I'd like to thank the many people on the R-help list who have provided many helpful and patient answers to my questions, and the R hackers who have built such great packages for statistical analysis.
Errors and omissions in this page are my own responsibility, and I appreciate any constructive advice to improve this page. Please contact me regarding page issues.
Pete St. Onge