Graphics tasks in research tend to range from the simple (like xy plots)
to the increasingly complex (contour maps, PCA plots, etc.). A common
misconception among researchers is that Linux and *BSD lack
high-quality graphics and graphing tools. This week's report will
hopefully help dispel that myth.
Before I get too far into this report, I would like to thank Aaron
Malone for the paragraph on web graphics. Aaron is a long-time member
of SEUL and SEUL/sci.
Linux and Research Graphics
One of the primary tasks in research is the communication of results
in a transparent way so that independent assessments can be made as
to the applicability and generality of that research. Peer review is,
after all, the cornerstone of the scientific method.
The specialized nature of software required to produce research
graphics, be it for image analysis or the graphical summarization
of results, tends to make commercial versions rather expensive to
obtain. Certain types of research graphics are typically only
created by statistical software (e.g. principal component analysis)
or are otherwise time-consuming to make by hand. Another
consideration, particularly for exploratory data analysis or visual
verification of structured data, is the ability to create a large
number of graphs through some batch-processing facility in combination
with simple scripts.
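As a rough illustration of the batch-processing idea, here is a minimal Python sketch that builds one gnuplot script per data file and, if gnuplot happens to be installed, renders each one to EPS. The directory layout, the ".dat" extension and the choice of columns 1 and 2 are assumptions made for the example, not anything prescribed by gnuplot.

```python
import shutil
import subprocess
from pathlib import Path


def gnuplot_script(data_file: Path, out_dir: Path) -> str:
    """Return gnuplot commands that plot columns 1 and 2 of data_file to EPS."""
    out_file = out_dir / (data_file.stem + ".eps")
    return "\n".join([
        "set terminal postscript eps color",
        f"set output '{out_file.as_posix()}'",
        f"set title '{data_file.stem}'",
        f"plot '{data_file.as_posix()}' using 1:2 with lines notitle",
    ]) + "\n"


def plot_all(data_dir: Path, out_dir: Path) -> list[str]:
    """Build one script per *.dat file; run gnuplot only if it is installed."""
    out_dir.mkdir(parents=True, exist_ok=True)
    have_gnuplot = shutil.which("gnuplot") is not None
    scripts = []
    for data_file in sorted(data_dir.glob("*.dat")):
        script = gnuplot_script(data_file, out_dir)
        scripts.append(script)
        if have_gnuplot:
            # With no file arguments, gnuplot reads its commands from stdin.
            subprocess.run(["gnuplot"], input=script, text=True, check=True)
    return scripts
```

Calling plot_all(Path("runs"), Path("plots")) would then produce one EPS file per data file in a hypothetical "runs" directory, which is usually enough for a first visual check of a large set of results.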
The breadth of graphing software for Linux and other Unix platforms
has expanded considerably over the last few years. I've mentioned some
of these applications in previous Linux in Science reports, and
they bear a fresh mention. Scigraphica and Grace are
both well-known GUI graphing packages. Scigraphica has improved
substantially in the last few months, and will be familiar to users
having experience with a program like Origin or SigmaPlot.
Grace is a mature package offering lots of functionality,
several graph types, and perhaps most importantly for the
purposes of large datasets, a command-line interface well suited to
scripting. Further, Grace has a Python interface to allow
'live' scripting.
As command-line programs, Guppi and Gnuplot are
well-known and quite capable. Gnuplot has a long history of
being used for dynamic web graphs, given its flexibility. Guppi
has become the standard graphing tool in many other applications,
most notably the Gnumeric spreadsheet. The R statistical
package probably has the most developed graphing facility, with a
great deal of flexibility as to the placement and presentation of
data. For mathematical modelling, the similarity of Octave to
the commercial MATLAB program makes it ideal for mathematical /
function graphing.
Diagramming is another critical aspect of research graphics.
Conventional graphing programs tend to be less useful in this area
because of their complexity for most diagrams. Two excellent offerings
for Linux users are xFig and Dia. xFig has a long
history of use in science and engineering and is easily one of the
most mature and capable Unix applications I've had the pleasure of
using in a very long time, as I recount below. Dia is a more
recent but similarly well-developed application, similar to the
commercial Visio. Alternatively, there is also a vector graphic
format used by the LaTeX typesetting system, and some
LaTeX packages are very useful when trying to create certain
types of diagrams (chemical or electrical structures, musical
notation, etc.) that might otherwise be difficult to create using a
typical diagramming program.
Perhaps one of the most important considerations when presenting
research graphics is the format in which that information will be
presented, taking into account the inherent differences between
screen and paper output. I've often been misled by the
appearance of a graph on the screen, thinking that since the screen
is graphical, the resultant image should be as well-defined on paper
or on the web page. This is often not the case, however, as the
resolution of the screen is far poorer than that of most printed documents.
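To put rough numbers on that difference, the short Python sketch below computes how many pixels a figure of a given physical size needs at two resolutions. The 96 dpi screen and 600 dpi printer figures, and the 5 x 4 inch graph, are illustrative assumptions only; actual devices vary.

```python
def pixels_needed(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions required to show a figure of the given size at dpi."""
    return round(width_in * dpi), round(height_in * dpi)


# Assumed values for illustration: a 5 x 4 inch graph on a 96 dpi screen
# versus the same graph on a 600 dpi printer.
screen = pixels_needed(5, 4, 96)    # (480, 384)
printed = pixels_needed(5, 4, 600)  # (3000, 2400)

# Ratio of total pixel counts between the printed and on-screen versions.
detail_ratio = (printed[0] * printed[1]) / (screen[0] * screen[1])
```

Under these assumptions the printed page can resolve roughly 39 times as many pixels as the screen, which is why a bitmap that looks crisp in a window can look coarse on paper.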
There is a plethora of print graphic formats (EPS, BMP, JPG,
WMF, WPG, etc.), the bulk of which are commercial in nature. Generally,
these files are either line-based (vector graphics) or bitmaps (raster
graphics). Raster-based formats (e.g. JPEG or PNG) are best suited for
digital representations of images / photographs. Graphs and diagrams
are almost always best stored in vector formats for printing. These
formats will tend to scale better on paper without losing fine detail
from graphical elements or textual annotations. Perhaps the most
popular format for these graphics is Encapsulated PostScript
(EPS). Most journals are amenable to this format as it is easily
imported by desktop publishing applications (EPS is a well-known and
standard format). The LaTeX vector graphic format described
above is also widely used.
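To show why a vector format scales so well, here is a small Python sketch that writes a minimal EPS file by hand. The function name and the default 200 x 100 point bounding box are hypothetical choices for the example. The file stores only the coordinates of the line, so a printer or publishing program can redraw it at any size.

```python
def polyline_to_eps(points, width=200, height=100):
    """Return a minimal EPS document drawing straight segments through points.

    points are (x, y) pairs in PostScript points (1/72 inch), measured from
    the bottom-left corner of the bounding box.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    lines = [
        "%!PS-Adobe-3.0 EPSF-3.0",                # EPS header comment
        f"%%BoundingBox: 0 0 {width} {height}",   # extent of the drawing
        "%%EndComments",
        "0.5 setlinewidth",
        "newpath",
        f"{points[0][0]:g} {points[0][1]:g} moveto",
    ]
    lines += [f"{x:g} {y:g} lineto" for x, y in points[1:]]
    lines += ["stroke", "showpage", "%%EOF"]
    return "\n".join(lines) + "\n"


eps_text = polyline_to_eps([(10, 10), (100, 80), (190, 20)])
```

Writing eps_text to a file ending in ".eps" should give something most PostScript viewers can open. Real graphing programs emit far richer files, but the principle of storing drawing instructions rather than pixels is the same.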
For web graphics, choices are limited to three formats: GIF,
JPEG (often written "jpg" because of the filename limits on some
obsolete operating systems), and PNG. GIF, however, is
considered to be a deprecated format and is often completely
unusable in open-source work because of patent encumbrance.
JPEG is a "lossy" format, meaning that
detail is sacrificed in favour of compression efficiency.
This makes it ideal for photographic data, where some colour-blending
will not be noticed. PNG, on the other hand, is a "lossless"
format. File size will depend on the complexity of the image;
line-art and simple diagrams compress to extremely small sizes
in PNG, whereas detailed photographs can end up quite large, and
would often be better served by JPEG compression. If you're not
sure which format to use, the best idea is to just try both, and
compare the resulting quality and file size.
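The effect of image complexity on lossless file size can be sketched with Python's standard zlib module, which implements the same DEFLATE compression that PNG uses internally. This is only an approximation: real PNG files also apply per-row filters and carry headers, and the random noise below is merely a stand-in for detailed photographic content. The image size and function names are assumptions for the example.

```python
import random
import zlib

WIDTH, HEIGHT = 256, 256  # assumed size of an 8-bit greyscale image


def line_art() -> bytes:
    """White canvas with a single black diagonal line."""
    pixels = bytearray([255]) * (WIDTH * HEIGHT)
    for i in range(min(WIDTH, HEIGHT)):
        pixels[i * WIDTH + i] = 0
    return bytes(pixels)


def noisy_image(seed: int = 0) -> bytes:
    """Random pixel values, standing in for fine photographic detail."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(WIDTH * HEIGHT))


def deflated_size(pixels: bytes) -> int:
    """Size in bytes after lossless DEFLATE compression at the highest level."""
    return len(zlib.compress(pixels, 9))


line_art_size = deflated_size(line_art())
noisy_size = deflated_size(noisy_image())
```

Both images start out as 65536 bytes of raw pixels. The line drawing shrinks to a tiny fraction of that, while the noisy image barely compresses at all, which is the situation where a lossy format such as JPEG is worth trying.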
Although these three formats are typically also accepted for
print publications, unless the graphic is an image (photograph,
gradient or similar), it is almost always best to use PostScript
(PS or EPS), LaTeX (TEX) or a similar line-type graphic format.
Raster formats will often render unsatisfactorily in print, giving
blocky and poorly defined images. Line-type formats will scale
better, take less space and
generally give far clearer output. As mentioned in the
last Linux in Science Report, most publishers will gladly accept
these formats for graphics, especially since they are the most
versatile to work with.
One advantage of using standard, open formats is the flexibility that
is offered. One situation I often see is the properly cited use and
adaptation of graphics from other articles (sometimes called
'meta-analysis'). A case in point was the study area map used in my
own M.Sc. thesis. Having found a suitable study area map in a
published article, I noted that not only were there a number of
errors (albeit slight ones) in the original map, the symbology used
to indicate certain features was completely inconsistent with that
used in the thesis. Once I converted the PostScript file to fig
format using pstoedit, I was easily able to modify the graphic
in xfig without losing the original information contained in
the map while correcting the errors and altering the colour and
appearance of the relevant symbols. I was also able to add an inset
map I had previously made and correct some aesthetic issues I didn't
like in the original map.
Another useful aspect of pstoedit lies in its ability to
"sanitize" PostScript files of questionable quality. I had previously
made a number of graphs in a Win32 graphing application and printed
them to PostScript files. A couple of years later, when I went to use
them in a LaTeX document, the images didn't come out properly. Sent
individually to a PostScript printer (or even the PostScript filter
used by my own Linux box), the graphs came out fine. What I hadn't
realized was that these graphic files were malformed (missing some
rendering information required to output properly) when included
as part of a larger document. Converting to fig format and back
to Encapsulated PostScript (EPS) resolved all of the problems.
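The round trip described above might be scripted along the following lines. This is a sketch under assumptions: it uses pstoedit for the PostScript-to-fig step as in the text, and assumes fig2dev (from the transfig package) for the trip back to EPS, which is one common choice rather than necessarily the tool used here. The "_clean" output name is hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def sanitize_commands(ps_file: Path) -> list[list[str]]:
    """Return the two commands for a PostScript -> fig -> EPS round trip."""
    fig_file = ps_file.with_suffix(".fig")
    eps_file = ps_file.with_name(ps_file.stem + "_clean.eps")
    return [
        # pstoedit: convert the PostScript drawing to fig format
        ["pstoedit", "-f", "fig", str(ps_file), str(fig_file)],
        # fig2dev (assumed tool): export the fig file to EPS
        ["fig2dev", "-L", "eps", str(fig_file), str(eps_file)],
    ]


def sanitize(ps_file: Path) -> None:
    """Run the round trip, stopping early if a required tool is missing."""
    for cmd in sanitize_commands(ps_file):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} is not installed")
        subprocess.run(cmd, check=True)
```

Looping sanitize() over a directory of old PostScript files would repair them in one pass. The intermediate fig files are kept, so they can still be opened in xfig for manual touch-ups.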
In summary, although no single tool will satisfy all graphing and
diagramming tasks, I have found that
learning a couple of the many available programs will cover most
research needs.
Here are some more links and updates for the software mentioned in
this report ...
As always, I look forward to receiving your comments and suggestions for links or future feature articles.