CompBio style guide
Regardless of the style you follow, please be sure to
read the general and specific suggestions/tips regarding creating and publishing manuscripts, posters, web
pages, and grant applications. The ram.org
style guide may also be relevant.
Do
- Do write with excellent grammar, spelling, and punctuation. This
should be the norm (not to mention that the latest research has
shown that it is correlated with a longer lifespan, with a P-value
of 0.49). Organisation and consistency in particular are key. Finally,
clarity of what you present is most important. "Suggestion" is
quoted because it is used in the sense that the elements of a 12 step
program are suggestions (smiley face for the humour impaired). It
signals our collective wisdom and if you deviate from it without
good reason, it's the end of the world (insert another smiley with
tongue in cheek).
- Do know the difference between "you're" and "your", "it's" and
"its", "there" and "their". See the external URLs section for help with this.
- Do go over this entire list of suggestions and use it as a
checklist along with other guides and tips,
and perform an appropriate search (and any necessary replacing)
before sending documents to others to peruse. Unfixed style issues waste
the reader's time and prevent them from getting to the core issues in the
document; this shouldn't happen, and in general you should never place a
burden on the reader to fix style issues already addressed here, even if only
mentally.
Do not
- Do not use contractions (i.e., "Don't use contractions") in
formal writing. They are sometimes okay on web pages.
- Do not anthropomorphise or, more specifically, attribute actions to
inanimate objects. For example, "protein's structure" should be
"structure of a protein" (the former implies that the structure is
owned by the protein, something a protein cannot do). Also, "which"
is not always appropriate when referring to inanimate objects
(e.g., "a protein that is used for..." as opposed to "a protein
which is used for...").
- Do not put apostrophes to indicate plurals; just make the word
plural as you would normally: "DNAs" is good; "DNA's" is not (unless
you wish to refer to the actions of DNA, in which case, see item
above).
- Do not use the phrase "in order to...". "To" is adequate in almost
all situations.
- Do not use the phrase "it's clear that...", especially in a formal
article. If it's clear, it's clear, and there's no need to say
it. It's clear that saying "it's clear that..." usually just
indicates the opposite.
- Do not compare quantities using terms such as high/low, especially if it's
done throughout the manuscript. Work on the meaning of better/worse
(see the item on comparing quantities below) rather than high/low, which
doesn't tell the reader what is better and what is worse.
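A few of the "Do not" items above lend themselves to the search step recommended earlier. The following is a minimal Python sketch, not an existing tool of the group: the pattern list, the labels, and the function name find_style_issues are all hypothetical, and it only covers contractions, "in order to", and "it's clear that". It ignores the possessive 's on purpose, does not handle typographic apostrophes, and will miss phrases split across lines.

```python
import re

# Hypothetical labels and patterns for a few of the "Do not" items above.
STYLE_PATTERNS = [
    ("contraction", re.compile(r"\b[A-Za-z]+(?:n't|'re|'ve|'ll)\b", re.IGNORECASE)),
    ("in order to", re.compile(r"\bin order to\b", re.IGNORECASE)),
    ("it is clear that", re.compile(r"\bit(?:'s| is) clear that\b", re.IGNORECASE)),
]


def find_style_issues(text):
    """Return (line number, label, matched text) for each flagged phrase."""
    issues = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in STYLE_PATTERNS:
            for match in pattern.finditer(line):
                issues.append((line_number, label, match.group(0)))
    return issues


# Made-up draft text for illustration.
draft = "We don't align in order to save time.\nIt is clear that this works."
for issue in find_style_issues(draft):
    print(issue)
```

Each reported tuple points at a line to revisit by hand; the sketch only finds candidates and does not rewrite anything.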
Do
- Do always concisely summarise the message of your tables and
your figures at the end of their captions. This should be done so
that someone seeing only your tables and figures should be able
to understand your entire paper. In general, the structure of a
figure or a table caption should be: a concise title, in bold, explaining
what is being shown; any details absolutely necessary to
understand the figure/table, but no more (definitions, etc. can be
left to the text); and a summary sentence containing any conclusions
(usually for results). The abstract combined with the figures and
tables and their corresponding captions taken by themselves should
be adequate to understand the entire work being described without
needing to read the rest of it.
- Do be quantitative and precise and focus on conveying
conceptual understanding, while eschewing vague words and phrases
and limiting jargon to the appropriate section (the methods section is
where most jargon belongs, especially where it is related to the
conceptual activities/understanding). Check out an example of a supremely
vague abstract.
- Create a separate description of columns and avoid using short
column headings altogether. Column descriptions can either be at the
top of an ASCII table or in a separate README file.
- Do use long descriptive filenames for tables and figures.
- When comparing quantities, it is best to define what
better/worse (or best) is once, early on, and then use better/worse
throughout instead of high/low or low/high (in the case of energy or
RMSD for instance). As with everything, there may be a situation
where this rule doesn't work.
Do not
- Do not have tables and tables of numbers. All tables should be
converted into descriptive figures, and the table should be made
available as supplementary material. It's rarely necessary to use
more than 2-4 significant figures; use the
minimum number of significant figures that you need to convey the
message.
- Do not use obscure headers such as "std-full" or "rankcor" in ASCII
tables. It is better to explain the columns either at the top of the
file or in a separate file.
- Do not use "x axis" when describing graphs. Just say what is plotted
against what. In the worst case, use "horizontal axis" and "vertical
axis".
- Do not have widows and orphans (i.e., single lines or section
headings by themselves on a page).
- Do not have this style of writing: "N/M corresponds to A/B" where N
and M are usually numbers. This takes more time for the reader to
figure out. Try something like "It takes N to do A and M to do
B". It's almost always worth spelling it out. But this works best
only when all numbers are >= 21; if they are smaller, the current
rule is to use words for numbers, but this may be an exception or
handled as a special case.
- Do not overuse (and avoid in general) words and phrases like
"many", "sometimes", "often", "clearly", "of course", "obviously",
and "and so on". Check out an
example of a supremely vague abstract.
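The advice above on using the minimum number of significant figures can be applied when a table is written out. This is a small Python sketch under stated assumptions: the function name format_sig and the default of three significant figures are hypothetical choices, and the RMSD values are made up for illustration.

```python
def format_sig(value, sig=3):
    """Format value with at most `sig` significant figures.

    Uses the general ("g") format, which drops trailing zeros and
    switches to scientific notation for very large or very small values.
    """
    return f"{value:.{sig}g}"


rmsd_values = [1.23456, 0.98765, 12.3456]  # made-up numbers for illustration
print([format_sig(v) for v in rmsd_values])  # ['1.23', '0.988', '12.3']
```

Pick the smallest value of sig that still conveys the message of the table or figure.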
Consistency is extremely important. Some issues I've come across so
far include:
- Acronyms: When you first use an acronym, first write out its
full name as it is commonly used; for example: "Critical Assessment
of protein Structure Prediction methods (CASP)". You may choose to
repeat it for every large section you have (e.g., results, methods,
etc.) but it needs to be done consistently.
- Spelling: English vs. American. I use English spelling and most
of you use American spelling. It doesn't matter what we
use. Generally, if we submit to a British journal (such as
Nature) the manuscript will adhere to English spelling
and will be changed accordingly. If we submit it to an American
journal, it will be American spelling. The final draft of a
manuscript should have consistent spelling (either English or
American---the typesetters will get it right).
- Consistency of tense: If there's something that always holds true
(e.g., "Contacts are compiled from a set of known structures
in the PDB"), then it may be better to not use the past
tense. Generally, when talking about a methodological description
(say of an implementation) that everyone who reads your paper has to
do, present tense can work. When you're talking about an experiment
you did, past tense is better.
- Hyphenation, dashes, and emdashes: There is generally no need to hyphenate most
words. I've realised that stating "well characterised" is as
communicable as saying "well-characterised". So for consistency's
sake, it's better to not use hyphenation whenever possible. Please
check this as you write. The LaTeX conventions of "---" to represent
an emdash (to separate within a sentence), "--" for attribution (e.g.,
"--Ram"), and "-" for a hyphen (to separate words) are to be used.
- Use of numbers as words or digits: The main rule is that
anything less than or equal to twenty should be written as a word
(since it's a single word up to that point). Anything else is
written in digits. This doesn't always read well aesthetically, so
a slight modification to this rule is that if you're enumerating
something that is mixed ("6 compounds", "10 structures" initially
and then later "30 compounds", "40 structures"), then you could use
digits if done consistently AND if there is a number of 21 or greater
as part of these enumerations (i.e., in one context, there is a
"six" but in the other there is "30"). If you're referring to
items only once (like three studies), then you should use a word,
along with the >= 21 rule. There is no automated solution to this
problem and it really does depend on context a lot. No matter what,
do not start a sentence with a digit. If you're using labels or
fractions, then you can use digits rather than words, e.g.,
"18/1024" or "3.5.2021".
- Capitalisation: Generally, only the first letter of each word
is capitalised for important things such as section headings or
titles of publications.
- Perhaps the most important style suggestion regarding
fonts is using logical, as opposed to physical, styles for the
actual code specifying the desired formatting. The use of logical
styles imparts a semantic meaning onto your text, which makes you a
better writer. You end up focussing on the meaning of what you're
writing instead of what it looks like. For example, in HTML, instead
of using the <i> or <b> elements for specifying objects
that are typically rendered with italics or bold, you can use:
- <em> for emphasis (\em in LaTeX);
- <cite> for specifying a journal name;
- <var> for specifying a variable name;
- <strong> for strong emphasis.
The above are just some common examples. A better description of
logical vs. physical styles is available elsewhere.
- Punctuation: Quotes and font modifications, such as italics,
always go inside the punctuation. See the HTML source of this:
"example." In other words: generally put punctuation
outside of formatting, i.e., only the actual text needs to be under
the format. For example, the full stop should go outside the
formatting quotes used for "italic text".
- Organism names. Should be italicised and in the proper form. Genus
should have a capital first letter and species should not. Genus
name can be abbreviated with the first letter only and a full
stop. That is, Plasmodium falciparum and P. falciparum are
acceptable if used in a consistent manner. Usually the full name is
for the first usage and subsequent usages can have the abbreviated
name.
- Language other than English (e.g., Latin). Should be italicised
if it's really not accepted in English. "in vitro" is italicised,
but "et al." is not.
- Headings should have only the first letter capitalised for
certain and the rest should follow regular writing style.
- Try to use headings hierarchically and logically. For example,
in wiki, HTML, and LaTeX: "=", "<h1>", and "\section" are the top
level headings to use (which should be the default), "==",
"<h2>", and "\subsection" are the second level, "===",
"<h3>", and "\subsubsection" are the third level, and so
on. The bold text style is good for nonhierarchical headings.
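The number rule above (words up to twenty, digits from 21 onward, and digits throughout a mixed enumeration that includes a number of 21 or greater) can be sketched in Python. The function names number_as_text and numbers_as_text are hypothetical, and the sketch deliberately ignores the context-dependent cases the rule mentions, such as sentence-initial numbers, labels, and fractions, which still need human judgement.

```python
from typing import List, Sequence

# Words for zero through twenty; anything else stays as digits.
SMALL_NUMBER_WORDS = [
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen", "twenty",
]


def number_as_text(n: int) -> str:
    """Return the word for 0..20 and digits for any other integer."""
    if 0 <= n <= 20:
        return SMALL_NUMBER_WORDS[n]
    return str(n)


def numbers_as_text(values: Sequence[int]) -> List[str]:
    """Format a group of related counts consistently.

    If any count in the group is 21 or greater, digits are used for all
    of them; otherwise each count follows the single-number rule.
    """
    if any(v >= 21 for v in values):
        return [str(v) for v in values]
    return [number_as_text(v) for v in values]


print(number_as_text(3))                  # three
print(numbers_as_text([6, 10, 30, 40]))   # ['6', '10', '30', '40']
```

Grouping the counts is left to the writer: the sketch only applies the rule once it is told which numbers belong to the same enumeration.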
Dates
- "September 6, 2010" is the long format. When padding makes
sense (such as in a fixed width font situation, be it in a
manuscript or a web page), padding with a 0 will result in
"September 06, 2010". Unix date command: without zero padding,
date '+%B %e, %Y' (note that %e pads single digit days with a space);
with zero padding, date '+%B %d, %Y'
- Sep 6, 2010 is the medium format; Sep 06, 2010 or Sep x6,
2010 when padded with "0" or "x" respectively. The comma may be
excluded for aesthetic or other reasons. Unix date command: without
zero padding, date '+%b %e, %Y'; with zero padding, date '+%b %d, %Y'
- sep062010 (mmmddyyyy) is currently the most standard convention
for dates appearing as strings, such as in filenames. Default
padding makes sense here. Unix date command:
date '+%b%d%Y' | tr '[A-Z]' '[a-z]'
- Sep062010 (Mmmddyyyy) has also been used but avoid it at all
costs. It has rarely been considered, for aesthetic or other reasons,
for example, when capitalising in filenames such as Foo.Sep062010
instead of Foo.sep062010. This is Ram's pedantry at its
worst. Unix date command: date '+%b%d%Y'
- See publishing guide and manuscript and
grant application writing tips for even more on how to present
written work. For formatting publications (references/citations),
please try to use the same format used for our publication list.
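The date formats listed above can also be produced in Python. This is a sketch rather than part of the group's tooling: the function name date_formats and the dictionary keys are hypothetical, and it assumes the default C/English locale so that month names come out in English. The unpadded day is taken from the day attribute rather than from %e, which strftime does not support on every platform.

```python
from datetime import date


def date_formats(d):
    """Return the long, medium, and filename date styles for a date."""
    return {
        "long": f"{d:%B} {d.day}, {d:%Y}",          # no padding
        "long_padded": d.strftime("%B %d, %Y"),     # zero padded day
        "medium": f"{d:%b} {d.day}, {d:%Y}",        # no padding
        "medium_padded": d.strftime("%b %d, %Y"),   # zero padded day
        "filename": d.strftime("%b%d%Y").lower(),   # mmmddyyyy
    }


formats = date_formats(date(2010, 9, 6))
print(formats["long"])      # September 6, 2010
print(formats["filename"])  # sep062010
```

The filename style could then be appended to a name, for example "Foo." + formats["filename"].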
Over the decades, we have adopted terminology (and style, note
the fonts) to refer to CANDO and its components as follows:
- The CANDO platform consists of pipelines that generate a
compound-proteome interaction matrix and indication-specific
protocols to rank compounds for particular diseases/indications
given particular proteomes/interactomes/heterogeneous data. These
are the main phrases and should be used extensively.
- We also defined a module to refer to the software equivalent
of a protocol, i.e., the canpredict and
canbenchmark modules in cando.py refer to the
prediction and benchmarking protocols within a given pipeline within
the CANDO platform.
- The generic words "component" and "algorithm" are used rarely
as a substitute for protocol and/or module. The reasons are the same
as below.
- The words "method" and "approach" are rarely used to spiff up the
writing, or for rhetorical purposes to replace many of the terms
above. These are more generic phrases that are used to avoid repeating
words in a sentence or overusing words like platform, pipeline, and
protocol, etc.
- Typically if bolding is used for the terms above, then it is
done so only for the first usage, not every instance. However, the
fixed width font used above (e.g., for cando.py) is
used for every instance.
- Also when describing CANDO, focus on the conceptual meaning
(i.e., the protocol) in describing a technical module throughout a
manuscript/grant application/poster/presentation/etc. For instance,
"the benchmarking module in the Vina pipeline"
vs. "canbenchmark". That is, avoid unnecessary
jargon. However, in the methods sections, you can refer to the
module proper to connect the protocol to the module name, but
this should typically be done only once.
Local documentation ||
Samudrala Computational Biology Research Group ||
admin@compbio.washington.edu