The Gene

An Intimate History.

By: Siddhartha Mukherjee

Published: 2016

Read: 2019

Summary:

Story of the birth, growth and future of the gene, the fundamental unit of heredity and the basic unit of biological information.

Interwoven with his personal experience of the hereditary impact of disease, the author provides a detailed biography of the gene: what it is, how it works, how it influences who and what we are today and what lies ahead.

  • The gene is the basic unit of hereditary information.
    • Has all the information to produce the form and function of an organism.
  • The genetic code is universal.
    • There is only one alphabet.
  • Genes influence form, function and fate, not in a one-to-one manner.
    • Produces form and function together with chance and environment.
  • Variations in genes produce variations in outcomes.
    • Diversity is magnified through culture (male and female share 99.688% of genes).
  • Genes for certain “features” mirror the definition of such “features”.
    • Reflection of cultural emphasis and focus (race, beauty, intelligence).
  • Can’t speak about the impact of nature or nurture in absolute or abstract terms.
    • Depends on where and when you look, the type of feature and the context.
  • Variations and mutations are the norm and essential for evolution.
    • Normalcy is the antithesis of evolution.
  • Illness is defined as a mismatch between environment and genes.
    • Fixing the illness: fix either the environment or the gene.
  • Genetic intervention and manipulation, in some cases, may be justified …
    • Limited knowledge of unintended consequences.
  • … is chemically and biologically possible …
    • Assuming complexity can be dealt with.
  • … and is a self-fulfilling prophecy: the desire to change the genome is encoded in the genome.
    • Hopefully not to the detriment of the mutants…

Worth Reading:

Very well written and researched deep dive. Compelling and well-structured narrative. The historic phases that are picked out for description deserve the space they get and the larger themes emerge naturally.

If there is a minor quibble, it is that at times the author looks for a pattern or purpose (without becoming overtly spiritual) that may not explicitly be there.

The section on the future of the gene is short and likely to be outdated soon.

Key Takeaways:

  • Complexity is not achieved by the number of genes, but by the organization of them. [in other words, the network structure is key]
  • Nature and nurture are not abstract and absolute determinants, but interact with and influence each other iteratively over time.
  • Genetic differences in general traits within groups tend to be larger than differences between groups (too little time for the accumulation of substantial divergence).
  • We are culturally inclined to magnify differences, even if they are relatively minor.

  

Key Concepts:

Anatomy of the gene: what genes are.

  • Parallel between atom, bit and gene.
    • Irreducible building blocks that are part of larger, hierarchically organized wholes.
  • Importance of finding organizing principles:
    • Understanding the smallest part is crucial to understanding the whole.
      [Necessary, but not sufficient; there are severe limits to reductionism]
    • Allows you to go from explanation to control and manipulation (of the gene).
  • Genes, chromosomes, DNA and the genome.
    • Chromosomes: each human cell contains 23 chromosome pairs (one set of 23 from each parent).
    • Genes: each chromosome carries long strands of genes (in total, about 21-23k genes).
    • DNA: each gene is made of a chemical called DNA (4 bases and a backbone, 3 billion base pairs).
    • Genome: the entire set of genetic instructions is termed the genome (gene + chromosome).
  • Trying to understand the nature of heredity: how does like beget like.
    • What is the logic behind heredity, behind organization in biology?
    • All chaotic and complex phenomena are the result of highly organized natural laws.
    • Aristotle (350 BC): heredity is about the transmission of information.
      • Message (semen) becomes material (body), then becomes message again (semen), etc.
      • Cycle: information begets form begets information, etc.
    • Darwin (1859): nature is a process of cause and effect, not static.
      • You can understand the present/future by examining the past (origin).
        • Asks the question: how does nature evolve, how does information transfer and transform.
        • The answer is survival of the fittest: variation, competition, selection, reproduction.
      • For evolution to work and produce stable, but adaptable organisms, you need both change (variation through mutation) and stability (transmission through reproduction).
      • Darwin can’t answer the question of “blending”: what keeps variation from being constantly watered down and diluted over time (for instance, all birds slowly become the same blended color).
    • Mendel (1865): variance is based on unitary, distinct traits.
      • Explains the lack of “blending” for some traits.
      • Every trait is determined by an independent, indivisible particle of information.
      • Heredity involves the passage of these discrete pieces of information.
      • Discrete units of information = genes (coined in 1905 by Bateson).
    • Morgan & Muller (1908): genes likely physically linked and carried on chromosomes.
      • “Maleness” determined by unique factor:
        • Y chromosome only present in male embryos.
      • Proposed that all genes may be carried on chromosomes.
      • Some phenotypes always appear together (in fruit flies), suggesting that their genes are physically linked.
        • Genes don’t travel independently, they move in packs.
        • The tightness of the linkage between traits predicts the physical proximity of their genes on chromosomes.
  • The idea of the gene predated its actual physical discovery.
    • What we know about genes at this point is only that they:
      • Are packages of information with an unknown chemical and physical nature.
      • Can become mutated and thereby specify alternative traits.
      • Tend to be chemically or physically linked to each other.
      • Are located on chromosomes.
  • The idea of genes offers a potential solution to the central problem of information in biology:
    • Every biological activity requires decoding of coded instructions (ie, needs information).
      • How do living things arise (genesis)?
      • How do species look the way they look (evolution, variation)?
      • How does a single cell create a living thing and how do cells function (living)?
    • What is the nature of instruction and how is this information passed on (heredity)?
  • Questions that are being answered: how do species look the way they look?
    • How can discrete particles of information (genes) give rise to smooth, continuous variations in a population (height, etc.)?
      • Answer lies in math.
      • The combinatorial power of (many) genes interacting with the environment creates a vast number of potential outcomes (smooth curve).
    • How are certain traits selected for over time?
      • Distinction of genotype (genetic composition) and phenotypes (physical features).
      • Theory: genotype + environment + chance + time = phenotype (physical features).
      • Genes that produce physical features that are the best fit with their environment, survive, reproduce and become over-represented.
      • The engine of evolution is the ongoing matching of features and environment.
    • Why are species distinct and separate?
      • For a new species to arise, some factor must make interbreeding impossible.
      • If two populations become sufficiently varied, genetic incompatibility may arise.
  • These theories deal mostly with issues of genes travelling vertically (reproduction, parent to child).
  • Throughout the process of vertical transfer, genes remain invisible (chemically).
  • Transformation: the horizontal transfer of genes:
    • Genes can be transmitted between two organisms without any form of reproduction.
    • Happens rarely in mammals, but often in bacteria.
    • If genes are physical molecules with chemical properties, what should these properties be?
      • Possess some regularity: to allow for copying and transmission.
      • Capable of irregularity: to explain variation, diversity.
      • Compact: carry vast amount of information, but also neatly packaged into a cell.
  • Life in general is a special type of chemistry:
    • Cells depend on chemical reactions to live.
    • Organisms continue to exist because of chemical reactions that are “barely possible”.
      • If there is too much reactivity, we “combust”.
      • If there is too little reactivity, we die.
      • We live on the edge of “chemical entropy”.
  • Two chemical candidates for the gene are identified: they are proteins or nucleic acids.
    • Chromatin (chromosomes) is the biological structure where genes reside.
      • It is made of two types of chemicals: proteins and nucleic acids.
    • Proteins enable the “barely possible” chemical reactions necessary for living.
      • Nearly every cellular function (metabolism, respiration, division, etc.) requires proteins.
      • Proteins speed up some and slow down other reactions, just enough to be compatible with living.
    • Nucleic acids are the dark, unknown horses.
      • Early 1920s: nucleic acids are known to come in two forms, DNA and RNA.
        • Both DNA and RNA are long chains made of four components (DNA: A, G, C and T; the bases), strung together along a backbone.
        • Initially thought to be too monotonous to be the carrier of genetic information.
      • Early 1940s: isolation and confirmation of DNA as carrier of genetic information (Avery).
  • Knowing the chemistry, what is the physical form of the gene molecule?
    • In life, physics enables chemistry, which enables physiology, which enables biology:
      • The physical structure of a molecule enables its chemical nature.
      • The chemical nature enables its physiological function.
      • The physiology ultimately permits biological activity.
      • [Levels of information, emergence of complexity]
  • So what is the form, the structure of DNA?
    • We know the chemistry (bases and backbone), but not the physics.
    • How is it linked to the physiological function of the gene (to transmit information)?
    • How are the four bases attached to a backbone of sugars and phosphates?
    • Watson and Crick (1953) discover the double-stranded helix structure of DNA, linking form and function.
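
The point made earlier in this section, that discrete genes can still produce smooth, continuous variation in a population, can be illustrated with a small simulation. This is a minimal sketch and not a model taken from the book: it assumes a hypothetical trait controlled by independent on/off genes that each add 0 or 1 unit with equal probability, plus optional Gaussian noise standing in for the environment. The function names and all parameter values are illustrative assumptions.

```python
import random
from collections import Counter


def simulate_trait(n_genes, n_people, env_noise=0.0, seed=0):
    """Return one trait value per person.

    Assumption (illustrative only): each gene adds 0 or 1 to the trait
    with equal probability, and the environment adds Gaussian noise.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_people):
        genetic = sum(rng.randint(0, 1) for _ in range(n_genes))
        environment = rng.gauss(0, env_noise) if env_noise > 0 else 0.0
        values.append(genetic + environment)
    return values


def distinct_genetic_outcomes(n_genes):
    """Number of possible genetic scores for n_genes additive on/off genes."""
    return n_genes + 1


if __name__ == "__main__":
    # One gene gives two sharp classes; many genes give a histogram
    # that looks more and more like a smooth curve.
    for n in (1, 3, 20):
        counts = Counter(round(v) for v in simulate_trait(n, 5000))
        print(f"{n} gene(s): {distinct_genetic_outcomes(n)} possible scores")
        for score in sorted(counts):
            print(f"  {score:2d} {'#' * (counts[score] // 50)}")
```

With a single gene the histogram shows two classes, as in Mendel's discrete traits. With twenty genes the bars fill in to form a rough bell shape, and adding `env_noise` blurs the remaining steps further.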

The physiology of the gene: how do genes work.

  • Knowing the physics and chemistry of the gene, how do they enable the gene’s function?
    • How do DNA bases translate into physical features? What links genotype and phenotype? What is the gene’s action?
    • Beadle & Tatum (1941): a gene “acts” by directing the building of proteins.
      • A gene carries instructions, the code, to build proteins.
      • These proteins then proceed to perform all the cellular functions.
        • A protein is created from 20 simple chemicals (amino acids).
        • The shape of a protein relates to its specific function in a cell.
  • First: DNA provides instruction to build RNA (transcription)
    • DNA first copies its genetic information into that of a complementary RNA molecule.
      • Similar to DNA: RNA has four bases (A, G, C and U; DNA’s “T” is replaced by “U”).
    • RNA moves from the nucleus to the cytosol.
    • This process allows for multiple copies of the gene to be in circulation and for the number of copies to be decreased and increased on demand.
  • Then RNA provides instructions to build proteins.
    • The sequence of the bases in the genetic code specifies the genetic information.
    • The genetic code occurs in triplets: three bases specify one amino acid in a protein.
  • Universal flow of information through living systems:
    • DNA – RNA – proteins.
  • Knowing how the gene acts generally, what enables a gene’s specific “selective action”?
    • How and when do the properties implicit in genes become explicit in different cells? How are they turned on and off?
  • DNA regulation:
    • A gene possesses not only information about what to build, but also when and where.
    • Regulatory sequence:
      • Every gene has an “on” and “off” switch, a regulatory DNA sequence, triggered by what goes on in the environment.
        • When “on”, genes produce RNA.
        • When “off”, genes stop producing RNA.
        • This allows it to regulate the production of (the amount of) RNA.
    • Proteins act as regulatory sensors, turning genes on and off.
      • DNA contains the information for the development and maintenance of organisms.
      • But, proteins actualize the information inside the DNA/gene.
      • Proteins “conduct” the genome.
      • Cyclical and iterative process: a gene encodes a message that builds a protein that regulates a gene that encodes a message, etc.
    • The combination of regulatory sequences and protein-encoding sequences defines a gene.
      • The genome contains not only the blueprint, but also the program for controlled execution.
  • DNA replication:
    • DNA is replicated when DNA copies itself (strands separate and new strands are formed).
    • Regulated by a specific enzyme to avoid rogue copying:
      • DNA polymerase: the enzyme that allows DNA to replicate.
  • DNA recombination:
    • One of two mechanisms for generating genetic diversity:
      • Mutation: occurs when DNA is damaged (chemicals, copying error).
      • Recombination: swapping of genetic information between maternal and paternal chromosomes (when sperm and egg cells are formed).
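
The flow of information described in this section (DNA to RNA to protein, with the code read in triplets) can be sketched in a few lines. This is a simplified illustration under stated assumptions: the DNA input is treated as a template strand written 3'->5', so the complementary RNA reads 5'->3'; the codon table is deliberately partial (the standard code has 64 entries); and the function names and the example sequence are made up for illustration.

```python
# Complementary base pairing used when DNA is copied into RNA.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Deliberately partial codon table (the full standard code has 64 entries).
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "GCU": "Ala",
    "AAA": "Lys", "UGG": "Trp",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}


def transcribe(template_dna):
    """Build the RNA complementary to a DNA template strand.

    Simplification: the template is written 3'->5', so the RNA reads 5'->3'.
    """
    return "".join(DNA_TO_RNA[base] for base in template_dna.upper())


def translate(rna):
    """Read the RNA three bases at a time until a stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        # Raises KeyError for codons missing from the partial table.
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


if __name__ == "__main__":
    rna = transcribe("TACAAACCGATT")  # hypothetical template strand
    print(rna)
    print(translate(rna))
```

Each group of three RNA bases maps to one amino acid, which is the triplet rule from the notes; the stop codon ends the protein.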

 How genes build, maintain, repair and reproduce humans

  • Genesis: how do genes make a whole organism grow out of a single cell.
  • First three phases: axis determination, segment formation, and organ building:
    • The egg cell inside the womb is placed asymmetrically.
    • This causes certain proteins to be present in higher or lower concentrations in certain parts of the cell.
    • High versus low concentration activates different genes that determine the head-tail axis.
    • After this, mapmaker genes are activated that split the body into segments.
    • The next step is master-regulatory genes controlling the development of segments, organs, structures.
  • Next step, the process of cell differentiation:
    • Invariance: most cells arise and develop in a precise and stereotypical manner.
    • Programmed death: regulated, controlled, programmed cell deletion.
    • Natural ambiguity: within severe constraints, the fate of some cells (under influence of neighboring cells) can change.
  • Human physiology is the complexity that emerges when genes and proteins interact over time:
    • Most genes do not behave like one-to-one blueprints for proteins.
    • Interactive and cyclical processes among large numbers of genes and proteins.
    • Varying concentrations of proteins switch on or off varying numbers of genes, which then create varying amounts of other proteins, etc.

DNA cloning – the writing of genes

  • Use of viruses to smuggle foreign DNA into a human cell:
    • In viruses, genes are not stretched out on chromosomes, but strung into a circle of DNA.
    • Once inside a human cell, the virus becomes a linear string that attaches itself to a chromosome.
    • Need an enzyme to cut open the virus’ genome circle and paste in foreign DNA.
  • Use of bacteria to contribute enzymes that cut and paste:
    • Restriction enzymes: recognize unique DNA sequences and cut at a specific site.
    • Ligase: enzyme that stitches two pieces of broken DNA together.
  • Provides the basis for artificial “recombination” (as opposed to natural recombination).
    • Labeled “recombinant DNA” (basis of drug production).
    • If recombination takes place inside bacterial cells, the bacteria’s rapid replication multiplies the recombinant DNA (and enables drug production).

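The cut-and-paste idea above can be sketched with plain strings. This is a toy model under several assumptions: it tracks only one DNA strand, ignores the second strand and its overhangs, and treats the vector as a linear string rather than a circle. The recognition sequence used is that of the restriction enzyme EcoRI (GAATTC, cut after the G); the function names and example sequences are hypothetical.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence
CUT_OFFSET = 1         # EcoRI cuts between the G and AATTC


def cut(dna, site=ECORI_SITE, offset=CUT_OFFSET):
    """Cut a linear strand at every occurrence of the site; return fragments."""
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        fragments.append(dna[start:pos + offset])
        start = pos + offset
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return fragments


def ligate(*fragments):
    """Stitch fragments back together end to end (the ligase step)."""
    return "".join(fragments)


def insert_gene(vector, gene, site=ECORI_SITE):
    """Cut the vector at its single site and paste the gene into the gap."""
    pieces = cut(vector, site)
    if len(pieces) != 2:
        raise ValueError("this sketch expects exactly one cut site in the vector")
    left, right = pieces
    return ligate(left, gene, right)


if __name__ == "__main__":
    vector = "CCGAATTCGG"      # hypothetical vector with one EcoRI site
    gene = "AATTCTTTTG"        # hypothetical fragment with matching ends
    recombinant = insert_gene(vector, gene)
    print(recombinant)
    print(cut(recombinant))    # cutting again releases the inserted fragment
```

Because the inserted fragment has ends that match the cut, the recombinant string contains the recognition site on both sides of the insert, so cutting it again recovers the original fragment.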
DNA sequencing – the reading of genes

  • The sequence (of base pairs) carries the meaning of the genetic code.
  • Most genes are split into parts and modules that can be mixed and matched to create a vast range of “messages”:
    • Exons (message DNA) and introns (the “stuffer” fragments).

DNA sequencing and cloning allows for manipulation of genes

  • Based on a specific process, reverse transcription (carried out by the enzyme reverse transcriptase):
    • If you want to reproduce a function, look for the protein that triggers that function.
    • Then look for the RNA that produces that protein.
      • Build libraries of RNA that are active in each cell.
      • Look for differences in active RNA.
    • Then use that RNA to reproduce DNA.
  • Potential to look for genes related to diseases, regulatory processes in the body, etc.
  • Marks the start of the use of technology to manipulate genes.
    • Medicine: production of proteins from recombinant DNA (for instance, insulin).

DNA and disease

  • Mutations in one gene can cause different types of disease in different types of organs.
  • Reverse also true: different genes can influence one aspect of physiology.
  • Certain genes only activate (disease) phenotypes upon environmental triggers or chance.
  • Misunderstandings about mutation:
    • Mutation, ie genetic diversity, is the natural state.
    • Mutation is relative: a variation from what is normal (called: “wild type”).
    • We are all mutants.
    • Whether or not a mutation is “good” or “bad” is not absolute.
    • It depends on the match between the genetic mutation and the current environment.
      • The definition of disease: a variation that is mismatched with its environment.
  • Problem of detection – the need for a genome map:
    • There are 3 billion base pairs, some diseases are linked to only a handful of them.
    • Process of isolating genes linked to disease is time-consuming and inefficient.
      • Linkage analysis: use other traits paired with disease to locate part of the gene.
      • Requires zooming in on ever smaller fragments to isolate single DNA fragment.
    • Need to map the genome to be more efficient in locating specific disease related genes.
    • Genome map also needed to identify and understand multi-gene diseases.

 Genome map: sequencing the genome:

  • Starts with sequencing the genome of the worm and fruit fly
  • C. elegans:
    • 18,891 genes.
    • 36% of its genes encode proteins similar to those found in humans.
    • 10,000 genes are unique to worms.
    • 90% of the genome is dedicated to organism building.
  • Fruit fly:
    • 13,000 genes.
    • More complex organism than worm, fewer genes.
  • Complexity is not achieved by the number of genes, but by the organization of them.
    [in other words, network structure is key]
  • Venter & Collins (2000): successfully sequence the whole human genome.

 Genomics and Identity:

  • Crossover from using genes to analyze pathologies to using them to understand “normalcy”.
  • Use genetics to investigate human history:
    • Mutations accumulate in populations over generations.
    • Section of a population with the most mutations is likely the oldest section.
      • Provided genes have not blended recently.
      • So, you need to analyze genes that don’t recombine during reproduction.
      • Use mitochondrial DNA: part of the egg, you only get it from your mother.
  • What we learnt about human history from genetics:
    • Young (200k years) and homogeneous (the diversity of the human genome is small).
    • Originated from a narrow slice of sub-Saharan Africa.
    • Started to leave Africa about 100k years ago.
    • Only one female lineage for mitochondrial DNA (there is a mitochondrial Eve).
  • What we learnt about race from genetics:
    • Humans are largely similar in genetic terms, but with enough variation to have diversity.
    • Based on this diversity, you can form clusters of variations and features and label the categories, for instance “race”.
      • The choice of features and traits (identity) used to categorize tends to be cultural, social and political.
    • Genetic differences in general traits within these groups tend to be larger (85-90% of variations) than genetic differences between such groups (about 7% of variations).
      • Most variation antedates the separation into continents.
      • Too little time for the accumulation of substantial divergence.
    • We are culturally inclined to magnify differences, even if they are relatively minor.
      • What is normal becomes superior, what is rarer becomes inferior.
      • Human need to categorize and superimpose other attributes.
    • Using genetics is a one way street only:
      • You can use the genome to derive a person’s general ancestry.
      • You can’t use a person’s general ancestry to predict their genome (and associated traits).

 Genomics and sex/gender:

  • Big picture: males and females are anatomically and physiologically different.
    • These differences are specified by genes.
    • These differences influence an individual’s identity.
  • Sex, gender, identity:
    • Sex: the anatomic and physiological aspects of male versus female bodies.
    • Gender: the psychic, social and cultural roles that individuals assume.
    • Identity: an individual’s sense of self.
  • XX versus XY:
    • Sex chromosome: XX (female) versus XY (male).
      • About 50% of sperm carry an X chromosome and 50% carry a Y chromosome; either combines with the egg’s X.
    • On the Y chromosome:
      • Unpaired: there is no duplicate copy, like other chromosomes.
      • Therefore, mutations on it can’t be repaired from a backup copy, leaving it vulnerable to damage.
      • Over time, it has shrunk and is now the smallest chromosome.
      • Genes for important traits have shuttled away to other chromosomes.
      • It may disappear completely over time.
      • A single gene on it activates the “male pathway”.
    • [Side note: purpose of “sex”: not reproduction (it would be easier for cells just to make duplicate copies), but recombination: produce variation to protect the species.]
  • Sex is determined by a single gene.
  • Gender and identity are more complex:
    • Influenced by the interaction over time of genes, culture, environment, etc.
  • Generally, nature and nurture are not abstract and absolute:
    • Iterative interaction over time (as opposed to nature vs nurture).
    • At the top (or at the start), nature works forcefully (ie, master genes for this or that).
    • From there on, iterative interactions between nature and nurture follow.

 Genomics and self, behavior, sexuality:

  • Genes play a role in influencing sexuality, but we know very little about genetic determinants.
    • For instance, despite many attempts, no single “gay” gene has been discovered.
  • Inherited differences in genes influence many aspects of “normalcy” in varying degrees.
    • Strong and binary influence: simple, master-switch type genes at the top of a developmental hierarchy (ie, maleness vs femaleness).
    • Weak influence: if they act further down the chain (likelihoods, continuous outcomes).

 Epigenetics: from predisposition to disposition:

  • The “last mile”: genes provide probabilities, but can’t predict certain outcomes.
    • Until now: genetics focused on shared traits in population to understand pathologies.
    • Now: investigate variations in individuals to understand human form and function.
  • Can be answered by looking at these questions:
    • Why do identical twins growing up together become different over time?
    • What causes the difference if both twins are identically predisposed?
  • Generally, genes perform a balancing act of stability and adaptability by switching on and off:
    • Balance between intrinsic, pre-determined genetic programs and chance, external influences.
  • But if genes can switch on and off, why are certain outcomes fixed over time?
    • Why do embryonic cells irreversibly turn into specific cell types over time?
    • How are genetic fragments switched on or off indefinitely?
  • Waddington (1950): epigenetics.
    • Cells somehow carry a mark above (epi) their genes.
      • A constant imprint of the cell’s history and identity.
    • The attachment of small molecules to DNA was correlated to a gene switching off.
      • Small molecules: methyl and histone tags.
    • This system of silencing and reactivation (through chemical tags) persists in time.
      • Tags can be added, erased, amplified, etc.
      • In response to cues from a cell or its environment.
    • The chemical tags can be changed, but not easily.
      • For instance, cell differentiation (from embryonic to specific cells) is largely irreversible.
    • Over time, every genome acquires its own, unique epigenome.
      • If nurture has an influence, it’s by leaving its marks on nature.
    • The epigenetic system, by selective de-activation, allows the genome to function.
      • Cells devoid of epigenetic selective repression and reactivation would be overwhelmed and could not function.
    • Ultimate purpose of epigenetics is to establish the individuality of a specific cell.
      • Individuality of the larger organism may be unintended consequence.
    • Epigenetic marks are not carried forward across generations.
      • Epigenetic marks are (almost always) erased in sperm and eggs.

Human genetics

  • Two fundamental elements:
    • Genetic diagnosis: predicting future fate from genes (reading the genome).
    • Genetic alteration: intentional changing of the genes (writing the genome).
  • Genetic diagnosis:
    • Pre-emptive diagnosis of (probable) illnesses.
    • Creating a backward catalog of genetic disorders:
      • Knowing that the syndrome has materialized, catalogue the mutant genes.
    • Lacking a forward catalog:
      • If someone has the mutant gene, what are the chances of developing the syndrome (function of penetrance and expressivity).
    • Limitations:
      • Genes may cause multiple phenotypes, some “good”, some “bad”.
      • Complex systems:
        • Difficult to analyze vast numbers of (interacting) individual genes.
        • Impact of chance and environment over time.
      • Requires a lot more computing power to be efficient.
  • Genetic alteration:
    • Stem cells versus embryonic stem cells:
      • Stem cells (SC):
        • Can renew themselves.
        • Can give rise to limited number of other cell types.
        • Reside in particular organs and tissues
        • Change in SC does not alter the human genome (across generations).
      • Embryonic stem cells (ESC).
        • Arise from an organism’s embryo.
        • Pluripotent: can give rise to every cell type.
        • Can be isolated from the embryo and grown (in a petri dish).
        • Change in ESC affects the germ line, the human genome permanently.
        • Changes in ESC can be made easily and selectively (in mice…).
    • Gene/foreign DNA therapy
      • Using viruses or foreign DNA to deliver genes into a cell’s genetic material.
      • Unreliable, inefficient and imprecise.
    • CRISPR/Cas9: genome editing
      • Doudna & Charpentier (2012): programmed gene cutting and repairing.
      • For now, technique is still cumbersome and inefficient.

 

 
