Triangulation vs. “In Common With”

This question came up in one of the posts in Blaine Bettinger’s Facebook Group Genetic Genealogy Tips & Techniques, so I thought I’d give a quick example here that I refer to myself when I get confused.

A man with 3 children, all of whom have tested, has a match to a 2nd cousin (now documented through both DNA and traditional genealogy). He and the 2nd cousin share 11 segments of DNA.

It so happens that all 11 of those segments have passed down to those 3 children, which you can see in the illustration below.  Of those 11 segments shared by their father and his 2nd cousin, Child 1 inherited 4 segments.  Child 2 also inherited 4 segments — but an entirely different four segments than Child 1.  Child 3 inherited 7 of the 11 segments.

The inheritance and sharing are illustrated below, using data pulled from GEDmatch.

Inheritance

For purposes of illustration, we’re setting aside the fact that generally, when triangulating to find a common ancestor, we don’t use two full-blooded siblings as 2 of the triangle legs; they are too closely related, and will triangulate on many segments.

That said, Child 1, Child 2 and their 2nd cousin once removed (2C1R) have DNA in common with each other, but Child 1 and Child 2 have no triangulated segments with their 2C1R. This is because Child 1 shares DNA with the 2C1R on chr 6, on chr 12 and on 2 segments of chr 15, while Child 2 shares DNA with the 2C1R on an entirely different set of segments, including chr 4, chr 8 and chr 18.

Child 1, Child 3 and their 2C1R have 3 triangulated segments: on chr 6, on chr 12, and 1 segment of chr 15.

Child 2, Child 3 and their 2C1R also have 3 triangulated segments: on chr 4, on chr 8, and on chr 18.
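To make the distinction concrete, here is a minimal Python sketch. The chromosome numbers follow the text, but every start/end position (in Mb), Child 2's fourth segment and Child 3's seventh segment are invented for illustration. "In common with" only asks whether each child shares some DNA with the 2C1R; triangulation asks whether those shared segments overlap on the same chromosome. The sketch assumes the two siblings also match each other across any overlap, which it does not check.

```python
# Hypothetical (chromosome, start_Mb, end_Mb) segments that each child shares
# with the 2C1R. Positions are made up for illustration only.
Segment = tuple[int, float, float]


def overlaps(a: list[Segment], b: list[Segment]) -> list[Segment]:
    """Return the overlapping regions of two segment lists, chromosome by chromosome."""
    result = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue  # segments on different chromosomes can never triangulate
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # a positive-length overlap on the same chromosome
                result.append((chrom_a, start, end))
    return sorted(result)


child1 = [(6, 30.0, 45.0), (12, 10.0, 22.0), (15, 40.0, 50.0), (15, 70.0, 80.0)]
child2 = [(4, 5.0, 20.0), (8, 60.0, 75.0), (18, 20.0, 35.0), (2, 100.0, 110.0)]
child3 = [(6, 32.0, 44.0), (12, 12.0, 20.0), (15, 41.0, 49.0),
          (4, 6.0, 18.0), (8, 61.0, 70.0), (18, 22.0, 30.0), (10, 1.0, 9.0)]

# Both Child 1 and Child 2 are "in common with" the 2C1R...
print(bool(child1) and bool(child2))
# ...but only Child 3 overlaps (triangulates) with either of them.
print(overlaps(child1, child2))
print(overlaps(child1, child3))
print(overlaps(child2, child3))
```

Child 1 and Child 2 each share DNA with the 2C1R, yet their overlap list is empty, while each overlaps Child 3 in three places (chr 6, 12 and 15 for Child 1; chr 4, 8 and 18 for Child 2).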

*******
And that is a quick overview of triangulation vs. in common with.


The Shared cM Project: Tracking individual contributions

Do you know about Blaine Bettinger’s Shared cM Project?  It’s the crowd-sourced collection of shared centiMorgans (cMs) for the purpose of analyzing the ranges of cMs found at different levels of relationship (full siblings, half 1st cousins, 2nd cousins, etc.)

If you and your known relatives have done your DNA tests at any of the big vendors, you can submit your data here.  Relationship is asked for, but no identifying information (except your email) is needed for submission.

I have been submitting my own DNA data since 2016 — Blaine released the first results in May 2015.   However, so that I won’t skew the results (with duplicate or triplicate submissions), I keep track of my submissions in my own Excel spreadsheet.  A sample of the page I use is below, filtered on just some of the 2C1R relationships I have submitted.  (I have hidden the names of the two testers involved in each relationship.)

Submission_Shared cM Project

Note that the cM range for my 2nd cousin once removed submissions is from 32 cM to 267 cM. The vast majority of relatives at this level were known beforehand, or were otherwise targeted tests. The tester who shares only 32 cM shares that amount with one of my 1st cousins. The rest of us (my siblings and I, plus my 1st cousin’s siblings) share a much more “typical” amount of DNA with that tester, around 110 – 140 cM. And since we all match at the full-sibling level, and at the full 1st cousin level, it (so far!) appears that the 32 cM is just due to the randomness of DNA inheritance.

One thing I did in the beginning was submit a relationship for each vendor. (My father, for example, has tested at 23andMe, FTDNA, and Ancestry, as have I.) So, originally, I submitted 3 different sets of father/daughter data. (Obviously, the cM count varied only slightly.)

Since mid-2017, though, I only submit once, no matter how many companies the two testers have tested with. (Blaine does ask for the vendor name when you submit.)
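The "submit once per pair" rule is easy to apply mechanically. Here is a minimal Python sketch, assuming a hypothetical submission log of (tester A, tester B, relationship, vendor) records; the tester names are placeholders. It keys each pair on a `frozenset`, so the same two testers count once regardless of vendor or of which one is listed first.

```python
def unique_submissions(records):
    """Keep the first submission for each pair of testers, whatever the vendor or order."""
    seen = set()
    kept = []
    for a, b, relationship, vendor in records:
        pair = frozenset((a, b))  # order-independent key: (A, B) is the same pair as (B, A)
        if pair in seen:
            continue  # this pair was already submitted from another vendor's data
        seen.add(pair)
        kept.append((a, b, relationship, vendor))
    return kept


# Hypothetical log: one father/daughter pair tested at three vendors, plus one cousin.
records = [
    ("Father", "Me", "Parent/Child", "23andMe"),
    ("Father", "Me", "Parent/Child", "FTDNA"),
    ("Me", "Father", "Parent/Child", "Ancestry"),
    ("Me", "Cousin A", "2C1R", "Ancestry"),
]

for row in unique_submissions(records):
    print(row)
```

Keeping the first record means the vendor you list first in your log is the one that gets submitted; sort the log beforehand if you prefer a particular vendor.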

In general, I have answered “no” to the endogamy question, but that is based on what I know of the relationship. Who knows? With enough research on certain lines, I may find that indeed there was endogamy.

I also, for my own interest, track “expected” DNA shared against actual DNA shared (assuming grandparents and uncles/aunts share an average of 25% with the tester, first cousins an average of 12.5%, 1st cousins once removed (1C1R) 6.25% and 2nd cousins 3.13%). It never fails to amaze me how much my sister, brother and I vary in the amount of cM shared with a given targeted cousin.
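Turning those percentages into cM needs a total genome length. The Python sketch below assumes roughly 7,000 cM in total, so that a parent/child pair (50%) shares about 3,500 cM; that figure is an approximation, and vendors count the genome somewhat differently. The 180 cM "actual" value is hypothetical.

```python
# Assumption: ~7,000 cM total, so 50% (parent/child) is ~3,500 cM.
TOTAL_CM = 7000

# Average fractions from the text; 2nd cousins are 3.125%, rounded to 3.13% above.
EXPECTED_FRACTION = {
    "grandparent": 0.25,
    "aunt/uncle": 0.25,
    "1C": 0.125,
    "1C1R": 0.0625,
    "2C": 0.03125,
}


def expected_vs_actual(relationship, actual_cm):
    """Return the expected cM for a relationship and actual/expected as a ratio."""
    expected = TOTAL_CM * EXPECTED_FRACTION[relationship]
    return expected, actual_cm / expected


# Hypothetical 2nd cousin who shares 180 cM with the tester.
expected, ratio = expected_vs_actual("2C", 180)
print(f"expected {expected:.0f} cM; actual is {ratio:.0%} of that")
```

A ratio well above or below 100% is not an error in itself; it is exactly the sibling-to-sibling variation described above.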

How do you track your submissions, if you are submitting to the Shared cM Project?  Are you concerned with not submitting twice — or do you figure it will all average out in the end (certainly a possibility)?

Cite/link to this post: Cathy M. Dempsey, “The Shared cM Project: Tracking individual contributions” Genes and Roots, posted 31 Mar 2019 (https://genesandroots.com : accessed (date)).


Ancestry ThruLines: Analysis of my mom’s lines

Yesterday I read Roberta Estes’ blog post on ThruLines, which you can read here. It’s amazing how quickly she can research and walk you through new DNA tools that come to light! I adapted her spreadsheet, a snippet of which you can find in that same blog post, for my own use.

Rather than focus on my own ThruLines, I focused on my mother’s ThruLines.  Here is the tree her DNA is linked to.  Note that I have not done any work on Mom’s paternal side (Italian lines) — but I do have the tree out to her 4 Italian great-grandparents.  I feel confident about Maria Bolognesi’s parents, and about Giuseppe Diamantini’s father.  The name Maddelena Serafini comes from another branch of the family, without attendant documentation, so it may or may not be correct.

Mom_Tree

Below is a screenshot of Mom’s closest ancestors who have ThruLines. Note that Maria Bolognesi, her paternal grandmother, is missing. I have no idea why. Mom’s closest match at Ancestry (after my sibling and me) is her paternal 1st cousin, who would likely share DNA with Mom from both the Diamantini line AND the Bolognesi line.

ThruLineAncestors

My speculation as to why Maria Bolognesi is missing is that there are 2 other DNA matches to Mom and her paternal 1st cousin (alias “Elena”) who match them on the Diamantini side. Except for Mom’s siblings (who have not tested) and “Elena’s” siblings (who also haven’t tested), no other Bolognesi kin is known to be in the U.S. Perhaps this is why Ancestry ThruLines is focusing on the Diamantini side??

Another possibility — again, this is speculation on my part — is that my mom and “Elena” share a relatively low amount of DNA (619 cM) for full-blooded first cousins.  The paperwork (birth certificates, marriage licenses, family tradition, family resemblances, etc.) indicates full first cousins, but Ancestry is treating them as half 1st cousins, presumably because of the amount of DNA shared (?).  Could that be why Ancestry has deemed them half 1st cousins, and thus ignored their shared grandmother?  (Both have the grandmother in their trees, so it’s not a lack of matching, as far as I can tell.)

ThruLines links Mom’s Serafini line specifically to one Ancestry member tree.  This particular member either has not done a DNA test, or simply does not match Mom at all.  However, this person has over 400 Serafini persons in their tree; it appears the tree includes all the Serafini families from one specific community in the Abruzzo region of Italy.  (Abruzzo borders the Adriatic Sea, and is just south of the Marche region, which is where my known Italian ancestors are from, and where known kin is living now.)

This Ancestry member’s tree with 400+ Serafini persons in it was a source tree for the tree created by the wife of a known second cousin to Mom on Mom’s Diamantini line.  No other sources (such as baptismal records, marriage records, censuses, etc.) are shown in either tree.  All 3 trees, though  — meaning Mom’s, the 400+ Serafini tree, and the 2nd cousin’s wife’s tree — have a “Maddelena Serafini”.  (She is married to someone different in each tree.)

The Abruzzo region connection with Serafini is intriguing; however, there is nothing else to go on, given no sources to review and validate for all of these names.  

Ancestry ThruLines, though, provides Mom with 42 potential new ancestors, 20 of whom are supposedly on her Serafini line (as shown below in the screenshot of Excel).  I say “no DNA matches on Ancestry to this line” referring to the fact that the trees Ancestry used to determine these 20 potential ancestors are trees of members who share no DNA with my mother.

Mom_PotentialAncestors

Below is the screenshot of how I broke out Mom’s 254 possible ancestors through the 7th generation (through 5th great-grandparents). Yes, her tree has a lot of blanks in it; 201 ancestors are not in her tree at all. The bulk of those, though, are on her father’s Italian side. By contrast, her most complete line is her 2nd great-grandfather Copple’s line, with only 5 persons missing from the tree.

Mom_ThruLines
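The 254 figure follows from doubling the number of ancestors in each generation. A quick Python check, counting parents as generation 1 and 5th great-grandparents as generation 7 (that numbering is my reading of "through the 7th generation"):

```python
def ancestor_slots(generations):
    """Number of ancestor slots from parents (gen 1) back through `generations`."""
    return sum(2 ** g for g in range(1, generations + 1))


slots = ancestor_slots(7)  # 2 + 4 + 8 + 16 + 32 + 64 + 128
in_tree = slots - 201      # 201 of those slots are still blank in Mom's tree
print(slots, in_tree)
```

The remainder, 53, is the number of ancestors actually in her tree.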

So, the numbers that truly matter relate to the 53 ancestors who are in her tree. Note that 20 of those ancestors have no known DNA matches at Ancestry; they immigrated relatively recently (late 1870s) from Denmark (now Germany) and had small families with no living descendants today except for Mom, her kids and her grandkids.

The 3 missing ancestors are her paternal grandmother and that grandmother’s parents. Claus Clausen, Mom’s 4th great-grandfather, who is in her tree, was replaced by a Claus Clausen from a tree whose owner is not a DNA match. Mary Addams in Mom’s tree was also replaced with another Ancestry member’s Mary Addams. Mary was the likely stepmother of Mom’s direct ancestor, James Englehart, having married Samuel Englehart in Guernsey County, Ohio, some 5 years after James was born in Pennsylvania.

Regardless of her genetic relationship to us, Mary Addams was already in Mom’s tree, so it’s not clear why she was ignored in favor of someone else’s tree.

Moms53Ancestors

The 28 ancestors in Mom’s tree with DNA-match descendants are primarily the ancestors who have been in the United States the longest, since at least 1730 in some cases, to the best of my knowledge.  All of them are ancestors of my mother’s maternal grandmother, Hazel (Englehart) Holst. Hazel’s paternal grandmother, Hannah (Hill) Englehart, and Hazel’s maternal grandfather, Ben Franklin Copple, have the most-complete branches on Mom’s tree.  They are indicated by the blue check marks.

Many of these DNA matches also currently show up in my mom’s DNA circles for some of these same ancestors.  A number of the relationships I feel fairly confident about, having done my own documentation of the relationships involved. 

However, I believe some of the trees used in these ThruLines are incorrect, especially regarding Philip Copple, Mom’s 4th great-granddad, who is, in many Ancestry trees, mixed up with his cousin Philip. Both had daughters named Catherine and Margaret. Assignment of the daughters to the fathers is, frankly, a mess! (And that mess also showed up in Shaky Leaf hints and in the Philip Copple DNA Circle.)

HillLineCoppleLine

The bottom line is that I see a flood of Serafini potential ancestors, which would be awesome if I actually did some Italian research and traced my (reported) Serafini line. Maybe that 400+ Serafini tree does have accurate, if undocumented, information.

I also know I cannot trust ThruLines any more than I trusted DNA circles or shaky leaf Shared Ancestor Hints.

And I suspect I will find similar issues when I explore my dad’s ThruLines shortly.

All that said, I saved the best for last… thanks to ThruLines, I just found out that one more of Jacob Copple’s 7 children who had descendants may actually have a descendant alive today who has DNA-tested and matches Mom!! I will be working to validate this match’s tree if I cannot connect with the person. (See below.) I had thought Milton’s descendants were all deceased by the 1940s. If this proves out, 6 of the 7 children who had descendants (and 6 of the 9 who lived to adulthood) have at least one descendant who has tested and matches Mom.

This matters to me because Libby Copple was my original brick wall; oral history indicated she was a “Copple”.  It has only been with DNA testing that her likely father, Ben, and his family have been revealed.

JacobCopple

Cite/link to this post: Cathy M. Dempsey, “Ancestry ThruLines: Analysis of my mom’s lines” Genes and Roots, posted 12 Mar 2019 (https://genesandroots.com : accessed (date)).


23andMe Ethnicity Update

If you’ve tested at 23andMe, have you checked out your ethnicity results lately? 

In a recent post[1], Judy Russell mentioned 23andMe’s latest ethnicity update, which somehow I missed completely!

Naturally, I had to go check it out, fearing a bit that my ethnicity percentages might be “messed up”.  Even though I know they are estimates, 23andMe has for some time had the percentages closest to what would be expected by my family narrative.  My dad is “all Irish”; my mom is “half Italian” due to her father being from Italy.  Et cetera, et cetera.

23andme_ethnicity

Very little has changed in my ethnicity percentages.   Here, I’ve noted in an Excel spreadsheet my former ethnicities per 23andMe (as of November 2018) and my current ones as of today when I reviewed the changes.

What is interesting, though, is that they seem to have taken a page from Ancestry’s “genetic communities” playbook, and zeroed in on specific areas in Ireland, Britain and Italy where my ancestors possibly lived in the past 200 years.

Let’s take a look. We’ll start with Ireland. On my paper trail, both my dad’s parents have Irish roots. My paternal grandfather’s family left Ireland, depending on the branch of his tree, around the time of the Famine and shortly after, roughly 1850 to 1865. My great-great grandfather, Patrick Dempsey, reportedly came from Kings County (now Co. Offaly), per his obituary. I don’t have more details than that. His wife Hanora Hurley (or is it Hanora Riordan?), whom he married in the U.S., may have come from anywhere in southern Ireland. Best guess is Co. Cork or Co. Limerick. On my grandfather’s maternal line, his mother’s father’s Lamburth ancestors likely came from England, while his mother’s mother, Eliza (Landrigan) Lamburth, came from the town of Garryrickin, Windgap Parish, Co. Kilkenny.[2]

My paternal grandmother’s father came from Athea, Co. Limerick, as did his father, while his mother came from Cooraclare, Co. Clare.  My grandmother’s mother came from Athea, Co. Limerick, as did her father, with her mother coming from Beale, Co. Kerry.[3]

In sum, my Irish heritage on my Nana’s side is from the province of Munster, specifically the southwest of Ireland around the River Shannon, while my Grandpa’s Irish heritage is from the province of Leinster, specifically Co. Kilkenny and Co. Offaly.

And 23andMe’s ethnicity determination – for the moment at least – largely agrees.[4]

23andme_irishethnicity

County Kerry, County Clare, County Limerick and County Kilkenny are all in the top 10.

As far as Great Britain/the U.K. is concerned, I have no idea where my ancestors came from.  My paternal grandfather’s Lamburth line, here in the U.S. since at least 1800, likely came from England but none of us researching this line have yet “crossed the pond”.  My mother’s maternal grandmother’s Wright line has been here in the U.S. since at least 1730 or so; researchers on this line have not yet crossed the pond either.  Here is what 23andMe estimates[5]:

23andme_ukethnicity

Perhaps these areas could be clues, but it would be silly to jump ahead of myself and start researching Wrights and Lamburth/Lamberts over in England without knowing more about the family here in the U.S. in the 18th century. The references to Scotland surprise me a bit, but could be related to the Gaelic / Celtic heritage of my Irish side.

With respect to Italy, my grandfather’s parents came from the Marche region. My great-grandfather was from Fano, and my great-grandmother was from Sant’Elpidio a Mare.[6] Some of us in my family have even gone to Marche and met our living cousins; that’s a story for another blog post.

Here is what 23andMe estimates[7]:

23andme_marche_ancestry

Pretty wild, huh? Marche!! I still have to take it with a grain of salt (my brother’s estimated places of origin in Italy are completely different from mine), but still, right now, today, it “fits”.


[1] Judy G. Russell, “And still not soup…,” The Legal Genealogist, posted 27 Jan 2019 (https://www.legalgenealogist.com/blog : accessed 28 Jan 2019).

[2] For sources, see cathymd, “Dempsey Family Tree”, Ancestry.com (https://www.ancestry.com/family-tree/tree/17377380/family : accessed 26 Dec 2018).

[3] Ibid.

[4] 23andMe, Inc., “Cathy, your DNA suggests that 56.8% of your ancestry is British & Irish”, 23andMe.com (https://you.23andme.com/reports/ancestry_composition_hd/british_irish/ : accessed 29 Jan 2019).

[5] 23andMe, Inc., “Cathy, your DNA suggests that 56.8% of your ancestry is British & Irish”, 23andMe.com (https://you.23andme.com/reports/ancestry_composition_hd/british_irish/ : accessed 29 Jan 2019).

[6] For sources, see cathymd, “Serafini_Diamantini1” tree, Ancestry.com (https://www.ancestry.com/family-tree/tree/19505554/family : accessed 29 Jan 2019).

[7] 23andMe, Inc., “Cathy, your DNA suggests that 12.6% of your ancestry is Italian”, 23andMe.com (https://you.23andme.com/reports/ancestry_composition_hd/italian/ : accessed 29 Jan 2019).

NodeXL Clustering for Mom’s Ancestry matches

I posted my dad’s NodeXL clustering results a few weeks back (here).  As promised, now I am posting my mom’s NodeXL clustering results, focusing on just a few of the most intriguing (puzzling?) aspects.  (You can read a step-by-step how-to on using NodeXL to cluster your Ancestry matches here, at Shelley Crawford’s blog.)

Mom’s matches for this clustering exercise were limited to those with 15 cM or greater shared; it simply gets too cluttered if I include everybody down to 6 cM.

Also, in the photo below I have turned off the display for all clusters with fewer than 4 people. (NodeXL’s algorithms will cluster in groups of two, while other algorithms like Jonathan Brecher’s Shared Clustering tool use three as a minimum.)

mom_clustering_mostgroups

Let’s look first at “Group 13”, the cluster at the bottom in navy blue that looks like 2 separate clusters to me.  (I don’t fully understand how the algorithm works.)  Below is group 13, zoomed in and with inter-group links turned off so you can look at the cluster itself more closely.  Clearly, only one match links to both halves of this group.  So, they’re not related as closely as one might think.  

mom_clustering_grp13

The additional photos below bear out that theory.  On the left, “Cousin X” is highlighted; you can see that “X” shares a match with only 2 people (in addition to my mom).  On the right, “Cousin B” is highlighted.  “Cousin B” only matches others in the one subcluster, and nobody in the other subcluster.
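One way to test the "two halves" reading of Group 13 without eyeballing the graph is to remove the single linking match and see whether the rest falls apart. This Python sketch uses a plain connected-components walk, not NodeXL's clustering algorithm; matches A to F and all links are hypothetical, with "Cousin X" linked to one person in each half as described above.

```python
def components(nodes, edges):
    """Split `nodes` into connected groups, using only edges between listed nodes."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        if a in adjacency and b in adjacency:  # ignore links to removed or outside matches
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, groups = set(), []
    for start in nodes:
        if start in seen:
            continue
        group, stack = [], [start]
        seen.add(start)
        while stack:  # depth-first walk of everything reachable from `start`
            node = stack.pop()
            group.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(group))
    return groups


edges = [("A", "B"), ("B", "C"), ("A", "C"),     # one half
         ("D", "E"), ("E", "F"), ("D", "F"),     # the other half
         ("Cousin X", "C"), ("Cousin X", "D")]   # the only link between them
everyone = ["A", "B", "C", "Cousin X", "D", "E", "F"]
without_x = [n for n in everyone if n != "Cousin X"]

print(components(everyone, edges))   # one connected group
print(components(without_x, edges))  # two separate halves
```

If dropping one match splits a cluster into two sizeable pieces, the cluster is probably two groups of cousins joined by a single person rather than one tightly related family.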

Another intriguing group is the one around my mom’s cousin “Sally Sue” (alias), who is fairly closely related to Mom. (You can tell she is more closely related to my mom by the size of the blue square.) These matches look like a hub and spokes: “Sally Sue” is in the middle with the largest square; the others are more distantly related to my mother. (As an aside, the option to size the squares or dots by the shared cM amount is available in the NodeXL tool, but is not automatic.)

“Sally Sue’s” group, shown below with the outside links removed, is one in which she matches every single person in her cluster, but each of them only matches her (or, not shown, at least one person in a different cluster.)  

mom_clustering_hub and spokes
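The hub-and-spoke pattern can also be checked from the shared-match list itself. A minimal Python sketch, assuming hypothetical spoke names M1 to M3 and an edge list of shared-match pairs: the cluster qualifies if the hub matches every other member and no two spokes match each other. Links that leave the cluster are ignored, just as they were hidden in the graph.

```python
def is_hub_and_spoke(cluster, edges, hub):
    """True if `hub` links to every other member and no two other members link to each other."""
    inside = [(a, b) for a, b in edges if a in cluster and b in cluster]
    spokes = set(cluster) - {hub}
    hub_links = {b if a == hub else a for a, b in inside if hub in (a, b)}
    spoke_links = [(a, b) for a, b in inside if hub not in (a, b)]
    return hub_links == spokes and not spoke_links


cluster = {"Sally Sue", "M1", "M2", "M3"}
edges = [("Sally Sue", "M1"), ("Sally Sue", "M2"), ("M3", "Sally Sue"),
         ("M1", "Outside match")]  # a link out of the cluster, ignored here

print(is_hub_and_spoke(cluster, edges, "Sally Sue"))
print(is_hub_and_spoke(cluster, edges + [("M1", "M2")], "Sally Sue"))
```

The second call adds a spoke-to-spoke link, which breaks the pattern.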

The last cluster that is intriguing is shown below.  This cousin, let’s call her Jane, appears to be in the “wrong” cluster.  While she does have matches in her own cluster, she has many more matches in a different cluster. 

mom_clustering_1cousin_whyingrp7

One reason this might happen is that Jane and Mom could share DNA on, say, chromosome 1 (possibly with others in her group); the cousins in the other cluster could share DNA with mom on, say, chromosome 9, and then share DNA with Jane on chromosome 4.  We don’t know for sure, since we don’t have segment info.

However, since clustering my mother’s matches in NodeXL and starting the draft of this post, I used Jonathan Brecher’s Shared Clustering tool, which groups “Jane” with the cluster where she has most of her matches. 

On the face of it, that makes more sense. However, seeing “Jane” in a separate group (as in the NodeXL graph above) could be useful for realizing that she may be connected to my mother on a different ancestral line than the bulk of her matches. This suggests I need to be careful in analyzing Jane’s tree and ancestral surnames, vis-a-vis the matches in the other cluster.
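A crude way to flag cases like Jane's is a simple majority vote: count which cluster holds most of a match's shared matches and compare that with the cluster she was placed in. This Python sketch is that majority vote only; it is not the algorithm NodeXL or Shared Clustering actually uses, and the match names and cluster labels are hypothetical.

```python
from collections import Counter


def majority_cluster(match, edges, cluster_of):
    """Return the cluster holding most of `match`'s shared matches (ties: first seen)."""
    neighbours = [b if a == match else a for a, b in edges if match in (a, b)]
    counts = Counter(cluster_of[n] for n in neighbours if n in cluster_of)
    return counts.most_common(1)[0][0] if counts else None


cluster_of = {"A": "Group 7", "B": "Group 7",
              "C": "Other group", "D": "Other group", "E": "Other group"}
edges = [("Jane", "A"), ("Jane", "B"), ("C", "Jane"), ("Jane", "D"), ("Jane", "E"),
         ("A", "B")]

print(majority_cluster("Jane", edges, cluster_of))
```

Here Jane sits in Group 7 but most of her shared matches are in the other group, which is exactly the kind of mismatch worth a closer look at her tree.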

In fact, I am finding that it is useful to cluster your shared DNA matches with more than one tool, as each uses different algorithms.  (More on other clustering methods in a later post.)

Feline Fridays — with a twist of DNA

I did a DNA test for my cat Simba back in 2017 (available here for a nominal fee), and below are the results of Simba’s ethnicity.[1]  He is descended from Western European cats!  

Now, the real reason I did a cheek swab on Simba — he was less than impressed with being swabbed, by the way — was to donate his genome to science.  At the time I had read (somewhere, can’t remember now where) that there were many more samples of canine genomes than feline genomes for scientific research, so I decided to contribute on Simba’s behalf.

simba ancestry

Below is Simba with his best buddy Leo. Both were adopted as “teenagers” (over 6 mos. old but less than 1 year) from the local city shelter in December 2013, shortly after our long-term feline companion Rory died.

imag4093 - copy

[1] Veterinary Genetics Laboratory, UC Davis School of Veterinary Medicine, “Cat Ancestry [Report] Simba”, case CAT92330 (https://www.vgl.ucdavis.edu/services/cat/ancestry/ : accessed 1 Mar 2017).


Ancestry, Ancestry, how does my garden (of DNA matches) grow?

At the end of each month, I use the DNAGedcom client to download my matches from Ancestry, as well as those of my father, my mother and (one of) my sibling(s).

Here are our 4th cousin level (generally, 20.0 cM or more total DNA shared) and total match counts as of December 30, 2018. My dad has nearly twice as many total matches as my mom, likely because of his 100% Ireland/British Isles ancestry, while Mom’s father, a first-generation American, has roots in the Marche region of Italy. Not much DNA testing going on with that side of her family.

ancestrymatchcounts

Notice, however, that even though my mom’s total match count is roughly half of Dad’s, she has almost as many 4th cousin level matches as he does.

Let’s take a look at a few of the numbers, percentage-wise.   You can see that my dad and my sibling are tracking at the same ratio of 4th cousin level matches to total matches.  My percentage is a bit smaller, but mom’s 4th cousins are just over 3% of her total matches.

ancestrymatchtrends

I suspect the reason for Mom’s higher percentage of 4th cousin matches is due to the fact that she has 4 great-great grandparents whose ancestors have been in the United States since the 18th century.  Dad, on the other hand, has only 2 great-great grandparents who have been in the U.S. since the 18th century.

How fast have the numbers grown over the years?  It will vary for each person who does a DNA test with Ancestry.  In general, if you are of European heritage, and you have many colonial ancestors, you will have a lot of matches, and a lot of matches at the 4th cousin level or closer.   On the other hand, if you have ancestors that immigrated to the U.S. in the 20th century (as I do, and my mom does), you’ll have proportionately fewer matches.

That said, the number of my matches increases daily!  I’ve been averaging more than 17 new matches per day in the last 3 years.  My dad (not shown here) is averaging over 31 new matches per day in the last 3 years.  I suspect those with extended colonial roots have an even greater number of matches coming in, as more and more people test.

cathy_ancestrymatcheschange
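The "new matches per day" figure is just the growth between two match-count snapshots divided by the days between them. A minimal Python sketch; the counts and dates below are hypothetical, not my actual totals.

```python
from datetime import date


def matches_per_day(start_count, end_count, start_day, end_day):
    """Average number of new matches per day between two snapshots."""
    days = (end_day - start_day).days
    return (end_count - start_count) / days


# Hypothetical snapshots three years apart (1,096 days, since 2016 was a leap year).
print(matches_per_day(1000, 20180, date(2016, 1, 1), date(2019, 1, 1)))
```

Monthly DNAGedcom downloads make this easy to recompute, since each download is a dated snapshot.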

Have you taken a recent look at your match totals?


Pictures Really ARE worth a thousand words (or more!)…

I’ve been struggling to make sense of — or, more accurately, wisely use — my dad’s matches at Ancestry to extend some of his lines.  Dad has one great-grandparent who was born in the U.S.; the others were all born in Ireland (where all but three remained throughout their lives.)  So, I’ve long thought most of my dad’s matches are not easily assignable to one of his great-grandparents because there is much I don’t know about the aunts/uncles/first cousins of those ancestors.

Now, that may still be the case to some degree, but I did have an eye-opener when I used the NodeXL template with Excel to cluster my matches. NodeXL is a template for graphing your networks (often in reference to social media); see here. I found out about the tool from reading Shelley Crawford’s blog Twigs of Yore; she has an entire step-by-step series on how to create visual networks of your Ancestry DNA matches using NodeXL and Excel. (An indexed version is here.)

So, I downloaded my dad’s matches at year-end from Ancestry using DNAGedCom, and loaded the data into the NodeXL template.  I limited the number of matches to those who share at least 17 cM with my dad; I also did not include my brother or me as matches, nor my paternal 1st cousin.

The reason you want to exclude close matches is that they share so many matches with you (or your target person) that there will be connections all over the graph, and you won’t be able to discern any useful information.

For this same reason, I also excluded children and grandchildren of matches, for those cases I know about. (As a disclaimer, just to be clear, with Ancestry’s matches, I have no way of telling if match A and match B are, say, child/parent to each other, unless I personally know A and B, or unless I’ve “met” them online regarding our shared matches, and they’ve shared that with me.)
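Before loading anything into NodeXL, the filtering described above amounts to two rules: a minimum cM threshold and an exclusion list. A minimal Python sketch, assuming matches are already read into a list of dicts with hypothetical `name` and `cm` keys (not the actual DNAGedcom column names); the people and values are placeholders.

```python
def filter_matches(matches, min_cm=17.0, exclude=()):
    """Keep matches at or above `min_cm`, dropping any name listed in `exclude`."""
    excluded = set(exclude)
    return [m for m in matches if m["cm"] >= min_cm and m["name"] not in excluded]


matches = [
    {"name": "Brother", "cm": 3400.0},          # close relative: excluded by name
    {"name": "Cousin P", "cm": 850.0},          # close relative: excluded by name
    {"name": "Match Q", "cm": 45.0},            # kept
    {"name": "Match Q's child", "cm": 22.0},    # known child of a match: excluded
    {"name": "Match R", "cm": 12.0},            # below the 17 cM threshold
]

kept = filter_matches(matches, min_cm=17.0,
                      exclude=["Brother", "Cousin P", "Match Q's child"])
print([m["name"] for m in kept])
```

The exclusion list has to be maintained by hand, since Ancestry does not say which matches are parent and child to each other.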

That’s the context; here is the first picture of Dad’s top 1,000 (or so) matches clustered into the top groups.

dad_ancestrymatch_clustering_majorgroups

The bigger dots represent the closest genetic connections to my dad.   Big dots exist in the navy dot group (upper left), the turquoise group (lower left) and the kelly green group (upper right).

The grey lines denote connections, both within groups and between groups.  In one easy glance, one can determine that the group most tightly related to each other is the group on the top row with dark green dots.  It looks like a web.

As far as inter-group connections go, the turquoise dot group seems to have the most connections with other groups.
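"Most connections with other groups" can be counted rather than judged by eye. A minimal Python sketch with hypothetical matches and group labels borrowed from the colors above: every shared-match link whose two ends sit in different groups counts once for each of those groups.

```python
from collections import Counter


def inter_group_links(edges, group_of):
    """Count, for each group, the shared-match links that leave that group."""
    counts = Counter()
    for a, b in edges:
        ga, gb = group_of[a], group_of[b]
        if ga != gb:  # a link between two different groups
            counts[ga] += 1
            counts[gb] += 1
    return counts


group_of = {"T1": "turquoise", "T2": "turquoise", "G1": "dark green", "G2": "dark green",
            "K1": "kelly green", "N1": "navy", "N2": "navy"}
edges = [("T1", "T2"), ("G1", "G2"), ("N1", "N2"),   # links within groups
         ("T1", "G1"), ("T2", "G2"), ("T1", "K1")]   # links between groups

counts = inter_group_links(edges, group_of)
print(counts.most_common())
```

A group with zero outside links (navy, in this toy data) is self-contained, much like the navy group described below.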

So, when I highlight the turquoise group, what do I find? Connections to almost every group of matches my dad has, except for the navy blue group. Which is kinda cool, but so what? Unless you know something about the matches within the group.

dad_ancestry_match_lamburthcluster_all her lines

So, the matches in the highlighted group above are all kin to my dad’s great-grandfather, Archibald Lamburth (born c. 1833 Tennessee – died 1909 San Francisco).  He has the distinction of being my dad’s only great-grandparent born in the United States.  Given that the bulk of Ancestry’s DNA customers are U.S.-born, and that many with colonial ancestry say they have many thousands of matches, I suspect most of these connections will tie back to 18th-century U.S. and the colonies should I ever break this “brick wall”.

My second surprise was looking at the navy blue group.  Other than the one outlier I have yet to explore, all the matches are intra-group matches.  This group includes known close relatives of my dad’s maternal side.

dad_ancestry_match_nanacluster_all her lines

Although my dad has matches to his maternal grandfather’s side (and his parents AND grandparents), as well as to his maternal grandmother’s side (and her parents), the clustering algorithm does not distinguish between the two lines, at least based on the current population of matches used.

I may need to do a separate analysis on these particular matches — perhaps bringing down the filter to 15 cM — to see if I can break out that group into Maternal Grandfather and Maternal Grandmother.

Right now, the only useful information is that my dad’s mother’s matches and my dad’s father’s matches are separate.  They weren’t related to each other, based on the information we currently have — the above graphs, plus the genealogy I’ve already done.

The next picture, below, shows how some close genetic relatives (> 275 cM shared, in this case 1st cousins 1 generation removed), share matches with other groups.  This cluster could be a Dempsey cluster, with ties to Lamburth kin.  Which makes sense in my family tree since a Dempsey married a Lamburth.

dad_ancestry_match_bartjones billydodge cluster_their daughters_dempseylamburthlandriganhurley

Notice also that the group is somewhat open, like a child’s scribble.  Not everyone within the group is closely connected to everyone else in the group.

An example of a tightly-connected group is below. This is the group with dots in chartreuse green. Right now, I have no idea how they fit into the family tree.  It’s pretty much a self-contained group, with minor ties to the Lamburth (dad’s paternal grandmother’s side) group, but nothing significant.   Yet.

dad_ancestrymatch_clustering_group8

That was a look at my dad’s clustered Ancestry matches; sometime in the near future, I’ll take a look at my mom’s clustered Ancestry matches using the NodeXL tool.


Clustering your Ancestry DNA matches with Excel (and DNAGedcom)

There are more and more good visualization tools available for clustering your DNA matches with the intent of discovering a new ancestor.  Recently I’ve been using a clustering tool created by Evert-Jan Blom at Genetic Affairs (more on that tool in an upcoming post). 

Dana Leeds’ DNA Color Clustering method is straightforward, and especially effective for those persons who have many 2nd and 3rd cousin matches on Ancestry, which I don’t. (Although it actually works quite well for more distant cousins, in my opinion, especially if you’ve been working on clustering your matches for several years!) You can find out more about Dana’s method here.
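For readers who haven't tried it, here is a rough Python sketch of the color-assignment idea as I understand it; it is not Dana's own tool or spreadsheet. Take matches within a 2nd/3rd cousin window (the 90 to 400 cM bounds are an assumption here, so check her description), give the first uncolored match a new color, and give that color to each of its shared matches in the window. A match can pick up more than one color. Names and cM values are hypothetical.

```python
def leeds_colors(matches, shared, low=90, high=400):
    """matches: {name: cM}; shared: {name: set of shared-match names}."""
    # Work down the list from the largest cM, keeping only the 2nd/3rd cousin window.
    window = [n for n, cm in sorted(matches.items(), key=lambda kv: -kv[1])
              if low <= cm <= high]
    colors = {n: [] for n in window}
    next_color = 1
    for name in window:
        if colors[name]:
            continue  # already colored through an earlier match's shared list
        colors[name].append(next_color)
        for other in shared.get(name, set()):
            if other in colors and next_color not in colors[other]:
                colors[other].append(next_color)
        next_color += 1
    return colors


matches = {"A": 300, "B": 150, "C": 120, "D": 95, "E": 500, "F": 40}
shared = {"A": {"B"}, "B": {"A"}, "C": {"D"}, "D": {"C"}}
print(leeds_colors(matches, shared))
```

In this toy data E (too close) and F (too distant) fall outside the window, and the remaining four matches split into two colors, ideally one per grandparent line.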

Despite these cool clustering methods — and others — in the end, I keep returning to my trusty Excel spreadsheet and my list of “ICW” (In Common With) matches from Ancestry.com which I download using the DNAGedCom client tool (available here via a yearly subscription).

I’m sharing my way of clustering my matches — or, more specifically, my mother’s matches and my father’s matches — because the “best” method is the one that makes the most sense to you, or seems the most “intuitive”.

Mom_RitaShared

Some of Mom’s shared matches with “Cousin B”, on Ancestry

Let’s say I’m working with my mother’s DNA matches from Ancestry.com.  Using the DNAGedcom Client tool, I will download a list of all her matches, and then download a list of all her “ICW” matches into CSV format. 

Default ICW file

This is a sample of the default ICW file, before I combine it with the default Match file.

Default Match file

This is an abbreviated sample of the default match file.  The columns of interest are “Range” and “SharedCM”.

Once I have the two files, I use the VLOOKUP function in Excel to associate (Cousin) Range and SharedCM with the primary match, and then with the In Common With matches. The result is a combined file like the one below. The combined columns are highlighted in green.

Combined File

The “Mtch cM” and “Mtch Cousin” columns associate to Cousin B; the “icw cM” and “icw Cousin” associate to the ICW match: me, my brother, and cousins C, D, E, F, G, and H.  Shared cM (centiMorgans) = shared DNA; see my previous post here for more on centiMorgans. 
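For anyone who prefers a script to VLOOKUP, the same join can be sketched in Python with the standard `csv` module. "Range" and "SharedCM" are the Match-file columns named above; the `matchid`, `icwid` and `name` headers are assumptions, so rename them to whatever your DNAGedcom export actually contains. All rows are hypothetical.

```python
import csv
import io

# Hypothetical extracts of the two DNAGedcom files.
MATCH_CSV = """matchid,name,Range,SharedCM
m1,Cousin B,2nd Cousin,600
m2,Cousin C,4th Cousin,45
m3,Cousin D,4th Cousin,32
"""

ICW_CSV = """matchid,icwid
m1,m2
m1,m3
"""


def combine(match_text, icw_text):
    """Attach Range and SharedCM to both sides of every ICW row, as VLOOKUP does."""
    # Build the lookup table once, keyed on the match ID.
    lookup = {row["matchid"]: row for row in csv.DictReader(io.StringIO(match_text))}
    combined = []
    for row in csv.DictReader(io.StringIO(icw_text)):
        primary = lookup[row["matchid"]]  # the match whose shared list this is
        icw = lookup[row["icwid"]]        # the in-common-with match
        combined.append({
            "Match": primary["name"],
            "Mtch cM": float(primary["SharedCM"]),
            "Mtch Cousin": primary["Range"],
            "ICW": icw["name"],
            "icw cM": float(icw["SharedCM"]),
            "icw Cousin": icw["Range"],
        })
    return combined


for line in combine(MATCH_CSV, ICW_CSV):
    print(line)
```

Unlike VLOOKUP's #N/A, a missing ID here raises a `KeyError`, which is a quick way to spot an incomplete download.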

For purposes of clustering, though, all we really care about is that, in general, the more DNA you share, the more closely you are related, at least for 2nd cousins or closer.  You can see this to some extent in Ancestry’s predicted relationship ranges in the green highlighted columns.

The In-Common-With (ICW) list is basically a subset of your matches list.  My mom’s paternal first cousin, whom we’ll call “B”, has also tested at Ancestry.  So Mom’s ICW list for “B” includes me, my brother, and six other cousins: C, D, E, F, G, and H.  (Mom’s father was a first-generation American, and B’s father was born in Italy; few of our Italian relatives, many of whom still live in Italy, have tested their DNA on Ancestry.  Hence, we don’t have a lot of matches.)  The critical point is that C, D, E, F, G, and H, as well as my brother and I, show up on Mom’s match list AND on B’s match list: we are the “in common” matches.
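Conceptually, the ICW list is just the overlap of two match lists. Here is a minimal Python sketch using set intersection, with placeholder names from the example (“X”, “Y”, and “Z” stand for matches that appear on only one list). It models the idea only; it does not reproduce any minimum-cM cutoff Ancestry may apply when reporting shared matches.

```python
# Placeholder match lists; "X", "Y", and "Z" each appear on only one list.
moms_matches = {"B", "Me", "Brother", "C", "D", "E", "F", "G", "H", "X", "Y"}
bs_matches = {"Mom", "Me", "Brother", "C", "D", "E", "F", "G", "H", "Z"}


def in_common_with(matches_1, matches_2, tester_1, tester_2):
    """Matches on both lists, leaving out the two testers themselves."""
    return (matches_1 & matches_2) - {tester_1, tester_2}


icw_mom_b = in_common_with(moms_matches, bs_matches, "Mom", "B")
print(sorted(icw_mom_b))
# → ['Brother', 'C', 'D', 'E', 'F', 'G', 'H', 'Me']
```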

So, if Mom and cousin “B” are first cousins, their Most Recent Common Ancestor(s) (MRCA) would be their shared set of (Italian) grandparents: Guiseppe Diamantini and Maria Bolognesi.  Obviously that same couple would be the great-grandparents of my brother and me.  But my brother and I are not the interesting cousins in the ICW cluster.  Cousins C, D, E, F, G and H are the key here. 


Let’s look at the example above.  I “cluster” my mom’s DNA matches by adding two columns (highlighted in red).  Because I know my mom and Cousin B share the same set of grandparents, I put the MRCA couple’s names in the “Mtch MRCA” column of every row that lists an In Common With cousin for Cousin B.  (Note that, despite Ancestry’s prediction that my mom and Cousin B are 2nd cousins, they are in fact 1st cousins.)
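The “Mtch MRCA” step amounts to copying one documented answer onto every row for that primary match. A minimal sketch, assuming the combined rows are dictionaries with hypothetical `Match` and `ICW` keys (only the “Mtch MRCA” column name comes from the spreadsheet above); the rows themselves are placeholders.

```python
# Placeholder combined rows; "Match" and "ICW" are hypothetical column names.
rows = [
    {"Match": "B", "ICW": "C"},
    {"Match": "B", "ICW": "E"},
    {"Match": "KD", "ICW": "P1"},
]

# Documented MRCAs for primary matches whose place in the tree is known.
known_mrca = {"B": "Guiseppe Diamantini & Maria Bolognesi"}


def fill_mtch_mrca(rows, known_mrca):
    """Copy a primary match's documented MRCA onto every one of its rows."""
    for row in rows:
        row["Mtch MRCA"] = known_mrca.get(row["Match"], "")
    return rows


for row in fill_mtch_mrca(rows, known_mrca):
    print(row["Match"], row["ICW"], row["Mtch MRCA"] or "(not yet known)")
```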

The amounts of DNA shown in the “Mtch cM” and “icw cM” columns are the amounts Mom shares with these cousins.  From this information alone, we cannot determine how much DNA “B” shares with “F”, or whether (and how much) “C” shares with “D”.  We only know that C, D, E, F, G, and H share DNA with Mom and MUST also share some amount with Cousin B, because Ancestry lists them as shared matches of both.

I then look at each of the ICW cousins: my brother and me, plus cousins C through H.  My brother and I are Mom’s children, so our shared amounts add no new information for cousin clustering: whatever we share with these cousins, we inherited from Mom.  (You can always exclude known children of a tested match when clustering, because everything the children share with matches on that side of the family came from that parent.  This works as long as you have your parents or grandparents tested.)
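Excluding known children is a simple filter over the ICW list. A minimal sketch; `drop_children` is a hypothetical helper name, and the list holds placeholder names from the example.

```python
def drop_children(icw_list, children):
    """Remove the tested parent's own children from an ICW list.

    Everything a child shares with matches on this parent's side was
    inherited from the parent, so those rows add no clustering information.
    """
    excluded = set(children)
    return [cousin for cousin in icw_list if cousin not in excluded]


# Placeholder names from the example: Mom's ICW list for Cousin B.
icw_mom_b = ["Me", "Brother", "C", "D", "E", "F", "G", "H"]
cousins_to_cluster = drop_children(icw_mom_b, children=["Me", "Brother"])
print(cousins_to_cluster)  # → ['C', 'D', 'E', 'F', 'G', 'H']
```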

Cousins C and D are two people whose place in my mother’s family tree I already know, so I include their MRCA information (Fortunato Camillucci and Maddelena Serafini).  They are my mother’s cousins on her Diamantini line.  Since the Diamantini line is my mother’s paternal line, I shade those cells blue, for male.

Cousins E, F, G, and H are unknown to me.  In this case, none of them has a tree on Ancestry that might show in more detail how they relate to my mother.  The amount of DNA shared is fairly small, so their MRCA with Mom may be quite a few generations back.  So I note them as “Diamantini or Bolognesi” (as I don’t yet know whether they share on the Diamantini line or the Bolognesi line) and also shade the cell blue.  I leave those notes unbolded, since I’m not certain how these cousins actually fit into our tree.
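The labeling rule in the last two paragraphs can be written as a small function. This is a hypothetical helper, not part of any tool: it writes out and bolds a documented MRCA, otherwise joins the candidate lines with “or” and leaves the note unbolded, and shades the cell by side. Only the paternal-side blue is named in the text, so the color table holds just that entry; treating documented MRCAs as bold is inferred from the fact that uncertain notes are left unbolded.

```python
# Only the color named in the text; add entries for other sides as needed.
SIDE_COLORS = {"paternal": "blue"}


def mrca_cell(side, known_mrca=None, candidate_lines=()):
    """Return the text, bolding, and shading for one ICW MRCA cell."""
    color = SIDE_COLORS[side]
    if known_mrca:
        # Documented placement in the tree: write it out and bold it.
        return {"text": known_mrca, "bold": True, "color": color}
    # Undocumented: list the candidate lines and leave the note unbolded.
    return {"text": " or ".join(candidate_lines), "bold": False, "color": color}


print(mrca_cell("paternal", known_mrca="Fortunato Camillucci & Maddelena Serafini"))
print(mrca_cell("paternal", candidate_lines=["Diamantini", "Bolognesi"]))
```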

I then do the same thing with each of the other cousins listed here.  Below is a screenshot of the In-Common-With listing for Mom and Cousin C.  Note that it overlaps with the In-Common-With listing for Mom and Cousin B, but one person shares DNA with Mom and Cousin C without sharing with Cousin B.  I labeled that person Cousin J (highlighted in bright yellow).

Because the Most Recent Common Ancestor between Mom and Cousin C is the Camillucci & Serafini couple, I then use those names to populate the cell in the icw MRCA column, as shown below.

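The two steps for Cousin C, spotting who is new to this ICW list and filling in the icw MRCA column, can be sketched with a set difference and a dictionary lookup. This is a minimal sketch under assumptions: apart from “J”, the list contents are placeholders, the `Match` and `ICW` keys are hypothetical, and writing Cousin C’s MRCA into every row where C is the ICW match is just one way to carry a documented answer across clusters.

```python
# Placeholder ICW lists for Mom's matches with B and with C (children
# already excluded). "J" appears only on C's list, as in the text.
icw_mom_b = {"C", "D", "E", "F", "G", "H"}
icw_mom_c = {"B", "D", "E", "J"}


def new_to_cluster(icw_this, icw_other, other_match):
    """Cousins on this ICW list who are missing from the other one.

    The other primary match is ignored, since B and C each appear on
    the other's list.
    """
    return icw_this - icw_other - {other_match}


# Documented MRCAs keyed by cousin; "Match"/"ICW" keys are hypothetical.
known_mrca = {"C": "Fortunato Camillucci & Maddelena Serafini"}
rows = [
    {"Match": "B", "ICW": "C"},
    {"Match": "B", "ICW": "E", "icw MRCA": "Diamantini or Bolognesi"},
]


def fill_icw_mrca(rows, known_mrca):
    """Write each ICW cousin's documented MRCA into every row naming that cousin.

    Rows without a documented MRCA keep whatever note they already had.
    """
    for row in rows:
        row["icw MRCA"] = known_mrca.get(row["ICW"], row.get("icw MRCA", ""))
    return rows


print(new_to_cluster(icw_mom_c, icw_mom_b, "B"))  # → {'J'}
for row in fill_icw_mrca(rows, known_mrca):
    print(row)
```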

Mom doesn’t have many Ancestry.com matches on her paternal side, in part because her father was a first-generation American.  A better example of this clustering, with one of her 4th cousins, is shown below.  The Most Recent Common Ancestors of Mom and Cousin “K D” are Jacob Copple and Margaret Blalock.

Cousin KD

I have hidden the names of the In-Common-With cousins, but you can see the amount of DNA each shares with my mother.  This screenshot shows that the different In-Common-With cousins have different Most Recent Common Ancestors with Mom, yet all of them are related in some way to Jacob Copple or Margaret Blalock.  Philip Copple and Patsy Wright, for instance, are the presumed parents of Jacob Copple.  Patsy Wright’s presumed grandparents are Richard Wright & Ann.  Ben Copple is the son of Jacob Copple & Margaret Blalock, while Nicholas Copple & wife are the likely paternal grandparents of Jacob’s father, Philip.
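Grouping the In-Common-With rows by their MRCA, with each group sorted by shared cM, gives a quick view of how one cousin’s cluster breaks down. A minimal sketch with a hypothetical `ICW` key and invented cM values; the MRCA labels are taken from the paragraph above, but which cousin carries which label is made up for illustration. Sorting within a group follows the rule of thumb that more shared DNA usually means a closer relationship.

```python
from collections import defaultdict

# Placeholder rows for one primary match: names hidden, cM values invented,
# MRCA labels taken from the text but assigned to cousins arbitrarily.
rows = [
    {"ICW": "cousin1", "icw cM": 30.0, "icw MRCA": "Philip Copple & Patsy Wright"},
    {"ICW": "cousin2", "icw cM": 12.5, "icw MRCA": "Richard Wright & Ann"},
    {"ICW": "cousin3", "icw cM": 45.0, "icw MRCA": "Philip Copple & Patsy Wright"},
    {"ICW": "cousin4", "icw cM": 8.0, "icw MRCA": ""},
]


def cluster_by_mrca(rows):
    """Group ICW cousins by MRCA; within each group, most shared cM first."""
    clusters = defaultdict(list)
    for row in rows:
        clusters[row["icw MRCA"] or "(unknown)"].append((row["ICW"], row["icw cM"]))
    for members in clusters.values():
        members.sort(key=lambda member: member[1], reverse=True)
    return dict(clusters)


for mrca, members in cluster_by_mrca(rows).items():
    print(mrca, members)
```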

Another of Mom’s cousins, “M M”, also descends from Jacob Copple & Margaret Blalock and may have inherited some of Margaret (Blalock) Copple’s DNA.  You can see that in the ICW MRCA column below, where some of the In-Common-With cousins (names whited out) appear to have Blalock / Blaylock lineage.  One of the cousins who shares DNA with both Mom and “M M” is fairly closely related to Mom, as you can tell from the amount of DNA shared (140.4 cM) and from the MRCA: Sam Englehart and Libby Copple.  Libby Copple is the granddaughter of Jacob Copple & Margaret Blalock.

Cousin MM
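Spotting Blalock / Blaylock lineage across many rows is easier with a spelling-tolerant search of the ICW MRCA column. A minimal sketch using Python’s `re` module; the pattern covers only the two spellings named above, and the “A. Blaylock & wife” entry is an invented placeholder.

```python
import re

# Matches "Blalock" or "Blaylock" as a whole word, in any letter case.
BLALOCK_VARIANTS = re.compile(r"\bblay?lock\b", re.IGNORECASE)


def has_surname(mrca_text, pattern=BLALOCK_VARIANTS):
    """True if the MRCA note mentions any spelling covered by the pattern."""
    return pattern.search(mrca_text) is not None


# Sample MRCA notes; "A. Blaylock & wife" is an invented placeholder.
mrca_notes = [
    "Jacob Copple & Margaret Blalock",
    "A. Blaylock & wife",
    "Sam Englehart & Libby Copple",
]
flagged = [note for note in mrca_notes if has_surname(note)]
print(flagged)  # → ['Jacob Copple & Margaret Blalock', 'A. Blaylock & wife']
```

Extending the pattern (for example with other spellings you find in records) keeps the search in one place rather than scattered across spreadsheet filters.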

All in all, this is just one more method of using color coding and Most Recent Common Ancestor information to figure out how your unknown matches may be related to you.  It’s not an absolute — it’s just a hint.  But it gives you something to work with.