4th Cousins on Ancestry.com: a quick study


Last week in the “Genetic Genealogy Tips & Techniques” group on Facebook, Blaine Bettinger posted a study of his own 4th-cousins-and-closer matches on Ancestry.com which can be viewed here.   I decided to do the same.  These are my results:

Cathy’s 4th cousin (and closer) matches on Ancestry.com

Matches which are included here are matches who, in general, share at least 20 cM of DNA with me (although I have some matches at the 20 cM level who are labeled as “distant” cousins).

The “Amt DNA” information does NOT come from Ancestry; it comes from having done a process called “chromosome mapping” or “visual phasing” and it required the DNA results from both my parents, as well as from my sibs, compared against that of my 1st and 2nd cousins who have tested. On my dad’s side, the amount shared skews towards my grandmother, in part because one of my X chromosomes comes from her and her alone.

The number of matches sharing >= 50 cM with me also skews towards my paternal grandmother because 2 of my dad’s 3 maternal 1st cousins have tested at Ancestry, as well as some of their children and grandchildren. All are no more distant than 2C1R to me. (Note: in that figure I do not include my dad, my sibling, or my paternal 1st cousins — since they share both paternal grandparents with me.)

However, in total numbers of matches, my two grandparents with “colonial” ancestry (and by that I mean roots in the U.S. at least as early as 1790 — but not necessarily as far back as, say, 1650), are the ones with the most matches. That seems to correlate with what I’ve heard from others who have tested at Ancestry. My paternal grandfather has one line — his maternal grandfather — that is “colonial”. My maternal grandmother has 2 lines — both of her maternal grandparents are “colonial”.

I compared the paternal and maternal labeling, but it doesn’t tell me much, in my opinion. Ancestry only labels the DNA match as paternal or maternal if the match is >= 20 cM for both parent and child. Where there are differences in the totals, it is due to the match being >= 20 cM for me, but not for my parent. That’s an artifact of the computer algorithms.
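The labeling rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as I understand it, not Ancestry's actual code; the function name, the 20 cM constant, and the sample numbers in the usage note are all assumptions for the example.

```python
THRESHOLD_CM = 20.0  # assumed cutoff, taken from the rule described above


def side_label(child_cm, father_cm=None, mother_cm=None, threshold=THRESHOLD_CM):
    """Return 'paternal', 'maternal', 'both', or 'unlabeled' for one match.

    A side is assigned only when BOTH the child and that parent share
    at least `threshold` cM with the match.
    """
    if child_cm < threshold:
        return "unlabeled"
    sides = []
    if father_cm is not None and father_cm >= threshold:
        sides.append("paternal")
    if mother_cm is not None and mother_cm >= threshold:
        sides.append("maternal")
    if len(sides) == 2:
        return "both"
    return sides[0] if sides else "unlabeled"
```

With made-up numbers, `side_label(21, father_cm=18)` comes back as `"unlabeled"`: the match clears 20 cM for the child but not for the parent, which is the situation that produces the differences in the totals.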

Finally, tree availability in and of itself may not be the be-all end-all for matches. 85% of the matches I identify as paternal unknowns — I cannot discern which grandparent they are kin to — have public trees. The trees have done nothing to help me figure out how that match is related to me! Any suggestions?

DNA Match Changes at Ancestry.com

As is the case with everyone who has had a DNA test at Ancestry, my small matches are gone.  (However, I did go through the match lists of my parents, my sibling and my 1st cousins, “saving” small matches that were of interest (like “Thru Lines” matches) by marking them with a group identifier or making notes.)

For me, the issue was removing old notes for small matches where I had indicated “doesn’t match mom or dad” so those false matches would not be saved!

The blessing of having both your parents alive and willing to test means you can check any of your own matches to validate whether they match one of your parents (if your parents give you collaborator access).  I had already determined, via 3rd party tools, that over 25% of my matches were invalid, meaning they matched neither of my parents.
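For anyone who keeps match lists in a spreadsheet or CSV, that parent check can be sketched roughly as follows. This is a minimal sketch, not the 3rd party tool I used; the function name and the match IDs are hypothetical.

```python
def split_by_parent(my_matches, father_matches, mother_matches):
    """Sort my match IDs into paternal, maternal, both, and neither."""
    father_ids = set(father_matches)
    mother_ids = set(mother_matches)
    result = {"paternal": [], "maternal": [], "both": [], "neither": []}
    for match_id in my_matches:
        in_dad = match_id in father_ids
        in_mom = match_id in mother_ids
        if in_dad and in_mom:
            key = "both"
        elif in_dad:
            key = "paternal"
        elif in_mom:
            key = "maternal"
        else:
            key = "neither"  # candidate false match: worth a closer look
        result[key].append(match_id)
    return result


# Hypothetical match IDs, for illustration only
mine = ["m1", "m2", "m3", "m4"]
dad = ["m1", "m4", "m9"]
mom = ["m2", "m4"]
groups = split_by_parent(mine, dad, mom)
```

Anything that lands in the "neither" group is a match to review before saving notes on it.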

So, all in all, I’m not at all upset at losing matches.  Especially if it speeds up server response time.

How many matches did my family members and I lose?  Over 50% in each case!

[Image: Ancestry Match Counts 912020]

The reduction in matches (everyone with < 8 cM of DNA shared) isn’t the only change. Ancestry also updated the number of shared segments with your matches. Mom and Dad still show more than the 22 autosomal segments they share in actuality, but it’s a lot closer. You can see that all the segment numbers go down for my matches with my closest kin.

My segments with my father were always fewer than with my mother. One reason is that there are fewer recombinations passed down from males, as I understand it. Another reason may be that my dad and I tested back in 2012, and therefore tested under a different version than my mother, who tested years later.

Here’s a list of my mom’s top matches, noting old number of segments compared to new number of segments. Segment number only changed when appropriate, so some of these 50 cM matches show no change.


The last change at Ancestry DNA was the addition of longest segment information.  From what I’m hearing, this feature will be most useful to those who have significant endogamy in their ancestry (Acadian French, Ashkenazi Jewish, etc.).  However, it can also be useful if your match has tested elsewhere, and you have the chromosome segment information.

For the match below who has tested elsewhere, I already know that my mother’s (and mine, for that matter) primary segment match is on chr 9, and is hugely long (60 – 90 cM) per other vendors.  So, seeing the below information validates that Ancestry shows the match on chromosome 9 as well, despite the fact they don’t tell you where you match.  

The longest segment is calculated before Ancestry’s algorithms massage the data by removing “pile-up” regions (shared by many people) which are not considered genealogically relevant.

Have You Seen This? (News related to Genealogy)

I just came across these genealogy-related news items this weekend and thought I would share.

DNA Testing — an abandoned baby found 55 years later?
I’ve been intermittently following the story of Paul Fronczak, the baby who was abducted from a Chicago hospital in 1964.  An abandoned toddler was found 2 years later, was declared to be Paul, and then raised by Paul’s parents.  “Paul” did a DNA test in 2013 only to discover he was not biologically related to the parents of the baby originally abducted.  The Fronczak story is again in the news: it’s possible that the biological son of the Fronczaks has been found, again due to DNA testing.  More here, here, and here.

Dallas City (Pauper’s) Cemetery
In the 22 December 2019 paper edition [1] of the Dallas Morning News (and online, dated 20 December [2]) is an article about Dan Babb (a software programmer by trade and genealogist by hobby) who is working to identify the 2,000+ graves of persons (including infants and children) buried here from 1933 to 1978.  Part of the site is regularly flooded during heavy rains.  Babb is posting memorials for this cemetery to FindAGrave.



[1] “A Mission to return this to place of rest,” The Dallas Morning News, (Dallas, Tex.), 22 Dec 2019, Metro section, p. 1B, col 4.

[2] Robert Wilonsky, “No one cared about them,” commentary, 20 Dec 2019, The Dallas Morning News (Dallas, Tex.); (https://www.dallasnews.com/news/commentary/2019/12/20/no-one-cared-about-them-in-life-or-death-why-one-man-fights-to-restore-dallas-old-paupers-cemetery/ : accessed 22 Dec 2019).

Shared Clustering Tool and NodeXL: my mom’s match to her 4C

The other day I posted about how some of my Ancestry DNA matches looked in the Shared Clustering Tool.  Today I’m comparing that same cousin (my 4th cousin 1 removed and my mom’s 4th cousin) against my Mom’s Ancestry DNA matches both in the Shared Clustering Tool and Node XL.

Cousin “Jane” (as I’ll call her) shares a set of 3rd great grandparents with my mom: Jacob Copple and Margaret (Blalock) Copple.  She shares 71 cM in 4 segments with my mom, according to Ancestry.  I can see 3 of those segments clustered in the Shared Clustering tool.  One segment appears to tie to matches with a Blalock/Blaylock in their tree and/or a segment on chromosome 9 (based on those matches who are also on 23andMe, FTDNA, MyHeritage or GedMatch).  A second segment matches another possible Blalock segment, likely on chromosome 13.  Finally, a third segment cluster is with matches whose MRCA is likely Jacob Copple’s parents (Philip Copple & Patsy Wright) or grandparents.


The orange line vertical and horizontal (in both pictures) represents cousin “Jane”.  The three blue arrows above show the three main clusters she shares with Mom and with other matches of Mom’s.

Below is a zoomed-in look at the “chromosome 13” segment cluster.


Below is the likely chromosome 9 cluster.  The blue labeling in the rows and columns represent matches who have a Blalock/Blaylock in their own trees.  (Of course, the shared DNA may be due to another family line altogether, but the evidence at this point seems to be hinting at Margaret Blalock’s line rather than her husband Jacob’s.)  


Can I see three clusters for “Jane” using the Node XL tool?  Actually, yes, I can.  The Node XL tool is not as intuitive to use as the Shared Clustering tool, and I don’t know the algorithms behind either, but it’s reassuring when different clustering tools give somewhat similar answers!

Cousin “Jane” is highlighted in red.  She is based in the green group, and matches the hunter-green group, the chartreuse group, and a whole bunch of my mother’s matches in the gold group.  The Node XL clusters are limited to Mom’s matches of at least 15 cM.

[Image: Mom_4C_LF dewtru_NodeXL]

I haven’t done enough research with the groups in the Node XL tool, but I was intrigued by “Jane’s” cluster.  It looked like there were actually two groups — and sure enough, there are two groups, as you can see below.  I’m not sure why the cluster was not split out in a definitive manner, as there is not a lot of crossover between them.

If you’ve used Node XL regularly, do you know why that might happen?  Perhaps it’s the algorithm used?

Finally, in addition to more study of Node XL, I need to run a clustering report on the Genetic Affairs tool, which I haven’t used much.  It would be interesting to see how “Jane” clusters with my mom’s closest matches using that tool.

[Image: Grp 4 NodeXL 20191019 using 20190820 Data]

Ancestry’s Latest Ethnicity Update

Ancestry is apparently in the process of updating ethnicity percentages yet again.  I got an email today from them, and checked it out.  The change is not particularly significant for me, but keeps getting farther from the “truth” (i.e., my maternal grandfather was a 1st-generation American, born to 2 Italian immigrants.)  One of my male cousins on that side has done the Y-500 test at FTDNA; his haplogroup (which should also have been my grandfather’s) has deep roots in the Italian peninsula.

Here’s what it was as of the last change (September 2018), when my Italian was dropped from 19% to 3%:

[Image: Ancestry Cathy Ethnicity Old]

That was the big shift.  The image below shows what it is as of today.  What IS very much in line with my family history are the southern Ireland genetic communities, such as Co. Clare, Co. Limerick and Co. Kerry.  (The Irish ethnicity is all on my paternal side.)  The Germanic Europe and Northwestern Europe result, which appears to include Schleswig-Holstein, is also in line with my maternal roots.

My only real quibble with Ancestry’s results is the lack of Italian heritage, which shows up on FamilyTreeDNA, MyHeritage, 23andMe, and GedMatch.  (And it may be due to Ancestry’s customer population being heavily weighted towards persons of European ancestry who have (relatively) deep roots in North America.)

[Image: Ancestry Ethnicity Update 20191023]

My mother and my brother apparently have not gotten their updates yet.  If you’ve tested at Ancestry, have you seen a recent update to your ethnicity?  If so, how did it change?




How I use the Shared Clustering Tool

The other day, in the Facebook user group for the Shared Clustering Tool created by Jonathan Brecher, I saw a post about how different folks use the tool.  I mentioned capturing MRCA information and aligning it to the clusters, but thought I would expound here in a blog post.

Before I begin, I’m making one basic assumption for this post — that you’ve already started playing with the Shared Clustering tool yourself. 

First of all, I only use it for Ancestry matches at this time, primarily because that’s where I have the most matches (ditto for my mom and my dad) and because Ancestry currently doesn’t provide segment information.

Secondly, although the tool offers the option of downloading match data directly from Ancestry, I do not use that feature.  Instead, I use the match and ICW (“In Common With”) files downloaded from Ancestry via DNAGedcom.com, which is, frankly, my go-to tool. 

DNAGedCom’s CSV files are my go-to files because I’m most comfortable using Excel – one of the reasons I like Shared Clustering, actually – and because that’s how I started, and I’ve kept on.   (Long story short, had I begun by using Ancestry’s Notes feature more effectively than I did, I could save myself some time, but I do it all in my DNAGedCom match file, and then update each subsequent download using VLOOKUP.)
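For readers who would rather script the VLOOKUP step than do it in Excel, here is a minimal sketch in Python. The column names (`test_id`, `mrca_note`, `shared_cm`) and the sample rows are made up for illustration and are not DNAGedCom's actual headers; substitute whatever your own files use.

```python
import csv
import io


def carry_notes_forward(old_rows, new_rows, key="test_id", note_field="mrca_note"):
    """Copy notes from an older match list onto a fresh download (like VLOOKUP)."""
    notes_by_id = {row[key]: row.get(note_field, "") for row in old_rows}
    merged = []
    for row in new_rows:
        updated = dict(row)
        # Matches that are new in this download get an empty note
        updated[note_field] = notes_by_id.get(row[key], "")
        merged.append(updated)
    return merged


# Made-up sample data standing in for two downloads of the same match list
old_csv = "test_id,name,mrca_note\nA1,Jane,Copple/Blalock\nB2,Sam,Hill?\n"
new_csv = "test_id,name,shared_cm\nA1,Jane,71\nC3,Pat,24\n"

old_rows = list(csv.DictReader(io.StringIO(old_csv)))
new_rows = list(csv.DictReader(io.StringIO(new_csv)))
merged = carry_notes_forward(old_rows, new_rows)
```

In practice you would open the real CSV files instead of the in-memory strings; the lookup itself is just a dictionary keyed on the test ID.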

An example of tracking on my mom’s Ancestry DNA match list (via DNAGedCom) is shown below:


Color-coded by known MRCA.  If I’m not certain of the MRCA, based on the clustering, I add comments like “Copple kin” or “Hill?”

I upload the MRCA information to the completed Shared Clustering file via VLOOKUP since Jonathan has so nicely included the Test ID in the tool.  Usually, I will take the time to color-code the MRCA data in the Shared Clustering result file, simply so I can zoom out and easily see which cluster “belongs” to which possible MRCA.

Below you can see where I’ve zoomed out to see a fairly large clustering of my matches.  I’ve zoomed out to 10% and have highlighted 284 matches.  Per Jonathan Brecher’s Wiki, the red color indicates likely shared DNA.  The gray color indicates that, although the two matches (one in the row and one in the column) do not share DNA with each other, they likely share with a third person.  You can also see (barely) my color-coded MRCA notes on the left side of the image.
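To make the red versus gray distinction concrete, here is a rough sketch of the idea using "in common with" (ICW) data. It is not the Shared Clustering tool's actual algorithm, which uses its own scoring that I do not reproduce here; the function name and the match IDs are hypothetical.

```python
def cell_kind(a, b, icw):
    """Classify one grid cell for matches a and b.

    `icw` maps each match ID to the set of match IDs it is
    "in common with" on the tester's match list.
    """
    if a == b:
        return "self"
    a_set = icw.get(a, set())
    b_set = icw.get(b, set())
    if b in a_set or a in b_set:
        return "direct"    # roughly the red cells: the two matches share with each other
    if a_set & b_set:
        return "indirect"  # roughly the gray cells: linked only through a third match
    return "none"


# Hypothetical ICW data: A and C do not share with each other, but both share with B
icw = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": set()}
```

Here `cell_kind("A", "B", icw)` is `"direct"`, `cell_kind("A", "C", icw)` is `"indirect"`, and `cell_kind("A", "D", icw)` is `"none"`.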


So, let’s zoom in a bit on this large cluster.

Below, notice that I have highlighted in green (as indicated by the yellow arrows) one of my closest matches (although she is a 4th cousin 1 removed).  She and I share a common ancestral couple:  Jacob Copple and Margaret Blalock, my 4th great-grandparents.  We also share 3 segments of DNA, and two of those segments are indicated here, in the cluster of red at the top left, and the cluster of red (circled in yellow).  Note the vertical line of red that merges into a vertical line of green — the red is showing me that she and I share DNA with the bulk of the two circled groups.


What does this tell us?  First it indicates two different segments of DNA, so if we go far enough back in time, it would be 2 different ancestors.   Second, she and I likely share those 2 segments of DNA.  Third, all the associated gray indicates a link between these 2 segments of DNA, so these matches are all most likely related to me via one ancestor and upstream of that ancestor.

Let’s zoom in even further and look more closely, now at my MRCA/clustering information I’ve imported from DNAGedCom.  The blue labels refer to matches who are Blalock/Blaylock descendants.  The gray labels reference a known match on Chromosome 9.


This would seem to point at the connection being on a segment of chromosome 9 and also relating to Margaret (Blalock) Copple.  This does not mean these matches share Margaret (Blalock) Copple as an ancestor with me, but rather one of Margaret’s own ancestors.

To put it another way, I have a clue!  These shared cluster results would seem to indicate that I need to do more research on Margaret (Blalock) Copple’s line, and connect with the matches who are Blalock descendants. And, at other DNA vendors, I should connect with matches who share the same segment on chromosome 9 to find out how or if they might be connected to a Blalock/Blaylock ancestor.


Let’s look at the second cluster, below.  This zoomed-in, partial view shows matches who potentially share a segment on chromosome 13 with me.  Based on their Ancestry tree information, there are some who share Jacob Copple and Margaret Blalock as common ancestors with me (just one shown here).


Other matches in this cluster have no Copple or Blalock at all in their tree.  Their trees could be incomplete or incorrect, of course (as could mine!).  OR, their trees could be indicating a shared ancestor further “upstream” (meaning, a possible ancestor of Margaret (Blalock) Copple).  To that end, I’ve noted where there are Hemphill and Hungate ancestors in my matches’ trees.

These Hemphill and Hungate families, according to the Ancestry trees of my matches, hailed from Kentucky (where Margaret Blalock was born ca. 1810) and a branch of the Hungate family ended up in Washington County, Indiana in the 1810’s – 1830’s.  This is the same county Margaret lived in during the same time frame.  Although not definitive, it’s worth noting as a potential clue.

In summary, because the two groups are related (as indicated by all the gray associated with them), both DNA segments the groupings indicate are more likely to have been inherited by me from Margaret (Blalock) Copple (and, ultimately, her ancestors) rather than from her husband Jacob Copple.

Here’s another example of a cluster on my Copple line, where you can quickly see, from the teal color on the left-hand side, that these matches share an MRCA.  In fact, I use the teal to indicate more than one generation of Copple ancestors (all also ancestors of Jacob Copple who married Margaret Blalock).


The last example is a line from my dad’s side.  As with the Copple and Blalock lines from my mother’s side, this paternal line is rooted in the United States from at least 1800 if not decades before that. 

The bulk of these DNA matches share my third great-grandparents, Anderson Lamburth and Ermine Farley (or Farnham).  However, they are clearly grouped in two clusters, so that one set may share Lamburth DNA and another set Farley DNA, or “upstream” (as in Anderson’s mother and Anderson’s father, or Ermine’s two parents).

Most intriguing is the linking between the two clusters.  Not just the general gray, but the vertical red lines indicated by the blue arrow.  I need to look more closely at these two matches — their names will be in the column headers (not shown here for privacy reasons). 

One, they likely share 2 DNA segments with me.  Two, they clearly share DNA with the small cluster on the upper left, as well as the larger cluster on the lower right.  AND the folks in the middle who are only indirectly related (indicated by gray) to the two obvious clusters.


One other item to note in this cluster.  Some of the MRCAs are not highlighted in yellow.  That’s legit; the MRCA referenced there is Mary (Lamburth) Dempsey, the granddaughter of Anderson & Ermine and my great-grandma, together with her husband William. Clearly, the segment shared here relates to Mary rather than William.

If you use the Shared Clustering tool to visualize your Ancestry DNA matches, do you use any visualization aids to assign clusters to ancestors?  Perhaps you make better use of the Notes field than I do?


Cite/link to this post: Cathy M. Dempsey, “How I use the Shared Clustering Tool,” Genes and Roots, posted 21 Oct 2019 (https://genesandroots.com : accessed (date)).



My DNA Traits at 23andMe (v3)… how accurate are they?

I rarely look at my traits and health data on 23andMe, but after reading Roberta Estes’ post the other day, I thought I would take a look at my own traits and see which predictions are accurate and which are less so.

Below is the first page, and on the whole, it’s accurate.  Yes, I can taste bitter — but I like it!


Here is the second page of traits.  These are less accurate, especially about the hair color!


I intend to do a separate post on the issue of red hair.  I suspect hair color (and eye color, for that matter) is more complicated than what I originally learned in sophomore biology class, long before the human genome was decoded.

Have you tested at 23andMe?  And, if so, did you find your predicted traits to be fairly accurate — or not so much?

Triangulation vs. “In Common With”

This question came up in one of the posts in Blaine Bettinger’s Facebook Group Genetic Genealogy Tips & Techniques, so I thought I’d give a quick example here that I refer to myself when I get confused.

A man with 3 children, who have all tested, has a match to a 2nd cousin (documented now through both DNA and traditional genealogy).  He and the 2nd cousin share 11 segments of DNA.

It so happens that all 11 of those segments have passed down to those 3 children, which you can see in the illustration below.  Of those 11 segments shared by their father and his 2nd cousin, Child 1 inherited 4 segments.  Child 2 also inherited 4 segments — but an entirely different four segments than Child 1.  Child 3 inherited 7 of the 11 segments.

The inheritance and sharing is illustrated below, in data pulled from GedMatch.


For purposes of illustration, we’re setting aside the fact that generally, when triangulating to find a common ancestor, we don’t use two full-blooded siblings as 2 of the triangle legs; they are too closely related, and will triangulate on many segments.

That said, Child 1, Child 2 and their 2nd cousin once removed (2C1R) have DNA in common with each other, but no triangulated segments with their 2C1R.  This is because Child 1 shares DNA with 2C1R on chr 6, on chr 12 and 2 segments on chr 15, while Child 2 shares DNA with 2C1R on an entirely different set of segments (including chr 4, chr 8 and chr 18).

Child 1, Child 3 and their 2C1R have 3 triangulated segments: on chr 6, on chr 12, and 1 segment of chr 15.

Child 2, Child 3 and their 2C1R also have 3 triangulated segments: on chr 4, on chr 8, and on chr 18.
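The comparison above can be sketched in code. This is a simplified illustration: the chromosomes come from the example, but the start and end positions are invented, and the segments not itemized above (Child 2's fourth and Child 3's seventh) are left out. Strictly speaking, triangulation also requires that the two testers match each other on the same stretch; this sketch only checks that both overlap with the cousin in the same place.

```python
def overlap(seg_a, seg_b):
    """True if two (chromosome, start, end) segments share any stretch."""
    chr_a, start_a, end_a = seg_a
    chr_b, start_b, end_b = seg_b
    return chr_a == chr_b and min(end_a, end_b) > max(start_a, start_b)


def candidate_triangulations(segs_a, segs_b):
    """Chromosomes where both testers share an overlapping stretch with the same cousin."""
    return sorted({a[0] for a in segs_a for b in segs_b if overlap(a, b)})


# Segments each child shares with the 2C1R: (chromosome, start, end).
# Positions are hypothetical; only the chromosome numbers come from the example.
child1 = [(6, 10, 40), (12, 5, 30), (15, 20, 45), (15, 60, 80)]
child2 = [(4, 50, 90), (8, 10, 35), (18, 5, 25)]
child3 = [(6, 10, 40), (12, 5, 30), (15, 20, 45),
          (4, 50, 90), (8, 10, 35), (18, 5, 25)]

pair_1_2 = candidate_triangulations(child1, child2)
pair_1_3 = candidate_triangulations(child1, child3)
pair_2_3 = candidate_triangulations(child2, child3)
```

Child 1 and Child 2 both appear on the cousin's "in common with" list, yet `pair_1_2` is empty because none of their segments line up, while the other two pairs each line up on three chromosomes.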

And that is a quick overview of triangulation vs. in common with.



The Shared cM Project: Tracking individual contributions

Do you know about Blaine Bettinger’s Shared cM Project?  It’s the crowd-sourced collection of shared centiMorgans (cMs) for the purpose of analyzing the ranges of cMs found at different levels of relationship (full siblings, half 1st cousins, 2nd cousins, etc.).

If you and your known relatives have done your DNA tests at any of the big vendors, you can submit your data here.  Relationship is asked for, but no identifying information (except your email) is needed for submission.

I have been submitting my own DNA data since 2016 — Blaine released the first results in May 2015.   However, so that I won’t skew the results (with duplicate or triplicate submissions), I keep track of my submissions in my own Excel spreadsheet.  A sample of the page I use is below, filtered on just some of the 2C1R relationships I have submitted.  (I have hidden the names of the two testers involved in each relationship.)

[Image: Submission_Shared cM Project]

Note that the cM range for my submissions of 2nd cousin 1 removed is from 32 cM to 267 cM.  The vast majority of relatives at this level were known beforehand, or otherwise targeted tests.  In the case of the tester who only shares 32 cM, they share that cM with one of my 1st cousins.  The rest of us (my siblings and I, plus my 1st cousin’s siblings) share a much more “typical” amount of DNA with the tester, around 110 – 140 cM.  And since we all match at the full-sibling level, and at the full 1st cousin level, it (so far!) appears that the 32 cM is just due to the randomness of DNA inheritance.

One thing I did in the beginning was submit a relationship for each vendor.  (My father, for example, has tested at 23andMe, FTDNA, and Ancestry, as have I.)  So, originally, I submitted 3 different sets of father/daughter data.  (Obviously, the cM count varied in only minor amounts.)

Since mid-2017, though, I only submit once no matter how many places the two testers have tested at.  (Blaine does ask for the vendor name when you submit.)
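The "submit once per pair" rule is easy to check in code as well as in a spreadsheet. Below is a minimal sketch; the function names, the field names, and the sample testers are hypothetical and are not part of the Shared cM Project's submission form.

```python
def submission_key(tester_a, tester_b):
    """Build the same key no matter which tester is listed first."""
    return tuple(sorted((tester_a.strip().lower(), tester_b.strip().lower())))


def record_submission(log, tester_a, tester_b, relationship, shared_cm, vendor):
    """Add a submission to the log; return False if this pair was already submitted."""
    key = submission_key(tester_a, tester_b)
    if key in log:
        return False  # already submitted, whatever the vendor
    log[key] = {
        "relationship": relationship,
        "shared_cm": shared_cm,
        "vendor": vendor,
    }
    return True


# Hypothetical testers and amounts, for illustration only
log = {}
first = record_submission(log, "Tester A", "Tester B", "2C1R", 118, "Ancestry")
second = record_submission(log, "tester b", "Tester A", "2C1R", 121, "23andMe")
```

The second call is rejected even though the names are reversed and the vendor differs, which is the behavior I want for my own tracking.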

In general, I have said no endogamy — but that is based on what I know of the relationship.  Who knows?  With enough research on certain lines, I may find that indeed there was endogamy.

I also, for my own interest, track “expected” DNA shared against actual DNA shared (assuming grandparents and uncles/aunts share an average of 25% with the tester, first cousins share an average of 12.5%, 1st cousins once removed (1C1R) 6.25% and 2nd cousins 3.13%).  It never fails to amaze me how my sister, brother and I have such variations in the amount of cM shared with a given targeted cousin.
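The expected versus actual comparison works out as simple arithmetic. Here is a small sketch; the total of 6,800 cM is a round-number assumption on my part (vendors report somewhat different totals), and the function names are hypothetical.

```python
# Average share of the tester's DNA, per the percentages listed above
EXPECTED_FRACTION = {
    "grandparent": 0.25,
    "aunt/uncle": 0.25,
    "1C": 0.125,
    "1C1R": 0.0625,
    "2C": 0.03125,
}

TOTAL_CM = 6800.0  # assumption: adjust to the total your vendor uses


def expected_cm(relationship, total_cm=TOTAL_CM):
    """Average cM expected for a relationship, given the assumed total."""
    return EXPECTED_FRACTION[relationship] * total_cm


def percent_of_expected(relationship, actual_cm, total_cm=TOTAL_CM):
    """Actual shared cM as a percentage of the expected average."""
    return 100.0 * actual_cm / expected_cm(relationship, total_cm)
```

Under that assumed total, `expected_cm("1C")` is 850 cM and `expected_cm("2C")` is 212.5 cM, so a 2nd cousin who shares 170 cM comes in at 80% of the expected average.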

How do you track your submissions, if you are submitting to the Shared cM Project?  Are you concerned with not submitting twice — or do you figure it will all average out in the end (certainly a possibility)?

Cite/link to this post: Cathy M. Dempsey, “The Shared cM Project: Tracking individual contributions,” Genes and Roots, posted 31 Mar 2019 (https://genesandroots.com : accessed (date)).