|
Logic of Ancestry indexing
Sat, 15 Jul 2006 13:34:40 +0000 (UTC)
soc.genealogy.britain
roy.stockdill...
|
The current debate about an 1841 census look-up prompts me to query
whether anybody here fully understands and can explain some of the logic
behind the Ancestry census indexing and its "match quality" ratings?
squealing...
|
"Soundex" is the basis for many name-matching algorithms; see the links at the
bottom of the Wikipedia entry for an overview & examples:
Normally, both Philender and Phillinder would map onto the same Soundex
code of P453. An article at Ancestry implies it uses the standard
algorithm, see:
However, it doesn't act like it should in this case, so presumably their
programmers have "improved" it based on their previous experiences. I
wonder if the double-L has been given a different code to represent its
different "Y" sound in Spanish?
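
For anyone who wants to check the P453 claim, here is a minimal Python sketch of the standard (American) Soundex rules. It is only an illustration: the function name soundex is made up for this example, and nothing here is known to be what Ancestry actually runs.

```python
def soundex(name: str) -> str:
    """Standard American Soundex: first letter plus three digits."""
    # Letter groups and their digits; vowels, H, W and Y carry no code.
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate two letters with the same code
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        prev = digit  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]  # pad or truncate to four characters


for forename in ("Philender", "Phillinder", "Philinda", "Ellen"):
    print(forename, soundex(forename))
# Philender P453
# Phillinder P453
# Philinda P453
# Ellen E450
```

Under the standard rules the double-L collapses to a single 4, so all the Phil- spellings in this thread share one code, while Ellen gets E450 and would not be a Soundex match for any of them.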
|
Ron...
|
I read the Help and it says entering more data helps with the search.
So I entered as much data as I had -
Philenda Clark 1786 Cornwall Falmouth born Cornwall
Well she was number 1 but with only 3 stars the same as Ellen P Clark
from Ireland living in Notts born 1781.
I would only make that 1 star - same surname and near year born
perhaps 1/2 for P as second name.
And there are plenty of people who, apart from first name, match Philenda,
so why oh why is Ellen first? Is Ellen on some Soundex for Philenda?
The Help says show more data and I cannot then see why someone living
and born miles away would be shown near the top.
I never use Ranked Search - had a terrible problem when it was the only
search that came up - the Support person pointed me to the problem as a
Saved Bookmark - Phew.
Worse about 1841 is that there is no comment in the search box itself
with a "county born" warning - mentioned in the detail below but I would
imagine that many would not read that. At least they have dropped the
place born town box.
|
Let's take the example proffered by the original poster who was looking for a
PHILINDER CLARK(E) born in Falmouth, Cornwall, about 1786.....
Entering Philinder (as spelt by the OP), Clark, the birth date 1786 and born in
Cornwall (no point in entering Falmouth, since the 1841 doesn't have birth
place names except the county) produces a long list of possible "suspects"
headed as the No. 1 - and thus presumably what Ancestry thinks is the most
likely - by a PHILLINDER CLARK born about 1839. Clearly, this cannot be her,
since the age is nowhere near correct and this is a two-year-old child. Then
you get as No. 2 an Ellen P Clark born about 1781 in Ireland and resident in
Nottinghamshire, followed by a John P Clark, born about 1791, no birth county
given, and living in Manchester.
After them you get a list of people all born about 1786 in Cornwall and there
sitting at No. 12 is PHILENDER CLARK in Falmouth, which is the one the OP
wanted (or so I assume, since we haven't heard from her since).
Now, to my way of thinking this Philender Clark should have been right at the
top of the list, since the forename matches to all but one letter and so does
the birth date, but not apparently to whoever devised the Ancestry indexing
system! I note that all of the top 50 possibles are given an equal match quality
rating of three stars, which baffles me somewhat since most of them have far
more common forenames than Philender. Redoing the search and entering
PHILENDER as the forename merely moves the required person up one place
to No. 11 in the list, while the Ellen P Clark born in Ireland about 1781 now
becomes No. 1 and the younger PHILLINDER CLARK disappears altogether!
Comments from someone who knows about computerised indexing would be
welcome, but it doesn't make a lot of sense to me.
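
To put a number on "matches to all but one letter", here is a minimal Python sketch of Levenshtein edit distance applied to the spellings in this thread. This is an assumption about how one could measure closeness, not a description of Ancestry's ranking; the function name edit_distance is invented for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-letter inserts, deletes or swaps."""
    a, b = a.lower(), b.lower()
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or keep
        prev = cur
    return prev[-1]


searched = "Philinder"
for indexed in ("Philender", "Phillinder", "Ellen", "John"):
    print(indexed, edit_distance(searched, indexed))
```

Both Philender and Phillinder are a single edit away from Philinder, so a ranking based purely on spelling distance would put them level and well ahead of Ellen or John; the birth year would then have to break the tie.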
singhals...
|
They are treating the middle initial P as if it were the word Philender.
The default middle name/initial seems to be a null value which sorts as
"something" rather than "nothing". Remember the old sorting rules:
Nothing before Something. This particular search is doing some
proximity ratings and some unwarranted assumptions which DO catch a lot
of otherwise unfindable folk, but which at the same time disguise some
easily findable ones. "Philender within 1 word of Clark" + "P= inkey$
when inkey$=P*" sort of thing.
Try putting blank spaces after the given name and see if it straightens out.
roy.stockdill...
|
You are right! Entering "PHI*" (wildcard, not blank spaces) promotes her to
No. 1, right at the top of the list. But entering the forename Philinda (just one
letter out) drops her down to No. 12 again.
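
As a rough illustration of why "PHI*" might behave so differently, here is a small Python sketch of prefix wildcard matching against each given name. It is a guess at the general idea only: the helper matches_forename is hypothetical and Ancestry's real wildcard handling is not documented in this thread.

```python
from fnmatch import fnmatchcase


def matches_forename(pattern: str, given_names: str) -> bool:
    """True if any given name (or initial) matches the wildcard pattern."""
    pattern = pattern.upper()
    return any(fnmatchcase(token, pattern)
               for token in given_names.upper().split())


for given in ("Philender", "Phillinder", "Ellen P", "John P"):
    print(given, matches_forename("PHI*", given))
```

With a literal prefix match, a bare middle initial P cannot satisfy "PHI*", which would be consistent with Ellen P Clark and John P Clark no longer crowding the top of the list.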
Graham P Davis...
|
You're lucky to find what you're looking for on the first page. On
occasions, I've found the best fit a couple of hundred places or more down
the list. I've used better sorting routines written forty years ago - in
Assembler.
|
Roy Stockdill
Newbies' Guide to Genealogy & Family History:
"There is only one thing in the world worse than being talked about,
and that is not being talked about."
OSCAR WILDE
|
|
Q...
|
I find them very annoying. And they don't make any sense to me, either.
And new users who might be tempted to use their quality filter feature
should beware, because they are likely to filter out the thing they are
looking for. -- Q
|