KDnuggets » Forums

Given Names, Gender, Ethnicity

 
www.kdnuggets.com Forum Index -> Classification & Clustering
ekim256



Joined: 19 Sep 2008
Posts: 3

Posted: Wed Oct 15, 2008 11:16 pm    Post subject: Given Names, Gender, Ethnicity

I have a database of names. Does anyone know how I can guess each person's gender and ethnicity from their first and last names?
adam
Data Mining Guru


Joined: 23 Jan 2008
Posts: 20

Posted: Thu Oct 16, 2008 2:04 pm

Do you know the primary country of residence for these people?
ekim256



Joined: 19 Sep 2008
Posts: 3

Posted: Thu Oct 16, 2008 2:06 pm

Yes, Canada. I also know that a large portion of the sample (if I had to guess, ~80%) are immigrants.
adam
Data Mining Guru


Joined: 23 Jan 2008
Posts: 20

Posted: Thu Oct 16, 2008 3:24 pm

Does anyone in your population have a known gender and ethnicity? If not, try to get a good sample of names labeled with gender and ethnicity from another source to use as training data.

Here is how I see it: the first name will predict the gender and the last name will predict the ethnicity, so you're looking at two models. Say you start with the gender problem. I would derive as many variables from the first names as you possibly can, such as Last Letter, Last 2 Letters, Last 3 Letters, First Letter, First 2 Letters, First 3 Letters, Number of Vowels, Length, etc. Then use a feature selection method, or perhaps just a decision tree, to see whether any of those are predictive. If you find some predictive variables, explore them further; you may discover more variables along the way.
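A minimal sketch of the gender step in Python, using only the standard library. The variable names (last_1, first_3, n_vowels, etc.), the helper functions, and the eight labeled names are all hypothetical; the names are a made-up toy sample, not real data. Rather than a full decision tree, it scores each variable by information gain (the entropy reduction a decision tree uses to choose its splits), which is one simple feature selection method.

```python
import math
from collections import Counter, defaultdict

# Assumption: only a, e, i, o, u count as vowels.
VOWELS = set("aeiou")


def name_features(name):
    """Derive simple spelling variables from one first name (hypothetical feature set)."""
    n = name.strip().lower()
    return {
        "last_1": n[-1:],
        "last_2": n[-2:],
        "last_3": n[-3:],
        "first_1": n[:1],
        "first_2": n[:2],
        "first_3": n[:3],
        "n_vowels": sum(ch in VOWELS for ch in n),
        "length": len(n),
    }


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels):
    """Drop in label entropy after grouping the rows by each distinct feature value."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(names, labels):
    """Score every derived variable by information gain, best first."""
    rows = [name_features(n) for n in names]
    scores = {f: information_gain([r[f] for r in rows], labels) for f in rows[0]}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up toy sample; real labels would come from a sample with known genders.
names = ["Maria", "Anna", "Julia", "Sofia", "John", "Peter", "David", "Mark"]
labels = ["F", "F", "F", "F", "M", "M", "M", "M"]

ranking = rank_features(names, labels)
for feature, gain in ranking:
    print(f"{feature:<10} {gain:.3f}")
```

On a sample this small the three suffix variables all tie at a perfect score, which mostly shows that plain information gain favors variables with many distinct values (Last 3 Letters can nearly memorize the training names). On real data you would want to check the ranking on a held-out set, or use something like gain ratio, before trusting it.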

Then you could repeat the same process for the ethnicity problem, using the last names. Keep in mind that I have zero experience in computational linguistics, but this is probably how I would approach the problem.
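Repeating the process for ethnicity is the same idea with more than two classes. A rough sketch, again in standard-library Python: a depth-one decision tree (a "decision stump") that picks the most informative last-name variable by information gain and predicts the majority group for each value of it. The surnames and the placeholder labels "A", "B", "C" are invented for illustration, and `surname_features` and `DecisionStump` are hypothetical helpers, not a library API.

```python
import math
from collections import Counter, defaultdict


def surname_features(name):
    """Hypothetical spelling variables for a last name (same idea as for first names)."""
    n = name.strip().lower()
    return {
        "last_1": n[-1:], "last_2": n[-2:], "last_3": n[-3:],
        "first_1": n[:1], "first_2": n[:2], "first_3": n[:3],
        "length": len(n),
    }


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels):
    """Drop in label entropy after grouping the rows by each distinct feature value."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


class DecisionStump:
    """A depth-one decision tree: split once, on the single most informative variable."""

    def fit(self, names, labels):
        rows = [surname_features(n) for n in names]
        gains = {f: information_gain([r[f] for r in rows], labels) for f in rows[0]}
        # max() keeps the first variable in case of ties.
        self.feature = max(gains, key=gains.get)
        by_value = defaultdict(list)
        for row, label in zip(rows, labels):
            by_value[row[self.feature]].append(label)
        # Majority group for each observed value of the chosen variable.
        self.rules = {v: Counter(ys).most_common(1)[0][0] for v, ys in by_value.items()}
        # Fallback for values never seen in training: the most common group overall.
        self.default = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, name):
        value = surname_features(name)[self.feature]
        return self.rules.get(value, self.default)


# Invented training sample; "A", "B", "C" are placeholder group labels.
train_names = ["Kowalski", "Zielinski", "Lewandowski",
               "Andersson", "Karlsson", "Nilsson",
               "Fernandez", "Gonzalez", "Rodriguez"]
train_labels = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]

stump = DecisionStump().fit(train_names, train_labels)
print("split on:", stump.feature)
for name in ["Nowakowski", "Eriksson", "Martinez"]:
    print(name, stump.predict(name))
```

A single split is crude: any last name whose chosen variable never appeared in training just gets the overall majority group. A full decision tree would keep splitting recursively, and with a real labeled sample you would also hold some names back to measure accuracy.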
TimManns
Data Mining Guru


Joined: 25 Sep 2006
Posts: 37
Location: Sydney

Posted: Thu Oct 16, 2008 6:39 pm    Post subject: ethnicity for what purpose?

Maybe I'm stating the obvious here, but I'm just checking in case.

There has been no mention of what these predictions will be used for.

Using them as a data cleaning exercise for later reporting would be fine, say to make sure we communicate with a customer using a message tailored to their ethnic origin, or to describe how many people of ethnic origins x, y, and z voted for a presidential candidate.

But don't use them as inputs to a predictive model for something like credit risk or insurance claims, or for anything else that limits with whom you take action (there may be some exceptions, say for medical reasons).

Cheers

Tim
All times are GMT - 5 Hours


Copyright © 2012 KDnuggets.