Invelos Forums->DVD Profiler: Contribution Discussion
Parsing of Asian Names
marcelb7 (DVD Profiler Unlimited Registrant, Star Contributor)
Registered: Oct. 16, 2000
Registered: March 13, 2007
Reputation: Great Rating
Netherlands Posts: 767
Quoting synner_man:
Quote:
As those with large Hong Kong collections are already aware, I've been auditing my HK titles over the last few months.

I've seen a few of your contributions fly by. They're extremely consistent, and voted yes each time. Thanks.

DragonMa has also made quite a few good contributions. He pointed me to a forum where people who speak and read Chinese, Japanese, Korean etc. are gathering credits for each film: http://forum.hkcinemagic.com/index.php?c=7.

Some fine work is done on that forum, and our database can only benefit from their knowledge.

Check this one for The Odd Couple: http://forum.hkcinemagic.com/t9013-The-Odd-Couple.html
 Last edited: by marcelb7
Kathy (DVD Profiler Unlimited Registrant, Star Contributor)
Registered: May 29, 2007
Reputation: Highest Rating
United States Posts: 3,475
I am updating "Audition" UPC 031398-178972 and ran into an issue relevant to this thread.

What would you do with the following:

Executive Producer:
             
    Current profile: Toyoyuki Yokohama

    On screen: To Yo Yuki Yokohama  (there are clear and distinct spaces between these words).
   
    Invelos CLT: Toyoyuki Yokohama > 20 titles (60 profiles) vs To Yo Yuki Yokohama 0 titles (0 profiles).

Here is how I would submit this change according to my understanding of the rules, and without any understanding of the issues brought up in this thread:

    Toyoyuki Yokohama [To Yo Yuki Yokohama].
 Last edited: by Kathy
marcelb7 (DVD Profiler Unlimited Registrant, Star Contributor)
Registered: Oct. 16, 2000
Registered: March 13, 2007
Reputation: Great Rating
Netherlands Posts: 767
That's correct, Kathy. But you might want to add a comment about the credited name: Is it in the actual film, or in the subtitled credits?
 Last edited: by marcelb7
Winston Smith (DVD Profiler Unlimited Registrant, Star Contributor)
Don't be discommodious
Registered: March 13, 2007
United States Posts: 21,610
Quoting Taro:
Quote:
Indeed, the main problem I had with profiling Korean names, is that both parsings:
X/Y/Z
and
XY//Z
present the data exactly as it is credited on-screen. Parsing method one takes Y as a middle name while parsing method two takes XY as a first name and no middle name. So I wasn't sure which is a valid input method or perhaps both were ...?



Yes, Yun Young Park is just a fictitious example, but I do have a concrete example in mind, albeit not 100% Asian since it's an actor with a mixed name. Perhaps this will make it easier to discuss the issue:
Daniel Dae Kim (appears in LOST, among others)

When I use CLT, I find:
"daniel dae kim" is credited in the following 474 titles (1068 profiles)
However, I can't see from the CLT if all profiles are parsed as Daniel/Dae/Kim or Daniel Dae//Kim or something else (unless I am missing some CLT functionality or something).

I then used NiceGuys' Name Variant tool (very handy, by the way) and came up with this result from my local database (which is a copy-paste of the online one for those profiles):
27 profiles with Daniel/Dae/Kim
2 profiles with Daniel//Dae Kim
Anyone have an idea if both are acceptable or if one of the two should be corrected?

So, now it looks like you are attempting to apply some sort of cultural Rule to an actor who has very clearly Anglicized his name, Taro. I don't think Daniel is a Korean or any other Asian Name. So are you ASSUMING something that you should NOT? Again, as I have said numerous times, the key lies in the documentation. Regardless of culture, I will ALWAYS say start with the default 1/2/3; if you document something else, then fine, but do NOT make assumptions. Provide valid documentation to support your opinion and I don't believe that you should have any problem. If, however, it is based on some sort of an assumption, then it should be voted NO. Very simple.

Skip
ASSUME NOTHING!!!!!!
CBE, MBE, MoA and proud of it.
Outta here

Billy Video
synnerman (DVD Profiler Unlimited Registrant, Star Contributor)
Take me with you. Please.
Registered: March 13, 2007
United States Posts: 736
Quoting marcelb7:
Quote:
Quoting synner_man:
Quote:
As those with large Hong Kong collections are already aware, I've been auditing my HK titles over the last few months.

I've seen a few of your contributions fly by. They're extremely consistent, and voted yes each time. Thanks.

DragonMa has also made quite a few good contributions. He pointed me to a forum where people who speak and read Chinese, Japanese, Korean etc. are gathering credits for each film: http://forum.hkcinemagic.com/index.php?c=7.

Some fine work is done on that forum, and our database can only benefit from their knowledge.

Check this one for The Odd Couple: http://forum.hkcinemagic.com/t9013-The-Odd-Couple.html


I agree with you on DragonMa. He was one of the first to touch the third rail of name parsing around here for Asian titles with his HKL contributions.

It reminds me of another point.  During my contributions, I noticed that there were only a handful of voters that actually voted on these titles.  Many titles passed with 1 or 0 votes.  The most I saw was maybe 7 or 8.  In other words, very few people are actually affected by this issue or they just don't care.

ETA: Thanks for the links.  Many of the films that don't have any Western credits are some of the most famous Hong Kong titles.  I'm glad to see the work being done and will keep an eye out there.
 Last edited: by synnerman
TheMadMartian (DVD Profiler Desktop and Mobile Registrant, Star Contributor)
Alien with an attitude
Registered: March 13, 2007
Reputation: Highest Rating
United States Posts: 13,202
Quoting Taro:
Quote:
Yes, Yun Young Park is just a fictitious example, but I do have a concrete example in mind, albeit not 100% Asian since it's an actor with a mixed name. Perhaps this will make it easier to discuss the issue:
Daniel Dae Kim (appears in LOST, among others)

When I use CLT, I find:
"daniel dae kim" is credited in the following 474 titles (1068 profiles)
However, I can't see from the CLT if all profiles are parsed as Daniel/Dae/Kim or Daniel Dae//Kim or something else (unless I am missing some CLT functionality or something).

I then used NiceGuys' Name Variant tool (very handy, by the way) and came up with this result from my local database (which is a copy-paste of the online one for those profiles):
27 profiles with Daniel/Dae/Kim
2 profiles with Daniel//Dae Kim
Anyone have an idea if both are acceptable or if one of the two should be corrected?

I could not find anything, other than this Wikipedia explanation.  It states that, "Most Chinese Americans move their Chinese given name (transliterated into the Latin alphabet) to the middle name position, and use an English first name, e.g. James Chu-yu Soong, Jerry Chih-Yuan Yang, and Michelle Wingshan Kwan."

Based on that alone, because I am not an expert, I would say 'Daniel/Dae/Kim' is the correct parsing.
No dictator, no invader can hold an imprisoned population by force of arms forever.
There is no greater power in the universe than the need for freedom.
Against this power, governments and tyrants and armies cannot stand.
The Centauri learned this lesson once.
We will teach it to them again.
Though it take a thousand years, we will be free.
- Citizen G'Kar
T!M (DVD Profiler Unlimited Registrant, Star Contributor)
Profiling since Dec. 2000
Registered: March 13, 2007
Reputation: Highest Rating
Netherlands Posts: 8,736
Quoting Jubal:
Quote:
I will ALWAYS say start with the default 1/2/3

Except that there is no default 1/2/3. The rules have never mentioned a default standard, and neither has Ken. The closest thing we've got is the field names - which don't suggest 1/2/3 at all - and Ken's comment that, for the last name field, the "surname is the intent." Again: nothing about a 1/2/3 standard. And if you're looking for a "default" in the program itself, try entering "Daniel Dae Kim", press the "Add Cast Member" button, and see what "default" parsing the program comes up with...

I really have no intention to get involved in this debate again, but I just can't let you keep declaring that we have a default standard, while the big problem with parsing is that we don't have one. The fact that we don't have one, has resulted in double, non-linking entries for virtually EVERY three-part name in the database. So don't tell me there's a default standard: there's not. Anything goes: we can all do as we please. Do I like it? Of course not. But it is how it is: ignoring the problem won't make it go away...
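
A minimal sketch of why two parsings of the same credit end up as separate entries, written in Python purely for illustration. It assumes that linking compares the first, middle, and last name fields exactly, which is an assumption about the program rather than documented behavior; `ParsedName` and `parse_slashed` are hypothetical names, and the counts mirror the ones Taro reported from his local database.

```python
from collections import Counter
from typing import NamedTuple


class ParsedName(NamedTuple):
    first: str
    middle: str
    last: str

    def display(self) -> str:
        # Join the non-empty fields the way the credit reads on screen.
        return " ".join(part for part in self if part)


def parse_slashed(text: str) -> ParsedName:
    # "Daniel/Dae/Kim" -> first/middle/last; an empty slot stays empty.
    first, middle, last = text.split("/")
    return ParsedName(first.strip(), middle.strip(), last.strip())


# Hypothetical local data using the counts quoted in the thread.
profiles = ["Daniel/Dae/Kim"] * 27 + ["Daniel//Dae Kim"] * 2

parsed = [parse_slashed(p) for p in profiles]
by_fields = Counter(parsed)                          # what an exact field match sees
by_display = Counter(n.display() for n in parsed)    # what a viewer reads

print("distinct on-screen names:", len(by_display))
print("distinct field combinations:", len(by_fields))
```

Both parsings display identically, yet under the exact-match assumption they count as two different people, which is the duplication described above.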
Ace_of_Sevens (DVD Profiler Unlimited Registrant, Star Contributor)
Registered: December 10, 2007
Reputation: High Rating
Posts: 3,004
To clarify: I understand this is an open issue as far as the rules are concerned, but the problem is we have names to enter now. What we should be doing is working out a de facto standard that meets the following criteria:

1. Based on clearly stated criteria
2. As easy as is practical to figure out correct parsing
3. Works, meaning names sort and display correctly
4. Can be easily converted if the program ever handles this better

I believe my suggestion meets most of these criteria. Number 2 is tricky, but that will be the case with any system that meets the other criteria (i.e., not entering in a single field). If anyone has a better idea, please clearly explain what it is and why it is better.
 Last edited: by Ace_of_Sevens
Winston Smith (DVD Profiler Unlimited Registrant, Star Contributor)
Don't be discommodious
Registered: March 13, 2007
United States Posts: 21,610
Quoting T!M:
Quote:
Quoting Jubal:
Quote:
I will ALWAYS say start with the default 1/2/3

Except that there is no default 1/2/3. The rules have never mentioned a default standard, and neither has Ken. The closest thing we've got is the field names - which don't suggest 1/2/3 at all - and Ken's comment that, for the last name field, the "surname is the intent." Again: nothing about a 1/2/3 standard. And if you're looking for a "default" in the program itself, try entering "Daniel Dae Kim", press the "Add Cast Member" button, and see what "default" parsing the program comes up with...

I really have no intention to get involved in this debate again, but I just can't let you keep declaring that we have a default standard, while the big problem with parsing is that we don't have one. The fact that we don't have one, has resulted in double, non-linking entries for virtually EVERY three-part name in the database. So don't tell me there's a default standard: there's not. Anything goes: we can all do as we please. Do I like it? Of course not. But it is how it is: ignoring the problem won't make it go away...


And your constant attitude, Tim, does not help to resolve the situation, now does it? It is a reasonable and rational default given three names and three fields, even more so based upon the Rules. It has NOTHING to do with Culture, anyone's; it is based only on the data that we are given. Naturally, I am not referring to either hyphenated last names or hyphenated First Names.

I make a reasonable and rational suggestion, and not only will you simply not hear of it, you have no suggestion of your own. I suspect that is because the ONLY possible suggestion you could come up with WOULD BE based upon culture. But that is typical.

Skip
ASSUME NOTHING!!!!!!
CBE, MBE, MoA and proud of it.
Outta here

Billy Video
Winston Smith (DVD Profiler Unlimited Registrant, Star Contributor)
Don't be discommodious
Registered: March 13, 2007
United States Posts: 21,610
Quoting Ace_of_Sevens:
Quote:
To clarify: I understand this is an open issue as far as the rules are concerned, but the problem is we have names to enter now. What we should be doing is working out a de facto standard that meets the following criteria:

1. Based on clearly stated criteria
2. As easy as is practical to figure out correct parsing
3. Works, meaning names sort and display correctly
4. Can be easily converted if the program ever handles this better

I believe my suggestion meets most of these criteria. Number 2 is tricky, but that will be the case with any system that meets the other criteria (i.e., not entering in a single field). If anyone has a better idea, please clearly explain what it is and why it is better.

What suggestion, Ace? I don't see one.

And there you go again, making a vague reference to sorting and displaying properly. What does sorting properly mean? Let us take an obvious example, HBC: if they were ALL listed as H/B/C and you search under C, guess what you will find: all sorted properly. Oh... we have some credits which say HB-C; fine, CLT, and guess what: the Cs STILL sort properly and together. I hope you will pardon me, but I have never comprehended, and still don't, the brouhaha over 'proper' parsing. What it invariably leads to, every time, is people making improper conclusions based upon their culture. I will use a hypothetical name, but the argument that was presented remains the same. This happened to be a German user, and he was trying to parse MTM as M//TM. When I offered him a friendly correction, he apologized and said he was just assuming based on the European method of naming. This is culture-based and just does not necessarily or automatically apply to a non-Euro actor.

Thus I believe that, instead of dealing with culture at all, we should use the simple default I have described for four years; it is simply based on the data presented. Then, if someone can present documentation, we can ALWAYS implement whatever standard should be applied to go along with such documentation.

This is a simple, logical and rational solution and NO ONE has ever suggested anything better.
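
The default-plus-documentation approach described above can be sketched roughly as follows. This is only an illustration in Python, not anything DVD Profiler itself does: the function name `parse_default`, the `DOCUMENTED` table, and its single entry (built from Taro's fictitious Yun Young Park example) are all assumptions made up for the sketch, and only three-part names are handled because that is all the thread discusses.

```python
from typing import Dict, Optional, Tuple

Fields = Tuple[str, str, str]  # (first, middle, last)

# Hypothetical table of documented exceptions, keyed by the credited name.
# The entry uses the fictitious example from this thread, not a real person.
DOCUMENTED: Dict[str, Fields] = {
    "Yun Young Park": ("Yun Young", "", "Park"),
}


def parse_default(credit: str, documented: Optional[Dict[str, Fields]] = None) -> Fields:
    """Parse a three-part credit as 1/2/3 unless a documented parsing exists."""
    documented = documented or {}
    if credit in documented:
        # Documentation was supplied, so it overrides the default.
        return documented[credit]
    parts = credit.split()
    if len(parts) != 3:
        # Other name lengths are outside the scope of this sketch.
        raise ValueError(f"expected a three-part name, got {credit!r}")
    return (parts[0], parts[1], parts[2])


print(parse_default("Daniel Dae Kim"))
print(parse_default("Yun Young Park", DOCUMENTED))
```

Without an entry in the table, every three-part name falls through to 1/2/3; the table is the only place where anything other than the on-screen data enters the decision.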

Skip
ASSUME NOTHING!!!!!!
CBE, MBE, MoA and proud of it.
Outta here

Billy Video
TheMadMartian (DVD Profiler Desktop and Mobile Registrant, Star Contributor)
Alien with an attitude
Registered: March 13, 2007
Reputation: Highest Rating
United States Posts: 13,202
Quoting Ace_of_Sevens:
Quote:
To clarify: I understand this is an open issue as far as the rules are concerned, but the problem is we have names to enter now. What we should be doing is working out a de facto standard that meets the following criteria:

T!M has perfectly illustrated the problem with this approach. Many of us agreed on a standard for parsing. That standard is '1/2/3' unless it can be documented otherwise. Unfortunately, there are people who do not agree with that standard, so they won't follow it. Because it is a user-created standard, there is no way to enforce it, so it doesn't make a difference.
No dictator, no invader can hold an imprisoned population by force of arms forever.
There is no greater power in the universe than the need for freedom.
Against this power, governments and tyrants and armies cannot stand.
The Centauri learned this lesson once.
We will teach it to them again.
Though it take a thousand years, we will be free.
- Citizen G'Kar
Addicted2DVD (DVD Profiler Unlimited Registrant, Star Contributor)
Registered: March 13, 2007
Reputation: Highest Rating
United States Posts: 17,334
Exactly... unless there is a standard coming from Invelos themselves... there is no way we can enforce it.
Pete
Winston Smith (DVD Profiler Unlimited Registrant, Star Contributor)
Don't be discommodious
Registered: March 13, 2007
United States Posts: 21,610
Agreed, but as I have said, even the mighty Tim has never come up with a better alternative that is not based on culture, only that he OBJECTS. How seriously can that be taken? There is only one logical standard that is based upon data and not completely disregarding culture. Tim is the horse, and I hope he is not the horse's backside; I can lead him to the water but I can't make him drink it.

Skip
ASSUME NOTHING!!!!!!
CBE, MBE, MoA and proud of it.
Outta here

Billy Video
Taro (DVD Profiler Desktop and Mobile Registrant, Star Contributor)
Registered: February 23, 2009
Reputation: High Rating
Belgium Posts: 1,580
Quoting Kathy:
Quote:
I am updating "Audition" UPC 031398-178972 and ran into an issue relevant to this thread.

What would you do with the following:

Executive Producer:
             
    Current profile: Toyoyuki Yokohama

    On screen: To Yo Yuki Yokohama  (there are clear and distinct spaces between these words).
   
    Invelos CLT: Toyoyuki Yokohama > 20 titles (60 profiles) vs To Yo Yuki Yokohama 0 titles (0 profiles).

Here is how I would submit this change according to my understanding of the rules, and without any understanding of the issues brought up in this thread:

    Toyoyuki Yokohama [To Yo Yuki Yokohama].

Sorry to butt in here but I happen to know a few things about Japanese names.
The first thing of note is that the name To Yo Yuki is nonexistent in Japanese.

If the (western) credits show To Yo Yuki (with clear spaces), then I would indeed use the credited-as feature and input him as Toyoyuki Yokohama [To Yo Yuki Yokohama]

However, if there are original Japanese credits on-screen and they are subtitled in English, in that case I would simply leave it as Toyoyuki Yokohama, as that will be the way the Japanese will have entered those credits. If you're not sure about the Japanese credits, feel free to post a screenshot and I'll gladly help you with a correct romanization.

Basically, whenever there are both Japanese and English credits for a Japanese movie, I tend to use the Japanese credits, seeing as the English ones tend to omit important cast and crew members as well as contain spelling errors.
Blu-ray collection
DVD collection
My Games
My Trophies
 Last edited: by Taro
Winston Smith (DVD Profiler Unlimited Registrant, Star Contributor)
Don't be discommodious
Registered: March 13, 2007
United States Posts: 21,610
Taro:

I really hate to be a nudge. But, and don't take this wrong, I see you pontificating, BUT I see no documentation to back it up. I hope that you can provide some for Kathy to use.

Skip
ASSUME NOTHING!!!!!!
CBE, MBE, MoA and proud of it.
Outta here

Billy Video
DarklyNoon (DVD Profiler Unlimited Registrant, Star Contributor)
No Godz, No Masterz
Registered: May 8, 2007
Reputation: Highest Rating
Germany Posts: 1,945
Skip,

In Hong Kong and Korea middle names do not exist; they never did, they never will.
So it is kind of obvious that the middle name will ALWAYS be blank with Hong Kong and Korean actors.

cheers
Donnie
www.tvmaze.com