Curse of Data Taxonomy: We do not need a perfect taxonomy for good applications


As I work on a commonsense knowledge base reasoning problem framed as classification, I have come to appreciate how awkward it is to deal with data that I myself often misclassify. It is unclear which category each instance falls into, or why such a taxonomy is needed at all. The problem frustrates me so much at times that I can hardly bear to keep working on it. It feels meaningless to me, because at the end of the day we do not need a perfect taxonomy to use these data sources in downstream tasks.

Nonetheless, the problem I work on is not alone; other tasks in AI share the same issue. Start with the classic five-point Likert-scale sentiment analysis on the Yelp datasets. I know this task is usually treated as a standard practice problem for beginners and should not be taken too seriously, but I have a bad impression of it. What is it for? Isn't a simpler binary scale enough for a downstream application such as determining customers' satisfaction with a service? Why do we need to predict on a finer-grained scale, where the boundary between two adjacent classes is very blurry? Still, since it is a beginner's exercise, I no longer care much about it.
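
To make the complaint concrete anyway, here is a minimal sketch of collapsing the five-star labels into a binary one and discarding the blurry middle class. The reviews below are made-up placeholders, not real Yelp data:

```python
from typing import Optional

# Made-up (text, stars) pairs standing in for real Yelp reviews.
reviews = [
    ("The food was amazing and the staff were friendly.", 5),
    ("Decent place, nothing special.", 3),
    ("Cold food and rude service. Never again.", 1),
]

def to_binary(stars: int) -> Optional[str]:
    """Map 1-2 stars to 'negative', 4-5 to 'positive'; drop the blurry 3-star middle."""
    if stars <= 2:
        return "negative"
    if stars >= 4:
        return "positive"
    return None  # the ambiguous middle class: discard rather than force a label

binary_dataset = [
    (text, label)
    for text, stars in reviews
    if (label := to_binary(stars)) is not None
]
print(binary_dataset)  # the 3-star review is gone; only clear-cut labels remain
```

For monitoring customer satisfaction, these two clean classes are arguably all the downstream application ever consumes.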

Coming closer to my research, a paper by Prof. Ernest Davis criticizes many commonsense tasks and datasets for containing a significant portion of non-commonsense knowledge. That matches my experience: the dataset I work on is ambiguous, and other well-known datasets contain many instances that I would classify as other types of knowledge rather than commonsense. Based on this paper, I hatched a new research idea: filter the benchmarks down to commonsense instances only, then evaluate on these pure commonsense subsets to see the true performance of LLMs on these tasks. I eagerly discussed the idea with my supervisor, but received very negative feedback. He questioned the need for filtering, and asked which literature I could use to define "commonsense". I went back to the data, tried to filter out some instances, and soon realized how difficult it is. I wonder why people do not simply call this kind of knowledge "common", since at the end of the day, commonsense or common knowledge is all that LMs need to be human-like. Worse, although everyone uses the term "commonsense", no one clearly defines it. So bad!
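
Had I pushed the idea further, the pipeline would have looked something like the sketch below. The `is_commonsense` predicate is precisely the part nobody can write down; everything here, including the instance format, is a hypothetical illustration rather than any existing benchmark's schema:

```python
from typing import Callable

# Hypothetical sketch of the "filter, then re-evaluate" idea.
# Instances are assumed to look like {'id': ..., 'question': ..., 'answer': ...}.

def evaluate(predictions: dict[str, str], benchmark: list[dict]) -> float:
    """Plain accuracy of an LLM's predictions over a benchmark."""
    correct = sum(
        1 for item in benchmark
        if predictions.get(item["id"]) == item["answer"]
    )
    return correct / len(benchmark) if benchmark else 0.0

def filtered_accuracy(
    predictions: dict[str, str],
    benchmark: list[dict],
    is_commonsense: Callable[[dict], bool],
) -> tuple[float, float]:
    """Accuracy on the full benchmark vs. on the 'pure commonsense' subset."""
    subset = [item for item in benchmark if is_commonsense(item)]
    return evaluate(predictions, benchmark), evaluate(predictions, subset)
```

The code is trivial; the predicate is not. Without an agreed definition of "commonsense", `is_commonsense` cannot be implemented, which is exactly the objection my supervisor raised.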

There are examples in other AI fields such as computer vision as well. A friend of mine works on a problem called "emotion classification from facial images". It sounds like an easy task, but no! There is not much prior work on this problem, which somehow indicates how little attraction a poorly defined task has. Why do I say this task is not well defined? First, it is subject-dependent: one person might have a sad-looking face by default, but that does not mean (s)he is sad, and people's faces often do not reflect their true inner emotions. Second, this emotion classifier is meant to control the conversation of a nurse robot with patients. The design is that if the robot detects a negative emotion on the patient's face, it stops chatting with the patient. I think this is a poor design, though I do not have much experience to argue the point. And if the design is poor, what is the point of wrestling with an ill-defined facial emotion classification task? Absolutely meaningless! In fact, my friend has already struggled with this task for a long time. Maybe the whole computer vision team has, not just him.
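
To see why the rule feels brittle, consider a minimal sketch of the described control loop. All names here, `detect_emotion` included, are hypothetical stand-ins, not the team's actual code:

```python
# Hypothetical sketch of the described rule, not the team's actual code.
NEGATIVE = {"sad", "angry", "fearful", "disgusted"}

def detect_emotion(frame) -> str:
    """Placeholder for the facial emotion classifier; assumed to be noisy."""
    raise NotImplementedError

def chat_step(robot, patient, frame) -> bool:
    """Return True to keep talking, False to stop the conversation."""
    if detect_emotion(frame) in NEGATIVE:
        return False  # one misread of a default sad-looking face silences the robot
    robot.reply(patient)
    return True
```

A single noisy "negative" prediction on a subject-dependent, default-sad face ends the chat, so the ill-defined classifier sits directly on the control path of the design.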

Overall, my view of this kind of problem is this: a clear goal with a good design entails development; otherwise, the work falls into an awkward situation that most likely leads to failure!
