Hacker News
Banning exploration in my infovis class (medium.com/eytanadar)
165 points by mdlincoln on May 1, 2017 | hide | past | favorite | 32 comments


Whoa, I cannot disagree more with the premise and the conclusions of the author.

Exploration is absolutely one of the key goals of data visualization. Tukey's insight was two-fold: First, that statistics puts too much emphasis on confirmation but not enough on systematic exploration of a data set. Too often researchers do not understand their data. They bring their own biases and preconceived notions, where they should be listening to the data.

Second, visualization is too often mistaken for confirmation. A curious pattern or an outlier may just be an artifact of the layout algorithm. The "find" part of visualization should happen through mathematical insight. Graphics can at best describe the underlying mathematical reality but no more. One cannot, strictly speaking, "find" anything, only form intuitions or illustrate already proven insights.

Perhaps the author's difficulty in evaluating his students' work lies not in the exploration part of their assignments, but in his own pedagogic emphasis on "tools," "frameworks," and "users." None of those things are relevant to data visualization as such. They might be goals for a business built around data visualization (to produce tools, or to identify user needs). A university should offer more than job training. Those interested in users and in "what one gets paid for" would do better in an internship or at a more narrowly technical trade school.

I don't know how "finding" is any more of a goal for data visualization than "exploring." Data visualizations tell stories. They often support the first and the last step in data analysis: the exploratory phase and the presentation of findings. They are inherently subjective, evocative, concise, artful.


This whole conversation is frustrating because it is boiling down to a stupid semantic debate. The author is claiming that people don't get paid to explore data, they get paid to find things. IMHO, this statement doesn't even make sense. When I explore data, I almost always find something. This something might not be useful to an "end user," but it is almost always useful and necessary.

Sometimes, the only thing I find by doing exploration is that a particular dataset is absolute garbage and shouldn't be used for any purpose. The only way I find stuff like that out is if I explore the dataset.


>The author is claiming that people don't get paid to explore data, they get paid to find things. IMHO, this statement doesn't even make sense. When I explore data, I almost always find something.

People are not paid to find "something", they are paid to find specific things.

Hence, the following makes even less sense than TFA:

>This something might not be useful to an "end user," but it is almost always useful and necessary.


In reasonably sized datasets, you'll typically find a lot of interesting information and relationships that are only loosely or not at all related to what the analyst is actually paid to do at the time.

Analysts who only find the specific thing and end their work on that are a dime a dozen, and need to be micromanaged. Good analysts will find all the other interesting stuff on their own and inform the business about it. Those good analysts are the explorers, and banning those people from exploring during training seems like an effective way to take talented budding analysts and turn them into mediocre ones.


In reasonably sized datasets, you'll also find a lot of spurious correlations simply by chance. That's one reason in science you're supposed to write down your hypothesis and methods of analyzing data before touching the data. Otherwise you risk finding some random noise and thinking it's important.
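This is easy to demonstrate with a minimal pure-Python sketch (the dataset size and threshold here are made-up numbers for illustration): generate columns of pure noise and correlate them, and some pairs will still look "related".

```python
import random
import statistics

random.seed(0)

def pearson(xs, ys):
    """Sample Pearson correlation, hand-rolled to keep this self-contained."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# 50 observations of 200 completely independent random "variables"
n_obs, n_vars = 50, 200
data = [[random.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]

# Correlate every variable against the first: all of them are pure noise,
# yet some will look "related" just by chance.
corrs = [pearson(data[0], data[i]) for i in range(1, n_vars)]
strong = [r for r in corrs if abs(r) > 0.3]

print(f"max |r| found: {max(abs(r) for r in corrs):.2f}")
print(f"spurious 'relationships' with |r| > 0.3: {len(strong)}")
```

Nothing here is real signal, which is exactly why pre-registering the hypothesis matters: the analysis plan, not the data, decides which of these correlations you were entitled to test.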


Probably you do find something, and then presumably go on to dig out the interesting bits and then present it. But it would not surprise me one second if students on an assignment stop when they have a tool for exploring with some dataset loaded. That would be problematic. Exploration is a means, a starting point. Not the final thing.


The headline is misleading. Exploration isn't banned, but creating class projects solely for exploration is banned, because it's vague and subjective.

What he's going for is to have the students explore the data and get insights into it, and then actually use those insights and data to create their projects.

And I think that is a really good idea.


It's not misleading and I don't think the GP is responding to the headline alone. The guy chose to present his idea this way and the headline is a fair summary of his attitude.


Strongly agree with the second point, that visualization is used not as a tool for understanding but for confirming, whereas it should be vice versa.

It reminds me of a thread on MathOverflow (the StackExchange for professional mathematicians) asking for proofs without words (which tend to fall in the category of "visual" proofs): the second most-voted answer is "because I think proof by picture is potentially dangerous, I'll present a link to the standard proof that 32.5 = 31.5". Link: https://mathoverflow.net/a/17347


The top-rated reply to that MathOverflow answer pushes back on the claim that "proof by picture" is dangerous:

> I think it is just as easy to introduce some kind of logical gap in a written proof as in a graphical one.


Have you taught undergraduates though? They can be very drawn to vague woolly thinking and it's really this guy's job to get them thinking precisely and directedly.


I love bringing up Anscombe's Quartet as a simple, cubicle-postable example of why it's important to visualize and explore beyond the immediate metrics.

https://en.wikipedia.org/wiki/Anscombe%27s_quartet
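The point is easy to verify in a few lines (a self-contained sketch using the published quartet values; correlation is hand-rolled to avoid any library dependency):

```python
import statistics

def pearson(xs, ys):
    """Sample Pearson correlation, hand-rolled to keep this self-contained."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Anscombe's four datasets (values from the 1973 paper)
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# All four report mean(x)=9.00, mean(y)=7.50, r=0.82 --
# yet plotted, they look nothing alike.
for name, (x, y) in quartet.items():
    print(f"{name}: mean(x)={statistics.fmean(x):.2f} "
          f"mean(y)={statistics.fmean(y):.2f} r={pearson(x, y):.2f}")
```

Identical summary statistics, four wildly different shapes: you only see the difference by plotting.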


He banned "explore" because students were turning in poor projects. He should get better students.


Or, you know, he should do exactly what he did, and use more precise terms instead of vague ones.


Or be a better teacher?


This advice is also worth thinking about for anyone who builds -- or is tempted to build -- open-ended software tools.

I've often made the mistake of emphasizing exploratory UIs in situations where the reality is that >95% of users are looking for specific solutions, rather than the chance of dicking around with the generic toolset that I personally found captivating.

It's really hard to step back from "I want to share this exhilarating exploration with everyone!" to "I'll narrow this down to a specific use case, and leave the rest of the potential for another day."


Couldn't agree more. Also, I think that defending one's project by saying it's an exploration is often just taking the safer, easier route.


Great article and applicable to way more than just infovis.

> In denying the student the ability to frame their main task as exploration, they are forced to concede that what they want to find is not what their end-user may be looking for and then: (a) engage with their client or “create” a reasonable one with real tasks and decisions, (b) understand the data and tasks much more deeply, and (c) identify good validation strategies (no more insights!).

> Maybe this is obvious, but when I started teaching I thought that being more open-ended about what I allowed was better. That somehow it would lead to more diverse, cool, weird, and novel projects. In some ways that’s true, but as I’ve argued elsewhere, teaching infovis is itself a wicked design problem.

It's not just infovis; I think this is a good teaching idea in a very broad sense. Reading other people's writing nowadays, I find people are apt to be very lazy with their language, and this in turn makes them lazy with their ideas. Infovis should ban "exploration." Architecture should ban "modern." Career centers doing resume reviews should definitely ban "utilize." I'm sure every field has such tropes that are maybe useful in the real world sometimes, but make for quite lazy school projects.

In fact I think he may be going a little overkill in justifying his ban on "exploration". He doesn't need to talk about weighing the pro/con here, if before he had a paucity and now he has a multiplicity of interesting student results, he's won and they've won. "Surprise" be damned. The kids can buy lotto tickets if they want to be surprised by big data.

For being so simple, restricting the common and obvious in classrooms is probably an underrated technique. This is widely done in photography classes: disallowing certain things, like making students use only film for a while, so they really have to frame photos and can't just snap 1000 and "discover" one good one, or restricting them to annoyingly wide prime lenses for a portrait assignment. These constraints, even though they are constraints, greatly reduce the samey-ness of results and make the students engage their brains.


Reporting software thoughts:

a) Whether or not it's pretty and looks interesting will be enough for first year sales unless you have a particularly sophisticated buyer

b) If pretty and looks interesting are the only things it does, you'll get slammed at renewal time because no-one used it

c) You need to know what behaviour your users will change based on the tool. They may not know that yet. That's a pretty good sales pitch though.

d) If you're particularly cynical, find a way to give users a magical number they can change based on behaviour, that doesn't actually mean anything. cf: "Klout influencer score". The more opaque its calculation, the better. Add a slightly random element so that users build superstitions about how it's calculated. Allow their boss to easily run reports on your users' magical score, and include rankings.


Congratulations, you just described the advertising industry.


The line that I use on my students is that: No one is paid to explore, they’re paid to find.

You're a teacher, not an employer. It's not your job to tell people what to be interested in. This is exactly the wrong attitude to bring to any scientific endeavor, but I guess you don't get tenure for letting your students dick around on their own.

All I can say is thank heavens I didn't run into anyone like this when I was first learning technology.


I think this is excellent for education, because of the side benefits that come with it...like thinking more along the lines of, and learning about, what the users are doing with the tool. It's also somewhat contrary to the state of lots of education, which often focuses on generalizing as a way of understanding rather than finding specificity.

However, in real life, figuring out all of the specific ways in which a user might want to use your tool can be mind-bogglingly difficult. In many cases, covering a specific few use-cases works, which means you can often just automate away most of the infovis stuff and just get to the result.

But when you have more than just a handful of these use-cases, or the use-cases are not well enumerated, the generalized approach can work better, and that approach for infoviz is often "exploration". In fact, building the generalized tool is often a good way of discovering the more specific use-cases for later rethinking, simplification and capture-in-code.

In that sense, the exploration infoviz tool can act as kind of a meta-exploration tool for figuring out what your users really need when they aren't otherwise able to articulate it.


When I read the headline and first paragraph, I thought I couldn't disagree more - though when continuing reading I actually grew sympathetic. As a student I've actually found myself "exploring" data in the "meandering" sense quite a few times - trying to find "interesting" patterns without a clear idea what "interesting" means or whether what I'm seeing genuinely constitutes a pattern. Such tasks started out kind of exciting but quickly became incredibly frustrating. So if that guy demands a bit more rigor in order to avoid this situation, more power to him.

That said, I think he states in his article something that I see as a core didactic problem without identifying it as such:

>The student is often engaging in "exploration" for the purpose of identifying patterns that influence their design. They are often missing background knowledge and develop it in this step. But this is not "exploration" for the analyst who may already have a mental model of interesting and uninteresting patterns.

So he is expecting students to make a tool suited for the mental model of an expert even though the students have no idea what that mental model should be (and without giving them any hints what that should be). If some motivated students try to derive those tools for themselves on-the-go, he'll permit that in a fit of generosity.

If the problem he has diagnosed is that the students don't know enough rigorous definitions and techniques to find patterns, maybe the curriculum should focus on teaching those instead of going a step further and asking them to build a visualisation tool based on that non-existent knowledge.


Doesn't this boil down to the classic question of whether a sensible approach to discovery can be serendipitous as well as hypothesis driven?

I work in big pharma, and long ago I learned that our biologists and chemists had absolutely no patience for inquiry that lacked a basis in the purposeful exploration of mechanism of action (a hypothesis). Without a guiding principle, the number of possible (and meaningless) patterns tends to explode combinatorially, and there's not enough time in a dozen lifetimes to test all the nutty proposals your computer can generate in a microsecond.
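The scale of that explosion is easy to quantify (a back-of-the-envelope sketch; the panel size here is a made-up number for illustration):

```python
from math import comb

# Suppose a screening panel measures 1,000 readouts per compound
# (an assumed number, chosen only to show the growth rate).
n_features = 1_000

pairs = comb(n_features, 2)    # candidate pairwise relationships
triples = comb(n_features, 3)  # candidate three-way interactions

# Testing every pair at alpha = 0.05 on pure noise would still yield
# a mountain of "significant" findings.
expected_false_positives = pairs * 0.05

print(f"{pairs:,} pairs, {triples:,} triples")
print(f"~{expected_false_positives:,.0f} false positives expected among pairs alone")
```

Half a million pairwise hypotheses from one thousand features; a hypothesis-free search cannot even keep up with its own false positives.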

Isn't that what the author is suggesting? If not, and exploration via random walk IS in fact a productive practice that's worthwhile, then what's the rub?


Yes, I think that's exactly the insight that underlies this article, and that's why I think it makes sense. Unguided "exploration" coupled with large datasets leads to wasting time on finding nonsense.

That doesn't mean flexibility in a tool isn't warranted - it's just that such flexibility should have a goal, and people should be discouraged from aimlessly applying it and then claiming they found something important.

(Also, "exploration" is kind of a weasel word for a student assignment; you can call your half-assed project "exploratory" and spin a good story from it that will give you a grade without any real work on your part.)


This is great advice for anyone building a feature that utilizes visualization.

I wish I had asked myself "what do users want to find?" instead of "what do users want to explore?" the last time I built a dashboard. Perhaps people would have actually used it.


That's the reason I consider most contemporary dashboards to be useless and missing the point.

The goal of a dashboard should be to give users insight, not to show them pretty pictures of moving lines and pies. You can find a lot of nice-looking dashboards online whose authors didn't even stop to consider why they were building them in the first place. They're easy to spot: pretty but unlabeled graphs, plots missing error bars, pie charts, and various forms of "chartjunk".

It's hard to make a good dashboard, because to do that you need to figure out what questions the user will want to answer with it.


So I took an infovis class, and I think what the author means is that your visualization should have some objective utility. Exploration of a dataset is good as part of design. But if you are doing a visualization for, say, the NYTimes or FiveThirtyEight, you need the skill to communicate something useful to the user.

Exploration is a tool but giving people utility to actually use your visualization is more important. You should allow exploration but it shouldn't be the only thing that people will use your visualization for.

Link to the class I took https://www.evl.uic.edu/aej/424/


I more or less agree with this. I do lots of "exploratory data analysis", but almost every plot or other output that I generate is designed to help answer a specific question or test an assumption that I have about the data.


The word explore is actually great in a data analysis context. The notions of exploratory vs confirmatory analysis are widely used, and exploratory means exactly what your students think it means. Just make sure they don't explore all of the data at once, otherwise they will have to go collect more so that they can confirm what they found when they were exploring.
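One simple discipline that operationalizes "don't explore all of the data at once" is to carve off a confirmation holdout before exploration begins (a minimal sketch with a toy dataset; the 50/50 split ratio is an arbitrary choice):

```python
import random

random.seed(42)

# Toy stand-in for whatever records you're analyzing
records = list(range(1000))
random.shuffle(records)

# Set aside a confirmation holdout *before* any exploration.
# Hypotheses formed while exploring must be re-tested on the holdout,
# which exploration never touches.
split = len(records) // 2
explore_set, confirm_set = records[:split], records[split:]

assert not set(explore_set) & set(confirm_set)  # no leakage between phases
print(f"explore on {len(explore_set)} rows, confirm on {len(confirm_set)}")
```

The shuffle-then-split keeps the two phases honest: anything "found" while exploring still has to survive contact with data it has never seen.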


We are going to call this place... Yosemite, but hey, everyone don't go about exploring this place, we're just here to count how many deer are in the woods.


All I can say is that data issues are often discovered during the data exploration process. But why would the author even care about data quality.



