Create an algorithm to distinguish dogs from cats (kaggle.com)
135 points by willis77 on Sept 25, 2013 | 73 comments


All it needs is a robot that says: "Come Here Boy, Come! That's a good doggie."

If the animal comes, it's a dog. If it continues without looking at you, it's a cat.


Lots of good cat jokes here. One thought would be to automatically upload the images to Reddit and gather the number of upvotes. The higher the upvotes, the more likely it's a cat.

/joke session


This is plausible. Upload it with a generic title like "Reddit, meet Sniffles"

And then read the comments and detect indicative words or expressions connected with each animal.


Feed a dog and it thinks you're god. Feed a cat and it thinks IT is god:

if (human.feedanimal()) { animal.type = "Dog"; human.name = "GOD"; } else { animal.type = "Cat"; animal.name = "GOD"; }


> Feed a cat and it thinks IT is god

Incorrect. The cat doesn't depend on your validation of its status; if you feed it, you just increase the chance that it thinks you are a subject worthy of its time and attention.


good point


I'm not sure that's a joke. You could upload each pic to both /r/puppies and to /r/kittens, and classify by net upvotes or something. Then you can use the result as training or benchmarking data for a "real" algorithm.


If you can find it in day time, it's a dog... Otherwise it's a cat.


Even simpler, all you need is a water hose and a decibel meter.


When we can get computers to tell the difference between animals accurately we can make a real life pokedex app. I can't wait.

EDIT: If anyone thinks we can start working on this now, I'm game.


Not quite a pokedex, but you can get a leaf-dex today!

http://leafsnap.com

You can take photos of leaves and it'll identify them for you :)


Oh my god you're right, wow, I need to start working on this now and fulfill my dream to be the very best...

EDIT: I would definitely be interested in building something like this. iOS/mobile app? I have basic experience in ML and have written an ANN in C++ to classify letters (they were 'pixelated' images, 1 and 0's).


If I had seen this post earlier, maybe I would have taken a more AI-focused route with my classes.


Ha! If I hadn't thought ML was a fad used for making book recommendations on Amazon I wouldn't be sitting here kicking myself for never learning AI/ML techniques.

Bloody ML Summer...



Note that these are both approaches to "fine-grained visual categorization" (FGVC), which assumes you already know you're looking at a dog/bird and want to identify which species it is. This is increasingly becoming an important problem in computer vision, and in fact we just recently held the 2nd FGVC workshop [1] this year to encourage more people to work on these sorts of things.

The Kaggle competition is for determining whether it is a dog or a cat, so it's a bit unlikely that one of these approaches would directly work (although they might be adaptable to the task). See my other comment [2] for a lighter-weight approach that is likely to do just as well, if not better.

[1] http://www.fgvc.org/

[2] https://news.ycombinator.com/item?id=6446309


It's actually already possible to train convolutional-network-like models to distinguish between a variety of dogs, cats, etc. with precision that is pretty much superhuman. The real problem is getting high-quality training data without involving tons of domain experts who would tell us with a high degree of confidence whether a given image is of a specific breed of dog (getting millions of images of dogs is easy, and so is building a classifier).

It's not immediately obvious to me how useful such an app would be, btw. Unless, of course, I misunderstood what a "real life pokedex app" is :).


If you can figure out the enemy dog is a fire type, you can switch your team up accordingly :)

Is the state of the art for that kind of recognition deep learning?


Yes, though I think on public benchmarks this is still not the case. There's a dog-breed classification problem in this year's Fine-Grained challenge (https://sites.google.com/site/fgcomp2013/) so we'll see in December!


This would actually be kind of amazing.


The sample images are of two types: images which are mostly of the subject (cat or dog), and images which have a cat or dog in them, but are not necessarily focused on them.

In computer vision, these two types of images are traditionally handled separately. First, a detector for a class (like "dog" or "cat") is run across the image at all locations and multiple scales to find where the things are. Once you have the locations, then an image classification algorithm is run for each detection window to either confirm it, or to give you more information about the object.

The latter often takes the form of giving more fine-grained category information, such as which species of dog/cat it is. Both leafsnap [1] and dogsnap [2] are programs of this type; i.e., they both assume that you've captured a single subject, roughly centered in the photo window, and that you already know it's a plant/dog.

Sometimes you don't have to run a detector even if the object is not the focus of the image, if the context/setting can narrow down the answer for you. For example, if you were deciding between dogs and airplanes, it would be pretty unlikely to see a dog on a runway or a plane in a living room, so just by classifying the entire image, you can do reasonably well. That's not the case here, as dogs and cats will, for the most part, appear in pretty similar environments.

So if I were attacking this problem, I'd first see how many images were of the non-focused type. If not many, I'd basically ignore them and focus on building a classification system. Note also that if you're constrained to make a hard choice between only two classes, that's a much easier problem than a more open-ended "what is this?"
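A minimal sketch of the multi-scale sliding-window scan described above. Here `score_fn` is a hypothetical stand-in for a trained per-window classifier, and all parameter values are illustrative:

```python
import numpy as np

def sliding_window_detect(image, score_fn, window=64, stride=16,
                          scales=(1.0, 0.5), threshold=0.5):
    """Scan a grayscale image at several scales with a fixed-size window.

    score_fn(patch) -> float stands in for a trained classifier.
    Returns a list of (scale, row, col, score) detections above threshold.
    """
    detections = []
    for scale in scales:
        # Crude downscaling by striding; a real system would interpolate.
        step = max(1, int(round(1.0 / scale)))
        scaled = image[::step, ::step]
        h, w = scaled.shape[:2]
        for r in range(0, h - window + 1, stride):
            for c in range(0, w - window + 1, stride):
                score = score_fn(scaled[r:r + window, c:c + window])
                if score >= threshold:
                    detections.append((scale, r, c, score))
    return detections
```

Each surviving window would then be handed to the classification stage to confirm it or refine the label.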

As many have pointed out, deep learning approaches seem to be the current state of the art on classification tasks such as these. But deep learning requires a lot of training data to be effective. A procedure I've been hearing many people use to great success is to use the Imagenet [3] hierarchy and images to train a deep learning classifier (i.e., as if you were going to compete in the Imagenet Large Scale Visual Recognition Challenge [4]). Then use the trained network, chop off the last stage (which makes the final prediction), and replace it with an SVM trained on your specific training data. In this way, you'd be using the network only as a feature extractor.
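That last procedure can be sketched roughly as below. The synthetic Gaussian feature vectors stand in for penultimate-layer activations from a pretrained network, and the SVM head is a bare-bones Pegasos-style linear SVM rather than any particular library's implementation:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos-style subgradient descent for a linear SVM.

    X: (n, d) feature vectors -- in the pipeline above, these would be
    activations taken from the truncated network.
    y: labels in {-1, +1}, e.g. cat = -1, dog = +1.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)              # decaying step size
            margin = y[i] * (X[i] @ w)
            w *= 1.0 - eta * lam               # regularization shrink
            if margin < 1.0:                   # hinge-loss subgradient
                w += eta * y[i] * X[i]
    return w

def predict(w, X):
    return np.where(X @ w >= 0.0, 1, -1)

# Stand-in "network features": two well-separated Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 1.0, (100, 16)),
               rng.normal(+2.0, 1.0, (100, 16))])
y = np.array([-1] * 100 + [+1] * 100)
w = train_linear_svm(X, y)
```

In practice the features come from running each image through the pretrained network once, so training the SVM head is cheap even when the network itself was expensive to train.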

I'm happy to try and answer other questions.

[1] http://leafsnap.com or see my project page for more details on how it works: http://homes.cs.washington.edu/~neeraj/projects/leafsnap/

[2] https://itunes.apple.com/app/dogsnap/id532468586?mt=8

[3] http://www.image-net.org/

[4] http://www.image-net.org/challenges/LSVRC/2013/index


As so often happens, the top rated comment on HN is more interesting than the article itself. Leafsnap and Dogsnap are so awesome!


I think that if I were to do this, I would use facial landmark recognition (using something like a Haar classifier). Haar-like features have been used to aid in (human) facial recognition since 2001 to great success[0]. And recently, people have been thinking about using similar methods for animal tracking[1].

If one could locate the face in the test set, she could also presumably find some landmarks of interest: eyes, nose, mouth, etc. Considering that dogs typically have longer snouts, cats have pointier ears, etc, this data could be used to differentiate between a dog and a cat. There would be difficulty dealing with awkward angles and bad lighting though.

[0] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.6.35...

[1] http://www.eng.auburn.edu/~troppel/internal/sparc/TourBot/To...
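As a toy illustration of the landmark idea, assuming a detector (not shown) has already produced 2-D points for the eyes and nose; the 1.0 ratio threshold is invented for illustration and not taken from any real system:

```python
def classify_by_landmarks(landmarks):
    """landmarks: dict mapping 'left_eye', 'right_eye', 'nose' to (x, y)."""
    lx, ly = landmarks['left_eye']
    rx, ry = landmarks['right_eye']
    nx, ny = landmarks['nose']
    eye_dist = ((rx - lx) ** 2 + (ry - ly) ** 2) ** 0.5
    mid_x, mid_y = (lx + rx) / 2.0, (ly + ry) / 2.0
    snout_len = ((nx - mid_x) ** 2 + (ny - mid_y) ** 2) ** 0.5
    # Dogs tend to have longer snouts relative to their eye spacing.
    return 'dog' if snout_len / eye_dist > 1.0 else 'cat'
```

A real system would learn the threshold (and use many more landmarks) from labeled training data, and would indeed struggle with the awkward angles and bad lighting mentioned above.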


Haar wavelets are most useful for detecting faces (drawing a rectangle around the entire face). They are not very good for locating landmarks on the face. Also, they tend to be much more sensitive to the orientation of the face than other features, so modern face detectors are often composed of multiple independent detectors, each specialized for different pose angles.

Standard computer vision features like HOG (histograms of oriented gradients) or SIFT will probably do much better, or the deep learning features others have mentioned.
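For reference, a stripped-down HOG descriptor (per-cell orientation histograms, with the usual block normalization omitted for brevity) looks roughly like this:

```python
import numpy as np

def hog_features(img, cell=8, bins=9):
    """Minimal histogram-of-oriented-gradients descriptor for a
    grayscale float image whose sides are multiples of `cell`."""
    gy, gx = np.gradient(img)                       # row- then column-gradient
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0    # unsigned orientation
    h, w = img.shape
    hist = np.zeros((h // cell, w // cell, bins))
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    for r in range(h // cell):
        for c in range(w // cell):
            m = mag[r*cell:(r+1)*cell, c*cell:(c+1)*cell]
            b = bin_idx[r*cell:(r+1)*cell, c*cell:(c+1)*cell]
            for k in range(bins):
                hist[r, c, k] = m[b == k].sum()
    return hist.ravel()
```

The resulting vector would then be fed to a classifier such as an SVM; production implementations add bilinear binning and contrast normalization over blocks of cells.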

Your larger point about adapting face detectors for animal use is well taken, though it's probably overkill for simply saying "dog" or "cat". You need that level of detail to identify the breed (e.g., this is the approach that dogsnap takes), but not for the base distinction.

The other way to go would be to train a deformable parts model (DPM) detector [1] for dogs and cats. DPMs are the current state of the art in detecting objects, e.g. as measured on the pascal VOC benchmark [2].

[1] http://www.cs.berkeley.edu/~rbg/latent/

[2] http://pascallin.ecs.soton.ac.uk/challenges/VOC


"Hey Cool challenge dude, any relation to AI? Didn't think so..."

or

"You too could solve this problem, a get a Phd and joined that overcrowded labor market"

Just consider that if you have M categories and N PhD students who can each spend four years creating one clever algorithm to distinguish category i from category j, then you need M(M-1) PhD students for a complete classification system, which, when you consider how many categories there are in human knowledge, works out to more than can even be pumped out by excess student loans today, and exponentially more than can find tenured positions.

I.e., these would just be additions to the "deep but not wide" algorithms of computer vision. Twenty years ago, we might have believed this adding-to would lead to something broad and general, but it's been twenty years and the trend is becoming clear.

See:

https://news.ycombinator.com/item?id=6401026


Easy: Put videos of the animal on youtube and http://www.cuteoverload.com and count the upvotes.

To quote @BigDataBorat (Twitter): 90% of data is unstructure. Furthering analysis reveal that 60% of unstructure data is cat video.


While it isn't specific to dogs and cats, nor open source or publicly available, doesn't Google already have this ability? - https://encrypted.google.com/search?tbm=isch&q=dogs&tbs=imgo... - https://encrypted.google.com/search?tbm=isch&q=cats&tbs=imgo...

edit: I'm sure some of theirs is from metadata, but I thought I read a while back that they were doing some graphical identification also.


There is definitely some graphical identification. You should try Google+ image search (if you have any images on there), it's really incredible. I searched "water" on my friend's images and got pictures of water glasses, the ocean, etc. None of the pictures had comments or metadata. Also worked searching for things like "soccer", got a bunch of pictures of him playing soccer.


I can bet $1000 that the winning team is going to use convolutional neural networks. Anyone willing to bet? (I can bet a smaller amount if you prefer.)


To be clear, you're singling out a specific algorithm and offering a 1000 USD, even-money bet that it will be used by the winner?


yep


The "state of the art" they reference is SVM's trained on color and texture features.

Before deep belief networks, I'd have agreed with your guess of convolutional neural networks. Now, however, I'd guess you'd use a deep belief network to create a network that picks out better features than those picked out "by hand" in a convolutional neural network. (See for example [1][2].)

So my money would be on some deep belief network.

[1] Hinton, G. E, Osindero, S., and Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18:1527-1554.

[2] Building high-level features using large scale unsupervised learning arXiv:1112.6209
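For the curious, the building block of a deep belief net is the restricted Boltzmann machine, trained layer by layer with contrastive divergence. A bare-bones CD-1 sketch (toy-scale, not tuned) might look like:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden=8, epochs=500, lr=0.1, seed=0):
    """One-step contrastive divergence (CD-1) for a binary RBM.

    data: (n, n_vis) array of 0/1 rows. A DBN would stack several of
    these, each trained on the hidden activations of the one below.
    """
    rng = np.random.default_rng(seed)
    n, n_vis = data.shape
    W = rng.normal(0.0, 0.01, (n_vis, n_hidden))
    b_v = np.zeros(n_vis)
    b_h = np.zeros(n_hidden)
    for _ in range(epochs):
        v0 = data
        h0_p = sigmoid(v0 @ W + b_h)                        # up
        h0 = (rng.random(h0_p.shape) < h0_p).astype(float)  # sample hiddens
        v1_p = sigmoid(h0 @ W.T + b_v)                      # down (reconstruct)
        h1_p = sigmoid(v1_p @ W + b_h)                      # up again
        W += lr * (v0.T @ h0_p - v1_p.T @ h1_p) / n
        b_v += lr * (v0 - v1_p).mean(axis=0)
        b_h += lr * (h0_p - h1_p).mean(axis=0)
    return W, b_v, b_h
```

Real DBN training, as in [1], adds momentum, mini-batches, and a supervised fine-tuning pass; this is only the core update.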


When it comes to large datasets, unsupervised learning doesn't work! You're better off initially training your network discriminatively on ImageNet and then switching to this cat-vs-dog training, rather than doing unsupervised learning.


Program it in Lush then.

Everyone here has found out about deep neural networks, and that is all they know.


There's a whole bunch of stuff on RBMs and deep belief nets here, along with results from a competition on recognizing 1000 objects.

http://www.cs.toronto.edu/~hinton/


Yes, I'd be surprised if a straightforward implementation from https://code.google.com/p/cuda-convnet/, run on a GPU with lots of transformations, wasn't the winning entry.


What would it say about hyenas?


I understand Kaggle wants someone to make an algorithm to "identify the entity", but if it's used as an alternative to CAPTCHA, isn't it possible to defeat this HIP (Human Interactive Proof) by reading the image and classification data from the same Petfinder.com and just doing image matching?

It may take some time to match against 3 million images, but it's doable, right? Or am I missing something here?


I just gave it a try and submitted a program. I scored 64% accuracy. Currently in 4th place, but I'm sure that won't last for long. http://www.kaggle.com/c/dogs-vs-cats/leaderboard


[deleted]


They provided the test data for this project. I believe they know it can be broken.


A captcha of 8 characters has a space of ~26^8 (~209 billion) possible combinations for a brute-force attack. Dividing a set of 12 images between dogs and cats has a space of 2^12 (4,096) possible combinations for a brute-force attack.
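The arithmetic above, checked directly:

```python
# Brute-force search spaces from the comment above.
captcha_space = 26 ** 8   # 8 case-insensitive letters
image_space = 2 ** 12     # 12 binary dog-vs-cat choices

print(captcha_space)                 # 208827064576 (~209 billion)
print(image_space)                   # 4096
print(captcha_space // image_space)  # 50983170 -- ~51 million times larger
```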



Sure, they may have solved the cat problem, but the well-documented challenges of "pug face" and "slobber smudging" make dog recognition an order of magnitude harder. Some say the Clay Institute is pondering a $1M prize for it.


Clever way to crowdsource your spambot's CAPTCHA breaking routines.


Too bad it's just for swag. I'd have given it a shot :D


Anyone else compete on these types of sites? Are they worth it?


I've been playing with their contests since I finished the ML course on Coursera. I think they're worth it: pretty fun and addictive, plus a very good way to practice your machine learning/data mining skills. The community there is very good and helpful.


It's "easy" to get a good ranking so it looks good in your cover letter :P


OpenCV plus a ton of training data should do the trick.


I have a cat. I know it's a cat, because she bites me when I don't let her outside, when I let her back inside, when I brush her, when I don't brush her, etc, etc.


I would have expected that putting this through a machine learning algorithm (or one of the face recognition ones) trained on a huge dataset might improve the odds.


That's the whole point of kaggle. ;)

Fighting your machine learning algorithm against everyone else's. Best one wins the prize.

There's a lot more "interesting" competitions from different companies of course.

https://www.kaggle.com/competitions


This competition is the first "Playground" one, just for fun: http://blog.kaggle.com/2013/09/25/the-playground/


So narrow and so useless. What exactly are dogs? Almost all cats look the same and are almost the same size. But dogs? Dogs vary greatly in size and looks. If you took some of what we accept as dogs today back to the past, before TV/computers, people back then wouldn't recognize them as dogs because of their looks or size. They would have to hear them bark and see them behave like dogs to classify them as such. If all they had was a picture, they might very well refuse and reject, say, pugs as dogs. So an algorithm to distinguish dogs from cats without context (behaviour, sound) will be more difficult.


> Almost all cats look the same

Congratulations, you've just dramatically simplified the algorithm. This means that once you can identify a cat, you can say that any image which isn't a cat is most likely a dog.

Also, cheer up! This isn't supposed to be "useful". Who cares if it's narrow and can't be applied to anything else? It's a chance to have a bit of fun and for some people (like me) it's a chance to learn about image recognition techniques.


While true, I think that's beside the point. If you surveyed random people, I bet you could get them to agree on whether an animal is a dog or a cat 99 times out of 100.


I'd sure like to see a picture of the unclassifiable dog/cat!


The thylacine looks like both a dog and a cat. It is known as the Tasmanian tiger or, alternatively, the Tasmanian wolf.

Unfortunately it's extinct. http://en.wikipedia.org/wiki/Thylacine



>if all they had was a picture, they might very well refuse and reject say pugs as dogs.

Oh, you mean like this picture from 1759 (you know, before TV/computers)?

http://upload.wikimedia.org/wikipedia/commons/6/6f/Louis-Mic...

Sorry, but your comment is complete nonsense. Not only about the pugs, but in fact cats vary quite a bit: http://en.wikipedia.org/wiki/Sphynx_(cat)


How do you calculate the size of something without a reference? For all you know, that animal is 200 feet tall. You could use shadows, I guess, if they were standing outside and you knew their location and the date and time.

Oh, and the fact that some dogs are small and some cats are large.

But otherwise, I think you're on to something. Something that won't work, but it's something.


>if you take them back to the past before TV/Computers, people back then won't recognize them as dogs, because of the looks or size. They would have to hear it bark and behave like a dog to classify it as such. if all they had was a picture, they might very well refuse and reject say pugs as dogs.

My dog, without using TV or a computer (at least to my knowledge, as I don't know what he is up to when we are not at home), easily recognizes other dogs of all the different breeds and sizes from a distance, like across the street, etc...


I've found that to be amazing. Given the differences between a Great Dane and a Chihuahua¹, how do dogs identify them as being another dog at a distance? They're vastly different sizes, they have different ear shapes, nose/face shapes, tail shapes, gait, and coat.

¹ There's argument that they're now different species, since they can no longer successfully interbreed -- for purely mechanical reasons.


Given the differences between a Great Dane and a Chihuahua¹, how do dogs identify them as being another dog at a distance?

Smell and sound are primary senses for dogs. Sight, not so much.


>Smell and sound

when another dog is inside a car that just stopped at the intersection?

Another issue here is that the smell of different dogs is supposed to have at least some variation as well (is this feature variation bigger or smaller than the variation in size?), and if one dog is downwind then the other is upwind. I.e., while smell obviously plays a major role in a dog's sensing of the world, we just can't ascribe it all to smell. In my experience, visual recognition plays a major part in many cases as well (note: I'm not arguing about which of a dog's senses is strongest, only that there are situations where vision is basically the only sense that could have brought the information).


To be fair this is usually based on scent.


From my experience, it's based on body language. So: not the superficial appearance of the dog, but its movement and 'greeting' signals.

And, of course, cats move in a very different way to dogs!


Do you have a dog?


So perhaps a more exciting problem is "cat, dog, or neither"? As you described it, it really only makes sense to solve this problem by solving only for cats, and then assuming all other items are dogs.

Edit: to clarify, is it safe to say that because cats look mostly alike, they would be easier to recognize consistently? Or at least a good place to start?


>cats look mostly alike

Only to people, I guess. Reminds me of an experiment where chimpanzees seemed to have problems recognizing faces (face recognition being a sign of intelligence, according to human thinking about intelligence)... well, until the experimenters stopped showing human faces to the chimpanzees and started showing chimpanzee faces :)

It would be interesting to see whether a well-trained AI would "think" that "cats look mostly alike" compared to, say, dogs, as this is only an artifact of human perception and of how our perception propagates into the software we create.


Cats look mostly alike in comparison to dogs. Although each feline has a unique face, the degree of variance is considerably higher in dogs.

This is mainly due to breeding. Dogs can vary between 8-80 lbs, depending upon breed - some will fit in handbags, others will barely fit into a car. Cats, on the other hand, have significantly less variation (between 8-25 lbs[1]).

Further complicating the problem is that we have bred some dogs specifically for facial features and shapes. An English Bulldog, for example, has a drastically different face than a Labrador.

However, the vast majority of cat breeds retain the same facial features and shape. Those that do vary (e.g., Siamese cats) differ by small amounts in comparison to dog breeds.

This is why building a "dog or cat" detector is reasonably straightforward (if it's not a cat, it must be a dog), but building a "dog, cat, or other" detector is far more complex.

[1]:http://www.petobesityprevention.com/ideal-weight-ranges/


You're talking about features important for the human model of cognition. No argument here: for most people, using the features you described (and for AI systems built by humans using the same features), cats would look more alike than dogs.

>However, the vast majority of cat breeds retain the same facial features and shape.

I had cats for many years and with time learned to see the differences, and I'm sure that some people, like judges at cat shows, would see even more distinctions.

It is all about the model of perception (and we naturally think and talk as if our human model is the [only] model) and how well it is trained.

>Dogs can vary between 8-80 lbs

btw, it is at least 2-180 lbs for adult dogs :)


I think a more exciting problem is an animal recognition and classifier system: dogs, cats, horses. It would recognize whether something is an animal or not, and it would classify all animals. It would, for instance, recognize pugs and group them together; it might not group them with Dobermans, but it should group all Dobermans together.



