demagogue on 19/6/2015 at 17:10
I've been seeing this story tossed around today, about the weirdo "neural net dreams" images: (http://www.atlasobscura.com/articles/this-mystery-photo-haunting-reddit-appears-to-be-image-recognition-gone-very-weird).
I was going to post some thoughts about it on Facebook, but frankly FB doesn't deserve it & probably couldn't handle it, and my main point was one I actually first made (http://www.ttlg.com/forums/showthread.php?t=123691&p=1783992&viewfull=1#post1783992) in this forum anyway. So I thought I'd reprise it here.
So these "neural-net generation" images are being treated like a new thing, but we were already playing with this back in the mid-90s at UTexas, in the incomparable Risto Miikkulainen's CogSci course. It was recognizable to me right off, and I wrote about what's going on a few years ago here as a "real life ghost" story, which the above "halloween" thread posts to: (
https://docs.google.com/a/trioptimum.com/document/d/1_zp_WB5sa2H11RCdquLtvsjFxc81PKfs21TcYdlJEsA/edit?usp=docslist_api) It's not a technical article, and it's not my home field, so it's probably dodgy in the details, but it gets the general idea across.
As a technical side matter (as long as I have this can of worms opened)... the part in the above article about a neural net never seeing a barbell without an arm lifting it--so it naturally morphed the two together in its "understanding" of a barbell in its generation algorithm--reminds me of Quine's famous "gavagai" discussion: does the word "rabbit" refer to a whole rabbit, or just to undetached rabbit parts, if a person can't distinguish the term applying to one vs. the other (since they always come together)? To cut a long story short, it points to the limits of vanilla neural nets for AI: it's all blind "set theory" data in a vacuum, and higher-level functionalism (i.e., vision is tailored to tasks like manipulating objects or traversing landscapes, so image data gets fit to recognition as rotatable objects or traversable terrain, etc.) takes a back seat... which is the paradigm shift David Marr was pushing, and I believe it's the dominant position now, at least in neuroscience (I get the sense that AI still hasn't really gotten the memo yet), but that's another story.
Uh, I guess people can talk about the coming robot invasion here, or anything else they want to about AI brains or whatnot. I just felt like talking about this story when I saw it, and this seemed like the natural place.
van HellSing on 19/6/2015 at 18:38
Cool stuff, makes you think about how perception works. A note though: the google doc requires clearance.
faetal on 19/6/2015 at 19:48
I hear a lot of rhetoric about this being the dreams of machines, but to my mind it isn't. It's a recursive piece of human language with the input and output set to images rather than words. It's just a pixel-analysis heuristic which can lead to a kind of amplified apophenia when told to have a bias towards one type of visual archetype. It's cool, but it's not profound.
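To put it concretely: the core loop is just gradient ascent on the input image, i.e. nudge the pixels so the archetype the net is biased towards scores higher, then repeat. A toy sketch with a stand-in linear "recognizer" (the model, sizes, and archetype index are all invented for illustration; the real thing is a deep convolutional net):
[code]
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "recognizer": one linear layer scoring 3 visual archetypes.
# (A real system is a deep convolutional net; this keeps the math visible.)
W = rng.normal(size=(3, 64))          # 3 archetype scores from 64 "pixels"

# Start from a noisy "photo" and repeatedly nudge the pixels so that
# archetype 0 (the chosen bias, e.g. "dog faces") scores ever higher.
x = rng.normal(size=64)
step = 0.1
for _ in range(100):
    grad = W[0]                       # d(score_0)/d(x) for a linear scorer
    x = x + step * grad               # amplify whatever raises that score
    x = np.clip(x, -3.0, 3.0)         # keep "pixel" values in a sane range

print("biased archetype score:", (W @ x)[0])
[/code]
The apophenia is the feedback: whatever faint trace of the archetype the scorer finds in the noise gets amplified on every pass.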
Sulphur on 20/6/2015 at 06:07
It's certainly cool in terms of showing us how far pattern recognition has come with computer algorithms. The tendency to anthropomorphise and attribute behaviours or the human template of consciousness to random sets of things isn't in the original article from what I can see; it's the general reaction people have towards this information. I know I'm almost tempted to say this is a sort of digital equivalent of a reverse Rorschach test for neural networks... but that would imply we've gifted them layers of consciousness beyond their programming loops. I find it incredible how much we're programmed to do that through our own neural networks.
In any case, the uses for this get a lot more interesting in a hurry if they build it back into a generalised AI project that could use it as a sub-component. Not that I know if we've made any headway with those since the 90s.
Gryzemuis on 20/6/2015 at 14:37
Quote Posted by faetal
I hear a lot of rhetoric about this being the dreams of machines, but to my mind it isn't.
(https://en.wikipedia.org/wiki/Clarke%27s_three_laws) Any sufficiently advanced technology is indistinguishable from magic.
And more and more people will have problems distinguishing algorithms from magic, intelligence or even real living entities.
Sulphur on 20/6/2015 at 15:07
This is not quite what Clarke had in mind when he said that, and that's not quite what faetal was talking about - if this truly was something inexplicable to the common observer of today, they'd have to be completely unaware of what an algorithm is.
demagogue on 20/6/2015 at 15:13
Since people are having trouble reading my google doc, I'll just paste it below.
................
Creepy Story #2
Another story from the same class...
I really hope this one isn't too technical, but I think you need details to appreciate why it was so chilling at the time.
We were studying how neural nets are used for facial recognition (recognizing previously-learned faces in novel images), in particular with some occlusion, like a black bar over the face. How much can you black out a guy's face and still get accurate recognition?
Again, I hate to detour into theory, but it's helpful to have an image in mind...
The idea is you have a number of layers of connected nodes like this:
[ATTACH=CONFIG]2154[/ATTACH]
The top layer is an input layer, in this case carrying the raw data for a bitmap image (e.g., each node would correspond to a pixel). The bottom layer is the output layer, which in this case would be whatever you want the system to identify ... e.g., we had one system that could identify 1 individual out of a group of 6 people, so there'd be 6 output nodes, one for each individual. Another one identified males vs females, so there'd be 2 output nodes, "male" and "female". This diagram has 3.
The actual "output" occurs when one output node gets a stronger share of the original signal than the others (e.g., "male" gets 70% of the total signal, and "female" only gets 30%), after all the original signal has meandered its way down to the bottom. The middle layers themselves are called hidden layers. This diagram just has one hidden layer, but it can have more (ours had about 4 or 5 middle layers, iirc)
All the actual work is done in the connections between nodes. They are weighted ... so when a signal comes into a node, say it has just two lines coming out, one weighted 10% (going to A) and the other 90% (going to B), then 10% of the incoming signal goes to A and 90% to B.
You can think about it like nodes being buckets of water, and links being pipes of various sizes between buckets, carrying more or less water depending on their size. So you fill the top layer of buckets with varying amounts of water (e.g., for an image file, each pixel is a bucket, and the darkness is the amount of water). The water will flow downwards through all these 100s of intervening different-sized pipes, into the next layer of buckets, some filling up with more water than others, then through more pipes into more buckets, etc. (Important point for later, though: notice that there are more nodes at the top than in each layer going down, so with each layer information granularity is lost; the fine details of the face get washed out in return for generality gained. Think about 10 liters spread out among 1000 buckets vs the same 10 liters in 3 buckets.) Anyway, eventually the same total amount of water will pour into the final few output buckets, but more should flow into 1 output bucket than any other, and the label on that bucket ("male") is the output.
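If it helps, here's roughly what that water-flow picture looks like in code; all the sizes and weights are toy stand-ins, not the actual class project:
[code]
import numpy as np

rng = np.random.default_rng(1)

# Layers shrink going down: 100 "pixels" -> 32 -> 8 -> 2 output buckets.
# Each weight matrix is the bundle of pipes between two layers of buckets.
sizes = [100, 32, 8, 2]
weights = [rng.uniform(0.0, 1.0, (m, n)) for n, m in zip(sizes, sizes[1:])]

def forward(pixels):
    """Pour the image into the top buckets and let it flow down."""
    signal = pixels
    for W in weights:
        # Normalize each column so every node splits its water among its
        # outgoing pipes (the total amount of water is conserved).
        signal = (W / W.sum(axis=0, keepdims=True)) @ signal
    return signal

image = rng.uniform(0.0, 1.0, 100)        # one darkness value per pixel
out = forward(image)
shares = out / out.sum()                  # each output bucket's share
print(["male", "female"][int(np.argmax(shares))], shares)
[/code]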
The system "learns" by adjusting pipe-width over time. It runs a test image, and if the output is "correct" (the output is "male" and the image really is a male), then it is rewarded: fat pipes are made fatter, skinny pipes skinnier; do more of that. If the output is "incorrect" (the image is really a female), then it is punished: fat pipes are made skinnier, skinny pipes fatter; do less of that). The more test runs you do (say 10,000 over a weekend), and the more different test images you use (say 100 images of each person you want recognized), and the more hidden layers you have (we used 4 or 5 iirc), the more accurate the results will be over time. We got into the 95+% accuracy iirc.
So that's the theory; sorry to belabor it, but I think the image helps.
So we had built up a system that could identify 1 individual in a novel photo from a group of 6 people. After letting the system run thousands of trials over 100s of images of their faces, we started testing its accuracy with occluded faces.
The face which is the main interest of this story is this guy:
[ATTACH=CONFIG]2152[/ATTACH]
But this isn't the image we used of him.
Here is the actual image we ran through the system after a weekend of learning trials, with a black bar occluding the eyes:
[ATTACH=CONFIG]2153[/ATTACH]
Because the accuracy was very high by that point, we weren't surprised that the system got the right answer even with the occlusion.
Now the professor did something very clever with the same photo to make a point. He took the signal as it existed on the last hidden layer and ran the whole algorithm again backwards, bottom-up, from that level back up to the top. Using the bucket metaphor, it's like he took all the water in the next-to-last layer of buckets (that had flowed down from the occluded image), then flipped the whole structure upside-down, so the water now ran "up" the pipes, and eventually back to the top layer where it gave the data for a bitmap image.
Now you might think, well, it's just going to return the original image with the face and the black bar. But the way the algorithm worked, the transformation from layer to layer wasn't invertible. Because you are now adding nodes with each layer going up, you need to regain information particularity (fine details on the face) with each layer, but that particularity has already been lost from the signal at its now-heavily-processed level of generality. It's like taking an image that has already been blurred and trying to "de-blur" it by running the blur routine in reverse: even if the original was sharp, those details are simply gone from the blurred version. (Anyway, it's something like that. I get hazy on the details now...)
The point is: if you run the algorithm backwards, you get a different image than the original. The translation gets distorted by the weights run in reverse; they want to "add" information that's already been lost. But those details have to come from... somewhere. Instead of the original, you'll get a sort of "ghost" image, an "ur-image" that's like an idealization of the face built into all of the links and nodes as a whole, like a faint "memory" of facial details the net wants to see in the image, what you can almost imagine the computer is really "seeing" in its "mind's eye" to make its discrimination (and for that matter, maybe what our brains see as well, deep deep down on some dream-level).
Footnote: It's related to a process used to make a kind of eigenface (I think?), which is like the average of all facial information in a neural net, what you'd get if you ran a "white" signal back through it. Those are haunting enough. But here it's a particular image being filtered through that facial memory, and even then, not a full face, but one where all of the "eyes" information is completely missing.
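Mechanically, the backwards run amounts to pushing the last hidden layer's signal back up through the transposed weight matrices. A rough sketch of the idea (the actual class code is long gone, so treat every detail here as a stand-in):
[code]
import numpy as np

rng = np.random.default_rng(3)

# The same shrinking tower: 100 pixels -> hidden layers -> 6 people.
sizes = [100, 32, 8, 6]
weights = [rng.uniform(0.0, 1.0, (m, n)) for n, m in zip(sizes, sizes[1:])]

def to_last_hidden(pixels):
    signal = pixels
    for W in weights[:-1]:            # flow down, stop at last hidden layer
        signal = W @ signal
    return signal

def run_backwards(hidden):
    """Flip the tower over and pour the signal back up the pipes."""
    signal = hidden
    for W in reversed(weights[:-1]):
        # The transpose is one standard way to map a layer back upwards.
        # Every step has to invent detail the signal no longer carries --
        # that detail comes from the learned weights, not from the image.
        signal = W.T @ signal
    return signal

occluded = rng.uniform(0.0, 1.0, 100)
occluded[40:60] = 0.0                 # the black bar: zero eye information
ghost = run_backwards(to_last_hidden(occluded))   # the "ghost" image

# The footnote's eigenface-ish trick: run a flat, featureless signal
# backwards instead, recovering the net's averaged idea of a face.
eigenface_ish = run_backwards(np.ones(sizes[-2]))
[/code]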
Anyway, after about an hour of the backwards algorithm churning, the signal finally reached the top layer and we had the data for a bitmap image ready to render.
The anticipation was palpable.
He packaged the data and slid it into Photoshop, and slowly but surely the following image streamed into view:
[ATTACH=CONFIG]2155[/ATTACH]
This haunting image with ghost eyes that were not there before. I almost shit my pants then and there. I seriously did a double-take, because this was something I knew wasn't engineered or manipulated by anyone to look like that.
I should emphasize that absolutely nowhere in the signal fed into the algorithm (the processed 2nd picture up there) was there any "eye" information. Those ghost eyes were entirely reconstructed by the neural net adding its own details to deal with the increasing number of nodes (and information granularity), details that could only be embedded deeeep in the memory abyss of 100,000s of weights unwittingly working together, en masse "dreaming" of something that wasn't actually there in reality, something no single node controlled by itself, something only all of them conspiring together could put there.
What I recall is staring at those terrifying ghost eyes staring back at me, those eyes that came from nowhere, from the deep abyss of the net's subconscious, while the professor nonchalantly chatted away explaining this and that technical point.
I don't think I heard a word he said.
I was haunted by THOSE EYES!