[[ the following text has been moved and duplicated from
models: machine learning (AI - artificial intelligence) (modelsreadingroom.blogspot.com)
]]
Kai-Fu Lee., AI superpowers: China, Silicon Valley and the new world order, 2018
pp.6-10
A brief history of deep learning
Machine learning ── the umbrella term for the field that includes deep learning ── is a history-altering technology, but one that is lucky to have survived a tumultuous half-century of research. Ever since its inception, artificial intelligence has undergone a number of boom-and-bust cycles. Periods of great promise have been followed by “AI winters”, when a disappointing lack of practical results led to major cuts in funding. Understanding what makes the arrival of deep learning different requires a quick recap of how we got here.
Back in the mid-1950s, the pioneers of artificial intelligence set themselves an impossibly lofty but well-defined mission: to recreate human intelligence in a machine. That striking combination of the clarity of the goal and the complexity of the task would draw in some of the greatest minds in the emerging field of computer science: Marvin Minsky, John McCarthy, and Herbert Simon.
As a wide-eyed computer science undergrad at Columbia University in the early 1980s, all of this seized my imagination. I was born in Taiwan in the early 1960s but moved to Tennessee at the age of 11 and finished middle and high school there. After four years at Columbia in New York, I knew that I wanted to dig deeper into AI. When applying for computer science Ph.D. programs in 1983, I even wrote this somewhat grandiose description of the field in my statement of purpose: “Artificial intelligence is the elucidation of the human learning process, the quantification of the human thinking process, the explication of human behavior, and the understanding of what makes intelligence possible. It is men's final step to understand themselves, and I hope to take part in this new, but promising science.”
That essay helped me get into the top-ranked computer science department of Carnegie Mellon University, a hotbed for cutting-edge AI research. It also displayed my naïveté about the field, both overestimating our power to understand ourselves and underestimating the power of AI to produce superhuman intelligence in narrow spheres.
By the time I began my Ph.D., the field of artificial intelligence had forked into two camps: the “rule-based” approach and the “neural networks” approach. Researchers in the rule-based camp (also sometimes called “symbolic systems” or “expert systems”) attempted to teach computers to think by encoding a series of logical rules: If X, then Y. This approach worked well for simple and well-defined games (“toy problems”) but fell apart when the universe of possible choices or moves expanded. To make the software more applicable to real-world problems, the rule-based camp tried interviewing experts in the problems being tackled and then encoding their wisdom into the program's decision-making (hence the “expert systems” moniker).
The “neural networks” camp, however, took a different approach. Instead of trying to teach the computer the rules that had been mastered by a human brain, these practitioners tried to reconstruct the human brain itself. Given that the tangled webs of neurons in animal brains were the only thing capable of intelligence as we knew it, the researchers figured they'd go straight to the source. This approach mimics the brain's underlying architecture, constructing layers of artificial neurons that can receive and transmit information in a structure akin to our networks of biological neurons. Unlike the rule-based approach, builders of neural networks generally do not give the networks rules to follow in making decisions. They simply feed lots and lots of examples of a given phenomenon ── pictures, chess games, sounds ── into the neural networks and let the networks themselves identify patterns within the data. In other words, the less human interference, the better.
Differences between the two approaches can be seen in how they might tackle a simple problem: identifying whether there is a cat in a picture. The rule-based approach would attempt to lay down “if-then” rules to help the program make a decision: “If there are two triangular shapes on top of a circular shape, then there is probably a cat in the picture”. The neural network approach would instead feed the program millions of sample photos labeled “cat” or “no cat”, letting the program figure out for itself what features in the millions of images were most closely correlated to the “cat” label.
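([ my own toy sketch in Python, not from the book, of the contrast being described: the rule-based detector encodes a hand-written if-then rule like the one above, while the “learned” detector is given nothing but labeled examples; a nearest-neighbour lookup stands in here for a real neural network ])
```python
# Rule-based camp: a human writes the rule.
def rule_based_cat_detector(shapes):
    # "If there are two triangular shapes on top of a circular shape..."
    if shapes.count("triangle") >= 2 and "circle" in shapes:
        return "cat"
    return "no cat"

# Learning camp: no rules, only labeled examples (toy feature vectors here).
def learned_cat_detector(train_features, train_labels, new_features):
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    # answer with the label of the most similar training example
    best = min(range(len(train_features)),
               key=lambda i: distance(train_features[i], new_features))
    return train_labels[best]

print(rule_based_cat_detector(["triangle", "triangle", "circle"]))                      # cat
print(learned_cat_detector([[0.9, 0.1], [0.1, 0.8]], ["cat", "no cat"], [0.85, 0.2]))   # cat
```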
During the 1950s and 1960s, early versions of artificial neural networks yielded promising results and plenty of hype. But then in 1969, researchers from the rule-based camp pushed back, convincing many in the field that neural networks were unreliable and limited in their use. The neural networks approach quickly went out of fashion, and AI plunged into one of its first “winters” during the 1970s.
Over the subsequent decades, neural networks enjoyed brief stints of prominence, followed by near-total abandonment. In 1988, I used a technique akin to neural networks (Hidden Markov Models) to create Sphinx, the world's first speaker-independent program for recognizing continuous speech. That achievement landed me a profile in the New York Times. But it wasn't enough to save neural networks from once again falling out of favor, as AI reentered a prolonged ice age for most of the 1990s.
What ultimately resuscitated the field of neural networks ── and sparked the AI renaissance we are living through today ── were changes to two of the key raw ingredients that neural networks feed on, along with one major technical breakthrough. Neural networks require large amounts of two things: computing power and data. The data “trains” the program to recognize patterns by giving it many examples, and the computing power lets the program parse those examples at high speeds.
Both data and computing power were in short supply at the dawn of the field in the 1950s. But in the intervening decades, all that has changed. Today, your smartphone holds millions of times more processing power than the leading cutting-edge computers that NASA used to send Neil Armstrong to the moon in 1969. And the internet has led to an explosion of all kinds of digital data: text, images, videos, clicks, purchases, Tweets, and so on. Taken together, all of this has given researchers copious amounts of rich data on which to train their networks, as well as plenty of cheap computing power for that training.
But the networks themselves were still severely limited in what they could do. Accurate results to complex problems required many layers of artificial neurons, but researchers hadn't found a way to efficiently train those layers as they were added. Deep learning's big technical break finally arrived in the mid-2000s, when leading researcher Geoffrey Hinton discovered a way to efficiently train those new layers in neural networks. The result was like giving steroids to the old neural networks, multiplying their power to perform tasks such as speech and object recognition.
Soon, these juiced-up neural networks ── now rebranded as “deep learning” ── could outperform older models at a variety of tasks. But years of ingrained prejudice against the neural networks approach led many AI researchers to overlook this “fringe” group that claimed outstanding results. The turning point came in 2012, when a neural network built by Hinton's team demolished the competition in an international computer vision contest.
After decades spent on the margins of AI research, neural networks hit the mainstream overnight, this time in the form of deep learning. That breakthrough promised to thaw the ice from the latest AI winter, and for the first time truly bring AI's power to bear on a range of real-world problems. Researchers, futurists, and tech CEOs all began buzzing about the massive potential of the field to decipher human speech, translate documents, recognize images, predict consumer behavior, identify fraud, make lending decisions, help robots “see”, and even drive a car.
p.10
So how does deep learning do this? Fundamentally, these algorithms use massive amounts of data from a specific domain to make a decision that optimizes for a desired outcome. It does this by training itself to recognize deeply buried patterns and correlations connecting many data points to the desired outcome. This pattern-finding process is easier when the data is labeled with that desired outcome ─ “cat” versus “no cat”; “clicked” versus “didn't click”; “won game” versus “lost game”. It can then draw on its extensive knowledge of these correlations ─ many of which are invisible or irrelevant to human observers ─ to make better decisions than a human could.
Doing this requires massive amounts of relevant data, a strong algorithm, a narrow domain, and a concrete goal. If you're short any one of these, things fall apart. Too little data? The algorithm doesn't have enough examples to uncover meaningful correlations. Too broad a goal? The algorithm lacks clear benchmarks to shoot for in optimization.
Deep learning is what's known as “narrow AI” ─ intelligence that takes data from one specific domain and applies it to optimizing one specific outcome. While impressive, it is still a far cry from “general AI”, the all-purpose technology that can do everything a human can.
Deep learning's most natural application is in fields like insurance and making loans. Relevant data on borrowers is abundant (credit score, income, recent credit-card usage), and the goal to optimize for is clear (minimize default rates).
pp.10-11
Taken one step further, deep learning will power self-driving cars by helping them to “see” the world around them ─ recognize patterns in the camera's pixels (red octagons), figure out what they correlate to (stop signs), and use that information to make decisions (apply pressure to the brake to slowly stop) that optimize for your desired outcome (deliver me safely home in minimal time).
p.11
deep learning
to recognize a pattern,
optimize for a specific outcome,
make a decision
can be applied to so many different kinds of everyday problems.
p.11
People are so excited about deep learning precisely because its core power ─ its ability to recognize a pattern, optimize for a specific outcome, make a decision ─ can be applied to so many different kinds of everyday problems.
p.110
the fact that internet users are automatically labeling data as they browse.
p.110
traditional companies have also been automatically labeling huge quantities of data for decades. For instance, insurance companies have been covering accidents and catching fraud, banks have been issuing loans and documenting repayment rates, and hospitals have been keeping records of diagnoses and survival rates.
p.110
Business AI mines these databases for hidden correlations that often escape the naked eye and human brain.
p.110
historic decisions and outcomes within an organization and
uses labeled data to train an algorithm that can outperform even the most experienced human practitioners.
p.110
strong features
Humans normally make predictions on the basis of strong features, a handful of data points that are highly correlated to a specific outcome, often in a clear cause-and-effect relationship. For example, in predicting the likelihood of someone contracting diabetes, a person's weight and body mass index are strong features.
p.111
weak features
Weak features: peripheral data points that might appear unrelated to the outcome but contain some predictive power when combined across tens of millions of examples.
These subtle correlations are often impossible for any human to explain in terms of cause and effect: why do borrowers who take out loans on Wednesday repay those loans faster?
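([ an illustrative Python sketch, mine rather than the book's: one made-up “strong” feature carries most of the signal by itself, while the made-up “weak” features are individually near-useless and only add predictive power when hundreds of them are combined over many examples ])
```python
import numpy as np

rng = np.random.default_rng(0)
n, n_weak = 20000, 200
strong = rng.normal(size=(n, 1))                  # e.g. body-mass index
weak = rng.normal(size=(n, n_weak))               # peripheral data points
# outcome driven mostly by the strong feature, faintly by every weak one
logits = 2.0 * strong[:, 0] + 0.05 * weak.sum(axis=1)
y = (logits + rng.normal(size=n) > 0).astype(int)

def accuracy(X, y):
    """Fit a least-squares linear score and check how often its sign matches the label."""
    Xb = np.hstack([X, np.ones((len(X), 1))])     # add a bias column
    w, *_ = np.linalg.lstsq(Xb, 2 * y - 1, rcond=None)
    return np.mean((Xb @ w > 0) == y)

print("strong feature only:", round(accuracy(strong, y), 3))
print("weak features only :", round(accuracy(weak, y), 3))
print("strong + weak      :", round(accuracy(np.hstack([strong, weak]), y), 3))
```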
p.111
Optimizations like this work well in industries with large amounts of structured data on meaningful business outcomes. In this case, “structured” refers to data that has been categorized, labeled, and made searchable. Prime examples of well-structured corporate data sets include historic stock prices, credit-card usage, and mortgage defaults.
(AI superpowers: China, Silicon Valley and the new world order / Kai-Fu Lee.;
Boston: Houghton Mifflin Harcourt, 2018; includes bibliographical references and index; subjects: artificial intelligence ── economic aspects ── China. | artificial intelligence ── economic aspects ── United States.; HC79.I55 (ebook)
HC79.I55 L435 2018 (print); 338.4; https://lccn.loc.gov/2018-17250; 2018, )
____________________________________
• I believed the technology [AI speech recognition] would go mainstream within five years.
• It turned out that I was off by twenty years.
Kai-Fu Lee., AI superpowers: China, Silicon Valley and the new world order, 2018
p.143
In the late 1980s, I was the world's leading researcher on AI speech recognition, and I joined Apple because I believed the technology would go mainstream within five years. It turned out that I was off by twenty years.
p.178, p.177
chief scientist for speech recognition, 1991
we used voice commands to schedule an appointment, write a check, and program a VCR,
showcasing the earliest examples of futuristic functions that wouldn't go mainstream for another 20 years, with Apple's Siri and Amazon's Alexa.
(AI superpowers: China, Silicon Valley and the new world order / Kai-Fu Lee.; Boston: Houghton Mifflin Harcourt, 2018; includes bibliographical references and index; subjects: artificial intelligence ── economic aspects ── China. | artificial intelligence ── economic aspects ── United States.; HC79.I55 (ebook)
HC79.I55 L435 2018 (print); 338.4; https://lccn.loc.gov/2018-17250; 2018, )
____________________________________
Stephen Witt., The chosen chip : how Nvidia is powering the A.I. revolution., The New Yorker., Dec. 4, 2023
Jensen Huang, Nvidia's c.e.o.,
RIVA 128
GeForce
PC gamers, looking to gain an edge, bought new GeForce cards every time they were upgraded.
p.30
He founded Nvidia in 1993, with Chris Malachowsky and Curtis Priem, two veteran microchip designers.
p.30
Malachowsky and Priem were looking to design a graphics chip, which they hoped would make competitors, in Priem's words, “green with envy.”
p.31
In 2000, Ian Buck, a graduate student studying computer graphics at Stanford, chained 32 GeForce cards together to play Quake using 8 projectors. “It was the first gaming rig in 8K resolution, and it took up an entire wall,” Buck told me. “It was beautiful.”
p.31
Buck wondered if the GeForce cards might be useful for tasks other than launching grenades at his friends. The cards came with a primitive programming tool called a shader. With a grant from DARPA, the Department of Defense's research arm, Buck hacked the shaders to access the parallel-computing circuits below, repurposing the GeForce into a low-budget supercomputer. Soon, Buck was working for Huang.
p.32
Since 2004, Buck has overseen the development of Nvidia's supercomputing software package, known as CUDA. Huang's vision was to enable CUDA to work on every GeForce card.
p.32
As Buck developed the software, Nvidia's hardware team began allocating space on the microchips for supercomputing operations. The chips contained billions of electronic transistors, which routed electricity through labyrinthine circuits to complete calculations at extraordinary speed. Arjun Prabhu, Nvidia's lead chip engineer, compared microchip design to urban planning, with different regions devoted to different tasks. As Tetris players do with falling blocks, Prabhu will sometimes see transistors in his sleep. “I've often had it where the best ideas happen on a Friday night, when I'm literally dreaming about it,” Prabhu said.
p.32
When CUDA was released, in late 2006, Wall Street reacted with dismay.
p.32
“They were spending a fortune on this new chip architecture,” Ben Gilbert, the co-host of “Acquired,” a popular Silicon Valley podcast, said. “They were spending many billions targeting an obscure corner of academic and scientific computing, which was not a large market at the time ── certainly less than the billions they were pouring in.”
p.32
Huang argued that the simple existence of CUDA would enlarge the supercomputing sector. This view was not widely held, and by the end of 2008 Nvidia's stock price had declined by 70 per cent. ([ this would be the time to buy, knowing what we know now ])
p.32
Ting-Wai Chiu, a professor of physics at National Taiwan University,
had constructed a homemade supercomputer in a laboratory adjacent to his office.
Huang arrived to find the lab littered with GeForce boxes and the computer cooled by oscillating desk fans. “Jensen is a visionary,” Chiu told me. “He made my life's work possible.”
Chiu was the model customer, but there weren't many like him.
p.33
Downloads of CUDA hit a peak in 2009, then declined for three years. Board members worried that Nvidia's depressed stock price would make it a target for corporate raiders. “We did everything we could to protect the company against an activist shareholder who might come in and try to break it up”, Jim Gaither, a longtime board member, told me.
([ this would be the time to buy, knowing what we know now ])
p.33
Dawn Hudson, a former N.F.L. marketing executive, joined the board in 2013. “It was a distinctly flat, stagnant company”, she said.
p.33
In marketing CUDA,
pp.33─34
p.33
One application that Nvidia spent little time thinking about was artificial intelligence. There didn't seem to be much of a market.
At the beginning of the 2010s, A.I. was a neglected discipline. Basic tasks such as image recognition and speech recognition had seen only halting progress. Within this unpopular academic field, an even less popular subfield solved problems using “neural networks” ── computing structures inspired by the human brain. Many computer scientists considered neural networks to be discredited. “I was discouraged by my advisers from working on neural nets”, Catanzaro, the deep-learning researcher, told me, “because, at the time, they were considered to be outdated, and they didn't work.”
pp.33─34
Catanzaro, the deep-learning researcher
Catanzaro described the researchers who continued to work on neural nets as “prophets in the wilderness.” One of those prophets was Geoffrey Hinton, a professor at the University of Toronto. In 2009, Hinton's research group used Nvidia's CUDA platform to train a neural network to recognize human speech. He was surprised by the quality of the results, which he presented at a conference later that year. He then reached out to Nvidia. “I sent an e-mail saying, ‘Look, I just told a thousand machine-learning researchers they should go and buy Nvidia cards. Can you send me a free one?’” Hinton told me. “They said no.”
Despite the snub, Hinton encouraged his students to use CUDA, including a Ukrainian-born protege of his named Alex Krizhevsky, who Hinton thought was perhaps the finest programmer he'd ever met. In 2012, Krizhevsky and his research partner, Ilya Sutskever, working on a tight budget, bought two GeForce cards from Amazon. Krizhevsky then began training a visual-recognition neural network on Nvidia's parallel-computing platform, feeding it millions of images in a single week. “He had the two G.P.U. boards whirring in his bedroom,” Hinton said. “Actually, it was his parents who paid for the quite considerable electricity costs.”
Sutskever and Krizhevsky were astonished by the cards' capabilities. Earlier that year, researchers at Google had trained a neural net that identified videos of cats, an effort that required some 16,000 C.P.U.s.
Sutskever and Krizhevsky had produced world-class results with just two Nvidia circuit boards. “G.P.U.s showed up and it felt like a miracle,” Sutskever told me.
AlexNet, the neural network that Krizhevsky trained in his parents' house, can now be mentioned alongside the Wright flyer ([ the Wright brothers' heavier-than-air flying machine; a traditional object heavier than air cannot float, rise, or fly; a balloon can float because it contains enough hot air to be lighter than the surrounding atmosphere; an aircraft with wings can fly by moving the wing through the air at a fast enough speed (or with a strong enough wind), which produces lift (flight, flying) ]) and the Edison bulb. In 2012, Krizhevsky entered AlexNet into the annual ImageNet visual-recognition contest; neural networks were unpopular enough at the time that he was the only contestant to use this technique ([ what were the other techniques? ]). AlexNet scored so well in the competition that the organizers initially wondered if Krizhevsky had somehow cheated. “That was a kind of Big Bang moment”, Hinton said. “That was the paradigm shift.”
In the decade since Krizhevsky's nine-page description of AlexNet's architecture was published, it has been cited more than a hundred thousand times, making it one of the most important papers in the history of computer science. (AlexNet correctly identified photographs of a scooter, a leopard, and a container ship, among other things.)
Krizhevsky pioneered a number of important programming techniques, but his key finding was that a specialized G.P.U. could train neural networks up to a hundred times faster than a general-purpose C.P.U. “To do machine learning without CUDA would have just been too much trouble,” Hinton said.
Within a couple of years, every entrant in the ImageNet competition was using a neural network. Before long, neural networks trained on G.P.U.s were identifying images with 96 per cent accuracy, surpassing humans.
p.34
“The fact that they can solve computer vision, which is completely unstructured, leads to the question ‘What else can you teach it?’” Huang said to me.
The answer seemed to be: everything.
p.34
Huang concluded that neural networks would revolutionize society, and that he could use CUDA to corner the market on the necessary hardware.
“He sent out an e-mail on Friday evening saying everything is going to deep learning, and that we were no longer a graphics company,” Greg Estes, a vice-president at Nvidia, told me. “By Monday morning, we were an A.I. company. Literally, it was that fast.”
p.35
In 2017, Google introduced a new architecture for neural-net training called the transformer. The following year, OpenAI used Google's framework to build the first “generative pre-trained transformer”, or G.P.T.
The G.P.T. models were trained on Nvidia supercomputers, absorbing an enormous corpus of text and learning how to make humanlike connections.
(The new yorker, Dec. 4, 2023, brave new world dept., The chosen chip : how nvidia is powering the A.I. revolution., By Stephen Witt., pp.28─37, )
____________________________________
• (2006) Hinton and colleagues' landmark paper: Geoffrey Hinton, Simon Osindero, and Yee-Whye Teh, “A fast learning algorithm for deep belief nets”, Neural Computation 18 (2006)
• (2006) CUDA was released, in late 2006, p.32, (The new yorker, Dec. 4, 2023, brave new world dept., The chosen chip : how nvidia is powering the A.I. revolution., By Stephen Witt., pp.28─37, )
• (2009) Downloads of CUDA hit a peak of 2009, then declined for three years., p.33, (The new yorker, Dec. 4, 2023, brave new world dept., The chosen chip : how nvidia is powering the A.I. revolution., By Stephen Witt., pp.28─37, )
• (2012) In 2012, Krizhevsky entered AlexNet into the annual ImageNet visual-recognition contest; pp.33─34, (The new yorker, Dec. 4, 2023, brave new world dept., The chosen chip : how nvidia is powering the A.I. revolution., By Stephen Witt., pp.28─37, )
• (2012) A neural network built by Hinton's team demolished the competition in an international computer vision contest.
Kai-Fu Lee., AI superpowers: China, Silicon Valley and the new world order, 2018
p.238
Hinton and colleagues' landmark paper:
Geoffrey Hinton, Simon Osindero, and Yee-Whye Teh,
“A fast learning algorithm for deep belief nets”,
Neural Computation 18 (2006): 1527-1554.
p.9
Soon, these neural networks ─ now rebranded as “deep learning” ─ could outperform older models at a variety of tasks.
p.9
The turning point came in 2012, when a neural network built by Hinton's team demolished the competition in an international computer vision contest.
(AI superpowers: China, Silicon Valley and the new world order / Kai-Fu Lee.; Boston: Houghton Mifflin Harcourt, 2018; includes bibliographical references and index; subjects: artificial intelligence ── economic aspects ── China. | artificial intelligence ── economic aspects ── United States.; HC79.I55 (ebook)
HC79.I55 L435 2018 (print); 338.4; https://lccn.loc.gov/2018-17250; 2018, )
____________________________________
A Fireside Chat with Turing Award Winner Geoffrey Hinton, Pioneer of Deep Learning (Google I/O'19)
https://youtu.be/UTfQwTuri8Y
https://youtu.be/UTfQwTuri8Y
39:01
TensorFlow
May 9, 2019
it was 40 years ago,
it seems to me there is no other way the brain could work,
it has to work by learning the strengths of connections.
And, if you want to make a device do something intelligent,
you've got two options.
you can program it, or it can learn.
And we certainly weren't programmed.
So we had to learn.
So this had to be the right way to go.
Neural networks explained
----------------------
so you have relatively simple processing
elements that are very loosely models of neurons.
They have connections coming in.
Each connection has a weight on it.
That weight can be changed to do learning.
And what a neuron does is take the activities
on the connections times the weights, adds them all up,
and then decides whether to send an output.
And if it gets a big enough sum, it sends an output.
If the sum is negative, it doesn't send anything.
That's about it.
it's just a question of how you change the weights.
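([ a bare-bones Python version of the neuron being described; my sketch, not Hinton's code: multiply the incoming activities by their weights, add them up, send an output only if the sum is big enough, and do the "learning" by nudging the weights ])
```python
def neuron(activities, weights, threshold=0.0):
    # weighted sum of the incoming activities
    total = sum(a * w for a, w in zip(activities, weights))
    return 1 if total > threshold else 0          # big enough sum -> output, else nothing

def nudge_weights(activities, weights, target, rate=0.1):
    # one crude perceptron-style weight change toward the desired output
    out = neuron(activities, weights)
    return [w + rate * (target - out) * a for w, a in zip(weights, activities)]

weights = [0.2, -0.4, -0.1]
print(neuron([1.0, 0.5, 1.0], weights))           # 0: the sum is too small, no output
weights = nudge_weights([1.0, 0.5, 1.0], weights, target=1)
print(neuron([1.0, 0.5, 1.0], weights))           # 1: after changing the weights, it fires
```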
it was designed to be like how the brain works.
the whole idea was to have a learning device that
learned like the brain, like people think the brain learns
by changing the connection strengths.
this wasn't my idea
Turing,
he believed that the brain was this unorganized device
with random weights.
And it would use reinforcement learning
to change the connections.
And it would learn everything, and
he thought that was the best route to intelligence.
it wasn't just Turing. Lots of people thought that back then.
it turns out that it was mainly a question of scale
just by trying to model the structure of the data
I actually still believe that.
You can say this model finds the data less surprising than this.
Then around the same time, they started developing the GPUs.
The people doing neural networks started using GPUs in about 2007.
And so they were using this idea of pre-training.
after they'd done the pre-training, then they'd just stick labels
on top and use back propagation.
And it turned out that way, you could have a very deep net
that was pre-trained this way.
you can use back propagation and it actually works
since it was beating standard models that
had taken 30 years to develop, with a bit more development
it would do really well.
Google was the fastest to turn it into a production speech recognizer.
And by 2012, that work (which was first done in 2009)
came out in Android.
And Android suddenly got better speech recognition
it felt really good that it got state of the art on a real problem
George Dahl,
this stuff is going to work for image recognition
Fei-Fei Li has created the correct data set for it,
And so what we did was take an approach originally developed
by Yann LeCun.
A student called Alex Krizhevsky was a real wizard.
He could make GPUs do anything.
Programmed the GPUs really, really well.
And we got results that were a lot better
than standard computer vision.
that was 2012.
Kaggle, modeling chemical molecule, predictor of molecule binding
if you told me in 2012 that in the next five years,
we'll be able to translate between many languages using
just the same technology, recurrent nets,
but just the stochastic gradient descent
from random initial weights,
I wouldn't have believed you.
It happened much faster than expected.
So I think what we've learned in the last 10
years is that if you take a system with billions
of parameters, and you'd use stochastic gradient descent
in some objective function,
and the objective function might be to get the right labels
or it might be to fill in the gap in a string of words,
or any objective function, it works much better than it
has any right to.
it works much better than you would expect.
You would have thought, and most people in conventional AI
thought, take a system with a billion parameters,
start them off with random values,
measure the gradient of the objective function.
That is, for each parameter figure out how the objective function
would change if you change that parameter a little bit.
And then change it in that direction that improves
the objective function.
You would have thought that would be a kind of hopeless
algorithm that would get stuck.
And it turns out, it's a really good algorithm.
And the bigger you scale things, the better it works.
And that's just an empirical discovery really.
There's some theory coming along,
but it's basically an empirical discovery.
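([ a tiny stochastic-gradient-descent loop in Python to make the procedure concrete; my illustration with three parameters instead of billions, a made-up squared-error objective, and a numerical gradient in place of back propagation ])
```python
import random

def objective(params, example):
    """Squared error of a linear predictor on one (inputs, label) example."""
    inputs, label = example
    prediction = sum(p * x for p, x in zip(params, inputs))
    return (prediction - label) ** 2

def gradient(params, example, eps=1e-5):
    """For each parameter, measure how the objective changes if you move it a little."""
    grads = []
    for i in range(len(params)):
        bumped = params[:i] + [params[i] + eps] + params[i + 1:]
        grads.append((objective(bumped, example) - objective(params, example)) / eps)
    return grads

data = [([1.0, x, x * x], 3.0 + 2.0 * x) for x in [i / 10 for i in range(-20, 20)]]
params = [random.uniform(-1, 1) for _ in range(3)]        # start from random values
for step in range(5000):
    example = random.choice(data)                          # "stochastic": one example at a time
    g = gradient(params, example)
    params = [p - 0.01 * gi for p, gi in zip(params, g)]   # step in the direction that reduces the error
print([round(p, 2) for p in params])                       # should drift toward roughly [3.0, 2.0, 0.0]
```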
Now because we've discovered that,
it makes it far more plausible that the brain
is computing the gradient of some objective function
and updating the weights of strengths of synapses
to follow that gradient.
And we know now that's wrong.
You can just put in random parameters and learn everything.
One theory of dreaming (unlearning)
Boltzmann machine learning algorithm
And the Boltzmann machine learning algorithm
had a very interesting property, which is I show you data.
That is, I fixed the states of the observable units.
And it sort of rattles around the other units
until it's got a fairly happy state.
And once it's done that, it increases
the strength of all the connections based
on the rule that if two units are both active, it increases the connection strength.
That's called kind of Hebbian learning.
But if you just do that, the connection strengths
just get bigger and bigger.
You also have to have a phase where you cut it off from the input.
You let it rattle around to settle into a state it's happy with.
So now it's having a fantasy.
And once it's had the fantasy you say,
take all pairs of neurons that are both active
and decrease the strength of the connection.
So I'm explaining the algorithm to you just as a procedure.
But actually that algorithm is the result of doing some math
and saying, how should you change these connection
strengths so that this neural network with all
these hidden units finds the data unsurprising?
And it has to have this other phase.
It has to have this what we call the negative phase when
it's running with no input.
And it's cancelling out - it's unlearning whatever state it settles into.
Terry Sejnowski and I showed that actually that
is a maximum ... learning procedure for Boltzmann machines.
so that's one theory of dreaming
Yeah, we show theoretically
that's the right thing to do if you want
to change the weights so that your big neural network finds
the observed data less surprising.
So yes, we had machine learning algorithms.
Some of the first algorithms that
could learn what to do with hidden units
were Boltzmann machines.
Those were the things that learned one layer of feature
detectors at a time.
And it was an efficient form of restricted Boltzmann machine.
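([ a sketch of that two-phase rule for a restricted Boltzmann machine, in Python; my simplification (a contrastive-divergence flavour on random placeholder data), not Hinton's code: a "wake" phase with the data clamped increases connection strengths Hebbian-style, and a "dream" phase lets the network produce a fantasy and then unlearns it ])
```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))     # connection strengths

def sample_hidden(v):
    p = 1.0 / (1.0 + np.exp(-(v @ W)))                     # probability each hidden unit turns on
    return (rng.random(n_hidden) < p).astype(float), p

def sample_visible(h):
    p = 1.0 / (1.0 + np.exp(-(W @ h)))
    return (rng.random(n_visible) < p).astype(float)

data = rng.integers(0, 2, size=(100, n_visible)).astype(float)   # placeholder "observed" data
for v_data in data:
    h_data, p_data = sample_hidden(v_data)                 # positive phase: data clamped
    v_fantasy = sample_visible(h_data)                     # negative phase: the network's fantasy
    h_fantasy, p_fantasy = sample_hidden(v_fantasy)
    # Hebbian increase for the data, matching decrease ("unlearning") for the fantasy
    W += 0.05 * (np.outer(v_data, p_data) - np.outer(v_fantasy, p_fantasy))
```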
wake-sleep algorithm
____________________________________
Geoffrey Hinton: The Foundations of Deep Learning
https://www.youtube.com/watch?v=zl99IZvW7rE
https://www.youtube.com/watch?v=zl99IZvW7rE
28:21
Elevate
Feb 7, 2018
it worked better, it worked just a little bit better
but good speech people, particularly down at Microsoft
realized right away that if this works a little bit better
and two graduate students did it in a few months
it's going to completely wipe out the existing state-of-the-art
and indeed over the next couple years, it did.
Google used it, and it suddenly got better than Siri
now all speech recognition is trained with back propagation
and neural nets
Error rates on the ImageNet-2012 competition
• 2017 deep neural nets • 3%
• 2015 deep neural nets (or people!) • 5%
• university of toronto (Krizhevsky et al, 2012) • 16%
• university of tokyo • 26%
• oxford university (zisserman et al) • 27%
• INRIA (French national research institute in CS)
+ XRCE (Xerox research center europe) • 27%
• university of amsterdam • 29%
recurrent neural networks
in medical images, very soon we will be better than radiologists
in skin cancer, we have a system that is comparable with dermatologists
____________________________________
• (2006) Hinton and colleagues' landmark paper: Geoffrey Hinton, Simon Osindero, and Yee-Whye Teh, “A fast learning algorithm for deep belief nets”, Neural Computation 18 (2006)
• (2012) A neural network built by Hinton's team demolished the competition in an international computer vision contest.
([ if you listen to the following 18 minute video, you're going to hear about Hinton and his colleagues, the hardware (Nvidia), ImageNet (image dataset), and the fast learning algorithm (Hinton and colleagues) ])
https://en.wikipedia.org/wiki/ImageNet
([ so why am I mentioning this; until now, I had no idea that Nvidia was the hardware that Hinton and gang were using to do their machine learning ])
How Nvidia Won AI
https://www.youtube.com/watch?v=GuV-HyslPxk
https://www.youtube.com/watch?v=GuV-HyslPxk
Asianometry
234,119 views Feb 20, 2022
When we last left Nvidia, the company had emerged victorious in the brutal graphics card Battle Royale throughout the 1990s.
Very impressive. But as the company entered the 2000s, they embarked on a journey to do more. Moving towards an entirely new kind of microprocessor - and the multi-billion dollar market it would unlock.
In this video, we are going to look at how Nvidia turned the humble graphics card into a platform that dominates one of tech’s most important fields: Artificial Intelligence.
Links:
- The Asianometry Newsletter: https://asianometry.substack.com
- Patreon: https://www.patreon.com/Asianometry
- The Podcast: https://anchor.fm/asianometry
- Twitter: https://twitter.com/asianometry
____________________________________
IBM Watson
en.wikipedia.org
https://en.wikipedia.org/wiki/IBM_Watson
Siri
en.wikipedia.org
https://en.wikipedia.org/wiki/Siri
Nuance Communications
https://en.wikipedia.org/wiki/Nuance_Communications
speech recognition
https://en.wikipedia.org/wiki/Speech_recognition
Amazon Echo (Alexa)
en.wikipedia.org
https://en.wikipedia.org/wiki/Amazon_Echo
OpenAI (company), ChatGPT-4 (query-and-response text chatbot)
en.wikipedia.org
https://en.wikipedia.org/wiki/OpenAI
GitHub Copilot
https://en.wikipedia.org/wiki/GitHub_Copilot
GPT-3
Generative Pre-trained Transformer 3 (GPT-3) is a large language model released by OpenAI in 2020.
https://en.wikipedia.org/wiki/GPT-3
OpenAI Codex
https://en.wikipedia.org/wiki/OpenAI_Codex
CALO
"Cognitive Assistant that Learns and Organizes"
https://en.wikipedia.org/wiki/CALO
CUDA
CUDA (or Compute Unified Device Architecture) is a proprietary and closed source parallel computing platform and application programming interface (API) that allows software to use certain types of graphics processing units (GPUs) for general purpose processing, an approach called general-purpose computing on GPUs (GPGPU). CUDA is a software layer that gives direct access to the GPU's virtual instruction set and parallel computational elements, for the execution of compute kernels.
https://en.wikipedia.org/wiki/CUDA
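([ CUDA kernels are normally written in C/C++; below is a minimal Python sketch of the "compute kernel" idea using the Numba CUDA binding, assuming the numba package and a CUDA-capable GPU are available; my illustration, not Nvidia's sample code ])
```python
import numpy as np
from numba import cuda

@cuda.jit
def add_kernel(a, b, out):
    i = cuda.grid(1)              # this thread's index across the whole launch grid
    if i < out.size:              # guard against threads beyond the array length
        out[i] = a[i] + b[i]

n = 1 << 20
a = np.ones(n, dtype=np.float32)
b = np.full(n, 2.0, dtype=np.float32)
out = np.zeros(n, dtype=np.float32)
threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
add_kernel[blocks, threads_per_block](a, b, out)   # many threads run the same kernel in parallel
print(out[:3])                                     # [3. 3. 3.]
```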
Stephen Witt., The chosen chip : how Nvidia is powering the A.I. revolution., The New Yorker., Dec. 4, 2023
p.35
In 2017, Google introduced a new architecture for neural-net training called the transformer. The following year, OpenAI used Google's framework to build the first “generative pre-trained transformer”, or G.P.T.
The G.P.T. models were trained on Nvidia supercomputers, absorbing an enormous corpus of text and learning how to make humanlike connections.
(The new yorker, Dec. 4, 2023, brave new world dept., The chosen chip : how nvidia is powering the A.I. revolution., By Stephen Witt., pp.28─37, )
____________________________________
put the text selection from Amazon Unbound on Amazon Echo here
(done - Thur 27 Jan 2022)
Brad Stone, Amazon unbound: Jeff Bezos and the invention of a global empire, 2021
• The math suggested they would need to roughly double the scale of their data collection efforts to achieve each successive 3 percent [ 3% ] increase in Alexa's accuracy., p.37, Brad Stone, Amazon unbound: Jeff Bezos and the invention of a global empire, 2021.
p.23
The initiative was originally designated inside Lab126 as Project D. It would come to be known as the Amazon Echo, and by the name of its virtual assistant, Alexa.
p.24, p.45
Project D, also known as ‘Amazon Alexa’, later named ‘Amazon Echo’
January 4, 2011, first email from Bezos on Project D, p.24
November 6, 2014, product launch, p.45
([
within a four-year time horizon Amazon developed a voice-enabled user interface, inside a real─world working product,
─ developed far─field speech recognition
─ refined speech output (speak and sound like a natural voice)
─ back-office technical development
─ developed the plan to gather enough data for the far─field speech recognition
─ the heavy lifting of the speech recognition and other sensory data processing happens at the data center
─ needs an internetwork [Internet or VPN] connection with the data center
─ (( I would be interested to know: if you were to connect an Amazon Echo inside a corporate network and configure the device with a proxy server to communicate with the Amazon server, what else does the Echo need to connect to in order to work properly, and how would a corporate firewall react to this new traffic? ))
─ port number for Amazon Echo (Alexa)
─ for example, the port number for e─mail (SMTP) is 25
• The math suggested they would need to roughly double the scale of their data collection efforts to achieve each successive 3 percent [3%] increase in Alexa's accuracy., p.37, Brad Stone, Amazon unbound: Jeff Bezos and the invention of a global empire, 2021.
])
p.462 Index
Amazon Alexa, 26─38
AMPED and, 43─44
beta testers
Bezos's sketch for,
bug in,
as Doppler project, 26─38, 40, 42─47
Evi and, 34─36
Fire tablet and, 44
language─specific version of, 60
launch of, 44─46
name of, 32
Skills Kit, 44─46
social cue recognition in, 34─35
speech recognition in,
voice of, 27─30
voice service, 47
see also Amazon Echo
far─field speech recognition, 27─28
p.24
Greg Hart
([ in 2010, Greg Hart pointed out to Jeff Bezos that speech recognition technology was getting good at dictation and search; he did this by showing Jeff Google's voice search on an Android phone ])
speech recognition 2010
Google's voice search, Android phone
technology was finally getting good at dictation and search
p.24
Hart remembered talking to Bezos about speech recognition one day in late 2010 at Seattle's Blue Moon Burgers. Over lunch, Hart demonstrated his enthusiasm for Google's voice search on his Android phone by saying, “pizza near me”, and then showing Bezos the list of links to nearby pizza joints that popped up on-screen. “Jeff was a little skeptical about the use of it on phones, because he thought it might be socially awkward”, Hart remembered. But they discussed how the technology was finally getting good at dictation and search.
p.24
January 4, 2011
Greg Hart,
Ian Freed, device vice president,
Steve Kessel
Amazon's HQ, Day 1 North building
p.25
voice-activated cloud computer
speaker, microphone, a mute button
Fiona, the Kindle building
p.26
One early recruit, Al Lindsay,
Al Lindsay, who in a previous job had written some of the original code for telco US West's voice-activated directory assistance. Lindsay spent his first three weeks on the project on vacation at his cottage in Canada, writing a six-page narrative that envisioned how outside developers might program their own voice-enabled apps that could run on the device.
p.26
internal recruit,
John Thimsen, director of engineering
p.26
To speed up development
Hart and his crew started looking for startups to acquire.
p.27
Yap, a twenty-person startup based in Charlotte, North Carolina, automatically translated human speech such as voicemails into text, without relying on a secret workforce of human transcribers
p.27
though much of Yap's technology would be discarded, its engineers would help develop the technology to convert what customers said into a computer-readable format.
p.27
industry conference in Florence, Italy
Amazon's newfound interest in speech technology
p.27
Jeff Adams, Yap's VP of research
two-decade veteran of the speech industry
pp.27-28
after the meeting, Adams delicately told Hart and Lindsay that their goals were unrealistic. Most experts believed that true “far-field speech recognition” ── comprehending speech from up to 32 feet away, often amid crosstalk and background noise ── was beyond the realm of established computer science, since sound bounces off surfaces like walls and ceilings, producing echoes that confuse computers.
“They basically told me, ‘We don't care. Hire more people. Take as long as it takes. Solve the problem,’” recalled Adams. “They were unflappable.”
p.28
Polish startup Ivona generated computer-synthesized speech that resembled a human voice.
Ivona was founded in 2001 by Lukasz Osowski, a computer science student at the Gdansk University of Technology. Osowski had the notion that so-called “text-to-speech”, or TTS, could read digital texts aloud in a natural voice and help the visually impaired in Poland appreciate the written word.
Michael Kaszczuk
he took recordings of an actor's voice and selected fragments of words, called diphones, and then blended or “concatenated” them together in different combinations to approximate natural-sounding words and sentences that the actor might never have uttered.
p.28
While students, they paid a popular Polish actor named Jacek Labijak to record hours of speech to create a database of sounds. The result was their first product, Spiker, which quickly became the top-selling computer voice in Poland.
Over the next few years, it was used widely in subways, elevators, and for robocall campaigns.
p.29
annual Blizzard Challenge, a competition for the most natural computer voice, organized by Carnegie Mellon University.
p.29
The Gdansk R&D center was put in charge of crafting Doppler's voice.
p.29
the team considered lists of characteristics they wanted in a single personality, such as trustworthiness, empathy, and warmth, and determined those traits were more commonly associated with a female voice.
pp.29-30
Atlanta-area voice-over studio GM Voices, the same outfit that had helped turn recordings from a voice actress named Susan Bennett into Apple's agent, Siri.
p.30
To create synthetic personalities, GM Voices gave female voice actors hundreds of hours of text to read, from entire books to random articles, a mind-numbing process that could stretch on for months.
p.30
voice artist behind Alexa
professional voice-over community: Boulder-based singer and voice actress Nina Rolle.
warm timbre of Alexa's voice
Nina Rolle (Boulder-based singer and voice actress)
p.32
Bezos also suggested “Alexa”, an homage to the ancient library of Alexandria, regarded as the capital of knowledge.
p.32
[ seven omnidirectional microphones ] at the top
a cylinder elongated to create separation between the array of seven omnidirectional microphones at the top and the speakers at the bottom, with some 14 hundred holes punctured in the metal tubing to push out air and sound.
p.34
In 2012, inspired by Siri's debut, Tunstall-Pedoe pivoted and introduced the Evi app for the Apple and Android app stores. Users could ask it questions by typing or speaking. Instead of searching the web for answers, like Siri, or returning a set of links, like Google's voice search, Evi evaluated the question and tried to offer an immediate answer. The app was downloaded over 250,000 times in its first week and almost crashed the company's servers.
p.34
Evi employed a programming technique called knowledge graphs, or large databases of ontologies, which connect concepts and categories in related domains. If, for example, a user asked Evi, “What is the population of Cleveland?” the software interpreted that question and knew to turn to an accompanying source of demographic data. Wired described the technique as a “giant treelike structure” of logical connections to useful facts.
Putting Evi's knowledge base inside Alexa helped with the kind of informal but culturally common chitchat called phatic speech.
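([ a toy “knowledge graph” lookup in Python, in the spirit of the Evi description; my illustration only, with made-up entities and numbers rather than Evi's actual data or methods ])
```python
# a tiny ontology: entities connected to typed facts
knowledge_graph = {
    "Cleveland": {"type": "city", "population": 372624, "state": "Ohio"},
    "Ohio": {"type": "state", "capital": "Columbus"},
}

def answer(question):
    q = question.lower()
    # crude interpretation: spot a known entity and a known attribute in the question,
    # then answer directly instead of returning a list of links
    for entity, facts in knowledge_graph.items():
        if entity.lower() in q:
            for attribute, value in facts.items():
                if attribute in q:
                    return f"The {attribute} of {entity} is {value}."
    return "I don't know."

print(answer("What is the population of Cleveland?"))
print(answer("What is the capital of Ohio?"))
```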
p.35
Integrating Evi's technology helped Alexa respond to factual queries, such as requests to name the planets in the solar system, and it gave the impression that Alexa was smart. But was it? Proponents of another method of natural language understanding, called deep learning, believed that Evi's knowledge graphs wouldn't give Alexa the kind of authentic intelligence that would satisfy Bezos's dream of a versatile assistant that could talk to users and answer any question.
p.35
In the deep learning method, machines were fed large amounts of data about how people converse and what responses proved satisfying, and then were programmed to train themselves to predict the best answers.
p.35
The chief proponent of this approach was an Indian-born engineer named Rohit Prasad. “He was a critical hire”, said engineering director John Thimsen. “Much of the success of the project is due to the team he assembled and the research they did on far-field speech recognition.”
p.35
BBN Technologies (later acquired by Raytheon)
Cambridge, Massachusetts-based defense contractor
At BBN, he [Rohit Prasad] worked on one of the first in-car speech recognition systems and automated directory assistance services for telephone companies.
p.37
For years, Google also collected speech data from a toll-free directory assistance line, 800-GOOG-411.
p.37
Hart, Prasad, and their team created graphs that projected how Alexa would improve as data collection progressed. The math suggested they would need to roughly double the scale of their data collection efforts to achieve each successive 3 percent increase in Alexa's accuracy.
• The math suggested they would need to roughly double the scale of their data collection efforts to achieve each successive 3 percent increase in Alexa's accuracy., p.37, Brad Stone, Amazon unbound: Jeff Bezos and the invention of a global empire, 2021.
p.37
“How will we even know when this product is good?”
early 2013
Hart, Prasad, and their team created graphs that projected how Alexa would improve as data collection progressed. The math suggested they would need to roughly double the scale of their data collection efforts to achieve each successive 3 percent [3%] increase in Alexa's accuracy.
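([ a back-of-envelope reading of that claim, in Python; my arithmetic, not Amazon's: if each successive 3 percent of accuracy costs roughly a doubling of the data, the data requirement grows exponentially while accuracy improves only linearly ])
```python
def data_multiplier(accuracy_gain_points, gain_per_doubling=3.0):
    # doublings needed = gain / 3, so data grows as 2 ** (gain / 3)
    return 2 ** (accuracy_gain_points / gain_per_doubling)

for gain in (3, 6, 9, 15, 30):
    print(f"+{gain:>2} points of accuracy -> about {data_multiplier(gain):,.0f}x the data")
# +30 points would take on the order of a thousand times the original data
```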
p.38
“First tell me what would be a magical product, then tell me how to get there.”
p.38
Bezos's technical advisor at the time, Dilip Kumar,
p.38
they would need thousands more hours of complex, far-field voice commands.
p.38
Bezos apparently factored in the request to increase the number of speech scientists and did the calculation in his head in a few seconds.
“Let me get this straight. You are telling me that for your big request to make this product successful, instead of it taking forty years, it will only take us twenty?”
p.42
the resulting program, conceived by Rohit Prasad and speech scientist Janet Slifka over a few days in the spring of 2013
p.42
Rohit Prasad and speech scientist Janet Slifka
spring of 2013
p.42
answer a question that later vexed speech experts ──
how did Amazon come out of nowhere to leapfrog Google and Apple in the race to build a speech-enabled virtual assistant?
pp.42-43
internally the program was called AMPED
Amazon contracted with an Australian data collection firm, Appen, and went on the road with Alexa, in disguise.
p.43
Appen rented homes and apartments, initially in Boston, and then Amazon littered several rooms with all kinds of “decoy” devices: pedestal microphones, Xbox gaming consoles, televisions, and tablets. There were also some twenty Alexa devices planted around the rooms at different heights, each shrouded in an acoustic fabric that hid them from view but allowed sound to pass through.
p.43
Appen then contracted with a temp agency, and a stream of contract workers filtered through the properties, eight hours a day, six days a week, reading scripts from an iPad with canned lines and open-ended requests
p.43
The speakers were turned off, so that Alexa didn't make a peep, but the seven microphones on each device captured everything and streamed the audio to Amazon's servers. Then another army of workers manually reviewed the recordings and annotated the transcripts, classifying queries that might stump a machine,
p.43
so that next time, Alexa would know.
p.43
The Boston test showed promise, so Amazon expanded the program, renting more homes and apartments in Seattle and ten other cities over the next six months to capture the voices and speech patterns of thousands more paid volunteers. It was a mushroom-cloud explosion of data about device placement, acoustic environments, background noise, regional accents, and all the gloriously random ways a human being might phrase a simple request to hear the weather, for example, or play a Justin
p.44
by 2012
multimillion-dollar cost.
p.44
By 2014, it had increased its store of speech data by a factor of ten thousand and largely closed the gap with rivals like Apple and Google.
p.47
over the next few months, Amazon would roll out the Alexa Skills Kit, which allowed other companies to build voice-enabled apps for the Echo, and Alexa Voice Service, which let the makers of products like lightbulbs and alarm clocks integrate Alexa into their own devices.
p.47
a smaller, cheaper version of Echo, the hockey puck-sized Echo Dot,
a portable version with batteries, the Amazon Tap.
Echo
Echo dot
Amazon Tap (a portable, battery-powered version of Echo)
p.24
January 4, 2011
p.45
November 6, 2014
Brad Stone, Amazon unbound: Jeff Bezos and the invention of a global empire, 2021
____________________________________
artificial intelligence
DARPA program
CALO (Cognitive Assistant that Learns and Organizes)
CALO, years later, helped inspire the creation of Siri.
CALO (Cognitive Assistant that Learns and Organizes)
Siri
Google's voice search
Amazon Echo
Henry Kressel, If you really want to change the world, 2015 [ ]
p.12
SRI (formerly the Stanford Research Institute), one of the world's largest independent research institutes
SRI won a project under a DARPA program and called it CALO (Cognitive Assistant that Learns and Organizes). CALO, years later, helped inspire the creation of Siri.
p.12
CALO developed into a massive program under the leadership of Bill Mark; Ray Perrault, director of the artificial intelligence center; Adam Cheyer, David Israel, Karen Myers, and Tom Garvey, program directors in the artificial intelligence center; Tom Dietterich, professor at Oregon State University; and many others. DARPA funded the program from 2003 to 2009, and it included the participation of more than 23 universities (including Stanford University, Carnegie Mellon, UC Berkeley, and MIT) and labs from the who's who of the artificial intelligence world. At more than $180 million, CALO was the largest artificial intelligence program in the history of DARPA. Concepts from the CALO program contributed to the basis of Siri and subsequent ventures.
p.15
Siri would be a “do engine” ...
Siri would allow people to buy tickets, make reservations, get the weather report, and find a movie by speaking into a smartphone. Siri would give them answers, no links.
p.67
Adam Cheyer, VP of engineering at Siri, throughout his career kept a list of the top five people in various technological fields. In meetings, Adam would talk about his recruiting progress with statements like, “I've got three of the top five people in this field. I'm going after the other two this month.” As a result of having this top talent, the Siri team exceeded goals and expectations at every stage.
(Kressel, Henry, If you really want to change the world : guide to creating, building, and sustaining breakthrough ventures / Henry Kressel, Norman Winarsky., 1. New business enterprises., 2. Venture capital., 3. Entrepreneurship., 2015, 658.11 Kressel, )
____________________________________
• “expectation-maximization algorithm”, Leonard Baum
• computerized translation of foreign language [text] [or scripture]
• idea of “statistical machine translation”
• Canada's parliamentary records, which contain thousands of pages of paired passages in French and English
• Canadian Hansard
• database of parliamentary speeches
• Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer, “The mathematics of statistical machine translation: parameter estimation”, Computational Linguistics 19, no. 2 (1993).
• Andy Way, “A critique of statistical machine translation”, in W. Daelemans and V. Hoste (eds.), Journal of Translation and Interpreting Studies: Special Issue on Evaluation of Translation Technology, Linguistica Antverpiensia, 2009, pp.17-41.
Sebastian Mallaby., More money than god : hedge funds and the making of a new elite, 2010.
pp.298-301
p.298
in 1993
Peter Brown and Robert Mercer.
They came from IBM's research center,
Before arriving at Renaissance, Brown and Mercer had worked a little on cryptography, but their real achievement lay elsewhere.
They had upended a related field ── that of computerized translation.
p.298
on translation, the subject was dominated by programmers who actually spoke some foreign languages. The approach was to understand the language from the inside, to know its grammar and its syntax, and to teach the computer that “la fille” means “the girl” and “les filles” is the plural form, much as you might teach a middle schooler.
p.299
But Brown and Mercer had a different method. They did not speak French, and they were not about to wade into its syntax or grammar. Instead, they got hold of Canada's parliamentary records, which contain thousands of pages of paired passages in French and English. Then they fed the material into an IBM workstation and told it to figure out the correlations.
p.299
their experiment at IBM was written up and published.21 It began with some scrubbing of data: Just as financial-market price histories must be checked for “bad ticks” ── places where a sale is reported at $16 instead of $61 ── so the Canadian Hansard contained misprinted words that might confuse a translation program. Next, the computer began to search the data for patterns.
p.299
For all it knew at the outset, a given English word [and common English phrasings] was equally likely to be translatable into any of the 58,000 French words [and common French phrasings] in the sample, but once the computer had checked through the twinned passages, it found that most English words appeared in only some: Immediately, nearly 99 percent of the uncertainty was eliminated. Then the computer proceeded with a series of more subtle tests; for example, it assumed that an English word was most likely to correspond to a French word that came in the same position in the sentence. By now some word pairs were starting to appear: Couplings such as lait/milk and pourquoi/why shouted from the data. But other correlations spoke in a softer voice.
p.299
To hear them clearly, you had to comb the data multiple times, using a slightly different algorithm at each turn. “Only in this way can one hope to hear the quiet call of marqué d'un asterisque/starred or the whisper of qui s'est fait bousculer/embattled”, Brown and Mercer reported.
p.299
To the code breakers at the Institute for Defense Analyses, this method would not have seemed surprising.22
“expectation-maximization algorithm”, Leonard Baum
p.299
Indeed, Brown and Mercer used a tool called the “expectation-maximization algorithm”, and they cited its inventor Leonard Baum ── who had worked for IDA [Institute for Defense Analyses] and then later for Simons.23
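([ a miniature Python version of the statistical idea described here: IBM Model 1 style expectation-maximization over paired sentences; my toy sketch, with two sentence pairs standing in for the Canadian Hansard ])
```python
from collections import defaultdict

pairs = [
    (["the", "girl", "drinks", "milk"], ["la", "fille", "boit", "du", "lait"]),
    (["the", "girls", "drink", "milk"], ["les", "filles", "boivent", "du", "lait"]),
]

t = defaultdict(lambda: 0.1)        # t[(f, e)]: probability English word e translates to French word f
for _ in range(20):
    counts = defaultdict(float)
    totals = defaultdict(float)
    for english, french in pairs:
        for f in french:
            norm = sum(t[(f, e)] for e in english)
            for e in english:
                frac = t[(f, e)] / norm            # E-step: expected alignment counts
                counts[(f, e)] += frac
                totals[e] += frac
    for (f, e), c in counts.items():               # M-step: re-estimate the probabilities
        t[(f, e)] = c / totals[e]

# after a few passes, couplings like lait/milk pull ahead of unrelated words, from the data alone
print(sorted(((round(p, 2), f) for (f, e), p in t.items() if e == "milk"), reverse=True)[:3])
```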
p.299
But although the idea of “statistical machine translation” seemed natural to the code breakers, it was greeted with outrage by traditional programmers. A reviewer of the Brown-Mercer paper scolded that “the crude force of computers is not science”,
p.300
and when the paper was presented at a meeting of translation experts, a listener recalled, “We were all flabbergasted .... People were shaking their heads and spurting grunts of disbelief or even of hostility.”
“Where's the linguistic intuition?” the audience wanted to know ── to which the answer seemed to be, “Yes that's the point; there isn't any”.
Fred Jelinek, the IBM manager who oversaw Brown and Mercer, poured salt into the wounds. “Every time I fire a linguist, my system's performance improves”, he told the naysayers.24
p.300
By the time Brown and Mercer joined Renaissance in 1993, the skeptics were capitulating. Once the IBM team's program had figured out the sample passages from the Canadian Hansard, it could translate other material too: If you presented it with an article in a French newspaper, it would zip through its database of parliamentary speeches, matching the article's phrases with the decoded material. The results outclassed competing translation systems by a wide margin, and within a few years the advent of statistical machine translation was celebrated among computer scientists as something of an intellectual revolution.25
p.300
Canadian political rhetoric had proved more useful than suspected hitherto. And Brown and Mercer had reminded the world of a lesson about artificial intelligence.
The lesson concerned the difference between human beings and computers.
p.300
The early translation programs had tried to teach computers vocabulary and grammar because that's how people learn things.
p.300
But computers are better suited to a different approach: They can learn to translate between English and French without paying much attention to the rules of either language. Computers don't need to understand verb declensions or adjectival inflections before they approach a pile of political speeches; they prefer to get the speeches first, then penetrate their code by combing through them algorithmically.
p.300
Likewise, computers have no trouble committing millions of sentences to memory; they can learn languages in chunks, without the crutch of grammatical rules that human students use to prompt their memories.
pp.300-301
For example, a computer can remember the English translations for phrases such as “la fille est intelligente, les filles sont intelligentes”, and a dozen other variations besides; it does not necessarily need to understand that “fille” is the singular form of “filles”, that “est” and “sont” are different forms of the verb “être”, and so on.26
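A tiny illustration of translating by memorized chunks rather than grammatical rules: a lookup table of remembered phrase pairs. The entries are the passage's own examples; a real system would hold millions of such chunks and stitch partial matches together.

# Toy illustration of "learning in chunks": a memorized phrase table, no grammar.
phrase_table = {
    "la fille est intelligente": "the girl is intelligent",
    "les filles sont intelligentes": "the girls are intelligent",
}

def translate(sentence):
    # Exact lookup of a remembered chunk; unknown input is flagged, not parsed.
    return phrase_table.get(sentence.lower(), f"<unknown: {sentence}>")

print(translate("La fille est intelligente"))   # -> the girl is intelligent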
p.301
Contrary to the harrumphing of the IBM team's critics, the crude force of a computer's memory can actually substitute for human notions of intelligence and science. And computers are likely to work best when they don't attempt to reach results in the way that humans would do.
p.301
Brown and Mercer fed the data into the computer first and let it come up with the answers.
p.453
21. See, for example, Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer, “The mathematics of statistical machine translation: parameter estimation”, computational linguistics 19, no. 2 (1993). As noted below, the Della Pietra brothers followed Brown and Mercer from IBM to Renaissance Technologies.
p.454
22. As far back as 1949, code breakers had wondered about the application of their technique to translation. But they lacked computing power; statistical translation depended on feeding a vast number of pairs of sentences into a computer, so that the computer had enough data from which to extract meaningful patterns. But by around 1990, statistical translation was possible on a well-equipped workstation.
23.
24. An account of the reaction to the Brown-Mercer work is given in Andy Way “A critique of statistical machine translation”. In W. Daelemans and V. Hoste (eds.), Journal of translation and interpreting studies: special issue on evaluation of translation technology, Linguistica antverpiensia, 2009, pp.17-41.
25. See, for example, Pius Ten Hacken, “Has there been a revolution in machine translation?” Machine Translation 16, no. 1 (March 2001): pp. 1-19.
26. The initial version of the IBM program included no linguistic rules at all. Later versions did use some, but they played a far smaller role than in the traditional translation programs.
p.454
29.
explicitly presented their experience with statistical machine translation as relevant to finding order in other types of data, including financial data. See Adam L. Berger, Stephen A. Della Pietra, and Vincent J. Della Pietra, “A maximum entropy approach to natural language processing”, computational linguistics 22, no. 1 (March 1996): pp.39-71.
(More money than god : hedge funds and the making of a new elite / Sebastian Mallaby., 1. hedge funds., 2. investment advisors., HG4530.M249 2010, 332.64'524──dc22, 2010, )
____________________________________
____________________________________
• "Data Science", which is the automatic (or semi-automatic) extraction of knowledge from data.;── Yann LeCun (self.MachineLearning).
• the goal of extracting information from data.;── Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman.
• ... and thus discover something about data that will be seen in the future.;── Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman.
• All algorithms for analysis of data are designed to produce a useful summary of the data, from which decisions are made.;── Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman, Mining of massive datasets, 2010; http://infolab.stanford.edu/~ullman/mmds/book.pdf
http://infolab.stanford.edu/~ullman/mmds/book.pdf
(( last checked: Thur May 13, 2021 [up] ))
____________________________________
A Artificial intelligence - machine learning - unsupervised machine learning
B Big data - data science
C Cloud - cloud computing - warehouse computing - data center
- Amazon Cloud, same as Amazon AWS (Amazon Web Services)
- Microsoft Azure, and others
- software as a service
- APIs as a service
ABC
A AI - artificial intelligence (Machine learning,
Data mining, Deep Learning)
this neural network machine deep learning
https://en.wikipedia.org/wiki/Deep_learning
not this educational meaning of deep learning
https://en.wikipedia.org/wiki/Deeper_learning
B Big data (Data science)
https://en.wikipedia.org/wiki/Big_data
https://en.wikipedia.org/wiki/Data_science
http://phys.org/news/2015-10-human-intuition-algorithms-outperforms-teams.html
C Cloud computing - Amazon Web Services (AWS),
Microsoft Azure,
IBM cloud,
Google Cloud Platform (GCP),
____________________________________
•─ aerospace, communications, and electronics (ACE) sectors,
Malcolm Harris, Palo alto : a history of california, capitalism, and the world
by Malcolm Harris, 2023
klystron, 99, 189─92, 194, 223, 247, 253─54, 255
p.223
aerospace, communications, and electronics (ACE) sectors,
specific exceptional competencies in growing ACE subfields that made Stanford an irresistible lure for federal and private research funds.
Varian klystron (which remained a source of passive income for the university),
to found the (not “a” or “Stanford”, “the”) Microwave Lab.
That meant the government paid for new and expensive building-size research machines, including particle accelerators, nuclear reactors, and computers.6
6. Audra J. Wolfe, Competing with the soviets: science, technology, and the state in cold war america, 2013, 42.
(Palo alto : a history of california, capitalism, and the world
by Malcolm Harris, 2023)
____________________________________
Barry Boehm oral history
Computer history museum
Oral history of Barry Boehm, part 2 of 2
interviewed by:
David C. Brock
Lee Osterweil
recorded February 20, 2018
TRW
COCOMO [constructive cost model] model to estimate whether we would be able to improve productivity.
biggest thing outside of defense is auto parts,
used the COCOMO model to say, "If your tools are better than this, if you educated your people in these technologies, you ought to be able to double productivity in 10 years."
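For reference, the basic COCOMO effort equation this kind of estimate rests on: effort in person-months equals a times KLOC to the power b, with coefficients depending on the project class. The coefficients below are the published basic-model values; the 32-KLOC example size is hypothetical.

# Basic COCOMO (Boehm): effort in person-months = a * (KLOC ** b).
# Coefficients are the published basic-model values for the three project
# classes; the 32-KLOC example below is a hypothetical project size.
COEFFS = {"organic": (2.4, 1.05), "semi-detached": (3.0, 1.12), "embedded": (3.6, 1.20)}

def cocomo_effort(kloc, mode="organic"):
    a, b = COEFFS[mode]
    return a * kloc ** b

print(round(cocomo_effort(32, "semi-detached"), 1), "person-months")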
automated test case generation and things like that.
re-baselined everybody
Looking at things, we found that most of the time people were producing documents and filling out forms rather than writing computer programs.
So, we made sure that the secretaries would get on this.
DARPA
software engineering
project called Arcadia
Lee [Osterweil]
Dick [Richard] Taylor
a carpenter has a whole toolkit and which tool to use at what point in order to build a house. And a software engineer has all these tools lying around all over the place, and they all do something different, and you, you know, you ought to have a nice box for them. The tools all belong to the right place and the people know what tool to use at what time and in what way. We called it an environment.
the holder, it was the framework for integrating the tools.
design tools, requirements tools, code tools, test tools
IBM, Toronto
Univac, Minneapolis
Boeing, a strategic partnership with Digital Equipment, DEC.
prioritizing
high priority things are the things that you want to test first, and the things you want to inspect first.
CMM [capability maturity model] with Watts Humphrey.
configuration management
requirement management
test management
verification and validation
TRW
creating a new satellite system.
propulsion people
structures people
guidance and control people
communications people
architectures
defines what the system was going to be like.
after a while the software was really driving the systems
Winston Royce
1970
wrote a definitive paper
it did say that you really want to do some building it twice, so that you know roughly the directions you want to go.
statistical decision theory
prototyping is a form of risk reduction,
Rather than doing a sequence of specifications, you want to be doing a combination of specifications and prototypes.
[International Software] process workshops
first one was in England
we hosted the second one in California
Watts Humphrey.
about determining predictability.
He was not particularly interested in how you build software, just whether your projections about cost and budget, scheduling and budget, could be trusted or not.
motivation for the CMM was just so that the Defense Department knew which of those lying contractors could be believed, and which ones could not be believed.
there are combinations of agile things that you want to do, and plan-driven things that you want to do.
Rich [Richard] Turner
book, Balancing Agility and Discipline.
there are things where lockstep discipline is not a very good thing to do, but there are places where coordinating what you're doing is a good thing to do.
At one point at TRW, I was on a panel that was saying, “What were the causes of so many missiles getting launched from Vandenberg and then blowing up because of the software?” In most cases, it was because people are responding to change over following a plan and saying, "We've got this fix that, we've got to do, or we've got this telemetry station that's moved and we've got to put a patch in the software. There's not enough time to do the regression testing and the configuration management and following the plan." And so, launched the rocket and boom, there it goes. Responding to change over following a plan may not be good in some situations.
in fact for things that really matter, if your whole bank is going to rest on this or your entire business is going to stand or fall depending on whether this thing works correctly or not, people don't tend to use agile methods.
So I think when things are important people really do fall back on plans and they want to know what they have, and they want to be sure they can trust it.
1958
Hubert Dreyfus, who wrote the book, What computers can't do,
and showed all of the failed predictions that said,
"In 1958, in 10 years, the computer will be the world's chess champion." Well, they got there but not in ten years.
I was getting sort of a balance of skepticism and enthusiasm.
Minuteman command and control system
Montana, North Dakota, and various places.
"Your job as a manager is to manage expectations. Never let people's expectations get out of the box, because if they get out of the box you can never win. If you do those incredible things people will say, 'WEll sure', but most likely you're not going to be able to do those things and people are going to get mad."
so the AI people have made the mistake of over-promising.
in 1955, "pretty soon a computer's going to be the world's best chess player. Computers are going to automatically translate any language into any other language faster than anybody can even think the words."
My own personal belief is that just as every AI boom is bigger than the previous one, every AI bust will be bigger than the previous one, too.
One of my program managers was an Air Force major when I got there, and he got promoted to lieutenant colonel, but he introduced himself as saying, "I am the major cross that you're going to have to bear."
Boehm: He's now the number two guy at Georgia Tech, and he was the director of the Software Engineering Institute and had a really outstanding career. He got some CMU [Carnegie Mellon University] and MIT people to come up with an AI constraint-based planning approach to solve transportation problems. This was in 1990 and '91, and they came up with a system that could do in four hours using constraint-based planning what it was taking the clunky transportation command software four days to do. Just about in 1991, we needed to get a half a million people off to the Middle East to fight the first Iraq war. And the transportation command said, “We're confiscating your Sun computers because we need your system to plan all of these things.”
Brock: Wow.
Boehm: They replaced them eventually. <laughs> But fundamentally, this was a key to getting all that stuff there really fast, and a triumph for AI. Steve Cross got the Golden Nugget Award from the commander of the Air Force and went on from there. So, yes, there were enough examples like that that you can make a case that AI was something that was really going to help.
a lot of organizations didn't want money added to their budget that they didn't control.
"I get my money from the Chief of Naval Operations and I follow his priorities, and our big priority right now is corrosion. Our boats are getting corroded and we need more research in corrosion technology. And software, I can't really accept your software money. If I get more money I'm going to use it for corrosion."
Dean Leffingwell
Chalmers university in Sweden, Jan Bosch,
T-shape people
Software Management and Economics course, say, “You are the CTO [chief technical officer] of a 500-person software company and your chief executive officer is concerned about AI, or DevOps, or Artificial Intelligence of various kinds and the like. What you need to do is to give him an analysis of how mature are these, and what are their strengths and what are their weaknesses, and what would we have to do to address these. And that you're going to get graded on how incisive your analysis is, plus the number of different ways that you learn about things. So you can't just go to Google and stop there. You should try to interview some people that are in companies, or are developing this kind of research and things like that. You should look at the proceedings of conferences and the ACM [Association for Computing Machinery]/IEEE [Institute of Electrical and Electronics Engineers] literature, and so the more sources that you do the better your grade is going to be.”
International Conference on Software Engineering (ICSE)
source:
Computer history museum
Oral history of Barry Boehm, part 2 of 2
interviewed by:
David C. Brock
Lee Osterweil
recorded February 20, 2018
____________________________________
Charles Duhigg., The optimists : the full story of microsoft's relationship with OpenAI., The new yorker, Dec. 11, 2023
p.33
One day in 2019, an OpenAI vice-president named Dario Amodei demonstrated something remarkable to his peers: he inputted part of a software program into GPT and asked the system to finish coding it. It did so almost immediately (using techniques that Amodei hadn't planned to employ himself). Nobody could say exactly how the A.I. had pulled this off ── a large language model is basically a black box. GPT has relatively few lines of actual code; its answers are based, word by word, on billions of mathematical “weights” that determine what should be outputted next, according to complex probabilities. It's impossible to map out all the connections that the model makes while answering users' questions.
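A toy sketch of the word-by-word generation loop the passage describes. This is not how GPT is implemented; a real model computes next-word probabilities from billions of learned weights, whereas the lookup table below is invented, but it makes the "pick the next token according to probabilities" step concrete.

import random

# Toy sketch of next-token generation "word by word, according to probabilities".
# The bigram probabilities below are invented; a real LLM derives them from
# billions of learned weights rather than a hand-written lookup table.
next_word_probs = {
    "def":    {"add": 0.6, "main": 0.4},
    "add":    {"(a,": 1.0},
    "(a,":    {"b):": 1.0},
    "b):":    {"return": 1.0},
    "return": {"a+b": 0.9, "a-b": 0.1},
}

def generate(start, steps=6):
    out = [start]
    for _ in range(steps):
        dist = next_word_probs.get(out[-1])
        if not dist:
            break
        words, probs = zip(*dist.items())
        out.append(random.choices(words, probs)[0])  # sample the next token
    return " ".join(out)

print(generate("def"))   # e.g. "def add (a, b): return a+b"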
For some within OpenAI, GPT's mystifying ability to code was frightening ── after all, this was the setup of dystopian movies such as “The Terminator”. It was almost heartening when employees noticed that GPT, for all its prowess, sometimes made coding gaffes. Scott and Murati felt some anxiety upon learning about GPT's programming capabilities, but mainly they were thrilled. They'd been looking for a practical application of A.I. that people might actually pay to use ── if, that is, they could find someone within Microsoft willing to sell it.
Five years ago, Microsoft acquired GitHub ── a Web site where users shared code and collaborated on software ── for much the same reason that it invested in OpenAI. GitHub's culture was young and fast-moving, unbound by tradition and orthodoxy. After it was purchased, it was made an independent division within Microsoft, with its own C.E.O. and decision-making authority, in the hope that its startup energy would not be diluted. The strategy proved successful. GitHub remained quirky and beloved by software engineers, and its number of users grew to more than a hundred million.
So Scott and Murati, looking for a Microsoft division that might be excited by a tool capable of autocompleting code ── even if it occasionally got things wrong ── turned to GitHub's C.E.O. Nat Friedman. After all, code posted on GitHub sometimes contained errors; users had learned to work around imperfection. Friedman said that he wanted the tool. GitHub, he noted, just had to figure out a way to signal to people that they couldn't trust the autocompleter completely.
GitHub employees brainstormed names for the product: Coding autopilot, Automated pair programmer, programarama automat. Friedman was an amateur pilot, and he and others felt these names wrongly implied that the tool would do all the work. The tool was more like a co-pilot ── someone who joins you in the cockpit and makes suggestions, while occasionally proposing something off base. Usually you listen to a co-pilot; sometimes you ignore him. When Scott heard Friedman's favored choice for a name ── GitHub Copilot ── he loved it. “It perfectly conveys its strengths and weaknesses.”
But when GitHub prepared to launch its Copilot, in 2021, some executives in other Microsoft divisions protested that, because the tool occasionally produced errors, it would damage Microsoft's reputation. “It was a huge fight”, Friedman told me. “But I was the C.E.O. of GitHub, and I knew this was a great product, so I overrode everyone and shipped it.” When GitHub Copilot was released, it was an immediate success. “Copilot LITERALLY BLEW MY MIND”, one user tweeted hours after it was released. “IT'S WITCHCRAFT!!!” another posted. Microsoft began charging ten dollars per month for the app; within a year, annual revenue had topped a hundred million dollars. The division's independence had paid off.
(The new yorker, Dec. 11, 2023, The optimists : the full story of microsoft's relationship with OpenAI., By Charles Duhigg., p.33, )
____________________________________
• data science competitions
• "Data Science", which is the automatic (or semi-automatic) extraction of knowledge from data.;── Yann LeCun (self.MachineLearning).
• the goal of extracting information from data.;── Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman.
• ... and thus discover something about data that will be seen in the future.;── Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman.
• All algorithms for analysis of data are designed to produce a useful summary of the data, from which decisions are made.;── Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman, Mining of massive datasets, 2010; http://infolab.stanford.edu/~ullman/mmds/book.pdf
• to find predictive patterns in unfamiliar data sets
• a system that not only searches for patterns but designs the feature set [that the pattern is composed of], too.
• But where the teams of humans typically labored over their prediction algorithms for months, the Data Science Machine took somewhere between two and 12 hours to produce each of its entries - predictive patterns in unfamiliar data sets.
• "We view the Data Science Machine as a natural complement to human intelligence," says Max Kanter, whose MIT master's thesis in computer science is the basis of the Data Science Machine.
• feature engineering - identify what variables to extract from the database or compose
• MIT's online-learning platform (MITx) doesn't record either of those statistics, but it does collect data from which [the two crucial indicators] can be inferred.
• data marker
...even if a specific data marker is not included in the data set, it may be included by proxy in a combination of other, relevant data
• Once [the MIT's "Data Science Machine"] produced an array of candidates, ["Data Science Machine" algorithms] reduces their number by identifying those whose values seem to be correlated. Then [the algorithms] starts testing its reduced set of features on sample data, recombining them in different ways to optimize the accuracy of the predictions [the reduced set of features] yield.
• "The Data Science Machine is one of those unbelievable projects where applying cutting-edge research to solve practical problems opens an entirely new way of looking at the problem," says Margo Seltzer, a professor of computer science at Harvard University who was not involved in the work. "I think what they've done is going to become the standard quickly—very quickly."
• October 16, 2015 by Larry Hardesty
• System that replaces human intuition with algorithms outperforms human teams
• http://phys.org/news/2015-10-human-intuition-algorithms-outperforms-teams.html
____________________________________
• October 16, 2015 by Larry Hardesty
• System that replaces human intuition with algorithms outperforms human teams
• http://phys.org/news/2015-10-human-intuition-algorithms-outperforms-teams.html
•
•
MIT researchers aim to take the human element out of big-data analysis, with a new system that not only searches for patterns but designs the feature set, too. To test the first prototype of their system, they enrolled it in three data science competitions, in which it competed against human teams to find predictive patterns in unfamiliar data sets. Of the 906 teams participating in the three competitions, the researchers' "Data Science Machine" finished ahead of 615 of them.
In two of the three competitions, the predictions made by the Data Science Machine were 94 percent and 96 percent as accurate as the winning submissions. In the third, the figure was a more modest 87 percent. But where the teams of humans typically labored over their prediction algorithms for months, the Data Science Machine took somewhere between two and 12 hours to produce each of its entries.
"We view the Data Science Machine as a natural complement to human intelligence," says Max Kanter, whose MIT master's thesis in computer science is the basis of the Data Science Machine. "There's so much data out there to be analyzed. And right now it's just sitting there not doing anything. So maybe we can come up with a solution that will at least get us started on it, at least get us moving."
Between the lines
Kanter and his thesis advisor, Kalyan Veeramachaneni, a research scientist at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL), describe the Data Science Machine in a paper ...
Veeramachaneni co-leads the Anyscale Learning for All group at CSAIL, which applies machine-learning techniques to practical problems in big-data analysis, such as determining the power-generation capacity of wind-farm sites or predicting which students are at risk for dropping out of online courses.
"What we observed from our experience solving a number of data science problems for industry is that one of the very critical steps is called feature engineering," Veeramachaneni says. "The first thing you have to do is identify what variables to extract from the database or compose, and for that, you have to come up with a lot of ideas."
In predicting dropout, for instance, two crucial indicators proved to be how long before a deadline a student begins working on a problem set and how much time the student spends on the course website relative to his or her classmates. MIT's online-learning platform MITx doesn't record either of those statistics, but it does collect data from which they can be inferred.
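A small sketch of how those two indicators might be inferred from raw event logs, since neither is stored directly. The log format, the 48-hour deadline, and the numbers are hypothetical; the point is only that the features are derived rather than recorded.

# Hypothetical event logs: (student, timestamp_hours, event). Neither indicator
# is stored directly, but both can be derived from events like these.
logs = [
    ("amy", 10.0, "open_pset"), ("amy", 11.5, "page_view"), ("amy", 12.0, "page_view"),
    ("bob", 47.0, "open_pset"), ("bob", 47.5, "page_view"),
]
DEADLINE = 48.0   # hours (hypothetical)

def hours_before_deadline(student):
    starts = [t for s, t, e in logs if s == student and e == "open_pset"]
    return DEADLINE - min(starts)

def relative_site_time(student):
    views = lambda s: sum(1 for st, _, e in logs if st == s and e == "page_view")
    classmates = {s for s, _, _ in logs} - {student}
    avg = sum(views(s) for s in classmates) / len(classmates)
    return views(student) / avg

print(hours_before_deadline("bob"), relative_site_time("bob"))   # -> 1.0 0.5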
Featured composition
Kanter and Veeramachaneni use a couple of tricks to manufacture candidate features for data analyses. One is to exploit structural relationships inherent in database design. Databases typically store different types of data in different tables, indicating the correlations between them using numerical identifiers. The Data Science Machine tracks these correlations, using them as a cue to feature construction.
For instance, one table might list retail items and their costs; another might list items included in individual customers' purchases. The Data Science Machine would begin by importing costs from the first table into the second. Then, taking its cue from the association of several different items in the second table with the same purchase number, it would execute a suite of operations to generate candidate features: total cost per order, average cost per order, minimum cost per order, and so on. As numerical identifiers proliferate across tables, the Data Science Machine layers operations on top of each other, finding minima of averages, averages of sums, and so on.
It also looks for so-called categorical data, which appear to be restricted to a limited range of values, such as days of the week or brand names. It then generates further feature candidates by dividing up existing features across categories.
Once it's produced an array of candidates, it reduces their number by identifying those whose values seem to be correlated. Then it starts testing its reduced set of features on sample data, recombining them in different ways to optimize the accuracy of the predictions they yield.
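A rough pandas sketch of that pipeline: follow the numerical identifiers across tables, layer aggregations to generate candidate features, then prune candidates whose values are highly correlated. This is not the actual Data Science Machine code; the tables are made up, and with only two orders the toy aggregates come out perfectly correlated, so the pruning step keeps just the first one.

import pandas as pd

# Made-up relational tables: items with costs, and purchases referencing them.
items = pd.DataFrame({"item_id": [1, 2, 3], "cost": [4.0, 10.0, 2.5]})
purchases = pd.DataFrame({"order_id": [100, 100, 101, 101, 101],
                          "item_id":  [1, 2, 1, 3, 3]})

# Step 1: follow the numerical identifier (item_id) to import costs.
joined = purchases.merge(items, on="item_id")

# Step 2: layer aggregations per order: total, average, minimum cost.
candidates = joined.groupby("order_id")["cost"].agg(
    total_cost="sum", avg_cost="mean", min_cost="min")

# Step 3: prune candidates whose values are (nearly) perfectly correlated.
corr = candidates.corr().abs()
keep = []
for col in candidates.columns:
    if all(corr.loc[col, k] < 0.95 for k in keep):
        keep.append(col)
print(candidates[keep])   # only one of the three toy features survives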
"The Data Science Machine is one of those unbelievable projects where applying cutting-edge research to solve practical problems opens an entirely new way of looking at the problem," says Margo Seltzer, a professor of computer science at Harvard University who was not involved in the work. "I think what they've done is going to become the standard quickly—very quickly."
•
•
• http://phys.org/news/2015-02-tackles-biggest-bottlenecks-science-industry.html
• Researcher tackles some of the biggest bottlenecks holding back the data science industry
• February 25, 2015 by Eric Brown
•
____________________________________
“... the arrival of AI will not be any more or any less disruptive than the arrival of indoor plumbing, vaccines, the car, air travel, the television, the computer, the internet, etc.”;── Yann LeCun (self.MachineLearning), http://www.reddit.com/r/MachineLearning/comments/25lnbt/ama_yann_lecun
data science/ DS/ machine learning/ ML/ unsupervised feature learning/
unsupervised learning/ computer science/ CS/ science fiction/ SF/ sci-fi/
fantasy/ fa/ fiction/ fi/ Finland/ fi/ reinforcement learning/ RL/
deep learning/ DL/ artificial intelligence/ AI/ expert systems/
representation learning/ RL/
AMA: Yann LeCun (self.MachineLearning)
http://www.reddit.com/r/MachineLearning/comments/25lnbt/ama_yann_lecun
reinforcement learning uses Q-learning (a very classical algorithm for RL)
convolutional network (a now very classical method for image recognition)
The DeepMind video-game player that trains itself with reinforcement learning uses Q-learning (a very classical algorithm for RL) on top of a convolutional network (a now very classical method for image recognition). One of the authors is Koray Kavukcuoglu who is a former student of mine.
<----------------------------------------------------------------->
http://arxiv.org/abs/1312.5602
Playing Atari with Deep Reinforcement Learning
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller
(Submitted on 19 Dec 2013)
We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. We apply our method to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm. We find that it outperforms all previous approaches on six of the games and surpasses a human expert on three of them.
Comments: NIPS Deep Learning Workshop 2013
Subjects: Learning (cs.LG)
in December, DeepMind published a paper showing that its software could do that by learning how to play seven Atari 2600 games using as inputs only the information visible on a video screen, such as the score. For three of the games, the classics Breakout, Enduro, and Pong, the computer ended up playing better than an expert human. It performed less well on Q*bert and Space Invaders, games where the best strategy is less obvious.
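A minimal sketch of tabular Q-learning, the "very classical algorithm" mentioned above, on an invented five-cell corridor with a reward in the rightmost cell. DQN replaces the lookup table with a convolutional network reading raw pixels, but the update rule is the same idea; and because Q-learning is off-policy, acting completely at random during training is enough for the table to converge here.

import random

# Tabular Q-learning on a toy 5-cell corridor (states 0..4, reward at state 4).
ALPHA, GAMMA = 0.5, 0.9
ACTIONS = (-1, +1)                      # step left / step right
Q = {(s, a): 0.0 for s in range(5) for a in ACTIONS}

def step(s, a):
    s2 = min(4, max(0, s + a))
    return s2, (1.0 if s2 == 4 else 0.0), s2 == 4   # next state, reward, done

for _ in range(500):                    # Q-learning is off-policy, so acting
    s, done = 0, False                  # at random during training still works
    while not done:
        a = random.choice(ACTIONS)
        s2, r, done = step(s, a)
        target = r if done else r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])    # Q-learning update
        s = s2

print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(4)])  # -> [1, 1, 1, 1]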
<----------------------------------------------------------------->
to make sure people like Vladimir could work on their research with minimal friction and distraction.
Deep learning has become the dominant method for acoustic modeling in speech recognition, and is quickly becoming the dominant method for several vision tasks such as object recognition, object detection, and semantic segmentation.
The next frontier for deep learning are language understanding, video, and control/planning (e.g. for robotics or dialog systems).
I believe there is a role to play for specialized hardware for embedded applications. Once every self-driving car or maintenance robot comes with an embedded perception system, it will make sense to build FPGAs, ASICs or have hardware support for running convolutional nets or other models.
"Data Science", which is the automatic (or semi-automatic) extraction of knowledge from data.
Otherwise, the order in which we learn things would not matter. Obviously, the order in which we learn things does matter (that's why pedagogy exists). The famous developmental psychologist Jean Piaget established that children learn simple concepts before learning more complex/abstract ones on top of them.
There are four main uses for unsupervised learning: (1) learning features (or representations); (2) visualization/exploration; (3) compression; (4) synthesis. Only (1) is interesting to me (the other uses are interesting too, just not on my own radar screen).
These are folks who have long been interested in representing data (mostly natural signals like audio and images). These are people who have worked on wavelet transforms, sparse coding and sparse modeling, compressive sensing, manifold learning, numerical optimization, scientific computing, large-scale linear algebra, fast transforms (FFT, fast multipole methods). This community has a lot to say about how to represent data in high-dimensional spaces.
The questions become: how well does my method work on this particular problem, and how large is the set of problems on which it works well?
It's important to keep in mind that the arrival of AI will not be any more or any less disruptive than the arrival of indoor plumbing, vaccines, the car, air travel, the television, the computer, the internet, etc.
establishing causal relationships is a hugely important problem in data science. There are huge applications in healthcare, social policy....
http://code.madbits.com/wiki/doku.php?id=tutorial_basics
For a long time, speech recognition has stagnated because of the dictatorship of results on benchmarks. The barrier to entry was very high, and it was very difficult to get state-of-the-art performance with brand new methods.
There has to be a process by which innovative ideas can be allowed to germinate and develop, and not be shut down before they get a chance to produce good results.
It's very useful for time series prediction. Alex Graves (from Deep Mind) has quite a few nice papers on applying neural networks to time series though most of his work is focused on classification rather than forecasting.
Learning with temporal/sequential signals: language, video, speech.
Marrying deep/representation learning with reasoning or structured prediction.
In the early days of aviation, some people (like Clément Ader) tried to copy birds and bats a little too closely (without understanding the principles of lift, drag, and stability) while others (like the Wright Brothers and Santos-Dumont) had a more systematic engineering approach (building a wind tunnel, testing airfoils, building full-scale gliders....). Both were somewhat inspired by nature, but to different degrees. My problem with sticking too close to nature is that it's like "cargo-cult" science. A bird biologist will tell you how important the micro-structure of feathers is to bird flight. You will think that you need to reproduce feathers in their most minute details to build flying machines. In reality, flight relies on the Bernoulli principle: pushing an angled plate (preferably shaped like an airfoil) through air creates lift. I don't use neural nets because they look like the brain. I use them because they are a convenient way to construct parameterized non-linear functions with good properties. But I did get inspiration from the architecture of the visual cortex to build convolutional nets.
____________________________________
Joshua Cooper Ramo (author), The seventh sense (book), 2016
pp.276-80
Pattie Maes
p.276
When I first met her, in the 1990s, she was in charge of much of the work on artificial intelligence (AI) at MIT's Media Lab, Danny Hillis's old home.
p.276
she introduced me to a puzzle of her field that has stayed on my mind in the years since. It is called the disappearing AI problem.
p.276
Back in the 1990s, ..., Maes and her team were tinkering with what was known as computer-aided prediction.
pp.276-277
Maes intended to design a computer that could ask, for instance, what movie stars you like. “Robert Redford”, you'd type. And then the machine would spit back some films you might enjoy. The Paul Newman classic Cool Hand Luke, for instance.
p.277
And, well, you had liked that film. This seemed magic, just the sort of data-meets-human question that showcased a machine learning and thinking. An honestly artificial intelligence. Maes hoped to design a computer that could predict what movies or music or books you or I might enjoy. (And, of course, buy.)
p.277
A recommendation engine.
p.277
But to confidently bridge your knowledge of a friend's taste and the nearly endless library of movies and songs and books? Beyond human capacity. It seemed an ideal job for a thoughtful machine.
The traditional approach to such a problem was to devise a formula that would mimic your friend. What are his hobbies? What areas interest him? What cheers him up? Then you'd program a machine to jump just as deep into movies and music and books, to break them down by plot and type of character to see what might fit your friend's interests.
p.277
But after years building programs that tried ── and failed ── to tackle the recommendation problem in this fashion, the MIT group changed tack.
p.277
Instead of teaching a machine to understand you (or Tolstoy), they simply began compiling data about what movies and music and books people liked. Then they looked for patterns. People were not, they discovered, all that unique.
p.277
Pretty much everyone who liked Redford in Downhill Racer loved Newman in The Hustler. Anyone who enjoyed Radiohead's Kid A could be directed safely to Sigur Rós's Ágaetis Byrjun.
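A toy sketch of that pattern-matching approach: recommend whatever the people who overlap with your tastes also liked, with no model of the films or songs themselves. The "who liked what" data below is invented.

from collections import Counter

# Invented "who liked what" data, standing in for the lab's compiled logs.
likes = {
    "u1": {"Downhill Racer", "The Hustler", "Kid A"},
    "u2": {"Downhill Racer", "The Hustler"},
    "u3": {"Kid A", "Agaetis Byrjun"},
    "u4": {"Downhill Racer", "The Hustler", "Cool Hand Luke"},
}

def recommend(user, k=2):
    # Count how often other people's items co-occur with this user's items,
    # then suggest the most frequent ones the user hasn't seen yet.
    mine = likes[user]
    scores = Counter()
    for other, theirs in likes.items():
        if other != user and mine & theirs:
            scores.update(theirs - mine)
    return [item for item, _ in scores.most_common(k)]

print(recommend("u2"))   # suggests titles u2 hasn't seen, e.g. 'Kid A', 'Cool Hand Luke'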
pp.277-278
Maes and her team found themselves, as a result, less focused on the mechanics of making a machine think than on devising formulas to organize, store, and probe data.
p.278
What had begun as a problem of artificial intelligence became, in the end, a puzzle of mathematics.
p.278
The mystery of human thought, that great, unknowable sea of chemicals and instinct and experience that would have let you place your finger on just the song to open the heart of your date, had been unlocked by data. Here was the disappearing AI problem. A puzzle that looked like it needed computer intelligence demanded, in the end, merely math. The AI had disappeared.
p.278
Many problems that once seemed to demand the miracle of thought really only needed data.
Joshua Cooper Ramo, The seventh sense: power, fortune, and survival in the age of network, 2016.
____________________________________
── Now, with the Aardvark data: Romney insisted the revisions were the result not of systemic errors but of getting more data. He had been relying on historical data of large Soviet nuclear tests and extrapolating down to make estimates about the detection of smaller tests, which might be confused with earthquakes. “The change came about as a result of additional information we got”, Romney insisted., Sharon Weinberger, The imagineers of war : the untold history of DARPA, the pentagon agency that changed the world, 2017, [p.390]
Sharon Weinberger, The imagineers of war : the untold history of DARPA, the pentagon agency that changed the world, 2017
p.102
He had been arguing that it would be difficult to distinguish small underground nuclear tests from earthquakes, which would make verifying a nuclear test ban treaty difficult, if not impossible.
Now, with the Aardvark data, he knew he had been wrong on a key point.
During a July 3, 1962, meeting, Romney announced that the new seismic data led him to conclude that distinguishing between tremors and small nuclear tests might not be as difficult as he had previously thought.
102 Now, with the Aardvark data: Romney insisted the revisions were the result not of systemic errors but of getting more data. He had been relying on historical data of large Soviet nuclear tests and extrapolating down to make estimates about the detection of smaller tests, which might be confused with earthquakes. “The change came about as a result of additional information we got”, Romney insisted. Romney, interview with the author. [p.390]
(The imagineers of war : the untold story of DARPA, the Pentagon agency that changed the world / by Sharon Weinberger., New York : Alfred A. Knopf, 2017, united states. defense advanced research projects agency──history. | military research──united states. | military art and science──technological innovations──united states. | science and state──united states. | national security──united states──history. | united states──defenses──history., U394.A75 W45 2016 (print) | U394.A75 (ebook) | 355/.040973, 2017, )
Sharon Weinberger, The imagineers of war : the untold history of DARPA, the pentagon agency that changed the world, 2017
pp.99-104
p.99
ARPA was assigned nuclear test detection under the code name Vela at the end of 1959 as a counterweight to the CIA's and the air force's secret test detection network. ARPA got the work, quite simply, because President Eisenhower did not trust his spooks and wanted an assessment that was independent of the CIA and its assets.
p.99
brought renewed focus and funding to the Vela test detection program.
By 1961, Vela had three parts:
Vela Uniform, to detect underground nuclear tests;
Vela Sierra, to detect nuclear explosions in the atmosphere; and
Vela Hotel, which would launch satellites with sensors to detect nuclear tests from space.
99 Vela had three parts: The two most significant parts of Vela ended up being Vela Hotel and Vela Uniform. Vela Sierra, which involved ground-based sensors to detect nuclear tests in space, was eventually folded into Vela Hotel. Some of the Vela work, it turns out, did not really require any exotic science. For example, detecting underwater explosions required little new research. ARPA conducted some underwater tests using conventional explosives under the code name CHASE, short for “cut holes and sink 'em”. Huff and Sharp, Advanced Research Projects Agency, VII-15. “The ocean detection system was a nonproblem”, Frosch said. Frosch, interview with author. [p.390]
p.99
The academic discipline of seismology, at the time, was a backwater. Robert Frosch, who was recruited to ARPA to run Vela, recalled going with the director, Robert Sproull, to visit what was supposed to be a state-of-the-art seismic vault, one of the underground bunker-like structures that were used to measure tremors. The two men came out of the vault in shock, feeling as if they had just emerged from a time capsule. The seismologists there were using pen recorders and primitive galvanometers, an analog instrument used to measure electrical current.
p.99
Vela began to change that with an influx of funding for seismology that was almost unimaginable in scale for most areas of science. The military's need to distinguish earthquakes from nuclear tests brought seismology “kicking and screaming” into the 20th century, according to Frosch. At one point, he said, he funded almost “every seismologist in the world, except for two Jesuits at Fordham University” who refused to take money from the Pentagon.
p.100
Large Aperture Seismic Array, or LASA,
a massive nuclear detection system that comprised 200 “seismic vaults” buried across a 200-kilometer-diameter area in the eastern half of Montana. For it to work, more than a dozen of these enormous sites would have to be constructed around the world to monitor the Soviet Union.
There had been smaller arrays, including one in the United Kingdom,
The air force hated the idea,
p.100
Billings, Montana
What was amazing about LASA, according to Frosch, was the scale of the work, which was completed in just 18 months, a schedule unimaginable for government projects that typically take years, if not decades.
When ARPA needed to have a center where all the seismic data could be collected and analyzed, the agency ended up renting space in downtown Billings, where data from the array was routed to an IBM computer.
p.100
ARPA also began funding the placement of seismograph stations around the world that were operated by scientists.
pp.100-101
the CIA and the air force, who up to that point had a monopoly on advice to political leaders about what was theoretically possible to monitor a [nuclear explosion] test ban.
p.101
local scientists only needed to agree to operate them and share the data.
p.101
a growing tension between secret and open research
p.102
air force and the CIA refused to release data from their network of sensors.
bête noire - Fr. Anything that is an object of hate or dread; a bugaboo. [< F, black beast]
p.102
The bête noire of the nuclear detection world was Carl Romney, a scientist who worked for the Air Force Technical Application Center, or AFTAC, the agency responsible for nuclear test detection.
p.102
Whether deliberate or not, the problem with secret data, as Ruina pointed out, was that “nobody could argue with it; they could just question it.” The secret data problem came to a head in 1962, when the United States carried out a test called Aardvark, a part of the first series of tests conducted completely underground.
p.102
Aardvark, a 40-kiloton nuclear device intended for nuclear artillery, produced reliable seismographic data on a nuclear underground explosion, and Romney suddenly realized he had been wrong about a critical national security issue.
p.102
He had been arguing that it would be difficult to distinguish small underground nuclear tests from earthquakes, which would make verifying a nuclear test ban treaty difficult, if not impossible.
Now, with the Aardvark data, he knew he had been wrong on a key point.
During a July 3, 1962, meeting, Romney announced that the new seismic data led him to conclude that distinguishing between tremors and small nuclear tests might not be as difficult as he had previously thought.
102 Now, with the Aardvark data: Romney insisted the revisions were the result not of systemic errors but of getting more data. He had been relying on historical data of large Soviet nuclear tests and extrapolating down to make estimates about the detection of smaller tests, which might be confused with earthquakes. “The change came about as a result of additional information we got”, Romney insisted. Romney, interview with the author. [p.390]
p.102
it would look as if the government were “withholding information that would tend to ease the inspection problem in a nuclear test ban.”
pp.102-103
Ruina called it an “honest mistake”, but one that would have been avoided if other scientists had been given access to the classified data that Romney jealously guarded. “This is what can happen when you have one person interpreting data, there's no peer group reviewing it, and there's nobody duplicating the experiment”, the ARPA director wrote in a three-page letter, blaming the mistake on secrecy.
p.103
Glenn Seaborg, chairman of the Atomic Energy Commission
played a key role in test ban negotiations.
“VELA seemed to indicate that the detection capability was better than had been thought by American experts in the period from 1959 to 1961”, Seaborg wrote in his memoir detailing the negotiations.
(The imagineers of war : the untold story of DARPA, the Pentagon agency that changed the world / by Sharon Weinberger., New York : Alfred A. Knopf, 2017, united states. defense advanced research projects agency──history. | military research──united states. | military art and science──technological innovations──united states. | science and state──united states. | national security──united states──history. | united states──defenses──history., U394.A75 W45 2016 (print) | U394.A75 (ebook) | 355/.040973, 2017, )
____________________________________
Joshua Cooper Ramo (author), The seventh sense (book), 2016
p.279
You and I might be able to spot patterns in movie habits, given enough time, but as more complex problems emerge, as a world of a trillion connected points becomes a sea of data to examine, there is no chance we'll match the machines.
pp.282-283
• predictive learning (AI systems design) and
• representation learning (AI systems design)
The AI systems designer Roger Grosse has named two paths to this sort of wired sensibility: predictive learning and representation learning. That first approach is what Maes's movie machine pursued. The computer is simply checking what it encounters against a database. It teaches itself to predict based on what has been seen before. This sort of knowledge begins with massive amounts of data and then hunts for patterns, tests their reliability, and improves by mapping quirks and similarities.
p.283
Google engineers have a device that can gaze into a human eye and spot signs of impending optical failure. Is the machine smarter than your ophthalmologist? Hard to know, but let's just say this: It has seen, studied, and compared millions of eyes to find patterns that nearly perfectly predict a diagnosis. It can review in seconds more cases than your doctor will see in a lifetime ── let alone recall and compare at submillimeter accuracy. Fast, thorough predictive algorithms make what might once have been regarded as AI disappear. The machine isn't all that wise; it just knows a lot.
p.283
On the other path, the one of representation learning, the machine uses a self-sketched image of the world, a “representation”. Say you wanted a computer to identify restaurants with outdoor seating. A predictive system might be told, Look for pictures in which a third of the pixels are sky colored. You can see how such a primitive approach might be limited. But a representation-based program would use a neural network to examine thousands of photos ── such a collection is called “training data” ── of restaurant patios. It would develop its own sense of what makes these images special: sunlight glinting off glasses, sky reflected in silverware. It would assemble, bit by bit, an accurate feeling for the features of an outdoor dining space. And over time, it could aspire to near-perfect fidelity.
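A toy version of the hand-written predictive rule the passage describes, applied to fake 2x2 "photos" of RGB pixels. The "sky-coloured" test and the one-third threshold are exactly the kind of brittle heuristic that a representation-based program would instead learn for itself from labelled training data.

# Toy version of the hand-coded predictive rule (not a learned representation).
def is_skyish(pixel):
    r, g, b = pixel
    return b > 150 and b > r and b > g          # crude "sky-coloured" test

def predict_outdoor_by_rule(photo, threshold=1 / 3):
    pixels = [p for row in photo for p in row]
    sky = sum(is_skyish(p) for p in pixels)
    return sky / len(pixels) >= threshold       # "a third of the pixels are sky"

patio = [[(120, 180, 230), (110, 170, 225)],
         [( 90,  60,  40), (200, 190, 180)]]    # half sky, half table (invented)
print(predict_outdoor_by_rule(patio))           # -> True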
p.284
Faces, disease markers, obscure sounds
p.284
Today, basic versions of representational AI can study a map and name the most important roads. They can predict cracks in computer networks days before a fault. Representation-based programs take longer to train, as you might expect. But these training times are getting shorter. And though representational AIs are harder to program ── and they demand almost unimaginable amounts of computing power ([ and unimaginable amounts of [label?] data to reach the degree of accuracy and reliability to make the program practical ]) ── they produce a subtle, lively kind of insight.
p.284
A machine with a prediction-based understanding of classical music can listen to a clip of a symphony and name it. One with a representation-based understanding of, say, Mozart's forty-one symphonies can write you an extremely convincing forty-second symphony ── or, if you wish, an even earlier First Symphony, based on what it knows about Mozart's evolution as a composer. It can do it again and again. In seconds.
Joshua Cooper Ramo, The seventh sense: power, fortune, and survival in the age of network, 2016.
____________________________________
Albert-László Barabási, BURSTS, 2010 [ ]
[pp.171-172]
19
the patterns of human mobility
... About a year after the publication of my first book on networks I had grown used to e-mails and calls from readers seeking advice on inter-connected systems. This was one of the few times that someone had called not to ask but to give. He had my full attention.
The caller was a high-ranking executive at a mobile-phone consortium who'd recognized the value in having records of who is talking with whom. After reading 'Linked' he had become convinced that social networking was essential to improving services for his consumers. So he offered access to their anonymized data in exchange for any insights our research group might provide.
His intuition proved correct: My group and I soon found the mobile users' behavior patterns to be so deeply affected by the underlying social network that the executive ordered many of his company's business practices redesigned, from marketing to consumer retention. With that, he pioneered a trend that over the past few years has swept most mobile carriers, triggering an avalanche of research into mobile communications. Despite his crucial role in advancing network thinking in the mobile industry, his combination of modesty and caution prevented his ever wanting his name attached to any of it.
As my group and I immersed ourselves in the intricacies of mobile communications, we came to understand that mobile phones not only reveal who our friends are but also capture our whereabouts. Indeed, each time we make a call the carrier records the tower that communicates with our phone, effectively pinpointing our location. This information is not terribly accurate, as we could be anywhere within the tower's reception area, which can span tens of square miles. Furthermore, our location is usually recorded only when we use our phone, providing ... information about our whereabouts between calls.
Despite these constraints, the data offered an exceptional opportunity to explore the mobility of millions of individuals.
(Barabási, Albert-László; 'BURSTS: the hidden pattern behind everything we do', copyright © 2010, 303.4901 Barabási, )
(BURSTS by Albert-László Barabási, © 2010, 303.4901 Barabási, pp.171-172)
[pp.193-195]
... As a result we tend to romanticize college life, the cradle of youth culture, seeing students as perhaps the most spontaneous and thus least predictable segment of the population. Yet Sandy Pentland, an MIT professor who follows the chatter of hundreds of students every day, finds that concept preposterous.
In the early 1990s Pentland started a research program in wearable computing at the Media Lab at MIT, prompted by the realization that, given the rate at which computers were shrinking, we soon would want to have them with us all the time. Sandy's vision of the future proved remarkably accurate, as today computers have become a part of our wardrobe, fashion accessories of a kind. In fact, for the most part we have stopped even calling them computers. We refer to them simply as smart phones.
In the fall of 2002 Nathan Eagle, a doctoral student in Sandy's lab, offered one hundred MIT students free Nokia smart phones, a desirable top-of-the-line gadget at the time. This was no handout, however; the catch was that the phones collected everything they could about their owners: whom they called and when, how long they chatted, where they were, and who was nearby. By the end of the year-long experiment, Nathan Eagle and Sandy Pentland had collected about 450,000 hours of data on the communication, whereabouts, and behavior of seventy-five Media Lab faculty and students and twenty-five freshmen from MIT's Sloan School of Management.
Trying to make sense of his data, Nathan arranged each student's whereabouts into three groups: home, work, and "elsewhere," the latter category assigned when they were neither at home nor at work but jogging along the Charles River or partying at a friend's house. Then he developed an algorithm to detect repetitive patterns, quickly discovering that on weekdays the students were mainly at home between the hours of ten P.M. and seven A.M. and at the university between ten A.M. and eight P.M. Their behavior changed slightly only during the weekends, when they showed an inclination to stay home as late as ten A.M.
None of these patterns would shock anybody familiar with graduate student life. But the level of predictability of their routines was still remarkable. Nathan found that if he knew a business-school student's morning location he could predict with 90 percent accuracy the student's afternoon whereabouts. And for Media Lab students, the algorithm did even better, predicting their whereabouts 96 percent of the time. ([ we are creatures of habits ])
It is tempting to see life as a crusade against randomness, a yearning for a safe, ordered existence. If so, the students excelled at it, ignoring the roll of the dice day after day. Indeed, Nathan's algorithm failed to predict their whereabouts only twice a week, during rare hours of rebellion when they finally lived up to our expectation that they be wild and spontaneous. Yet the timing of these unpredictable moments was by no means random--they were the typical party times, the Friday and Saturday nights. The rest of the week, twenty-two out of twenty-four hours a day, the students were neither the elusive Osama bin Laden nor the ubiquitously erratic Britney Spears but instead dutifully trod the deeply worn grooves of their lives. So maybe the Harlequins were onto something when they insisted on using an RNG (random number generator). Had they studied at MIT, their whereabouts would have been no mystery--not to Nathan, nor to the Vast Machine.
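A toy version of that kind of prediction: for each morning location, remember the most common afternoon location and predict it. The daily logs below are invented, and the real analysis was far richer, but the mechanics are roughly this simple.

from collections import Counter, defaultdict

# Invented daily logs: (morning_location, afternoon_location), one pair per day.
days = [("home", "lab"), ("home", "lab"), ("home", "lab"),
        ("home", "elsewhere"), ("lab", "lab"), ("lab", "lab")]

# Learn, for each morning location, the most common afternoon location.
table = defaultdict(Counter)
for morning, afternoon in days:
    table[morning][afternoon] += 1

def predict_afternoon(morning):
    return table[morning].most_common(1)[0][0]

hits = sum(predict_afternoon(m) == a for m, a in days)
print(predict_afternoon("home"), f"{hits / len(days):.0%} accurate on these days")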
But we may yet avert the dawn of an Orwellian world as described in 'The Traveler'. For me, this sense of hopefulness emerged in the summer of 2007 when I purchased a brick-sized wristwatch. It was a loud antifashion statement and doubled as a GPS device, which recorded my precise location every few seconds. After I had worn it for several months, Zehui Qu, a visiting computer-science student, applied Nathan Eagle and Sandy Pentland's predictive algorithm to the data collected by my GPS. Sure enough, after a few days of training, Qu was able to predict my whereabouts with 80 percent accuracy.
While the algorithm's performance was impressive, the persistent gap between the 96 percent predictability Nathan found among the MIT students and my 80 percent raised a red flag. Neither I nor the MIT students were a fair representation of the population at large. Marta's study of the mobile-phone records had already explained why: When it comes to our travel patterns, we are hugely different. Some, like the MIT students and myself, are relatively home- and office-bound. Others are outliers, however, and travel a lot, tending to be less localized.
So does that mean there are people out there who are far less predictable than the MIT students and I? Truck drivers, perhaps, who travel the country for weeks at a time? Soccer moms, whose minivans shuttle between piano and fencing lessons? What about super-traveler Hasan Elahi, whose "suspicious movements" will undoubtedly land him in hot water again? How different are they from you and me? Are there Harlequins among us, individuals whose lives are driven by the roll of the dice to such a degree that their movements are impossible to foresee?
* The difference between human dynamics and data-mining boils down to this: Data mining predicts our behaviors based on records of our patterns of activity; we don't even have to understand the origins of the patterns exploited by the algorithm. Students of human dynamics, on the other hand, seek to develop models and theories to explain why, when, and where we do the things we do with some regularity.
(Barabási, Albert-László; 'BURSTS: the hidden pattern behind everything we do', © 2010, 303.4901 Barabási, pp.193-195)
____________________________________
„Machine learning is a mathematical technique for training computer systems to make accurate predictions from a large corpus of training data, with a degree of accuracy that in some domains can mimic human cognition.“
—— Maciej Ceglowski,
May 7, 2019,
US Senate Committee on Banking, Housing, and Urban Affairs
on Privacy Rights and Data Collection in a Digital Economy
<< long read - scroll down to skip this section >>
Maciej Ceglowski's Senate testimony on Privacy Rights and Data Collection in a Digital Economy
May 7, 2019,
Senate Committee on Banking, Housing, and Urban Affairs
Privacy Rights and Data Collection in a Digital Economy (Senate hearing)
privacy
pinboard
regulation
gdpr
long read
https://idlewords.com/talks/senate_testimony.2019.5.htm
Consent in a world of inference
For example, imagine that an algorithm could inspect your online purchasing history and, with high confidence, infer that you suffer from an anxiety disorder. Ordinarily, this kind of sensitive medical information would be protected by HIPAA, but is the inference similarly protected? What if the algorithm is only reasonably certain? What if the algorithm knows that you’re healthy now, but will suffer from such a disorder in the future?
The question is not hypothetical—a 2017 study showed that a machine learning algorithm examining photos posted to the image-sharing site Instagram was able to detect signs of depression before it was diagnosed in the subjects, and outperformed medical doctors on the task.
Addendum: Machine Learning and Privacy
Machine learning is a mathematical technique for training computer systems to make accurate predictions from a large corpus of training data, with a degree of accuracy that in some domains can mimic human cognition.
For example, machine learning algorithms trained on a sufficiently large data set can learn to identify objects in photographs with a high degree of accuracy, transcribe spoken language to text, translate texts between languages, or flag anomalous behavior on a surveillance videotape.
The mathematical techniques underpinning machine learning, like convolutional neural networks (CNN), have been well-known since before the revolution in machine learning that took place beginning in 2012. What enabled the key breakthrough in machine learning was the arrival of truly large collections of data, along with concomitant [accompanies or is collaterally connected with] computing power, allowing these techniques to finally demonstrate their full potential.
It takes data sets of millions or billions of items, along with considerable computing power, to get adequate results from a machine learning algorithm. Before the advent of the surveillance economy, we simply did not realize the power of these techniques when applied at scale.
Because machine learning has a voracious appetite for data and computing power, it contributes both to the centralizing tendency that has consolidated the tech industry, and to the pressure companies face to maximize the collection of user data.
Machine learning models pose some unique problems in privacy regulation because of the way they can obscure the links between the data used to train them and their ultimate behavior.
A key feature of machine learning is that it occurs in separable phases. An initial training phase consists of running a learning algorithm on a large collection of labeled data (a time and computation-intensive process). This model can then be deployed in an exploitation phase, which requires far fewer resources.
Once the training phase is complete, the data used to train the model is no longer required and can conceivably be thrown away.
The two phases of training and exploitation can occur far away from each other both in space and time. The legal status of models trained on personal data under privacy laws like the GDPR, or whether data transfer laws apply to moving a trained model across jurisdictions, is not clear.
Inspecting a trained model reveals nothing about the data that went into it. To a human inspecting it, the model consists of millions and millions of numeric weights that have no obvious meaning, or relationship to human categories of thought. One cannot examine an image recognition model, for example, and point to the numbers that encode ‘apple’.
The training process behaves as a kind of one-way function. It is not possible to run a trained model backwards to reconstruct the input data; nor is it possible to “untrain” a model so that it will forget a specific part of its input.
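To make the two phases concrete, here is a small illustrative sketch in Python using scikit-learn and synthetic data (both are assumptions of this example, not part of the testimony): the model is trained once on a large array, saved, and later reloaded for cheap predictions; inspecting it shows only numeric weights.
import numpy as np
import joblib
from sklearn.linear_model import LogisticRegression
# Training phase: needs the full data set and most of the computation.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(10_000, 20))            # stand-in for a large corpus
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# The training data can now be discarded; only the fitted weights are kept.
joblib.dump(model, "model.joblib")
# Exploitation phase: may run later, elsewhere, with far fewer resources.
deployed = joblib.load("model.joblib")
print(deployed.predict(rng.normal(size=(1, 20))))
# Inspecting the trained model reveals only numeric weights with no obvious
# relationship to the individuals whose data trained it.
print(deployed.coef_)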
Machine learning algorithms are best understood as inference engines. They find structure and excel at making inferences from data that can sometimes be surprising even to people familiar with the technology. This ability to see patterns that humans don’t notice has led to interest in using machine learning algorithms in medical diagnosis, evaluating insurance risk, assigning credit scores, stock trading, and other fields that currently rely on expert human analysis.
The opacity of machine learning models, combined with this capacity for inference, also make them an ideal technology for circumventing legal protections on data use. In this spirit, I have previously referred to machine learning as “money laundering for bias”. Whatever latent biases are in the training data, whether or not they are apparent to humans, and whether or not attempts are made to remove them from the data set, will be reflected in the behavior of the model.
A final feature of machine learning is that it is curiously vulnerable to adversarial inputs. For example, an image classifier that correctly identifies a picture of a horse might reclassify the same image as an apple, sailboat or any other object of an attacker’s choosing if they can manipulate even one pixel in the image. Changes in input data not noticeable to a human observer will be sufficient to persuade the model. Recent research suggests that this property is an inherent and ineradicable feature of any machine learning system that uses current approaches.
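A toy illustration of such an adversarial input, in Python with NumPy, against a hypothetical linear classifier; the weights, the input, and the step size are all assumptions of this sketch, and real attacks on image models work on the same principle at much larger scale.
import numpy as np
rng = np.random.default_rng(1)
w = rng.normal(size=100)                       # weights of a hypothetical trained linear classifier
score = lambda v: float(v @ w)
x = rng.normal(size=100)
x -= ((score(x) - 2.0) / (w @ w)) * w          # adjust the input so the clean score is a modest +2.0
print("clean prediction positive:", score(x) > 0)              # True
eps = 0.05                                     # per-feature change, tiny next to typical feature values ~1
x_adv = x - eps * np.sign(w)                   # step each feature slightly against the weights
print("adversarial prediction positive:", score(x_adv) > 0)    # False: the predicted label flips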
In brief, machine learning is effective, has an enormous appetite for data, requires large computational resources, makes decisions that resist analysis, excels at finding latent structure in data, obscures the link between source data and outcomes, defies many human intuitions, and is readily fooled by a knowledgeable adversary.
—Maciej Ceglowski, 2019
source:
https://tildes.net/~tech
____________________________________
Copy (cut) and Paste Text #RLA-POST
By Russell L. Ackoff
post industrial revolution
pp.24-25
The conversion of the Industrial Revolution into what has come to be called the Post industrial Revolution has its origins in the last century. Scientists who explored the use of electricity as a source of energy found that it could not be observed easily. Therefore, they developed such instruments as the ammeter, ohmmeter, and voltmeter to observe IT for them. The development of instruments exploded in this century, particularly after the advent of electronics and sonar and radar. Look at the dashboard of a large commercial airplane, or even one in an automobile. These instruments GENERATE SYMBOLS that represent the properties of objects or events. Such symbols are called DATA. Instruments, therefore, are observing devices, but they are not machines in the Machine-Age sense because they do not apply energy to matter in order to transform it. The technology of instrumentation is fundamentally different from that of mechanization.
Another technology with this same characteristic emerged when the telegraph was invented in the last century. It was followed by the telephone, wireless, radio, television, and so on. This technology, like that of instrumentation, has nothing to do with mechanization; it has to do with the TRANSMISSION OF SYMBOLS, or COMMUNICATION.
The technologies of observation and communication formed the two sides of a technological arch that could not carry any weight until a keystone was dropped into place. This did not occur until the 1940s when the computer was developed. It too did no work in the Machine-Age sense; it manipulated SYMBOLS logically, which, as John Dewey pointed out, is the nature of THOUGHT. It is for this reason that the computer is often referred to as a thinking machine.
Because the computer appeared at a time when we had begun to put things back together again, and because the technologies of observation, communication, and computation all involve the manipulation of symbols, people began to consider systems that combine these three functions. They found that such systems could be used to control other systems, to automate. Automation is fundamentally different from mechanization. Mechanization has to do with the replacement of MUSCLE; automation with the replacement of MIND. Automation is to the Post industrial Revolution what mechanization was to the Industrial Revolution.
Automations are certainly not machines in the Machine-Age sense, and they need not be purposeless. It was for this reason that they came to be called teleological mechanisms. However, automation is no more an essential ingredient of the systems approach than is high technology in general. Both come with the System Age and are among its producers as well as its products. The technology of the Post industrial Revolution is neither a panacea nor a plague; it is what we make of it. It generates a host of problems and possibilities that systems thinking must address. The problems it generates are highly infectious, particularly to less-technologically developed cultures. The systems approach provides a more effective way than previously has been available for dealing with both the problems and the possibilities generated by the Post industrial Revolution, but it is by no means limited to this special set of either or both.
(Ackoff's best : his classic writings on management, Russell L. Ackoff., © 1999, hardcover, John Wiley & Sons, Inc., pp.24-25)
____________________________________
drones
fixed site
mobile drones
space
ground (automobile drone, like Knight Rider (television series))
ground (street)
ground (sidewalk)
ground (tree climbing)
ground (legs)
ground (extreme cold)
under ground
water (surface)
water (under water)
water (ocean, extreme depth)
water (ocean, storm condition)
air (high altitude)
air (ground hugging)
air (aircraft)
mobile drones (robots) that can recharge themselves
motors
autonomous
semi-autonomous
remote control (human)
remote control
remote control toys
remote control cars
remote control boat
remote control aircraft
robotic arm
rifles, hand gun, scope
cameras
camera phone
video phone
microphone, speaker
wheel, track
battery power (electricity)
mobile phone
communication
remote control
frequency hopping
time division multiplexing
adaptive frequency hopping
used mobile phone (computing power)
leverage mobile communication infrastructure
mobile phone as a remote control computing platform
RADAR jammer
digital radio frequency memory chip
DRFM jammer
playstation (video gaming computing machine)
general purpose platform
application specific platform
navigation, communication
you needed ways to navigate and communicate.
____________________________________
Palo Alto : a history of California, capitalism, and the world
by Malcolm Harris
p.186
During the interwar period, a plane all by itself wasn't much more than a toy; to do anything purposeful with it, you needed ways to navigate and communicate. From the beginning, planes relied on ground and onboard electronics systems to guide them. These systems were collectively called avionics.
Palo Alto : a history of California, capitalism, and the world
by Malcolm Harris
____________________________________
Nassim Nicholas Taleb, Fooled by Randomness, 2nd edition, hardcover, 2004 [ ]
ergodicity, 57-58, 96, 156-57, 254
p.96
on average, animals will be fit, but not every single one of them, and not at all times.
Just as an animal could have survived because its sample path was lucky, the “best” operators may have survived because of overfitness to a sample path ── a sample path that was free of the evolutionary rare event.
One vicious attribute is that the longer these animals can go without encountering the rare event, the more vulnerable they will be to it.
We said that should one extend time to infinity, then, by ergodicity, that event will happen with certainty ── the species will be wiped out!
For evolution means fitness to one and only one time series, not the average of all the possible environments.
(Taleb, Nassim (2004)., Fooled by Randomness, 2nd edition, hardcover)
(Fooled by Randomness: the hidden role of chance in life and in the markets / Nassim Nicholas Taleb, 1. investments, 2. chance, 3. random variables, 123.3 Taleb, )
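A quick numeric illustration of this point, in Python (the 1 percent per-period probability is an arbitrary assumption): if a ruinous rare event has any fixed chance p of occurring in each period, the probability of never meeting it over n periods is (1 - p)**n, which falls toward zero as n grows.
p = 0.01                        # assumed per-period chance of the rare event
for n in (10, 100, 1_000, 10_000):
    print(n, (1 - p) ** n)      # survival probability shrinks toward zero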
____________________________________
• most medical doctors are trained to look for strong features when making a diagnosis, because ...
• if you can overcome the Garbage In, Garbage Out (GIGO) problem; the machine learning (also referred to as Artificial Intelligence [AI] in mainstream articles) algorithm that has been trained to detect a specific type of cancer would look at the strong and the weak features ...
Kai-Fu Lee., AI superpowers: China, Silicon Valley and the new world order, 2018
pp.190-191
My first doctor classified the disease as stage IV, the cancer's most advanced stage. On average, patients with 4th-stage lymphoma of my type have around a 50 percent shot at surviving the next five years. I wanted to get a second opinion before beginning treatment, and a friend of mine arranged for me to consult his family doctor, the top hematology practitioner in Taiwan.
It would be a week before I could see that doctor, and in the meantime I continued to conduct my own research on the disease.
p.190
But as a trained scientist whose life hung in the balance, I couldn't help trying to better understand the disease and quantify my chances of survival.
p.190
lymphoma: possible causes, cutting-edge treatment, and long-term survival rates. Through my reading, I came to understand how doctors classify the various stages of lymphoma.
pp.190-191
Medical textbooks use the concept of “stages” to describe how advanced cancerous tumors are, with later stages generally corresponding to lower survival rates. In lymphoma, the stage has traditionally been assigned on the basis of a few straightforward characteristics: Has the cancer affected more than one lymph node? Are the cancerous lymph nodes both above and below the diaphragm (the bottom of the rib cage)? Is the cancer found in organs outside the lymphatic system or in the patient's bone marrow? Traditionally, each answer of “yes” to one of the above questions bumps the diagnosis up a stage. The fact that my lymphoma had affected over twenty sites, had spread above and below my diaphragm, and had entered an organ outside the lymphatic system meant that I was automatically categorized as a stage IV patient.
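The staging logic described here is simple enough to write out directly; the sketch below (Python, an illustration only, not a clinical rule) starts at stage I and bumps the stage for each "yes" answer.
def lymphoma_stage(multiple_nodes, both_sides_of_diaphragm, outside_lymphatic_or_marrow):
    # Start at stage I and add one stage per "yes" answer to the questions above.
    return 1 + sum([multiple_nodes, both_sides_of_diaphragm, outside_lymphatic_or_marrow])
# The case described: many affected nodes, spread above and below the diaphragm,
# and an organ outside the lymphatic system, which gives stage IV.
print(lymphoma_stage(True, True, True))   # 4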
p.191
But what I didn't know at the time of diagnosis was that this crude method of staging has more to do with what medical students can memorize than what modern medicine can cure.
p.191
Ranking stages based on such simple characteristics of a complex disease is a classic example of the human need to base decisions on “strong features”. Humans are extremely limited in their ability to discern correlations between variables, so we look for guidance in a handful of the most obvious signifiers. In making bank loans, for example, these “strong features” include the borrower's income, the value of the home, and the credit score. In lymphoma staging, they simply include the number and location of the tumors.
p.191
These so-called strong features really don't represent the most accurate tools for making a nuanced prognosis, but they're simple enough for a medical system in which knowledge must be passed down, stored, and retrieved in the brains of human doctors.
p.191
Medical research has since identified dozens of other characteristics of lymphoma cases that make for better predictors of five-year survival in patients. But memorizing the complex correlations and precise probabilities of all these predictors is more than even the best medical students can handle. As a result, most doctors don't usually incorporate these other predictors into their own staging decisions.
p.191
In the depths of my own research, I found a research paper that did quantify the predictive power of these alternate metrics. The paper is from a team of researchers at the University of Modena and Reggio Emilia in Italy, and it analyzed fifteen (15) different variables, identifying the five (5) features that, considered together, most strongly correlated to five-year survival.
pp.191-192
These features included some traditional measures (such as bone marrow involvement) but also less intuitive measures (are any tumors over 6 cm in diameter? Are hemoglobin levels below 12 grams per deciliter? Is the patient over 60?). The paper then provides average survival rates based on how many of those features a patient exhibited.
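A sketch of this kind of count-the-risk-factors rubric in Python; the thresholds follow the passage, but the mapping from the count to a five-year survival rate is in the cited paper and is not reproduced here.
def risk_factor_count(age, largest_node_cm, bone_marrow_involved,
                      beta2_microglobulin_elevated, hemoglobin_g_dl):
    # Count how many of the five features from the passage the patient exhibits.
    factors = [
        age > 60,
        largest_node_cm > 6,
        bone_marrow_involved,
        beta2_microglobulin_elevated,
        hemoglobin_g_dl < 12,
    ]
    return sum(factors)
# A hypothetical patient exhibiting only one of the five features.
print(risk_factor_count(age=55, largest_node_cm=4, bone_marrow_involved=False,
                        beta2_microglobulin_elevated=True, hemoglobin_g_dl=13.5))  # 1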
p.192
this new decision rubric still seemed far from rigorous.
But it also showed that the standard staging metrics were very poor predictors of outcomes and had been created largely to give medical students something they could easily memorize and regurgitate on their tests. The new rubric was far more data-driven, and I leaped at the chance to quantify my own illness by it.
p.192
my age, diameter of largest involved node, bone-marrow involvement, β2-microglobulin status, and hemoglobin levels. Of the five features most strongly correlated to early death, it appeared that I exhibited only one.
my risk factors and survival rate.
(AI superpowers: China, Silicon Valley and the new world order / Kai-Fu Lee.; Boston: Houghton Mifflin Harcourt, 2018; includes bibliographical references and index; subjects: artificial intelligence ── economic aspects ── China. | artificial intelligence ── economic aspects ── United States.; HC79.155 (ebook)
HC79.155 L435 2018 (print); 338.4; https://lccn.loc.gov/2018-17250; 2018, )
____________________________________
Michael Lewis, The undoing project, 2017
p.228
Amos Tversky and Don Redelmeier
“Discrepancy between Medical Decisions for Individual Patients and for Groups”, April 1990
p.228
“Physicians deal with patients one at a time, whereas health policy makers deal with aggregates.”
But there was a conflict between the two roles.
(Michael Lewis, The undoing project, 2017, )
____________________________________
Executive summary of ‘expert, The expert’
• experts are not perfect; they make mistakes, because there is always a degree of uncertainty, no matter how close to zero that uncertainty might be; however, most experts and public figures do not want or like to admit that uncertainty exists, because to admit to uncertainty is to admit to the possibility of being wrong, in other words 'error'; and errors usually have consequences;
• many times, experts do not follow the decision-making process (the set of rules or factors that they use to come to a decision or conclusion) that they would tell you they use in practice; in other words, they say one thing, but in practice they do something else, a bit different (the words do not match up with the actions);
• just like the rest of us, experts do make mistakes, and they tend to make the same mistakes over and over again; specifically, experts make the kinds of mistakes that are built into the design and structure of the system; not only that, these mistakes are usually hidden or invisible; and just like the rest of us, experts do not like to admit that they made a mistake; they might attribute the mistake or error to randomness, which is another way of saying they don't really know why the mistake or error happened;
• this is not to say we should not listen to the experts;
• we want the experts to explain, to frame the information, to create a public mental model, to teach, to give illustrative, meaningful, relatable, practical examples, to tell stories, to create an understanding, maybe even multiple understandings, to dispel misunderstanding, to forewarn of pitfalls; ...
• there is no such thing as a public mental model; people have mental models; the public is an abstraction - a potentially helpful, useful, fictional creation;
• the person has a mental model
• a mental model probably can be determined within a team, to a degree
• however, a public mental model is a label I made up by combining: (public) + (mental model) := (public mental model)
____________________________________
Michael Lewis, The undoing project, 2017 [ ]
p.171
Goldberg said he preferred to start simple and build from there. As his first case study, he used the way doctors diagnosed cancer.
pp.171-172
They had found a gaggle of radiologists at the University of Oregon and asked them: How do you decide from a stomach X-ray if a person has cancer? The doctors said that there were seven (7) major signs they looked for: the size of the ulcer, the shape of its borders, the width of the crater it made, and so on. The “cues”, Goldberg called them, as Hoffman had before him.
p.172
Goldberg pointed out that, indeed, experts tended to describe their thought processes as subtle and complicated and difficult to model.
p.172
The Oregon researchers began by creating, as a starting point, a very simple algorithm, in which the likelihood that an ulcer was malignant depended on the seven (7) factors the doctors had mentioned, equally weighted.
p.172
96 different individual stomach ulcers, on a 7-point scale from “definitely malignant” to “definitely benign”. Without telling the doctors what they were up to, they showed them each ulcer twice, mixing up the duplicates randomly in the pile so the doctors wouldn't notice they were being asked to diagnose the exact same ulcer they had already diagnosed.
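A minimal sketch of such an equally weighted cue model in Python: the malignancy score is simply the average of the seven cue ratings. Only three cue names are quoted above; the rest are hypothetical placeholders, as is the assumption that higher ratings mean more malignant.
CUES = ["ulcer_size", "border_shape", "crater_width",
        "cue_4", "cue_5", "cue_6", "cue_7"]
def malignancy_score(ratings):
    # ratings: dict mapping each cue to a 1-7 rating (higher = more malignant, by assumption).
    return sum(ratings[cue] for cue in CUES) / len(CUES)
example = {cue: 4 for cue in CUES}
example["ulcer_size"] = 7
print(malignancy_score(example))   # about 4.43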
p.172
The researchers didn't have a computer. They transferred all of their data onto punch cards, which they mailed to UCLA, where the data was analyzed by the university's big computer. The researchers' goal was to see if they could create an algorithm that would mimic the decision making of doctors.
p.173
But then UCLA sent back the analyzed data, and the story became unsettling. (Goldberg described the results as “generally terrifying”.) In the first place, the simple model that the researchers had created as their starting point for understanding how doctors rendered their diagnoses proved to be extremely good at predicting the doctors' diagnoses.
p.173
The doctors might want to believe that their thought processes were subtle and complicated, but a simple model captured these perfectly well.
p.173
More surprisingly, the doctors' diagnoses were all over the map: The experts didn't agree with each other. Even more surprisingly, when presented with duplicates of the same ulcer, every doctor had contradicted himself and rendered more than one diagnosis: These doctors apparently could not even agree with themselves.
p.173
Experience appeared to be of little value in judging, say, whether a person was at risk of committing suicide. Or, as Goldberg put it, “Accuracy on this task was not associated with the amount of professional experience of the judge.”
p.174
Still, Goldberg was slow to blame the doctors.
p.174
How could their simple model be better at, say, diagnosing cancer than a doctor? The model had been created, in effect, by the doctors. The doctors had given the researchers all the information in it.
p.174
The Oregon researchers went and tested the hypothesis anyway. It turned out to be true. If you wanted to know whether you had [stomach ulcer] cancer or not, you were better off using the algorithm that the researchers had created than you were asking the radiologist to study the X-ray. The simple algorithm had outperformed not merely the group of doctors; it had outperformed even the single best doctor.
p.17
You could best the doctor by replacing him with an equation created by people who knew nothing about medicine and had simply asked a few questions of doctors.
p.174
Lew Goldberg, “Man versus Model of Man”
p.175
It was as if the doctors had a theory of how much weight to assign to any given trait of any given ulcer. The model captured their theory of how to best diagnose an ulcer. But in practice they did not abide by their own ideas of how to best diagnose an ulcer. As a result, they were beaten by their own model.
p.175
Why would the judgement of an expert--a medical doctor, no less--be inferior to a model crafted from that very expert's own knowledge?
(Michael Lewis, The undoing project, 2017, )
____________________________________
Clayton M. Christensen, The innovator's prescription, 2009
p.391
The Joint Commission on Accreditation of Healthcare Organizations also weighed in to require teleradiology services to meet licensing and accreditation standards that have long been in place for hospital-based solution shops of radiologists.46 The result: a typical NightHawk radiologist has licenses in 38 states and is credentialed at over 400 hospitals. The company employs 35 to 40 people simply to manage all of this administrative overhead--and yet can still provide these services at lower cost than most of its customers can when they choose to perform them in-house.47
However, a funny thing is happening at the edge of this stalemate. A growing segment of work is no longer dependent on a radiologist's expert eye and clinical experience to interpret shadowy anatomical structures and link them to patients' clinical histories and physical symptoms.48 “Functional” radiology, involving dynamic in-motion studies and molecular tracers rather than still pictures, and “quantitative” radiology--a related discipline based on measurements and scoring algorithms--have significantly enhanced the ability of nonradiologist physicians to elucidate physiologic abnormalities.49 Starting with basic technologies like ultrasound and fluoroscopy, these machines automate image acquisition and analysis, embedding into algorithms some of the diagnostic skill that used to reside only in the intuition of radiologists. These machines also require less space, shielding, and power, so they can be integrated into the offices of cardiologists and orthopedic surgeons working in value-adding process clinics.50
NOTES
48. Our thanks to Dr. Keith Batchelder and Peter Miller of Genomic Healthcare Strategies for suggesting these technological enablers of disruption in radiology.
( Christensen, Clayton M., 2009, The innovator's prescription : a disruptive solution for health care / by Clayton M. Christensen, Jerome H. Grossman, Jason Hwang., 1. Health services administration., 2. Public health administration., 3. Disruptive technologies., RA971.C56 2009, 362.1 Christen, )
____________________________________
Are Translator Devices Worth it in 2020? Testing it in Japan
https://www.youtube.com/watch?v=p6TF1iUi6fQ
13:12
Tokyo Lens
Jan 14, 2020
Today we are in Tokyo, Japan putting a translator device to the test and seeing if these kinds of devices are worth it in the modern day and age of 2020. With the Olympics hitting Tokyo this year, plenty of people who don't speak Japanese (or English) will be making their way to Japan, and it's time to see if a device like this can help.
Doing a full review of this translator device (which can be used with and without internet!)
THE DEVICE:
https://amzn.to/3adZmsf
This code should get you 10% off: lens10lw
I GET NO MONEY from the code or anything, but any purchase you make on amazon through clicking one of my amazon links, does give support to the channel~
____________________________________
https://www.amazon.com/gp/product/B07LF9XPJW/
Langogo Genesis Portable Language Translator Device, 100+ Languages Pocket Translator, Real-time Voice Translator with Offline Translation, Built-in Data, 3.1inch Retina Display Traductor, Black
⚡【Reliable Travel Buddy】Langogo enhances the travel experience, it helps you overcome cross-language barriers, always stay connected via its mobile hotspot and get local information like hotels to stay, attractions to visit, as well as weather forecast and so on while traveling overseas.
⚡【One-Button Accurate Translation】Langogo offers an online two-way translation in one second with a single button. Powered by 24 world-leading translation engines, it ensures translation of the highest accuracy for 104 languages, even against different accents.
⚡【Voice Recording and Transcription】Genesis records a single speech up to 4 hours and instantly shows the transcription on the screen so you can focus on meeting and interview. Enjoy free English transcription before 2021 and a 1-month trial for the others.
⚡【Mobile Wi-Fi Hotspot】Genesis is also a mobile hotspot device. With the built-in eSIM chip, it allows a purchase on the device for hotspot data plan to offer a Wi-Fi connection for up to 5 mobile devices, with no extra SIM card.
⚡【Enjoys Continuous Update】The self-learning algorithm and continuous updates improve its performance. More functions and up-to-date vocabulary are being added to it, and the more you use it, the more precise it becomes.
Langogo Genesis AI Translator with Wi-Fi Hotspot
Langogo Genesis is specially designed to help you to improve your travel experience. It integrates 24 translation engines with its one-button translation design to enhance the accuracy and convenience of the speech-to-speech translation. In addition, Langogo Genesis can be used as a mobile Wi-Fi hotspot, which keeps you connected to the internet while traveling abroad and saves your phone battery. You can then focus on sharing memorable moments along the way with all your loved ones.
To use Langogo with updated languages and latest functions, please always check your system version and upgrade before using it.
One-button Translation
Langogo offers a unique one-button two-way translation. It can automatically recognize the inter-translation language, which means when you say one language, Langogo translates your words to the other automatically. No A/B buttons, no extra App.
The translation process is based on 24 translation engines integrated and its self-learning algorithms, therefore Langogo translates with the highest accuracy and efficiency.
More than a translator, Langogo Genesis is also an intelligent voice assistant. It can deliver useful information including weather forecasts, exchange rates, nearby attractions and hotels, and so on. More powerful skills, such as navigation, travel guides, taxi booking, etc., will be available shortly.
Langogo supports lifetime update service, which continuously improves the stability, performance, and safety of Langogo. New languages and system functions will be constantly replenished and available for an online update on your Langogo.
Languages Translated and Countries Covered to Use eSIM
• Languages Translated Online: Arabic (Algeria), Arabic (Bahrain), Arabic (Egypt), Arabic (Iraq), Arabic (Jordan), Arabic (Kuwait), Arabic (Lebanon), Arabic (Morocco), Arabic (Oman), Arabic (Qatar), Arabic (Saudi Arabia), Arabic (State of Palestine), Arabic (Tunisia), Arabic (United Arab Emirates), Armenian (United States), Azerbaijani, Basa Sunda, Bulgarian, Catalan, Czech, Croatian, Chinese (Mandarin), Chinese (Cantonese), Chinese (Taiwan), Danish, Dutch, English (Australia), English (Canada), English (UK), English (Ghana), English (Ireland), English (India), English (Kenya), English (Nigeria), English (New Zealand), English (Philippines), English (Tanzania), English (United States), English (South Africa), Finnish, Filipino, French (Canada), French (France), Georgian, German (Germany), Greek, Gujarati (India), Hebrew, Hindi (India), Hungarian, Icelandic, Indonesian, Italian, Javanese (Indonesia), Japanese, Kannada, Korean, Lao, Latvian, Lithuanian, Malay, Nepal, Norwegian, Persian, Polish, Portuguese (Brazil), Portuguese (Portugal), Romanian, Russian, Serbian, Sinhalese (Sinhala), Slovak, Slovenian, Spanish (Argentina), Spanish (Bolivia), Spanish (Chile), Spanish (Colombia), Spanish (Costa Rica), Spanish (Dominican Republic), Spanish (Ecuador), Spanish (Spain), Spanish (Guatemala), Spanish (Honduras), Spanish (Mexico), Spanish (Nicaragua), Spanish (Panama), Spanish (Peru), Spanish (Puerto Rico), Spanish (Paraguay), Spanish (El Salvador), Spanish (United States), Spanish (Uruguay), Spanish (Venezuela), Swahili, Swedish, Tamil (India), Telugu (India), Thai, Turkish, Khmer (Cambodia), Ukrainian, Urdu, Vietnamese, Zulu
• Languages have only translation displayed in text: Azerbaijani, Persian, Gujarati (India), Armenian (United States), Icelandic, Georgian, Kannada, Lao, Lithuanian, Latvian, Serbian, Swahili, Urdu, Zulu
• Languages Translated Offline: Chinese, English, Japanese, Korean
• Countries and Regions Supporting eSIM for Translation: Albania, Australia, Austria, Bangladesh, Belarus, Belgium, Bulgaria, Cambodia, Canada, China Mainland, Croatia, Cyprus, Czech Republic, Denmark, Egypt, Estonia, Finland, France, Germany, Ghana, Greece, Hong Kong, Hungary, Iceland, Indonesia, Ireland, Israel, Italy, Japan, Kazakhstan, Kyrgyz Republic, Laos, Latvia, Liechtenstein, Lithuania, Luxembourg, Macao, Macedonia, Malaysia, Malta, Netherlands, New Zealand, Norway, Oman, Philippines, Poland, Portugal, Qatar, Romania, Russia, Saudi Arabia, Serbia, Singapore, Slovakia, Slovenia, South Africa, South Korea, Spain, Sri Lanka, Sweden, Switzerland, Taiwan, Tajikistan, Thailand, Turkey, Ukraine, United Arab Emirates, United Kingdom, United States, Vietnam
• Countries and Regions Supporting eSIM for Hotspot Sharing: Argentina, Australia, Austria, Belarus, Belgium, Brazil, Bulgaria, Cambodia, Canada, Chile, China Mainland, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hong Kong, Hungary, Iceland, India, Indonesia, Ireland, Italy, Japan, Laos, Latvia, Liechtenstein, Lithuania, Luxembourg, Macao, Macedonia, Malaysia, Malta, Mexico, Netherlands, New Zealand, Norway, Peru, Philippines, Poland, Portugal, Romania, Russia, Serbia, Singapore, Slovakia, Slovenia, South Korea, Spain, Sweden, Switzerland, Taiwan, Thailand, Turkey, Ukraine, United Kingdom, United States
• System Language: Chinese(Traditional), Chinese(Simplified), English, French, German, Japanese, Korean, Spanish, Thai
____________________________________
··<────────────────────────────────────────────────────────────────────────────>