By Paul Wallis
December 22, 2024
DIGITAL JOURNAL
A blockbuster funding round for San Francisco-based startup Databricks is another sign of hunger by investors for companies poised to cash in on generative artificial intelligence — © AFP Josep LAGO
Artificial General Intelligence is the one everybody’s scared of. This is the “human” level of AI, or better, according to some. Hype and hope are spending far too much time being seen on the same page. They’re not even in the same book yet.
The current state of testing of OpenAI’s o3 model has definitely got the chickens cackling. Testing outcomes were a mix of fab and fail. The definitive description of the outcomes is in this article in New Scientist, which you do need to read.
Every article on AI should come with a chaperone and a guy with a red flag walking about 20 paces in front, warning of its approach. The sheer opacity of this information normally doesn’t help.
“Tests were done. The AI passed or failed the tests, …etc.” is the usual format.
This uninformative blurriness is due both to the sheer volume of data and to the turgidity of wading through it.
Fortunately, New Scientist has condensed a lot of info into something actually readable. Please do read it, because it clarifies a lot of issues.
OK, so briefly, this is what’s happened:
OpenAI’s o3 did pretty well and outperformed its predecessors. It did very well on a benchmark test for AGI, the Abstraction and Reasoning Corpus (ARC).
…But it’s still not AGI. It didn’t meet multiple criteria. It failed some tests and didn’t stay within cost parameters to many people’s liking.
This is where things get picky, with good reason. The easiest example is to compare a chess computer to AGI expectations. Chess computers use “brute force”, processing millions of possible moves and choosing the best.
Ironically, that is exactly what people expect all AI in games to do, but it doesn’t even approach ARC’s requirements for AGI. The AI is supposed to reason its way through challenges.
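To make the contrast concrete, here is a minimal sketch of what “brute force” means in practice. The toy game, move set, and scoring function below are invented purely for illustration; they have nothing to do with o3, chess engines, or the ARC tests. The point is that the program enumerates everything and keeps the best score, which is the opposite of reasoning from a few examples.

```python
# Minimal "brute force" sketch: try every move sequence to a fixed depth,
# keep the best score. Toy game and scoring are hypothetical placeholders.
from itertools import product

MOVES = ["a", "b", "c"]  # invented toy move set, not real chess moves


def score(sequence):
    # Hypothetical evaluation: reward "a" moves, weighted by position.
    return sum(i + 1 for i, m in enumerate(sequence) if m == "a")


def brute_force(depth):
    # Enumerate every possible sequence of length `depth` and pick the best.
    best_seq, best_score = None, float("-inf")
    for seq in product(MOVES, repeat=depth):  # grows as len(MOVES) ** depth
        s = score(seq)
        if s > best_score:
            best_seq, best_score = seq, s
    return best_seq, best_score


print(brute_force(depth=5))  # 3**5 = 243 sequences; real engines face millions
```

Nothing in that loop generalises: double the depth and the work explodes, and it still has no idea why the best sequence is best. ARC-style tasks are built so that this kind of exhaustive search is not the path to a solution.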
Another problem as I see it is the inbuilt cost paradigm. Brute force is inefficient, costly and not as good in performance terms. (It’s pretty ancient and outmoded tech in processing terms, too.)
It’s reasonable enough to set a value for computing tasks, so you have a functional metric. That can hardly be the whole story, though.
If you set $20 for the cost of a task, how do you value the outcome?
In real terms, any range of tasks will have different values for outcomes. You spend $20 for a $20 task outcome, OK.
But, and it’s a big but –
If you spend the same $20 for a $200,000 task outcome, is this metric measuring cost efficiency or not? You can see why people might get interested.
You must value the outcomes directly, not just “pass or fail”. Otherwise, you don’t even have a cost-benefit analysis.
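For what it’s worth, here is a back-of-the-envelope version of that argument. The task names and dollar figures are invented for illustration, not drawn from the ARC results; the only point is that a flat cost-per-task metric cannot tell the two rows apart.

```python
# Hypothetical cost-benefit sketch: identical $20 spend, very different
# outcome values. All numbers are invented for illustration only.
tasks = [
    {"name": "routine lookup",       "cost": 20.0, "outcome_value": 20.0},
    {"name": "contract error found", "cost": 20.0, "outcome_value": 200_000.0},
]

for t in tasks:
    ratio = t["outcome_value"] / t["cost"]  # benefit-cost ratio
    print(f"{t['name']}: cost ${t['cost']:.0f}, "
          f"value ${t['outcome_value']:,.0f}, ratio {ratio:,.0f}x")

# A pass/fail-at-$20 metric scores both rows the same; only the ratio
# (or the net value) says what the spend was actually worth.
```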
Meanwhile, it’s very debatable how much these core valuations are getting through to the market. This level of hype is too dangerous. Everybody who actually works in the AI sector is wary of the hype. Current market-level AI is nowhere near the o3 level, and o3 is nowhere near the AGI ARC level.
The market obviously doesn’t care. That’s not stopping vast amounts of dollars jumping in with or without any understanding at all of the absolute basics. It’s not stopping people replacing staff with AI.
Even if you just mindlessly assume these very early-stage AIs are capable of doing these jobs, whose money and credibility is at risk?
Irresponsibility is such fun, isn’t it? The risk levels are incredible.
The other critical point here is this:
The market will have to apply something very like ARC standards, or probably better, to AGI when it does arrive. These standards will need to be universal.
ARC could well be the ancestral “does this thing work or not” test vehicle for future AI.
The sales guys have been doing their jobs too well. Now is the time for the hardheads and tech heads to make a difference.
The market has been far too accepting of AI as an idea. Even the simple reality that people will have to work with perhaps millions of AIs, specialist AIs, and “niche” AIs isn’t getting much attention.
How will AGI fit into a shifting sands museum of old practices, technologies, and human perceptions? Probably very badly. The wheel was invented when there were no carts, horse attachments, or even clear ideas of what a wheel was.
See any possible wheel-like problems with AGI?
AGI needs to be idiot-proof. It must be manageable.
___________________________________________________________
Disclaimer
The opinions expressed in this Op-Ed are those of the author. They do not purport to reflect the opinions or views of the Digital Journal or its members.