New datasets aim to teach AI models cross-disciplinary scientific thinking
by Clarence Oxford
Los Angeles CA (SPX) Dec 03, 2024

What can exploding stars reveal about blood flow in arteries, or how might swimming bacteria inform our understanding of ocean dynamics? Researchers from several institutions have taken a major step toward training artificial intelligence (AI) models that draw insights across disciplines and help unlock scientific discoveries.

The initiative, known as Polymathic AI, uses technology similar to that behind large language models such as ChatGPT. Instead of text, however, its models are trained on scientific datasets from fields such as astrophysics, biology, chemistry, and fluid dynamics, an approach intended to give them cross-disciplinary scientific capabilities.

"These groundbreaking datasets are by far the most diverse large-scale collections of high-quality data for machine learning training ever assembled for these fields," said Michael McCabe, a research engineer at the Flatiron Institute in New York City and a member of Polymathic AI. "Curating these datasets is a critical step in creating multidisciplinary AI models that will enable new discoveries about our universe."

The Polymathic AI team has released two open-source datasets that together comprise 115 terabytes of data sourced from dozens of contributors. The data is freely available to the public and is expected to accelerate the development of AI models capable of solving complex scientific problems. For comparison, the raw, unfiltered text collected for training GPT-3 amounted to about 45 terabytes.

"The freely available datasets are an unprecedented resource for developing sophisticated machine learning models that can then tackle a wide range of scientific problems," added Ruben Ohana, a research fellow at the Flatiron Institute's Center for Computational Mathematics. "Open-sourcing this data benefits both the machine learning and scientific communities, creating a win-win situation."

The datasets are hosted on Hugging Face, a popular platform for AI models and data, and are detailed in papers accepted for presentation at the NeurIPS conference in Vancouver, Canada.

"We've seen again and again that the most effective way to advance machine learning is to take difficult challenges and make them accessible to the wider research community," said McCabe. "When a new benchmark is released, it initially seems insurmountable. But opening access accelerates progress far beyond what any individual group could achieve."

Polymathic AI is a collaborative effort involving researchers from institutions such as the Simons Foundation, Flatiron Institute, New York University, and the Lawrence Berkeley National Laboratory.

The first dataset, named the Multimodal Universe, focuses on astrophysics and includes hundreds of millions of observations, such as images from NASA's James Webb Space Telescope and stellar data from ESA's Gaia spacecraft. "Machine learning has been happening for around 10 years in astrophysics, but it's still very hard to use across instruments, missions, and disciplines," said Polymathic AI researcher François Lanusse. "Datasets like the Multimodal Universe allow us to create models that natively understand this data and act as a Swiss Army knife for astrophysics."

The second dataset, dubbed the Well, comprises 15 terabytes of data across 16 diverse datasets. It features numerical simulations of biological systems, fluid dynamics, supernovae, and more, all governed by partial differential equations. These equations appear in a wide array of scientific problems but are notoriously difficult to solve. "This dataset encompasses a diverse range of physics simulations designed to address key limitations of current machine learning models," said Polymathic AI member Rudy Morel.
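
To give a rough sense of what a simulation "rooted in" a partial differential equation looks like, the following minimal Python sketch steps the one-dimensional heat equation forward in time and stores a snapshot at every step. It is an illustration only: the choice of equation, the function name `simulate_heat_1d`, and all parameter values are assumptions made for this example, and none of it is taken from the Well, whose simulations are far larger and more complex.

```python
def simulate_heat_1d(initial, alpha=0.1, steps=100):
    """Explicit finite-difference solver for u_t = k * u_xx on a periodic grid.

    Hypothetical helper for illustration. `alpha` is the dimensionless
    ratio k * dt / dx**2 and must not exceed 0.5, or the scheme is unstable.
    Returns a list of snapshots (one list of cell values per time step).
    """
    if not 0 < alpha <= 0.5:
        raise ValueError("alpha must be in (0, 0.5] for stability")
    u = list(initial)
    n = len(u)
    snapshots = [u[:]]  # keep the initial state as snapshot 0
    for _ in range(steps):
        # Each cell moves toward the average of its two neighbours;
        # the modulo wraps the grid around into a ring.
        u = [
            u[i] + alpha * (u[(i - 1) % n] - 2 * u[i] + u[(i + 1) % n])
            for i in range(n)
        ]
        snapshots.append(u[:])
    return snapshots


# A single hot cell in the middle of a cold ring of 32 cells.
initial = [0.0] * 32
initial[16] = 1.0
snapshots = simulate_heat_1d(initial, alpha=0.25, steps=50)
print(len(snapshots), "snapshots of", len(snapshots[0]), "cells")
print("peak value: %.3f -> %.3f" % (max(snapshots[0]), max(snapshots[-1])))
```

The list of snapshots is the kind of time-ordered output a machine learning model might be trained on: given the state at one step, predict the next.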

Building these datasets required extensive collaboration. "The creators of numerical simulations are sometimes skeptical of machine learning because of the hype, but they're curious about how it can benefit their research," Ohana explained.

The team is now using the datasets to train AI models, with early results showing promise. "Understanding how machine learning models generalize and interpolate across datasets from different physical systems is an exciting research challenge," said Polymathic AI member Bruno Régaldo-Saint Blancard.

Shirley Ho, project lead and group leader at the Flatiron Institute, noted, "Just like the Protein Data Bank spawned AlphaFold, I'm excited to see what the Well and the Multimodal Universe will help create." Ho will present Polymathic AI's findings at NeurIPS.
