Apple, NVIDIA and Anthropic reportedly used YouTube transcripts without permission to train AI models

Some of the world’s largest tech companies trained their AI models on a dataset that included transcripts of more than 173,000 YouTube videos without permission, a new investigation from Proof News has found. The dataset, which was created by a nonprofit company called EleutherAI, contains transcripts of YouTube videos from more than 48,000 channels and was used by Apple, NVIDIA and Anthropic, among other companies. The findings of the investigation spotlight AI’s uncomfortable truth: the technology is largely built on the backs of data siphoned from creators without their consent or compensation.

The dataset doesn’t include any videos or images from YouTube, but contains video transcripts from the platform’s biggest creators, including Marques Brownlee and MrBeast, as well as large news publishers like The New York Times, the BBC, and ABC News. Subtitles from videos belonging to Engadget are also part of the dataset.

“Apple has sourced data for their AI from several companies,” Brownlee posted on X. “One of them scraped tons of data/transcripts from YouTube videos, including mine,” he added. “This is going to be an evolving problem for a long time.”

A Google spokesperson told Engadget that earlier comments by YouTube CEO Neal Mohan, who said that companies using YouTube’s data to train AI models would violate the platform’s terms of service, still stand. Apple, NVIDIA, Anthropic and EleutherAI did not respond to a request for comment from Engadget.

So far, AI companies have not been transparent about the data used to train their models. Earlier this month, artists and photographers criticized Apple for failing to reveal the source of training data for Apple Intelligence, the company’s own spin on generative AI coming to millions of Apple devices this year.

YouTube, the world’s largest repository of videos, is in particular a goldmine of not only transcripts but also audio, video, and images, making it an attractive dataset for training AI models. Earlier this year, OpenAI’s chief technology officer, Mira Murati, evaded questions from The Wall Street Journal about whether the company used YouTube videos to train Sora, OpenAI’s upcoming AI video generation tool. “I’m not going to go into the details of the data that was used, but it was publicly available or licensed data,” Murati said at the time. Alphabet CEO Sundar Pichai has also said that companies using data from YouTube to train their AI models would violate the platform’s terms of service.

If you want to see whether subtitles from your YouTube videos or from your favorite channels are part of the dataset, head over to Proof News’ lookup tool.

Update, July 16 2024, 3:17 PM PT: This story has been updated to include a statement from Google.

  • David Bridges

    David Bridges is a media culture writer and social trends observer with over 15 years of experience in analyzing the intersection of entertainment, digital behavior, and public perception. With a background in communication and cultural studies, David blends critical insight with a light, relatable tone that connects with readers interested in celebrities, online narratives, and the ever-evolving world of social media. When he's not tracking internet drama or decoding pop culture signals, David enjoys people-watching in cafés, writing short satire, and pretending to ignore trending hashtags.
