OpenAI recently released a new general-purpose AI model that converts text to video. The model, called Sora, has been made available to red teamers to test its security and stability. OpenAI has also released Sora to a select number of filmmakers, visual artists and designers to test the model’s ability to create high-quality videos and to provide insightful feedback.
Essentially, Sora takes a text prompt and converts it into a short video. For example, if you type “man walking down the street in New York”, the model returns a video of a man walking down a street somewhere in New York City. Having been trained on vast amounts of video data, the model can now create novel videos drawing on what it absorbed and analysed during its training phase.
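Sora has no public API at the time of writing, so the sketch below is purely illustrative: the endpoint, parameters and response shape are assumptions about what a text-to-video request cycle might look like, not OpenAI’s actual interface.

```python
# Hypothetical sketch only: Sora has no public API at the time of writing.
# The endpoint URL, parameters and response shape below are illustrative
# guesses at what a text-to-video request/response cycle might look like.
import requests

def generate_video(prompt: str, duration_seconds: int = 10) -> bytes:
    """Send a text prompt to a (hypothetical) text-to-video endpoint
    and return the raw bytes of the generated video."""
    response = requests.post(
        "https://api.example.com/v1/text-to-video",  # placeholder URL
        json={"prompt": prompt, "duration": duration_seconds},
        timeout=300,  # video generation is slow; allow a long timeout
    )
    response.raise_for_status()
    return response.content

if __name__ == "__main__":
    video = generate_video("man walking down the street in New York")
    with open("output.mp4", "wb") as f:
        f.write(video)
```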
While the model is not yet available for public use, the potential applications, both good and bad, will be endless once it is released.
A text-to-video AI model as effective as Sora, which produces high-quality, lifelike videos, certainly makes information more accessible to both adults and children with different learning styles or disabilities. By converting text to video with narration, sign language, or subtitles, Sora will be able to cater to a much wider audience and promote educational inclusivity. No longer will parents have to sweat over homework with a child who has difficulty reading. They can simply create a story based on the child’s interests and present the information that needs to be learned in a fun, engaging learning video.
The learning does not stop with homework; studies show video is a more powerful tool for retaining information than text alone[1]. AI-generated videos can transform complex topics into engaging and visually appealing content, improving audience engagement and knowledge retention, particularly in fields like education and training[2].
The potential for AI models like Sora to transform the efficiency of content creation is immense. Instead of spending hours filming and editing, individuals or companies in the entertainment and media industries can now create mock-ups of their stories within minutes. Added to this, these AI models allow for quicker content updates, news sharing and easier scaling of video production for various audiences in industries like marketing and advertising, education and training, e-commerce and retail, real estate, construction, and sports and fitness, to name a few.
These AI-powered text-to-video tools have the potential to democratise video creation and make it accessible to a wider range of people who might not have the resources or expertise for professional video production, especially smaller companies and individuals setting up their own businesses. Sora is set to level the playing field for entry into many diverse markets around video content, creating a healthy competitive marketplace for consumers and entrepreneurs alike.
Nonetheless, another noteworthy aspect of general-purpose AI models like Sora that will need to be considered is their potential for misuse. There will always be malicious actors ready to exploit new technologies for personal gain, and Sora is likely to be no different.
The year 2024 sees many countries going to the polls in a myriad of elections. The United States is due to hold its presidential election in November 2024, and European countries like Poland, Portugal, Romania, France, Finland and Italy also face elections. The European Parliament elections are due to take place in June 2024, and Ireland is due for a general election, along with many other countries like the UK and Canada.
Sora and similar technology can easily be misused to create realistic, yet completely fabricated, videos of people, or, in election season, of politicians, saying or doing things they never did. These "deepfakes" can be used to spread misinformation, damage reputations, manipulate public opinion and even sway how people vote in an election.
If any kind of trust in what gets published on the Internet is to remain, it will be crucial in the future to gain explicit consent from individuals before using their likeness or personal information in generated videos. This includes understanding how the video will be employed, distributed, and for how long it will remain in use. Users should also have the right to request the removal of any videos that use their information without consent.
The challenge with this is that, to function, AI systems like Sora need to analyse vast amounts of data, including text, images and videos. Due to the sheer volume of data required, it is often scraped from various locations across the Internet: gathered from social media sites and user-submitted reviews, or extracted automatically from websites and other online resources. These scraped datasets almost certainly contain the personal data of millions of individuals. This raises concerns about how personal data is collected, stored and secured, and when, if ever, this data gets deleted.
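A minimal sketch makes the point concrete. The URL below is a placeholder, and a real crawl would iterate over millions of pages, but even this toy scraper shows how personal data such as email addresses gets swept into a corpus incidentally, whether or not the crawler was looking for it.

```python
# Minimal illustration of how web scraping incidentally sweeps up personal
# data. The URL is a placeholder; a real crawl covers millions of pages.
import re
import requests
from bs4 import BeautifulSoup

EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def scrape_page(url: str) -> dict:
    """Fetch a page and extract its visible text, much as a crawler
    building a training corpus would."""
    html = requests.get(url, timeout=30).text
    text = BeautifulSoup(html, "html.parser").get_text(separator=" ")
    # Personal data such as email addresses comes along for free,
    # whether or not the crawler set out to collect it.
    return {"url": url, "text": text, "emails": EMAIL_PATTERN.findall(text)}

page = scrape_page("https://example.com/user-reviews")  # placeholder URL
print(f"Captured {len(page['emails'])} email addresses incidentally")
```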
If malicious actors gained access to the vast amounts of training data used to train models like Sora, they could use the personal data contained in these datasets for unintended purposes, like large-scale identity theft or creating even more convincing political deepfakes with the real personal data of politicians or high-profile individuals. Such scenarios could potentially cause serious personal damage and public humiliation to these people, while undermining trust in the democratic process. This possibility is not beyond the realm of believability given the proliferation of cyber-attacks and information warfare currently taking place around the world.
Putting aside malicious actors for a moment: as already outlined, AI models are trained on vast amounts of data collected from sources across the globe. This data can, and has been shown to, reflect all sorts of societal biases and prejudices. Ensuring a generative AI model such as Sora is completely bias-free is a complicated and challenging task[3]. Bias is a complex issue that can manifest in many ways, which makes it all the more difficult to fully eliminate in general-purpose AI models[4]. One of the most basic audits, sketched below, is simply to measure how different demographic groups are represented in the training data.
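The sketch below assumes a dataset with tidy demographic annotations, which real scraped corpora rarely have, so treat the field names as illustrative only. It shows the idea of a representation audit: a heavily skewed report on the training data predicts skewed generations later.

```python
# A minimal sketch of one basic bias audit: checking whether a training
# dataset's demographic labels are balanced. Field names are assumptions;
# real scraped datasets rarely carry such tidy annotations.
from collections import Counter

def representation_report(records: list[dict], field: str) -> dict[str, float]:
    """Return the share of records carrying each value of a demographic field."""
    counts = Counter(r[field] for r in records if field in r)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

training_sample = [
    {"caption": "a doctor at work", "perceived_gender": "male"},
    {"caption": "a doctor at work", "perceived_gender": "male"},
    {"caption": "a nurse at work", "perceived_gender": "female"},
]
# Prints {'male': 0.667, 'female': 0.333}: a skew the model will learn.
print(representation_report(training_sample, "perceived_gender"))
```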
Let us say, for the sake of argument, that the training data for such general-purpose AI models is unbiased from the outset; even then, the way the model interprets the data can introduce biases during its use[5]. This scenario can still lead to the creation of videos that are discriminatory or offensive towards certain groups of people. It thus becomes crucial to address these biases during development, in an effort to ensure a fair and balanced representation of all individuals and minority groups in the generated output.
This task of ensuring such large AI models are bias-free is daunting but necessary. The challenge is that the inner workings of these AI models are often complex and opaque, and are also frequently treated as intellectual property or protected under trade secret legislation. This lack of transparency makes it difficult to understand how AI models like Sora generate their video content.
Once deployed for general use, such technology has the potential to raise a myriad of privacy concerns, especially in jurisdictions like Europe. Under the European General Data Protection Regulation (GDPR), individuals have a legal right to know how their personal data is being used, to request an explanation of how AI systems process their personal data and, in certain circumstances, to have that data deleted. Arguably, such AI models are setting the scene for legal battles seeking accountability from the global organisations that develop and deploy general-purpose AI models such as Sora.
References