Tag: machine learning


  • Artificial Intelligence: A Primer for Graphics Artists and Educators

    Artificial Intelligence: A Primer for Graphics Artists and Educators

    A gentle introduction to four categories of AI models, and some suggestions on how education should approach its arrival.

    Introduction

    I’ve been doing computer graphics professionally for 25 years. I’ve never seen anything like generative artificial intelligence. Several years ago, on a motion capture stage, I saw the potential of real-time graphics and animation and was clobbered with the realization that AI would change everything.

    This is a synopsis of a talk I gave to some students about the topic, and then, again, to animation and game development faculty. My interest here is to raise awareness.

    This is intended for beginners with some understanding of computer graphics. It’s my hope today to give you an overview of four categories of AI that you should pay attention to, as it begins to permeate your life. At the end of this presentation, I give some thoughts on how education should adapt.

    The six sections below are:

    1. It’s Only the Beginning
    2. Neural Networks
    3. Generative Models
    4. Large Language Models
    5. Foundational Models
    6. Thoughts on Education

    It’s Only the Beginning

Below is a series of images that were created using AI. Most of these images were prompted (a process I discuss below) and then generated by a CLIP-guided diffusion model called Stable Diffusion.

    A selection of experiments from 2022 – 2023 (Stable Diffusion)

    Starting with image making and endless conversations with ChatGPT, AI has blended into almost all of my workflows. I’m now animating with it, writing with it, and in a constant experimentation cycle with it.
For example, I readily play with NeRFs, or Neural Radiance Fields, which let you scan full 3D objects into the computer with just your phone! It’s amazing.

Here is a 3D NeRF scan of my 7-year-old’s Lego AT-AT. Let it load; you can grab and rotate around it.

These kinds of accessible software tools for creation will proliferate rapidly. The graph below from the venture capital firm Andreessen Horowitz shows that the number of research papers with working code has skyrocketed. These research innovations, some included below, are becoming productized.

    In short, computer graphics is becoming accessible to everyone.

Jack Soslow @jacksoslow (a16z – Andreessen Horowitz) via Twitter

    Neural Networks

    Learning Goal: Understand how to build a Classifier.

Artificial intelligence is a science several decades old. Most likely, the architecture of the neural network, combined with the creation of insanely vast amounts of internet data, has been the rocket fuel for the discipline’s explosion.

    Neural networks can see and learn patterns.

When you unlock your phone with your face, it is a neural network that recognizes you. Neural networks are a critical primitive for the generative artist to learn.

    Computer Vision is the recognition of patterns.
    Source: Wikipedia

Above is a pixelated orc. Every pixel in this image has an x value and a y value, and is either black or white. A pixel, then, is just numbers. Put together with other pixels, these numbers can form a pattern, like a simple 3×3 diagonal. The math that recognizes this pattern is attached to a little activation unit. These activation units fire when they sense simple patterns. These units are called “neurons.”
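The 3×3 diagonal idea can be sketched as a single toy neuron: a weighted sum over the patch plus a threshold. The weights and threshold below are hand-picked for illustration, not learned.

```python
# A single "neuron": a weighted sum over a 3x3 patch of pixels plus a
# threshold. Weights and threshold are hand-picked here, not learned.

def neuron(patch, weights, threshold=2.5):
    """Fire (return 1) when the weighted pixel sum crosses the threshold."""
    total = sum(p * w
                for patch_row, weight_row in zip(patch, weights)
                for p, w in zip(patch_row, weight_row))
    return 1 if total > threshold else 0

# Weights tuned to a top-left-to-bottom-right diagonal of "on" pixels.
diagonal_weights = [[1, 0, 0],
                    [0, 1, 0],
                    [0, 0, 1]]

diagonal_patch = [[1, 0, 0],
                  [0, 1, 0],
                  [0, 0, 1]]   # matches the pattern: the neuron fires

flat_patch = [[1, 1, 1],
              [0, 0, 0],
              [0, 0, 0]]       # does not match: the neuron stays quiet
```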

Neural networks are layered collections of these neurons: a series of pattern activators spread over a series of layers. To build one is to “train” the neurons and “tune” the math between the layers. This is done by feeding the network lots of data on the patterns you wish it to learn.
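Training can be sketched at its very simplest with a perceptron-style update: show the network labeled patterns and nudge the weights after every mistake. This is a toy, single-neuron sketch; real networks stack many neurons over many layers and train with backpropagation.

```python
# A toy perceptron: nudge the weights after every wrong guess until the
# neuron separates the two patterns.

def train(samples, labels, lr=0.1, epochs=20):
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            pred = 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0
            err = target - pred  # -1, 0, or +1
            weights = [w + lr * err * xi for w, xi in zip(weights, x)]
            bias += lr * err
    return weights, bias

# Flattened 3x3 patches: a diagonal (label 1) and a horizontal bar (label 0).
samples = [
    [1, 0, 0,  0, 1, 0,  0, 0, 1],
    [1, 1, 1,  0, 0, 0,  0, 0, 0],
]
labels = [1, 0]

w, b = train(samples, labels)

def predict(x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
```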


    It is effectively a way for math to have memory.

    Neural Networks are the core architecture for classification models.

    Neural networks are widely used for object classification, image segmentation and all sorts of useful pattern recognition.


    Demo: Build a Power Ranger Classifier

In the demonstration below, I show how machine learning can learn to recognize my five-year-old’s Power Ranger. After running the trained model, I show the two datasets: one with the Power Ranger, and one without.

    By learning the patterns of each of these datasets, we can recognize a Power Ranger.

    Try Lobe yourself at Lobe.ai


    Generative Models

    Learning Goal: Understand how to prompt and tune generative image making models.

    Have you used ChatGPT? How about Midjourney?

These are generative models. These models take input and generate an image or text. Eventually, this method will be used to create … well, potentially anything digital. This input is called a “prompt.” Prompting is a form of user intent: a sentence that represents what we hope the AI model will generate for us.

    In this case, my prompt was for a painting of George Washington drinking a beer. I use references to artists to get it closer to my desired output. (I’ll leave the intellectual property discussion for another day.)

    Prompting Presidential Beers

It’s not bad, huh? And it’s gettin’ pretty good. But prompting has its limits.

I can’t make an image of me as Iron Man by prompting alone. The model doesn’t know what I look like. That is why I need to tune it on images of me. This means creating a dataset of me, say 25 pictures or so, and then feeding it to the network so it learns to identify that pattern. (Me!) This can be a little challenging, but the tooling is getting better.

    Tuning a data set with images.

    Once I’ve trained the model, however, I’m able to use it with any other prompt quite easily. And since clearly there is a real need to turn myself into a Pixar character, I can easily do that.

    Learning to tune generative models is a fundamental skill for the generative artist.


    A quick note on Latent Space …

Latent Space is a concept that helps me understand datasets as a giant universe of data. Each of these colored dots might represent a vector, or a specific category, of the dataset. Say “anime” or “science fiction,” or something specific to the camera, like “cinematic.”

    A universe of possibilities in the data set.

    The intersection of these vectors is what generates a unique seed of a pattern. This pattern is an absolutely new and unique generative image.
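One way to build intuition for moving through latent space is simple vector interpolation: treat each style as a point and walk between them. The vectors below are made up for illustration; real embeddings have hundreds of dimensions.

```python
# Toy "latent space": each style is a point (vector); blending styles means
# walking between points. These 3-D vectors are invented for illustration.
anime = [0.9, 0.1, 0.0]
scifi = [0.1, 0.9, 0.2]

def lerp(a, b, t):
    """Linear interpolation: t=0 returns a, t=1 returns b."""
    return [ai + t * (bi - ai) for ai, bi in zip(a, b)]

halfway = lerp(anime, scifi, 0.5)  # a point "between" the two styles
```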


    Demo: Image Making with Stable Diffusion.

    As an animator, of course I am interested in robots. Giant robots. When I began using image generation models, I began prompting mechs and science fiction.

    Early Giant Robot Experiments (Disco and Stable Diffusion)

When I began, I used the image creation model Disco Diffusion in Google Colab notebooks. A year later, I am creating more compelling imagery with the more robust and accessible Stable Diffusion. By using the platform Leonardo.ai, I can iterate far quicker than I ever did in the Colabs.

    This is evidence that coherence and fidelity of images will only accelerate with better models, tooling and workflow.

    The most recent Mech Experiment from Summer 2023. (Stable Diffusion)

    I recommend prompting images yourself with the Stable Diffusion platform Leonardo.ai.


    Large Language Models

ChatGPT, and similar models like Google Bard, are called large language models. They are composed of a giant transformer. Not like Optimus Prime, but a huge predictive network that chooses the next word in the sentence. You know how your email suggests a response? ChatGPT is the mega-mega version of that.

However, in addition to the transformer, it also learns to understand you. Natural language processing is a species of AI that learns to understand the intent, or underlying meaning, of a sentence. The bigger and better the model, the better it “understands” you.
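The “choose the next word” idea can be sketched with the simplest possible language model: count which word follows which in a tiny corpus and always predict the most frequent follower. ChatGPT’s transformer is unimaginably more sophisticated, but the prediction framing is the same.

```python
from collections import Counter, defaultdict

# The simplest possible "language model": count which word follows which
# in a tiny corpus, then always predict the most frequent follower.
corpus = "the robot walks and the robot talks and the robot walks".split()

followers = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    followers[prev][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    return followers[word].most_common(1)[0][0]
```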

Large Language Models like ChatGPT are both NLP and a one-way transformer.

Large Language Models have the potential to be THE MOST disruptive technology of the bunch. Because they are text, they can increasingly understand and generate code, essays, books, scripts, emails, and spreadsheets. When they become what’s called “multi-modal,” they will probably also generate films, games, animation curves, and 3D assets.


If you haven’t already started using ChatGPT, I highly recommend you start. Learning to prompt LLMs in your discipline and workflow will be critical.


    Foundational Models

    Sam Altman (Wikipedia)

In a recent interview, Sam Altman, the CEO of OpenAI, foretold a vision of how companies with multimillion-dollar processing power will sit at the base of a new artificial intelligence software ecosystem. A gold rush of smaller companies will use the APIs and “tune” these massive models to provide customizations for specialized markets.

    These giant models are called “foundational models.”

We need to think of this disruption as a new operating system. Foundational models will sit at the base, with custom datasets tuned into them. Just as I trained myself into the Iron Man picture, we all will train our custom data into a foundational model. Our decision, then, will be which one to use.
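The “tune your data on top of a foundational model” idea can be sketched as function composition: a big frozen base, with a small custom adjustment layered on top. The functions and numbers here are purely illustrative.

```python
# A frozen "foundational model" (a stand-in for a giant pretrained network).
def base_model(x):
    return 2 * x

# "Tuning" layers a small custom adjustment on top without touching the base.
def make_tuned(base, offset):
    def tuned(x):
        return base(x) + offset
    return tuned

# Each specialized market gets its own cheap tweak of the same shared base.
studio_model = make_tuned(base_model, offset=1)
```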

These large text models are currently owned by big companies like OpenAI, Amazon, and Elon Musk’s new xAI. The graphics models, which contain imagery and animation data, are also growing to robustness within the servers of Adobe and NVIDIA.

    Foundational Models will sit at the base of the AI Software Ecosystem

NOTE: Since giving this presentation, Stability.ai has gained massive traction on GitHub, localizing open-source alternatives. Should this decentralization continue, we may see a trend toward localized models instead of foundational ones. It’s too early to tell, but it is indicative of a field moving at light speed.


    So let’s review…

Neural networks are used for pattern recognition and prediction. Generative models query latent space to generate usable text, images, and (soon) many other things. LLMs are the real disruption of intelligent software.

Foundational models are the architectural base that everything will be built from.

    It will be up to us to tune our data sets, to fit our specific needs.


    Wonder Dynamics Demo

The following is a demonstration of the cloud-based software Wonder Dynamics. This is a promising workflow for the feature film or visual effects artist. You’ll hear that my five-year-old is just as impressed with the robots as I am.

    Try Wonder Dynamics yourself by signing up for early access.


    Education

    Education will need to adapt to be relevant in an AI world. Here are three suggestions.

    Listing the Prompt, Model and Sizing is good documentation.

    Documentation


    First, we must double down on documentation. I encourage students to use artificial intelligence, but they MUST document their process.

If they start passing off AI output as their own work, that’s cheating. However, a well-documented workflow is just good practice. Learning documentation helps them internalize concepts like attribution, licensing, and intellectual property.

    Individuals should learn to build datasets that are specific to their skill set.

    Data Sets

    We must learn to create our own protected datasets. We must also learn to be aware of terms of use. Using stolen data can lead to lawsuits and a variety of really bad outcomes.

    My livelihood will be my animation curves. Your livelihood, be it concept artist, or writer, will be the information that you create to train your specific models for production.

    Your AI data will need to reflect your work and style. And that data (and your individuality) will need to be protected.

    We won’t build software the same way.

    Mind Set

We need to shift our focus from localized applications and start thinking about interoperable, networked ones. Software won’t be architected locally; it will increasingly be a series of trained datasets, most likely in the cloud.

    When I speak with people about the future of artificial intelligence, and the concerns about automation, I’m often asked:

    “What’s left for us to do?”

The only answer I’ve been able to come up with is creativity, ideas, and community.

    My hope is that if we stay true to these principles, we will maintain a human value in a world where our labor is automated.

    If interested in more of my experimentations with machine learning and AI, please see this post here.

    (Fin)

    Thanks for Reading. See you Next Time. (Stable Diffusion)

For those just getting started

    Some great learning resources that really helped me out when I was first learning about AI.


This presentation and supporting materials are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license. (For more information: CC BY-NC-SA 4.0)



  • To Teach or Not to Teach Artificial Intelligence

    To Teach or Not to Teach Artificial Intelligence

While watching academic (and non-technical) environments, I’m beginning to see that the pervasive instinct is to ban AI in writing and artwork curricula. The argument is that students will use it to facilitate cheating, or that its use will “cheapen” the learning of the art form.

    But increasingly, I worry that this approach will not prepare the next generation for the true world they are about to enter. Can we open the debate in a way that best helps the students, and limits the potential for academic dishonesty?


    Watching AI Enter the Animation Pipeline

I spent 20 years as a professional animator. My passion is story, but also the technical implementation of its use in real-time technology. That is, I really love making characters move in video game worlds.

The satisfaction comes from the knowledge of complex systems, storytelling, and the final result of days and days of labor coming to fruition. I bring something that was seemingly dead to life. I do it with an extraordinary amount of labor, skill, research, iteration… and often, the emotional pain of fighting my self-doubt.

Above you can see a screen capture of my desktop. I am currently animating (experimenting) with the Unreal Engine’s “Manny” in Autodesk Maya. That means I move and set the position of the arms by keying. Me, a human.

    Since I derive so much personal value from the process, it was especially difficult for me to deal with the idea of automation, when I first discovered machine learning on a motion capture stage several years ago.

    Machine learning is the term for a class of mathematical models within the field of Artificial Intelligence. They are having a sort-of “golden age” of development. Training a machine learning model to move a face of a character, or to automate the rigging of a digital skeleton can drastically reduce the labor involved (in some cases by a factor of 100).

As these models have matured, more and more of the pipeline is becoming automated. Below you can see one of the more impressive workflows, called Learned Motion Matching. This alone will drastically reduce the up-front creation time of video game character controllers.

    Source: https://montreal.ubisoft.com/en/introducing-learned-motion-matching/

I remember a sleepless night where my stomach physically hurt as I came to the realization that AI had the potential to remove the need for animators themselves. (I wrote about it back then.) I soon decided I needed to frame this problem the way Scottish philosopher David Hume articulates the “is” vs. “ought” problem.

    I was resisting the understanding of artificial intelligence because I was clinging to what I thought ought to be. Artists ought to be the drivers of the labor. Creativity ought to be part of the emotional struggles of the individual. Not a data set. Right?
    Only when I began to confront what AI is, was I able to begin to research and understand it.

    This is where we are at.

    So then, what AI is…

    AI is reaching a level of (highly) functional use for artist workflows in animation, music, editing, visual effects, illustration and many others. It’s being deployed in our software for everyday use. AI image making models are reaching critical mass now, because of the prolific sharing ability of the internet.

Because of the nature of our digital world, we have larger and larger datasets, and the training of models is becoming more accurate and robust.


Predictive systems will finish our work for us in the art we create, and they will understand not just how to animate, but our “intents” as the animators using them. We are at the cusp of a productivity boom unlike any humankind has ever seen.

Some examples from this past summer’s experimentation with Stable Diffusion.

The challenge in the near future is not with the mathematical models, or even the datasets being collected. These are nearing an astounding level of image fidelity. The challenge is the interface design and the UX: the accessibility to the non-coding masses.

Many, *many* software companies are rushing to create this accessibility through new interfaces, plug-ins, and automated things we don’t even notice. The “old guard,” like Adobe, seems to keep pace by buying up new talent. But there is a sizable crop of generative start-ups targeting other graphics markets. The focus, driven by capitalist desire, is mass adoption. This leads to ease of use, and exponential data collection.

    NVIDIA is now training AI agents “for decades” in real time simulated environments.

But capitalism isn’t the only driver. Stable Diffusion, a popular CLIP-guided diffusion image maker, was released open source. Within days, new innovations appeared in Google Colabs around the world. I suggest searching the #stablediffusion tag on Twitter and marveling at an endless stream of un-bundled experimentation. The acceleration of AI is not just driving the market economy; it is inventing the distributed, licensed one as well.

    Can it be Avoided?

    Students can actively choose a variety of new applications to automate the writing of their essays (or even the accompanying illustrations!)

    Increasingly, that choice will be removed from them.

The way email checks your spelling and updates as you write, our software will make intuitive predictions about what we’re creating. It will make predictions and create elements for us, all in real time. I expect it to simply be the default in Photoshop, Visual Studio, Word, and many others. The AI will just be there.

    AI is here. It can make renaissance paintings of power rangers.

It cannot be avoided or banned.

    Renaissance Artist Painting Power Ranger – Stable Diffusion.

    My Suggestions

    Here are my three suggestions of how you, as a teacher, can integrate it into your classroom. But most of all, you should reinforce your human connection to your students.

    I. Communicate


    Talk about it openly and honestly with your students. If you’re scared, tell them that you’re scared. If you are concerned about cheating or the way it’s being used — be honest about it. Be vulnerable about it.


Opening honest debate allows for the many, many shades of “grey area” that might arise should a student turn in work. Banning it is fine, but you need to be deliberate about its name and function.

    You can’t say “no AI.”

    You will need to be specific: “No Clip-diffusion enhancement for this exercise today.”

The debate needs to stay open should issues arise, regardless of whether the student is intentionally cheating or the software has simply finished the work for them.

    One of my many many failed generative experiments, Science Fiction Alien Mech, Combat Technology, Disco Diffusion

    II. Understanding


We need to have a common understanding of the types of models. Artificial intelligence is divided into classifications like neural networks, GANs, language models, CLIP-guided diffusion, etc. Students should understand the difference between training a neural network and training an agent with reinforcement learning.

Different applications of machine learning and artificial intelligence will propagate into different verticals. Depending on your subject matter, certain models and architectures will fit better than others. As an animator, my primary focus is motion models. Those might not be as interesting to a writer working with language models.

    Students should have a sense about what AI is actually doing, not as some “magic thing” operating in the background. For each model, there are always a specific set of inputs and a resultant set of outputs. Even without a computer science or mathematics background, the classification of models is learnable at a simple level.

For help with this, I recommend Melanie Mitchell’s book, “Artificial Intelligence: A Guide for Thinking Humans.” This high-school-level book clearly explains the categories of AI and offers direct, non-technical explanations of how they operate. (A link is below.)

Audrey Hepburn, Black and White, Art Wall Painting, Stable Diffusion. I spent a late-night session generating black-and-white images of Hollywood.

    III. Compassion


The last thing is to approach your students with compassion. You must understand that these students are already interacting with extremely powerful algorithms. Their content stream from TikTok is being algorithmically constructed and tuned to their emotional impulses. They may think they are simply texting with friends, but they are already being gamed into large datasets.

These processes are reinforced by the social networks amongst their friends. Understanding their position and their actions may become increasingly difficult.

We will marvel as things start to be completed for us. For them, it will be normal. I think the youth should learn to count before getting a calculator, and I think they would appreciate concepts like voice models or latent space before they’re everywhere.


Clearly, I have a big concern about our adoption of artificial intelligence. I’ve accepted that the technology will be here, just as electricity or the internet simply arrived. I hope we as teachers can openly learn to accept its presence in our curriculum. We should learn to use it, but speak openly about its ethics.

    I hope you will take a moment to do your own research on this before coming to conclusions.

    Regardless of whether this line of thinking is fantastical or 100% correct, I understand this to be a contentious issue. I welcome open debate, as we should all participate to figure this out together.

    Thanks for Reading.

    Reference:

    Here is the link to Melanie Mitchell’s Artificial Intelligence: A Guide for Thinking Humans

    I might also recommend:

    Two Minute Papers: https://www.youtube.com/c/K%C3%A1rolyZsolnai

    Superintelligence, by Nick Bostrom: https://www.powells.com/book/superintelligence-paths-dangers-strategies-9780199678112

    Machine Learning for Art: https://ml4a.net/


  • Discovering Disco Diffusion and Prompt Design

    Discovering Disco Diffusion and Prompt Design

Dalle2 is a generative image model by OpenAI. It’s a transformative step in computer graphics.

Dalle2 is in its early invite stage, which means (as of July 2022) I don’t have access. (Hurry up, OpenAI, please?) Driven by a need to understand these models, I soon stumbled upon a model called “Disco Diffusion.” While not of the same power as Dalle, Disco Diffusion is indicative of the future of generative media. And enormously fun to play with as well. Everything here was created with this Disco Diffusion Google Colab notebook and a free account.

    —-> (https://colab.research.google.com/github/alembics/disco-diffusion/blob/main/Disco_Diffusion.ipynb)

    My first animation test using Warp in DiscoDiffusion

    AI Comics

Like most things, I start by clearing my creative blocks with comics. So, to find my footing, I thought about trying to get an AI to generate its own comics.

Since the joke of a comic is somewhat separate from the art, I “prompted” new sentences with the AI-powered InferKit. This led me to iterate on the writing until I felt the AI had the joke it wanted to tell.

I took the response prompt from InferKit and fed the prompts into a freely available model called Craiyon (previously Dalle-mini), using other tags for cartoonists and comics to shape the image.

text_prompts = {
    0: [
        # Subject
        "--- Output copied from InferKit sentence",
        # Description
        "two character", "single panel", "cross hatching",
        # Artist
        "gary larson", "new yorker",
    ]
}

The first few felt weird. However, liking the possibility, I experimented with a “publishable” version by merging in Photoshop. It felt more authentic, and put structure to the result.

I then tried the same exercise with Disco Diffusion. While the output turned out smushier, the options for experimentation within the Colab notebook were numerous and, being open source, inspired a deeper dive. Soon I had the knack of it.

    AI Watercolor Paintings

While working, I set up another instance to do some landscape watercolors. I wandered through prompts of different places, like Savannah or Newport, but didn’t see much evidence of the locations. I grew bored of it pretty quickly.

    AI SciFi Concept Art

I decided to move into concepting science fiction. 1. Because, why not? And 2. I discovered Disco Diffusion has a trained PulpSciFi dataset, which I switched to for a bit.

    I experimented with key words like “Ralph McQuarrie” and “Star Wars.”

Finding other artists playing with the model online, I discovered the practice of adding “trending on artstation.” It helped the results; however, it made me feel a little ill for the artists who are actually trending on ArtStation. It also led me to a recognizable paradigm, and I felt my prompts pulled into “cyberpunk” and “mech.”

    AI SciFi Space Photography

I then found a really interesting breakthrough: someone who started listing camera lenses in their prompts.

    This inspired me to take the more conceptual ideas I was working with and make it more photographic. I chose prompts that focused on realism, with references like “NASA” or “Star Trek.” I also set detailed instructions for camera lenses, like “27mm” and “tilt shift.”

    AI Alien Mech Technology Photography

    But eventually I wanted to create something… alien I guess.

    I began to organize my prompts into sections, and mess with individual variables to get more of what I was looking to generate.

I experimented with a number of variants of lenses and lighting, even trying tilt shift. The best results, according to me, are above.

    Most of the imagery you see came from what is becoming a bit of a “base prompt” for me:

"Mech Suit with military grade weapon systems in an alien world",
"Sci fi, Iron Man, technology, attack",
"photography",
"27mm",
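If I recall the notebook’s conventions correctly, Disco Diffusion prompts can also carry a weight after a colon, where a heavier weight pulls the image toward that phrase. A hypothetical weighted version of my base prompt might look like this:

```python
# Hypothetical weighted prompts in the Disco Diffusion text_prompts style.
# The ":3" and ":1" suffixes are weight annotations, not part of the text.
text_prompts = {
    0: [
        # Heavier weight: the subject dominates the composition.
        "Mech Suit with military grade weapon systems in an alien world:3",
        # Lighter weights: style and lens act as gentle nudges.
        "Sci fi, Iron Man, technology, attack:1",
        "photography, 27mm:1",
    ]
}
```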

    AI Animation? AI Shorts?

    I’ve only just begun to explore!

If you want to see some extraordinary work, you should consider joining the DD Discord.

    https://discord.gg/fzevz8Z4

    Mind Blown

The implications of the democratization of AI art will be extraordinary. The architecture of CLIP and the diffusion models, the design process of prompt engineering, and the intellectual property implications are probably more than this blog post can cover.

    As I wrap my head around the use, I’ll update here.

    As always, thanks for reading. Happy prompting!


  • Machine Learning and Animation

    Machine Learning and Animation

    As the tools become more accessible, neural networks may soon be driving the performance of characters

    A machine… can’t possibly… ?!?

I find myself involved in conversations where I ask graphics artists if they feel threatened by AI (artificial intelligence). Many animators think that being in a “creative” occupation means they are safe. For many, it’s okay to think automation will wipe out auto-callers and Uber drivers, but keying a character performance is something we won’t have to worry about for a while.

    I used to be one of these people, but now, I’m not so sure.

    Fake it til’ ya Make it

The internet continues to show us examples of “deep fakes,” where mathematical image recognition models have allowed for the creation of fully manufactured digital characters. Upon viewing this, there is an initial reaction of “that’s crazy,” followed by a dismissive wave of the hand. I think there is a misunderstanding of what the neural network is really doing.

It’s easy to confuse this “puppeting” of Vladimir Putin or Elon Musk as simply a real-time one-to-one, like that of a performance capture system. Or to assume the video is composited or mixed into the pixels in some way.

    But this technology is actually generating an entirely new performance.

    Neural Networks

Honestly, I have only just begun my journey to understand how neural networks work. In many ways they are still a black box of mathematics, full of PhD-level vocabulary that a humble unfrozen cave man animator like me can’t wrap his mind around. I can’t tell the mathematical difference between a Lambert and a Phong, but I absolutely know the difference in the way they look and when to use them. Similarly, it would be hard for me to describe the calculations in a Generative Adversarial Network, but I’m starting to see the possibilities of what it can produce.

With only a few weeks of reading – and the help of classes like ITP@NYU’s Machine Learning for Artists (https://ml4a.github.io/classes/) – I’m fairly confident that, at its core, a neural network uses a whole lot of data to learn how to make an input on one side come out as something generative on the other. It’s a system that makes guesses (or predictions), and the more data you cram into it, the better those guesses get.

    The whole point of a neural network is to recognize features, or patterns that are not explicitly called out in the data set. These patterns are what the network uses to construct wholly new entities (guesses), that look and feel almost entirely like the original data set. Essentially, with enough data, it can fake a new entity that is nearly indistinguishable from the original. The question becomes:

    What data do we use to animate?

    From Light Cycles to Dinosaurs

In 1982, to create the illusion that 3D objects moved across the screen, the animators on Tron had to stand over the shoulders of engineers and instruct them to plot out thousands of individual x, y, z coordinates. By the mid-1990s, instead of entering individual coordinates, software UIs had become accessible enough to allow animators to key and refine the motion of an object. This made moving dinosaurs in the computer a reality.

    The curve editor, now commonplace in animation software like Maya and Blender, took 15 years to imagine and develop. It is a way for artists to visualize the acceleration data of 3D objects, and a way for those artists to communicate their intent to the software.

    Whether the data is from captured human performance or “hand keyed,” curve editors are the common language of motion data in the animation world.
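At its simplest, the curve editor’s job is just keyframes and interpolation between them. A minimal sketch, using linear interpolation (real editors use Bezier or spline tangents):

```python
# Keyframes as (frame, value) pairs, e.g. a value that rises and falls.
keys = [(0, 0.0), (10, 5.0), (20, 0.0)]

def sample(keys, t):
    """Evaluate the curve at frame t by interpolating between the two
    keyframes that bracket it."""
    for (t0, v0), (t1, v1) in zip(keys, keys[1:]):
        if t0 <= t <= t1:
            u = (t - t0) / (t1 - t0)   # 0..1 position between the two keys
            return v0 + u * (v1 - v0)
    raise ValueError("frame outside key range")
```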

    Animator Intent = X, Animated Character = Y

    So, if a neural network could be fed motion data and “learn” how a human moves, it might be possible for it to learn what the animator is trying to communicate and generate predictions of what makes a good animated performance.

Actually, the development of this "animator intent = X" variable might be the easy part. We have enormous databases of scrapable human motion, considering that existing deep learning models like PoseNet can pull pose data from a nearly infinite library of YouTube videos.

The features that lead to the Y variable are harder. The question I'm going to scratch at is:

    Could a model be developed where the features generate a great animated character performance?

    From Academics to Artists

Until now, machine learning was something only academics played with – and now big tech, as those early scientists have been gobbled up by titans like Uber, Facebook, Google, and Tesla.

Like the engineers taking light cycle coordinates, only TensorFlow python-heads could enter the parameters for an ML model. However, a new wave of UI thinking is making this approachable to us unfrozen caveman animators. Unity, a powerhouse 3D game engine, has built "ML-Agents" into its software to facilitate ingesting TensorFlow models. And a standalone GUI called Weka allows for non-coding explorations of machine learning models.

However, a new visual interface called RunwayML is the first of what I see as a wave of artist-friendly machine learning tools. There will be less and less need for artists to get their hands dirty in the code of neural networks, as services like this provide an accessible way to integrate these models directly into existing animation workflows.

    Maybe even accessible to a cave man animator like me.

    That’s it for this week, thanks for reading. Please subscribe and join the conversation.

    Reference:

    ITP@NYU / Machine Learning for Artists: https://ml4a.github.io/classes/

    Daniel Shiffman’s Coding Train: https://www.youtube.com/channel/UCvjgXvBlbQiydffZU7m1_aw

Machine Learning Mastery: https://machinelearningmastery.com/machine-learning-mastery-weka/

RunwayML: https://runwayml.com/

    The TensorFlow Playground: https://playground.tensorflow.org/

    PoseNet Machine Learning Model: https://github.com/tensorflow/tfjs-models/tree/master/posenet

Andrew Price (the Blender Guru) made a similar argument: https://youtu.be/FlgLxSLsYWQ


  • Introduction to Machine Learning for Artists

    Date: April 10, 2021

    IntroML

    This was a presentation on machine learning for artists focusing on accessibility, convolutional neural networks and generative adversarial networks. I believe that the next generation should understand and practice the use of machine learning, as it will evolve into a highly accessible and productive technology in computer graphics. This was presented online to a group of about 60 students.


  • Discovering Machine Learning Art Production with RunwayML

    Discovering Machine Learning Art Production with RunwayML

Here are my first experiments with machine learning art in RunwayML. I've also been taking Artificial Images: RunwayML Deep Dive with Derrick Schultz, which has been both a helpful community and a resource. If you are interested in this stuff, you should follow his YouTube channel.

This represents part-time work over the last two weeks. Some of my experiments, with hypotheses, are below. It's super fun!


    Incredible nyeGAN

    Hypothesis/Purpose:

“I wonder if I could make one of those ‘smooshy-things’ I see on Twitter?”

    Approach:

I made 3 minutes of funny faces at the camera, using OBS Studio to capture the video. I brought the video into After Effects and exported the frames at 30 frames per second into a “data” folder in my AE project structure.

I uploaded the frames to Runway’s StyleGAN2 training, starting from the Faces dataset. I let it run for 5,000 steps, which took about 5 hours on their servers.

I then edited the first moments of The Incredibles and output the frames from After Effects. There were about 1,200 frames; I roughly edited out the parts where Mr. Incredible wasn’t facing the camera directly.

    I fed it into my nyeGAN using the training section. I let it cook for about 2 hours, but stopped it when I saw that Mr. Incredible was more “overwriting” than “blending” with my original frames.

    Data:

    Footage A: 3 minutes of me making funny faces to the camera

Footage B: The opening interview sequence of Disney/Pixar’s The Incredibles.

    Conclusion:

A lot of very similar data, like my funny-faces video, doesn’t really make for interesting content. I was also surprised that the two datasets didn’t mix more.

For my next GAN experimentation, I want to scrape datasets of like things and actually think about their diversity and makeup.


    Some Pumped Up Motion Tests

    Hypothesis/Purpose:

    I want to see what comes out of the motion models.

    I’ve seen work online using DensePose, and First Order Motion Model, that I wanted to replicate. Eventually, I want to use something like pix2pix to “puppet” characters.

    Approach:

Using the awesome work of Marquese Scott (Twitter: https://twitter.com/officialwhzgud), I ripped his “Pumped Up” video from YouTube using 4K Video Downloader. I exported a section of frames from After Effects and ran it into a workspace in Runway.

I also took imagery of Goofy from the internet and painted over a single frame of the video to test First Order Motion on a full body, along with Liquid Warping, which seemed worth a shot.

    Conclusion:

OpenPifPaf tracked well to the video. And though PoseNet doesn’t render video out of Runway, I found its data (JSON) was a bit better. First Order Motion must be tuned in Runway for faces, and didn’t quite work for the full body.

I like working with PoseNet, though I’d love it to render out of Runway.

My most successful takeaway was the PoseNet export. The time code and positions are normalized between 0 and 1. Within that space, it outputs a series of 17 keypoint positions as X and Y data.

    How do I get that data into an animation program?
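One plausible answer, sketched below: de-normalize each keypoint back into pixels and dump everything into a CSV that a script in After Effects or Blender could turn into keyframes. The JSON field names here are my guess at the shape of the export — check the actual Runway file before relying on them.

```python
import csv
import json

# Sketch: turn PoseNet-style JSON into a CSV of per-frame pixel positions.
# The field names are assumptions -- verify against the real Runway export.

WIDTH, HEIGHT = 512, 512   # size of the normalized video area

sample_export = json.loads("""
[
  {"keypoints": [{"part": "nose", "x": 0.5, "y": 0.25},
                 {"part": "leftWrist", "x": 0.7, "y": 0.6}]},
  {"keypoints": [{"part": "nose", "x": 0.52, "y": 0.26},
                 {"part": "leftWrist", "x": 0.74, "y": 0.55}]}
]
""")

rows = []
for frame_number, frame in enumerate(sample_export):
    for kp in frame["keypoints"]:
        # De-normalize the 0..1 coordinates into pixels.
        rows.append([frame_number, kp["part"],
                     kp["x"] * WIDTH, kp["y"] * HEIGHT])

with open("pose_data.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["frame", "part", "x_px", "y_px"])
    writer.writerows(rows)

print(rows[0])   # [0, 'nose', 256.0, 128.0]
```

From a CSV like this, a per-frame position is trivial to paste onto a layer or null object with a small script on the animation side.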

    BalletDancin’

    Hypothesis/Purpose:

Can I normalize erratic movement? Can I render from Runway into my After Effects pipeline? If I can get the x and y positions, what ways can I get Z-axis data?

    Approach:

I found a ballet clip on YouTube, “The Top 15 Female Ballet Dancers.” I wanted to isolate an individual who was moving around the stage. I used basic tracking in After Effects and adjusted some keys to track her to a red crosshair in a 512 x 512 area. Basically, I centered her in the area to be normalized.

I then ran it through PifPaf and DenseDepth. My purpose with DenseDepth was to see if I could get any sort of Z data.
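That centering step boils down to computing a crop box per frame from the tracked position. A rough sketch, with made-up frame sizes and track data:

```python
# Sketch of the normalization step: given a tracked point on the dancer
# for each frame, compute a 512 x 512 crop box centered on her, clamped
# so the box stays inside the source frame. (Numbers are illustrative.)

SRC_W, SRC_H = 1920, 1080
CROP = 512

def crop_box(cx, cy):
    """Top-left and bottom-right of a CROP x CROP box centered on (cx, cy)."""
    x = min(max(cx - CROP // 2, 0), SRC_W - CROP)   # clamp inside the frame
    y = min(max(cy - CROP // 2, 0), SRC_H - CROP)
    return x, y, x + CROP, y + CROP

# Tracked positions from After Effects, one (x, y) per frame:
track = [(960, 540), (100, 540), (1900, 1000)]
boxes = [crop_box(x, y) for x, y in track]
print(boxes[0])   # (704, 284, 1216, 796)
```

In After Effects I did this by hand with a tracked null; a batch version of the same math would let any clip be normalized automatically.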

    Conclusion:

    Pipe works on the Runway side. I need to figure out how to get the data into some animation software.

    Spiderman and his amazing GANs

    Hello World!

    Purpose:

    StyleGANs (with Picasso or Van Gogh) are kind of the “Hello World” of Machine Learning Art. Runway makes it easy to try it.

    Approach:

I ripped three ’80s cartoon openings from YouTube. I then chose preset StyleGANs in Runway and fired them through. They took about 10 to 15 minutes apiece.

    Conclusion:

A necessary exercise to get into the software, and a great way to understand what the models are doing behind the scenes. Here are a couple of images from this little exercise that felt sort of successful. The other two cartoon opens are on the nytrogen YouTube channel.

    Machine Learning is a scream!

  • Runway Deep Dive: Class Presentation

    Date: May 18, 2020

This was my final presentation from Derrick Schultz’s Runway Deep Dive class, which I took in the summer of 2020. I highly recommend Derrick’s class and YouTube channel for information about machine learning image making.

    I review the projects I worked on as I played with machine learning models in the RunwayML platform.


  • YangGAN: Putting Andrew Yang’s Essence into AI

    YangGAN: Putting Andrew Yang’s Essence into AI

This motion that you see of Andrew, squishing around above, is what’s called a latent space walk of a generative adversarial network, or GAN. GANs are super-duper-powerful sponges of data points that are plotted in 512-dimensional space. Let’s recall that we fail to think well in three-dimensional space most of the time. So, 512 dimensions is way out there. (Don’t think about it too long, or blood will shoot out your nose.)

Effectively, by “spatially” moving from one Andrew-Yang-head data point in 512-dimensional space, along a Yang vector, to another Andrew-Yang-head data point, the image morphs. It’s how we will move virtual characters soon enough.
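Stripped of the generator itself, a latent space walk is just interpolation between points. A sketch of the walk (the vectors here are random stand-ins for real latent codes):

```python
import random

# A latent space walk, boiled down: slide between two points in the GAN's
# 512-dimensional input space. Feeding each in-between vector to the
# generator would yield one frame of the morph.

random.seed(42)
z_a = [random.gauss(0, 1) for _ in range(512)]   # one Andrew-Yang-head point
z_b = [random.gauss(0, 1) for _ in range(512)]   # another one

def lerp(a, b, t):
    """Linear interpolation between two latent vectors at t in [0, 1]."""
    return [x + (y - x) * t for x, y in zip(a, b)]

# 30 steps along the "yang vector" from z_a to z_b:
walk = [lerp(z_a, z_b, step / 29) for step in range(30)]

print(len(walk), len(walk[0]))   # 30 frames, 512 dimensions each
```

Render each of those 30 vectors through the trained generator and you get the squishing animation above.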

This particular flavor of GAN is called StyleGAN2, from NVIDIA. I lovingly trained it with 6,000 head shots from Andrew Yang interviews. All I needed to do to collect this data was scrape his YouTube channel and batch export the videos as individual frames. Using the Python tool autocrop, I was able to very quickly amass 15,000 frames of Andrew Yang from the chest up. I culled it down to about 6,000, throwing away images that didn’t fit nicely in the box with most of his face facing the screen.

In prototype versions of the GANs I’ve made, I discovered the color palette was all over the place. On this GAN, I effectively had to “normalize,” or limit, the range of the colors. After experimenting, I stumbled on a look and batch-processed the training frames with a Nintendo Game Boy color filter.
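The filter itself is conceptually simple: snap every pixel to the nearest of the four classic Game Boy greens. I did the actual batch processing in other tools, but the core color mapping looks something like this sketch (palette values are the commonly cited DMG greens):

```python
# The core of a "Game Boy filter": snap every pixel to the nearest of the
# four classic DMG greens. In practice you'd batch this over frames with
# Pillow or ImageMagick; this is just the mapping idea.

GAMEBOY_PALETTE = [
    (15, 56, 15),     # darkest green
    (48, 98, 48),
    (139, 172, 15),
    (155, 188, 15),   # lightest green
]

def nearest(pixel):
    """Map an (r, g, b) pixel to the closest palette color."""
    def dist(color):
        return sum((a - b) ** 2 for a, b in zip(pixel, color))
    return min(GAMEBOY_PALETTE, key=dist)

frame = [(0, 0, 0), (255, 255, 255), (130, 160, 20)]  # a tiny "image"
print([nearest(p) for p in frame])
```

Squashing every training frame into the same four colors is what kept the GAN’s output palette from drifting.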

Game Boy colors because I love Super Mario Land.

I did my GAN development with RunwayML, a nifty little program that allows me to focus on running models and not drinking my face off when my Python dependencies don’t install.

    OK, So why the YangGAN?

    Jokes aside…

Generative adversarial networks, computer vision, and networks of computers collectively rendering will revolutionize computer graphics. We will create near-reality very soon, and while we are doing it, destroy the need for human labor in the current methods of creating value. We should be talking openly about the dangers of artificial intelligence and economic collapse.

    I believe Universal Basic Income is the most realistic thing we can collectively do as a country to save ourselves.

Our life expectancy is dropping, we’re fighting our neighbors, and we are letting our worst selves consume us. We actually need to do something.

I’m asking you to please investigate universal basic income. Andrew’s organization is the best I see going right now. #yanggang baby.

    Links and Reference

    For more information about Andrew Yang and his efforts at Humanity Forward, please visit: https://movehumanityforward.com/

Time did a semi-ok mainstream piece on it:

    https://time.com/4737956/universal-basic-income/

This is a bit heady, but Ian Goodfellow is the guy who more or less put a generator and discriminator together to create the concept of a generative adversarial network. https://www.youtube.com/watch?v=Z6rxF…

    You should download and play with models yourself with RunwayML: http://runwayml.com

Also, I stole the “blood shoot out of your nose” bit from Lewis Black.

    More of my Machine Learning work will eventually be at http://nytrogen.ai


  • Machine Learning Experiments

    AI is rapidly advancing into computer graphics. It is moving far faster than I imagined it would.

I believe AI should be taught and experimented with in the educational process. For more of this thinking, please read my post here.

    Otherwise, have fun exploring some of the nonsense below.

    Stable Diffusion and Dreamstudio

I have been using Stable Diffusion fairly obsessively since its release in the fall of 2022. Below are some samples of my work as of November 2022.

    Imagery and Photography Experiments
    Self Portraiture with Stable Diffusion / Dreamstudio
    Science Fiction, Mechs, and Technology Experimentation

    ShatnerGAN

For whatever reason, I spent a week ripping Captain Kirk shots from the original Star Trek series and training NVIDIA’s StyleGAN2. This experiment had 15,000 close-ups of James T. Kirk, and the model was trained for 5,000 steps.
Software: 4K Video Downloader, Autocrop, and RunwayML

    Put a GAN on it! – Stealing Beyonce’s Motion Data

During my class with Derrick Schultz, I used Beyoncé’s “Put a Ring on It” to experiment with a number of models available within the AI model bridge software called RunwayML. This was the first time I integrated machine learning models into my After Effects workflow. While it’s essentially pure fun, it allowed me to experiment with a number of the models and get a sense of their capabilities.
    Software: Runway, After Effects, Photoshop

    YangGAN: Andrew Yang’s Essence in AI #yanggang #humanityforward

After my Captain Kirk GAN, I decided to try another human-trained GAN. I found a clip of Andrew Yang speaking about the advancement of AI journalists and was inspired to match the audio with a latent space walk of a trained GAN. It was trained on about 4,000 images of Andrew Yang that I scraped from various interviews. The head was cropped and run through an image adjustment recipe I developed with Python and ImageMagick. I trained the GAN in Runway using StyleGAN2.
Software: 4K Video Downloader, Python (Autocrop and ImageMagick), and RunwayML

    Ride the Train! – Experiments with Image Segmentation

This was an experiment playing with image segmentation mapping. I had seen a number of experiments with image mapping but little work using it as a renderer. I used shaders in Maya that were matched to the image mapping setup in RunwayML, rendered each layer through Runway, and composited them in After Effects. The technology is far from functional, but the promise is there.
    Software: RunwayML, Autodesk Maya, After Effects

    Machine Learning Motion Model Experiments

My primary interest in machine learning is experimentation with animation data and motion. These were some experiments I ran to see which motion model got which result. My takeaway was that the clips needed to be “normalized” to get a good read, which is why I created a template to track the video.
    Software: 4K Video Downloader, Autocrop, and RunwayML

    Fun with CheapFakes

This is a fun model and easy to use. I scraped some Arnold Schwarzenegger clips from YouTube and had a friend, Daron Jennings, improvise some clips. It was simply a matter of running the model with the appropriate components and then compositing in After Effects. It might be something fun to use in the future.
    Software: Wav2Lip, After Effects


  • Artists Painting with Artificial Intelligence

    Artists Painting with Artificial Intelligence

    GauGAN, peanutsGAN, and DataPaint


    The Nytrogen newsletter follows the disruptions happening to the computer graphics industry. Each week, I send out my thoughts on the technology, work flow, and artistry in the evolution of real time animation production.

A few months back, NVIDIA showed off a product called GauGAN. Using a pair of neural networks (called a generative adversarial network, or GAN), they trained a system on hundreds of thousands of outdoor photographs. Armed with this network, trained on that enormous data set, they created a graphical user interface, or GUI. Users create “segmentation maps” that allow anyone to sketch out the merest indication of a landscape and have it recreated with the AI’s best guess.
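A segmentation map is less exotic than it sounds: it's just an image whose pixels store class labels instead of colors. A toy sketch (labels and layout made up for illustration):

```python
# A segmentation map is an image whose pixel values are class labels
# rather than colors: "this region is sky, this is water, this is tree."
# GauGAN's GUI is essentially a paint program for maps like this.

SKY, WATER, TREE = 0, 1, 2

W, H = 8, 6
seg_map = [[SKY] * W for _ in range(H)]   # start with all sky

for y in range(4, H):                      # water across the bottom
    for x in range(W):
        seg_map[y][x] = WATER

for y in range(1, 4):                      # a rough blob of tree
    for x in range(5, 8):
        seg_map[y][x] = TREE

for row in seg_map:                        # ASCII view of the "sketch"
    print("".join(".~T"[v] for v in row))
```

The generator's job is to hallucinate a photorealistic image whose regions match those labels — the squiggle is yours, the pixels are its best guess.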

As shown in the video below, its guesses were pretty damn good.

This is a strong indication of the direction the artistic use case for machine learning will go. It’s going to be like wielding a super pen. We will be able to indicate where the tree goes, but the look and style of said tree will probably be generated on the spot.

Very soon, anyone who can squiggle a few marks (and has access to an enormous visual data set) will make near replicas of some of our greatest visual works. Here are a couple of fun ideas I had.

    peanutsGAN

    According to Wikipedia:

    Charles M. Schulz created a total of 17,897 Peanuts strips of which there are 15,391 daily strips and 2,506 Sunday strips.

That’s a lot of drawings of our favorite bald blockhead that could theoretically be fed to a generative adversarial network.

The average Disney film is about 80 minutes; at 24 frames per second, that’s over 115,000 frames. There are at least 10 hand-drawn animated classics that I can think of, plus weeks’ worth of shorts, books, and loads and loads of marketing materials. If someone were looking to extract the style of a Disney film, there is plenty to train on.
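The arithmetic, spelled out (with the caveat that classic features were often animated "on twos," which would roughly halve the count of unique drawings):

```python
# Quick check on the training-data math: frames in one hand-drawn
# feature, and across a hypothetical library of ten of them.

minutes, fps = 80, 24
frames_per_film = minutes * 60 * fps
print(frames_per_film)            # 115200

library = 10 * frames_per_film    # ten classics' worth
print(library)                    # 1152000
```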

    Where’s the DataPaint?

If these networks are coming with GUIs that allow artists to wield them, it raises the question:

Where can we get non-licensed, large data sets of imagery?

I typed “Batman” into Google Images. With the millions of results that came back, I’m sure a network could learn exactly what he looks like. But is this legal, according to our current laws of copyright?


    Will our aggregate online imagery eventually be scraped and squished into gigantic mathematical systems? And in the short term, what’s going to happen when networks like this get into Photoshop? Or Unity?

    As news begins to trickle in on the merger of machine learning and animation graphics, I will be sure to keep tabs on the development.

    Please consider subscribing if this is of interest to you. I welcome thoughts and feedback.

    Thanks for reading, we’ll see you next week.

    Links & Reference:

    StyleGANS explained and NVIDIA’s novel architecture for Generative Adversarial Networks at Towards Data Science:

    https://towardsdatascience.com/explained-a-style-based-generator-architecture-for-gans-generating-and-tuning-realistic-6cb2be0f431

    Google QuickDraw: https://quickdraw.withgoogle.com/

    Nvidia AI turns sketches into photorealistic Landscapes in Seconds:

    Search Google Images for Batman:

    https://www.google.com/search?client=firefox-b-1-d&biw=1547&bih=962&tbm=isch&sa=1&ei=f9YkXZnXOc_K-gS1q7TwBw&q=batman&oq=batman&gs_l=img.3..0i67l3j0j0i67j0l2j0i67l3.254505.255099..255345…0.0..0.101.512.5j1……0….1..gws-wiz-img…….35i39.kcbykIW0c_c