Category: Writing

I write to understand. Here is a selection of my essays.


  • The Making of a Generative Gangster Film with Gen-2 in Runway

    The Making of a Generative Gangster Film with Gen-2 in Runway

An experiment in AI filmmaking to create the style and feel of a 1970’s gangster film, using off-the-shelf AI tooling.

    In the style of a gangster film …

    Summer Experiment

With summertime comes time to play, and a backlog of experiments on my whiteboard. During the year I watch updates roll out across the AI space, and with downtime, I finally get to dig in.

This experiment was based on the discovery of Runway’s Gen-2 AI model, which generates video. I wanted to know if I could make a movie!

I’ve been a long-time fan of Runway, using it since its first version, but this upgrade of the platform is something else. You can prompt shots. Yes, shots. Like from the movies.

    So for example, I can prompt:

“In the style of a gangster film, 1970’s filmmaking, establishing shot los angeles, action film. Morning, sun comes up.”

It will return its best “guess.” I can change things on the prompt and get specific about the time of day or the camera moves. It looks bad most of the time, but some of the outputs are amazing.
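To iterate methodically, I kept a simple template and swapped one variable at a time. Here is a sketch of the idea in Python; the helper and its fields are my own illustration, not anything Runway provides:

```python
# Hypothetical prompt template for iterating on a single shot.
# Field names and values are illustrative, not Runway's API.
STYLE = "In the style of a gangster film, 1970's filmmaking"

def shot_prompt(shot, camera, time_of_day):
    """Assemble a Gen-2-style text prompt from swappable parts."""
    return f"{STYLE}, {shot}, {camera}. {time_of_day}."

# Vary one variable (time of day) while holding the rest fixed.
times = ["Morning, sun comes up", "Golden hour", "Night, neon signs"]
variants = [shot_prompt("establishing shot los angeles", "action film", t)
            for t in times]

for v in variants:
    print(v)
```

Changing one field per generation makes it much easier to tell which words the model is actually responding to.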

Every shot in the movie above was generated with a formula:

    1. Leonardo.ai to prompt the image generation model for Art Direction,

2. and then a systematic approach to shot development in Runway’s Gen-2.

I left a balance of shots in the edit. Some are amazing, and some are clearly generated. I deliberately wanted it to be generative, for better or worse. The point was to see if it is possible.

    Ultimately the technology is still early. But, man, is it coming quick!

    So, here are a couple of notes going through the process of making a completely generative short film with Gen-2 from Runway, Leonardo.ai, and some geeky love of action movies.

    (You should try it out for yourself)

    Coherent Story is Hard

It’s hard to get a coherent story through random shots – you have to play this game with the AI to figure out what the model can deliver. These image models can do individual medium close-ups of humans. So, that’s what you have to build.

This is why most of the AI filmmaking (to this point) has been single close-up “trailer type” content. You might have seen the Wes Anderson stuff, or the Harry Potter Balenciaga. Getting one shot to play cleanly into the next is a multi-faceted battle, and the models struggle to do this.

I desperately wanted to do cars, action, shooting, doors opening… no one got shot in this film! So while close-ups of actors are cool, they’re limited and not “gangstery.”

I tried two things to pull something together.

1. I used a track by The Herbaliser, a talented DJ duo, to give the piece structure. I thought about generating the music, but this track gave it presence. It provided the “gangster tone” I was looking for, but also the key changes for scenes and shots.
2. I used tropes that the model clearly knew how to do: “sunglasses,” “1970’s,” “filmmaking.” And it clearly understands “Los Angeles,” probably because that’s where most of the cinematic data has come from.

    AI Reference Material is Huge

    I had difficulty figuring out the shots I wanted, so I went through a sort of “Art Direction phase.” Once I had a series of images that worked for each piece, I knew how to prompt for it. I also could use the image as a prompting reference in Runway.

Opening Day: Los Angeles, Palm Trees, Day, Cadillac, Downtown, 1970’s

The Deal: Interior of Man with Sunglasses, 1970’s architecture, Gangster

The Nighttime: Night, Palm Trees, Rodeo Drive, Hollywood

The Woman: Interior Hollywood Bar, 1970’s, Beautiful Woman, Martini

The Club Owner: Interior Los Angeles, Black Club Owner, 1970’s

The Fight: Man with Sunglasses, dramatic cinematic, gun fight, action, 1970’s

    You can’t do Action… yet

I have to reiterate the frustration I had trying to get any motion of a Cadillac driving down the 405. There just don’t seem to be shots of cars in motion, or the model can’t do it. I also found, most often, that the model can’t move the camera or change the framing. Great filmmaking has architected shots where the camera and subject move, and we can’t quite do that yet. It’s just actors sort of smooshing around.

Also, it’s PG only – words like “sexy,” “seductive,” and “breasts” all trigger alarms. I’m sorry, I can’t make a gangster movie without naked women. So, I don’t know what that means for you, Gen-2 model people at Runway – I just want to call attention to it!

Runway Gen-1 didn’t hold my attention like this model did. This upgrade is quite promising, and I eagerly want to see what the next one is like.

    If I could do this exercise but actually be able to flip a van over, or actually stage a gun fight, or maintain a lead actor, maybe I could actually tell action driven gangster stories. Just by writing each shot.

    That’s unbelievably amazing if that is true.

    I am convinced it is coming. Thanks for reading. Let me know if you have any thoughts or feedback.


  • Artificial Intelligence: A Primer for Graphics Artists and Educators

    Artificial Intelligence: A Primer for Graphics Artists and Educators

    A gentle introduction to four categories of AI models, and some suggestions on how education should approach its arrival.

    Introduction

    I’ve been doing computer graphics professionally for 25 years. I’ve never seen anything like generative artificial intelligence. Several years ago, on a motion capture stage, I saw the potential of real-time graphics and animation and was clobbered with the realization that AI would change everything.

    This is a synopsis of a talk I gave to some students about the topic, and then, again, to animation and game development faculty. My interest here is to raise awareness.

    This is intended for beginners with some understanding of computer graphics. It’s my hope today to give you an overview of four categories of AI that you should pay attention to, as it begins to permeate your life. At the end of this presentation, I give some thoughts on how education should adapt.

    The six sections below are:

    1. It’s Only the Beginning
    2. Neural Networks
    3. Generative Models
    4. Large Language Models
    5. Foundational Models
    6. Thoughts on Education

    It’s Only the Beginning

Below is a series of images that were created using AI. Most of these images were prompted (which I will discuss below) and then generated by a CLIP-guided diffusion model called Stable Diffusion.

    A selection of experiments from 2022 – 2023 (Stable Diffusion)

    Starting with image making and endless conversations with ChatGPT, AI has blended into almost all of my workflows. I’m now animating with it, writing with it, and in a constant experimentation cycle with it.
For example, I readily play with NeRFs, or Neural Radiance Fields, which allow scanning full 3D objects into the computer — with your phone! It’s amazing.

Here is a 3D NeRF scan of my 7-year-old’s Lego AT-AT. Let it load – you can grab and rotate around it.

These kinds of accessible software tools for creation will proliferate rapidly. The graph below from the venture capital firm Andreessen Horowitz shows that the number of research papers with working code has skyrocketed. These research innovations, some included below, are becoming productized.

    In short, computer graphics is becoming accessible to everyone.

Jack Soslow @jacksoslow (a16z – Andreessen Horowitz) via Twitter

    Neural Networks

    Learning Goal: Understand how to build a Classifier.

Artificial intelligence is a science that is several decades old. Most likely, the architecture of the neural network, combined with the creation of insanely vast amounts of internet data, has been the rocket fuel for the discipline to explode.

    Neural networks can see and learn patterns.

When you unlock your phone with your face, it is a neural network that recognizes you. Neural networks are a critical primitive for the generative artist to learn.

    Computer Vision is the recognition of patterns.
    Source: Wikipedia

Above is a pixelated orc. Every pixel in this image has an x value and a y value, and is either black or white. A pixel, then, is just numbers. Put together with other pixels, these numbers can form a pattern, like a simple 3×3 diagonal. The math that recognizes this pattern is attached to a little “activation unit,” which fires when it senses the pattern. These units are called “neurons.”

Neural networks are layered collections of these neurons: a series of pattern activators spread over a series of layers. To build one is to “train” the neurons and “tune” the math between the layers, by feeding it lots of data on the patterns you wish it to learn.
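To make this concrete, here is a toy “neuron” in Python that fires on a 3×3 diagonal of white pixels. In a real network the weights are learned from data rather than hand-set; this is only a sketch of the activation idea:

```python
import numpy as np

# A single toy "neuron" that senses a 3x3 diagonal of white pixels.
# In a trained network these weights would be learned, not hard-coded.
diagonal_weights = np.eye(3)   # the pattern we want to sense

def neuron_fires(patch, threshold=3):
    """Activation unit: weighted sum of the patch, then a threshold."""
    return float(np.sum(patch * diagonal_weights)) >= threshold

diagonal = np.eye(3)                   # a patch that IS a diagonal
flat_line = np.zeros((3, 3))
flat_line[1, :] = 1                    # a horizontal line instead

print(neuron_fires(diagonal))    # True: the pattern matches
print(neuron_fires(flat_line))   # False: only one pixel overlaps
```

Stack thousands of these little pattern-sensors in layers, and the later layers can fire on combinations of the earlier ones.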


    It is effectively a way for math to have memory.

    Neural Networks are the core architecture for classification models.

    Neural networks are widely used for object classification, image segmentation and all sorts of useful pattern recognition.


    Demo: Build a Power Ranger Classifier

In the demonstration below, I show how machine learning can learn to recognize my five-year-old’s Power Ranger. After running the trained model, I show the two datasets: one with the Power Ranger, and the other without.

    By learning the patterns of each of these datasets, we can recognize a Power Ranger.
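Lobe hides the machinery, but under the hood a classifier is doing something like the toy perceptron below. The “features” here (redness, shininess) are made up for illustration; Lobe learns far richer features from the images themselves:

```python
# Toy two-class classifier: "Power Ranger" vs "not Power Ranger".
# Each sample is a made-up feature pair: (redness, shininess).
ranger     = [(0.9, 0.8), (0.8, 0.9), (0.95, 0.7)]   # label +1
not_ranger = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.3)]   # label -1

data = [(x, +1) for x in ranger] + [(x, -1) for x in not_ranger]

# Perceptron training: nudge the weights on every mistake.
w = [0.0, 0.0]
b = 0.0
for _ in range(20):
    for (x1, x2), label in data:
        if label * (w[0] * x1 + w[1] * x2 + b) <= 0:   # misclassified?
            w[0] += label * x1
            w[1] += label * x2
            b += label

def predict(x1, x2):
    """Classify a new sample with the tuned weights."""
    return +1 if w[0] * x1 + w[1] * x2 + b > 0 else -1

print(predict(0.9, 0.85))   # +1: looks like a Power Ranger
print(predict(0.1, 0.15))   # -1: not a Power Ranger
```

The training loop is exactly the “feed it data on the patterns you wish it to learn” idea from above, just at the smallest possible scale.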

    Try Lobe yourself at Lobe.ai


    Generative Models

    Learning Goal: Understand how to prompt and tune generative image making models.

    Have you used ChatGPT? How about Midjourney?

These are generative models. They take an input and generate an image or text. Eventually, this method will be used to create … well, potentially anything digital. The input is called a “prompt.” Prompting is a form of user intent: a sentence that represents what we hope the AI model will generate for us.

    In this case, my prompt was for a painting of George Washington drinking a beer. I use references to artists to get it closer to my desired output. (I’ll leave the intellectual property discussion for another day.)

    Prompting Presidential Beers

It’s not bad, huh? And it’s gettin’ pretty good. But prompting has its limits.

I can’t make an image of me as Iron Man by prompting alone. The model doesn’t know what I look like, which is why I need to tune it on a dataset of images of me. This means collecting, say, 25 pictures or so, and then feeding them to the network so it learns to identify that pattern. (Me!) This can be a little challenging, but the tooling is getting better.

    Tuning a data set with images.

    Once I’ve trained the model, however, I’m able to use it with any other prompt quite easily. And since clearly there is a real need to turn myself into a Pixar character, I can easily do that.

    Learning to tune generative models is a fundamental skill for the generative artist.


    A quick note on Latent Space …

Latent space is a concept that helps me understand datasets as a giant universe of data. Each of these colored dots might represent a vector, or a specific category, of the dataset: say “anime” or “science fiction,” or something camera-specific like “cinematic.”

    A universe of possibilities in the data set.

The intersection of these vectors is what generates a unique seed of a pattern. That pattern is an absolutely new and unique generative image.
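You can build intuition for this with plain vector math. Below, two made-up “style” vectors are blended to land on a new point between them. Real latent vectors have hundreds of dimensions; these four are purely illustrative:

```python
import numpy as np

# Two made-up "style" directions in a tiny 4-dimensional latent space.
anime     = np.array([1.0, 0.0, 0.2, 0.0])
cinematic = np.array([0.0, 1.0, 0.0, 0.3])

def blend(a, b, t):
    """Linear interpolation: t=0 is pure a, t=1 is pure b."""
    return (1 - t) * a + t * b

# Halfway between the two styles: a new, unique point in latent space.
seed_point = blend(anime, cinematic, 0.5)
print(seed_point)   # roughly [0.5, 0.5, 0.1, 0.15]
```

A diffusion model decodes a point like this into an image, which is why prompts that mix styles land “between” their source categories.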


    Demo: Image Making with Stable Diffusion.

    As an animator, of course I am interested in robots. Giant robots. When I began using image generation models, I began prompting mechs and science fiction.

    Early Giant Robot Experiments (Disco and Stable Diffusion)

When I began, I used the image creation model Disco Diffusion in Google Colab notebooks. A year later, I am now creating more compelling imagery with the more robust and accessible Stable Diffusion. By using the platform Leonardo.ai, I can iterate far quicker than I ever did in the Colab notebooks.

    This is evidence that coherence and fidelity of images will only accelerate with better models, tooling and workflow.

    The most recent Mech Experiment from Summer 2023. (Stable Diffusion)

    I recommend prompting images yourself with the Stable Diffusion platform Leonardo.ai.


    Large Language Models

ChatGPT, and similar models like Google Bard, are called large language models. They are composed of a giant transformer. Not like Optimus Prime, but a huge predictive network that chooses the next word in the sentence. You know how your email suggests a response? ChatGPT is the mega-mega version of that.
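That “choose the next word” idea can be shown with a toy bigram model: count which word follows which in a scrap of text, then always predict the most frequent follower. A real transformer learns vastly richer statistics over billions of words, but the prediction loop has the same shape:

```python
from collections import Counter, defaultdict

# Train a toy next-word predictor on a tiny, made-up corpus.
corpus = "the robot walks the dog and the robot feeds the cat".split()

followers = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    followers[word][nxt] += 1          # count what follows each word

def predict_next(word):
    """Return the word most frequently seen after `word`."""
    return followers[word].most_common(1)[0][0]

print(predict_next("the"))    # "robot": it follows "the" most often
```

Swap the word counts for a neural network and the scrap of text for the internet, and you have the core loop of a large language model.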

    However, in addition to the transformer, it also learns to understand you. Natural language processing is a species of AI that learns how to understand the intent, or underlying meaning, of the sentence. The bigger and better the model, the better it “understands” you.

Large Language Models like ChatGPT are both NLP and a one-way transformer.

Large Language Models have the potential to be THE MOST disruptive technology of the bunch. Because they are text, they can increasingly understand and generate code, essays, books, scripts, emails, and spreadsheets. When they become what’s called “multi-modal,” they will probably also generate films, games, animation curves, and 3D assets.


If you haven’t already started using ChatGPT, I highly recommend you start. Learning to prompt LLMs in your discipline and workflow will be critical.


    Foundational Models

    Sam Altman (Wikipedia)

In a recent interview, Sam Altman, the CEO of OpenAI, foretold a vision of how companies with multimillion-dollar processing power will sit at the base of a new artificial intelligence software ecosystem. A gold rush of smaller companies will use the APIs and “tune” these massive models to provide customizations for specialized markets.

    These giant models are called “foundational models.”

We need to think of this disruption as a new operating system. The base will be where custom datasets are tuned in. Just as I trained myself into the Iron Man picture, we all will train our custom data into a foundational model. Our decision, then, will be which one to use.

These large text models are currently owned by big companies like OpenAI, Amazon, and Elon Musk’s new X.ai. The graphics models, which contain imagery and animation data, are also growing to robustness within the servers of Adobe and NVIDIA.

    Foundational Models will sit at the base of the AI Software Ecosystem

NOTE: Since giving this presentation, Stability.ai has gained massive traction on GitHub with localized open-source alternatives. Should this decentralization continue, we may see a trend toward localized models instead of foundational models. It’s too early to tell, but it is indicative of a field moving at light speed.


    So let’s review…

Neural networks are used for pattern recognition and prediction. Generative models query latent space to generate useable text, images, and (soon) many other things. LLMs are the real disruption of intelligence software.

Foundational models are the architectural base that everything will be built from.

    It will be up to us to tune our data sets, to fit our specific needs.


    Wonder Dynamics Demo

The following is a demonstration of the cloud-based software Wonder Dynamics. This is a promising workflow for the feature film or visual effects artist. You’ll hear that my five-year-old is just as impressed with the robots as I am.

    Try Wonder Dynamics yourself by signing up for early access.


    Education

    Education will need to adapt to be relevant in an AI world. Here are three suggestions.

    Listing the Prompt, Model and Sizing is good documentation.

    Documentation


    First, we must double down on documentation. I encourage students to use artificial intelligence, but they MUST document their process.

If they start passing off AI for their own work, that’s cheating. However, a well-documented workflow is just good practice. Learning documentation is important for internalizing concepts like attribution, licensing, and intellectual property.

    Individuals should learn to build datasets that are specific to their skill set.

    Data Sets

    We must learn to create our own protected datasets. We must also learn to be aware of terms of use. Using stolen data can lead to lawsuits and a variety of really bad outcomes.

    My livelihood will be my animation curves. Your livelihood, be it concept artist, or writer, will be the information that you create to train your specific models for production.

    Your AI data will need to reflect your work and style. And that data (and your individuality) will need to be protected.

    We won’t build software the same way.

    Mind Set

We need to shift our focus from localized applications and start thinking about interoperable, networked ones. Software won’t be architected locally; increasingly, it will be a series of trained datasets, most likely in the cloud.

    When I speak with people about the future of artificial intelligence, and the concerns about automation, I’m often asked:

    “What’s left for us to do?”

The only thing I can come up with is creativity, ideas, and community.

    My hope is that if we stay true to these principles, we will maintain a human value in a world where our labor is automated.

    If interested in more of my experimentations with machine learning and AI, please see this post here.

    (Fin)

    Thanks for Reading. See you Next Time. (Stable Diffusion)

For those just getting started

    Some great learning resources that really helped me out when I was first learning about AI.


This presentation and supporting materials are licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International. (For more information: CC BY-NC-SA 4.0)



  • WTF is the Metaverse #02: Infrastructure

    WTF is the Metaverse #02: Infrastructure

    Building for the virtual world is hard, but I think it should also be fun!

    This is Week 02 of my newsletter where we talk about building the metaverse, together!

    There’s Gold in them Hills!

    It’s sometimes easy to forget that there were two internet technology booms. 

The second boom, in the early 2000s, yielded the companies that we all now know (Facebook, Amazon). The first boom, in the late 90s, however, was driven by a speculative bubble of investment and overhyped insanity.

    Check out this chart (from Wikipedia) on the venture capital spending on internet technology, and see how the boom peaked – and then crashed – all just before Zuckerberg launched his version 1.0 of Facebook in his Harvard dorm room.

Facebook Founded in 2004

    Despite the fact that many lost their shirts, this wasn’t a waste of time or money.

The companies that did grow out of the internet would not have gained the traction they did without the wiring, the servers, or the masses who understood it. The second wave only gained traction because the first wave built INFRASTRUCTURE.

Today, take a moment to think about this opportunistic time and how it is also playing out in the Metaverse we are building. We know we can build glorious 3D worlds, but we won’t be able to create those societal structures for commerce or governance until that network infrastructure, and the masses who understand it, are in place.

    It’s not here yet!

You may be seeing media posts from everywhere telling us about how we are going to be doing magical things in the metaverse. If you dig just a little bit, you will hear the cryptocurrency maximalists tell you how stable their networks are. I admit that I get on Twitter and feel that I’m behind. It’s easy to feel that the whole world has already figured the metaverse out, and that we are missing it.

    Not quite.

So, take a beat and relax (talking to myself here too): you still have time! Rome wasn’t built in a day, and despite what the news outlets or venture capitalists tell us, neither will the Metaverse be. However, just because it’s early doesn’t mean it’s not moving really, really fast.

    And that’s why in the next several posts, I’ll be talking about this infrastructure, these basic building blocks for a networked world, that will make up the structural components of the metaverse.

    —That’s it for this time.

Remember, you can always reach out to me at @nyewarburton on Twitter.

My DMs are always open. And, hey, if this email inspires you to write or comment? Do it! I hope we can start a dialogue about what you think the metaverse is!

    We’ll see you next week for: THE THREE TECHNOLOGIES


  • WTF is the Metaverse?!? – Issue #1 – Are you SCARED of the Metaverse?

    WTF is the Metaverse?!? – Issue #1 – Are you SCARED of the Metaverse?

    Building for the virtual world is hard, but I think that it should also be fun!

    Welcome to Week 01 of my newsletter, where we discuss the fundamentals of the metaverse, together!


    Meta-wha?

    Everyone is using this word “metaverse.” It’s kind of weird, isn’t it?

    It’s been used by Facebook, ha! – I’m sorry… Meta. And it’s used by brands from Budweiser, to Adidas to Gucci. But, what does it really mean? Why are brands flocking to it? Why are tech companies making it part of their core strategy? What’s going on?

Well, there is this concept that, because of our increasingly connected lifestyle, we are moving toward a persistent virtual world. We can imagine an infinitely extended version of the massively multiplayer games that overwhelmingly own our time (Fortnite, Roblox, Minecraft).

But there is more to this concept than just that. Much, much more. And it’s in that “much, much” part that it’s easy to get lost.

    Are you scared of the Metaverse?

Understanding this concept of the “metaverse” is hard, let alone figuring out how to build and participate in it. I think, regardless of what we call it, there is a great opportunity for artists to build for it. However, understanding it and the opportunity it presents is not so clear.

For the last three years, I have been very deep in research on it. My takeaway?

1. Few people really understand it. (Even I struggle – so it’s OK if you don’t get it either!) There is a lot of guesswork and hearsay out there. It’s sometimes impossible to gauge what is truth and what is fantasy.

2. A lot of the “beginner” content is not actually for beginners. (I’ve read a lot of it!) And in some cases, the people who purport to help are simply mean.

    3. We need to be very careful. Scams and greedy people abound.

After 20 years in computer graphics, I wanted to understand this evolution. To do that, I need to think and write in public. I want to open other people’s minds to the concept, without the marketing, the anxiety, or the capitalist agenda that is driving it. We need creators (artists, thinkers, developers) who are willing to build this persistent world. And build it for the right reasons.

    So, this is it.

Each week, I’ll be doing little drawings and talking about concepts like game engines, virtual production, blockchains, AI, virtual characters – man, the stuff is crazy! I’d love it if you signed up and came along. That’s it for this time.

Remember, you can always reach out to me at @nyewarburton on Twitter.

My DMs are always open. And, hey, if this email inspires you to write or comment? Do it! I hope we can start a dialogue about what you think the metaverse is!

    We’ll see you next week for: INFRASTRUCTURE


  • To Teach or Not to Teach Artificial Intelligence

    To Teach or Not to Teach Artificial Intelligence

While watching academic (and non-technical) environments, I’m beginning to see that the pervasive instinct is to ban AI in writing and artwork curricula. The argument is that students will use it to facilitate cheating, or that its use will “cheapen” the learning of the artform.

But increasingly, I worry that this approach will not prepare the next generation for the world they are about to enter. Can we open the debate in a way that best helps the students and limits the potential for academic dishonesty?


    Watching AI Enter the Animation Pipeline

I spent 20 years as a professional animator. My passion is both story and the technical implementation of it in real-time technology. That is, I really love making characters move in video game worlds.

The satisfaction comes from the knowledge of complex systems, storytelling, and the final result of days and days of labor coming to fruition. I bring something that was seemingly dead to life. I do it with an extraordinary amount of labor, skill, research, iteration… and often, the emotional pain of fighting my self-doubt.

Above you can see a screen capture of my desktop. I am currently animating (experimenting) with the Unreal Engine’s “Manny” in Autodesk Maya. That means I move and set the position of the arms by keying. Me, a human.

    Since I derive so much personal value from the process, it was especially difficult for me to deal with the idea of automation, when I first discovered machine learning on a motion capture stage several years ago.

Machine learning is the term for a class of mathematical models within the field of artificial intelligence. They are having a sort of “golden age” of development. Training a machine learning model to move a character’s face, or to automate the rigging of a digital skeleton, can drastically reduce the labor involved (in some cases by a factor of 100).

As these models have matured, more and more of the pipeline is becoming automated. Below you can see one of the more impressive workflows, called Learned Motion Matching. This alone will drastically reduce the up-front creation time of video game character controllers.

    Source: https://montreal.ubisoft.com/en/introducing-learned-motion-matching/

I remember a sleepless night when my stomach physically hurt as I came to the realization that AI had the potential to remove the need for animators themselves. (I wrote about it back then.) I soon decided this was a problem I needed to frame the way Scottish philosopher David Hume articulates the “is” vs. “ought” problem.

I was resisting the understanding of artificial intelligence because I was clinging to what I thought ought to be. Artists ought to be the drivers of the labor. Creativity ought to be part of the emotional struggles of the individual. Not a dataset. Right?
Only when I began to confront what AI is was I able to begin to research and understand it.

    This is where we are at.

    So then, what AI is…

AI is reaching a level of (highly) functional use in artist workflows for animation, music, editing, visual effects, illustration, and many others. It’s being deployed in our software for everyday use. AI image-making models are reaching critical mass now because of the prolific sharing ability of the internet.

Because of the nature of our digital world, we have larger and larger datasets, and the training of models is becoming more accurate and robust.


Predictive systems will finish our work for us in the art we create, and they will understand not just how to animate, but our “intents” as the animators using them. We are at the cusp of a productivity boom unlike any humankind has ever seen.

Some examples from this past summer’s experimentation with Stable Diffusion.

The challenge in the near future is not with the mathematical models, or even the datasets being collected; these are nearing an astounding level of image fidelity. The challenge is the interface design and the UX: the accessibility to the non-coding masses.

Many, *many* software companies are rushing to create this accessibility through new interfaces, plug-ins, and automated things we don’t even notice. The “old guard,” like Adobe, seem to keep pace by buying up new talent. But there is a sizable crop of generative start-ups targeting other graphics markets. The focus, driven by capitalist desire, is mass adoption. This leads to facilitation of use and exponential data collection.

    NVIDIA is now training AI agents “for decades” in real time simulated environments.

But capitalism isn’t the only driver. Stable Diffusion, a popular CLIP-guided diffusion image maker, was released as open source. Within days, new innovations were in Google Colab notebooks around the world. I suggest searching the #stablediffusion tag on Twitter and marveling at an endless stream of un-bundled experimentation. The acceleration of AI is not just driving the market economy; it is inventing the distributed, licensed one as well.

    Can it be Avoided?

    Students can actively choose a variety of new applications to automate the writing of their essays (or even the accompanying illustrations!)

    Increasingly, that choice will be removed from them.

The way that email checks your spelling and updates as you write, our software will make intuitive predictions about what we’re creating. It will make predictions and create elements for us, all in real time. I expect it to simply be the default in Photoshop, Visual Studio, Word, and many others. The AI will just be there.

    AI is here. It can make renaissance paintings of power rangers.

It cannot be avoided or banned.

    Renaissance Artist Painting Power Ranger – Stable Diffusion.

    My Suggestions

Here are three suggestions for how you, as a teacher, can integrate it into your classroom. Most of all, you should reinforce your human connection with your students.

    I. Communicate


    Talk about it openly and honestly with your students. If you’re scared, tell them that you’re scared. If you are concerned about cheating or the way it’s being used — be honest about it. Be vulnerable about it.


Opening honest debate allows for the many, many shades of “grey area” that might arise should a student turn in work. Banning it is fine, but you need to be deliberate about its name and function.

    You can’t say “no AI.”

    You will need to be specific: “No Clip-diffusion enhancement for this exercise today.”

The debate needs to be open should issues arise, regardless of whether the student is intentionally cheating or the software has ambitiously finished it for them.

One of my many, many failed generative experiments: Science Fiction Alien Mech, Combat Technology, Disco Diffusion

    II. Understanding


We need to have a common understanding of the types of models. Artificial intelligence is divided into classifications like neural networks, GANs, language models, CLIP-diffusion, etc. Students should understand the difference between what it means to train a neural network and how an agent is trained in reinforcement learning.

Different applications of machine learning and artificial intelligence will propagate into different verticals. Depending on your subject matter, certain models and architectures will fit better than others. As an animator, my primary focus is motion models. Those might not be as interesting to a writer who works with language models.

Students should have a sense of what AI is actually doing, rather than treating it as some “magic thing” operating in the background. For each model, there is always a specific set of inputs and a resultant set of outputs. Even without a computer science or mathematics background, the classification of models is learnable at a simple level.

    For help with this, I recommend Melanie Mitchell’s book, “Artificial Intelligence: A Guide for Thinking Humans.” This high-school-level book clearly explains the categories of AI and offers direct, non-technical explanations of how they operate. (The link is below.)

    Audrey Hepburn, Black and White, Art Wall Painting, Stable Diffusion. I spent a late night session generating black and white images of hollywood.

    III. Compassion


    The last thing is to approach your students with compassion. You must understand that these students are already interacting with extremely powerful algorithms. Their content stream from TikTok is algorithmically constructed and tuned to their emotional impulses. They may think they are simply texting with friends, but they are already being fed into large datasets.

    These processes have been reinforced by the social networks among their friends. Understanding their position and their actions may become increasingly difficult.

    We will marvel as things start to be completed for us; for them, it will be normal. I think the youth should learn to count before getting a calculator, and likewise should meet concepts like voice models or latent space before they’re everywhere.


    Clearly, I have a big concern about our adoption of artificial intelligence. I’ve accepted that the technology is here, just as electricity and the internet simply arrived. I hope we as teachers can openly learn to accept its presence in our curriculum. We should learn to use it, but speak openly about its ethics.

    I hope you will take a moment to do your own research on this before coming to conclusions.

    Regardless of whether this line of thinking is fantastical or completely correct, I understand this to be a contentious issue. I welcome open debate; we should all participate in figuring this out together.

    Thanks for Reading.

    Reference:

    Here is the link to Melanie Mitchell’s Artificial Intelligence: A Guide for Thinking Humans

    I might also recommend:

    Two Minute Papers: https://www.youtube.com/c/K%C3%A1rolyZsolnai

    Superintelligence, by Nick Bostrom: https://www.powells.com/book/superintelligence-paths-dangers-strategies-9780199678112

    Machine Learning for Art: https://ml4a.net/


  • Discovering Disco Diffusion and Prompt Design

    Discovering Disco Diffusion and Prompt Design

    Dalle2 is a generative image model by OpenAI. It’s a transformative step in computer graphics.

    Dalle2 is in its early invite stage, which means (as of July 2022) I don’t have access. (Hurry up, OpenAI, please?) Driven by a need to understand these models, I soon stumbled upon one called “Disco Diffusion.” While not as powerful as Dalle, Disco Diffusion is indicative of the future of generative media, and enormously fun to play with as well. Everything here was created with the Disco Diffusion Google Colab notebook and a free account.

    —-> (https://colab.research.google.com/github/alembics/disco-diffusion/blob/main/Disco_Diffusion.ipynb)

    My first animation test using Warp in DiscoDiffusion

    AI Comics

    Like most things, I start by clearing my creative blocks with comics. So, to find my footing, I thought about trying to get an AI to generate its own comics.

    Since the joke of a comic is somewhat separate from the art, I “prompted” new sentences with the AI-powered InferKit. This let me iterate on the writing until I felt the AI had the joke it wanted to tell.

    I took the response from InferKit and fed it as a prompt into a freely available model called Craiyon (previously Dalle-mini), using additional tags for cartoonists and comics to shape the image.

    text_prompts = {
        0: [
            # Subject
            "--- Output copied from InferKit sentence",
            # Description
            "two character", "single panel", "cross hatching",
            # Artist
            "gary larson", "new yorker",
        ]
    }
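    If I were scripting the assembly step by hand, it might look like the sketch below. `build_prompt` is a hypothetical helper of my own, not part of InferKit, Craiyon, or Disco Diffusion:

    ```python
    # Flatten labeled prompt sections into one comma-separated prompt string.
    # build_prompt is a hypothetical helper, not any tool's actual API.
    def build_prompt(sections):
        phrases = []
        for label in ("subject", "description", "artist"):
            phrases.extend(sections.get(label, []))
        return ", ".join(phrases)

    sections = {
        "subject": ["<sentence copied from InferKit>"],
        "description": ["two character", "single panel", "cross hatching"],
        "artist": ["gary larson", "new yorker"],
    }

    print(build_prompt(sections))
    ```

    Keeping the sections labeled makes it easy to swap out just the artist tags or just the description while holding the joke constant.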

    The first few felt weird. But liking the possibilities, I experimented with a “publishable” version, merging the panels together in Photoshop. It felt more authentic and put structure to the result.

    I then tried the same exercise with Disco Diffusion. While the output turned more smushy, the options for experimentation within the Colab notebook were numerous and, being open source, inspired a deeper dive. Soon, I had the knack of it.

    AI Watercolor Paintings

    While working, I set up another instance to do some landscape watercolors. I wandered through prompts of different places, like Savannah or Newport, but didn’t see much evidence of the locations. I grew bored of it pretty quickly.

    AI SciFi Concept Art

    I decided to move into concept art for science fiction. 1. Because, why not? And 2. I discovered Disco Diffusion has a trained PulpSciFi dataset, which I switched to for a bit.

    I experimented with key words like “Ralph McQuarrie” and “Star Wars.”

    Finding other artists playing with the model online, I discovered the practice of adding “trending on artstation.” It improved the results, but made me feel a little ill for the artists who are actually trending on ArtStation. It also led me toward a recognizable paradigm; I felt my prompts pulled into “cyberpunk” and “mech.”

    AI SciFi Space Photography

    I then found a really interesting breakthrough: someone who had started listing camera lenses in their prompts.

    This inspired me to take the more conceptual ideas I was working with and make it more photographic. I chose prompts that focused on realism, with references like “NASA” or “Star Trek.” I also set detailed instructions for camera lenses, like “27mm” and “tilt shift.”

    AI Alien Mech Technology Photography

    But eventually I wanted to create something… alien I guess.

    I began to organize my prompts into sections, and mess with individual variables to get more of what I was looking to generate.

    I experimented with a number of variants of lenses and lighting, even trying tilt shift. The best results, according to me, are above.
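    Systematizing those variants is straightforward. Here is a sketch of how I might expand a base prompt into one variant per lens/lighting combination; the variable lists are my own experiment setup, not a Disco Diffusion feature:

    ```python
    from itertools import product

    # Expand a base prompt into one variant per lens/lighting combination.
    # These lists are illustrative experiment variables, not a Disco Diffusion API.
    base = [
        "Mech Suit with military grade weapon systems in an alien world",
        "Sci fi, Iron Man, technology, attack",
        "photography",
    ]
    lenses = ["27mm", "50mm", "tilt shift"]
    lighting = ["golden hour", "overcast"]

    variants = [base + [lens, light] for lens, light in product(lenses, lighting)]

    for v in variants:
        print(", ".join(v))
    ```

    Running each variant in its own batch makes it easy to compare a single changed variable side by side.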

    Most of the imagery you see came from what is becoming a bit of a “base prompt” for me:

        "Mech Suit with military grade weapon systems in an alien world",
        "Sci fi, Iron Man, technology, attack",
        "photography",
        "27mm",

    AI Animation? AI Shorts?

    I’ve only just begun to explore!

    If you want to see some extraordinary work, you should consider joining the DD Discord.

    https://discord.gg/fzevz8Z4

    Mind Blown

    The implications of the democratization of AI art will be extraordinary. The architecture of CLIP and the diffusion models, the design process of prompt engineering, and the intellectual property implications are probably more than this blog post can hold.

    As I wrap my head around the use, I’ll update here.

    As always, thanks for reading. Happy prompting!


  • Which Game Engine Should I Use?

    Which Game Engine Should I Use?

    By far and away, this is the most asked question that I get from anyone diving into game development.

    Be it a start up team or a small group of students, every project needs to decide their tech stack. It’s definitely a hard question, but the good news is that there are a number of options.

    Below, I’ve listed a selection of game engines that are, in my opinion, a good place to start.

    Unreal Engine

    The first engine often spoken of for AAA gaming, animation, and visual effects is the Unreal Engine by Epic Games. Consider it the “go to” for AAA-style video game development. It’s a tank. Its strength is “realism” for third person, first person, or strategy games. Major game studios use it to scale teams of hundreds of people.

    Flooded with money, Epic is the Mongol Horde of the Game Engine world. From their marketing assault, it is clear they wish to own cinematic production in Hollywood, or really anything where there’s a camera and subject. With real time workflows being so disruptive, they might just win. Many, if not all, of the major entertainment computer graphics firms have integrated Unreal, or will integrate Unreal, into their pipeline. Software has eaten film production, and Unreal is the mouth.

    Unreal will most likely be the dominant player in many real time interactive experiences from gaming, to architecture, virtual production, and many other fields that require high fidelity graphics.

    Unreal Engine 5 tech demo running on PS5

    Pro

    If you are interested in creating AAA quality games within the fairly known design paradigms of the console gaming world, then this is a good choice. Even if you don’t wish to use it, you might find yourself sucked into a team or project that is dependent on it. 

    Unreal has made real strides in making programming accessible to artists with its Blueprint system. This node-based scripting has been a good way to learn, and an even better way to get designers more “hands on” in the system.

    If you are a visual effects artist or a feature film animator, it is also a very good choice to pick up. It’s becoming industry standard; every movie shop is looking for people skilled in its use right now.

    Con

    Much like Microsoft was for business and personal computing in the 90’s, Unreal will most likely be the corporate operating system for real time 3D content development. It will grow relentlessly, and probably won’t listen to the little voices of the independents.

    If you are an indie developer, interactive artist, or data visualizer, or are dependent on rapid prototyping, there are other engines that might suit a bit better.

    For Me

    Since most of my work is entertainment industry facing, Unreal is in the “must learn” category. The animation systems are robust and powerful, and there is no arguing with the asset development pipeline, especially with the acquisition of Quixel Megascans, the development of MetaHumans, and the introduction of Nanite in Unreal 5.0. It’s not without its frustrations, which, for me, come down to the material and lighting rebuild loading times.

    I don’t use it to prototype unless I am experimenting with animation systems. Its size makes it hard to work with GitHub, and there is very little thinking or support for blockchain networks or AI models. They recently launched a Python interpreter, but I’ve barely found a support network for developing content with it yet.

    Unity


    Unity applications currently account for three billion installs around the world. As a 3d interactive development package, it is THE dominant player. It is also the major player in independent and mid-tier game development. It is fairly standard in interactive design and commercial application development.

    That means if you are an independent game developer, this has been your engine for a while; Unity holds the controlling interest in this sector. If you are an interactive developer at an agency, or working on a location-based experience of some sort, this is also most likely the engine you would use.

    Everything in Unity is a class. You make an asset, attach a script, and then it’s interactive. By making it a Prefab, you can use it again and again. This is the core value of game engines, and Unity has done very well to democratize the technology for creatives. It uses C#, which has a lot of similarities to Java; it can be somewhat difficult for new programmers, but easy to pick up for those with a touch of development experience. Unity, probably inspired by Epic’s Blueprints, has begun to integrate Bolt, its own visual scripting system. (Though I have not used it.)

    They are not without movie-making ambitions, though. A recent deal with New Zealand-based Weta shows their claim to the visual effects and narrative content world. I watch with a keen eye to see what happens.

    Unity is also a private company with a weird licensing system once you actually start making real money. Since their focus is more diverse (the larger interactive market versus the AAA game market), they tend to have plugins for some of the more innovative trends, like augmented reality and machine learning. Unity is also forming initiatives with auto companies, technology visualization projects, and an ecosystem of educational content. It’s clear they are positioning themselves as an interactive artist’s tool, more than just a game engine. There will always be games developed with it, but many other things will be built with it as well.

    The Unity Interface is fairly straightforward.

    Pro

    Wide adoption. Unity can really be used to do a lot of things, and is a bit of a “swiss army knife” for interactive and independent game experiences. There is a very active community, and the learning resources on YouTube and in educational communities are extremely rich.

    Con

    Unity’s corporate-ness is beginning to show. They are clearly a little wary of Epic’s dominance of some sectors, and are also a little “nickel-and-dime-y” with their pricing models. Yes, the engine is free for the most part, but the upgrades for AR and machine learning, priced at 50/month, are a little worrisome.

    For Me

    Most of my experience is focused on the Unreal Engine, but I have recently opened my brain to more development work with Unity.

    I’m also in the middle of a reinforcement learning obsession, so I’m interested in the accessibility of Unity’s Machine Learning Agents. Stay tuned.

    Godot


    As an advocate of open source, this engine is a darling of mine. It’s the “Blender of the game engine world.” There is a small team of developers, led by the remarkably talented Juan Linietsky. With a passionate open source community rallying behind it, it is gaining traction at an astounding rate. Right now, it is best for 2d games, though their recent foray into 3d and the projected development of the Vulkan renderer will most likely change that.

    They use a custom scripting language called GDScript which, much like Python, is clean and easy to read. Relative to Unity and Unreal, Godot has a much smaller base, but that base is extremely rabid.

    I feel that Godot is uniquely positioned when it comes to innovative gaming and decentralized development. When we start distributing our networks and commerce, are we really going to cut in Unreal or Unity?

    Godot, as open source, is the natural choice for teams looking to create autonomous or community-driven game systems. The community, not a centralized player, will shape its functional use. (Whatever that turns out to be.)

    The Open Source Engine, Godot shows loads of promise.

    Pro

    A great open source community that supports the development learning and creation with it. The more the community grows and adapts, the more robust and creative the engine becomes.

    GDScript is also very much like Python: you can look at it, squint a bit, and for the most part read what is happening. This makes getting very simple stuff up and going in the engine very quick.

    Con

    Open source software can also have some sharp edges. Fancy, well-funded software always tends to look polished, even if the usability is frustrating. Open source tends to be the opposite: functionality comes first, but that doesn’t always mean the usability or interface has been quite figured out.

    The 3d content in Godot is still a little early. As a result, there aren’t as many support systems in place.

    For Me

    I had a brief and romantic affair with Godot as I played with a number of 2d pixel art prototypes. For learning game development, it is one of my favorites, but for competing with the mainstream big boys above, it is a few years away.

    O3DE

    There once was a company that sold online books, that turned into a global juggernaut of a tech company. Amazon has decided to get into the high-end engine game, and their entry, while a bit rough and young, may be an interesting entrance to the space.

    Game developer Crytek, facing financial difficulties, sold their CryEngine to Amazon several years ago. Amazon then had a real time renderer that looked amazing, but the accessibility of the engine left much to be desired if it were to become a mainstream consumer product. They built their own GUI and UX on top and renamed the engine Lumberyard. Recently, Amazon partnered with the Linux Foundation to open source the engine, rechristening it “O3DE.”

    Too early, but watching closely

    I have had very little interaction with it. In honesty, I just downloaded the update and have begun poking at it. I mention it because I have been watching the development of Lumberyard for a while, and I really cannot discount the efforts of Amazon as they move into the gaming space.

    In a lot of ways the engine looks and feels like Unreal or Unity, but Amazon is designing for the future. Without the obligations of supporting the technical needs of today, they are trying to keep the engine more modular, instituting a system called “Gems.”

    These allow developers to pick and choose the kind of plug-ins they need for the experience they are building. In the release I downloaded, they offered motion matching systems, which is surprising since neither of the other two major engines offers it out of the box. It’s a forward-looking move.

    There haven’t been a lot of times where Amazon has seriously entered a space and not, in short order, made themselves a major competitor. Having AWS so readily available to integrate into the engine, plus the fact they have pushed firmly into the open source route, shows that they have major plans to position themselves in whatever metaverse they seem to think is coming.

    It’s not ready for prime time yet, but in 18 months or so, they might be battle ready.

    The Gateway Engines


    For the “just getting started” type, here are some of my recommendations.

    I tried game engines a number of times. I bounced out of Unity when I first tried it. I struggled through Xcode development with Cocos, barely understanding the process. I then tried Love2D for a short time, but grew tired of Lua. I was lost in the world of game engines.

    Then, I discovered Construct 2. I used it to build a Metroidvania style game called “Agent Kickback.” It was the first time I felt like I could build the entire thing — on my own.

    The software is built to deploy an HTML5 engine, which allowed fast loading times for browser-based content, but the real value was the visual coding interface. In Construct, you snap functions together like Lego pieces. This was the first time I was able to get my mind around concepts like functions, variables, classes, optimization, and state machines. It was a huge unlock.
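    Those concepts translate directly once you leave the visual editor. As an illustration, here is a minimal state machine of the kind a platformer character might use, written in Python rather than Construct’s event sheets; the states and events are my own examples:

    ```python
    # Minimal finite state machine for a platformer character.
    # The states and transitions are illustrative, not from Construct 2.
    TRANSITIONS = {
        ("idle", "move"): "running",
        ("idle", "jump"): "jumping",
        ("running", "stop"): "idle",
        ("running", "jump"): "jumping",
        ("jumping", "land"): "idle",
    }

    class Character:
        def __init__(self):
            self.state = "idle"

        def handle(self, event):
            # Events with no transition from the current state are ignored.
            self.state = TRANSITIONS.get((self.state, event), self.state)

    hero = Character()
    hero.handle("jump")   # idle -> jumping
    hero.handle("move")   # no transition while airborne; stays jumping
    hero.handle("land")   # jumping -> idle
    print(hero.state)     # idle
    ```

    The table-driven form is the same idea as snapping event blocks together: every (state, event) pair maps to exactly one outcome.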

    After this point, I returned to the industry to use engines like Unity and Unreal, and found that I had much more confidence and direction.

    Construct 2 uses a visual coding system that got me going fast.

    I have heard that Game Maker is of similar approachability, but I have never used it. (Its popularity among some students makes me include it here.) These engines are a great way to get into game engines; essentially, they have a low enough technical overhead to let the artists in.

    Once you are in, however, you get it. And when you have reached a point that you want to actually build something a bit more than a starter level, you move on. In that case, move up to one of the engines I have listed above.

    And What about the Web?

    The web is also a wonderful place to play with 3d and game engines.

    There are a lot of JavaScript frameworks. For example, Babylon.js is an open source engine from a nifty team at Microsoft. PlayCanvas is a more engine-y looking interface for developing web-based interactive work, which was recently bought by Snap. Another favorite of mine is A-Frame, a framework that uses HTML tags to place 3d objects, add a little animation, and even do VR!

    Doing VR in AFrame is actually a lot of fun!

    The worlds these engines render are built on a wonderful little library called three.js, a JavaScript core that actually renders 3d objects in the browser. Chances are you have seen an interactive 3d piece somewhere around the internet, and I’m pretty sure three.js was behind it.

    While these frameworks show a lot of promise, (I, personally, love spending time with some of these programs) my feeling is that they are very much in the early days of development. Could the web be used for high fidelity real time graphics and rendering? Well, that debate might best be suited for another post.


    As always, I welcome feedback, thoughts or suggestions. I am always open to discuss with other developers and educators about any of the things I list in my writing. You can always reach out to me on twitter @nyewarburton, my DM’s are always open.

    Thanks for reading, and happy game making!

    Links and Resources

    Unreal Engine by Epic Games

    Unity Game Engine

    Godot Game Engine

    Construct Game Engine

    Game Maker Engine

    Three.js

    A Frame

    Babylon.js

    PlayCanvas


  • Seven Strategic Tactics for Teaching Online

    Seven Strategic Tactics for Teaching Online

    Simply replicating what you do in the classroom doesn’t work.


    In the fall of 2020, I began teaching college-level computer graphics. Since my timing is impeccable, I started teaching in the middle of a pandemic, which meant I had to learn how to teach on Zoom. The school I teach at runs in quarters, with classes twice a week, which makes for a pretty intensive 10 weeks.

    Full disclosure: my first quarter was a disaster. Anyone who has met a nineteen-year-old understands that they will flay you alive, especially if they are burned out, grumpy, and distractible. Wait… I mentioned this was during the pandemic, right?

    The point is, I rapidly learned that if I were to survive, I needed a strategy.

    Learning to collect data while teaching


    When I begin to design systems, I devise methods to record findings. In this case, I tried to estimate how long each activity would hold the highest levels of focus and interest, then reflected on it in my writing and preparation process. Essentially, I began prototyping how much time an engaged student can absorb before boredom and distraction make their focus plummet.

    Academics love these kinds of charts

    As I got better at teaching, I began to organize my methods. I could feel my way through whether I was hitting or missing. Self-reflection on how many laughed at my jokes, or whether they responded to regular cues, was ongoing. I could absolutely tell if they had YouTube on, or were just clocking time without actually paying attention. I would call them out if they looked distracted, or try something more conceptually weird.

    Really – I tried it all.

    I began to take rigorous notes after class and wound up making some speculative findings.


    • Lectures only work for 20 minutes. Longer lectures lead to gaps in focus and increased checking of TikTok feeds.
    • Demoing software live will instantly lose half the class. The cognitive load of learning an unknown graphical user interface is extremely high. They are either confused, or they already know it, which leads to boredom.
    • Group work will recharge a class and create social connections. However, group work in breakout rooms longer than 20 minutes will devolve into socializing.
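    Those findings boil down to a simple scheduling rule: never let a single mode run past its focus window. A sketch of how I might block out a two-hour session under those limits follows; the timings and rotation are my own estimates, not research:

    ```python
    # Block out a class session so no activity exceeds its focus window,
    # rotating modes to recharge attention. Limits are my own estimates.
    FOCUS_LIMIT = {"lecture": 20, "group work": 20, "individual work": 30}

    def plan_session(total_minutes, rotation=("lecture", "group work", "individual work")):
        schedule, elapsed, i = [], 0, 0
        while elapsed < total_minutes:
            mode = rotation[i % len(rotation)]
            block = min(FOCUS_LIMIT[mode], total_minutes - elapsed)
            schedule.append((mode, block))
            elapsed += block
            i += 1
        return schedule

    for mode, minutes in plan_session(120):
        print(f"{minutes:>2} min  {mode}")
    ```

    The point isn’t the script; it’s that alternating short blocks beats one long lecture every time.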


    Over the next two quarters, I began prototyping a system based on my findings, and I began to see some improvements. Granted, some classes still continued to be difficult, but I certainly fared far better than when I had been thrown to the wolves in the first quarter. All of these are works in progress that I continue to experiment with.

    Here are some take-aways.


    Seven Strategic Tactics to Teaching on Zoom


    1. Cameras on

    I unequivocally tell my students that they must keep their cameras on. Despite the grumbling on the first day, it actually keeps them far more engaged; the work and interest are better. It’s important to connect, so I don’t even question it. If a camera is off, I ask them to turn it on. If they turn it off again, I ask again.

    If they don’t like it. Tough. If I have my camera on, you must as well.

    2. Anything Interesting?

    When you ask a question, any question, there is a natural delay, and worse, the dead-air anticipation of anyone responding. It’s hard to cut through this initial anxiety of participation. The added step of dealing with the “you’re muted” phenomenon is also a deterrent. So, I open every class with:

    “Anyone got anything interesting?”

    – Prof Nye

    After asking this question, I wait. I let that dead air take hold right up front.

    I push and get them to start talking about anything. If we are lucky, we might hit on a new game release, or a trend, or a movie they are all eagerly awaiting. As someone from a different generation, it’s good to hear about their current interests. Dungeons and Dragons is back, apparently, and everyone still hates Electronic Arts. These are important things for me to know to be credible.

    Asking them to participate at the beginning of class makes the rest of the class easier, because the ice has already been broken. Once we have established the participatory nature of the class, I’m less worried about throwing a question to someone during lecture. I also use the “anything interesting” time to try to fire up the typically quiet ones.


    3. Don’t do Software Live – Record for the Rewatch


    I started recording my software demos outside of class.

    Following along became the homework assignment, not something that fills class time. Those who didn’t understand could rewatch, and those who already knew it could skip ahead or speed through. The results I got back were enormously better. Video gives learning support outside of class, and maximizes the face-to-face time inside it.

    And honestly, doing software demos live in class is hard. You might forget a checkbox or a process, and as you fumble to correct yourself, you can feel the group shift into non-believers. If I hit a snag while recording, I’ve gotten good at pausing, jumping into documentation, trying it out, and then resuming. What might have been five minutes of double-checking and fumbling through software live becomes, in the recording, an instant cut to the valuable information.

    This does raise the question: if all the demos are being done outside of class, what happens in class? The answer is “a lot of things.” Ultimately, it makes class time more about connecting with you as a teacher, and less about watching you fumble on a computer.

    One drawback is that many who aren’t comfortable with the software will simply copy. This makes it especially important to stress problem solving, creativity, and improvisation in class. The more confidence they build in class, the more they build upon the video work instead of simply copying it.


    4. 20 Minute Group Jams to Energize Collaborations

    The focus of all my teaching can be summarized as “problem solving.”

    And the best way to get people to learn to problem solve is 1. to write documentation, and 2. to share what they learned in groups. If I can identify a common problem among several people, I encourage them to group up and see if they can collectively figure it out.

    The tendency for many is to use the problem as an excuse to disengage.

    “I couldn’t figure it out” is what I hear, and my response is “who else has this problem?” Generally, others do, and putting them into a group is a natural way for them to form collaborative problem solving.

    Sometimes you get especially focused students who lean into the process. When you get them together, you can see potential partnerships forming.

    5. Discord

    Most educational software platforms are bad. Empirically, from a software UX and design standpoint, the makers of software like Blackboard seem to genuinely wish to insult the students and teachers of the world by giving us interface designs from twenty years ago.

    Fortunately, the educational tech space is rapidly advancing to fit this need, but until new solutions are instituted, I gravitate to software communications that value the persistent nature of chat. If someone has to log in to do something, I’ve lost them. If someone can participate in multiple content streams, it becomes fun.

    Discord is a chat application that, like its business-world sibling Slack, maintains multiple communication streams. I can run my class in a chat, post and pin videos and handouts to that chat, and run multiple chats for groups. By siloing information correctly into persistent chat, I can work directly with my class and interact with them regularly over the week, as opposed to having to log in and broadcast. Granted, that means you need to have the application open always, and be ready at any time of day to respond.

    Additionally, by installing StatBot, a bot application that lives within your Discord, I can track the number of posts and the engagement on them. If I know the groups, the subjects they are talking about, and how often each group chats, I can catalyze the conversation by helping out on a question, making a joke with a gif, or providing a relevant YouTube clip. I got into the habit of wishing them a happy Friday, or sending reminders of upcoming deliverables.

    Tracking the discord activity of the class
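    You don’t strictly need StatBot for this; any message export gives the same signal. Here is a sketch of tallying posts per student to spot who is going quiet; the message format and names are hypothetical, not Discord’s actual export schema:

    ```python
    from collections import Counter

    # Tally posts per author from a chat export to spot disengaged students.
    # The message dicts and roster are a hypothetical format, not Discord's schema.
    messages = [
        {"author": "alice", "channel": "group-1"},
        {"author": "bob", "channel": "group-1"},
        {"author": "alice", "channel": "general"},
        {"author": "carol", "channel": "group-2"},
        {"author": "alice", "channel": "group-1"},
    ]

    counts = Counter(m["author"] for m in messages)
    roster = ["alice", "bob", "carol", "dave"]

    # Anyone well below the group's typical post count may need a nudge.
    for student in roster:
        print(student, counts.get(student, 0))
    ```

    A zero next to a name is usually the earliest warning sign you’ll get, well before missed deliverables.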

    Discord, in my opinion, makes it fun, not a chore. Granted, some who are not gamers or as digitally native do not understand how to engage with the method. It can be difficult for the more email-native types to shift to this persistent way of thinking. If someone is going to work with game engines, however, they will need to learn to think like this. It’s now industry standard.


    6. No mercy for the disruptors


    This one is hard. Disruptors rip your class apart. They seed conversations in channels that undermine the teaching as it goes on. I don’t know why it happens, or whether I deserve it, but I suspect it is commonplace in all teaching on Zoom.

    The answer for me was to single them out and call them out immediately. Do not let it take hold, because you can lose the whole class. I am actually okay with them complaining about me outside of class. Everyone needs a watercooler, but in class, focus needs to be on the subject matter.

    I lost a class completely to a selfish and destructive student. Make no mistake, there are bad kids who need to be dealt with. Let them undermine you, and you won’t just lose a class session, but the entire course.

    7. Draw your ideas

    I’ve always used whiteboards to talk through ideas. My home office has two, and I have 30 years worth of sketchbooks. Drawing is how I learn, so it makes sense that I use it in the classroom. I do the same thing while on zoom.

    I can explain something in five minutes, or I can slow down and explain it by drawing the idea out in photoshop, or this nifty piece of software I just found called “Concepts.”

    Granted, sometimes drawings don’t work. A good drawing is like a good lecture: it needs to be imagined, refined, and visualized. If you make drawing part of your lecture approach, draw everything you plan to do in class ahead of time in your sketchbook. Practice it, prototype it, refine it.

    The results are better drawings for visualizing concepts. Improvisation can work, but it should only augment the structures you practice beforehand.


    As an educator, I welcome feedback and collaboration from others who are experimenting with teaching online. Despite falling virus levels, I do not believe we will be backing away from online teaching any time soon. In fact, I believe it will only accelerate. We should be facing these challenges head-on, not simply running back into the classroom.

    Thanks for reading! I welcome interactions on Twitter @nyewarburton; my DMs are always open.


  • Machine Learning and Animation

    Machine Learning and Animation

    As the tools become more accessible, neural networks may soon be driving the performance of characters

    A machine… can’t possibly… ?!?

    I find myself involved in conversations where I ask graphics artists if they feel threatened by AI (artificial intelligence). Many animators think that being in a “creative” occupation means that they are safe. For many, it’s okay to think automation will wipe out auto-callers and Uber drivers, but keying a character performance is something that we won’t have to worry about for a while.

    I used to be one of these people, but now, I’m not so sure.

    Fake it til’ ya Make it

    The internet continues to show us examples of “deep fakes,” where mathematical image-recognition models have allowed for the creation of fully manufactured digital characters. Upon viewing these, there is an initial reaction of “that’s crazy,” followed by a dismissive “wave of the hand.” I think there is a misunderstanding of what the neural network is really doing.

    It’s easy to mistake this “puppeting” of Vlad Putin or Elon Musk for a simple real-time, one-to-one mapping, like that of a performance-capture system, or to assume the video is composited or mixed into the pixels in some way.

    But this technology is actually generating an entirely new performance.

    Neural Networks

    Honestly, I have only just begun my journey to understand how neural networks work. In many ways they are still a black box of mathematics, full of PhD-level vocabulary that a humble unfrozen cave man animator like me can’t wrap his mind around. I can’t tell you the mathematical difference between a Lambert and a Phong, but I absolutely know the difference in the way they look and when to use them. Similarly, it will be hard for me to describe the calculations in a Generative Adversarial Network, but I’m starting to see the possibilities of what it can produce.

    With only a few weeks of reading – and the help of classes like ITP@NYU’s Machine Learning for Artists (https://ml4a.github.io/classes/) – I’m fairly confident that, at its core, a neural network uses a whole lot of data to learn how to make an input on one side come out as something generative on the other. It’s a system that makes guesses (or predictions), and the more data you cram into it, the better those guesses get.

    The whole point of a neural network is to recognize features, or patterns, that are not explicitly called out in the data set. These patterns are what the network uses to construct wholly new entities (guesses) that look and feel almost entirely like the original data set. Essentially, with enough data, it can fake a new entity that is nearly indistinguishable from the original. The question becomes:

    What data do we use to animate?
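    To make this guess-and-improve loop concrete, here is a toy sketch in Python (my own illustration, nothing like a production GAN): a single-weight “network” that never sees the rule y = 2x, only noisy samples of it, yet whose guesses converge on the pattern.

    ```python
    import random

    def train(samples, epochs=200, lr=0.05):
        """Fit a single weight w so that the prediction w*x matches y.

        A toy stand-in for a neural network: make a guess, measure the
        error, and nudge the weight so the next guess is a little better.
        """
        w = random.uniform(-1.0, 1.0)
        for _ in range(epochs):
            for x, y in samples:
                guess = w * x
                error = guess - y
                w -= lr * error * x  # nudge the weight toward a better guess
        return w

    # The "data set": noisy observations of the hidden rule y = 2x.
    random.seed(42)
    data = [(x, 2.0 * x + random.gauss(0, 0.1)) for x in [i / 10 for i in range(1, 11)]]

    w = train(data)
    print(round(w, 2))  # close to 2.0 -- the rule was never stated, only learned
    ```

    The same loop scales up: more weights, more data, better guesses. That is the whole trick.
    
    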

    From Light Cycles to Dinosaurs

    In 1982, to create the illusion that 3D objects moved across the screen, the animators on Tron had to stand over the shoulders of engineers and instruct them to plot out thousands of individual x, y, z coordinates. By the mid-1990s, instead of entering individual coordinates, software UIs had become accessible enough to allow animators to key and refine the motion of an object. This made moving dinosaurs inside the computer a reality.

    The curve editor, now commonplace in animation software like Maya and Blender, took 15 years to imagine and develop. It is a way for artists to visualize the acceleration data of 3D objects, and a way for those artists to communicate their intent to the software.

    Whether the data is from captured human performance or “hand keyed,” curve editors are the common language of motion data in the animation world.
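    For the curious, the core of what a curve editor evaluates can be sketched in a few lines of Python (a hypothetical illustration, not Maya’s or Blender’s actual evaluation code): two keyframes and an easing function between them.

    ```python
    def ease_in_out(t):
        """Smoothstep: slow start, slow end -- the classic S-curve
        an animator would shape in a curve editor."""
        return t * t * (3 - 2 * t)

    def sample(key_a, key_b, frame, ease=ease_in_out):
        """Evaluate a value at `frame` between two (frame, value) keyframes."""
        (f0, v0), (f1, v1) = key_a, key_b
        t = (frame - f0) / (f1 - f0)  # normalized time, 0..1
        return v0 + (v1 - v0) * ease(t)

    # A move from value 0 at frame 0 to value 10 at frame 24.
    print(sample((0, 0.0), (24, 10.0), 12))  # 5.0 -- halfway, where velocity peaks
    print(sample((0, 0.0), (24, 10.0), 6))   # 1.5625 -- still easing in
    ```

    Swap the easing function for a Bezier and you have, in miniature, the data an animator shapes every day.
    
    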

    Animator Intent = X, Animated Character = Y

    So, if a neural network could be fed motion data and “learn” how a human moves, it might be possible for it to learn what the animator is trying to communicate and generate predictions of what makes a good animated performance.

    Actually, developing this “animator intent = X” variable might be the easy part. We have enormous databases of scrapable human motion; existing deep learning models like PoseNet could pull pose data from the ever-growing library on YouTube.

    The features that lead to the Y variable are harder, and the question I’m going to scratch at is:

    Could a model be developed where the features generate a great animated character performance?

    From Academics to Artists

    Until recently, machine learning was something only the academics played with. Now Big Tech has joined in, as these early scientists have been gobbled up by titans like Uber, Facebook, Google, and Tesla.

    Like the engineers and their light-cycle inputs, only TensorFlow python-heads could enter the parameters for an ML model. However, a new wave of UI thinking is now making this approachable to us unfrozen cave men animators. Unity, a powerhouse 3D game engine, has built ML-Agents into its software to facilitate ingesting TensorFlow models. And a stand-alone GUI called Weka allows for non-coding explorations of machine learning models.

    A new visual interface called Runway ML, however, is the first of what I see as a wave of artist-friendly machine learning tools. There will be less and less need for artists to get their hands dirty in the code of neural networks, as services like this provide an accessible way to integrate this thinking directly into existing animation workflows.

    Maybe even accessible to a cave man animator like me.

    That’s it for this week, thanks for reading. Please subscribe and join the conversation.

    Reference:

    ITP@NYU / Machine Learning for Artists: https://ml4a.github.io/classes/

    Daniel Shiffman’s Coding Train: https://www.youtube.com/channel/UCvjgXvBlbQiydffZU7m1_aw

    Machine Learning Mastery: https://machinelearningmastery.com/machine-learning-mastery-weka/

    Runway ML: https://runwayml.com/

    The TensorFlow Playground: https://playground.tensorflow.org/

    PoseNet Machine Learning Model: https://github.com/tensorflow/tfjs-models/tree/master/posenet

    Andrew Price (the blender guru) made a similar argument: https://youtu.be/FlgLxSLsYWQ


  • The Democratization of Animation Production

    Imagine the hundreds of people that a computer graphics animated feature requires. (Think: Pixar or a VFX-heavy superhero blockbuster.)

    Now imagine the entire undertaking of these projects being done by a handful of people. Instead of a pipeline of specialized workers, this handful are unique, multi-talented “librarians.” Like DJs sampling electronic music, they will mix streams of data, creating visualizations for a variety of new platforms.


    Hypothesis:

    Computer Graphics Production – as it exists in the movie business – will be disrupted by peer-based, real-time networks.


    Increasingly, collectives of creative developers are sharing new ideas, code, and workflows. By sharing powerful tools and know-how, these communities are growing at a pace that will soon outperform closed systems in both quality and meeting market demand. Essentially, the open networks will outperform the closed companies. The advances of these creative networks will make the computer graphics artists who work within them mind-bogglingly productive.

    Most interesting to me is that visualizations might not be rendered on a centralized farm of computers, but by an infinitely scalable, distributed network. The libraries, the labor, and the processing power will be shared by all who participate. The more who join in, the more powerful the network becomes.

    This is enormously exciting for the art form. Just as YouTube empowered content creators and Instagram made everyone a photographer, new networked technologies will democratize and enhance animated storytelling for anyone with an internet connection. Admittedly, it is also threatening to those who work in the industry today.


    This is what my research and writing has been focused on for the last year. I’ve spent this time exploring engines and new workflows, playing with ways to develop content, and then writing my thoughts over and over. I want to understand this evolution.


    The best way for me to internalize my learning is to write about it, teach it, and share it. And so, it is my hope that the self imposed pressure of a weekly newsletter will keep me diligent on these explorations.

    Every week, I will write a new post discussing my thoughts on technologies like game engines, distributed networks, machine learning, and agile storytelling, but most importantly, the evolution of the networked artist.

    If you are a computer graphics artist, producer, student, or thinker, I welcome you to subscribe and join in. If you find this useful, please pass it on to others who might too.
