Feb
16

The Renderfarm (how it works)

Posted in Development, Production by campbell

renderfarm_overview.png

One of my tasks here is to make sure Big Buck Bunny gets rendered, even though I’m not crazy about networks, ssh connections and figuring out why the Nth frame didn’t render. Some of you have shown interest in the renderfarm, so here’s a rundown of how it works ;)

Images are rendered on Sun’s Grid at http://network.com. They have a service where you can buy time on their systems for $1 per CPU-hour, and many tasks can be uploaded to run in parallel. I think they have 600 CPUs, but we’ve only ever managed to hog around 240, with an average of 150 CPUs at a time.

The systems have 8 GB of RAM and dual AMD Opteron CPUs: http://www.sun.com/servers/entry/v20z/specs.jsp

Luckily they were generous and gave us 50,000 hours, allowing an optimistic 4-5 hours per frame. Of course we’ll want to re-render a few times because of bugs in Blender or adjustments to the artwork, so 1-2 hours per frame is more realistic. We don’t do anything too tricky, just render one frame on each computer until all the frames are rendered.
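As a rough sanity check on those numbers, the budget arithmetic can be sketched as below. The 50,000 hours, the hours-per-frame figures and the 150-CPU average come from the text above; treating the grant as CPU-hours spent evenly across those 150 CPUs is an assumption made only for illustration.

```python
# Back-of-the-envelope render budget, using the figures quoted above.
BUDGET_CPU_HOURS = 50_000   # hours donated by Sun
AVG_CPUS = 150              # average number of CPUs we manage to keep busy


def frame_renders(budget_hours, hours_per_frame):
    """How many single-frame renders the budget pays for."""
    return budget_hours / hours_per_frame


def wall_clock_days(budget_hours, cpus):
    """Days needed to burn the whole budget (assumes constant, even load)."""
    return budget_hours / cpus / 24


if __name__ == "__main__":
    for hours in (5, 4, 2, 1):
        renders = frame_renders(BUDGET_CPU_HOURS, hours)
        print(f"{hours} h/frame -> {renders:,.0f} frame renders")
    days = wall_clock_days(BUDGET_CPU_HOURS, AVG_CPUS)
    print(f"about {days:.1f} days of rendering at {AVG_CPUS} CPUs")
```

The faster each frame renders, the more complete re-render passes the same budget covers, which is why 1-2 hours per frame is the figure to aim for.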

One of the big advantages of Sun’s service is that they use a 64-bit operating system. This means Blender can use more than 2 GB of RAM, which is really important for rendering characters with millions of hairs. Other rendering offers only ran 32-bit systems.
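To put the 32-bit limit in numbers: a 32-bit pointer can address at most 2^32 bytes (4 GiB), and operating systems commonly reserve part of that range for the kernel, which is where a per-process limit of roughly 2 GB comes from. The sketch below only does that arithmetic and checks the pointer width of the Python interpreter running it. The kernel/user split is a typical figure, not something measured on network.com.

```python
import struct

GIB = 1024 ** 3


def addressable_gib(pointer_bits):
    """Upper bound on addressable memory for a given pointer width, in GiB."""
    return 2 ** pointer_bits / GIB


def pointer_bits():
    """Pointer width of the running interpreter (32 or 64 on common systems)."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print(f"32-bit address space: {addressable_gib(32):.0f} GiB in total")
    print(f"this interpreter uses {pointer_bits()}-bit pointers")
```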

On the flip side, network.com had never been used for rendering anything on this scale, and the admins at Sun weren’t familiar with the problems related to this task. Peach is a good way to stress their systems’ infrastructure.

Installing Sun’s own Unix operating system, Solaris, was the first task. If you have used Linux or BSD you’ll have some idea of what it’s like. In fact you can run almost all common Linux applications on Solaris, and Blender3D is no exception!

The free Solaris download installed on my PC, but the network drivers weren’t available. Running Solaris in a virtual machine on Ubuntu worked well, giving me a simple Solaris development environment to test with.

It seems nobody had compiled a 64-bit Blender3D on Solaris, so I had to compile the libraries Blender depends on (freetype, zlib, libsdl, openexr, libjpeg, libpng and python), then modify Blender’s Makefiles and source to add support for solaris-x86_64.

(see http://wiki.blender.org/index.php/BlenderDev/BuildingBlender/Solaris for instructions)

solaris_blender.png

(Solaris comes with GNOME too, but I prefer the old-school X Windows interface!)

With a working 64-bit Solaris Blender binary it was possible to test Blender on the Sun grid.

Initially Sun’s online web portal was useful for testing that simple scenes rendered and that Blender loaded, rendered and saved frames without any hiccups.

Their online system is actually pretty cool: you zip all your application and working files, upload them, define the run command and press go! It tells you how many hours are being used and produces a zip with the generated files for download.

sun_web_interface.png
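For a sense of what such a run command looks like, here is a minimal sketch that builds a background render command for a single frame. The flags `-b` (run without a UI), `-o` (output path), `-F` (image format) and `-f` (render one frame) are standard Blender command-line options. The file names are made up, and this is not the exact command we submit.

```python
def render_command(blend_file, frame, output_prefix, blender="blender"):
    """Return the argv list for rendering one frame in background mode.

    Argument order matters: Blender processes options left to right, so the
    output options must come before -f, which triggers the render.
    """
    return [
        blender,
        "-b", blend_file,      # load the file without opening the UI
        "-o", output_prefix,   # where rendered frames are written
        "-F", "PNG",           # output image format
        "-f", str(frame),      # render this single frame
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for frame in range(1, 4):
        print(" ".join(render_command("scene.blend", frame, "/tmp/frames/")))
```

Generating one command per frame matches the "one frame on each computer" approach: each machine gets its own frame number and nothing else changes.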

However, the Peach production files compress to around 2 GB, so re-uploading would be extremely slow and partial updates would be messy to manage.
Downloading all the frames as one big bundle isn’t workable either, so the usual way of running jobs could not work for us.

This meant running an interactive session, where you submit a job that runs an xterm on network.com with its display pointing to our modem’s IP. Our router forwards the X11 packets to our server, where the xterm magically appears!

With a dynamic DNS service the IP can be replaced by a normal hostname such as blender3d.no-ip.org. This avoids having to buy a domain name or a static IP, which you don’t get with a cheap internet plan.
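The value the remote xterm needs is just an X11 display string built from that hostname. X servers listen on TCP port 6000 plus the display number, so that is the port the router has to forward. A small sketch, assuming display number 0 (the text above does not say which display we use):

```python
X11_BASE_PORT = 6000  # X servers listen on TCP port 6000 + display number


def display_string(host, display=0, screen=0):
    """Build a DISPLAY value such as 'example.org:0.0'."""
    return f"{host}:{display}.{screen}"


def x11_port(display=0):
    """TCP port the router must forward for the given display number."""
    return X11_BASE_PORT + display


if __name__ == "__main__":
    host = "blender3d.no-ip.org"  # dynamic DNS name from the text above
    print(f"xterm -display {display_string(host)}")
    print(f"forward TCP port {x11_port()} to the server running X")
```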

To avoid sitting in the server room, a VNC server lets me manage the renderfarm via a remote desktop from any PC, and lets me log off my own computer without interrupting anything.

sun_session.png

(remote xterm, local xterm used to copy commands from)

The first big problem was that our internet connection kept cutting out. Eventually the ISP admitted it was their problem and we switched providers. Then the interactive session would quit and we didn’t know why: rendering hundreds of images would start, run for hours, then quit before it had finished.
This was insanely frustrating and the cause of many late nights and time wasted on incorrect assumptions about what the problem really was! Disk full? Out of memory? Corrupt temp files? Network filesystem timeout? Each had to be explored, and without direct access to the systems, and with every error closing my only window into them (which I needed to see the log files), it was very hard to debug.
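When a session dies partway through, the first question is which frames actually made it. Here is a minimal sketch of that check, assuming frames are saved with the frame number as the last digits of the file name (for example `0042.png`). The naming scheme and directory layout are assumptions, not a description of our real scripts.

```python
import os
import re

FRAME_RE = re.compile(r"(\d+)\.\w+$")  # trailing digits before the extension


def rendered_frames(directory):
    """Frame numbers that have a non-empty file in `directory`."""
    frames = set()
    for name in os.listdir(directory):
        match = FRAME_RE.search(name)
        path = os.path.join(directory, name)
        # A zero-byte file is treated as not rendered.
        if match and os.path.getsize(path) > 0:
            frames.add(int(match.group(1)))
    return frames


def missing_frames(directory, start, end):
    """Sorted list of frames in [start, end] that still need rendering."""
    return sorted(set(range(start, end + 1)) - rendered_frames(directory))
```

Calling `missing_frames("frames", 1, 250)` on a hypothetical `frames` directory returns the frame numbers to resubmit, so a crashed run only costs the frames it did not finish.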

I won’t go into details, but there were a number of problems, some with Blender and some with network.com. The guys at Sun did their best to support us and managed to use system logs to track down the errors.

So now we’re on the home stretch and the renderfarm works!

Here’s a diagram showing the process we go through to render Peach!

renderfarm_overview.png

More details on the renderfarm can be found here.

http://wiki.blender.org/index.php/Bf-institute/renderfarm

39 Responses

  1. ibkanat Writes:

    Great blog, well written. Nice diagrams. Great to see how it all comes together.
    Good job Campbell

  2. ccherrett Writes:

    good work man!

  3. harkyman Writes:

    Very cool. That’s almost exactly how I’m running the farm for my animation (VNC->master machine; ssh for job submission from there; web interface for tracking progress; ftp for frame retrieval). Of course, this is on a much much larger scale. Good that both Peach and Sun are getting something out of the relationship!

  4. RH2 Writes:

    Nice…. I LIKE IT, I LIKE DISTRIBUTED RENDERING, AND I LIKE PEACHES!!!…
    *Caffeine overload*

    (By the way VERY nice work)

  5. tech-t-son Writes:

    how do you make sure the overwork cpu design input can keep up with the cpu output? i noticed that your wire reference copasitor, isnt readily avaliable to your flux capasitor?

    but if this gets you to a and back to b then go for it! i’m just thinking there has to be a better way to keep your support mainframe from eating anymore gigs then you need…haha.

    if things get two crazy use blenders wonderful inverted deluxe censorship band. that thing got me through so many tight info jams.

    although you could just say froget it and plug in your south deposit ray and head north…know what i mean….

    by the way awsome digital mapping skills with the tuba points!

    tech-OUT

  6. Jacob Picart Writes:

    Very nice post Campbell… great stuff for us render wranglers.

  7. Roll Writes:

    Really nice post!
    I hope to see more of these high quality posts on this blog soon.

  8. rogper Writes:

    What would you guys prefer to have: the 600 (150) CPU Sun network renderfarm or, let’s say, 25 local in-house CPUs for rendering?

  9. campbell Writes:

    @rogper, we did discuss this early on when things weren’t working out. It would work, but it’s too expensive and nobody would loan us that many systems.
    If we had 25 PCs they would need to be very recent high-end systems to do better than network.com. Now that the system works it’s quite usable, even though the diagram looks a bit complex.
    The interactive session can stay open for days while jobs are submitted, run and downloaded.

  10. randomnut Writes:

    I wonder if we could help you somehow… like a volunteer rendering program, rendering for you independent of Sun, just like SETI@home, but we’d see the rendered pics…

  11. Vassilios Boucer Writes:

    In one of the screenshots there is a new panel, “Simplification”, in the Render Buttons with some options and settings for Subsurf, Child Particles, Shadow samples, AO and SSS!
    Question: will this be integrated in Blender SVN, or do you use it only for the “Peach” film project?

  12. voOdKa Writes:

    That looks… complex :O

  13. Harley Writes:

    Well if you run into snags, my offer still stands: free use of our school’s render farm (80-100 computers) for 10-hour periods during our evenings 7pm-5am GMT-8.

  14. Ben Writes:

    Great post, good to see you’re still managing to keep the community informed on progress and the processes involved even though deadlines loom… Don’t lose too much sleep posting at this point though, I’d rather the movie come out top notch and have a glut of ‘post Peach’ blog posts than have anything rushed.

  15. Clint Writes:

    Fantastic post — thanks for sharing this! It was great to hear how these things practically work, and I’m very glad to hear it’s working out well now. Keep up the great work — can’t wait to see the final thing!

  16. wayne Writes:

    nicely spotted Vassilios…

    i’ve been waiting for something like this for a long time!!!

  17. campbell Writes:

    Haha Vassilios, I wondered if anyone would notice this ;) It’s in trunk, just set RT to 1. It’s an experimental feature for now. There is no ‘peach branch’, just use trunk.

    One problem with the SETI@home model is the time it would take to send out updates to the blend files and then re-render.
    If the data didn’t change this might be possible. A bugfix in Blender would also need to be sent out to everyone’s PC.

    Or, if there was some way to split the image into smaller layers/parts, computers with less RAM could render them. This is a big project; it would be cool but would also mean changing how Blender renders and composites.

  18. Vassilios Boucer Writes:

    Hey Campbell!
    I set the rt to 1 and I found it! Wow! Very cryptic…!
    Thanks for the Info!

    Ciao

  19. JoOngle Writes:

    I´m all dizzy from all the information… ;)

    I really look forward to the day when network rendering is for average joe (you know…turn on Blender on most computers….let them talk – network render) …no more than that.

  20. Goos Writes:

    Wow, all very impressive!! Did you ever look into the PlayStation 3 solution? Terrasoft made a Yellow Dog Linux version for the PlayStation, and is also offering PlayStation clusters. http://www.terrasoftsolutions.com/store/purchase.php

    Read the following story about the University of Massachusetts using a PS3 grid to run their gravity calculations instead of buying time on a supercomputer.
    http://www.wired.com/techbiz/it/news/2007/10/ps3_supercomputer

  21. campbell Writes:

    Hi JoOngle, regarding the comment about a renderfarm for the average joe: I wholeheartedly agree!

    But for this to be workable we need improvements in a number of areas.

    1) Making animations re-locatable in an automatic/automated way.

    For instance, you should be able to take a blend file and wrap it up into an archive with all its dependencies (videos, images, fonts, fluid simulations, point cache for particles, added with the new particles/softbodies and cloth…) to render on an external system.

    2) Some of these also can be referenced by absolute paths which will almost always break when rendered on another system.

    I made some effort in this area by adding an “External Files” section in the file menu, with options for making all paths relative, absolute and to report invalid paths – for images, fonts, sounds, libraries etc.

    At some point this needs to become a priority for developers.

    3) Linked libraries can be problematic, especially with links breaking as many files are edited. I wrote a script to load every file in the Peach SVN and report library linking errors, but it would be nicer if Blender could manage this for us in some way.

    As for the Cell processor.
    Porting blender to run on cell is not a small task.

    Konstantinos Margaritis is working on this, and hopes to optimize Blender for SSE/AltiVec, and 3DNow! as well
    http://lists.blender.org/pipermail/bf-committers/2008-February/020246.html

    The PS3 has tight memory limitations that would probably make it not so useful for rendering. However, IBM’s Blade/Cell servers don’t! Could be a great system to run Blender on ;)

  22. Janus Kristensen Writes:

    Campbell, your points (above) about absolute versus relative paths are very true. Actually any renderfarm software related to Blender could make use of the scripts you wrote. Will they be publicly available somewhere or do you intend to use them mostly for internal use at the Peach camp?

    About the issue of dropping connections:
    I suggest using screen (http://en.wikipedia.org/wiki/GNU_Screen). With this software you can simply reconnect to the existing session instead of having it close when your connection drops; it is quite nice when you are on a mobile connection, for instance. It also has the added benefit that you can shut down your controlling machine during the night without losing whatever programs were running on the Sun system.

    With the new unified point cache system being made I hope to see the ability to pack the baked info into the blend file as you mention, that would make it very easy to transfer data to, for instance, renderfarms.

  23. JoOngle Writes:

    Hi Campbell, thanks for your input & efforts as always!

    Yeah well – you´re right – I was just “venting thoughts”, sometimes the “average working joes” like myself simply don´t have the time to fiddle around with gazillion settings to make stuff work – but hey – no effort – no prize, right?

    Alternatively there are render services that provide network rendering for Blender users, so all ain´t lost.

    Until then, patience has to rule, and computers grow bigger and nastier (I´m still drooling over a 2x4core cpu machine like the mac…but the memory is too expensive so my 1x4core cpu has to do for now…)

    ..at work…our computers are even smaller than my private ones :/

    Oh well..

    PEACH ON!

  24. Mike Futcher Writes:

    “Peach Subversion Repository”
    What a good phrase that is!
    I’m going to get some non-CG minded of my friends to try to guess what it is.

  25. mxttie Writes:

    thanks for the insight. may i suggest FreeNX ( http://en.wikipedia.org/wiki/Freenx ) as a more optimal remoting alternative for VNC? :)

  26. Ragnar Writes:

    Did you consider using BURP, the Big and Ugly Rendering Project? I know it’s still in an alpha phase, but having the Peach project get involved might speed up its development. :)

  27. campbell Writes:

    JoOngle, here they are. I tried to make them fairly re-usable, but for the moment there are some hard-coded paths.

    /wp-content/uploads/movies/peach_scripts.tar.gz

    Ragnar, nope. Since Sun offered us the use of their computers, we just use their tools.
    In these situations you need to pick areas to advance; making something like BURP work in our situation would probably involve a lot more work.
    If somebody was assigned to it full time… it could work, but that still assumes there will be enough systems online at once, that enough will have 4 GB of RAM… and that we can find a way to upload new Blender builds for them to run in a fast and automated way… Even so, a lot of things could go wrong. Rendering on many different architectures/CPUs/OSes etc. introduces many variables. Maybe our frames would crash on another OS? Maybe 32-bit systems would render grass 1% differently and make frames flicker… etc.

    Then again, I’m not a real expert :) It’s possible for it to work.

  28. Nicolas Writes:

    If you want a setiathome-like system, check out http://burp.boinc.dk/

  29. entplex Writes:

    For people looking for a home-brew network rendering setup, check out farmer joe ( http://blender.formworks.co.nz/index.pl?p=1 )… I use it and it works great.

  30. fireworkspete Writes:

    Have you looked at http://www.respower.com for rendering? That’s what I used for making a movie and some stills for it about 1 year ago. I used a total of 15,000 GHz-hours (est. 7500 CPU-hrs). Cost me $60 total.

  31. ethana2 Writes:

    Distributed rendering on psubuntu? I’d do it for you.

    Cell FTW

  32. Diego Writes:

    Linux rocks, you guys should take a look at Git ;)

  33. Andrew Writes:

    I am the render administrator where I work and I smiled through the whole article. I have had days where I wanted to quit my job because of the renderfarm. We went through several different solutions (renderpal, dr queue etc) before settling on Qube!. I like the peach approach though—always pushing the limits. Great work!

  34. Mauro Writes:

    Peach Subversion Repository? Does that mean you have some kind of special tool to track all the changes in blender files? (like odfsvn.sf.net for ODF) or do you just use the normal subversion system?

  35. campbell barton Writes:

    Mauro, svn just treats blend files as binary blobs, though its binary diffing is apparently quite good.

    For a larger project we are interested in better SVN/Blender tools or even some kind of integration.

    For every commit the server could report errors in the blend file – missing paths, renamed groups that were used elsewhere… generate thumbnails for all objects… etc.

  36. Umberto Writes:

    Hello Campbell, congrats for your great job and the review given.
    I’d like to ask you a simple question:
    What brought you to choose network.com, and if you have any regrets about the choice, why?

    Thanks

  37. Steven Writes:

    So how many hours did it finally take, and at what resolution?
    Steven

  38. steve Writes:

    this is why i use windows :(

    this is a work of art! hope it wasn’t many nights. the thing i keep asking: what was the resolution of each image, and the frame rate originally used for the movie before editing?

  39. Gopalakrishna Writes:

    Good Article. Would be nice if the statistics (time consumed, max..min latency etc.) of the rendering were published.

    Gopalakrishna