A Virtual Camera Team for Lecture Recording

Fleming Lampi, Stephan Kopf, Manuel Benz, and Wolfgang Effelsberg
University of Mannheim, Germany

We present the design and implementation of a virtual camera team for recording classroom lectures. Our approach makes these recordings more lively and interesting.

Lecture recordings, now widely accepted, let students participate in a lecture without time constraints, and enable repeating parts of the lecture that might be difficult to understand. In many cases, the students find the recordings boring, a view held independently of whether the original session was fascinating, especially if the recording only includes the lecturer's slides and speech. In addition, when the audiovisual team uses only one camera, the audio and lighting settings are constant during the entire lecture, making for an uninteresting presentation that isn't much fun to watch.

Television has raised our expectations about produced video presentations. Although students preparing for their exams typically are highly motivated, it would help them if academia applied at least basic cinematographic rules to recording lectures. However, in times when universities have to save money, it would be far too expensive to hire a real camera team for all lecture recordings. Thus, our project focuses on the design and implementation of an automatic, distributed computer system for recording and broadcasting lectures, one that is compatible with the interactive learning tools used in lectures. [1,2]

A human camera team for lecture recording

In contrast to focusing on the requirements of a large staff for a TV production—for example, makeup artists or set constructors—we focus on the camera team itself for our lecture recordings. To start, there is a camera operator for each camera: one for a long shot (the complete lecture hall); one for the lecturer, with the ability to follow movements and gestures; one for the slides; and one for the audience (when someone asks a question). In addition, a director coordinates the camera operators and decides which stream to record or send out. A sound engineer captures the audio related to the lecturer, simulations, videos, and student questions. Lighting technicians complete the team.

The camera operator's technical work during a shot consists of moving, panning, and tilting the camera, and adjusting the exposure, focus, and zoom. Besides these technical aspects, aesthetic work is an important part of a camera operator's job. For a good production, we need the cooperation of the entire camera team, starting long before the recording takes place. In an initial meeting, the director goes over the event's storyboard. The camera operators get relevant information from three sources: from the storyboard, during the meeting, where they can ask questions about the information given by the director, and during the recording session itself by using an intercom. The intercom is crucial for collaboration and for communicating who is on air, who will be on air next, and which detail or framing each camera operator should show. During a shot, a camera operator can inform the director about his or her status, an inability to fulfill a requested shot for technical reasons, or an extraordinary detail he or she wants to show. So, throughout the event, there is continuous communication among the team members to improve the aesthetic aspects of the recording. This communication is necessary to apply cinematographic rules. Typical rules include the following:

- choose the shot's duration so viewers can perceive all necessary details and so the shot doesn't get boring,
- mind the line of action,
- define a pan's beginning and end,
- show an overview or neutral shot after two or three close-up shots,
- show the important details as close-ups to make them clear after showing the entire scene as a long shot, and
- alternate the series of shots shown so the sequence doesn't become predictable.

Professional camera operators intuitively apply these rules, and those working in the trade know many more cinematographic rules. [3,4]

The virtual camera team

In our approach, we map each team member's role to a virtual equivalent. We base the virtual director on an extended finite state machine. The states correspond to the different types of shots, and the transitions describe the possibility of going from one shot to another. Our method initializes each transition with a given probability, which is increased or decreased by sensor inputs. On the basis of a particular recording's recent history, we decrease the value of a transition leading to a camera shown recently. Using automatic motion-detection algorithms, we increase the probability of transitions leading to shots with more activity. If a student is going to ask a question, and an external sensor recognizes this, the probability of a transition to a shot showing the student increases considerably. When a transition is needed, our method selects the transition with the highest probability. The director's behavior in a given situation is therefore always similar but seldom identical, which makes it less predictable. We load the finite state machine with all its details at runtime from an XML file, which makes it easy to adapt the director to different recording scenarios. Figure 1 shows an example of the virtual director's implementation; more details are available in other research. [5]

[Figure 1. The virtual director implementation.]
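To make the selection mechanism concrete, the following Python sketch mimics the probability-weighted transition choice described above. The state names, base probabilities, and adjustment values are illustrative assumptions; they stand in for the configuration that the real system loads from its XML file.

```python
# A minimal sketch of probability-weighted shot selection. State names,
# base probabilities, and adjustment values are illustrative assumptions;
# the actual system loads the complete state machine from XML at runtime.

class VirtualDirector:
    def __init__(self, transitions, start="long_shot"):
        # transitions: {current_shot: {next_shot: base_probability}}
        self.transitions = transitions
        self.current = start
        self.history = []  # shots aired recently

    def adjust(self, weights, sensors):
        # Motion detected in a shot raises the weight of transitions
        # leading to it; a pending student question raises the weight
        # of the audience shot considerably.
        for shot, motion in sensors.get("motion", {}).items():
            if shot in weights:
                weights[shot] += 0.2 * motion
        if sensors.get("question_pending") and "audience" in weights:
            weights["audience"] += 0.5
        # Penalize shots shown recently so the sequence stays varied.
        for age, shot in enumerate(reversed(self.history[-3:])):
            if shot in weights:
                weights[shot] -= 0.3 / (age + 1)
        return weights

    def next_shot(self, sensors):
        # Take the base probabilities for the current state, apply the
        # sensor adjustments, and follow the strongest transition.
        weights = self.adjust(dict(self.transitions[self.current]), sensors)
        self.history.append(self.current)
        self.current = max(weights, key=weights.get)
        return self.current

transitions = {
    "long_shot": {"lecturer": 0.6, "slides": 0.3, "audience": 0.1},
    "lecturer":  {"slides": 0.4, "long_shot": 0.3, "audience": 0.3},
    "slides":    {"lecturer": 0.6, "long_shot": 0.4},
    "audience":  {"lecturer": 0.7, "long_shot": 0.3},
}

director = VirtualDirector(transitions)
print(director.next_shot({"motion": {"lecturer": 0.5}}))  # -> lecturer
print(director.next_shot({"question_pending": True}))     # -> audience
```

Because sensor inputs and recent history shift the weights on every decision, identical situations can yield different shot sequences over a lecture, giving the "similar but seldom identical" behavior described above.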
As introduced previously, the camera operator's job consists of technical and aesthetic work. We regard the technical work as a control loop that starts prior to the recording—for example, by selecting whether to use a neutral-density (gray) filter. We use well-known image-content-analysis algorithms to find people in the image and determine the correct exposure setting, even in backlit situations. In lecture recordings, an image's background is usually of no interest, but we need to show the active people in an appropriate way. So we use algorithms for skin-color detection and face recognition to determine the areas of an image showing a person. Then, we adjust the camera iris to optimize the exposure for this person. Figure 2 shows the flowchart of the control-loop process.

[Figure 2. The control-loop process.]
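As a rough illustration of one iteration of this loop, the sketch below uses OpenCV's Haar-cascade face detector as a stand-in for the skin-color-detection and face-recognition step; the camera object and its get_iris()/set_iris() methods are hypothetical placeholders for the real camera-control interface.

```python
import cv2
import numpy as np

# One iteration of a person-centered exposure control loop. The Haar
# cascade is a stand-in for the skin-color and face-recognition
# algorithms; the camera object is a hypothetical placeholder.

TARGET_BRIGHTNESS = 128  # mid-gray target for the person's image region
GAIN = 0.01              # proportional control gain (assumed value)

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def exposure_step(frame, camera):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3,
                                          minNeighbors=5)
    if len(faces) == 0:
        return  # no person found; keep the current iris setting
    # The background is of no interest: measure brightness only over
    # the region showing the person, so the exposure is correct for
    # him or her even in backlit situations.
    x, y, w, h = faces[0]
    brightness = float(np.mean(gray[y:y + h, x:x + w]))
    # Open the iris if the person is too dark, close it if too bright.
    camera.set_iris(camera.get_iris()
                    + GAIN * (TARGET_BRIGHTNESS - brightness))
```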
For the aesthetic part of a camera operator's job, we implement cinematographic rules and divide them into two groups: those that the camera operator can realize directly and those that require collaboration. A typical example of the first category is the reaction to a person starting to gesticulate: the camera operator zooms out until the person and his or her movements are completely visible in the picture. We implement this type of rule using image-content-analysis algorithms, in this case motion detection. A typical example of the second category is the shot and counter-shot arrangement of a dialogue, where one person is shown looking from the left edge of the frame to the right, and the next shot shows another person looking in the opposite direction. The director gives the order over the intercom to the camera operator, who then switches between the two cameras involved. We have implemented this type of director-to-camera-operator communication in our virtual system with XML over the Transmission Control Protocol (TCP). A more detailed description of our virtual cameraman can be found elsewhere. [6]
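A minimal sketch of such a director-to-operator message over TCP follows. The XML schema (the command tag and its camera, action, and detail attributes) is an illustrative assumption; the article does not specify the actual message format.

```python
import socket
import xml.etree.ElementTree as ET

# A sketch of the director-to-operator intercom as XML over TCP.
# The message schema is an illustrative assumption.

def send_command(host, port, camera, action, detail=""):
    # Director side: serialize one command and push it to the
    # addressed camera operator's socket.
    cmd = ET.Element("command", camera=camera, action=action, detail=detail)
    with socket.create_connection((host, port)) as sock:
        sock.sendall(ET.tostring(cmd))

def handle_command(raw_bytes):
    # Camera-operator side: parse the command and act on it, e.g.,
    # frame the lecturer on the left so the audience camera can
    # complete the shot/counter-shot pair.
    cmd = ET.fromstring(raw_bytes)
    return cmd.get("camera"), cmd.get("action"), cmd.get("detail")

# Example: ask the lecturer camera for a left-framed shot.
# send_command("operator-host", 5000, "lecturer", "frame", "subject-left")
```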
Unlike a real camera team, we base our virtual team on sensors. We use an indoor positioning system that relies on Wi-Fi access points to identify the students' locations. We take advantage of the interactive devices (PDAs or notebook PCs) already used in lectures and implement a client- and server-based question manager to cope with students asking questions and to find their locations in the room. Doing so enables us to adjust the audience camera accordingly. We have taken the special circumstances of our lecture hall into account, as described elsewhere. [7,8]

Let us look at a concrete example of the virtual camera team's operation. During a lecture, the active camera is pointed at the lecturer and records his or her talking head. A student wishing to ask a question pushes a button on his or her PDA, which informs the question-manager software about this event and sends the questioner's location, derived by the indoor positioning system. The question manager generates a sensor input for the virtual director and enables a pop-up window on the lecturer's screen. The director instructs the audience camera operator to point at the questioner's location and to zoom in; however, this particular camera is still off air. When the lecturer gives the floor to the student by hitting a button in the pop-up window, the director switches the audience camera on air and activates the student's audio, while the lecturer's camera operator puts him or her on the left side of the screen, achieving the shot and counter-shot setting. As the dialogue develops, the microphone levels as well as the camera operators' motion detection are useful sensor inputs for controlling the shot and counter-shot settings. When the dialogue ends, one of the two participants declares the question answered by hitting a button on his or her device, and the virtual director returns to normal lecture mode, ordering the camera operators back to their original positions.
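The sketch below traces this event flow through a simplified question manager. All class and method names are illustrative assumptions, and a print-based stub stands in for the virtual director and the camera switching.

```python
# A sketch of the question-manager event flow from the example above.
# Class and method names are illustrative assumptions; the real system
# is a client/server application connected to the students' devices
# and to the indoor positioning system.

class StubDirector:
    """Stands in for the virtual director's sensor and switching interface."""
    def sensor_input(self, event, **data):
        print("director sensor input:", event, data)
    def switch_on_air(self, camera):
        print("on air:", camera)

class QuestionManager:
    def __init__(self, director):
        self.director = director
        self.pending = None  # (student_id, location) awaiting the floor

    def on_question_request(self, student_id, location):
        # The student pressed the question button on a PDA; the indoor
        # positioning system supplies the seat location. The director is
        # informed so the audience camera can be pre-aimed while still
        # off air, and the lecturer sees a pop-up window.
        self.pending = (student_id, location)
        self.director.sensor_input("question_pending", location=location)

    def on_floor_granted(self):
        # The lecturer hits the pop-up button: the audience camera goes
        # on air while the lecturer camera frames its subject on the
        # left to achieve the shot/counter-shot setting.
        self.director.switch_on_air("audience")

    def on_question_answered(self):
        # Either participant declares the question answered; the
        # director returns to normal lecture mode.
        self.pending = None
        self.director.sensor_input("question_answered")

qm = QuestionManager(StubDirector())
qm.on_question_request(student_id=42, location=(3, 7))  # row 3, seat 7
qm.on_floor_granted()
qm.on_question_answered()
```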
Conclusion

Our approach to lecture recording, with well-defined tasks for each module, has two significant advantages. First, the workload is distributed; for example, the camera-operator modules, not the director module, produce the images. Second, the well-defined roles of the virtual team members and the communication between them make it easier to implement complex cinematographic rules. In this way, the virtual camera team's behavior mimics that of a human camera team and thus leads to more lively recordings than earlier approaches.

During the fall semester of 2007, we tested our system in the lecture hall. The virtual director performed well, and its communication with the camera operators is stable. The camera operators also worked well but still need some fine-tuning. As expected, the indoor positioning system needs some adjustments related to the lecture hall to minimize position error, and the question manager needs a better interface. Besides improving the system, our main work will involve a virtual video switcher and mixer and the sound-designer implementation (we plan to use the work of Gerald Friedland [9]). Figure 3 gives an overview of the entire system in action in the lecture hall. The areas highlighted in red mark the cameras for the long shot, the lecturer, and the audience, and the hardware to record the slides.

[Figure 3. The virtual camera system implemented in a lecture hall.]

Our long-term goals include implementing additional modules for lecture recordings (in particular, the sound engineer and the lighting technician), improving the cinematographic rules, and creating a more detailed evaluation of the recorded courses. Additionally, future work might include adapting the system to other contexts, such as conferences, workshops, or panel discussions.

Acknowledgment

We thank Adin Hassa, Burkard Kreisel, and their entire team at Südwest-Rundfunk Baden-Baden for letting us take a look behind the scenes of live TV production.

References

1. N. Scheele et al., "The Interactive Lecture—A New Teaching Paradigm Based on Ubiquitous Computing," Poster Proc. Int'l Conf. Computer Support for Collaborative Learning (CSCL), InterMedia, 2003, pp. 135-137.
2. N. Scheele et al., "Mobile Devices in Interactive Lectures," Proc. World Conf. Educational Multimedia, Hypermedia and Telecommunications (ED-MEDIA), AACE, 2004, pp. 154-161.
3. R. Thompson, Grammar of the Edit, Elsevier Focal Press, 1993.
4. R. Thompson, Grammar of the Shot, 2nd ed., Elsevier Focal Press, 2002.
5. F. Lampi, N. Scheele, and W. Effelsberg, "Automatic Camera Control for Lecture Recordings," Proc. World Conf. Educational Multimedia, Hypermedia and Telecommunications (ED-MEDIA), AACE, 2006, pp. 854-860.
6. F. Lampi et al., "An Automatic Cameraman in a Lecture Recording System," Proc. ACM Multimedia, EMME Workshop, ACM Press, 2007, pp. 11-18.
7. T. King, T. Haenselmann, and W. Effelsberg, "Deployment, Calibration, and Measurement Factors for Position Errors in 802.11-Based Indoor Positioning Systems," Proc. 3rd Int'l Symp. Location- and Context-Awareness, LNCS 4718, Springer, 2007, pp. 17-34.
8. T. King et al., "Overhearing the Wireless Interface for 802.11-Based Positioning Systems," Proc. 5th Ann. IEEE Int'l Conf. Pervasive Computing and Communications, IEEE Press, 2007, pp. 145-150.
9. G. Friedland, Adaptive Audio and Video Processing for Electronic Chalkboard Lectures, doctoral dissertation, Faculty of Mathematics and Computer Science, Freie Universität Berlin, 2006.

Fleming Lampi is a PhD student and research assistant in the Department of Computer Science IV, University of Mannheim, Germany. His research interests include video recording, processing, and transcoding. Lampi has an MS in computer science and multimedia from the University of Applied Sciences in Karlsruhe, Germany. Contact him at lampi@informatik.uni-mannheim.de.

Stephan Kopf is a postdoctoral researcher in the Department of Computer Science IV, University of Mannheim, Germany. His research interests include multimedia content analysis and new learning technologies. Kopf has a PhD in computer science from the University of Mannheim, Germany. Contact him at kopf@informatik.uni-mannheim.de.

Manuel Benz received a diploma in computer science from the University of Mannheim, Germany. His research interests include video processing and analysis. Contact him at benz@informatik.uni-mannheim.de.

Wolfgang Effelsberg is head of the Department of Computer Science IV at the University of Mannheim, Germany. His research interests include computer networks, multimedia systems, and e-learning. Effelsberg has a PhD in computer science from the Technical University of Darmstadt, Germany. He is a member of IEEE and ACM and serves as an editorial board member of several multimedia journals. Contact him at effelsberg@informatik.uni-mannheim.de.

Related Work (sidebar)

Prior to our efforts, there has been some work on related topics. Close to our approach is the use of pan and tilt operations and image processing for framing and following the lecturer. A sample application is AutoAuditorium, which shows a basic level of automatic presentation recording but doesn't use cinematographic rules, as discussed in the main text. [1] A more advanced system, developed by Microsoft Research [2] and improved upon by Zhang et al., [3] uses multiple cameras and implements video-production rules. This method includes a video-director module based on a finite state machine that can be configured by a scripting language to implement cinematographic rules.

These approaches differ from ours in that they use only image processing to determine the image framing and to track the lecturer, while we use sensors, which can identify the positions of all the people in the room. By using sensors, we can implement more sophisticated cinematographic rules. For example, two tracked people might be framed in such a way that they face each other while the system switches between their shots. Our implementation of cinematographic rules also differs from this approach: Microsoft uses a scripting language in which the rules are written in note form, a method that implies fixed shot durations and thus predetermined transitions, unlike with our model. He et al. have proposed similar basic rules for recording real-time applications. [4]

References

1. M.H. Bianchi, "AutoAuditorium: A Fully Automatic, Multicamera System to Televise Auditorium Presentations," Proc. Joint DARPA/NIST Workshop Smart Spaces Technology, 1998; http://www.autoauditorium.com/nist/autoaud.html.
2. Y. Rui et al., "Automating Lecture Capture and Broadcast: Technology and Videography," ACM Multimedia Systems J., vol. 10, no. 1, 2004, pp. 3-15.
3. C. Zhang et al., "An Automated End-to-End Lecture Capturing and Broadcasting System," Proc. ACM Multimedia, ACM Press, 2005, pp. 808-809.
4. L. He, M.F. Cohen, and D.H. Salesin, "The Virtual Cinematographer: A Paradigm for Automatic Real-Time Camera Control and Directing," Proc. ACM Siggraph, ACM Press, 1996, pp. 217-224.