A ROBUST FREE-VIEWPOINT VIDEO SYSTEM FOR SPORT SCENES

O. Grau, G. A. Thomas
BBC Research, Tadworth, Surrey, UK

A. Hilton, J. Kilner, J. Starck
University of Surrey, UK

ABSTRACT

This contribution describes robust methods to provide a free-viewpoint video visualisation of sport scenes using a multi-camera set-up. This allows generation of novel views of actions from any angle and is of interest for visualisation in TV productions. The system utilises 3D reconstruction techniques previously developed for studio use. This paper discusses some experiences found while applying these techniques in an uncontrolled outdoor environment and addresses robustness issues, including segmentation, camera calibration and 3D reconstruction. A number of different 3D representations, including billboards, visual hulls and view-dependent geometry, are evaluated for the purpose.

1. INTRODUCTION

In sport, most interesting incidents tend to be over very quickly. A system that allows a replay from any angle adds a lot of value to the production of sport coverage. Sports producers may use techniques such as slow-motion replays to illustrate these incidents as clearly as possible for the viewer. Although time is stretched in these replays, there is no exploration of the spatial scene information, which is usually important for understanding the event.

The work presented in this paper is part of the DTI-funded collaborative project iview [1], whose goal is to develop a system that allows the capture and interactive free-viewpoint replay of live sport events, as depicted in Fig. 1. The proposed system uses the input from multiple cameras to simulate novel, virtual camera viewpoints for visualisation. A method often used is to freeze time and then move the virtual camera in space. These effects were used in films like "The Matrix", but required many cameras and intensive manual post-production work, and the camera positions of the replay were fixed and covered only a small area.
The Eye Vision system, developed for sports broadcast applications, uses cameras mounted on robotic heads that are controlled by an operator and follow the action. Because of the fixed camera positions, the system adds an interesting visual effect but cannot be used to visualise the scene from any angle. Systems that capture the action with a number of cameras and provide free-viewpoint functionality were first developed for the studio, for example [2, 3, 4, 5]. For use in an outdoor environment, very little work has been done using multiple cameras; most approaches use just one camera, e.g. [6]. The work presented here addresses the problems in such an uncontrolled environment.

Fig. 1. Image of a football game from a broadcast camera.

The main difference between the scene as depicted in Fig. 1 and those addressed by previous projects is that the environment is not as well controlled as a studio. Even if cameras are mounted in fixed positions, there are situations where the cameras move relative to the objects of interest, due to wind or because the entire stand of the stadium moves under the weight of the audience. Furthermore, the size of the objects in the images is usually smaller than in the studio, because a large area of the pitch has to be covered. Due to these factors, poor segmentation or inaccurate camera calibration has an increased impact on the visual quality of the system.

The rest of this paper is structured as follows: the next section gives a brief overview of the system components. Section 3 then gives some details of the implemented processing modules and section 4 describes the replay. The paper finishes with some results and conclusions.

2. OVERVIEW

Fig. 2 gives an overview of the proposed system. The capture uses a time-synchronised, calibrated multi-camera system. The minimal number of cameras is about four, but for good-quality results a higher number is required. We are considering different configurations using broadcast coverage cameras and additional cameras.
For more details on these configurations and integration into a broadcast environment see [7].

Fig. 2. Overview of the free-viewpoint system: capture (multi-camera image sequence), processing (camera calibration, segmentation, 3D reconstruction) and replay (rendering).

The processing module computes a 3D model of the scene. This is done using segmentation of objects from the background and 3D reconstruction. The next section describes some details of this processing. The replay module renders the captured scene in real time using the computed 3D model and the original camera images, deploying view-dependent texture mapping [8].

The entire system can potentially operate in real time. At the current stage the processing is done offline: the images are stored and the processing is run at a later stage. The replay module is designed to work at interactive rates.

3. PROCESSING

3.1. Camera Calibration

Most studio-based capture systems assume that the cameras are mounted statically and a calibration can be done once before the system is used. However, in an outdoor environment the cameras are not necessarily mounted absolutely rigidly. Depending on the location, the cameras might move slightly, caused either by wind or by the vibrations of a big audience.

We therefore use a line-based approach for the calibration of camera parameters against the pitch lines [9]. This approach is very fast (it can be computed in real time on a PC) and robust, so it can be applied to obtain updated camera parameters for moving cameras.

3.2. Segmentation

The segmentation separates the action, i.e. the players of a game, from the background. Possible methods are difference- or chroma-keying against the green pitch. We investigated the latter option because it also works for moving cameras. A particular problem of a broadcast environment is that the pictures of the broadcast cameras are usually compressed (typically M-JPEG). We evaluated two known techniques for chroma-keying for our application: fast green subtraction in RGB colour space, and keying in HSV colour space. In addition, we developed and tested a k-nearest neighbour approach.

Fast green: This method is often implemented in commercial chroma-keyers and is based on the difference between the green channel intensity value for a given pixel and the maximum of the red and blue channel values:

    d_fg = g - max(r, b)    (1)

The segmentation S_fg is computed using a threshold sigma_fg:

    S(x, y) = 0 if d_fg > sigma_fg; 255 otherwise    (2)

HSV: This method is based on the distance of a pixel I in HSV colour space to a background colour P. The segmentation S_HSV is then computed using a threshold as described for the 'fast green' method in equation (2).

K-nearest neighbour classifier: This classifier is controlled by a simple GUI: the user clicks on positions in an image that represent background.
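For illustration, the 'fast green' key of equations (1) and (2) can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation; the function name and the threshold value are assumptions for the example only.

```python
import numpy as np

def fast_green_key(image, sigma_fg=30):
    """Binary key via 'fast green' subtraction, equations (1)-(2).

    image: H x W x 3 uint8 RGB array.
    Returns an H x W uint8 mask: 0 = background (green pitch),
    255 = foreground (players). sigma_fg is an illustrative
    threshold, not a value taken from the paper.
    """
    rgb = image.astype(np.int16)  # widen to avoid uint8 wrap-around
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    d_fg = g - np.maximum(r, b)                                 # equation (1)
    return np.where(d_fg > sigma_fg, 0, 255).astype(np.uint8)   # equation (2)
```

A strongly green pixel yields a large d_fg and keys to 0 (background), while a grey or red-shirted pixel keys to 255 (foreground).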
The RGB colour values of that pixel are stored as a prototype P_i = I in a list. All pixels in the image that are within a radius r_1 of the colour prototype are then marked as background as well. The user continues to choose background pixels until the resulting segmentation is satisfactory.

The segmentation S_k-nearest is computed by finding the nearest colour prototype P_best from the list, with the distance d of the pixel RGB values I:

    d = Distance_RGB(P_best, I)    (3)

In order to get continuous values, a soft key can be obtained using a second radius r_2:

    S'_k-nearest = 0 if d <= r_1; 255 (d - r_1) / (r_2 - r_1) if r_1 < d <= r_2; 255 otherwise    (4)

Fig. 3 (left) shows the image pixels of Fig. 1 in RGB colour space. The pitch pixels are distributed in an elongated ellipsoid. Fig. 3 (right) shows 16 colour prototypes in RGB that approximate this distribution.

Fig. 3. RGB colour histogram (left) and selected colour prototypes (right).

3.3. 3D Reconstruction

Free-viewpoint rendering in sports production ideally requires a visual quality comparable to the source video, together with reconstruction from sparse viewpoints and video-rate playback. Possible approaches include billboards [10], the visual hull [11], and the view-dependent visual hull [12].

Billboarding uses a single polygon placed coincident with the object that it represents. This polygon is then rotated around an axis or point (typically the Y axis) so that it retains its original position but is constantly facing the virtual camera. An image of the original object is then applied to the polygon as a texture map. This technique can often give good results with very little overhead in reconstruction or rendering, as large-scale parallax effects are handled by the relative positioning of the billboards, while the lack of small-scale parallax is often not noticed.
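The Y-axis billboard rotation described above can be sketched as follows. This is a minimal geometric illustration under assumed conventions (Y up, quad normal initially along +Z); the function names are hypothetical and this is not the paper's renderer.

```python
import math

def billboard_y_angle(object_pos, camera_pos):
    """Angle (radians) to rotate a quad about the Y axis so that its
    normal (initially +Z) points towards the camera in the ground plane."""
    dx = camera_pos[0] - object_pos[0]
    dz = camera_pos[2] - object_pos[2]
    return math.atan2(dx, dz)

def billboard_corners(object_pos, camera_pos, width=1.0, height=2.0):
    """Corners of a camera-facing quad standing at object_pos (x, y, z),
    e.g. a player cut-out of the given width and height."""
    a = billboard_y_angle(object_pos, camera_pos)
    # The quad's width axis (+X before rotation) rotated by a about Y:
    hx, hz = math.cos(a) * width / 2.0, -math.sin(a) * width / 2.0
    x, y, z = object_pos
    return [(x - hx, y,          z - hz),
            (x + hx, y,          z + hz),
            (x + hx, y + height, z + hz),
            (x - hx, y + height, z - hz)]
```

Because only the rotation angle changes per frame, the billboard can follow the virtual camera at negligible cost, which is what makes the technique attractive for video-rate replay.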
However, the approach is limited to distant views and does not facilitate smooth transitions between views.

The visual hull (VH) [11] derives scene geometry that is consistent with a set of image silhouettes. Reconstruction projects the silhouettes of the foreground objects in each image using the calibration data for the relevant camera and then finds the intersections of these projections. Each intersection defines the largest possible surface that could produce the silhouettes in the original images. This approach has been widely used for object reconstruction, assuming accurate camera calibration and silhouettes.

In sports such as football, which require capture over a relatively large area with uncontrolled illumination, the visual hull accuracy is reduced due to errors in camera calibration and matting, together with image quantisation. A conservative visual hull approach that takes these errors into account allows robust visual hull reconstruction in sports. The resulting multiple-view images can then be textured onto the visual hull surface for view-dependent rendering of the players. This approach achieves robust reconstruction but does not accurately align overlapping images, due to errors in geometry, resulting in blur or doubling of features.

Refinement of an initial robust visual hull estimate provides an approach to high-quality reconstruction in the presence of global calibration errors [5]. Stereo correspondence between wide-baseline camera views, constrained by the initial surface estimate, allows refinement of the surface to locally align the multiple-view images. This produces high-quality free-viewpoint rendering in the presence of global errors. Application of this approach to sports has been achieved by refinement of the view-dependent visual hull (VDVH) [12] using stereo correspondence to interpolate between captured views. The VDVH provides an exact sampling of the VH surface as a depth map from a specific camera viewpoint.
This is then refined for pairs of views using a graph cut to optimise stereo correspondence and boundary constraints. The VDVH is less sensitive to global errors due to camera calibration or matting and, unlike the VH, it provides locally correct aligned textures. A comparative evaluation of free-viewpoint video in football using billboards, the visual hull and the view-dependent visual hull is presented in [13].

4. REPLAY

The replay module uses the 3D models of the scene together with the original camera images to produce a novel view of the scene. The camera images are applied using view-dependent texture mapping. For high-quality rendering, three cameras are used and blended together. Cameras closer to the synthetic viewpoint get a higher weight. One option to achieve this is to use a simple argument based on the angle between the virtual camera, the real camera and the scene interest point. For the rendering we developed an OpenGL-based module that uses view-dependent texture mapping, as described before.

In addition to the 'foreground action', a simple planar polygonal model of the pitch is inserted into the virtual scene model. The texture for this virtual pitch is computed by perspectively projecting all available camera images onto this polygon and combining them using a median filter.

5. RESULTS

The quality of calibration and segmentation is very important for the overall quality of the system. The line-based calibration produces an average residual error of around 1 pel and a maximal error of approximately 2 pel.

Fig. 4 & 5 show enlarged results for the football image in Fig. 1.

Fig. 4. Test image (left), results of fast green segmentation (right).

The 'fast green' method (Fig. 4, right) emphasises compression artefacts, but is computationally very fast. The 'HSV' keyer (Fig. 5, left) produces slightly more detailed results. The k-nearest neighbour classifier (Fig. 5, right) produces the best results, since it can be interactively 'trained' to approximate the colour distribution of the background quite well. Fig.
6 gives an example of a synthetic view (the goalkeeper's view). The scene was captured using 16 SD cameras mounted at a height of approximately 20 m. The novel view is generated using a visual hull reconstruction.

Fig. 5. Results of segmentation: HSV (left), k-nearest neighbour (right).

Fig. 6. Original camera image (top). Novel views (middle, bottom) from the goalkeeper's position.

6. CONCLUSIONS

A system that provides free-viewpoint video functionality for sports scenes was discussed. The system builds upon previous work done for a studio-based system. This paper discussed some of the experiences found while applying these techniques in an uncontrolled outdoor environment and addressed some of the robustness issues found.

First results show the potential of the new approach for action replay and strategy analysis of sport scenes. The visual hull technique seems to provide a robust platform for the 3D reconstruction. Current work focuses on improving the quality of the computed 3D models through better camera calibration and more robust 3D reconstruction algorithms.

7. ACKNOWLEDGEMENTS

This work has been funded by the UK DTI.

8. REFERENCES

[1] "iview,"
[2] P. Rander, P. J. Narayanan, and T. Kanade, "Virtualized reality: Constructing time-varying virtual worlds from real events," in Proceedings of IEEE Visualization '97, October 1997, pp. 277-283.
[3] J. Carranza, C. Theobalt, M. Magnor, and H.-P. Seidel, "Free-viewpoint video of human actors," ACM Trans. on Computer Graphics, vol. 22, no. 3, July 2003.
[4] O. Grau, T. Pullen, and G. A. Thomas, "A combined studio production system for 3-d capturing of live action and immersive actor feedback," IEEE Tr. on CSVT, vol. 14, no. 3, pp. 370-380, March 2004.
[5] J. Starck and A. Hilton, "Virtual view synthesis of people from multiple view video sequences," Graphical Models, vol. 67, no. 6, pp. 600-620, November 2005.
[6] N. Inamoto and H.
Saito, "Fly through view video generation of soccer scene," in IWEC, 2002, pp. 109-116.
[7] O. Grau et al., "A free-viewpoint system for visualisation of sport scenes," in Conference Proc. of International Broadcasting Convention, Sept. 2006.
[8] P. E. Debevec, G. Borshukov, and Y. Yu, "Efficient view-dependent image-based rendering with projective texture-mapping," in Proc. of 9th Eurographics Rendering Workshop, Vienna, Austria, June 1998.
[9] G. A. Thomas, "Real-time camera pose estimation for augmenting sports scenes," in Proc. of 3rd European Conf. on Visual Media Production (CVMP 2006), London, UK, November 2006, pp. 10-19.
[10] T. Koyama, I. Kitahara, and Y. Ohta, "Live mixed-reality 3d video in soccer stadium," The 2nd IEEE and ACM International Symposium on Mixed and Augmented Reality, pp. 178-186, 2003.
[11] R. Szeliski, "Rapid octree construction from image sequences," CVGIP: Image Understanding, vol. 58, no. 1, pp. 23-32, 1993.
[12] G. Miller and A. Hilton, "Exact view-dependent visual hulls," in Proceedings of the 18th International Conference on Pattern Recognition (ICPR), 2006.
[13] J. J. M. Kilner, J. Starck, and A. Hilton, "A comparative study of free-viewpoint video techniques for sports events," in Proc. 3rd European Conference on Visual Media Production, November 2006.