
A Framework for Scientific Discovery in Geological Databases

Cen Li and Gautam Biswas
Department of Computer Science
Vanderbilt University
Box 1679, Station B
Nashville, TN 37235
Tel: (615) 343-6204
Email: cenli,

The Problem

It is common knowledge in the oil industry that the typical cost of drilling a new offshore well is in the range of $30-40 million, but the chance of that site being an economic success is 1 in 10. Recent advances in drilling technology and data collection methods have led oil companies and their ancillary companies to collect large amounts of geophysical/geological data from production wells and exploration sites. This information is being organized into large company databases, and the question is whether this vast amount of history from previously explored fields can be systematically utilized to evaluate new plays and prospects. A possible solution is to develop the capability for retrieving analog wells and fields for the new prospects and to employ Monte Carlo methods with risk analysis techniques for computing the distributions of possible hydrocarbon volumes for these prospects. This may form the basis for more accurate and objective prospect evaluation and ranking schemes.

However, with the development of more sophisticated methods for computer-based scientific discovery[6], the primary question becomes: can we derive more precise analytic relations between observed phenomena and parameters that directly contribute to computation of the amount of oil and gas reserves?
For oil prospects, geologists compute potential recoverable reserves using the pore-volume equation[1]

\[ \text{Recoverable Reserves (in STB)} = \frac{BRV \times (N/G) \times \phi \times S_{hc} \times RF \times 6.29}{FVF} \]

where
BRV = Bulk Rock Volume in m^3
N/G = Net/Gross ratio of the reservoir rock body making up the BRV
\(\phi\) = average reservoir porosity (pore volume)
\(S_{hc}\) = average hydrocarbon saturation
RF = Recovery Factor (the fraction of the in-place petroleum expected to be recovered to surface)
6.29 = factor converting m^3 to barrels
FVF = Formation Volume Factor of oil (the amount that the oil volume shrinks on moving from reservoir to surface)
STB = Stock Tank Barrels, i.e., barrels at standard conditions of 60°F and 14.7 psia.

* This research is supported by grants from Arco Research Labs, Plano, TX.

In qualitative terms, good recoverable reserves have high hydrocarbon saturation, are trapped by highly porous sediments (reservoir porosity), and are surrounded by hard bulk rocks that prevent the hydrocarbon from leaking away. Having a large volume of porous sediments is crucial to finding good recoverable reserves, and therefore a primary emphasis of this study is to determine the porosity values from collected data in new prospect regions. We focus on scientific discovery methods to derive empirical equations for computing porosity values in regions of interest.

Determination of the porosity or pore volume of a prospect depends upon multiple geological phenomena in a region. Some of the information, such as pore geometries, grain size, packing, and sorting, is microscopic, and some, such as rock types, formation, depositional setting, stratigraphic zones, and unconformities (compaction, deformation, and cementation), is macroscopic.
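As a brief illustration of the pore-volume equation given earlier in this section, the following minimal Python sketch evaluates it for one set of inputs. The function name, the argument names, the range checks, and the input values are hypothetical and are not taken from the study; only the formula and the 6.29 conversion factor come from the text.

```python
M3_TO_BARRELS = 6.29  # conversion factor quoted in the text


def recoverable_reserves_stb(brv_m3, net_to_gross, porosity,
                             hc_saturation, recovery_factor, fvf):
    """Recoverable reserves in stock tank barrels (STB), pore-volume equation."""
    fractions = {
        "net_to_gross": net_to_gross,
        "porosity": porosity,
        "hc_saturation": hc_saturation,
        "recovery_factor": recovery_factor,
    }
    # Assumption of this sketch: all four ratios are expressed as fractions.
    for name, value in fractions.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    if brv_m3 < 0:
        raise ValueError("bulk rock volume must be non-negative")
    if fvf <= 0:
        raise ValueError("formation volume factor must be positive")
    in_place_m3 = brv_m3 * net_to_gross * porosity * hc_saturation
    return in_place_m3 * recovery_factor * M3_TO_BARRELS / fvf


# Hypothetical prospect, for illustration only.
reserves = recoverable_reserves_stb(
    brv_m3=1.0e8, net_to_gross=0.5, porosity=0.2,
    hc_saturation=0.8, recovery_factor=0.3, fvf=1.2,
)
print(f"Recoverable reserves: {reserves:,.0f} STB")
```

Because porosity enters the product linearly, any relative error in the porosity estimate carries over directly to the reserve estimate, which is one way to see why the study concentrates on that term.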
The microscopic and macroscopic phenomena that determine porosity are attributed to millions of years of geophysical and geochemical evolution, and are therefore hard to formalize and quantify. On the other hand, large amounts of geological data that directly influence hydrocarbon volume, such as porosity and permeability measurements, grain character, lithologies, formations, and geometry, are available from previously explored regions.

The goal of the study is to use computer-assisted analysis and scientific discovery methods to derive general analytic formulations for porosity as a function of relevant geological phenomena. The general rule of thumb is that porosity decreases quasi-exponentially with depth, but a number of other factors, such as rock types, structure, and cementation, confound this relationship. This necessitates the definition of proper contexts in which to attempt the discovery of porosity formulae. The next section outlines our approach to this methodology.

From: AAAI Technical Report SS-95-03. Compilation copyright © 1995, AAAI. All rights reserved.

The Approach

From the above description, it is clear that the empirical equations for porosity of sedimentary structures in a region are very dependent on the context associated with this region. The term context includes geological phenomena that govern the formation of the structures and the lithology of the region, and therefore defines a set of observable and measurable geological parameters from which values of porosity can be computed. It is well known that the geological context can change drastically from basin to basin (different geographical areas in the world), and also from region to region within a basin[1, 3].

With this background, we develop a formal two-step scientific discovery process for determining empirical equations for porosity from geological data. To test our methodology, we use data from a region in the Alaskan basin. This data is labeled by code numbers (the location or wells from which they were extracted) and the stratigraphic unit numbers.
The stratigraphic unit numbers are related to sediment depositional sequences that are affected by subsidence, erosion, and compaction, resulting in characteristic geometries. Each data object is then described in terms of 37 geological features, such as porosity, permeability, grain size, density, and sorting, amount of different mineral fragments (e.g., quartz, chert, feldspar) present, nature of the rock fragments, pore characteristics, and cementation. All these feature-values are numeric measurements made on samples obtained from well-logs.

A more formal description of the two-step discovery process follows:

1. Context Definition
The first step in the discovery task is to identify a set of contexts \(C = (C_1, C_2, \ldots, C_n)\), each of which will likely produce a unique porosity equation. Each \(C_i \in C\) is defined as a sequence of primitive geological structures, \(C_i = g_1 \circ g_2 \circ \ldots \circ g_k\) (primitive structures may appear more than once in a sequence). The set of primitive geological structures \((g_1, g_2, \ldots, g_m)\) is extracted by a clustering process. The context definition task itself is further divided into the following subtasks:
• discovering the primitive structures \((g_1, g_2, \ldots, g_m)\),
• identifying relevant sequences of such primitive structures, i.e., \(C_i = g_{i1} \circ g_{i2} \circ \ldots \circ g_{ik}\),
• grouping the data that belong to the same sequence to form the context, and determining the relevant set of features \((x_1, x_2, \ldots, x_k)\) that will be used to derive the porosity equation for that context.

2. Equation Derivation
This step involves using statistical techniques, such as multivariate regression methods[5], to derive a form of porosity equation \(\phi = f(x_1, x_2, \ldots, x_k)\) for each context defined in step 1.
The task is further divided into the following three subtasks:
• construct the base model from domain theory and estimate the parameters of the model using least squares methods,
• for each independent variable in the model, construct and examine the component plus residual plot (cprp) for that variable, and transform its form, if required,
• construct a set of dimensionless terms \(\pi = (\pi_1, \pi_2, \ldots, \pi_k)\) from the relevant set of features[2], and incorporate the \(\pi\)'s into the model in a way that reduces the residual of the model.

The first step in the context definition task is to identify the set of primitive structures using a clustering methodology. In previous work[3], we have defined a three-step methodology that governs this process: (i) feature selection, (ii) clustering, and (iii) interpretation. Feature selection deals with issues for selecting object characteristics that are relevant to the study. In our experiments, this task has been primarily handled by domain experts. Clustering deals with the process of grouping the data objects based on similarities of properties among the objects. The goal is to partition the data into groups such that objects in each group are more homogeneous than objects in different groups. For our data set, in which all features are numeric-valued, we use a partitional numeric clustering program called CLUSTER[4] as the clustering tool. CLUSTER assumes each object to be a point in multidimensional space and uses the Euclidean metric as a measure of (dis)similarity between objects. Its criterion function is based on minimizing the mean square-error within each cluster. The goal of interpretation is to determine whether the generated groups represent useful concepts in the problem solving domain. In more detail, this is often performed by looking at the intensional definition of a class, i.e., the feature-value descriptions that characterize this class, and seeing if they can be explained by the experts' background knowledge in the domain.
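The partitional, square-error clustering step described above can be illustrated with a minimal sketch. This is not the CLUSTER program[4]; plain k-means is used here as a stand-in because it also treats objects as points in multidimensional space, uses the Euclidean metric, and reduces the within-cluster square-error. The two features, their values, the number of clusters, and the seed are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return tuple(sum(column) / n for column in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct objects
    labels = None
    for _ in range(max_iter):
        # Assign every object to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the partition has converged
        labels = new_labels
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = mean_point(members)
    return labels, centroids


def mean_square_error(points, labels, centroids):
    """Mean squared distance of each object to its cluster centroid."""
    total = sum(squared_distance(p, centroids[lab])
                for p, lab in zip(points, labels))
    return total / len(points)


# Hypothetical (clay, siderite) values for six samples.
samples = [(0.10, 0.20), (0.15, 0.22), (0.12, 0.18),
           (0.80, 0.90), (0.85, 0.88), (0.82, 0.95)]
labels, centroids = kmeans(samples, k=2)
print(labels, mean_square_error(samples, labels, centroids))
```

Inspecting the returned centroids is the programmatic counterpart of the interpretation step: a centroid with high clay and siderite values would be read by the expert as a low porosity group.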
In our studies, experts interpreted groupings in terms of the sediment characteristics of the group. For example, if a group is characterized by clay and siderite features having high values, the expert considers this relevant because it indicates a low porosity region.

Often the expert has to iterate through different feature subsets, or express feature definitions in a more refined manner, to obtain meaningful and acceptable groupings. In these studies, the experts did this by running clustering studies that define the data from four viewpoints: (i) Depositional setting, (ii) Reservoir quality, (iii) Provenance, and (iv) Stratigraphic zones. Each viewpoint entailed a different subset of features. A brute force clustering run with all features provided a gross estimate of sediment characteristics.

A number of graphical and statistical tools have been developed to facilitate the expert's comparison tasks. For example, as part of the clustering and interpretation package, we have developed software that allows users to cross-tabulate different clustering runs to study the similarities and differences in the groupings. In addition, a number of graphical tools have been created to allow the expert to compare feature-value definitions across various groups.

The net result of this process provides the primitive set \((g_1, g_2, \ldots, g_m)\) of step 1 of the discovery task. These primitives are then mapped onto the unit code versus stratigraphic unit map. Fig. 1 depicts a partial mapping for a set of wells and four primitive structures. In the actual experiments, our experts initially identified about 8-10 primitive structures, but further experiments are being conducted to validate these results.

The next step in the discovery process is to identify sections of wells and regions that are made up of the same sequence of geological primitives. Every such sequence defines a context \(C_i\).
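The search for well sections that share the same sequence of primitives can be sketched as follows. This is only an illustration: in the study the selection is done by hand, and the well identifiers, the label sequences, the thresholds, and the ranking rule (longer sequences first, then sequences found in more wells) are assumptions of this sketch.

```python
from collections import defaultdict


def contiguous_subsequences(labels, min_len):
    """Yield every contiguous run of at least min_len primitive labels."""
    n = len(labels)
    for start in range(n):
        for end in range(start + min_len, n + 1):
            yield tuple(labels[start:end])


def candidate_contexts(wells, min_len=2, min_wells=2):
    """Rank primitive sequences that recur across wells.

    `wells` maps a well identifier to its primitive labels listed in
    stratigraphic order. Returns (sequence, well_ids) pairs, longest
    sequences first and, among equal lengths, most wells first.
    """
    wells_by_sequence = defaultdict(set)
    for well_id, labels in wells.items():
        for sequence in contiguous_subsequences(labels, min_len):
            wells_by_sequence[sequence].add(well_id)
    candidates = [
        (sequence, sorted(well_ids))
        for sequence, well_ids in wells_by_sequence.items()
        if len(well_ids) >= min_wells
    ]
    candidates.sort(key=lambda item: (-len(item[0]), -len(item[1]), item[0]))
    return candidates


# Hypothetical mapping of four wells onto primitive structures g1..g4.
wells = {
    "300": ["g2", "g1", "g2", "g3", "g4"],
    "400": ["g1", "g3", "g4"],
    "500": ["g4", "g3", "g1", "g3"],
    "600": ["g1", "g2", "g1", "g2", "g3"],
}
for sequence, well_ids in candidate_contexts(wells)[:3]:
    print(" o ".join(sequence), "found in wells", well_ids)
```

Each returned pair names a candidate context and the wells whose data points would be pooled for it.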
Some criteria employed in identifying sequences are that longer sequences are more useful than shorter ones, and that sequences that occur more frequently are likely to define better contexts than those that occur infrequently. Currently, this sequence selection job is done by hand, but in future work we wish to employ tools, such as mechanisms for learning context-free grammars from string sequences, to assist experts in generating useful sequences. A reason for considering sequences that occur more frequently is that they will produce more generally applicable porosity equations than ones from infrequent sequences.

After the contexts are defined, data points belonging to each context can be grouped to derive useful formulae. From the partial mapping of Fig. 1, the context \(C_1 = g_2 \circ g_1 \circ g_2 \circ g_3\) was identified in two well regions (the 300 and 600 series).

Step 2 of the discovery process uses equation discovery techniques to derive porosity equations for each context. Theoretically, the possible functional relationships that may exist among any set of variables are infinite. It would be computationally intractable to derive models given a data set without constraining the search for possible functional relations. One way to cut down on the search is to reduce the number of independent variables involved. Step 1 achieves this because the cluster derivation process also identifies the essential and relevant features that define each class. A second effective method for reducing the search space is to use domain knowledge to define approximate functional relations between the dependent variable and each of the independent variables. We exploit this and assume that a basic model suggested by domain theory is provided to the system to start the equation discovery process.
Parameter estimation in the basic model is done using a standard least squares method from the Minpack¹ statistical package.

¹ This is a free software package developed by Burton S. Garbow, Kenneth E. Hillstrom, and Jorge J. Moré at Argonne National Laboratory, IL.

Our application requires that we be able to derive linear and nonlinear relationships between the goal variable and the set of independent variables, without being bound to just the initial model suggested by domain theory. The discovery process should be capable of dynamically adjusting model parameters to better fit the data. Once the basic equation model is established, the model fit is improved by applying transformations using a graphical method, component plus residual plots (cprp)[5].

Note that domain theory suggests individual relations between independent variables and the dependent one. For example, given that \(y = f(x_1, x_2, x_3)\), domain theory may indicate that \(x_1\) is linearly related, \(x_2\) is quadratically related, and \(x_3\) is inverse quadratically related to the dependent variable \(y\). Our methodology starts off with an equation form, say

\[ y = c_0 + c_1 x_1 + c_2 x_2^2 + c_3 x_3^{-2}, \]

and estimates the coefficients of this model using the least squares method. Depending on the error (residual) term, the equation is dynamically adjusted to obtain a better fit. This is described in some detail next.

Figure 1: Area Code versus Stratigraphic Unit Map for Part of the Studied Region

The first step in the cprp method is to convert a given nonlinear equation into a linear form. In this case, the above equation \(y = c_0 + c_1 x_1 + c_2 x_2^2 + c_3 x_3^{-2}\) would be transformed into

\[ y_i = c_0 + c_1 x_{i1} + c_2 x_{i2} + c_3 x_{i3} + e_i, \]

where \(x_{i1} = x_1\), \(x_{i2} = x_2^2\), \(x_{i3} = x_3^{-2}\), and \(e_i\) is the residual. The component plus residual for an independent variable \(x_{im}\) is defined as

\[ c_m x_{im} + e_i = y_i - c_0 - \sum_{j=1,\, j \neq m}^{k} c_j x_{ij}, \]

since \(c_m x_{im}\) can be viewed as a component of \(\hat{y}_i\), the predicted values of the goal variable.
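To make the linearization and the component plus residual definition concrete, the following minimal sketch fits the linearized model by ordinary least squares and then computes \(c_m x_{im} + e_i\) for one term. It does not use Minpack: a small normal-equations solver written in plain Python stands in for it, and the function names, the synthetic data, and the generating coefficients are all hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: terms are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def least_squares(terms, y):
    """Fit y_i = c_0 + sum_j c_j x_ij; returns [c_0, c_1, ..., c_k]."""
    design = [[1.0] + list(row) for row in terms]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def component_plus_residual(terms, y, coeffs, m):
    """Return c_m * x_im + e_i for every observation i (m is 1-based)."""
    values = []
    for row, yi in zip(terms, y):
        predicted = coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], row))
        values.append(coeffs[m] * row[m - 1] + (yi - predicted))
    return values


def linearize(raw_rows):
    """Map (x1, x2, x3) to the linear terms (x1, x2^2, x3^-2)."""
    return [(x1, x2 ** 2, x3 ** -2) for x1, x2, x3 in raw_rows]


# Synthetic, noise-free data generated from hypothetical coefficients.
raw = [(1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 3, 2), (5, 2, 4), (6, 4, 1)]
terms = linearize(raw)
y = [1.0 + 2.0 * t[0] + 0.5 * t[1] + 3.0 * t[2] for t in terms]
coeffs = least_squares(terms, y)
print([round(c, 6) for c in coeffs])
print(component_plus_residual(terms, y, coeffs, m=2))
```

Plotting the returned values against the corresponding term gives the component plus residual plot for that term. Forming the normal equations explicitly is adequate for a small illustration but is less numerically robust than the routines of a dedicated least squares library.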
The quantity \(c_m x_{im} + e_i\) is essentially \(y_i\) with the linear effects of the other variables removed. The plots of \(c_m x_{im} + e_i\) against \(x_{im}\) are called component plus residual plots (Fig. 2).

The cprp for an independent variable \(x_{im}\) determines whether a transformation needs to be applied to that variable. The plot is analyzed in the following manner. First, the set of points in the plot is partitioned into three groups along the \(x_{im}\) value, such that each group has approximately the same number of points (\(k \approx n/3\)). The most representative point of each group is calculated as

\[ \left( \frac{1}{k} \sum_{i=1}^{k} x_{im},\ \frac{1}{k} \sum_{i=1}^{k} \left( c_m x_{im} + e_i \right) \right). \]

Next, the slopes, \(k_{12}\) for the line joining the first two points and \(k_{13}\) for the line joining the first and the last point, are calculated. Compare the two slopes: if \(k_{12} = k_{13}\), the data points should be described as a straight line, which implies that no transformation is needed; if \(k_{12} < k_{13}\), the line is convex; otherwise, the line is concave (see Fig. 2). In either of the last two cases, the goal variable, \(y\), or the independent variable, \(x_{im}\), needs to be transformed using the ladder of power transformations shown in Fig. 3. The idea is to move up the ladder if the three points are in a convex configuration, and move down the ladder when they are in a concave configuration.

Figure 2: Two Configurations. (a) Convex, \(k_{12} < k_{13}\); (b) Concave, \(k_{12} > k_{13}\).

Coefficients are again derived for the new form of the equation, and if the residuals decrease, this new form is accepted and the cprp process is repeated. Otherwise, the original form of the equation is retained. This cycle continues until the slopes become equal or the line changes from convex to concave, or vice versa.

As discussed earlier, we have applied this method to the Alaskan data set, which contains about 2600 objects corresponding to wells, and each object is described in terms of 37 geological features. Clustering this data set produced a seven-group structure, from which group 7 was picked for further preliminary analysis.
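The three-group slope comparison used to read a component plus residual plot can be sketched as follows. This is an illustration under assumptions: the representative point of each group is taken to be the group mean, the tolerance used to decide that two slopes are equal is a choice of this sketch, and the function names and test curves are hypothetical.

```python
def three_point_summary(xs, cpr_values):
    """Split the plot into three groups along x and return each group's mean point."""
    pairs = sorted(zip(xs, cpr_values))
    n = len(pairs)
    k = n // 3  # approximate group size, k ~ n/3
    if k == 0:
        raise ValueError("need at least three points")
    groups = [pairs[:k], pairs[k:n - k], pairs[n - k:]]
    return [
        (sum(x for x, _ in g) / len(g), sum(v for _, v in g) / len(g))
        for g in groups
    ]


def classify_shape(xs, cpr_values, rel_tol=1e-9):
    """Return 'linear', 'convex', or 'concave' from the slopes k12 and k13."""
    p1, p2, p3 = three_point_summary(xs, cpr_values)
    if p2[0] == p1[0] or p3[0] == p1[0]:
        raise ValueError("group means coincide along x; slopes are undefined")
    k12 = (p2[1] - p1[1]) / (p2[0] - p1[0])  # first to second point
    k13 = (p3[1] - p1[1]) / (p3[0] - p1[0])  # first to last point
    if abs(k12 - k13) <= rel_tol * max(1.0, abs(k12), abs(k13)):
        return "linear"   # no transformation needed
    if k12 < k13:
        return "convex"   # move up the ladder
    return "concave"      # move down the ladder


xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(classify_shape(xs, [2 * x + 1 for x in xs]))
print(classify_shape(xs, [x ** 2 for x in xs]))
print(classify_shape(xs, [x ** 0.5 for x in xs]))
```

In a full refinement loop the returned label would select the direction of the next power transformation, after which the coefficients are re-estimated and the residuals compared.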
Characteristic features of group 7 indicate that it is a low porosity group, and our domain experts picked 5 variables to establish a relationship to porosity, the goal variable. We were also told that two variables, macroporosity (M) and siderite (S), are linearly related to porosity, and the other three, clay matrix (C), laminations (L), and glauconite (G), have an inverse square relation to porosity. With this knowledge, the initial base model was set up as:

\[ P = c_0 + c_1 M + c_2 S + \frac{c_3}{c_4 C^2 + c_5 L^2 + c_6 G^2} \]

where \(P\) is porosity and \(c_0, \ldots, c_6\) are the parameters to be estimated by regression analysis. After the parameters are estimated, the model is derived as:

\[ P = 9.719 + 0.43 M + 0.033 S + \frac{2.3 \times 10^{8}}{-3.44 \times 10^{?} C^2 - 4.52 \times 10^{?} L^2 + 6.5 \times 10^{?} G^2} \]

To study the methodology further, the cprp for variable S suggested that it be transformed to a higher order term. Therefore, \(S\) was replaced by \(S^2\) in the model and the coefficients were rederived:

\[ P = 9.8 + 0.468 M - 0.004 S^2 + \frac{1.2 \times 10^{7}}{-4.7 \times 10^{?} C^2 - 7.? \times 10^{?} L^2 + 2.2 \times 10^{7} G^2} \]

The residual for the new model, 20.47, was smaller than that of the original model (21.52). This illustrates how the transformation process can be systematically employed to improve the formula.

Figure 3: Ladder of Power Transformations. From the current position, move up the ladder if the configuration is convex and down the ladder if it is concave.

Note that the current method applies to transformations carried out one at a time on variables, and may not apply in situations where terms are multiplicative or involved in exponential relations. In such situations, one of two things can be done: (i) use logarithm forms to transform multiplicative terms to additive ones, or (ii) derive appropriate \(\pi\) terms (identified in step 1) to replace existing model components with the proper \(\pi\) terms that better fit the equation model to the data.

Summary

Our work on scientific discovery extends previous work on equation generation from data[6].
Clustering methodologies and a suite of graphical and statistical tools are used to define empirical contexts in which porosity equations can be generated. In our work to date, we have put together a set of techniques that address individual steps in our discovery algorithm. We have also demonstrated that they produce interesting and useful results.

Currently, we are working on refining the context generation techniques and making them more systematic, and are coupling regression analysis methods with the heuristic model refinement techniques for equation generation. Encouraging results have been obtained. This work shows how unsupervised clustering techniques can be combined with equation finding methods to derive empirically the analytical models in domains where strong theories do not exist.

Acknowledgements

The authors wish to thank Dr. Jim Hickey and Dr. Ron Day of Arco for their help as geological experts in this project.

References

[1] P.A. Allen and J.R. Allen. Basin Analysis: Principles & Applications. Blackwell Scientific Publications, 1990.
[2] R. Bhaskar and A. Nigam. Qualitative Physics using Dimensional Analysis. Artificial Intelligence, vol. 45, pp. 73-111, 1990.
[3] G. Biswas, J. Weinberg, and C. Li. ITERATE: A Conceptual Clustering Method for Knowledge Discovery in Databases. Innovative Applications of Artificial Intelligence in the Oil and Gas Industry, B. Braunschweig and R. Day, Editors. To appear, Editions Technip, 1995.
[4] A.K. Jain and R.C. Dubes. Algorithms for Clustering Data. Prentice Hall, Englewood Cliffs, 1988.
[5] A. Sen and M. Srivastava. Regression Analysis. Springer-Verlag Publications, 1990.
[6] J.M. Zytkow and R. Zembowicz. Database Exploration in Search of Regularities. Journal of Intelligent Information Systems, 2:39-81, 1993.