
SUSTAINABLE LEXICOGRAPHY: WHERE TO GO FROM HERE WITH THE ANW (ALGEMEEN NEDERLANDS WOORDENBOEK, AN ONLINE GENERAL LANGUAGE DICTIONARY OF CONTEMPORARY DUTCH)?

Lut Colman: Instituut voor Nederlandse Lexicologie, Leiden (colman@inl.nl)

International Journal of Lexicography, doi:10.1093/ijl/ecw008. Advance Access published March 21, 2016. © 2016 Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com. Downloaded from http://ijl.oxfordjournals.org/ at Leiden University on March 22, 2016.

Abstract

The online dictionary of contemporary Dutch, which is compiled at the Institute for Dutch Lexicology (INL) in Leiden (http://inl.nl), has progressed to a point where decisions will soon have to be made for a sustainable future of the project. The Dutch and Flemish governments initially allocated funds for the dictionary until 2018. The project is well on its way to becoming an impressive and innovative lexicographic database and online dictionary, but by 2018 it will still be far from complete. In this contribution we will discuss some challenges for Dutch lexicography in the future and for e-lexicography in general. A lexicographic database and dictionary like the ANW can make lexicography a sustainable enterprise: we can reuse what we have, improve and optimize what we have and expand what we have.

1. Introduction

‘The Algemeen Nederlands Woordenboek (ANW, Dictionary of Contemporary Dutch) is an online, corpus-based, scholarly dictionary of contemporary standard Dutch in the Netherlands and in Flanders, describing the Dutch vocabulary from 1970 onwards’ (Schoonheim et al. 2010: 718). It is one of the main projects of the Leiden Instituut voor Nederlandse Lexicologie (INL, Institute of Dutch Lexicology). As well as an online dictionary through which a range of users can explore the Dutch vocabulary, the ANW is a linguistic data resource from which users, and especially language professionals, can extract data necessary for their research. The project focuses on the general vocabulary of written Dutch and it provides semasiological and onomasiological access to the dictionary.

At the time of writing this contribution in February 2016, the ANW contained 193,622 words including fully edited entries, partially edited entries and run-on derivatives and compounds. It had 36,932 senses, 187,904 examples, 86,875 collocations, 7018 fixed expressions and 266 proverbs. It also had multimedia: 4591 illustrations, 642 videos and 180 sound clips. The ANW is made with modern compiling methods. A dictionary writing system (DWS) is used for systematic editing (Niestadt 2009) and the Sketch Engine (Kilgarriff et al. 2004) is used as a corpus query tool to access the ANW corpus of over 100 million words. A lot of time, effort and of course money has been invested to develop this dictionary, which makes it obvious that after 2018 the work on the database and dictionary interface should be continued in one way or another. Most dictionary users know that, in order to be of any use at all, a dictionary should contain a sufficient amount of data. A partial dictionary will simply not satisfy the dictionary user’s needs in the long run. But simply adding data in the form of entries is not enough. The data have to fill an infrastructure which is optimally equipped to answer the user’s searches today and in the future. During the process of making the ANW, its lexicographers encountered many challenges and difficulties which could not always be foreseen and which may now need some reconsideration or optimization.
In what follows we will discuss the state of the art of the most important innovative aspects of the ANW and look into possibilities for reuse, further research and development. We will begin by explaining the notion of sustainability in the lexicographic context in section 2. In section 3 the innovative concept of the semagram and its relation to onomasiological searches and sustainability will be discussed. Section 4 will deal with the combinatorics. In section 5 we will discuss lexical and semantic relations like synonymy, metaphor and metonymy in the ANW. In section 6 we will give a short comment on the form-based word families (derivatives and compounds). Section 7 will discuss pragmatic issues like variation in Dutch. We will conclude with some perspectives for the future of the ANW in section 8.

2. Sustainability in the lexicographic context

Sustainability in the lexicographic context is not quite the same as in the economic and environmental context. The OED defines sustainability in the specialized sense as ‘the property of being environmentally sustainable; the degree to which a process or enterprise is able to be maintained or continued while avoiding the long-term depletion of natural resources’. Avoiding the long-term depletion of natural resources implies, for example, reuse of materials and products to reduce waste, economic use of resources, workflow optimization and the weighing of costs and benefits to present and future generations.

Of course, in order to preserve a language, the lexicographic description of it is an enterprise that has to be maintained or continued. In contrast to natural resources and materials, language resources are not finite, so one would assume that lexicography should be able to continue forever.
Lexicographic products, such as printed dictionaries, suffer from the fact that they are rather quickly outdated or considered outdated as soon as they are published. Supplements and new editions have to be published from time to time to keep pace with developments in the vocabulary. Unfortunately, financers of dictionaries are more and more reluctant to invest in long-term dictionary projects, as in a market-driven society such projects are often considered to be time-consuming and expensive. So, in lexicography it is not the natural (language) resources that are depleted but the financial resources. To sustain lexicography, lexicographers will need to convince funders that their investments are not a waste of time and money and that it is possible to optimize the workflow through responsible use of materials, products and financial resources.

Reuse of materials and products in the lexicographic context can be achieved by reuse of the content of existing dictionaries. There is no need, for example, to rewrite definitions from scratch in every new dictionary project (Hanks 2010: 587). Linking to external data in the dictionary application is also a means to reuse data (Tarp 2014: 253). In the ANW the dictionary user has immediate access to an application of etymological dictionaries (www.etymologiebank.nl), the historical dictionaries of Dutch (http://gtb.inl.nl), Wikipedia and Google. Access to the Corpus of Contemporary Dutch (http://chn.inl.nl) is also possible, but a login is required, which still puts up a barrier to easy access. Reuse can also be achieved by the use of dictionary writing systems (DWS) and lexicographic tools in other projects. The ANW-DWS, for example, is also used for the Online Dutch-Frisian Dictionary that is being compiled at the Fryske Akademy (Sijens et al. 2015).
The Sketch Engine is a lexicographic tool used by many lexicographers in new online dictionaries (www.sketchengine.co.uk).

In addition to the reuse of content, writing systems and tools, the workflow is of course also improved by the increasing automation of the lexicographical process itself, such as finding and sorting collocations and examples in a corpus and transferring them from a corpus into the dictionary application. The storing of as much relevant data as possible in a database underlying a dictionary application (Tarp 2014: 249), adaptive presentation of the data, with as little data as possible on the screen to reduce information overload in the dictionary application, and article modelling (Tarp 2014: 251) make it possible to individualise the online dictionary to any user’s needs. Hence, one database can function as the source of many dictionaries and other reference tools. The ANW database and application, for example, can also function as the official guide to Dutch spelling in the future. All this contributes to a sustainable use of resources.

3. Semagrams and onomasiological searches

One of the most important innovations in the ANW is that the traditional definitions are complemented by a semagram:

A semagram is the representation of knowledge associated with a word in a frame of ‘slots’ and ‘fillers’. ‘Slots’ are conceptual structure elements, which characterise the properties and relations of the semantic class of a word meaning. On the basis of these slots specific data is stored (‘fillers’) for the word in question. (Moerdijk et al. 2008: 19)

This is illustrated below with the semagram for the lemma koe ‘cow’ (cf. Moerdijk et al. 2008: 19):

(1) A COW
UPPER CATEGORY: is an animal
CATEGORY: is a bovine (animal)
SOUND: mooes/lows, makes a sound that we imitate with a low, long-drawn ‘boe’
COLOUR: is often black and white spotted, but also brown and white spotted, black, brown or white
SIZE: is big
PARTS: has an udder, horns and four stomachs: paunch, reticulum, third stomach, proper stomach
BUILD: is big-boned, bony, large-limbed in build
FUNCTION: produces milk and (being slaughtered) meat
PLACE: is kept on a farm; is in the field and in the winter in the byre
AGE: is adult, has calved
PROPERTY: is useful and tame; is considered as a friendly, lazy, slow, dumb, curious, social animal
SEX: is female
BEHAVIOUR: grazes and ruminates
TREATMENT: is milked every day; is slaughtered
PRODUCT: produces milk and meat
VALUE: is useful

Semagrams are useful for several reasons. First, they enable the lexicographer to make definitions more compact as some semantic, conceptual or encyclopaedic information can be transferred to the semagram. For example, it is not necessary to mention the cow’s udder in the definition, but it should be mentioned in the semagram because it belongs to the concept of cow. Second, semagrams can also make the semantic and conceptual descriptions more consistent. Every semantic class of words (e.g. animals, persons, artefacts, substances) has in its type template those features that are recurrent for individual members of that class (e.g. many animals make a sound, all artefacts have a function, all substances have elements or ingredients). In a modular instead of an alphabetical process of editing it is easier for the lexicographer to edit classes of words consistently using the features that are relevant for the set of words he or she is working on.
Finally, semagrams could increase the search options in electronic dictionaries. This is true especially for onomasiological searches. Onomasiological searches relying solely on definitions have not been very successful, as going from a definition to a word can succeed only if the words the user uses in his/her search query coincide (more or less) with the words in the definition, which is seldom the case (Moerdijk 2002). The hypothesis is that additional information available in the semagram can help to optimize onomasiological searches. As the semagram contains more features of the target word, the chances of finding the target word increase. For example, a search for animals that are kept as pets is possible because the semagram has a feature ‘function’. A reverse search using the feature ‘function: pet’ will result in a list of pets. Unfortunately, the semagram has not completely lived up to high expectations so far. If we search, for instance, for zwaan ‘swan’ with a query like grote witte watervogel met een lange hals ‘large white water bird with a long neck’ the result, apart from zwaan, also shows ooievaar ‘stork’, but a stork is a large water bird that is black and white. The simple search algorithm that is behind the onomasiological search option in the ANW interface causes the noise in the results. It is based on stemming and simple pattern matching of the input terms in the semagrams and the definitions of the entries in the ANW. The onomasiological search option works better if one searches only for some prototypical and distinctive features of the target concept. Water bird and long curved neck, for example, result only in zwaan ‘swan’.

Semagrams contribute to sustainability in lexicography because they are not necessarily limited to the ANW project. They can be a useful semantic innovation in other dictionaries and in terminology.
Kinable (2013) describes the applicability of semagrams in historical lexicography and text research.

So far semagrams have been fully developed only for nouns. Verbs and adjectives do not have semagrams yet. Many features from the noun templates of the semagrams could be used for verbs and adjectives also. Slots like SPEED and MOTION could be filled with ‘fast’ and ‘moving forward’ in a semagram of verbs like rennen ‘run’, and racen ‘race’. For the ANW project some exploratory research on verb classes and semantic features and the semantic categorisation of adjectives has been carried out (Heyvaert 2006 and 2010), but due to lack of staff and time, the further development and implementation of semagrams for verbs and adjectives in lexicography have had to be postponed.
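
The slot-and-filler structure of a semagram, the reverse search on a single slot and the noise in the onomasiological search described in this section can be illustrated with a minimal Python sketch. It is not the ANW implementation. The miniature semagrams, the English fillers, the stop word list, the crude stemmer and the rule that every query term must occur in an entry are all assumptions made for illustration, and the names SEMAGRAMS, stem, stems and search are hypothetical.

```python
import re

# Hypothetical miniature semagrams (slot -> filler), in English for readability.
# The fillers are invented for illustration and are not taken from the ANW.
SEMAGRAMS = {
    "zwaan": {
        "CATEGORY": "is a water bird",
        "COLOUR": "is white",
        "SIZE": "is large",
        "PARTS": "has a long curved neck",
    },
    "ooievaar": {
        "CATEGORY": "is a water bird",
        "COLOUR": "is black and white",
        "SIZE": "is large",
        "PARTS": "has a long neck, long legs and a long bill",
    },
    "koe": {
        "CATEGORY": "is a bovine animal",
        "COLOUR": "is often black and white spotted",
        "FUNCTION": "produces milk and meat",
    },
    "hond": {
        "CATEGORY": "is an animal",
        "FUNCTION": "is kept as a pet",
    },
}

# Assumed stop words: function words that carry no feature information.
STOP = {"a", "an", "and", "as", "has", "is", "the", "with"}


def stem(token):
    """Crude stand-in for a stemmer: strip a final 's' from longer tokens."""
    return token[:-1] if len(token) > 3 and token.endswith("s") else token


def stems(text):
    """Return the set of stemmed content words in a piece of text."""
    return {stem(t) for t in re.findall(r"[a-z]+", text.lower())} - STOP


def search(query, slot=None):
    """Return lemmas whose fillers contain every stemmed query term.

    With slot=None all slots are searched; otherwise only the named slot.
    """
    wanted = stems(query)
    hits = []
    for lemma, semagram in SEMAGRAMS.items():
        fillers = [f for s, f in semagram.items() if slot is None or s == slot]
        if wanted and wanted <= stems(" ".join(fillers)):
            hits.append(lemma)
    return sorted(hits)


# The long query also matches the stork, because 'white' occurs in
# 'black and white': plain term matching cannot see the difference.
print(search("large white water bird with a long neck"))  # ['ooievaar', 'zwaan']
# A few distinctive features isolate the swan.
print(search("water bird long curved neck"))  # ['zwaan']
# Reverse search restricted to one slot ('function: pet').
print(search("pet", slot="FUNCTION"))  # ['hond']
```

In this sketch the stork appears in the first result for the same reason as in the example above: matching individual terms treats ‘black and white’ as containing ‘white’. Restricting the match to a slot, or to distinctive features such as the curved neck, removes the noise.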