6.863J/9.611J: Natural Language Processing
Prof. Robert C. Berwick
Final Project: Link Grammar
Benjamin Mattocks, Christina Johnson

Overview

For this project, we investigate the similarities and differences between link grammars and context-free grammars. We refer to the resources listed here:

The link grammar is a grammatical system that classifies natural languages by designating links between sequences of words. Instead of using rule-based part-of-speech tags to parse sentences, it uses links to create a syntactic structure for a language. There are three conditions that a sequence of words (a sentence) must satisfy in order to be in the language of a link grammar:

1. Connectivity: the links sufficiently connect all the words of the sentence.
2. Planarity: when drawn in a diagram, links never cross paths.
3. Satisfaction: the links must satisfy the linking requirements of each word in the sentence. Linking requirements are defined in the dictionary of the grammar.

We worked with version 4.1b, which is not the very latest version of the link grammar parser. The most significant improvement we found in version 4.7.4 was somewhat better handling of conjoined phrases. We tested version 4.7.4 only in that situation, primarily using 4.1b because it was easier to set up and use on Athena.

After completing our research and experimentation, we came to the conclusion that while the link grammar shares some similarities with more "traditional" context-free parsers, it stands alone as a more general tool for language classification. From its smaller collection of tags (or connectors) to the fact that it produces only low attachments of adjoining clauses and phrases, the link grammar is best described as a parser that does not handle a wide array of grammatical cases in natural language.
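The connectivity and planarity conditions lend themselves to a direct check. The following Python sketch (our own illustration, not part of the link parser) tests both conditions for a proposed set of links, given as pairs of word indices:

```python
def is_connected(n_words, links):
    """Connectivity: the links must join all words into one component."""
    adj = {i: set() for i in range(n_words)}
    for i, j in links:
        adj[i].add(j)
        adj[j].add(i)
    seen, stack = {0}, [0]
    while stack:
        w = stack.pop()
        for v in adj[w]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == n_words

def is_planar(links):
    """Planarity: two links drawn above the sentence cross iff i < k < j < l."""
    norm = [tuple(sorted(l)) for l in links]
    return not any(i < k < j < l
                   for (i, j) in norm for (k, l) in norm
                   if (i, j) != (k, l))

# "the broker is dead": Ds(0,1), Ss(1,2), Pa(2,3)
links = [(0, 1), (1, 2), (2, 3)]
print(is_connected(4, links), is_planar(links))  # True True
```

Satisfaction is not checked here, since it would require the per-word linking requirements from the grammar's dictionary.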
Instructions for running code

We downloaded the link grammar parser, version 4.1b, from the link grammar website and compiled it on Athena according to the instructions. We have included the parse binary and the source code in our submission under the link-4.1b directory. To run the link parser, run ./parse from the link-4.1b directory; at this point, it will prompt the user for a sentence. By default, the parser shows only the links used in parsing a sentence. To see the actual parse tree, the "constituents" feature must be enabled by typing !constituents=1 at the parser prompt.

Analysis of link grammar compared to context-free grammar

Types of tags: comparison to Penn Treebank

The link grammar offers a more general approach to "tagging" than the Penn Treebank tags that we have been accustomed to using throughout this term. The key to the link grammar is its links, or connectors. In order for words to link, they must have compatible connectors. Words on the left side of a link have connectors designated with a +, while those on the right side of a link have connectors with a -. In order for there to be a match between connectors, they must have the same string with opposite signs at the end. A word may have a number of possible connectors, but when creating diagrams of linked sequences of words, only one of these connectors can be used. See Figure 1 for an example.

[Figure 1: Connectors used in the link grammar]

Below is a list of the most common connectors in the link grammar system:

Connector   Items connected
S           noun and verb
O           verb and object
A           adjective and noun
D           determiner and noun
J           preposition and object
SI          noun and verb, inversion case
B           verb and wh-word as object

With a few more connectors added to the above list, the link grammar system works to connect a number of word sequences. These connectors create succinct expressions to feed into the established algorithm to generate the grammar.
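The matching rule for connectors can be sketched as follows. This is a toy illustration of the same-string, opposite-sign rule described above, with a hypothetical matches() helper that is not part of the parser:

```python
def matches(left_connector: str, right_connector: str) -> bool:
    """The left word carries a '+' connector and the right word a '-'
    connector; they match when the connector names are identical."""
    return (left_connector.endswith("+")
            and right_connector.endswith("-")
            and left_connector[:-1] == right_connector[:-1])

print(matches("S+", "S-"))   # True: e.g. a noun linking rightward to its verb
print(matches("S+", "O-"))   # False: different connector names
print(matches("S-", "S+"))   # False: signs on the wrong sides
```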
It is this succinct, concise nature that makes link grammars stand out from "traditional" context-free parsing, particularly parsers that use the Penn Treebank tags. The link grammar's list of connectors is far smaller than the list of Penn Treebank tags, making the link grammar useful in the more general sense of parsing natural language. With fewer connectors, one could argue that the link grammar constructs a grammar more efficiently. However, it is also the case that this "efficiency" comes at a cost, as the link grammar is more restricted and cannot handle the variety of word sequences that the "traditional" context-free grammatical systems can handle.

Grammar

The link grammar dictionary for English consists of approximately 25,000 words reflective of the common language. The grammar itself consists of about 800 different formulas for parsing English sentences. According to authors Sleator and Temperley, with this dictionary the grammar covers sentences containing noun-verb agreements, questions, participles, relative phrases, and a number of other grammatical structures. The link grammar is a lexical system, which means that each word in the dictionary of the grammar is defined based on its use in a sentence. This makes it easier to construct a larger grammar and allows easy manipulation of problem words (for instance, irregular English verbs) for inclusion in the grammar. Figure 2 shows a sample link grammar.

[Figure 2: Sample link grammar]

Probabilistic parsing

The link grammar uses a probabilistic model similar to the one used in context-free grammars. However, the probabilities of certain parses are represented differently in each system. With context-free grammars, we were used to seeing a log probability score (a negative number), where a higher score indicates a more likely parse. It seemed that some features of sentences made these scores vary widely, even for valid parses.
For example, sentences with longer repetition of the same adjective turn out to have a lower parse probability. The choice of words also affected the parse probability shown to the user: words more commonly seen in the training corpus increased the log probability of an equivalent-length sentence.

The most significant difference between the context-free and link probabilistic models is the representation. The link grammar represents parse probability in terms of a cost vector. A parse with a higher cost is less likely than one with a lower cost. The vector components are listed in order of decreasing significance, starting with the most significant component on the left. The four vector components, in order, are UNUSED, DIS, AND, and LEN.

- UNUSED refers to the number of null-links, or words that are ignored while parsing. An interjection is an example of a null-link.
- DIS refers to the connector, or disjunct, cost. Some links have a higher cost than others.
- AND refers to the difference in length of elements in an "and" list.
- LEN refers to the total length of the links in a sentence.

The probabilistic model for link grammar depends on its basic principle of linking. Since a link joins two connectors, a left connector and a right connector, the probabilistic model keeps track of each word, also noting whether it links via its left connector, its right connector, or both. For a linkage, or a set of links between words (this would be for a sentence or a phrase), the probabilities are simply the product of all the link probabilities. As in context-free parsing, these conditional probabilities are formed over quite a large vocabulary, so estimates and approximations are used. The probabilistic model for link grammars holds the property of being generative (one probabilistic model per language in the grammar), a trait it shares with "traditional" context-free parsers.
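Because the components are ordered by decreasing significance, comparing two cost vectors amounts to a lexicographic comparison, which Python tuples provide directly. This is a sketch of the ordering only, not the parser's own code:

```python
from collections import namedtuple

# Components in decreasing order of significance (left to right).
Cost = namedtuple("Cost", ["unused", "dis", "and_", "len_"])

a = Cost(unused=0, dis=0, and_=0, len_=5)
b = Cost(unused=0, dis=0, and_=0, len_=8)
c = Cost(unused=1, dis=0, and_=0, len_=2)

# Tuple comparison is lexicographic: UNUSED dominates DIS, which
# dominates AND, which dominates LEN.
print(a < b)  # True: same prefix, shorter total link length
print(b < c)  # True: a parse with UNUSED=0 beats one that skips a word,
              # no matter how short its links are
```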
Overall, the probabilistic model for link grammar shares many commonalities with the model for most context-free parsing grammars.

Comparison test to context-free probabilistic model

To see how the probabilistic model compares to the one used in Lab 5/6, we ran a series of tests on the same sentences used in that lab. We summarize our results below:

Sentence                                           Cost vector
The broker is dead .                               (UNUSED=0 DIS=0 AND=0 LEN=5)
The wicked broker is dead .                        (UNUSED=0 DIS=0 AND=0 LEN=8)
The wicked wicked broker is dead .                 (UNUSED=0 DIS=0 AND=0 LEN=12)
The wicked wicked wicked broker is dead .          (UNUSED=0 DIS=0 AND=0 LEN=17)
The wicked wicked wicked wicked broker is dead .   (UNUSED=0 DIS=0 AND=0 LEN=23)

The output from the first three sentences is shown below:

linkparser> the broker is dead
++++Time 0.00 seconds (0.00 total)
Found 1 linkage (1 had no P.P. violations)
Unique linkage, cost vector = (UNUSED=0 DIS=0 AND=0 LEN=5)

    +--Ds--+--Ss--+--Pa-+
    |      |      |     |
   the broker.n is.v dead.a

(S (NP the broker)
   (VP is
       (ADJP dead)))

linkparser> the wicked broker is dead
++++Time 0.00 seconds (0.00 total)
Found 1 linkage (1 had no P.P. violations)
Unique linkage, cost vector = (UNUSED=0 DIS=0 AND=0 LEN=8)

    +-------Ds------+
    |    +----A----+--Ss--+--Pa-+
    |    |         |      |     |
   the wicked.a broker.n is.v dead.a

(S (NP the wicked broker)
   (VP is
       (ADJP dead)))
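The LEN values in the table can be reproduced by summing the distance spanned by each link, provided the parser's implicit LEFT-WALL word is counted as word 0 with its link to the subject noun; that wall link does not appear in the diagrams above, so this is an assumption on our part. A sketch under that assumption:

```python
def total_link_length(links):
    """LEN: sum of the distances spanned by each link (word-index pairs)."""
    return sum(abs(j - i) for i, j in links)

# "the broker is dead", indexing LEFT-WALL as word 0:
# assumed Wd(0,2) wall-to-subject, plus Ds(1,2), Ss(2,3), Pa(3,4)
print(total_link_length([(0, 2), (1, 2), (2, 3), (3, 4)]))  # 5

# "the wicked broker is dead":
# assumed Wd(0,3), plus Ds(1,3), A(2,3), Ss(3,4), Pa(4,5)
print(total_link_length([(0, 3), (1, 3), (2, 3), (3, 4), (4, 5)]))  # 8
```

Both totals agree with the cost vectors reported in the table above.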