
Sentence Length and NP Complexity of General and Medical Written Academic and Media Texts. An Analysis Using a Trained Syntactic Parser.

10 pages. Published: November 28, 2016

Abstract

The main objective of this work is to compare the complexity of sentences and main noun phrases (NPs) in two types of discourse, written media and academic prose, using a trained syntactic parser (Stanford PCFG Parser). For this purpose, we selected three written sources: a general media corpus, a medical media subcorpus and a medical academic prose subcorpus. From more than 160,000 sentences, we carefully selected a study sample of 300, which were morphologically and syntactically annotated.
Influenced by other studies on syntax and statistics, our hypothesis is that NPs in both academic prose and written media will contain four or more words, and that NPs in academic prose will be longer than those in written media. The NPs studied are those that perform the main functions of the clause: subject, object (direct and indirect), attribute and time expressions. The results confirm our hypothesis. The academic subcorpus has longer sentences and more complex NPs than the other texts. The written media corpora, by contrast, have shorter NPs, and the two show quite similar results.
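
As a rough illustration of the kind of measurement described above, the following minimal Python sketch counts sentence length and NP length in words from a Penn-style bracketed parse, the general output format of constituency parsers such as the Stanford PCFG Parser. It is not the authors' procedure: the example sentence, the function names, and the choice to count only NPs that are not nested inside another NP are all assumptions made for illustration, and punctuation tokens are counted as words here.

```python
def parse_tree(text):
    """Parse a Penn-style bracketed string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # a leaf word
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return read()


def leaves(node):
    """Return the words (leaf tokens) dominated by a node."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1]:
        words.extend(leaves(child))
    return words


def maximal_np_lengths(node):
    """Word counts of NPs that are not nested inside another NP (an assumption)."""
    if isinstance(node, str):
        return []
    label, children = node
    if label == "NP":
        return [len(leaves(node))]
    lengths = []
    for child in children:
        lengths.extend(maximal_np_lengths(child))
    return lengths


# Hypothetical parse, written by hand for illustration only.
example = (
    "(ROOT (S (NP (DT The) (JJ academic) (NN subcorpus)) "
    "(VP (VBZ has) (NP (DT the) (JJS longest) (NNS sentences))) (. .)))"
)
tree = parse_tree(example)
sentence_length = len(leaves(tree))    # tokens in the sentence, punctuation included
np_lengths = maximal_np_lengths(tree)  # one word count per maximal NP
print(sentence_length, np_lengths)
```

Averaging such counts per corpus would give figures comparable in spirit to those discussed in the paper, although identifying the clause function of each NP (subject, object, attribute, time expression) would need additional annotation that this sketch does not attempt.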

Keyphrases: grammar induction, noun phrase, syntactic parsing, treebank

In: Antonio Moreno Ortiz and Chantal Pérez-Hernández (editors). CILC2016. 8th International Conference on Corpus Linguistics, vol 1, pages 181–190

Links:
BibTeX entry
@inproceedings{CILC2016:Sentence_Length_and_NP,
  author    = {Carlos Herrero Zorita and Antonio Moreno-Sandoval},
  title     = {Sentence Length and NP Complexity of General and Medical Written Academic and Media Texts. An Analysis Using a Trained Syntactic Parser.},
  booktitle = {CILC2016. 8th International Conference on Corpus Linguistics},
  editor    = {Antonio Moreno Ortiz and Chantal P\'erez-Hern\'andez},
  series    = {EPiC Series in Language and Linguistics},
  volume    = {1},
  pages     = {181--190},
  year      = {2016},
  publisher = {EasyChair},
  bibsource = {EasyChair, https://easychair.org},
  issn      = {2398-5283},
  url       = {https://easychair.org/publications/paper/ZcdJ},
  doi       = {10.29007/gr8r}}