1. open office project:
From: http://lingucomponent.openoffice.org/grammar.html
start:
Lingucomponent Sub-Project: Grammar Checking
One of the goals of the Lingucomponent project is to design, develop, and implement a Grammar checker for English and other supported languages.
News
* October 2008: A grammar checking API is now part of OpenOffice.org 3.0. LanguageTool now makes use of this new API which allows on-the-fly checking, i.e. checking text while you type. More information about the API is available at the following sources:
o Grammar Checking page in Wiki
o Specification (.odt)
o All grammar checker issues
If you have any interest in helping to implement a grammar checker for the OpenOffice.org project, please subscribe to the mailing list dev@lingucomponent.openoffice.org and introduce yourself, your skills, and your willingness to help this project.
Links to Open Source grammar checkers
* An Gramadóir, a grammar checker for the Irish language
* CoGrOO, a grammar checker for Portuguese
* GRAC, corpus-based grammar checker written in Python
* graviax, grammar rules and grammar checker for the English language
* Higgins, a prototype English-language parser
* LanguageTool, a style and grammar checker with OpenOffice.org integration, for English, German, Polish, Dutch, and other languages
* Link Grammar, not really a grammar checker but a parser for the English language, also see the link grammar page at AbiWord
Links to commercial grammar checkers
* Cysgliad, a Welsh grammar checker
Created: 2001 June. Last Modified: $Date: 2008/11/08 21:32:59 $, $Revision: 1.25 $
2. Requirement (not totally true)
From: http://wiki.services.openoffice.org/wiki/Grammar_Checking
Grammar checking of mixed language text
It is assumed that even a sentence that uses several languages is, as a whole, written in a single language. (How that language is identified is a completely different matter, and probably a complex task!) Thus that sentence should be grammar checked only in that single language. For example:
The German word for television is Fernseher.
This sentence should be grammar checked in English and not German.
If possible, though (for example, if language attributes are set correctly), it should be noted that Fernseher is not English, and thus at the very least no English spelling error should be reported for that word. It is probably also impossible to report any grammar error that involves embedded foreign words. Thus the best to hope for is probably that the foreign word is recognized as correct by the respective spell checker.
Even with a completely embedded sentence like
In Gallica Caesar said 'Alea iacta est.' and continued his battle.
the above text is in a single language, English, and not Latin. Whether an existing grammar checker is smart enough to cope with embedded sentences of a different language, I don't know. To keep it simple for the time being, the whole text should be grammar checked as one sentence in English, and only in that language.
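The policy described above can be sketched roughly as follows. This is only an illustration, not the OpenOffice.org API: the `Word` class, the `check_sentence` function and the toy dictionaries are all hypothetical names made up for this example. It assumes each word carries a language attribute, that grammar is checked once in the sentence language, and that spelling is checked per word against the dictionary of that word's own language.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    lang: str  # language attribute set on the word, e.g. "en" or "de"


def check_sentence(words, sentence_lang, dictionaries):
    """Plan the checks for one sentence (hypothetical helper).

    Grammar is checked once, in sentence_lang only. Spelling is checked
    per word, against the dictionary for that word's own language
    attribute. Words whose language has no dictionary are skipped.
    """
    misspelled = []
    for word in words:
        known = dictionaries.get(word.lang)
        if known is not None and word.text.lower() not in known:
            misspelled.append(word.text)
    return {"grammar_lang": sentence_lang, "misspelled": misspelled}


# Toy dictionaries, for illustration only.
DICTIONARIES = {
    "en": {"the", "german", "word", "for", "television", "is"},
    "de": {"fernseher"},
}

sentence = "The German word for television is Fernseher".split()
# Language attributes set correctly: Fernseher is marked as German.
tagged = [Word(w, "de" if w == "Fernseher" else "en") for w in sentence]
# Language attributes missing: every word is treated as English.
untagged = [Word(w, "en") for w in sentence]

print(check_sentence(tagged, "en", DICTIONARIES))
print(check_sentence(untagged, "en", DICTIONARIES))
```

With the attribute set, Fernseher is accepted by the German dictionary and nothing is reported. Without it, the word is flagged as an English spelling error, which is exactly the case the text wants to avoid.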
Grammar checking and spell checking at the same time
Should spell checking have an iterator of its own with a thread of its own? Or should spell checking be handled by the GrammarCheckingIterator as well?
Other Questions / problems:
* Checking is limited to paragraphs (unless the implementation of XFlatParagraph chooses to hide something more behind it, which is unlikely). Though one could think of enumerations as a possible application for this behavior.
* In the case of several grammar checkers for one language, what do we do if they report different end-of-sentence positions? We really can't handle each checker individually here.
* Does a grammar checker that requires knowledge of the previous text in the paragraph need to have that text presented even if it is in a language it does not know?
* How to achieve consistency of usage (e.g. spelling) when having grammar checkers in multiple languages? E.g. e-mail vs. email? Or does it need to be consistent on a per-language basis only?
* How to determine the language of a sentence? Use the language of the first word, or language guessing, or the language with the most words,... ?
* Problems related to a specific UI, namely the grammar checking dialog still to be defined, are not yet covered.
* The troublesome case of having, for example, three grammar checkers for one language, two of them wanting to use their own dialog while the third goes with the office internal one, is left out. If all of them report errors in the same sentence and want to use their own dialog as well, we will have to cope with switching between three dialogs just to edit a single sentence. That's just plain awful to even think about. And I doubt there will be even one user to appreciate such a scenario.
* Should the document (e.g. XFlatParagraph) be in charge of determining the language for checking, or should it be the GrammarCheckingIterator? Probably the latter...
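Two of the options raised in the question about determining the language of a sentence (the language of the first word, and the language with the most words) could be sketched as below. This is a minimal illustration and not part of any OpenOffice.org interface: the function names are hypothetical, and it assumes the per-word language attributes are already available as a list of language codes. Language guessing is left out because it would need an external model.

```python
from collections import Counter


def language_of_first_word(word_langs):
    """Return the language attribute of the first word, or None if empty."""
    return word_langs[0] if word_langs else None


def language_with_most_words(word_langs):
    """Return the language used by the most words, or None if empty.

    Ties are broken in favour of the language that appears first
    in the sentence.
    """
    if not word_langs:
        return None
    counts = Counter(word_langs)
    best = max(counts.values())
    for lang in word_langs:
        if counts[lang] == best:
            return lang


# "In Gallica Caesar said 'Alea iacta est.' and continued his battle."
# 11 words: 4 English, 3 Latin, 4 English.
caesar = ["en"] * 4 + ["la"] * 3 + ["en"] * 4

print(language_of_first_word(caesar))    # en
print(language_with_most_words(caesar))  # en

# The heuristics can disagree when a sentence opens with a foreign word.
opens_with_latin = ["la", "en", "en"]
print(language_of_first_word(opens_with_latin))    # la
print(language_with_most_words(opens_with_latin))  # en
```

The second example shows why the choice matters: a sentence that merely starts with a foreign word would be checked in the wrong language under the first-word rule.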