Side by Side Diff: README_en_GB.txt

Issue 1810993003: Updated en-* dictionaries to 7-0. Added fa-IR. (Closed) Base URL: https://chromium.googlesource.com/chromium/deps/hunspell_dictionaries.git@master
Patch Set: One more word for the delta files Created 4 years, 9 months ago
OLD
1 This dictionary was initially based on a subset of the
2 original English wordlist created by Kevin Atkinson for
3 Pspell and Aspell and thus is covered by his original
4 LGPL licence.
5
6 It has been extensively updated by David Bartlett, Brian Kelk
7 and Andrew Brown:
8 - numerous Americanism have been removed
9 - numerous American spellings have been corrected
10 - missing words have been added
11 - many errors have been corrected
12 - compound hyphenated words have been added where appropriate
13
14 Valuable inputs to this process were received from many other
15 people - far too numerous to name. Serious thanks to you all
16 for your greatly appreciated help.
17
18 This word list is intended to be a good representation of
19 current modern British English and thus it should be a good
20 basis for Commonwealth English in most countries of the world
21 outside North America.
22
23 The affix file has been created completely from scratch
24 by David Bartlett and Andrew Brown, based on the published
25 rules for MySpell and is also provided under the LGPL.
26
27 In creating the affix rules an attempt has been made to
28 reproduce the most general rules for English word
29 formation, rather than merely use it as a means to
30 compress the size of the dictionary. It is hoped that this
31 will facilitate future localisation to other variants of
32 English.
33
34 Please let David Bartlett <dwb@openoffice.org> know of any
35 errors that you find.
36
37 The current release is R 1.18, 11/04/05

NEW
1 en_GB-ise Hunspell Dictionary
2 Version 2016.01.19
3 Tue Jan 19 17:07:49 2016 -0500 [a535654]
4 http://wordlist.sourceforge.net
5
6 README file for English Hunspell dictionaries derived from SCOWL.
7
8 These dictionaries are created using the speller/make-hunspell-dict
9 script in SCOWL.
10
11 The following dictionaries are available:
12
13 en_US (American)
14 en_CA (Canadian)
15 en_GB-ise (British with "ise" spelling)
16 en_GB-ize (British with "ize" spelling)
17
18 en_US-large
19 en_CA-large
20 en_GB-large (with both "ise" and "ize" spelling)
21
22 The normal (non-large) dictionaries correspond to SCOWL size 60 and,
23 to encourage consistent spelling, generally only include one spelling
24 variant for a word. The large dictionaries correspond to SCOWL size
25 70 and may include multiple spelling for a word when both variants are
26 considered almost equal. The larger dictionaries however (1) have not
27 been as carefully checked for errors as the normal dictionaries and
28 thus may contain misspelled or invalid words; and (2) contain
29 uncommon, yet valid, words that might cause problems as they are
30 likely to be misspellings of more common words (for example, "ort" and
31 "calender").
32
33 To get an idea of the difference in size, here are 25 random words
34 only found in the large dictionary for American English:
35
36 Bermejo Freyr's Guenevere Hatshepsut Nottinghamshire arrestment
37 crassitudes crural dogwatches errorless fetial flaxseeds godroon
38 incretion jalapeño's kelpie kishkes neuroglias pietisms pullulation
39 stemwinder stenoses syce thalassic zees
40
41 The en_US and en_CA are the official dictionaries for Hunspell. The
42 en_GB and large dictionaries are made available on an experimental
43 basis. If you find them useful please send me a quick email at
44 kevina@gnu.org.
45
46 If none of these dictionaries suit you (for example, maybe you want
47 the normal dictionary that also includes common variants) additional
48 dictionaries can be generated at http://app.aspell.net/create or by
49 modifying speller/make-hunspell-dict in SCOWL. Please do let me know
50 if you end up publishing a customized dictionary.
51
52 If a word is not found in the dictionary or a word is there you think
53 shouldn't be, you can look the word up at http://app.aspell.net/lookup
54 to help determine why that is.
55
56 General comments on these lists can be sent directly to me at
57 kevina@gnu.org or to the wordlist-devel mailing list
58 (https://lists.sourceforge.net/lists/listinfo/wordlist-devel). If you
59 have specific issues with any of these dictionaries please file a bug
60 report at https://github.com/kevina/wordlist/issues.
61
62 IMPORTANT CHANGES INTRODUCED IN 2015.04.24:
63
64 The dictionaries are now in UTF-8 format instead of ISO-8859-1. This
65 was required to handle smart quotes correctly.
66
67 IMPORTANT CHANGES INTRODUCED IN 2016.01.19:
68
69 "SET UTF8" was changed to "SET UTF-8" in the affix file as some
70 versions of Hunspell do not recognize "UTF8".
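
The change described in the note above can be sketched in Python. This is only an illustration, not the script SCOWL uses: the function name normalize_set_directive is hypothetical, and the sketch assumes the SET directive sits on its own line with no leading whitespace.

```python
import re

# Hypothetical helper (not part of SCOWL or Hunspell): rewrite a
# "SET UTF8" line in the text of an affix file to "SET UTF-8".
# Assumes the directive starts at the beginning of a line.
_SET_UTF8 = re.compile(r"^SET[ \t]+UTF8[ \t]*(\r?)$", re.MULTILINE)


def normalize_set_directive(aff_text: str) -> str:
    """Return aff_text with any 'SET UTF8' line rewritten as 'SET UTF-8'.

    Other SET values and all other lines are left untouched, and an
    existing carriage return at the end of the line is preserved.
    """
    return _SET_UTF8.sub(r"SET UTF-8\g<1>", aff_text)


if __name__ == "__main__":
    sample = "SET UTF8\nTRY esianrtolcdugmphbyfvkwz\n"  # made-up affix header
    print(normalize_set_directive(sample), end="")
```

A file that already says "SET UTF-8" passes through unchanged, so the helper can be run repeatedly without harm.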
71
72 ADDITIONAL NOTES:
73
74 The NOSUGGEST flag was added to certain taboo words. While I made an
75 honest attempt to flag the strongest taboo words with the NOSUGGEST
76 flag, I MAKE NO GUARANTEE THAT I FLAGGED EVERY POSSIBLE TABOO WORD.
77 The list was originally derived from Németh László, however I removed
78 some words which, while being considered taboo by some dictionaries,
79 are not really considered swear words in today's society.
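
To see which entries carry the NOSUGGEST flag, something like the following Python sketch could be used. It is an assumption-laden illustration: the function names are hypothetical, the sample data is made up, and it only handles the default single-character flag type with entries written as word/FLAGS (no long or numeric flags, no escaped slashes).

```python
from typing import List, Optional


def nosuggest_flag(aff_text: str) -> Optional[str]:
    """Return the flag named by the NOSUGGEST directive, or None if absent."""
    for line in aff_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "NOSUGGEST":
            return parts[1]
    return None


def nosuggest_words(aff_text: str, dic_text: str) -> List[str]:
    """List the .dic words whose flag field contains the NOSUGGEST flag.

    Assumes single-character flags, so membership is a per-character test.
    """
    flag = nosuggest_flag(aff_text)
    if flag is None:
        return []
    words = []
    # The first line of a .dic file is the entry count, so skip it.
    for line in dic_text.splitlines()[1:]:
        fields = line.split()
        if not fields:
            continue
        word, sep, flags = fields[0].partition("/")
        if sep and flag in flags:
            words.append(word)
    return words


if __name__ == "__main__":
    # Made-up sample data, not taken from the real en_GB files.
    aff = "SET UTF-8\nNOSUGGEST !\n"
    dic = "3\nhello\ndarn/!\nwork/MS\n"
    print(nosuggest_words(aff, dic))
```

Reading the flag character from the affix file rather than hard-coding it keeps the sketch usable if a different character is assigned to NOSUGGEST.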
80
81 COPYRIGHT, SOURCES, and CREDITS:
82
83 The English dictionaries come directly from SCOWL
84 and are thus under the same copyright as SCOWL. The affix file is
85 a heavily modified version of the original english.aff file which was
86 released as part of Geoff Kuenning's Ispell and as such is covered by
87 his BSD license. Part of SCOWL is also based on Ispell thus the
88 Ispell copyright is included with the SCOWL copyright.
89
90 The collective work is Copyright 2000-2015 by Kevin Atkinson as well
91 as any of the copyrights mentioned below:
92
93 Copyright 2000-2015 by Kevin Atkinson
94
95 Permission to use, copy, modify, distribute and sell these word
96 lists, the associated scripts, the output created from the scripts,
97 and its documentation for any purpose is hereby granted without fee,
98 provided that the above copyright notice appears in all copies and
99 that both that copyright notice and this permission notice appear in
100 supporting documentation. Kevin Atkinson makes no representations
101 about the suitability of this array for any purpose. It is provided
102 "as is" without express or implied warranty.
103
104 Alan Beale <biljir@pobox.com> also deserves special credit as he has,
105 in addition to providing the 12Dicts package and being a major
106 contributor to the ENABLE word list, given me an incredible amount of
107 feedback and created a number of special lists (those found in the
108 Supplement) in order to help improve the overall quality of SCOWL.
109
110 The 10 level includes the 1000 most common English words (according to
111 the Moby (TM) Words II [MWords] package), a subset of the 1000 most
112 common words on the Internet (again, according to Moby Words II), and
113 frequency class 16 from Brian Kelk's "UK English Wordlist
114 with Frequency Classification".
115
116 The MWords package was explicitly placed in the public domain:
117
118 The Moby lexicon project is complete and has
119 been place into the public domain. Use, sell,
120 rework, excerpt and use in any way on any platform.
121
122 Placing this material on internal or public servers is
123 also encouraged. The compiler is not aware of any
124 export restrictions so freely distribute world-wide.
125
126 You can verify the public domain status by contacting
127
128 Grady Ward
129 3449 Martha Ct.
130 Arcata, CA 95521-4884
131
132 grady@netcom.com
133 grady@northcoast.com
134
135 The "UK English Wordlist With Frequency Classification" is also in the
136 Public Domain:
137
138 Date: Sat, 08 Jul 2000 20:27:21 +0100
139 From: Brian Kelk <Brian.Kelk@cl.cam.ac.uk>
140
141 > I was wondering what the copyright status of your "UK English
142 > Wordlist With Frequency Classification" word list as it seems to
143 > be lacking any copyright notice.
144
145 There were many many sources in total, but any text marked
146 "copyright" was avoided. Locally-written documentation was one
147 source. An earlier version of the list resided in a filespace called
148 PUBLIC on the University mainframe, because it was considered public
149 domain.
150
151 Date: Tue, 11 Jul 2000 19:31:34 +0100
152
153 > So are you saying your word list is also in the public domain?
154
155 That is the intention.
156
157 The 20 level includes frequency classes 7-15 from Brian's word list.
158
159 The 35 level includes frequency classes 2-6 and words appearing in at
160 least 11 of 12 dictionaries as indicated in the 12Dicts package. All
161 words from the 12Dicts package have had likely inflections added via
162 my inflection database.
163
164 The 12Dicts package and Supplement are in the Public Domain.
165
166 The WordNet database, which was used in the creation of the
167 Inflections database, is under the following copyright:
168
169 This software and database is being provided to you, the LICENSEE,
170 by Princeton University under the following license. By obtaining,
171 using and/or copying this software and database, you agree that you
172 have read, understood, and will comply with these terms and
173 conditions.:
174
175 Permission to use, copy, modify and distribute this software and
176 database and its documentation for any purpose and without fee or
177 royalty is hereby granted, provided that you agree to comply with
178 the following copyright notice and statements, including the
179 disclaimer, and that the same appear on ALL copies of the software,
180 database and documentation, including modifications that you make
181 for internal use or for distribution.
182
183 WordNet 1.6 Copyright 1997 by Princeton University. All rights
184 reserved.
185
186 THIS SOFTWARE AND DATABASE IS PROVIDED "AS IS" AND PRINCETON
187 UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR
188 IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, PRINCETON
189 UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES OF MERCHANT-
190 ABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE
191 LICENSED SOFTWARE, DATABASE OR DOCUMENTATION WILL NOT INFRINGE ANY
192 THIRD PARTY PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS.
193
194 The name of Princeton University or Princeton may not be used in
195 advertising or publicity pertaining to distribution of the software
196 and/or database. Title to copyright in this software, database and
197 any associated documentation shall at all times remain with
198 Princeton University and LICENSEE agrees to preserve same.
199
200 The 40 level includes words from Alan's 3esl list found in version 4.0
201 of his 12dicts package. Like his other stuff the 3esl list is also in the
202 public domain.
203
204 The 50 level includes Brian's frequency class 1, words appearing
205 in at least 5 of 12 of the dictionaries as indicated in the 12Dicts
206 package, and uppercase words in at least 4 of the previous 12
207 dictionaries. A decent number of proper names is also included: The
208 top 1000 male, female, and Last names from the 1990 Census report; a
209 list of names sent to me by Alan Beale; and a few names that I added
210 myself. Finally a small list of abbreviations not commonly found in
211 other word lists is included.
212
213 The name files from the Census report are government documents, which I
214 don't think can be copyrighted.
215
216 The file special-jargon.50 uses common.lst and word.lst from the
217 "Unofficial Jargon File Word Lists" which is derived from "The Jargon
218 File". All of which is in the Public Domain. This file also contains
219 a few extra UNIX terms which are found in the file "unix-terms" in the
220 special/ directory.
221
222 The 55 level includes words from Alan's 2of4brif list found in version
223 4.0 of his 12dicts package. Like his other stuff the 2of4brif is also
224 in the public domain.
225
226 The 60 level includes all words appearing in at least 2 of the 12
227 dictionaries as indicated by the 12Dicts package.
228
229 The 70 level includes Brian's frequency class 0 and the 74,550 common
230 dictionary words from the MWords package. The common dictionary words,
231 like those from the 12Dicts package, have had all likely inflections
232 added. The 70 level also includes the 5desk list from version 4.0 of
233 the 12Dicts package which is in the public domain.
234
235 The 80 level includes the ENABLE word list, all the lists in the
236 ENABLE supplement package (except for ABLE), the "UK Advanced Cryptics
237 Dictionary" (UKACD), the list of signature words from the YAWL package,
238 and the 10,196 places list from the MWords package.
239
240 The ENABLE package, maintained by M\Cooper <thegrendel@theriver.com>,
241 is in the Public Domain:
242
243 The ENABLE master word list, WORD.LST, is herewith formally released
244 into the Public Domain. Anyone is free to use it or distribute it in
245 any manner they see fit. No fee or registration is required for its
246 use nor are "contributions" solicited (if you feel you absolutely
247 must contribute something for your own peace of mind, the authors of
248 the ENABLE list ask that you make a donation on their behalf to your
249 favorite charity). This word list is our gift to the Scrabble
250 community, as an alternate to "official" word lists. Game designers
251 may feel free to incorporate the WORD.LST into their games. Please
252 mention the source and credit us as originators of the list. Note
253 that if you, as a game designer, use the WORD.LST in your product,
254 you may still copyright and protect your product, but you may *not*
255 legally copyright or in any way restrict redistribution of the
256 WORD.LST portion of your product. This *may* under law restrict your
257 rights to restrict your users' rights, but that is only fair.
258
259 UKACD, by J Ross Beresford <ross@bryson.demon.co.uk>, is under the
260 following copyright:
261
262 Copyright (c) J Ross Beresford 1993-1999. All Rights Reserved.
263
264 The following restriction is placed on the use of this publication:
265 if The UK Advanced Cryptics Dictionary is used in a software package
266 or redistributed in any form, the copyright notice must be
267 prominently displayed and the text of this document must be included
268 verbatim.
269
270 There are no other restrictions: I would like to see the list
271 distributed as widely as possible.
272
273 The 95 level includes the 354,984 single words, 256,772 compound
274 words, 4,946 female names and the 3,897 male names, and 21,986 names
275 from the MWords package, ABLE.LST from the ENABLE Supplement, and some
276 additional words found in my part-of-speech database that were not
277 found anywhere else.
278
279 Accent information was taken from UKACD.
280
281 My VARCON package was used to create the American, British, and
282 Canadian word lists.
283
284 Since the original word lists used in the VARCON package came
285 from the Ispell distribution they are under the Ispell copyright:
286
287 Copyright 1993, Geoff Kuenning, Granada Hills, CA
288 All rights reserved.
289
290 Redistribution and use in source and binary forms, with or without
291 modification, are permitted provided that the following conditions
292 are met:
293
294 1. Redistributions of source code must retain the above copyright
295 notice, this list of conditions and the following disclaimer.
296 2. Redistributions in binary form must reproduce the above copyright
297 notice, this list of conditions and the following disclaimer in the
298 documentation and/or other materials provided with the distribution.
299 3. All modifications to the source code must be clearly marked as
300 such. Binary redistributions based on modified source code
301 must be clearly marked as modified versions in the documentation
302 and/or other materials provided with the distribution.
303 (clause 4 removed with permission from Geoff Kuenning)
304 5. The name of Geoff Kuenning may not be used to endorse or promote
305 products derived from this software without specific prior
306 written permission.
307
308 THIS SOFTWARE IS PROVIDED BY GEOFF KUENNING AND CONTRIBUTORS ``AS
309 IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
310 LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
311 FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL GEOFF
312 KUENNING OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
313 INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
314 BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
315 LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
316 CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
317 LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
318 ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
319 POSSIBILITY OF SUCH DAMAGE.
320
321 Build Date: Tue Jan 19 17:11:09 EST 2016
322 Wordlist Command: mk-list --accents=strip en_GB-ise 60