Chromium Code Reviews

Side by Side Diff: README_en_CA.txt

Issue 646383003: Update English dictionaries (Closed) Base URL: https://chromium.googlesource.com/chromium/deps/hunspell_dictionaries.git@master
Patch Set: Fix patchset (didn't upload) Created 6 years, 1 month ago
OLD | NEW
1 Version 7.1-0 1 en_CA Hunspell Dictionary
2 2011-01-06 2 Version 2014.08.11
3 Mon Aug 11 18:23:56 2014 +0200 [be45e88]
4 http://wordlist.sourceforge.net
3 5
4 README file for en_US and en_CA Hunspell dictionaries 6 README file for English Hunspell dictionaries derived from SCOWL.
5 7
6 These dictionaries are created using the speller/make-hunspell-dict 8 These dictionaries are created using the speller/make-hunspell-dict
7 script in SCOWL, version 7.1 released on January 6, 2011. 9 script in SCOWL.
10
11 The following dictionaries are available:
12
13 en_US (American)
14 en_CA (Canadian)
15 en_GB-ise (British with "ize" spelling)
rpetterson 2014/10/27 22:15:48 should this be "ise"?
hichris123 2014/10/28 00:23:38 Done. Figured it'll be landed upstream eventually,
16 en_GB-ize (British with "ize" spelling)
17
18 en_US-large
19 en_CA-large
20 en_GB-large (with both "ize" and "ise" spelling)
21
22 The normal (non-large) dictionaries correspond to SCOWL size 60 and,
23 to encourage consistent spelling, generally only include one spelling
24 variant for a word. The large dictionaries correspond to SCOWL size
25 70 and may include multiple spelling for a word when both variants are
26 considered almost equal. Also, the general quality of the larger
27 dictionaries may also be less as they are not as carefully checked for
28 errors as the normal dictionaries.
29
30 To get an idea of the difference in size, here are 25 random words
31 only found in the large dictionary for American English:
32
33 Bermejo Freyr's Guenevere Hatshepsut Nottinghamshire arrestment
34 crassitudes crural dogwatches errorless fetial flaxseeds godroon
35 incretion jalapeño's kelpie kishkes neuroglias pietisms pullulation
36 stemwinder stenoses syce thalassic zees
37
38 The en_US and en_CA are the official dictionaries for Hunspell. The
39 en_GB and large dictionaries are made available on an experimental
40 basis. If you find them useful please send me a quick email at
41 kevina@gnu.org.
42
43 If none of these dictionaries suite you (for example, maybe you want
44 the larger dictionary but only use spelling of a word) additional
45 dictionaries can be generated at http://app.aspell.net/create or by
46 modifying speller/make-hunspell-dict in SCOWL. Please do let me know
47 if you end up publishing a customized dictionary.
48
49 If a word is not found in the dictionary or a word is there you think
50 shouldn't be, you can lookup the word up at http://app.aspell.net/lookup
51 to help determine why that is.
52
53 General comments on these list can be sent directly to me at
54 kevina@gnu.org or to the wordlist-devel mailing lists
55 (https://lists.sourceforge.net/lists/listinfo/wordlist-devel). If you
56 have specific issues with any of these dictionaries please file a bug
57 report at https://github.com/kevina/wordlist/issues.
58
59 ADDITIONAL NOTES:
8 60
9 The NOSUGGEST flag was added to certain taboo words. While I made an 61 The NOSUGGEST flag was added to certain taboo words. While I made an
10 honest attempt to flag the strongest taboo words with the NOSUGGEST 62 honest attempt to flag the strongest taboo words with the NOSUGGEST
11 flag, I MAKE NO GUARANTEE THAT I FLAGGED EVERY POSSIBLE TABOO WORD. 63 flag, I MAKE NO GUARANTEE THAT I FLAGGED EVERY POSSIBLE TABOO WORD.
12 The list was originally derived from Németh László, however I removed 64 The list was originally derived from Németh László, however I removed
13 some words which, while being considered taboo by some dictionaries, 65 some words which, while being considered taboo by some dictionaries,
14 are not really considered swear words in today's society. 66 are not really considered swear words in today's society.
15 67
16 You can find SCOWL and friend at http://wordlist.sourceforge.net/.
17 Bug reports should go to the Issue Tracker found on the previously
18 mentioned web site. General discussion should go to the
19 wordlist-devel at sourceforge net mailing list.
20
21 COPYRIGHT, SOURCES, and CREDITS: 68 COPYRIGHT, SOURCES, and CREDITS:
22 69
23 The en_US and en_CA dictionaries come directly from SCOWL (up to level 70 The English dictionaries come directly from SCOWL
24 60) and is thus under the same copyright of SCOWL. The affix file is 71 and is thus under the same copyright of SCOWL. The affix file is
25 a heavily modified version of the original english.aff file which was 72 a heavily modified version of the original english.aff file which was
26 released as part of Geoff Kuenning's Ispell and as such is covered by 73 released as part of Geoff Kuenning's Ispell and as such is covered by
27 his BSD license. Part of SCOWL is also based on Ispell thus the 74 his BSD license. Part of SCOWL is also based on Ispell thus the
28 Ispell copyright is included with the SCOWL copyright. 75 Ispell copyright is included with the SCOWL copyright.
29 76
30 The collective work is Copyright 2000-2011 by Kevin Atkinson as well 77 The collective work is Copyright 2000-2014 by Kevin Atkinson as well
31 as any of the copyrights mentioned below: 78 as any of the copyrights mentioned below:
32 79
33 Copyright 2000-2011 by Kevin Atkinson 80 Copyright 2000-2014 by Kevin Atkinson
34 81
35 Permission to use, copy, modify, distribute and sell these word 82 Permission to use, copy, modify, distribute and sell these word
36 lists, the associated scripts, the output created from the scripts, 83 lists, the associated scripts, the output created from the scripts,
37 and its documentation for any purpose is hereby granted without fee, 84 and its documentation for any purpose is hereby granted without fee,
38 provided that the above copyright notice appears in all copies and 85 provided that the above copyright notice appears in all copies and
39 that both that copyright notice and this permission notice appear in 86 that both that copyright notice and this permission notice appear in
40 supporting documentation. Kevin Atkinson makes no representations 87 supporting documentation. Kevin Atkinson makes no representations
41 about the suitability of this array for any purpose. It is provided 88 about the suitability of this array for any purpose. It is provided
42 "as is" without express or implied warranty. 89 "as is" without express or implied warranty.
43 90
(...skipping 90 matching lines...)
134 The name of Princeton University or Princeton may not be used in 181 The name of Princeton University or Princeton may not be used in
135 advertising or publicity pertaining to distribution of the software 182 advertising or publicity pertaining to distribution of the software
136 and/or database. Title to copyright in this software, database and 183 and/or database. Title to copyright in this software, database and
137 any associated documentation shall at all times remain with 184 any associated documentation shall at all times remain with
138 Princeton University and LICENSEE agrees to preserve same. 185 Princeton University and LICENSEE agrees to preserve same.
139 186
140 The 40 level includes words from Alan's 3esl list found in version 4.0 187 The 40 level includes words from Alan's 3esl list found in version 4.0
141 of his 12dicts package. Like his other stuff the 3esl list is also in the 188 of his 12dicts package. Like his other stuff the 3esl list is also in the
142 public domain. 189 public domain.
143 190
144 The 50 level includes Brian's frequency class 1, words words appearing 191 The 50 level includes Brian's frequency class 1, words appearing
145 in at least 5 of 12 of the dictionaries as indicated in the 12Dicts 192 in at least 5 of 12 of the dictionaries as indicated in the 12Dicts
146 package, and uppercase words in at least 4 of the previous 12 193 package, and uppercase words in at least 4 of the previous 12
147 dictionaries. A decent number of proper names is also included: The 194 dictionaries. A decent number of proper names is also included: The
148 top 1000 male, female, and Last names from the 1990 Census report; a 195 top 1000 male, female, and Last names from the 1990 Census report; a
149 list of names sent to me by Alan Beale; and a few names that I added 196 list of names sent to me by Alan Beale; and a few names that I added
150 myself. Finally a small list of abbreviations not commonly found in 197 myself. Finally a small list of abbreviations not commonly found in
151 other word lists is included. 198 other word lists is included.
152 199
153 The name files form the Census report is a government document which I 200 The name files form the Census report is a government document which I
154 don't think can be copyrighted. 201 don't think can be copyrighted.
155 202
156 The file special-jargon.50 uses common.lst and word.lst from the 203 The file special-jargon.50 uses common.lst and word.lst from the
157 "Unofficial Jargon File Word Lists" which is derived from "The Jargon 204 "Unofficial Jargon File Word Lists" which is derived from "The Jargon
158 File". All of which is in the Public Domain. This file also contain 205 File". All of which is in the Public Domain. This file also contain
159 a few extra UNIX terms which are found in the file "unix-terms" in the 206 a few extra UNIX terms which are found in the file "unix-terms" in the
160 special/ directory. 207 special/ directory.
161 208
162 The 55 level includes words from Alan's 2of4brif list found in version 209 The 55 level includes words from Alan's 2of4brif list found in version
163 4.0 of his 12dicts package. Like his other stuff the 2of4brif is also 210 4.0 of his 12dicts package. Like his other stuff the 2of4brif is also
164 in the public domain. 211 in the public domain.
165 212
166 The 60 level includes all words appearing in at least 2 of the 12 213 The 60 level includes all words appearing in at least 2 of the 12
167 dictionaries as indicated by the 12Dicts package. 214 dictionaries as indicated by the 12Dicts package.
168 215
169 The 70 level includes Brian's frequency class 0 and the 74,550 common 216 The 70 level includes Brian's frequency class 0 and the 74,550 common
170 dictionary words from the MWords package. The common dictionary words, 217 dictionary words from the MWords package. The common dictionary words,
171 like those from the 12Dicts package, have had all likely inflections 218 like those from the 12Dicts package, have had all likely inflections
172 added. The 70 level also included the 5desk list from version 4.0 of 219 added. The 70 level also included the 5desk list from version 4.0 of
173 the 12Dics package which is the public domain. 220 the 12Dics package which is in the public domain.
174 221
175 The 80 level includes the ENABLE word list, all the lists in the 222 The 80 level includes the ENABLE word list, all the lists in the
176 ENABLE supplement package (except for ABLE), the "UK Advanced Cryptics 223 ENABLE supplement package (except for ABLE), the "UK Advanced Cryptics
177 Dictionary" (UKACD), the list of signature words in from YAWL package, 224 Dictionary" (UKACD), the list of signature words from the YAWL package,
178 and the 10,196 places list from the MWords package. 225 and the 10,196 places list from the MWords package.
179 226
180 The ENABLE package, mainted by M\Cooper <thegrendel@theriver.com>, 227 The ENABLE package, mainted by M\Cooper <thegrendel@theriver.com>,
181 is in the Public Domain: 228 is in the Public Domain:
182 229
183 The ENABLE master word list, WORD.LST, is herewith formally released 230 The ENABLE master word list, WORD.LST, is herewith formally released
184 into the Public Domain. Anyone is free to use it or distribute it in 231 into the Public Domain. Anyone is free to use it or distribute it in
185 any manner they see fit. No fee or registration is required for its 232 any manner they see fit. No fee or registration is required for its
186 use nor are "contributions" solicited (if you feel you absolutely 233 use nor are "contributions" solicited (if you feel you absolutely
187 must contribute something for your own peace of mind, the authors of 234 must contribute something for your own peace of mind, the authors of
(...skipping 26 matching lines...)
214 words, 4,946 female names and the 3,897 male names, and 21,986 names 261 words, 4,946 female names and the 3,897 male names, and 21,986 names
215 from the MWords package, ABLE.LST from the ENABLE Supplement, and some 262 from the MWords package, ABLE.LST from the ENABLE Supplement, and some
216 additional words found in my part-of-speech database that were not 263 additional words found in my part-of-speech database that were not
217 found anywhere else. 264 found anywhere else.
218 265
219 Accent information was taken from UKACD. 266 Accent information was taken from UKACD.
220 267
221 My VARCON package was used to create the American, British, and 268 My VARCON package was used to create the American, British, and
222 Canadian word list. 269 Canadian word list.
223 270
224 Since the original word lists used used in the VARCON package came 271 Since the original word lists used in the VARCON package came
225 from the Ispell distribution they are under the Ispell copyright: 272 from the Ispell distribution they are under the Ispell copyright:
226 273
227 Copyright 1993, Geoff Kuenning, Granada Hills, CA 274 Copyright 1993, Geoff Kuenning, Granada Hills, CA
228 All rights reserved. 275 All rights reserved.
229 276
230 Redistribution and use in source and binary forms, with or without 277 Redistribution and use in source and binary forms, with or without
231 modification, are permitted provided that the following conditions 278 modification, are permitted provided that the following conditions
232 are met: 279 are met:
233 280
234 1. Redistributions of source code must retain the above copyright 281 1. Redistributions of source code must retain the above copyright
(...skipping 16 matching lines...)
251 FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL GEOFF 298 FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL GEOFF
252 KUENNING OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, 299 KUENNING OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
253 INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, 300 INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
254 BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 301 BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
255 LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 302 LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
256 CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT 303 CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
257 LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN 304 LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
258 ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE 305 ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
259 POSSIBILITY OF SUCH DAMAGE. 306 POSSIBILITY OF SUCH DAMAGE.
260 307
261 Build Date: Thu Jan 6 02:31:29 MST 2011 308 Build Date: Mon Aug 11 18:27:22 CEST 2014
309 Wordlist Command: mk-list en_CA 60 | deaccent