Unified Diff: core/src/fpdftext/fpdf_text_int_unittest.cpp

Issue 1530763005: Correctly extracting email addresses (Closed) Base URL: https://pdfium.googlesource.com/pdfium.git@master
Patch Set: Created 5 years ago
Index: core/src/fpdftext/fpdf_text_int_unittest.cpp
diff --git a/core/src/fpdftext/fpdf_text_int_unittest.cpp b/core/src/fpdftext/fpdf_text_int_unittest.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..9c63f8f110fc9018066f0c2e427c7bd7990488da
--- /dev/null
+++ b/core/src/fpdftext/fpdf_text_int_unittest.cpp
@@ -0,0 +1,56 @@
+// Copyright 2015 PDFium Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style license that can be
+// found in the LICENSE file.
+
+#include "testing/gtest/include/gtest/gtest.h"
+
+#include "core/src/fpdftext/text_int.h"
+
+// Class to help test functions in CPDF_LinkExtract class.
+class CPDF_TestLinkExtract : public CPDF_LinkExtract {
+ private:
+ // Add test cases as friends to access protected member functions.
+ // Access CheckMailLink.
+ FRIEND_TEST(fpdf_text_int, CheckMailLink);
+};
+
+TEST(fpdf_text_int, CheckMailLink) {
+ CPDF_TestLinkExtract extractor;
+ // Check cases that fail to extract valid mail link.
+ const wchar_t* invalid_strs[] = {
+ L"",
+ L"peter.pan", // '@' is required.
+ L"abc@server", // Host name needs at least one '.'.
Lei Zhang 2015/12/18 00:24:09 As I mentioned previously, we need to investigate
Wei Li 2015/12/18 01:12:21 Added in cpp file.
+ L"abc.@gmail.com", // '.' cannot immediately precede '@'.
+ L"abc@xyz&q.org", // Host name should not contain '&'.
+ L"abc@.xyz.org", // Host name should not start with '.'.
+ L"fan@g..com" // Host name should not have consecutive '.'.
+ };
+ for (size_t i = 0; i < FX_ArraySize(invalid_strs); ++i) {
+ CFX_WideString text_str(invalid_strs[i]);
+ EXPECT_EQ(FALSE, extractor.CheckMailLink(text_str));
Lei Zhang 2015/12/18 00:24:09 It should be trivial to convert CheckMailLink() to
Wei Li 2015/12/18 01:12:21 Done.
+ }
+
+ // Check cases that can extract valid mail link.
+ // An array of {input_string, expected_extracted_email_address}.
+ const wchar_t* valid_strs[][2] = {
+ {L"peter@abc.d", L"peter@abc.d"},
+ {L"red.teddy.b@abc.com", L"red.teddy.b@abc.com"},
+ {L"abc_@gmail.com", L"abc_@gmail.com"}, // '_' is ok before '@'.
+ {L"dummy-hi@gmail.com",
+ L"dummy-hi@gmail.com"}, // '-' is ok in user name.
+ {L"a..df@gmail.com", L"df@gmail.com"}, // Stop at consecutive '.'.
+ {L".john@yahoo.com", L"john@yahoo.com"}, // Remove leading '.'.
+ {L"abc@xyz.org?/", L"abc@xyz.org"}, // Trim ending invalid chars.
+ {L"fan{abc@xyz.org", L"abc@xyz.org"}, // Trim beginning invalid chars.
+ {L"fan@g.com..", L"fan@g.com"}, // Trim the ending periods.
+ {L"CAP.cap@Gmail.Com", L"CAP.cap@Gmail.Com"}, // Keep the original case.
+ };
+ for (size_t i = 0; i < FX_ArraySize(valid_strs); ++i) {
+ CFX_WideString text_str(valid_strs[i][0]);
+ CFX_WideString expected_str(L"mailto:");
+ expected_str += valid_strs[i][1];
+ EXPECT_EQ(TRUE, extractor.CheckMailLink(text_str));
+ EXPECT_STREQ(expected_str.c_str(), text_str.c_str());
+ }
+}
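
For readers without a PDFium checkout, the rejection rules spelled out in the invalid_strs comments can be sketched as a standalone check. This is only an illustration under stated assumptions: PassesListedRules and CheckListedExamples are hypothetical names, std::wstring stands in for CFX_WideString, and the code is not CPDF_LinkExtract::CheckMailLink. It also assumes the candidate has already been trimmed, so it does not model the trimming and extraction cases in valid_strs (for example, "fan@g.com.." being reduced to "fan@g.com").

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of PDFium. Returns true when `candidate`
// breaks none of the rules listed in the invalid_strs comments of the test.
// It performs no trimming and does not add a "mailto:" prefix.
bool PassesListedRules(const std::wstring& candidate) {
  const std::wstring::size_type at = candidate.find(L'@');
  if (at == std::wstring::npos)
    return false;  // '@' is required.
  if (at > 0 && candidate[at - 1] == L'.')
    return false;  // '.' cannot immediately precede '@'.

  const std::wstring host = candidate.substr(at + 1);
  if (host.empty() || host[0] == L'.')
    return false;  // Host name should not be empty or start with '.'.
  if (host.find(L'.') == std::wstring::npos)
    return false;  // Host name needs at least one '.'.
  if (host.find(L'&') != std::wstring::npos)
    return false;  // Host name should not contain '&'.
  if (host.find(L"..") != std::wstring::npos)
    return false;  // Host name should not have consecutive '.'.
  return true;
}

// Replays the strings from invalid_strs against the sketch.
void CheckListedExamples() {
  const wchar_t* invalid_strs[] = {
      L"",           L"peter.pan",     L"abc@server", L"abc.@gmail.com",
      L"abc@xyz&q.org", L"abc@.xyz.org", L"fan@g..com"};
  for (const wchar_t* str : invalid_strs)
    assert(!PassesListedRules(str));
}
```

Calling CheckListedExamples() replays every entry of invalid_strs, and already-clean inputs such as L"peter@abc.d" pass. Keeping the rules as separate early returns mirrors the one-rule-per-comment layout of the test table, which makes it easy to see which rule a given string trips.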