Chromium Code Reviews

Unified Diff: third_party/WebKit/Source/platform/text/TextEncodingDetector.cpp

Issue 1721373002: UTF-8 detector for pages missing encoding info (Closed) Base URL: https://chromium.googlesource.com/chromium/src.git@master
Patch Set: Created 4 years, 10 months ago
Index: third_party/WebKit/Source/platform/text/TextEncodingDetector.cpp
diff --git a/third_party/WebKit/Source/platform/text/TextEncodingDetector.cpp b/third_party/WebKit/Source/platform/text/TextEncodingDetector.cpp
index 616aac0114d74ae3f284b97a740124242c7c7e3e..a03eb0e9f957fefee468b677b98989c3f738269b 100644
--- a/third_party/WebKit/Source/platform/text/TextEncodingDetector.cpp
+++ b/third_party/WebKit/Source/platform/text/TextEncodingDetector.cpp
@@ -33,6 +33,7 @@
#include "wtf/text/TextEncoding.h"
#include <unicode/ucnv.h>
#include <unicode/ucsdet.h>
+#include <unicode/utf8.h>
namespace blink {
@@ -112,4 +113,21 @@ bool detectTextEncoding(const char* data, size_t length,
return false;
}
+bool isUTF8Encoded(const char* data, size_t length)
+{
+    int32_t srcLen = static_cast<int32_t>(length);
+    int32_t charIndex = 0;
+    bool markDetected = false;
+
+    while (charIndex < srcLen) {
+        int32_t codePoint;
+        if ((uint8_t)(data[charIndex]) >= 0x80)
+            markDetected = true;
+        U8_NEXT(data, charIndex, srcLen, codePoint);
+        if (!U_IS_UNICODE_CHAR(codePoint))
aelias_OOO_until_Jul13 2016/02/24 04:37:54 According to http://icu-project.org/apiref/icu4c/u
Jinsuk Kim 2016/02/24 06:54:54 Thanks for looking into the detail. Ran the unitte
+            return false;
+    }
+    return markDetected;
+}
+
} // namespace blink
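
For readers without an ICU build at hand, the check in this patch can be approximated with a hand-rolled decoder. The sketch below is not the Blink code: `looksLikeUTF8` is a hypothetical name, and the decoding loop is an assumption about what `U8_NEXT` followed by `U_IS_UNICODE_CHAR` accepts (well-formed sequences only, no surrogates, no noncharacters). Like the patch, it returns false for pure-ASCII input, because a buffer with no byte >= 0x80 carries no evidence of UTF-8.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the patch's isUTF8Encoded(); does not use ICU.
// Returns true only if the buffer is well-formed UTF-8 AND contains at least
// one non-ASCII byte (the role played by markDetected in the patch).
bool looksLikeUTF8(const char* data, std::size_t length)
{
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(data);
    bool sawNonAscii = false;
    std::size_t i = 0;

    while (i < length) {
        unsigned char lead = bytes[i++];
        if (lead < 0x80)
            continue; // plain ASCII byte

        sawNonAscii = true;
        std::size_t trailCount;
        std::uint32_t codePoint;
        std::uint32_t minValue; // smallest value this length may encode

        if (lead >= 0xC2 && lead <= 0xDF) {
            trailCount = 1; codePoint = lead & 0x1F; minValue = 0x80;
        } else if ((lead & 0xF0) == 0xE0) {
            trailCount = 2; codePoint = lead & 0x0F; minValue = 0x800;
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            trailCount = 3; codePoint = lead & 0x07; minValue = 0x10000;
        } else {
            return false; // stray trail byte or invalid lead byte
        }

        if (length - i < trailCount)
            return false; // sequence truncated at end of buffer

        for (std::size_t k = 0; k < trailCount; ++k) {
            unsigned char trail = bytes[i++];
            if ((trail & 0xC0) != 0x80)
                return false; // not a continuation byte
            codePoint = (codePoint << 6) | (trail & 0x3F);
        }

        if (codePoint < minValue || codePoint > 0x10FFFF)
            return false; // overlong form or beyond Unicode range
        if (codePoint >= 0xD800 && codePoint <= 0xDFFF)
            return false; // surrogate code point
        if ((codePoint >= 0xFDD0 && codePoint <= 0xFDEF) || (codePoint & 0xFFFE) == 0xFFFE)
            return false; // noncharacter
    }
    return sawNonAscii;
}
```

The ICU macros remain the better choice inside Blink, since they are already a dependency and are maintained against the Unicode standard; this sketch only makes the accepted and rejected cases explicit.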