OLD | NEW |
(Empty) | |
| 1 //===- NaClBitcodeDist.h ---------------------------------------*- C++ -*-===// |
| 2 // Maps distributions of values in PNaCl bitcode files. |
| 3 // |
| 4 // The LLVM Compiler Infrastructure |
| 5 // |
| 6 // This file is distributed under the University of Illinois Open Source |
| 7 // License. See LICENSE.TXT for details. |
| 8 // |
| 9 //===----------------------------------------------------------------------===// |
| 10 // |
| 11 // Creates a (nestable) distribution map of values in PNaCl bitcode. |
| 12 // The domain of these maps is the set of record values being |
| 13 // tracked. The range is the information associated with each block |
| 14 // and/or record value, including the number of instances of that value. The |
| 15 // distribution map is nested if the range element contains another |
| 16 // distribution map. |
| 17 // |
| 18 // The goal of distribution maps is to build a (histogram) |
| 19 // distribution of values in bitcode records and blocks, of a PNaCl |
| 20 // bitcode file. From appropriately built distribution maps, one can |
| 21 // infer possible new abbreviations that can be used in the PNaCl |
| 22 // bitcode file. Hence, one of the primary goals of distribution maps |
| 23 // is to support tools pnacl-bcanalyzer and pnacl-bccompress. |
| 24 // |
| 25 // Distribution maps are constructed either from NaClBitcodeBlock's or |
| 26 // NaClBitcodeRecord's (in NaClBitcodeParser.h), but not both. It is |
| 27 // assumed that only blocks, or records (but not both) are added to a |
| 28 // distribution map. To add data to the distribution map, one calls |
| 29 // AddRecord and/or AddBlock. If the distribution map contains record |
| 30 // values, you must call AddRecord for each record to be put into the |
| 31 // distribution map. If the distribution map contains block values (i.e. |
| 32 // block ID's), you must call AddBlock for each block to be put into |
| 33 // the distribution map. |
| 34 // |
| 35 // While it may counterintuitive, one can call both AddRecord and |
| 36 // AddBlock, for each corresponding record and block processed. The |
| 37 // reason for this is that an internal flag StorageKind is kept within |
| 38 // distribution maps. If the flag value doesn't correspond to the |
| 39 // type of called add method, no update is done. This behaviour is |
| 40 // done so that nested distribution maps can be updated via blind |
| 41 // calls in NaClBitcodeAnalyzer.cpp. |
| 42 // |
| 43 // Via inheritance, and overriding the (virtual) AddRecord |
| 44 // method for a distribution map, we can redirect the add to look up |
| 45 // the distribution element associated with the block of the record, |
| 46 // and then update the corresponding record distribution map. In general, |
| 47 // it only makes sense (for nested distribution maps) to be able to |
| 48 // redirect record additions. Redirecting blocks within a record (since |
| 49 // a record is only associated with one block) does not make sense. Hence, |
| 50 // we have not made AddBlock virtual. |
| 51 // |
| 52 // When updating a block distribution map, the domain value is the |
| 53 // BlockID of the corresponding block being added. |
| 54 // |
| 55 // On the other hand, values associated with record distribution maps |
| 56 // are many possible values (the code, the abbreviation, the values |
| 57 // etc). To make the API uniform, record distribution maps are updated |
| 58 // using NaClBitcodeRecords (in NaClBitcodeParser.h). The values from |
| 59 // the record are defined by the extract method GetValueList, and |
| 60 // added via method AddRecord. |
| 61 // |
| 62 // Distribution maps are implemented using two classes: |
| 63 // |
| 64 // NaClBitcodeDist |
| 65 // A generic distribution map. |
| 66 // |
| 67 // NaClBitcodeDistElement |
| 68 // The implementation of elements in the range of the distribution |
| 69 // map. |
| 70 // |
| 71 // The code has been written to put the (virtual) logic of |
| 72 // distribution maps into derived classes of NaClBitcodeDistElement |
| 73 // whenever possible. This is intentional, in that it keeps all |
| 74 // knowledge of how to handle/print elements in one class. However, |
| 75 // because some distributions have external data that is needed by all |
| 76 // elements, the virtual methods of class NaClBitcodeDist can be |
| 77 // overridden, and not delegate to NaClBitcodeDistElement. |
| 78 // |
| 79 // To do this, an NaClBitcodeDist requires a "sentinel" (derived) |
| 80 // instance of NaClBitcodeDistElement. This sentinel is used to define |
| 81 // behaviour needed by distribution maps. |
| 82 // |
| 83 // By having control passed to the (derived) instances of |
| 84 // NaClBitcodeDistElement, it also makes handling nested distributions |
| 85 // relatively easy. One just extends the methods AddRecord and/or |
| 86 // AddBlock to also update the corresponding nested distribution. |
| 87 // |
| 88 // The exception to this rule is printing, since header information is |
| 89 // dependent on properties of any possible nested distribution maps |
| 90 // (for example, we copy column headers after each nested distribution |
| 91 // map so that it is easier to read the output). To fix this, we let |
| 92 // virtual method NaClBitcodeDistElement::GetNestedDistributions |
| 93 // return the array of nested distribution pointers for the nested |
| 94 // distributions of that map. The order in the array is the order the |
| 95 // nested distributions will be printed. Typically, this is |
| 96 // implemented as a field of the distribution element, and is |
| 97 // initialized to contain the pointers of all nested distributions in |
| 98 // the element. This field can then be returned from method |
| 99 // GetNestedDistributions. |
| 100 // |
| 101 // Distribution maps are sortable (via method GetDistribution). The |
| 102 // purpose of sorting is to find interesting elements. This is done by |
| 103 // sorting the values in the domain of the distribution map, based on |
| 104 // the GetImportance method of the range element. |
| 105 // |
| 106 // Method GetImportance defines how (potentially) interesting the |
| 107 // value is in the distribution. "Interesting" is based on the notion |
| 108 // of how likely will the value show a case where adding an |
| 109 // abbreviation will shrink the size of the corresponding bitcode |
| 110 // file. For most distributions, the number of instances associated |
| 111 // with the value is the best measure. |
| 112 // |
| 113 // However, for cases where multiple domain entries are created for |
| 114 // the same NaClBitcodeRecord (i.e. method GetValueList defines more |
| 115 // than one value), things are not so simple. |
| 116 |
| 117 // For example, for some distributions (such as value index distributions) |
| 118 // the numbers of instances isn't sufficient. In such cases, you may |
| 119 // have to look at nested distributions to find important cases. |
| 120 // |
| 121 // In the case of value index distributions, when the size of the |
| 122 // records is the same, all value indices have the same number of |
| 123 // instances. In this case, "interesting" may be measured in terms of |
| 124 // the (nested) distribution of the values that can appear at that |
| 125 // index, and how often each value appears. |
| 126 // |
| 127 // The larger the importance, the more interesting the value is |
| 128 // considered, and sorting is based on moving interesting values to |
| 129 // the front of the sorted list. |
| 130 // |
| 131 // When printing distribution maps, the values are sorted based on |
| 132 // the importance. By default, importance is based on the number of |
| 133 // times the value appears in records, putting the most used values |
| 134 // at the top of the printed list. |
| 135 // |
| 136 // Since sorting is expensive, the sorted distribution is built once |
| 137 // and cached. This cache is flushed whenever the distribution map is |
| 138 // updated, so that a new sorted distribuition will be generated. |
| 139 // |
| 140 // Printing of distribution maps are stylized, so that virtuals can |
| 141 // easily fill in the necessary data. |
| 142 // |
| 143 // For concrete instances of NaClBitcodeDistElement, the following |
| 144 // virtual method must be defined: |
| 145 // |
| 146 // CreateElement |
| 147 // Creates a new instance of the distribution element, to |
| 148 // be put into the corresponding distribution map when a new |
| 149 // value is added to the distribution map. |
| 150 // |
| 151 // In addition, if the distribution element is based on record values, |
| 152 // the virtual method GetValueList must be defined, to extract values |
| 153 // out of the bitcode record. |
| 154 |
| 155 #ifndef LLVM_BITCODE_NACL_NACLBITCODEDIST_H |
| 156 #define LLVM_BITCODE_NACL_NACLBITCODEDIST_H |
| 157 |
| 158 #include "llvm/Bitcode/NaCl/NaClBitcodeParser.h" |
| 159 #include "llvm/Support/Casting.h" |
| 160 #include "llvm/Support/Format.h" |
| 161 #include "llvm/Support/raw_ostream.h" |
| 162 |
| 163 #include <map> |
| 164 #include <vector> |
| 165 |
| 166 namespace llvm { |
| 167 |
| 168 /// The domain type of PNaCl bitcode record distribution maps. |
| 169 typedef uint64_t NaClBitcodeDistValue; |
| 170 |
| 171 /// Base class of the range type of PNaCl bitcode record distribution |
| 172 /// maps. |
| 173 class NaClBitcodeDistElement; |
| 174 |
| 175 /// Type defining the list of values extracted from the corresponding |
| 176 /// bitcode record. Typically, the list size is one. However, there |
| 177 /// are cases where a record defines more than one value (i.e. value |
| 178 /// indices). Hence, this type defines the more generic API for |
| 179 /// values. |
| 180 typedef std::vector<NaClBitcodeDistValue> ValueListType; |
| 181 |
| 182 typedef ValueListType::const_iterator ValueListIterator; |
| 183 |
| 184 /// Defines a PNaCl bitcode record distribution map. The distribution |
| 185 /// map is a map from a (record) value, to the corresponding data |
| 186 /// associated with that value. Assumes distributions elements are |
| 187 /// instances of NaClBitcodeDistElement. |
| 188 class NaClBitcodeDist { |
| 189 NaClBitcodeDist(const NaClBitcodeDist&) LLVM_DELETED_FUNCTION; |
| 190 void operator=(const NaClBitcodeDist&) LLVM_DELETED_FUNCTION; |
| 191 friend class NaClBitcodeDistElement; |
| 192 |
| 193 public: |
| 194 /// Define kinds for isa, dyn_cast, etc. support (see |
| 195 /// llvm/Support/Casting.h). Only defined for concrete classes. |
| 196 enum NaClBitcodeDistKind { |
| 197 RD_Dist, // class NaClBitcodeDist. |
| 198 RD_BlockDist, // class NaClBitcodeBlockDist. |
| 199 RD_BlockDistLast, |
| 200 RD_CodeDist, // class NaClBitcodeCodeDist. |
| 201 RD_CodeDistLast, |
| 202 RD_AbbrevDist, // class NaClBitcodeAbbrevDist. |
| 203 RD_AbbrevDistLast, |
| 204 RD_SubblockDist, // class NaClBlockSubblockDist. |
| 205 RD_SubblockDistLast, |
| 206 RD_ValueDist, // class NaClBitcodeValueDist. |
| 207 RD_ValueDistLast, |
| 208 RD_DistLast |
| 209 }; |
| 210 |
| 211 NaClBitcodeDistKind getKind() const { return Kind; } |
| 212 |
| 213 private: |
| 214 const NaClBitcodeDistKind Kind; |
| 215 |
| 216 static bool classof(const NaClBitcodeDist *Dist) { |
| 217 return Dist->getKind() >= RD_Dist && Dist->getKind() < RD_DistLast; |
| 218 } |
| 219 |
| 220 public: |
| 221 /// Type defining the mapping used to define the distribution. |
| 222 typedef std::map<NaClBitcodeDistValue, NaClBitcodeDistElement*> MappedElement; |
| 223 |
| 224 typedef MappedElement::const_iterator const_iterator; |
| 225 |
| 226 /// Type defining a pair of values used to sort the |
| 227 /// distribution. The first element is defined by method |
| 228 /// GetImportance, and the second is the distribution value |
| 229 /// associated with that importance. |
| 230 typedef std::pair<double, NaClBitcodeDistValue> DistPair; |
| 231 |
| 232 /// Type defining the sorted list of (domain) values in the |
| 233 /// corresponding distribution map. |
| 234 typedef std::vector<DistPair> Distribution; |
| 235 |
| 236 /// Defines whether blocks or records are stored in the distribution map. |
| 237 /// Used to decide if AddRecord/AddBlock methods should fire. |
| 238 enum StorageSelector { |
| 239 BlockStorage, |
| 240 RecordStorage |
| 241 }; |
| 242 |
| 243 NaClBitcodeDist(StorageSelector StorageKind, |
| 244 const NaClBitcodeDistElement *Sentinel, |
| 245 NaClBitcodeDistKind Kind=RD_Dist) |
| 246 : Kind(Kind), StorageKind(StorageKind), Sentinel(Sentinel), |
| 247 TableMap(), CachedDistribution(0), Total(0) { |
| 248 } |
| 249 |
| 250 virtual ~NaClBitcodeDist(); |
| 251 |
| 252 /// Number of elements in the distribution map. |
| 253 size_t size() const { |
| 254 return TableMap.size(); |
| 255 } |
| 256 |
| 257 /// Iterator at beginning of distribution map. |
| 258 const_iterator begin() const { |
| 259 return TableMap.begin(); |
| 260 } |
| 261 |
| 262 /// Iterator at end of distribution map. |
| 263 const_iterator end() const { |
| 264 return TableMap.end(); |
| 265 } |
| 266 |
| 267 /// Returns true if the distribution map is empty. |
| 268 bool empty() const { |
| 269 return TableMap.empty(); |
| 270 } |
| 271 |
| 272 /// Returns the element associated with the given distribution |
| 273 /// value. Creates the element if needed. |
| 274 inline NaClBitcodeDistElement *GetElement(NaClBitcodeDistValue Value); |
| 275 |
| 276 /// Returns the element associated with the given distribution |
| 277 /// value. |
| 278 NaClBitcodeDistElement *at(NaClBitcodeDistValue Value) const { |
| 279 return TableMap.at(Value); |
| 280 } |
| 281 |
| 282 // Creates a new instance of this element for the given value. Used |
| 283 // by class NaClBitcodeDist to create instances. Default method |
| 284 // simply dispatches to the CreateElement method of the sentinel. |
| 285 virtual NaClBitcodeDistElement *CreateElement( |
| 286 NaClBitcodeDistValue Value) const; |
| 287 |
| 288 /// Interrogates the block record, and returns the corresponding |
| 289 /// values that are being tracked by the distribution map. Default |
| 290 /// method simply dispatches to the GetValueList of the sentinel. |
| 291 virtual void GetValueList(const NaClBitcodeRecord &Record, |
| 292 ValueListType &ValueList) const; |
| 293 |
| 294 /// Returns the total number of instances held in the distribution |
| 295 /// map. |
| 296 unsigned GetTotal() const { |
| 297 return Total; |
| 298 } |
| 299 |
| 300 /// Adds the value(s) in the given bitcode record to the |
| 301 /// distribution map. The value(s) based on method GetValueList. |
| 302 /// Note: Default requires that GetStorageKind() == RecordStorage. |
| 303 /// Override if you want special handling for nested distributions |
| 304 /// in a block distribution map. |
| 305 virtual void AddRecord(const NaClBitcodeRecord &Record); |
| 306 |
| 307 /// Adds the BlockID of the given bitcode block to the distribution |
| 308 /// map, if applicable (based on the value of method UseBlockID). |
| 309 /// Note: Requires that GetStorageKind() == BlockStorage. |
| 310 virtual void AddBlock(const NaClBitcodeBlock &Block); |
| 311 |
| 312 /// Builds the distribution associated with the distribution map. |
| 313 /// Warning: The distribution is cached, and hence, only valid while |
| 314 /// it's contents is not changed. |
| 315 const Distribution *GetDistribution() const { |
| 316 if (CachedDistribution == 0) Sort(); |
| 317 return CachedDistribution; |
| 318 } |
| 319 |
| 320 /// Prints out the contents of the distribution map to Stream. |
| 321 void Print(raw_ostream &Stream, const std::string &Indent) const; |
| 322 |
| 323 void Print(raw_ostream &Stream) const { |
| 324 std::string Indent; |
| 325 Print(Stream, Indent); |
| 326 } |
| 327 |
| 328 protected: |
| 329 /// If the distribution is cached, remove it. Should be called |
| 330 /// whenever the distribution map is changed. |
| 331 void RemoveCachedDistribution() const { |
| 332 if (CachedDistribution) { |
| 333 delete CachedDistribution; |
| 334 CachedDistribution = 0; |
| 335 } |
| 336 } |
| 337 |
| 338 /// Sorts the distribution, based on the importance of each element. |
| 339 void Sort() const; |
| 340 |
| 341 private: |
| 342 // Defines whether values in distribution map are from blocks or records. |
| 343 const StorageSelector StorageKind; |
| 344 // Sentinel element used to do generic operations for distribution. |
| 345 const NaClBitcodeDistElement *Sentinel; |
| 346 // Map from the distribution value to the corresponding distribution |
| 347 // element. |
| 348 MappedElement TableMap; |
| 349 // Pointer to the cached distribution. |
| 350 mutable Distribution *CachedDistribution; |
| 351 // The total number of instances in the map. |
| 352 unsigned Total; |
| 353 }; |
| 354 |
| 355 /// Defines the element type of a PNaCl bitcode distribution map. |
| 356 /// This is the base class for all element types used in |
| 357 /// NaClBitcodeDist. By default, only the number of instances |
| 358 /// of the corresponding distribution values is recorded. |
| 359 class NaClBitcodeDistElement { |
| 360 NaClBitcodeDistElement(const NaClBitcodeDistElement &) |
| 361 LLVM_DELETED_FUNCTION; |
| 362 void operator=(const NaClBitcodeDistElement &) |
| 363 LLVM_DELETED_FUNCTION; |
| 364 |
| 365 public: |
| 366 /// Define kinds for isa, dyn_cast, etc. support. Only defined |
| 367 /// for concrete classes. |
| 368 enum NaClBitcodeDistElementKind { |
| 369 RDE_Dist, // class NaClBitcodeDistElement. |
| 370 RDE_AbbrevDist, // class NaClBitcodeAbbrevDistElement. |
| 371 RDE_AbbrevDistLast, |
| 372 RDE_BitsDist, // class NaClBitcodeBitsDistElement. |
| 373 RDE_BitsAndAbbrevsDist, // class NaClBitcodeBitsAndAbbrevsDistElement. |
| 374 RDE_CodeDist, // class NaClBitcodeCodeDistElement. |
| 375 RDE_CompressCodeDist, // class NaClCompressCodeDistElement. |
| 376 RDE_CompressCodeDistLast, |
| 377 RDE_CodeDistLast, |
| 378 RDE_BitsAndAbbrevsDistLast, |
| 379 RDE_BitsDistLast, |
| 380 RDE_BlockDist, // class NaClBitcodeBlockDistElement. |
| 381 RDE_NaClAnalBlockDist, // class NaClAnalyzerBlockDistElement. |
| 382 RDE_NaClAnalBlockDistLast, |
| 383 RDE_PNaClCompressBlockDist, // class NaClCompressBlockDistElement. |
| 384 RDE_PNaClCompressBlockDistLast, |
| 385 RDE_BlockDistLast, |
| 386 RDE_SizeDist, // class NaClBitcodeSizeDistElement. |
| 387 RDE_SizeDistLast, |
| 388 RDE_SubblockDist, // class NaClBitcodeSubblockDistElement |
| 389 RDE_SubblockDistLast, |
| 390 RDE_ValueDist, // class NaClBitcodeValueDistElement. |
| 391 RDE_ValueDistLast, |
| 392 RDE_ValueIndexDist, // class NaClBitcodeValueIndexDistElement. |
| 393 RDE_ValueIndexDistLast, |
| 394 RDE_DistLast |
| 395 }; |
| 396 |
| 397 NaClBitcodeDistElementKind getKind() const { return Kind; } |
| 398 |
| 399 static bool classof(const NaClBitcodeDistElement *Element) { |
| 400 return Element->getKind() >= RDE_Dist && Element->getKind() < RDE_DistLast; |
| 401 } |
| 402 |
| 403 private: |
| 404 const NaClBitcodeDistElementKind Kind; |
| 405 |
| 406 public: |
| 407 |
| 408 // Constructor to use in derived classes. |
| 409 NaClBitcodeDistElement( |
| 410 NaClBitcodeDistElementKind Kind=RDE_Dist) |
| 411 : Kind(Kind), NumInstances(0) |
| 412 {} |
| 413 |
| 414 virtual ~NaClBitcodeDistElement(); |
| 415 |
| 416 // Adds an instance of the given record to this element. |
| 417 virtual void AddRecord(const NaClBitcodeRecord &Record); |
| 418 |
| 419 // Adds an instance of the given block to this element. |
| 420 virtual void AddBlock(const NaClBitcodeBlock &Block); |
| 421 |
| 422 // Returns the number of instances associated with this element. |
| 423 unsigned GetNumInstances() const { |
| 424 return NumInstances; |
| 425 } |
| 426 |
| 427 // Creates a new instance of this element for the given value. Used |
| 428 // by class NaClBitcodeDist to create instances. |
| 429 virtual NaClBitcodeDistElement *CreateElement( |
| 430 NaClBitcodeDistValue Value) const = 0; |
| 431 |
| 432 /// Interrogates the block record, and returns the corresponding |
| 433 /// values that are being tracked by the distribution map. Must be |
| 434 /// defined in derived classes. |
| 435 virtual void GetValueList(const NaClBitcodeRecord &Record, |
| 436 ValueListType &ValueList) const; |
| 437 |
| 438 // Returns the importance of this element. In many cases, it will be |
| 439 // the number of instances associated with it. However, it need not |
| 440 // be correlated to the number of instance. Used to sort the |
| 441 // distribution map, where values with larger importance appear |
| 442 // first. Value is the domain value associated with the element in |
| 443 // the distribution map. |
| 444 virtual double GetImportance(NaClBitcodeDistValue Value) const; |
| 445 |
| 446 /// Returns the title to use when printing the title associated |
| 447 /// with instances of this distribution element. |
| 448 virtual const char *GetTitle() const; |
| 449 |
| 450 /// Prints out the title of the distribution map associated with |
| 451 /// instances of this distribution element. |
| 452 virtual void PrintTitle(raw_ostream &Stream, |
| 453 const NaClBitcodeDist *Distribution) const; |
| 454 |
| 455 /// Returns the header to use when printing the value associated |
| 456 /// with instances of this distribution element. |
| 457 virtual const char *GetValueHeader() const; |
| 458 |
| 459 /// Prints out header for row of statistics associated with instances |
| 460 /// of this distribution element. |
| 461 virtual void PrintStatsHeader(raw_ostream &Stream) const; |
| 462 |
| 463 /// Prints out the header to the printed distribution map associated |
| 464 /// with instances of this distribution element. |
| 465 void PrintHeader(raw_ostream &Stream) const; |
| 466 |
| 467 /// Prints out statistics for the row with the given value. |
| 468 virtual void PrintRowStats(raw_ostream &Stream, |
| 469 const NaClBitcodeDist *Distribution) const; |
| 470 |
| 471 /// Prints out Value (in a row) to Stream. |
| 472 virtual void PrintRowValue(raw_ostream &Stream, |
| 473 NaClBitcodeDistValue Value, |
| 474 const NaClBitcodeDist *Distribution) const; |
| 475 |
| 476 /// Prints out a row in the printed distribution map. |
| 477 virtual void PrintRow(raw_ostream &Stream, |
| 478 NaClBitcodeDistValue Value, |
| 479 const NaClBitcodeDist *Distribution) const; |
| 480 |
| 481 /// Returns a pointer to the list of nested distributions that |
| 482 /// should be printed when this element is printed. Return 0 if no |
| 483 /// nested distributions should be printed. |
| 484 virtual const SmallVectorImpl<NaClBitcodeDist*> * |
| 485 GetNestedDistributions() const; |
| 486 |
| 487 /// Prints out any nested distributions, if defined for the element. |
| 488 /// Returns true if a nested distribution was printed. |
| 489 bool PrintNestedDistIfApplicable( |
| 490 raw_ostream &Stream, const std::string &Indent) const; |
| 491 |
| 492 private: |
| 493 // The number of instances associated with this element. |
| 494 unsigned NumInstances; |
| 495 }; |
| 496 |
| 497 inline NaClBitcodeDistElement *NaClBitcodeDist:: |
| 498 GetElement(NaClBitcodeDistValue Value) { |
| 499 if (TableMap.find(Value) == TableMap.end()) { |
| 500 TableMap[Value] = CreateElement(Value); |
| 501 } |
| 502 return TableMap[Value]; |
| 503 } |
| 504 |
| 505 } |
| 506 |
| 507 #endif |
OLD | NEW |