
Update INTERNALS to better reflect the current code generation

Tim Kientzle, 9 years ago
parent
commit
7d35f41b49
1 file changed with 152 additions and 143 deletions
  1. 152 143
      Documentation/INTERNALS.md


@@ -1,11 +1,15 @@
-# Swift Protobuf Generated Code Guide
+# Swift Protobuf Internals
 
 ---
 
 This explanation of the generated code is intended to help people understand
-the design of Swift Protobuf.
-This is not a contract: The details of the generated code are expected to
-change over time as we discover better ways to implement the expected
+the internal design of SwiftProtobuf.
+In particular, people interested in helping with the development of
+SwiftProtobuf itself should probably read this carefully.
+
+Note, however, that this is not a contract:
+The details of the generated code are expected to change
+over time as we discover better ways to implement the expected
 behaviors.
 As a result, this document is probably already out of date;
 pull requests that correct this document to better match the actual
@@ -31,7 +35,8 @@ message Foo {
 }
 ```
 
-For these, we can generate a simple struct with the expected public properties:
+For these, we can generate a simple struct with the expected public properties
+and simple initializers:
 
 ```swift
 struct Foo {
@@ -85,7 +90,8 @@ struct Foo {
 ```
 
 If explicit defaults were set on the fields in the proto, the generated code
-is essentially the same.
+is essentially the same; it just uses different default values in
+the generated getters.
 The `clearXyz` methods above ensure that users can always reset a field
 to the default value without needing to know what the default value is.
 
@@ -109,7 +115,28 @@ struct Foo {
 }
 ```
 
-**Proto2 required fields:**  TODO
+**Proto2 required fields:**
+
+Required fields generate the same storage and field management
+as for optional fields.
+
+But the code generator augments this with a generated `isInitialized`
+method that simply tests each required field to ensure it has
+a suitable value.
+Note that this is done for proto3 messages as well if there are
+fields that store proto2 messages.
+
+Since the code generator has the entire schema available at once, it
+can generally inspect that schema and short-circuit the
+`isInitialized` check entirely in cases where no sub-object has any
+required fields.
+This means that required field support does not incur any cost unless
+there actually are some required fields.
+
+(Extensions complicate this picture:
+Since extensions may be compiled separately, the `isInitialized`
+check must always visit any extensible messages in case one of
+their extensions has required fields.)
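
The required-field check described above can be sketched roughly as follows. This is a minimal illustration, not the generator's actual output: the message types `Inner`, `Outer`, and `Plain` and their fields are hypothetical, and the real generated code may be organized quite differently.

```swift
// Hypothetical nested message with one proto2 `required` field.
struct Inner {
    // Optional storage distinguishes "unset" from "set to the default".
    private var _id: Int32? = nil

    var id: Int32 {
        get { return _id ?? 0 }
        set { _id = newValue }
    }

    // True only when every required field has been set.
    var isInitialized: Bool { return _id != nil }
}

// Hypothetical outer message: it has no required fields of its own,
// but it stores a message that does, so it still needs a check.
struct Outer {
    var inner: Inner? = nil
    var note: String = ""

    var isInitialized: Bool {
        // An unset sub-message has nothing to verify.
        if let v = inner, !v.isInitialized { return false }
        return true
    }
}

// Hypothetical message whose entire schema has no required fields:
// a generator that sees the whole schema can emit a constant here.
struct Plain {
    var count: Int32 = 0
    var isInitialized: Bool { return true }
}
```

In this sketch, `Plain` pays nothing for required-field support, which is the short-circuit the text describes.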
 
 ### Message-valued Fields
 
@@ -118,7 +145,7 @@ Protobuf allows recursive structures such as the following:
 ```protobuf
 syntax = "proto3";
 message Foo {
-   Foo fooField = 1;
+   Foo foo_field = 1;
 }
 ```
 
@@ -165,18 +192,16 @@ cases:
  * If there are any fields containing a message or group type
  * If there are more than 16 total fields
 
+This logic will doubtless change in the future:
 More extensive testing could help fine-tune the logic for when we put fields
 directly into the struct and when we put them into a storage class.
-There might be cases where it makes more sense to put some fields directly
+There are likely cases where it makes more sense to put some fields directly
 into the struct and others into the storage class, but the current
 implementation will put all fields into the storage class if it decides
 to use a storage class.
 
 Whether a particular field is generated directly on the struct or on
 an internal storage class should be entirely opaque to the user.
-In particular, we override the standard reflection APIs so that
-`Mirror(reflecting:)` will always show the fields directly on the
-struct regardless of the internal storage.
 
 ## General Message Information
 
@@ -187,28 +212,26 @@ Here is the actual first part of the generated code for `message Foo` above:
 
 ```swift
 public struct Foo: ProtobufGeneratedMessage {
-  public var protoMessageName: String {return "Foo"}
-  public var protoPackageName: String {return ""}
-  public var jsonFieldNames: [String: Int] {return [
-    "fooField": 1,
-  ]}
-  public var protoFieldNames: [String: Int] {return [
-    "fooField": 1,
-  ]}
+  static let protoMessageName: String = "Foo"
+  static let _protobuf_nameMap: SwiftProtobuf._NameMap = [
+    1: .same(proto: "field1"),
+    2: .same(proto: "field2"),
+  ]
 ```
 
-The `protoMessageName` and `protoPackageName` provide
-information from the `.proto` file for use by various serialization mechanisms.
-The `jsonFieldNames` and `protoFieldNames` variables map the respective field
-names into the field numbers from the proto file. These are used by various
-serialization engines in the runtime library.
+The `protoMessageName` provides the fully-qualified
+message name (including leading package, if any) from the `.proto` file
+for use by various serialization mechanisms.
+The `_protobuf_nameMap` provides fast translation between
+field numbers, JSON field names, and proto field names,
+as needed by the serialization engines in the runtime library.
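
To make the role of the name map concrete, here is a rough sketch of a table that translates between field numbers and the two kinds of names. The type `FieldNameMap`, its methods, and the sample field names are all assumptions made for illustration; this is not the API of `SwiftProtobuf._NameMap`.

```swift
// Hypothetical stand-in for the runtime's name map; not its real API.
struct FieldNameMap {
    private var protoNames: [Int: String] = [:]
    private var jsonNames: [Int: String] = [:]
    private var numbers: [String: Int] = [:]

    // Each entry is (field number, proto name, JSON name).
    init(_ entries: [(number: Int, proto: String, json: String)]) {
        for e in entries {
            protoNames[e.number] = e.proto
            jsonNames[e.number] = e.json
            // Either spelling of the name resolves to the same number.
            numbers[e.proto] = e.number
            numbers[e.json] = e.number
        }
    }

    func protoName(for number: Int) -> String? { return protoNames[number] }
    func jsonName(for number: Int) -> String? { return jsonNames[number] }
    func number(forName name: String) -> Int? { return numbers[name] }
}

// Sample table for two hypothetical fields.
let fooNames = FieldNameMap([
    (number: 1, proto: "field1", json: "field1"),
    (number: 4, proto: "foo_field", json: "fooField"),
])
```

An encoder would look up names by number, while a name-keyed decoder would go the other way with `number(forName:)`.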
 
 ## Serialization support
 
 The serialization support is based on a traversal mechanism (also known as
 "The Visitor Pattern").
 The various serialization systems in the runtime library construct objects
-that conform to the `ProtobufVisitor` protocol and then invoke
+that conform to the `SwiftProtobuf.Visitor` protocol and then invoke
 the `traverse()` method which will provide the visitor with a look at every
 non-empty field.
 
@@ -226,9 +249,8 @@ message Foo {
 }
 ```
 
-This generates a storage class, of course. When the serializer invokes
-`traverse()` on the struct, it simply passes the visitor to `traverse()` on the
-storage class, which looks like this:
+This generates a storage class, of course.
+The storage class and the generated `traverse()` look like this:
 
 ```swift
   private class _StorageClass {
@@ -237,142 +259,129 @@ storage class, which looks like this:
     var _field3: [String] = []
     var _fooField: Foo? = nil
     var _mapField: Dictionary<Int32,Bool> = [:]
+  }
 
-    func traverse(visitor: inout ProtobufVisitor) throws {
-      if _field1 != 0 {
-        try visitor.visitSingularField(
-                    fieldType: ProtobufInt32.self,
-                    value: _field1,
-                    protoFieldNumber: 1,
-                    protoFieldName: "field1",
-                    jsonFieldName: "field1",
-                    swiftFieldName: "field1")
+  func traverse<V: SwiftProtobuf.Visitor>(visitor: inout V) throws {
+    try withExtendedLifetime(_storage) { (_storage: _StorageClass) in
+      if _storage._field1 != 0 {
+        try visitor.visitSingularInt32Field(
+                value: _storage._field1,
+                fieldNumber: 1)
       }
-      if _field2 != 0 {
-        try visitor.visitSingularField(
-                    fieldType: ProtobufSFixed32.self,
-                    value: _field2,
-                    protoFieldNumber: 2,
-                    protoFieldName: "field2",
-                    jsonFieldName: "field2",
-                    swiftFieldName: "field2")
+      if _storage._field2 != 0 {
+        try visitor.visitSingularSFixed32Field(
+                value: _storage._field2,
+                fieldNumber: 2)
       }
-      if !_field3.isEmpty {
-        try visitor.visitRepeatedField(
-                    fieldType: ProtobufString.self,
-                    value: _field3,
-                    protoFieldNumber: 3,
-                    protoFieldName: "field3",
-                    jsonFieldName: "field3",
-                    swiftFieldName: "field3")
+      if !_storage._field3.isEmpty {
+        try visitor.visitRepeatedStringField(
+                value: _storage._field3,
+                fieldNumber: 3)
       }
-      if let v = _fooField {
+      if let v = _storage._fooField {
         try visitor.visitSingularMessageField(
-                    value: v,
-                    protoFieldNumber: 4,
-                    protoFieldName: "fooField",
-                    jsonFieldName: "fooField",
-                    swiftFieldName: "fooField")
+                value: v,
+                fieldNumber: 4)
       }
-      if !_mapField.isEmpty {
+      if !_storage._mapField.isEmpty {
         try visitor.visitMapField(
-                    fieldType: ProtobufMap<ProtobufInt32,ProtobufBool>.self,
-                    value: _mapField,
-                    protoFieldNumber: 5,
-                    protoFieldName: "mapField",
-                    jsonFieldName: "mapField",
-                    swiftFieldName: "mapField")
+                fieldType: _ProtobufMap<ProtobufInt32,ProtobufBool>.self,
+                value: _storage._mapField,
+                fieldNumber: 5)
       }
     }
+    try unknownFields.traverse(visitor: &visitor)
   }
 ```
 
-Since this is proto3, we only need to visit fields whose value is not the
-default. The `ProtobufVisitor` protocol specifies a number of `visitXyzField`
-methods that accept different types of fields. In addition to the value, each
-of these methods is given all of the various identifiers for the field:
-
-  * The proto field number is used by the protobuf binary serializer
-  * The JSON name is used by the JSON serializer
-  * The swift field name is used by the debugDescription implementation (which
-    uses the same traversal mechanism as the serializers)
-  * The proto field name is currently unused
-
-Of course, it would be entirely possible to implement other serializers on top
-of this same machinery as long as they can make use of one of these field
-identifiers.
-In fact, the default implementations for `hashValue`, `debugDescription`,
-and `mirror()` all rely on this same machinery to enumerate all of the
-set properties and their values.
-
-For the message visitors, it suffices to provide just the value, since the
-visitor implementation can obtain any necessary type information through
-generic arguments.
-
-For other types, this is insufficient:  `field1` and `field2` here both have a
-Swift type of `Int32`, but that is not enough to determine the correct
-serialization.
-
-So some of the visitor methods take a separate argument of a type object that
-contains detailed serialization information.
+Note that the visitors are generally structs (not classes) that
+are passed as `inout` parameters.
+
+Since this example is proto3, we only need to visit fields whose value is not the
+default.
+The `Visitor` protocol specifies a number of `visitXyzField`
+methods that accept different types of fields.
+Each of these methods is given the value and the proto field number.
+The field number is used as-is by the binary serializer; the JSON and
+text serializers use the `_protobuf_nameMap` described above to convert
+the field numbers into field names.
+
+Most of the `visit` methods accept a very specific type of data.
+The visitor methods for messages, groups, and enums use generic
+arguments to identify the exact type of object.
+This is insufficient for map fields, however, so the map visitors
+use an additional type object at this point.
+
+Many other facilities - not just serialization - can be built on
+top of this same machinery.
+For example, the default `hashValue` implementation uses the same
+traversal machinery to iterate over all of the set fields and values
+in order to compute the hash.
+
 You can look at the runtime library to see more details about the
-`ProtobufVisitor` protocol and the various implementations.
+`Visitor` protocol and the various implementations in each encoder.
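
The traversal design above can be illustrated with a heavily reduced sketch. The protocol `SketchVisitor`, the visitor `TextSketchVisitor`, and the message `VisitedMessage` are hypothetical names invented for this example; the real `Visitor` protocol has many more methods and different details.

```swift
// Hypothetical, heavily reduced visitor protocol (not the real one).
protocol SketchVisitor {
    mutating func visitSingularInt32Field(value: Int32, fieldNumber: Int) throws
    mutating func visitRepeatedStringField(value: [String], fieldNumber: Int) throws
}

// A text-style visitor: a struct that receives only field numbers and
// converts them to names through a lookup table.
struct TextSketchVisitor: SketchVisitor {
    let names: [Int: String]
    var lines: [String] = []

    init(names: [Int: String]) { self.names = names }

    mutating func visitSingularInt32Field(value: Int32, fieldNumber: Int) throws {
        let name = names[fieldNumber] ?? String(fieldNumber)
        lines.append("\(name): \(value)")
    }

    mutating func visitRepeatedStringField(value: [String], fieldNumber: Int) throws {
        let name = names[fieldNumber] ?? String(fieldNumber)
        for s in value {
            lines.append("\(name): \"\(s)\"")
        }
    }
}

struct VisitedMessage {
    var field1: Int32 = 0
    var field3: [String] = []

    // Proto3 style: only fields that differ from their default are visited.
    func traverse<V: SketchVisitor>(visitor: inout V) throws {
        if field1 != 0 {
            try visitor.visitSingularInt32Field(value: field1, fieldNumber: 1)
        }
        if !field3.isEmpty {
            try visitor.visitRepeatedStringField(value: field3, fieldNumber: 3)
        }
    }
}
```

Because the visitor is a struct passed `inout`, it accumulates its output in place; a caller creates one, passes it to `traverse(visitor:)`, and then reads the result from it.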
 
 ## Deserialization support
 
-Deserialization is a rather complex process overall, though the generated code
-is fairly simple.
+Deserialization is a more complex overall process than serialization,
+although the generated code is still quite simple.
 
-The core of the deserialization machinery rests on the generated `decodeField`
-method. Here is the `decodeField` method for the example just above:
+The deserialization machinery rests on the generated `decodeMessage`
+method. Here is the `decodeMessage` method for the example just above:
 
 ```swift
-  private class _StorageClass {
-    var _field1: Int32 = 0
-    var _field2: Int32 = 0
-    var _field3: [String] = []
-    var _fooField: Foo? = nil
-    var _mapField: Dictionary<Int32,Bool> = [:]
-
-    func decodeField(setter: inout ProtobufFieldDecoder, protoFieldNumber: Int) throws -> Bool {
-      let handled: Bool
-      switch protoFieldNumber {
-      case 1: handled = try setter.decodeSingularField(
-                      fieldType: ProtobufInt32.self,
-                      value: &_field1)
-      case 2: handled = try setter.decodeSingularField(
-                      fieldType: ProtobufSFixed32.self,
-                      value: &_field2)
-      case 3: handled = try setter.decodeRepeatedField(
-                      fieldType: ProtobufString.self,
-                      value: &_field3)
-      case 4: handled = try setter.decodeSingularMessageField(
-                      fieldType: Foo.self,
-                      value: &_fooField)
-      case 5: handled = try setter.decodeMapField(
-                      fieldType: ProtobufMap<ProtobufInt32,ProtobufBool>.self,
-                      value: &_mapField)
-      default: handled = false
+  mutating func decodeMessage<D: Decoder>(decoder: inout D) throws {
+    _ = _uniqueStorage()
+    try withExtendedLifetime(_storage) { (_storage: _StorageClass) in
+      while let fieldNumber = try decoder.nextFieldNumber() {
+        switch fieldNumber {
+        case 1: try decoder.decodeSingularInt32Field(
+                     value: &_storage._field1)
+        case 2: try decoder.decodeSingularSFixed32Field(
+                     value: &_storage._field2)
+        case 3: try decoder.decodeRepeatedStringField(
+                     value: &_storage._field3)
+        case 4: try decoder.decodeSingularMessageField(
+                     value: &_storage._fooField)
+        case 5: try decoder.decodeMapField(
+                     fieldType: _ProtobufMap<ProtobufInt32,ProtobufBool>.self,
+                     value: &_storage._mapField)
+        default: break
+        }
       }
-      return handled
     }
+  }
 ```
 
-Similar to the traversal system, the `decodeField` method is given an object
-that conforms to the `ProtobufFieldDecoder` protocol.
-This object generally encapsulates whatever information the deserializer
-can determine without actual schema knowledge.
-This method then provides the field decoder with a reference to the
-appropriate stored property and additional type information (via the same
-type objects used in the traversal method).
-The decoder now has everything it needs to update the field accordingly.
-
-You may notice that the `decodeField()` method here only uses the proto field
-number:
-Recall from earlier that the struct provides properties that can be used
-to map JSON and proto field names to proto field numbers.
-These maps are used by corresponding decoders to translate serialized names
-into proto field numbers for use here.
+This captures the essential structure of all of the supported decoders:
+inspect the next field, determine how to decode it, and store the
+result in the appropriate property.
+
+In essence, the decoder knows how to identify the next field and
+how to decode a field body once someone else has provided the schema.
+This block of generated code drives the decode process by requesting
+the number of the next field and using a `switch` statement to convert
+that into schema information.
+
+There are two important features of this design:
+
+* The fields are processed in the order they are seen by the decoder.
+  This allows the decoder to walk through the data serially for
+  optimal performance.
+
+* Fields are identified here by number.
+  Decoders that use named fields must use the `_protobuf_nameMap`
+  to translate those into field numbers.
+  This allows number-keyed formats (such as protobuf's default binary
+  format) to operate extremely efficiently.
+
+Unknown fields are captured by the decoder as a by-product of this
+process:
+The decoder sees which fields are supported (one of its decode methods
+gets called for each one), so it can identify and preserve any field
+that is not known.
+After processing the entire message, the decoder can push the
+collected unknown field data onto the resulting message object.
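
The division of labor described above (the decoder finds fields, the generated `switch` supplies the schema, and unclaimed fields become unknown fields) can be sketched as follows. Everything here is an assumption for illustration: `SketchDecoder` works on pre-split (number, value) pairs rather than real wire data, and its names and behavior are not those of the runtime's decoders.

```swift
// Hypothetical decoder over already-split (field number, value) pairs.
// A real binary decoder parses bytes; this only shows the control flow.
struct SketchDecoder {
    enum Value: Equatable {
        case int32(Int32)
        case string(String)
    }
    enum DecodeError: Error { case typeMismatch }

    private let input: [(number: Int, value: Value)]
    private var position = 0
    private var consumed = true
    // Fields the message did not claim, preserved in input order.
    private(set) var unknown: [(number: Int, value: Value)] = []

    init(_ input: [(number: Int, value: Value)]) { self.input = input }

    // Advance to the next field. If no decode method was called for the
    // previous field, the message did not know it, so keep it as unknown.
    mutating func nextFieldNumber() throws -> Int? {
        if !consumed {
            unknown.append(input[position - 1])
            consumed = true
        }
        guard position < input.count else { return nil }
        consumed = false
        position += 1
        return input[position - 1].number
    }

    mutating func decodeSingularInt32Field(value: inout Int32) throws {
        guard case .int32(let v) = input[position - 1].value else {
            throw DecodeError.typeMismatch
        }
        value = v
        consumed = true
    }

    mutating func decodeRepeatedStringField(value: inout [String]) throws {
        guard case .string(let s) = input[position - 1].value else {
            throw DecodeError.typeMismatch
        }
        value.append(s)
        consumed = true
    }
}

struct DecodedMessage {
    var field1: Int32 = 0
    var field3: [String] = []

    // The switch is the only place that knows the schema.
    mutating func decodeMessage(decoder: inout SketchDecoder) throws {
        while let fieldNumber = try decoder.nextFieldNumber() {
            switch fieldNumber {
            case 1: try decoder.decodeSingularInt32Field(value: &field1)
            case 3: try decoder.decodeRepeatedStringField(value: &field3)
            default: break
            }
        }
    }
}
```

In this sketch the unknown fields stay on the decoder after the loop finishes; a fuller version would then copy them onto the message, as the text describes.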
 
 ## Miscellaneous support methods
 
@@ -380,7 +389,7 @@ TODO: initializers
 
 TODO: isEqualTo
 
-TODO: _protoc_generated methods
+TODO: _protobuf_generated methods
 
 # Enums