From the Avro spec:
Note that when a default value is specified for a record field whose type is a union, the type of the default value must match the first element of the union. Thus, for unions containing "null", the "null" is usually listed first, since the default value of such unions is typically null.
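The rule quoted above can be sketched as a small check. This is a simplified model rather than Avro's own validation code: unions are represented as plain lists of type names, and `UnionDefaultRule`, `typeNameOf`, and `defaultIsValid` are hypothetical names introduced only for illustration.

```scala
// Hypothetical, simplified model of the rule: not part of Avro or any library.
object UnionDefaultRule {
  // Map a default value to an Avro primitive type name (only a few types covered).
  def typeNameOf(default: Any): String = default match {
    case null       => "null"
    case _: String  => "string"
    case _: Int     => "int"
    case _: Boolean => "boolean"
    case _          => "unknown"
  }

  // A default is valid only if its type equals the FIRST branch of the union.
  def defaultIsValid(unionTypes: List[String], default: Any): Boolean =
    unionTypes.headOption.contains(typeNameOf(default))
}
```

Under this model, a null default is accepted for `List("null", "string")` and rejected for `List("string", "null")`, which is the ordering at issue in this report.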
However, merging two record schemas does not always enforce this rule. For example, a merge can produce "type":["string","null"], "default":null where it should produce "type":["null","string"], "default":null. I can reproduce this bug with the following:
A Json String with a null value for key X
A Json String with a non-null value for key X
A Json String with no entry for X (DNE)
Schemas inferred from each of the sample Json Strings
The schemas merged in the following way: merge(non-null, merge(null, dne)) or merge(non-null, merge(dne, null))
scala> val nul = """{"key":null}"""
nul: String = {"key":null}
scala> val dne = """{"other":3}"""
dne: String = {"other":3}
scala> val str = """{"key":"hello"}"""
str: String = {"key":"hello"}
scala> def stream(s: String): InputStream = new ByteArrayInputStream(s.getBytes("UTF-8"))
stream: (s: String)java.io.InputStream
scala> val nulSchema = JsonUtil.inferSchema(stream(nul), "com.example", 1)
nulSchema: org.apache.avro.Schema = {"type":"record","name":"example","namespace":"com","fields":[{"name":"key","type":"null","doc":"Type inferred from 'null'"}]}
scala> val dneSchema = JsonUtil.inferSchema(stream(dne), "com.example", 1)
dneSchema: org.apache.avro.Schema = {"type":"record","name":"example","namespace":"com","fields":[{"name":"other","type":"int","doc":"Type inferred from '3'"}]}
scala> val nPlusDne = SchemaUtil.merge(dneSchema, nulSchema)
nPlusDne: org.apache.avro.Schema = {"type":"record","name":"example","namespace":"com","fields":[{"name":"other","type":["null","int"],"doc":"Type inferred from '3'","default":null},{"name":"key","type":"null","doc":"Type inferred from 'null'","default":null}]}
scala> val strSchema = JsonUtil.inferSchema(stream(str), "com.example", 1)
strSchema: org.apache.avro.Schema = {"type":"record","name":"example","namespace":"com","fields":[{"name":"key","type":"string","doc":"Type inferred from '\"hello\"'"}]}
scala> val merged = SchemaUtil.merge(strSchema, nPlusDne)
[WARNING] Avro: Invalid default for field key: null not a ["string","null"]
merged: org.apache.avro.Schema = {"type":"record","name":"example","namespace":"com","fields":[{"name":"key","type":["string","null"],"doc":"Type inferred from '\"hello\"'","default":null},{"name":"other","type":["null","int"],"doc":"Type inferred from '3'","default":null}]}
The final merge produces "type":["string","null"], "default":null, even though the type of the default value must match the first element of the union.
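One possible normalization, sketched under the same simplified model (union branches as a list of type names), is to move "null" to the front of a merged union whenever it is present, so that a null default matches the first branch. The names `NullFirst` and `nullFirst` are hypothetical; this is not the merge implementation being reported on, only an illustration of the ordering the spec expects.

```scala
// Hypothetical helper: reorders a union so that "null" comes first.
object NullFirst {
  // If the union contains "null", return it with "null" moved to the head;
  // the relative order of the other branches is preserved.
  def nullFirst(unionTypes: List[String]): List[String] =
    if (unionTypes.contains("null")) "null" :: unionTypes.filterNot(_ == "null")
    else unionTypes
}
```

Applied to the branches from the final merge, `NullFirst.nullFirst(List("string", "null"))` gives `List("null", "string")`, for which "default":null is valid.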