Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ValueError: Mismatch in fields: ? #674

Closed
djouallah opened this issue Apr 30, 2024 · 10 comments
Closed

ValueError: Mismatch in fields: ? #674

djouallah opened this issue Apr 30, 2024 · 10 comments

Comments

@djouallah
Copy link

Question

I am just appending to the same table using the same data,

 try:
    catalog.create_table("aemo.scada",schema=df.schema)
  except:
    catalog.load_table("aemo.scada").append(df)

but I am getting this error

image
@djouallah djouallah reopened this Apr 30, 2024
@Fokko
Copy link
Contributor

Fokko commented Apr 30, 2024

@djouallah If you scroll below, can you see any mismatch fields? Would there be a way to reproduce this?

@djouallah
Copy link
Author

will try to reproduce it, but you need an account in R2 or S3, it is just a POC, basically load data using the SQL catalog, then save the sqlite DB in R2, then when I reload the notebook I download the db again

https://colab.research.google.com/drive/10WMnPY1mE9GHm1Zp2wm9PsEGaoCUYjr3?usp=sharing

@kevinjqliu
Copy link
Contributor

Ah, this is because table_schema.as_struct() and task_schema.as_struct() return StructType.fields with the same elements but in different order...

print(table_schema.as_struct().fields)
print(task_schema.as_struct().fields)

(NestedField(field_id=1, name='REGIONID', field_type=StringType(), required=False), NestedField(field_id=2, name='filename', field_type=StringType(), required=False), NestedField(field_id=3, name='UNIT', field_type=StringType(), required=False), NestedField(field_id=4, name='RAISEREGLOCALDISPATCH', field_type=DoubleType(), required=False), NestedField(field_id=5, name='RAISEREGAPCFLAG', field_type=DoubleType(), required=False), NestedField(field_id=6, name='APCFLAG', field_type=DoubleType(), required=False), NestedField(field_id=7, name='LOWER60SECROP', field_type=DoubleType(), required=False), NestedField(field_id=8, name='RAISE60SECAPCFLAG', field_type=DoubleType(), required=False), NestedField(field_id=9, name='RAISE60SECREQ', field_type=DoubleType(), required=False), NestedField(field_id=10, name='AVAILABLELOAD', field_type=DoubleType(), required=False), NestedField(field_id=11, name='RAISE6SECREQ', field_type=DoubleType(), required=False), NestedField(field_id=12, name='RAISE60SECIMPORT', field_type=DoubleType(), required=False), NestedField(field_id=13, name='VERSION', field_type=DoubleType(), required=False), NestedField(field_id=14, name='TOTALDEMAND', field_type=DoubleType(), required=False), NestedField(field_id=15, name='RAISE60SECLOCALPRICE', field_type=DoubleType(), required=False), NestedField(field_id=16, name='LOWER5MINIMPORT', field_type=DoubleType(), required=False), NestedField(field_id=17, name='LOWER5MINRRP', field_type=DoubleType(), required=False), NestedField(field_id=18, name='LOWERREGRRP', field_type=DoubleType(), required=False), NestedField(field_id=19, name='RAISE6SECDISPATCH', field_type=DoubleType(), required=False), NestedField(field_id=20, name='RAISE6SECIMPORT', field_type=DoubleType(), required=False), NestedField(field_id=21, name='LOWER5MINSUPPLYPRICE', field_type=DoubleType(), required=False), NestedField(field_id=22, name='RAISE5MINREQ', field_type=DoubleType(), required=False), NestedField(field_id=23, name='LOWER60SECLOCALREQ', field_type=DoubleType(), required=False), NestedField(field_id=24, name='INITIALSUPPLY', field_type=DoubleType(), required=False), NestedField(field_id=25, name='RAISEREGVIOLATION', field_type=DoubleType(), required=False), NestedField(field_id=26, name='DISPATCHABLEGENERATION', field_type=DoubleType(), required=False), NestedField(field_id=27, name='LOWER60SECDISPATCH', field_type=DoubleType(), required=False), NestedField(field_id=28, name='MARKETSUSPENDEDFLAG', field_type=DoubleType(), required=False), NestedField(field_id=29, name='RAISE6SECLOCALPRICE', field_type=DoubleType(), required=False), NestedField(field_id=30, name='RAISE60SECRRP', field_type=DoubleType(), required=False), NestedField(field_id=31, name='LOWER60SECPRICE', field_type=DoubleType(), required=False), NestedField(field_id=32, name='RAISEREGRRP', field_type=DoubleType(), required=False), NestedField(field_id=33, name='RAISE5MINLOCALDISPATCH', field_type=DoubleType(), required=False), NestedField(field_id=34, name='RAISE5MINLOCALVIOLATION', field_type=DoubleType(), required=False), NestedField(field_id=35, name='RAISE6SECSUPPLYPRICE', field_type=DoubleType(), required=False), NestedField(field_id=36, name='RAISEREGLOCALVIOLATION', field_type=DoubleType(), required=False), NestedField(field_id=37, name='LOWERREGAPCFLAG', field_type=DoubleType(), required=False), NestedField(field_id=38, name='LOWER5MINLOCALREQ', field_type=DoubleType(), required=False), NestedField(field_id=39, name='LOWERREGLOCALVIOLATION', field_type=DoubleType(), required=False), NestedField(field_id=40, name='LOWER6SECRRP', field_type=DoubleType(), required=False), NestedField(field_id=41, name='LOWER60SECLOCALPRICE', field_type=DoubleType(), required=False), NestedField(field_id=42, name='LOWERREGLOCALDISPATCH', field_type=DoubleType(), required=False), NestedField(field_id=43, name='RAISE60SECDISPATCH', field_type=DoubleType(), required=False), NestedField(field_id=44, name='RAISE5MINLOCALPRICE', field_type=DoubleType(), required=False), NestedField(field_id=45, name='RAISE6SECVIOLATION', field_type=DoubleType(), required=False), NestedField(field_id=46, name='RAISE5MINIMPORT', field_type=DoubleType(), required=False), NestedField(field_id=47, name='RAISE60SECVIOLATION', field_type=DoubleType(), required=False), NestedField(field_id=48, name='LRCSURPLUS', field_type=DoubleType(), required=False), NestedField(field_id=49, name='LOWER5MINLOCALDISPATCH', field_type=DoubleType(), required=False), NestedField(field_id=50, name='CLEAREDSUPPLY', field_type=DoubleType(), required=False), NestedField(field_id=51, name='RAISEREGLOCALREQ', field_type=DoubleType(), required=False), NestedField(field_id=52, name='LOWER60SECAPCFLAG', field_type=DoubleType(), required=False), NestedField(field_id=53, name='LOWER5MINLOCALPRICE', field_type=DoubleType(), required=False), NestedField(field_id=54, name='LOWER6SECLOCALREQ', field_type=DoubleType(), required=False), NestedField(field_id=55, name='LOWER6SECDISPATCH', field_type=DoubleType(), required=False), NestedField(field_id=56, name='LOWER5MINVIOLATION', field_type=DoubleType(), required=False), NestedField(field_id=57, name='RRP', field_type=DoubleType(), required=False), NestedField(field_id=58, name='RAISE5MINPRICE', field_type=DoubleType(), required=False), NestedField(field_id=59, name='RAISE6SECRRP', field_type=DoubleType(), required=False), NestedField(field_id=60, name='LOWERREGROP', field_type=DoubleType(), required=False), NestedField(field_id=61, name='RAISE60SECACTUALAVAILABILITY', field_type=DoubleType(), required=False), NestedField(field_id=62, name='RAISEREGROP', field_type=DoubleType(), required=False), NestedField(field_id=63, name='RAISE5MINACTUALAVAILABILITY', field_type=DoubleType(), required=False), NestedField(field_id=64, name='RAISEREGREQ', field_type=DoubleType(), required=False), NestedField(field_id=65, name='RAISE60SECLOCALREQ', field_type=DoubleType(), required=False), NestedField(field_id=66, name='INTERVENTION', field_type=DoubleType(), required=False), NestedField(field_id=67, name='RUNNO', field_type=DoubleType(), required=False), NestedField(field_id=68, name='LOWER6SECREQ', field_type=DoubleType(), required=False), NestedField(field_id=69, name='LOWER60SECSUPPLYPRICE', field_type=DoubleType(), required=False), NestedField(field_id=70, name='RAISEREGACTUALAVAILABILITY', field_type=DoubleType(), required=False), NestedField(field_id=71, name='LOWER60SECLOCALDISPATCH', field_type=DoubleType(), required=False), NestedField(field_id=72, name='LOWER5MINREQ', field_type=DoubleType(), required=False), NestedField(field_id=73, name='LOWER6SECSUPPLYPRICE', field_type=DoubleType(), required=False), NestedField(field_id=74, name='RAISE5MINDISPATCH', field_type=DoubleType(), required=False), NestedField(field_id=75, name='RAISE60SECSUPPLYPRICE', field_type=DoubleType(), required=False), NestedField(field_id=76, name='EEP', field_type=DoubleType(), required=False), NestedField(field_id=77, name='LOWER6SECVIOLATION', field_type=DoubleType(), required=False), NestedField(field_id=78, name='LOWER6SECACTUALAVAILABILITY', field_type=DoubleType(), required=False), NestedField(field_id=79, name='DEMANDFORECAST', field_type=DoubleType(), required=False), NestedField(field_id=80, name='LOWER5MINDISPATCH', field_type=DoubleType(), required=False), NestedField(field_id=81, name='RAISE6SECAPCFLAG', field_type=DoubleType(), required=False), NestedField(field_id=82, name='LOWER6SECAPCFLAG', field_type=DoubleType(), required=False), NestedField(field_id=83, name='AGGREGATEDISPATCHERROR', field_type=DoubleType(), required=False), NestedField(field_id=84, name='LOWER6SECLOCALVIOLATION', field_type=DoubleType(), required=False), NestedField(field_id=85, name='RAISE6SECLOCALVIOLATION', field_type=DoubleType(), required=False), NestedField(field_id=86, name='LOWER5MINAPCFLAG', field_type=DoubleType(), required=False), NestedField(field_id=87, name='LOWERREGACTUALAVAILABILITY', field_type=DoubleType(), required=False), NestedField(field_id=88, name='LOWER6SECROP', field_type=DoubleType(), required=False), NestedField(field_id=89, name='RAISE6SECACTUALAVAILABILITY', field_type=DoubleType(), required=False), NestedField(field_id=90, name='LOWERREGLOCALREQ', field_type=DoubleType(), required=False), NestedField(field_id=91, name='LOWER5MINLOCALVIOLATION', field_type=DoubleType(), required=False), NestedField(field_id=92, name='EXCESSGENERATION', field_type=DoubleType(), required=False), NestedField(field_id=93, name='RAISEREGIMPORT', field_type=DoubleType(), required=False), NestedField(field_id=94, name='LOWER5MINROP', field_type=DoubleType(), required=False), NestedField(field_id=95, name='LOWERREGVIOLATION', field_type=DoubleType(), required=False), NestedField(field_id=96, name='LORSURPLUS', field_type=DoubleType(), required=False), NestedField(field_id=97, name='RAISE60SECROP', field_type=DoubleType(), required=False), NestedField(field_id=98, name='RAISE6SECLOCALDISPATCH', field_type=DoubleType(), required=False), NestedField(field_id=99, name='LOWER6SECPRICE', field_type=DoubleType(), required=False), NestedField(field_id=100, name='RAISE5MINROP', field_type=DoubleType(), required=False), NestedField(field_id=101, name='RAISE5MINAPCFLAG', field_type=DoubleType(), required=False), NestedField(field_id=102, name='LOWER60SECIMPORT', field_type=DoubleType(), required=False), NestedField(field_id=103, name='LOWER60SECACTUALAVAILABILITY', field_type=DoubleType(), required=False), NestedField(field_id=104, name='RAISE6SECPRICE', field_type=DoubleType(), required=False), NestedField(field_id=105, name='RAISE5MINSUPPLYPRICE', field_type=DoubleType(), required=False), NestedField(field_id=106, name='LOWER60SECLOCALVIOLATION', field_type=DoubleType(), required=False), NestedField(field_id=107, name='NETINTERCHANGE', field_type=DoubleType(), required=False), NestedField(field_id=108, name='LOWER60SECRRP', field_type=DoubleType(), required=False), NestedField(field_id=109, name='DISPATCHABLELOAD', field_type=DoubleType(), required=False), NestedField(field_id=110, name='LOWER6SECLOCALDISPATCH', field_type=DoubleType(), required=False), NestedField(field_id=111, name='ROP', field_type=DoubleType(), required=False), NestedField(field_id=112, name='RAISE5MINRRP', field_type=DoubleType(), required=False), NestedField(field_id=113, name='LOWER6SECIMPORT', field_type=DoubleType(), required=False), NestedField(field_id=114, name='RAISE60SECPRICE', field_type=DoubleType(), required=False), NestedField(field_id=115, name='AVAILABLEGENERATION', field_type=DoubleType(), required=False), NestedField(field_id=116, name='RAISE60SECLOCALDISPATCH', field_type=DoubleType(), required=False), NestedField(field_id=117, name='LOWER60SECVIOLATION', field_type=DoubleType(), required=False), NestedField(field_id=118, name='RAISE60SECLOCALVIOLATION', field_type=DoubleType(), required=False), NestedField(field_id=119, name='LOWERREGREQ', field_type=DoubleType(), required=False), NestedField(field_id=120, name='RAISE6SECLOCALREQ', field_type=DoubleType(), required=False), NestedField(field_id=121, name='RAISE5MINVIOLATION', field_type=DoubleType(), required=False), NestedField(field_id=122, name='LOWER60SECREQ', field_type=DoubleType(), required=False), NestedField(field_id=123, name='LOWER6SECLOCALPRICE', field_type=DoubleType(), required=False), NestedField(field_id=124, name='LOWER5MINACTUALAVAILABILITY', field_type=DoubleType(), required=False), NestedField(field_id=125, name='LOWER5MINPRICE', field_type=DoubleType(), required=False), NestedField(field_id=126, name='LOWERREGIMPORT', field_type=DoubleType(), required=False), NestedField(field_id=127, name='RAISE6SECROP', field_type=DoubleType(), required=False), NestedField(field_id=128, name='RAISE5MINLOCALREQ', field_type=DoubleType(), required=False), NestedField(field_id=129, name='SETTLEMENTDATE', field_type=TimestampType(), required=False), NestedField(field_id=130, name='file', field_type=StringType(), required=False), NestedField(field_id=131, name='YEAR', field_type=LongType(), required=False))
(NestedField(field_id=1, name='REGIONID', field_type=StringType(), required=False), NestedField(field_id=2, name='filename', field_type=StringType(), required=False), NestedField(field_id=3, name='UNIT', field_type=StringType(), required=False), NestedField(field_id=79, name='DEMANDFORECAST', field_type=DoubleType(), required=False), NestedField(field_id=33, name='RAISE5MINLOCALDISPATCH', field_type=DoubleType(), required=False), NestedField(field_id=80, name='LOWER5MINDISPATCH', field_type=DoubleType(), required=False), NestedField(field_id=36, name='RAISEREGLOCALVIOLATION', field_type=DoubleType(), required=False), NestedField(field_id=82, name='LOWER6SECAPCFLAG', field_type=DoubleType(), required=False), NestedField(field_id=25, name='RAISEREGVIOLATION', field_type=DoubleType(), required=False), NestedField(field_id=28, name='MARKETSUSPENDEDFLAG', field_type=DoubleType(), required=False), NestedField(field_id=119, name='LOWERREGREQ', field_type=DoubleType(), required=False), NestedField(field_id=115, name='AVAILABLEGENERATION', field_type=DoubleType(), required=False), NestedField(field_id=75, name='RAISE60SECSUPPLYPRICE', field_type=DoubleType(), required=False), NestedField(field_id=5, name='RAISEREGAPCFLAG', field_type=DoubleType(), required=False), NestedField(field_id=110, name='LOWER6SECLOCALDISPATCH', field_type=DoubleType(), required=False), NestedField(field_id=23, name='LOWER60SECLOCALREQ', field_type=DoubleType(), required=False), NestedField(field_id=85, name='RAISE6SECLOCALVIOLATION', field_type=DoubleType(), required=False), NestedField(field_id=76, name='EEP', field_type=DoubleType(), required=False), NestedField(field_id=67, name='RUNNO', field_type=DoubleType(), required=False), NestedField(field_id=121, name='RAISE5MINVIOLATION', field_type=DoubleType(), required=False), NestedField(field_id=83, name='AGGREGATEDISPATCHERROR', field_type=DoubleType(), required=False), NestedField(field_id=72, name='LOWER5MINREQ', field_type=DoubleType(), required=False), NestedField(field_id=99, name='LOWER6SECPRICE', field_type=DoubleType(), required=False), NestedField(field_id=56, name='LOWER5MINVIOLATION', field_type=DoubleType(), required=False), NestedField(field_id=9, name='RAISE60SECREQ', field_type=DoubleType(), required=False), NestedField(field_id=87, name='LOWERREGACTUALAVAILABILITY', field_type=DoubleType(), required=False), NestedField(field_id=52, name='LOWER60SECAPCFLAG', field_type=DoubleType(), required=False), NestedField(field_id=111, name='ROP', field_type=DoubleType(), required=False), NestedField(field_id=19, name='RAISE6SECDISPATCH', field_type=DoubleType(), required=False), NestedField(field_id=107, name='NETINTERCHANGE', field_type=DoubleType(), required=False), NestedField(field_id=39, name='LOWERREGLOCALVIOLATION', field_type=DoubleType(), required=False), NestedField(field_id=102, name='LOWER60SECIMPORT', field_type=DoubleType(), required=False), NestedField(field_id=95, name='LOWERREGVIOLATION', field_type=DoubleType(), required=False), NestedField(field_id=53, name='LOWER5MINLOCALPRICE', field_type=DoubleType(), required=False), NestedField(field_id=8, name='RAISE60SECAPCFLAG', field_type=DoubleType(), required=False), NestedField(field_id=63, name='RAISE5MINACTUALAVAILABILITY', field_type=DoubleType(), required=False), NestedField(field_id=112, name='RAISE5MINRRP', field_type=DoubleType(), required=False), NestedField(field_id=120, name='RAISE6SECLOCALREQ', field_type=DoubleType(), required=False), NestedField(field_id=84, name='LOWER6SECLOCALVIOLATION', field_type=DoubleType(), required=False), NestedField(field_id=92, name='EXCESSGENERATION', field_type=DoubleType(), required=False), NestedField(field_id=61, name='RAISE60SECACTUALAVAILABILITY', field_type=DoubleType(), required=False), NestedField(field_id=58, name='RAISE5MINPRICE', field_type=DoubleType(), required=False), NestedField(field_id=54, name='LOWER6SECLOCALREQ', field_type=DoubleType(), required=False), NestedField(field_id=22, name='RAISE5MINREQ', field_type=DoubleType(), required=False), NestedField(field_id=29, name='RAISE6SECLOCALPRICE', field_type=DoubleType(), required=False), NestedField(field_id=4, name='RAISEREGLOCALDISPATCH', field_type=DoubleType(), required=False), NestedField(field_id=128, name='RAISE5MINLOCALREQ', field_type=DoubleType(), required=False), NestedField(field_id=101, name='RAISE5MINAPCFLAG', field_type=DoubleType(), required=False), NestedField(field_id=106, name='LOWER60SECLOCALVIOLATION', field_type=DoubleType(), required=False), NestedField(field_id=13, name='VERSION', field_type=DoubleType(), required=False), NestedField(field_id=96, name='LORSURPLUS', field_type=DoubleType(), required=False), NestedField(field_id=86, name='LOWER5MINAPCFLAG', field_type=DoubleType(), required=False), NestedField(field_id=30, name='RAISE60SECRRP', field_type=DoubleType(), required=False), NestedField(field_id=70, name='RAISEREGACTUALAVAILABILITY', field_type=DoubleType(), required=False), NestedField(field_id=97, name='RAISE60SECROP', field_type=DoubleType(), required=False), NestedField(field_id=114, name='RAISE60SECPRICE', field_type=DoubleType(), required=False), NestedField(field_id=126, name='LOWERREGIMPORT', field_type=DoubleType(), required=False), NestedField(field_id=108, name='LOWER60SECRRP', field_type=DoubleType(), required=False), NestedField(field_id=42, name='LOWERREGLOCALDISPATCH', field_type=DoubleType(), required=False), NestedField(field_id=74, name='RAISE5MINDISPATCH', field_type=DoubleType(), required=False), NestedField(field_id=94, name='LOWER5MINROP', field_type=DoubleType(), required=False), NestedField(field_id=90, name='LOWERREGLOCALREQ', field_type=DoubleType(), required=False), NestedField(field_id=46, name='RAISE5MINIMPORT', field_type=DoubleType(), required=False), NestedField(field_id=15, name='RAISE60SECLOCALPRICE', field_type=DoubleType(), required=False), NestedField(field_id=122, name='LOWER60SECREQ', field_type=DoubleType(), required=False), NestedField(field_id=118, name='RAISE60SECLOCALVIOLATION', field_type=DoubleType(), required=False), NestedField(field_id=88, name='LOWER6SECROP', field_type=DoubleType(), required=False), NestedField(field_id=48, name='LRCSURPLUS', field_type=DoubleType(), required=False), NestedField(field_id=6, name='APCFLAG', field_type=DoubleType(), required=False), NestedField(field_id=38, name='LOWER5MINLOCALREQ', field_type=DoubleType(), required=False), NestedField(field_id=89, name='RAISE6SECACTUALAVAILABILITY', field_type=DoubleType(), required=False), NestedField(field_id=50, name='CLEAREDSUPPLY', field_type=DoubleType(), required=False), NestedField(field_id=11, name='RAISE6SECREQ', field_type=DoubleType(), required=False), NestedField(field_id=21, name='LOWER5MINSUPPLYPRICE', field_type=DoubleType(), required=False), NestedField(field_id=124, name='LOWER5MINACTUALAVAILABILITY', field_type=DoubleType(), required=False), NestedField(field_id=62, name='RAISEREGROP', field_type=DoubleType(), required=False), NestedField(field_id=127, name='RAISE6SECROP', field_type=DoubleType(), required=False), NestedField(field_id=32, name='RAISEREGRRP', field_type=DoubleType(), required=False), NestedField(field_id=24, name='INITIALSUPPLY', field_type=DoubleType(), required=False), NestedField(field_id=123, name='LOWER6SECLOCALPRICE', field_type=DoubleType(), required=False), NestedField(field_id=81, name='RAISE6SECAPCFLAG', field_type=DoubleType(), required=False), NestedField(field_id=34, name='RAISE5MINLOCALVIOLATION', field_type=DoubleType(), required=False), NestedField(field_id=60, name='LOWERREGROP', field_type=DoubleType(), required=False), NestedField(field_id=116, name='RAISE60SECLOCALDISPATCH', field_type=DoubleType(), required=False), NestedField(field_id=45, name='RAISE6SECVIOLATION', field_type=DoubleType(), required=False), NestedField(field_id=16, name='LOWER5MINIMPORT', field_type=DoubleType(), required=False), NestedField(field_id=20, name='RAISE6SECIMPORT', field_type=DoubleType(), required=False), NestedField(field_id=66, name='INTERVENTION', field_type=DoubleType(), required=False), NestedField(field_id=117, name='LOWER60SECVIOLATION', field_type=DoubleType(), required=False), NestedField(field_id=12, name='RAISE60SECIMPORT', field_type=DoubleType(), required=False), NestedField(field_id=73, name='LOWER6SECSUPPLYPRICE', field_type=DoubleType(), required=False), NestedField(field_id=35, name='RAISE6SECSUPPLYPRICE', field_type=DoubleType(), required=False), NestedField(field_id=26, name='DISPATCHABLEGENERATION', field_type=DoubleType(), required=False), NestedField(field_id=27, name='LOWER60SECDISPATCH', field_type=DoubleType(), required=False), NestedField(field_id=55, name='LOWER6SECDISPATCH', field_type=DoubleType(), required=False), NestedField(field_id=57, name='RRP', field_type=DoubleType(), required=False), NestedField(field_id=69, name='LOWER60SECSUPPLYPRICE', field_type=DoubleType(), required=False), NestedField(field_id=68, name='LOWER6SECREQ', field_type=DoubleType(), required=False), NestedField(field_id=105, name='RAISE5MINSUPPLYPRICE', field_type=DoubleType(), required=False), NestedField(field_id=103, name='LOWER60SECACTUALAVAILABILITY', field_type=DoubleType(), required=False), NestedField(field_id=109, name='DISPATCHABLELOAD', field_type=DoubleType(), required=False), NestedField(field_id=104, name='RAISE6SECPRICE', field_type=DoubleType(), required=False), NestedField(field_id=41, name='LOWER60SECLOCALPRICE', field_type=DoubleType(), required=False), NestedField(field_id=78, name='LOWER6SECACTUALAVAILABILITY', field_type=DoubleType(), required=False), NestedField(field_id=91, name='LOWER5MINLOCALVIOLATION', field_type=DoubleType(), required=False), NestedField(field_id=31, name='LOWER60SECPRICE', field_type=DoubleType(), required=False), NestedField(field_id=77, name='LOWER6SECVIOLATION', field_type=DoubleType(), required=False), NestedField(field_id=100, name='RAISE5MINROP', field_type=DoubleType(), required=False), NestedField(field_id=7, name='LOWER60SECROP', field_type=DoubleType(), required=False), NestedField(field_id=10, name='AVAILABLELOAD', field_type=DoubleType(), required=False), NestedField(field_id=14, name='TOTALDEMAND', field_type=DoubleType(), required=False), NestedField(field_id=18, name='LOWERREGRRP', field_type=DoubleType(), required=False), NestedField(field_id=71, name='LOWER60SECLOCALDISPATCH', field_type=DoubleType(), required=False), NestedField(field_id=43, name='RAISE60SECDISPATCH', field_type=DoubleType(), required=False), NestedField(field_id=125, name='LOWER5MINPRICE', field_type=DoubleType(), required=False), NestedField(field_id=44, name='RAISE5MINLOCALPRICE', field_type=DoubleType(), required=False), NestedField(field_id=113, name='LOWER6SECIMPORT', field_type=DoubleType(), required=False), NestedField(field_id=17, name='LOWER5MINRRP', field_type=DoubleType(), required=False), NestedField(field_id=47, name='RAISE60SECVIOLATION', field_type=DoubleType(), required=False), NestedField(field_id=51, name='RAISEREGLOCALREQ', field_type=DoubleType(), required=False), NestedField(field_id=93, name='RAISEREGIMPORT', field_type=DoubleType(), required=False), NestedField(field_id=40, name='LOWER6SECRRP', field_type=DoubleType(), required=False), NestedField(field_id=37, name='LOWERREGAPCFLAG', field_type=DoubleType(), required=False), NestedField(field_id=64, name='RAISEREGREQ', field_type=DoubleType(), required=False), NestedField(field_id=65, name='RAISE60SECLOCALREQ', field_type=DoubleType(), required=False), NestedField(field_id=59, name='RAISE6SECRRP', field_type=DoubleType(), required=False), NestedField(field_id=98, name='RAISE6SECLOCALDISPATCH', field_type=DoubleType(), required=False), NestedField(field_id=49, name='LOWER5MINLOCALDISPATCH', field_type=DoubleType(), required=False), NestedField(field_id=129, name='SETTLEMENTDATE', field_type=TimestampType(), required=False), NestedField(field_id=130, name='file', field_type=StringType(), required=False), NestedField(field_id=131, name='YEAR', field_type=LongType(), required=False))

Notice the NestedField(field_id=4 versus NestedField(field_id=79.

And since fields is defined with Tuple, the order matters!

fields: Tuple[NestedField, ...] = Field(default_factory=tuple)

@kevinjqliu
Copy link
Contributor

In the "Price" cell, df.schema contains columns with different ordering than the Iceberg table schema.

This should be a PyIceberg bug, the ordering of fields in the schema should not matter.

@kevinjqliu
Copy link
Contributor

Oh btw, a workaround before the fix is merged is to reorder the schemas to match manually :)
Can sort by field_id, similar to the fix

@djouallah
Copy link
Author

Thanks there is another case ? what if a user want to append only a subsets of columns, in Delta table, the system will just add nulls to the missing columns, is this something that iceberg support ?

@kevinjqliu
Copy link
Contributor

Looks like we're able to write (append/overwrite) a subset of schema, here's a PR with a test
#704

This feature is a side effect of #523

@Fokko
Copy link
Contributor

Fokko commented Jun 2, 2024

Hey @kevinjqliu Thanks for following up here, however, I don't think that #700 is the appropriate fix. Instead, we should re-order the fields based on the table schema. I think we're quite close to that.

The to_requested_schema will do this for us:

arrow_table = to_requested_schema(requested_schema=file_schema, file_schema=table_schema, table=arrow_table)

I think we should do two things:

  • Remove the conditionality of that if-statement and just take the cost of running the visitor (I didn't like that one in the first place :). This will put the fields in the same order as the table schema when writing (otherwise they will be adjusted when reading).
  • Fix the compatibility checker that we allow for out-of-order. I think checking if the required fields are in there is sufficient.

WDYT?

@kevinjqliu
Copy link
Contributor

@Fokko, coming back to this.
I think your first comment is already addressed in #807 (thanks @syun64).
Your second comment is implemented in #829. Please take a look!

@kevinjqliu
Copy link
Contributor

Fixed in #921.
Specifically for out-of-order schemas https://github.com/apache/iceberg-python/pull/921/files#diff-7f3dd1244d08ce27c003cd091da10aa049f7bb0c7d5397acb4ec69767036accdR982

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants