Read A Set Of XML Files Using Google Cloud Dataflow Python SDK
I'm trying to read a collection of XML files from a GCS bucket and process them, where each element in the collection is a string representing the whole file, but I can't find a dece
Solution 1:
ReadFromText reads the files at the given path line by line, so it will not give you one element per file. What you want instead is a list of files; then read one file at a time in a ParDo using GcsFileSystem (https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/gcsfilesystem.py) and write the contents to BigQuery.
You can also refer to this mailing-list thread on a similar topic: https://lists.apache.org/thread.html/85da22a845cef8edd942fcc4906a7b47040a4ae8e10aef4ef00be233@%3Cuser.beam.apache.org%3E
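A minimal sketch of the "list the files, then read each one whole" idea follows. It is not taken from the answer or the linked thread, and it makes several assumptions. It goes through Beam's generic FileSystems wrapper, which dispatches gs:// paths to the GCS filesystem, rather than instantiating GcsFileSystem directly. The names read_whole_file, root_tag, and run are made up for illustration. The files are assumed to be UTF-8 and small enough to hold in memory. The file list is expanded when the pipeline is built, not inside the pipeline. The BigQuery write is left out and replaced by a print.

```python
import xml.etree.ElementTree as ET


def read_whole_file(path, open_fn=None):
    """Return the full contents of one file as a single string.

    open_fn is a hypothetical hook so the helper can be tried without Beam;
    by default it falls back to Beam's FileSystems.open.
    """
    if open_fn is None:
        # Imported lazily so this module loads even when Beam is not installed.
        from apache_beam.io.filesystems import FileSystems
        open_fn = FileSystems.open
    with open_fn(path) as handle:
        # Assumption: the XML files are UTF-8 encoded.
        return handle.read().decode("utf-8")


def root_tag(xml_text):
    """Parse one whole-file string and return the tag of its root element."""
    return ET.fromstring(xml_text).tag


def run(file_pattern, pipeline_args=None):
    """Build and run a pipeline with one element per matched file."""
    import apache_beam as beam
    from apache_beam.io.filesystems import FileSystems
    from apache_beam.options.pipeline_options import PipelineOptions

    # Expand the glob into concrete paths before the pipeline starts.
    match_result = FileSystems.match([file_pattern])[0]
    paths = [metadata.path for metadata in match_result.metadata_list]

    with beam.Pipeline(options=PipelineOptions(pipeline_args)) as pipeline:
        (
            pipeline
            | "FilePaths" >> beam.Create(paths)
            | "ReadWholeFile" >> beam.Map(read_whole_file)
            | "RootTag" >> beam.Map(root_tag)
            # Stand-in for the real sink, e.g. a write to BigQuery.
            | "Print" >> beam.Map(print)
        )
```

Calling run("gs://my-bucket/xml/*.xml") would then process each matched file as one string element; the bucket path here is a placeholder. beam.Map is used instead of a hand-written DoFn only to keep the sketch short, and a DoFn passed to beam.ParDo would work the same way.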