
Read a Set of XML Files Using Google Cloud Dataflow Python SDK

I'm trying to read a collection of XML files from a GCS bucket and process them, where each element in the collection is a string representing the whole file, but I can't find a dece

Solution 1:

ReadFromText reads the files at the given path line by line, so each element is a single line rather than a whole file. What you want instead is a list of the files, and then to read one file at a time in a ParDo using GcsFileSystem https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/gcsfilesystem.py and then write the contents to BigQuery.
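A minimal sketch of that approach, not a definitive implementation. It assumes a reasonably recent Beam Python SDK, where fileio.MatchFiles expands a glob into file metadata and FileSystems.open dispatches gs:// paths to GcsFileSystem. The names read_whole_file and run, the opener parameter, and the "path" and "xml" keys are hypothetical and exist only for illustration; the BigQuery write is left as a comment because the table and schema are not given in the question.

```python
def read_whole_file(path, opener=None):
    """Read one file completely and return it as a single row.

    `opener` is a hypothetical hook so the function can be tried without
    Beam or GCS; by default it uses Beam's FileSystems.open.
    """
    if opener is None:
        # Imported here so the function also works when it is pickled
        # and shipped to remote workers.
        from apache_beam.io.filesystems import FileSystems
        opener = FileSystems.open
    handle = opener(path)
    try:
        # The whole file becomes one string, newlines included.
        text = handle.read().decode("utf-8")
    finally:
        handle.close()
    # "path" and "xml" are illustrative key names, not a required schema.
    return {"path": path, "xml": text}


def run(pattern, argv=None):
    """Build and run a pipeline with one element per matched file."""
    import apache_beam as beam
    from apache_beam.io import fileio

    with beam.Pipeline(argv=argv) as pipeline:
        _ = (
            pipeline
            | "ListFiles" >> fileio.MatchFiles(pattern)
            | "ToPath" >> beam.Map(lambda metadata: metadata.path)
            | "ReadWholeFile" >> beam.Map(read_whole_file)
            # Parse the XML here, then write the rows out, for example
            # with beam.io.WriteToBigQuery and your own table and schema.
        )
```

It could be invoked as run("gs://my-bucket/xml/*.xml"), where the bucket path is a placeholder. Because each file is held in memory as one string, this only suits files that fit comfortably in a worker's memory.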

You can also refer to this mailing list thread on a similar topic: https://lists.apache.org/thread.html/85da22a845cef8edd942fcc4906a7b47040a4ae8e10aef4ef00be233@%3Cuser.beam.apache.org%3E
