Convert Json Using Jq Based On Specific Constraints

I have a JSON file, 'OpenEnded_mscoco_val2014.json'. The file contains 121,512 questions. Here is a sample: 'questions': [ { 'question': 'What is the table made of?', 'i

Solution 1:

1) Given your input (suitably elaborated to make it valid JSON), the following query generates the CSV output as shown:

$ jq -r '.questions[] | [.question, .image_id, .question_id] | @csv'
"What is the table made of?",350623,3506232
"Is the food napping on the table?",350623,3506230
"What has been upcycled to make lights?",350623,3506231
"Is this an Spanish town?",8647,86472

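The key thing to remember here is that @csv requires a flat array (one whose elements are all scalars) as its input. As with any jq filter, though, you can feed it a stream of such arrays, and it emits one CSV row per array.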
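To try this locally, a small stand-in for the input can be built from the four rows shown above. The file name sample.json and the exact layout of the objects are assumptions, since the sample in the question is truncated. The sketch also checks that @csv rejects an array of arrays.

```shell
# Hypothetical stand-in for the truncated sample; field names are taken from the filter above.
cat > sample.json <<'EOF'
{"questions": [
  {"question": "What is the table made of?", "image_id": 350623, "question_id": 3506232},
  {"question": "Is the food napping on the table?", "image_id": 350623, "question_id": 3506230},
  {"question": "What has been upcycled to make lights?", "image_id": 350623, "question_id": 3506231},
  {"question": "Is this an Spanish town?", "image_id": 8647, "question_id": 86472}
]}
EOF

# .questions[] streams the objects one by one; each becomes its own flat array for @csv.
csv=$(jq -r '.questions[] | [.question, .image_id, .question_id] | @csv' sample.json)
printf '%s\n' "$csv"

# An array of arrays is not flat, so jq exits with an error here.
if jq -r '.questions | map([.question, .image_id]) | @csv' sample.json >/dev/null 2>&1; then
  nested=accepted
else
  nested=rejected
fi
echo "nested array: $nested"
```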

2) To filter using the criterion .image_id <= 10000, just interpose the appropriate select/1 filter:

.questions[]
| select(.image_id <= 10000)
| [.question, .image_id, .question_id]
| @csv
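As a quick check of the filter above, the following sketch runs it against a two-row input. The inline JSON is a hypothetical cut-down version of the sample, chosen so that one row passes the test and one does not.

```shell
# Hypothetical two-row input: 8647 passes the test, 350623 does not.
json='{"questions":[
  {"question":"What is the table made of?","image_id":350623,"question_id":3506232},
  {"question":"Is this an Spanish town?","image_id":8647,"question_id":86472}]}'

# select/1 passes an object through only when the condition is true for it.
filtered=$(printf '%s' "$json" | jq -r '
  .questions[]
  | select(.image_id <= 10000)
  | [.question, .image_id, .question_id]
  | @csv')
printf '%s\n' "$filtered"
# prints: "Is this an Spanish town?",8647,86472
```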

3) To sort by image_id, use sort_by(.image_id), which operates on the whole array rather than on a stream:

.questions
| sort_by(.image_id)
| .[]
| [.question, .image_id, .question_id]
| @csv
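The sorting step can be tried end to end as sketched below. The file sample.json is a hypothetical stand-in for the real input, and the extra .question_id key is an added tie-breaker (an assumption, not part of the filter above) that orders the rows within each image_id. Passing several keys to sort_by assumes jq 1.5 or later.

```shell
# Hypothetical stand-in for the input; the rows come from the sample output in this answer.
cat > sample.json <<'EOF'
{"questions": [
  {"question": "What is the table made of?", "image_id": 350623, "question_id": 3506232},
  {"question": "Is the food napping on the table?", "image_id": 350623, "question_id": 3506230},
  {"question": "What has been upcycled to make lights?", "image_id": 350623, "question_id": 3506231},
  {"question": "Is this an Spanish town?", "image_id": 8647, "question_id": 86472}
]}
EOF

# Sort the whole array first, then stream its elements into @csv.
sorted=$(jq -r '
  .questions
  | sort_by(.image_id, .question_id)
  | .[]
  | [.question, .image_id, .question_id]
  | @csv' sample.json)
printf '%s\n' "$sorted"
# prints:
# "Is this an Spanish town?",8647,86472
# "Is the food napping on the table?",350623,3506230
# "What has been upcycled to make lights?",350623,3506231
# "What is the table made of?",350623,3506232
```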

4) To group by .image_id you would pipe the output of the following pipeline into your own pipeline:

.questions | group_by(.image_id)

You will, however, have to decide exactly how you want to combine the grouped objects.
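As one possible way to combine each group (an illustration only, not necessarily what the question needs), the sketch below emits one CSV row per image_id together with the number of questions in that group. The file sample.json is again a hypothetical stand-in for the real input.

```shell
# Hypothetical stand-in for the input; the rows come from the sample output in this answer.
cat > sample.json <<'EOF'
{"questions": [
  {"question": "What is the table made of?", "image_id": 350623, "question_id": 3506232},
  {"question": "Is the food napping on the table?", "image_id": 350623, "question_id": 3506230},
  {"question": "What has been upcycled to make lights?", "image_id": 350623, "question_id": 3506231},
  {"question": "Is this an Spanish town?", "image_id": 8647, "question_id": 86472}
]}
EOF

# Each group is an array of objects sharing one image_id:
# take the id from the first member and use the array length as the count.
counts=$(jq -r '
  .questions
  | group_by(.image_id)[]
  | [.[0].image_id, length]
  | @csv' sample.json)
printf '%s\n' "$counts"
# prints:
# 8647,1
# 350623,3
```

group_by also sorts the groups by the grouping key, which is why 8647 comes first.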

Solution 2:

With the -r option, the following filter

.questions[] | [ .[] ] | @csv

produces

"What is the table made of?",350623,3506232
"Is the food napping on the table?",350623,3506230
"What has been upcycled to make lights?",350623,3506231
"Is this an Spanish town?",8647,86472
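One caveat with [ .[] ]: the columns follow the key order of each input object, so the output above presumes the keys appear as question, image_id, question_id. The sketch below uses a hypothetical object with a different key order to contrast the implicit form with an explicit column list.

```shell
# Hypothetical object whose keys are not in the question, image_id, question_id order.
row='{"image_id": 8647, "question": "Is this an Spanish town?", "question_id": 86472}'

# [ .[] ] collects the values in the object's own key order.
implicit=$(printf '%s' "$row" | jq -r '[ .[] ] | @csv')

# Naming the fields fixes the column order regardless of the input layout.
explicit=$(printf '%s' "$row" | jq -r '[.question, .image_id, .question_id] | @csv')

printf '%s\n' "$implicit"   # prints: 8647,"Is this an Spanish town?",86472
printf '%s\n' "$explicit"   # prints: "Is this an Spanish town?",8647,86472
```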

To filter the data, use select. For example, with the -r option, the following filter

.questions[] | select(.image_id <= 10000) | [ .[] ] | @csv

produces the subset

"Is this an Spanish town?",8647,86472

To group the data, use group_by. The following filter

.questions
  | group_by(.image_id)[]
  | [ .[] | [ .[] ] | @csv ]

produces one array of CSV strings per image_id

[
  "\"Is this an Spanish town?\",8647,86472"
]
[
  "\"What is the table made of?\",350623,3506232",
  "\"Is the food napping on the table?\",350623,3506230",
  "\"What has been upcycled to make lights?\",350623,3506231"
]

This isn't very useful in this form and is probably not exactly what you want, but it demonstrates the basic approach.
