Split a large json file into multiple smaller files in python

To split a large JSON file into multiple smaller files in Python, you can follow these general steps:

  1. Read the large JSON file.
  2. Split the data into smaller chunks.
  3. Write each smaller chunk to separate JSON files.

Here's an example implementation:

import json

# Function to split a list into smaller chunks
def chunk_list(lst, chunk_size):
    for i in range(0, len(lst), chunk_size):
        yield lst[i:i + chunk_size]

# Read the large JSON file
large_json_file = 'large_file.json'

with open(large_json_file, 'r') as f:
    data = json.load(f)

# Split data into smaller chunks
chunk_size = 100  # Adjust as needed
data_chunks = chunk_list(data, chunk_size)

# Write each chunk to separate JSON files
for i, chunk in enumerate(data_chunks):
    output_file = f'small_chunk_{i + 1}.json'
    with open(output_file, 'w') as f:
        json.dump(chunk, f, indent=4)

print("Splitting complete.")

In this example, large_file.json is the large JSON file you want to split. The chunk_list() function is a generator that yields successive slices of a list, each of a specified size.

Adjust the chunk_size variable according to how you want to split the data. Note that json.load() still reads the entire file into memory, so a smaller chunk_size only reduces the size of each output file, not peak memory use. Each chunk is written to a separate JSON file: small_chunk_1.json, small_chunk_2.json, and so on.
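If the input file is genuinely too large to hold in memory, a streaming approach avoids json.load() entirely. Here is a minimal stdlib-only sketch, assuming the data is stored as JSON Lines (one JSON object per line); the file name input.jsonl and the function name split_jsonl are illustrative, not part of any standard API:

```python
import json
from itertools import islice

def split_jsonl(input_path, chunk_size, prefix='part'):
    """Stream a JSON Lines file into numbered chunk files without
    loading the whole file into memory."""
    with open(input_path, 'r') as src:
        index = 0
        while True:
            # islice pulls at most chunk_size lines from the file iterator
            batch = list(islice(src, chunk_size))
            if not batch:
                break
            index += 1
            with open(f'{prefix}_{index}.jsonl', 'w') as out:
                out.writelines(batch)
    return index

# Example: create a small input.jsonl and split it two lines per file
with open('input.jsonl', 'w') as f:
    for n in range(5):
        f.write(json.dumps({'n': n}) + '\n')

num_files = split_jsonl('input.jsonl', chunk_size=2)
print('Wrote', num_files, 'files')  # 5 lines in chunks of 2 -> 3 files
```

Because the loop only ever holds chunk_size lines at a time, memory use stays flat regardless of the input file's size.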

Keep in mind that this example assumes that your JSON data is a list of dictionaries. If your JSON data has a different structure, you might need to modify the code accordingly.
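For instance, if the top level is a dictionary keyed by record ID rather than a list, you can chunk its items instead. A sketch with made-up sample data:

```python
import json
from itertools import islice

# Hypothetical input: a dict keyed by record ID rather than a list
data = {'a': {'value': 1}, 'b': {'value': 2}, 'c': {'value': 3}}

def chunk_dict(d, chunk_size):
    """Yield successive sub-dictionaries of at most chunk_size items."""
    it = iter(d.items())
    while True:
        batch = dict(islice(it, chunk_size))
        if not batch:
            return
        yield batch

for i, chunk in enumerate(chunk_dict(data, 2), start=1):
    with open(f'dict_chunk_{i}.json', 'w') as f:
        json.dump(chunk, f, indent=4)
```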

Also, make sure to handle any exceptions that might occur during file reading and writing to ensure your code is robust.
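For example, the read step could be wrapped like this (a sketch; the function name load_json_safely is illustrative, and how you handle each failure should depend on your application):

```python
import json

def load_json_safely(path):
    """Return parsed JSON from path, or None if the file is missing
    or contains invalid JSON (printing a message instead of raising)."""
    try:
        with open(path, 'r') as f:
            return json.load(f)
    except FileNotFoundError:
        print(f'Input file not found: {path}')
    except json.JSONDecodeError as e:
        print(f'Invalid JSON in {path}: {e}')
    return None

data = load_json_safely('missing_file.json')  # prints a message, returns None
```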

Examples

  1. How to split a large JSON file into smaller files in Python?

    • Description: This query demonstrates how to split a large JSON file into smaller chunks.

    • Code:

      # Create a large sample JSON file
      with open('large.json', 'w') as fh:
          fh.write('{"data": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}')
      
      import json
      
      # Load the large JSON file
      with open('large.json', 'r') as f:
          data = json.load(f)
      
      # Define the size of each chunk
      chunk_size = 3
      data_list = data["data"]
      
      # Split into smaller chunks
      chunks = [data_list[i:i + chunk_size] for i in range(0, len(data_list), chunk_size)]
      
      # Write each chunk to a separate file
      for i, chunk in enumerate(chunks):
          with open(f'chunk_{i}.json', 'w') as f:
              json.dump({"data": chunk}, f)
      
      print("Split into chunks:", len(chunks))
      
  2. How to split a large JSON file based on a key in Python?

    • Description: This query demonstrates splitting a large JSON file based on a specific key.

    • Code:

      import json
      
      # Create a JSON file with multiple records
      with open('large.json', 'w') as fh:
          fh.write('{"records": [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}, {"id": 5}]}')
      
      # Load the large JSON file
      with open('large.json', 'r') as f:
          data = json.load(f)
      
      # Split based on the "id" key, creating a separate file for each record
      for record in data["records"]:
          with open(f'record_{record["id"]}.json', 'w') as f:
              json.dump(record, f)
      
      print("Split into individual records")
      
  3. How to split a large JSON file into smaller files by line in Python?

    • Description: This query shows how to split a file containing one JSON object per line (JSON Lines) based on line count. Note that each output file is then itself JSON Lines, not a single JSON document.

    • Code:

      # Create a sample file with one JSON object per line (JSON Lines)
      with open('large.json', 'w') as fh:
          fh.write('{"line1": "data1"}\n')
          fh.write('{"line2": "data2"}\n')
          fh.write('{"line3": "data3"}\n')
      
      # Define the chunk size by number of lines
      chunk_size = 2
      
      # Read the large JSON file
      with open('large.json', 'r') as f:
          lines = f.readlines()
      
      # Split into smaller chunks based on line count
      chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
      
      # Write each chunk to a separate file
      for i, chunk in enumerate(chunks):
          with open(f'chunk_{i}.json', 'w') as f:
              f.writelines(chunk)
      
      print("Split into chunks by lines")
      
  4. How to split a large JSON array into smaller files in Python?

    • Description: This query demonstrates splitting a large JSON array into smaller files.

    • Code:

      import json
      
      # Create a large JSON array
      with open('large.json', 'w') as fh:
          fh.write('{"array": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}')
      
      # Split a large JSON array into smaller files
      with open('large.json', 'r') as f:
          data = json.load(f)
      
      array = data["array"]
      chunk_size = 3
      
      # Split into smaller chunks
      chunks = [array[i:i + chunk_size] for i in range(0, len(array), chunk_size)]
      
      for i, chunk in enumerate(chunks):
          with open(f'array_chunk_{i}.json', 'w') as f:
              json.dump({"array": chunk}, f)
      
      print("Split JSON array into smaller files")
      
  5. How to split a large JSON file into smaller files based on key-value pairs in Python?

    • Description: This query demonstrates splitting a large JSON file into smaller files based on unique key-value pairs.

    • Code:

      import collections
      import json
      
      # Create a JSON file with multiple key-value pairs
      with open('large.json', 'w') as fh:
          fh.write('{"items": [{"type": "A", "value": 1}, {"type": "B", "value": 2}, {"type": "A", "value": 3}]}')
      
      # Load the JSON file
      with open('large.json', 'r') as f:
          data = json.load(f)
      
      # Group by key-value pairs
      groups = collections.defaultdict(list)
      for item in data["items"]:
          key = item["type"]
          groups[key].append(item)
      
      # Write each group to a separate file
      for key, items in groups.items():
          with open(f'group_{key}.json', 'w') as f:
              json.dump({"items": items}, f)
      
      print("Split JSON based on key-value pairs")
      
  6. How to split a large JSON file into smaller files based on a specific condition in Python?

    • Description: This query demonstrates splitting a JSON file into smaller files based on a specific condition.

    • Code:

      import json
      
      # Create a JSON file with various values
      with open('large.json', 'w') as fh:
          fh.write('{"data": [{"id": 1, "value": 10}, {"id": 2, "value": 20}, {"id": 3, "value": 30}]}')
      
      # Load the JSON file
      with open('large.json', 'r') as f:
          data = json.load(f)
      
      # Define a condition for splitting
      threshold = 20
      above_threshold = [d for d in data["data"] if d["value"] > threshold]
      below_threshold = [d for d in data["data"] if d["value"] <= threshold]
      
      # Write each subset to a separate file
      with open('above_threshold.json', 'w') as f:
          json.dump({"data": above_threshold}, f)
      
      with open('below_threshold.json', 'w') as f:
          json.dump({"data": below_threshold}, f)
      
      print("Split JSON based on a condition")
      
  7. How to split a large JSON file into smaller files with incremental naming in Python?

    • Description: This query demonstrates splitting a large JSON file into smaller files with incremental naming.

    • Code:

      import json
      
      # Create a JSON file with a large list
      with open('large.json', 'w') as fh:
          fh.write('{"list": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}')
      
      # Load the JSON file
      with open('large.json', 'r') as f:
          data = json.load(f)
      
      # Split into smaller files with incremental naming
      list_data = data["list"]
      chunk_size = 3
      chunks = [list_data[i:i + chunk_size] for i in range(0, len(list_data), chunk_size)]
      
      for idx, chunk in enumerate(chunks):
          with open(f'split_{idx}.json', 'w') as f:
              json.dump({"list": chunk}, f)
      
      print("Split JSON into smaller files with incremental naming")
      
  8. How to handle and split large JSON files with nested structures in Python?

    • Description: This query explains how to split large JSON files with nested structures into smaller files.

    • Code:

      import json
      
      # Create a JSON file with nested structures
      with open('large.json', 'w') as fh:
          fh.write('{"data": [{"group": {"id": 1, "name": "Group A"}}, {"group": {"id": 2, "name": "Group B"}}]}')
      
      # Load the JSON file with nested structures
      with open('large.json', 'r') as f:
          data = json.load(f)
      
      # Extract and split based on nested structures
      groups = data["data"]
      chunk_size = 1
      chunks = [groups[i:i + chunk_size] for i in range(0, len(groups), chunk_size)]
      
      for idx, chunk in enumerate(chunks):
          with open(f'group_split_{idx}.json', 'w') as f:
              json.dump({"data": chunk}, f)
      
      print("Split JSON with nested structures")
      
  9. How to split large JSON files by keys and save to multiple files in Python?

    • Description: This query demonstrates splitting JSON files into smaller files based on specific keys.

    • Code:

      import json
      
      # Create a JSON file with multiple keys
      with open('large.json', 'w') as fh:
          fh.write('{"group1": [1, 2, 3], "group2": [4, 5, 6], "group3": [7, 8, 9]}')
      
      # Load the JSON file
      with open('large.json', 'r') as f:
          data = json.load(f)
      
      # Split into smaller files based on keys
      for key, value in data.items():
          with open(f'{key}.json', 'w') as f:
              json.dump({key: value}, f)
      
      print("Split JSON by keys")
      
  10. How to split large JSON files by time-based data and save to multiple files in Python?

    • Description: This query demonstrates splitting large JSON files into smaller files based on time-based data.

    • Code:

      import json
      
      # Create a JSON file with time-based data
      with open('large.json', 'w') as fh:
          fh.write('{"events": [{"timestamp": "2023-01-01", "event": "start"}, {"timestamp": "2023-01-02", "event": "end"}]}')
      
      # Load the JSON file
      with open('large.json', 'r') as f:
          data = json.load(f)
      
      # Split into smaller files based on time-based data
      chunks = {}
      for event in data["events"]:
          date = event["timestamp"]
          if date not in chunks:
              chunks[date] = []
          chunks[date].append(event)
      
      for date, events in chunks.items():
          with open(f'events_{date}.json', 'w') as f:
              json.dump({"events": events}, f)
      
      print("Split JSON by time-based data")
      
