this post was submitted on 06 Jun 2024

I have a JSON object with a huge array of nested objects. Let us assume it consists of records of license plates for vehicles. It would contain necessary fields like licenseID, issuingState, dateOfIssue, driverID etc.

What I am having a problem with is how I should store data that is only used in exceptional cases, like a field indicating whether the license plate belongs to a foreign embassy (isEmbassyOwned), is owned by a government entity (isGovernmentOwned), or is a learner license (isLearner), etc. The same goes for fields with data types other than Boolean, which would be empty or 0 when there is no information for that field. These exceptional scenarios would occur in less than 10% of all object instances.

I am unsure which format would be best for storing this kind of data while balancing minimal storage consumption against human readability. Should I declare the fields for all objects regardless, or only include them when they are not empty? Should I store them in a dedicated array instead, or maybe just introduce some code value to be used by a switch/case statement in the interpreter? Or is there some other implementation I am not aware of?

top 11 comments
[–] eager_eagle@lemmy.world 9 points 1 month ago

IMO if you're even slightly concerned about storage you should be using a DBMS instead of JSON files. They will handle sparse data, compression, and fast access better than a text-based file format.
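As a rough illustration of this suggestion, here is a minimal sketch using Python's built-in sqlite3 module. The table name, the `specialCase` column, and the sample rows are assumptions invented for the example; only licenseID, issuingState, dateOfIssue, and driverID come from the question.

```python
import sqlite3

# In-memory database for illustration; a real setup would use a file or a server DBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE license_plates (
        licenseID    TEXT PRIMARY KEY,
        issuingState TEXT NOT NULL,
        dateOfIssue  TEXT NOT NULL,
        driverID     TEXT NOT NULL,
        specialCase  TEXT            -- hypothetical column: NULL for ordinary plates
    )
    """
)

# Made-up sample rows: one ordinary plate and two exceptional ones.
rows = [
    ("ABC123", "NY", "2024-01-15", "D001", None),
    ("DIP001", "DC", "2023-11-02", "D002", "embassy"),
    ("GOV042", "CA", "2022-07-30", "D003", "government"),
]
conn.executemany("INSERT INTO license_plates VALUES (?, ?, ?, ?, ?)", rows)

# Fetch only the exceptional plates.
special = conn.execute(
    "SELECT licenseID, specialCase FROM license_plates "
    "WHERE specialCase IS NOT NULL ORDER BY licenseID"
).fetchall()
print(special)
```

The rare field is simply NULL for ordinary rows, so the question of "include the key or not" goes away at the storage level.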

[–] dneaves@lemmy.world 3 points 1 month ago* (last edited 1 month ago) (1 children)

If it's something that represents mutually exclusive states, like the license plate examples (Gov't, Embassy, Learner), an enum like 4wd mentioned is a better idea than many boolean keys. This would also be the switch/case question you posed. For a "regular case", I would include that in the enum, but if you create an enum that only contains "special cases", you can always set it to null.

On the case of booleans, I would suggest avoiding them unless it is necessary, and truly a binary (as in, two-option, not binary numbers), self-contained-in-one-key thing (obligatory anti-boolean video). If the use case is to say what a different key's object represents, you don't need it (see: enums. You'll thank yourself later if you add a third option). If the use case for using it is saying another key contains value(s), you don't need it. Many languages can handle the idea of "data is present, or not present" (either with "truthy/falsey" behavior interpreting "data-or-null", or "Maybe/Option" types), so often "data-or-null" can suffice instead of booleans.

I would suggest trying to always include all keys of a present object, even if its value is null or not applicable. It will prevent headaches later when code might try to access that key, but it isn't present. This approach might also help you decide to reduce the quantity of keys, if they could be consolidated (as in taking booleans and converting to a state-like enum, as mentioned above), or removed (if unused and/or deprecated).
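To make the "always include the key, use data-or-null" idea concrete, here is a minimal Python sketch. The key name `specialCase`, the enum `SpecialCase`, and the helper `special_case_of` are hypothetical names chosen for illustration, as are the sample records.

```python
import json
from enum import Enum
from typing import Optional


class SpecialCase(Enum):
    """Hypothetical enum holding only the exceptional plate kinds."""
    EMBASSY = "embassy"
    GOVERNMENT = "government"
    LEARNER = "learner"


def special_case_of(record: dict) -> Optional[SpecialCase]:
    """Return the record's special case, or None for an ordinary plate."""
    raw = record["specialCase"]  # the key is always present, so plain indexing is safe
    return None if raw is None else SpecialCase(raw)


# Made-up sample data: every record carries the key, most with null.
records = json.loads(
    """
    [
      {"licenseID": "ABC123", "specialCase": null},
      {"licenseID": "DIP001", "specialCase": "embassy"}
    ]
    """
)

for record in records:
    print(record["licenseID"], special_case_of(record))
```

One nullable enum-like key replaces the three booleans, and adding a fourth special case later only means adding one enum member.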

[–] jonathanvmv8f@lemm.ee 1 points 1 month ago (1 children)

Though I know very little about enums and have never used them before, I think this is what I needed. I couldn't imagine there would be a type exactly for this purpose, since I might add or deprecate data later on. I will need time to understand how to restructure the current JSON object to accommodate enums, but I think it will be worth it. Thanks for your time!

[–] dneaves@lemmy.world 1 points 1 month ago* (last edited 1 month ago)

When the enum reaches your JSON, it will have to be a string (as JSON does not have a dedicated "enum" type). But it at least ensures that languages parsing your JSON should have a consistent set of strings to read.

Consider this small bit of Elm code (you may not be an Elm dev, and that's okay; it's the concept that you should look to get):

-- A Directions "enum" type with four options:
-- North, East, South, West
type Directions
    = North
    | East
    | South
    | West

-- How to turn each Directions into a String
-- Which can then be encoded in JSON
directionsToString : Directions -> String
directionsToString direction =
    case direction of
        North -> "north"
        East  -> "east"
        South -> "south"
        West  -> "west"

-- "Maybe Directions" since not all strings can be parsed as a Directions.
-- The return will be "Just <something>" or "Nothing"
directionsFromString : String -> Maybe Directions
directionsFromString dirString =
    case dirString of
        "north" -> Just North
        "east"  -> Just East
        "south" -> Just South
        "west"  -> Just West
        _       -> Nothing

The two functions (directionsFromString and directionsToString) are ready to be used as part of JSON handling, to read a String from a key and turn it into a Directions enum member, or to turn a Directions into a String and insert the string as a key's value.

But all that aside, for your restructuring, and keeping with the license plate example, both type and license number could be contained in a small object. For example:

{
    ...
    "licensePlate": {
        "type": "government"    <- an enum in the language parsing this
                                   but a string in JSON
        "plateNumber": "ABC123"
        ...
    }
    ...
}
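
For readers who don't use Elm, the same round trip can be sketched in Python against the licensePlate object above. This is only an illustration under assumptions: the `PlateType` enum, its members other than "government", and the helper `plate_type_from_string` are invented names, and the sample documents are made up.

```python
import json
from enum import Enum
from typing import Optional


class PlateType(Enum):
    """Hypothetical enum; only "government" appears in the example above."""
    REGULAR = "regular"
    GOVERNMENT = "government"
    EMBASSY = "embassy"
    LEARNER = "learner"


def plate_type_from_string(text: str) -> Optional[PlateType]:
    """Counterpart of directionsFromString: None plays the role of Nothing."""
    try:
        return PlateType(text)
    except ValueError:
        return None


# Decoding: JSON string -> enum member
document = json.loads('{"licensePlate": {"type": "government", "plateNumber": "ABC123"}}')
plate_type = plate_type_from_string(document["licensePlate"]["type"])

# Encoding: enum member -> JSON string (the counterpart of directionsToString is .value)
encoded = json.dumps(
    {"licensePlate": {"type": PlateType.LEARNER.value, "plateNumber": "XYZ789"}}
)
print(plate_type, encoded)
```

Unknown strings come back as None instead of raising, so the caller decides how to treat data it doesn't recognise.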
[–] Womble@lemmy.world 2 points 1 month ago

If storage space is important, using uncompressed JSON is a bad choice. If you're compressing the JSON, it doesn't really matter if you have lots of exceptionCase: False fields, as they will compress very well.
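
A quick way to check this on your own data is to compare raw and gzipped sizes with and without the always-present flags. The sketch below uses synthetic records, so the record layout, the 10% rate, and the helper names `make_records` and `sizes` are all assumptions; actual numbers will depend on the real data.

```python
import gzip
import json

FLAGS = ("isEmbassyOwned", "isGovernmentOwned", "isLearner")


def make_records(always_include_flags: bool, count: int = 10_000) -> list:
    """Build synthetic records in which every 10th plate is a special case."""
    records = []
    for i in range(count):
        record = {"licenseID": f"ABC{i:05d}", "driverID": f"D{i:06d}"}
        # Rotate through the three flags for the special records.
        active = FLAGS[i % 3] if i % 10 == 0 else None
        for flag in FLAGS:
            if always_include_flags or flag == active:
                record[flag] = flag == active
        records.append(record)
    return records


def sizes(records: list) -> tuple:
    """Return (raw byte count, gzipped byte count) of the compact JSON encoding."""
    raw = json.dumps(records, separators=(",", ":")).encode("utf-8")
    return len(raw), len(gzip.compress(raw))


sparse_raw, sparse_gz = sizes(make_records(always_include_flags=False))
dense_raw, dense_gz = sizes(make_records(always_include_flags=True))

print(f"sparse: {sparse_raw} bytes raw, {sparse_gz} bytes gzipped")
print(f"dense:  {dense_raw} bytes raw, {dense_gz} bytes gzipped")
```

The interesting comparison is how much the gap between the two variants shrinks once both are compressed.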

[–] tsonfeir@lemmy.world 1 points 1 month ago

… why does it need to be json?

[–] Nomecks@lemmy.ca 1 points 1 month ago

Convert the JSON to S3 keys and store it as a file structure

[–] kamstrup@programming.dev 1 points 1 month ago* (last edited 1 month ago)

Depending on your needs you can also break it into a columnar format with some standard compression on top. This allows you to search individual fields without looking at the rest.

It also compresses exceptionally well, and "rare" fields will be null in most records, so run-length encoding will compress them to near zero.

See, for example, Parquet.
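
To show the idea in miniature, here is a toy Python sketch of turning records into columns and run-length encoding a mostly-null column. It is not how Parquet is implemented; the field name `specialCase`, the helper `run_length_encode`, and the sample records are assumptions for illustration.

```python
from itertools import groupby

# Made-up row-oriented records; most have no special case.
records = [
    {"licenseID": "ABC123", "specialCase": None},
    {"licenseID": "ABC124", "specialCase": None},
    {"licenseID": "ABC125", "specialCase": None},
    {"licenseID": "GOV042", "specialCase": "government"},
    {"licenseID": "ABC126", "specialCase": None},
    {"licenseID": "ABC127", "specialCase": None},
]

# Row-oriented -> column-oriented: one list per field.
columns = {key: [record[key] for record in records] for key in records[0]}


def run_length_encode(values: list) -> list:
    """Collapse consecutive equal values into [value, run_length] pairs."""
    return [[value, len(list(run))] for value, run in groupby(values)]


encoded = run_length_encode(columns["specialCase"])
print(encoded)
```

Long runs of null collapse into a single pair, and a query over one field only has to touch that field's column.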

[–] fourwd@programming.dev 1 points 1 month ago (1 children)

What about using enums? In this case you will have to specify them for all records, but this ensures that the field will always be present.

enum license_owner {
    regular_citizen = 0,
    embassy,
    government,
    ...
}
[–] jonathanvmv8f@lemm.ee 1 points 1 month ago

I've heard about enums before, but I never really paid attention to them since I never needed to use them in any of my projects until now. I think this is exactly what I need. I'll research more on it.

Thank you so much for your help

[–] TehPers@beehaw.org 1 points 1 month ago

If they are mutually exclusive special cases, using an enum like another comment mentioned makes sense, and can limit the special cases to one field. You can use an enum of strings if you want it to be more readable.

As for how the data is represented, only including the special case field when there is one makes sense as well. Keep in mind JSON is also a flexible format - you can even have the array contain mixed types, like strings for simple licenses, and objects for more complex licenses. That can reduce the size of the JSON document quite a bit, if that's an option.
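
Here is a small sketch of what reading such a mixed array could look like in Python. The compact string form, the `specialCase` key, and the `normalize` helper are assumptions made for the example, not a fixed convention.

```python
import json

# Made-up document: plain strings for simple licenses, objects for special ones.
raw = """
[
  "ABC123",
  "ABC124",
  {"licenseID": "DIP001", "specialCase": "embassy"}
]
"""


def normalize(entry) -> dict:
    """Expand the compact string form into the full object form."""
    if isinstance(entry, str):
        return {"licenseID": entry, "specialCase": None}
    # Objects may omit the special-case key, so fall back to None.
    return {"licenseID": entry["licenseID"], "specialCase": entry.get("specialCase")}


plates = [normalize(entry) for entry in json.loads(raw)]
print(plates)
```

Normalizing once at load time keeps the file small while the rest of the code still sees one uniform shape.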