Add an optimized database #271

viveksahu26 · 2024-07-01T10:45:08Z

Closes issue: #265
This pull request introduces an optimized version of the db package to improve the performance of our database operations. Benchmark tests have also been added to measure the performance and efficiency of the new implementation.

Changes made:

Introduces a new db structure for efficient retrieval:
- records: Map to store records by their check_key.
- ids: Map to store records by their id.
- keyIds: Nested map to store records by both check_key and id.
- allIds: Map to store unique IDs.

Storing records in maps makes retrieval an O(1) operation on average. The original db, by contrast, loops over all records on every retrieval, which is O(n).
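To make the structure above concrete, here is a minimal sketch of what such a db could look like. It is not the code from this PR: the `record` fields, the `string` type for ids, the nesting order of `keyIds`, and the `newDB` and `byKey` helpers are all assumptions made for illustration.

```go
package main

// record is a simplified stand-in; field names and types are assumptions.
type record struct {
	checkKey int    // corresponds to check_key in the description above
	id       string // assumed to be a string
}

// db indexes the same records several ways so each query is a map access.
type db struct {
	records map[int][]*record            // check_key -> records
	ids     map[string][]*record         // id -> records
	keyIds  map[string]map[int][]*record // id -> check_key -> records (assumed order)
	allIds  map[string]struct{}          // set of unique ids
}

// newDB allocates every map up front, because writing to a nil map panics.
func newDB() *db {
	return &db{
		records: make(map[int][]*record),
		ids:     make(map[string][]*record),
		keyIds:  make(map[string]map[int][]*record),
		allIds:  make(map[string]struct{}),
	}
}

// byKey returns every record stored under key with a single map access.
// A missing key yields a nil slice, which callers can range over safely.
func (d *db) byKey(key int) []*record {
	return d.records[key]
}
```

The trade-off is memory and insert cost: each record pointer is stored in several maps so that no query has to scan the full record set.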

Use of a mutex:

A mutex controls access to a shared resource in concurrent programming. It ensures that only one thread, process, or goroutine can access the critical section of code or the shared resource at any given time. For more detail, see the references below:
- https://kamnagarg-10157.medium.com/understanding-mutex-in-go-5f41199085b9
- https://medium.com/bootdotdev/golang-mutexes-what-is-rwmutex-for-5360ab082626

Apart from that, a benchmark test was added to measure the performance of key database operations:

- Insert: Measures the time taken to insert a large number of records.
- GetByKey: Measures the time taken to retrieve records by check_key.
- GetByID: Measures the time taken to retrieve records by id.
- GetByKeyAndID: Measures the time taken to retrieve records by both check_key and id.
- GetAllIDs: Measures the time taken to retrieve all unique IDs.
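For readers unfamiliar with Go benchmarks, the sketch below shows the general shape such tests take. It does not reproduce the benchmarks in this PR: `store`, `populate`, and the record count of 10,000 are hypothetical. In a real project these functions would normally live in a `_test.go` file and be run with `go test -bench=.`.

```go
package main

import "testing"

// store is a hypothetical, minimal index used only to give the
// benchmarks something to measure; it is not the PR's db type.
type store struct {
	byKey map[int][]string // check_key -> record ids
}

// populate builds a store holding n records spread over 10 keys.
func populate(n int) *store {
	s := &store{byKey: make(map[int][]string)}
	for i := 0; i < n; i++ {
		key := i % 10
		s.byKey[key] = append(s.byKey[key], "id")
	}
	return s
}

// sink keeps lookup results alive so the compiler cannot discard them.
var sink []string

// BenchmarkInsert measures the cost of inserting 10,000 records.
func BenchmarkInsert(b *testing.B) {
	for i := 0; i < b.N; i++ {
		populate(10000)
	}
}

// BenchmarkGetByKey measures a single lookup by key.
// ResetTimer excludes the setup cost from the measurement.
func BenchmarkGetByKey(b *testing.B) {
	s := populate(10000)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sink = s.byKey[i%10]
	}
}
```

The framework chooses `b.N` itself and reports nanoseconds per operation, so results from the old and new implementations can be compared directly.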

~~Implementation of goroutines for parallel tasking:~~
- Goroutines are functions or methods that run concurrently with other functions or methods. They are created using the go keyword. Whereas channels are used to send and receive values, allowing goroutines to synchronize their operations. ~~sync.RWMutex will make ensure that concurrent access to the database is handled correctly.~~

riteshnoronha · 2024-07-08T22:38:21Z

@viveksahu26 same here is this ready for review

viveksahu26 · 2024-07-09T00:22:02Z

> @viveksahu26 same here is this ready for review

Yeah, this one too.

riteshnoronha · 2024-07-11T15:02:04Z

@viveksahu26 this appears to be too complex. Let's discuss this.

viveksahu26 · 2024-07-11T16:28:34Z

Yeah sure @riteshnoronha !! So, basically, we were originally fetching records from the database in 3 ways:

For Key (Ex: SBOM_SPEC, SBOM_SPEC_VERSION, SBOM_TIMESTAMP, etc)
For ID (Ex: SPDX Elements, SBOM Format, SBOM Build Information, etc)
For Key and ID

For example, to get all records for a particular Key, we had to loop over all records and check whether each one contained that Key. If it did, we appended it to the final list of records for that Key, and returned that list once the loop ended. We performed the same operation to get records for an ID, and similarly for a Key and ID. So, to sum up, in every case we end up returning all the values (i.e. records) for a particular key.

In the new changes, we have simplified this by using a map data structure, since a map can store all the values for a particular key.
Earlier, when adding a record, we simply appended it to the list of records:

func (d *db) addRecord(r *record) {
	d.records = append(d.records, r)
}
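As an illustration of the loop-based retrieval described above, here is a minimal sketch. It is an assumption about the general shape, not the original code: `sliceDB`, `filter`, the getter names, and the `record` field types are all hypothetical.

```go
package main

// record is a simplified stand-in; field names and types are assumptions.
type record struct {
	checkKey int
	id       string
}

// sliceDB mirrors the original layout: one flat slice of records.
type sliceDB struct {
	records []*record
}

// addRecord appends to the flat list, as in the original db.
func (d *sliceDB) addRecord(r *record) {
	d.records = append(d.records, r)
}

// filter visits every record once, so each query below is O(n).
func (d *sliceDB) filter(match func(*record) bool) []*record {
	var out []*record
	for _, r := range d.records {
		if match(r) {
			out = append(out, r)
		}
	}
	return out
}

// getRecordsByKey scans all records for a matching check_key.
func (d *sliceDB) getRecordsByKey(key int) []*record {
	return d.filter(func(r *record) bool { return r.checkKey == key })
}

// getRecordsByID scans all records for a matching id.
func (d *sliceDB) getRecordsByID(id string) []*record {
	return d.filter(func(r *record) bool { return r.id == id })
}

// getRecordsByKeyAndID scans all records for a matching check_key and id.
func (d *sliceDB) getRecordsByKeyAndID(key int, id string) []*record {
	return d.filter(func(r *record) bool { return r.checkKey == key && r.id == id })
}
```

Inserting is cheap here, but every query pays for a full pass over the slice regardless of how few records match.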

But now, when we add a record, we add it in 3 ways:

Map key as a Key and map value as a Record.
Map key as an ID and map value as a Record.
Map key as a Key & ID and map value as a Record.

// addRecord adds a single record to the database
func (d *db) addRecord(r *record) {
	d.mu.Lock()
	defer d.mu.Unlock()

	// store record using a key
	d.keyRecords[r.check_key] = append(d.keyRecords[r.check_key], r)

	// store record using an id
	d.idRecords[r.id] = append(d.idRecords[r.id], r)

	// create the inner map for this id on first use;
	// writing to a nil map would panic
	if d.keyIdRecords[r.id] == nil {
		d.keyIdRecords[r.id] = make(map[int][]*record)
	}

	// store record using an id and a key
	d.keyIdRecords[r.id][r.check_key] = append(d.keyIdRecords[r.id][r.check_key], r)

	d.allIds[r.id] = struct{}{}
}

We add records this way so that, when fetching them by any type of key, whether Key, ID, or Key and ID, we don't have to loop over the records and check whether each one contains that key.
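To show what loop-free fetching could look like, here is a minimal sketch of the read side. The map names follow the `addRecord` snippet above, but the getter names, the `string` type for ids, and the sorted output of `getAllIDs` are assumptions, not the PR's actual code.

```go
package main

import "sort"

// record is a simplified stand-in; field names and types are assumptions.
type record struct {
	checkKey int
	id       string
}

type db struct {
	keyRecords   map[int][]*record            // check_key -> records
	idRecords    map[string][]*record         // id -> records
	keyIdRecords map[string]map[int][]*record // id -> check_key -> records
	allIds       map[string]struct{}          // set of unique ids
}

// getRecordsByKey is a single map access.
func (d *db) getRecordsByKey(key int) []*record {
	return d.keyRecords[key]
}

// getRecordsByID is a single map access.
func (d *db) getRecordsByID(id string) []*record {
	return d.idRecords[id]
}

// getRecordsByKeyAndID is two map accesses. An unknown id yields a nil
// inner map, and reading from a nil map safely returns the zero value.
func (d *db) getRecordsByKeyAndID(key int, id string) []*record {
	return d.keyIdRecords[id][key]
}

// getAllIDs returns the unique ids. They are sorted here because Go map
// iteration order is not deterministic.
func (d *db) getAllIDs() []string {
	ids := make([]string, 0, len(d.allIds))
	for id := range d.allIds {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}
```

None of the record getters needs a nil check or a loop over records; only `getAllIDs` iterates, and it iterates over the set of ids rather than over every record.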

Now, coming to the mutex that we take on every operation:
Mutex: ensures that only one thread, process, or goroutine can access the critical section of code or shared resource at any given time.

NOTE: it just struck me while writing this that we can remove the mutex, because we are not using goroutines.

Here we are using the read-write mutex type:

mu           sync.RWMutex

Each time a write is made to the db, we want to make sure that no other process, thread, or goroutine is reading or writing at the same time. That's why we lock the operation. Once the write is done, the lock is released, and the next goroutine waiting on the lock performs its write.

d.mu.Lock()
defer d.mu.Unlock()

The read lock below is used when multiple processes, threads, or goroutines are trying to read the database; in that case they can all read at the same time:

d.mu.RLock()
defer d.mu.RUnlock()
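As a rough illustration of the two lock modes above, here is a minimal sketch. It is not code from this PR: `counterStore`, its methods, and `addConcurrently` are hypothetical names chosen for the example.

```go
package main

import "sync"

// counterStore is a hypothetical shared resource guarded by an RWMutex.
type counterStore struct {
	mu     sync.RWMutex
	counts map[string]int
}

func newCounterStore() *counterStore {
	return &counterStore{counts: make(map[string]int)}
}

// add takes the exclusive lock: no other reader or writer runs meanwhile.
func (s *counterStore) add(key string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.counts[key]++
}

// get takes the shared lock: many readers may hold it at once, but it
// blocks while a writer holds the exclusive lock.
func (s *counterStore) get(key string) int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.counts[key]
}

// addConcurrently starts n goroutines that each call add once,
// then waits for all of them to finish.
func addConcurrently(s *counterStore, key string, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.add(key)
		}()
	}
	wg.Wait()
}
```

Without the lock in `add`, concurrent writes to the map would be a data race. With a single goroutine, as in the current code base, the locking adds overhead without any benefit.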

Let me know if you have any questions.

riteshnoronha · 2024-07-16T19:28:39Z

We don't use goroutines currently. Let's keep it even simpler for this release.

Signed-off-by: Vivek Kumar Sahu <[email protected]>

viveksahu26 · 2024-07-17T06:56:18Z

Removed the mutex and everything related to goroutines.

viveksahu26 requested a review from riteshnoronha July 1, 2024 11:59

viveksahu26 force-pushed the issue_265_db_optimized branch 3 times, most recently from 57ec133 to 79f643b Compare July 2, 2024 06:19

viveksahu26 added 3 commits July 17, 2024 10:40

replace original db with optimized db

40d4b36

Signed-off-by: Vivek Kumar Sahu <[email protected]>

add benchmark test

8bb47be

Signed-off-by: Vivek Kumar Sahu <[email protected]>

remove mutex due to absence of goroutines

a1ebab4

Signed-off-by: Vivek Kumar Sahu <[email protected]>

viveksahu26 force-pushed the issue_265_db_optimized branch from b974d56 to a1ebab4 Compare July 17, 2024 05:42

riteshnoronha approved these changes Jul 18, 2024

riteshnoronha merged commit 14e7376 into interlynk-io:main Jul 18, 2024
2 checks passed
