Refactor code and simplify file processing #48

mohamedsalem401 · 2023-11-13T05:37:22Z

Pull Request: Main Refactoring and Conversion

Summary of Changes

This pull request introduces significant improvements and refactoring to enhance the MarkdownDB functionality. The key modifications include:

Class Breakdown:
- The MarkdownDB class has been refactored into two distinct classes, separating concerns for indexing and querying.
Conversion to TypeScript Objects and SQL:
- The Markdown file processing has been optimized by converting MD files into TypeScript objects before transforming them into SQL data.
Function Refactoring:
- A few smaller functions have been refactored.

changeset-bot · 2023-11-13T05:37:25Z

⚠️ No Changeset found

Latest commit: 7dc9400

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

src/bin/index.ts

Co-authored-by: Ola Rubaj <[email protected]>

olayway

Great job 👏. I think some parts of the code (sometimes leftovers from the original code and sometimes yours) can be improved.

In general, apart from all the detailed comments, I'd rethink the structure of the FileObject that gets returned after parsing MD in markdownToObject. Atm it looks like this:

export interface FileObject {
  file: File;
  tags: Tag[];
  links: WikiLink[];
}

// OTHER TYPES
export interface File {
  _id: string;
  file_path: string;
  extension: string;
  url_path: string | null;
  filetype: string | null;
  metadata: MetaData | null;
}

export type MetaData = {
  [key: string]: any;
};

export interface Tag {
  name: string;
}

export interface WikiLink {
  linkSrc: string;
  linkType: "normal" | "embed";
}

As I mentioned in one of the comments related to tags somewhere in your PR, I think we shouldn't try to mold this object already at the parsing (md->obj) stage into something resembling our schema (Mddb* classes). This should be a more natural description of file metadata, and only when starting the next stage, i.e. "writing" to the database, specific data from it should be extracted and passed to relevant insert functions. I imagine this could look like this:

export interface FileObject {
  _id: string; // btw, shouldn't _id be auto generated by the DB? In this case it shouldn't even be here, only on the file fetched from the DB
  file_path: string;
  extension: string;
  url_path: string | null;
  filetype: string | null;
  metadata: MetaData | null;
  tags: string[]; // <--- instead of {name:string}[]
  wikilinks: WikiLink[]
}
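
One way the write stage could consume this flat shape is sketched below. The row types (`FileRow`, `TagRow`, `LinkRow`), their column names, and the `toRows` helper are hypothetical and not part of MarkdownDB; the sketch only illustrates deriving schema-shaped data right before insertion rather than at parse time.

```typescript
// Hypothetical sketch: parsing yields a flat FileObject, and schema-shaped
// rows are derived only when writing to the database.
type MetaData = { [key: string]: unknown };

interface WikiLink {
  linkSrc: string;
  linkType: "normal" | "embed";
}

interface FileObject {
  _id: string;
  file_path: string;
  extension: string;
  url_path: string | null;
  filetype: string | null;
  metadata: MetaData | null;
  tags: string[];
  wikilinks: WikiLink[];
}

// Assumed row shapes; the real table columns may differ.
interface FileRow {
  _id: string;
  file_path: string;
  extension: string;
  url_path: string | null;
  filetype: string | null;
  metadata: string | null; // serialized JSON
}
interface TagRow {
  name: string;
}
interface LinkRow {
  from: string;
  link_src: string;
  link_type: "normal" | "embed";
}

function toRows(obj: FileObject): { file: FileRow; tags: TagRow[]; links: LinkRow[] } {
  return {
    file: {
      _id: obj._id,
      file_path: obj.file_path,
      extension: obj.extension,
      url_path: obj.url_path,
      filetype: obj.filetype,
      metadata: obj.metadata ? JSON.stringify(obj.metadata) : null,
    },
    // Tags become { name } records only here, not at parse time.
    tags: obj.tags.map((name) => ({ name })),
    // Each link row records which file it came from.
    links: obj.wikilinks.map((link) => ({
      from: obj._id,
      link_src: link.linkSrc,
      link_type: link.linkType,
    })),
  };
}
```

With this split, the parser stays unaware of table layout, and a schema change only touches the function that builds rows.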

And so, what I miss in this PR is a bit of planning ;) I.e. how do we want the object resulting from the parsing stage to be structured, and why? All the refactoring (apart from the massive tidying up you did, which is definitely an improvement) should be predicated on this new API you envisioned.

Also, not sure if parse.ts and markdownToObject need to be separate. They seem to have overlapping responsibilities. I'd merge these two.

olayway · 2023-11-13T12:22:17Z

src/bin/index.ts

+// Ignore top-level await errors
+//@ts-ignore


Instead, let's maybe change "module": "es2020" to "module": "esnext" in tsconfig.lib.json? What do you think?

Actually, I used that to ignore this error while I was working on the issue but used this approach for compatibility concerns...

olayway · 2023-11-13T12:22:24Z

src/bin/index.ts

 await client.init();

+//@ts-ignore


rufuspollock

Only one comment so far but making the point that we don't want to refactor existing code really at all here ...

rufuspollock · 2023-11-13T15:27:55Z

src/lib/DbQueryManager.ts

I'd recommend avoiding refactoring the DB here - let's just focus on process.ts as discussed in the issue.

I'd leave the refactoring done so far in this PR. It makes the code much easier to reason about.

olayway · 2023-11-13T16:56:46Z

Also, not sure if parse.ts and markdownToObject need to be separate. They seem to have overlapping responsibilities. I'd merge these two.

@mohamedsalem401 What do you think? I'm not sure about it

mohamedsalem401 · 2023-11-13T20:35:09Z

The reason why I think they should be independent is that parseFile.ts parses the links and tags from a string source, irrespective of whether it's in a local file or not, while markdownToObject.ts is responsible for loading files from the local file system.

olayway · 2023-11-14T06:23:17Z

The reason why I think they should be independent is that parseFile.ts parses the links and tags from a string source, irrespective of whether it's in a local file or not, while markdownToObject.ts is responsible for loading files from the local file system.

Yes, I agree, let's leave them separate
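
The separation agreed on here can be sketched as a pure string parser plus a thin file-loading wrapper. The names `parseMarkdownString` and `parseLocalMarkdownFile` are hypothetical, and the regexes are naive placeholders for the real parser; only the split of responsibilities is the point.

```typescript
import { readFileSync } from "node:fs";

interface ParsedContent {
  tags: string[];
  links: string[];
}

// Collects the first capture group of every match. The pattern must carry
// the global flag, otherwise exec() would keep returning the same match.
function allMatches(source: string, pattern: RegExp): string[] {
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    found.push(match[1].trim());
  }
  return found;
}

// Pure step: works on any Markdown string, whatever its origin.
function parseMarkdownString(source: string): ParsedContent {
  // [[Target]] or [[Target|alias]]; embeds (![[...]]) match as well.
  const links = allMatches(source, /\[\[([^\]|]+)(?:\|[^\]]*)?\]\]/g);
  // Inline #tags preceded by whitespace or start of input (placeholder rule).
  const tags = allMatches(source, /(?:^|\s)#([A-Za-z][\w-]*)/g);
  return { tags, links };
}

// I/O step: only knows how to load a local file, then delegates.
function parseLocalMarkdownFile(filePath: string): ParsedContent {
  const source = readFileSync(filePath, { encoding: "utf8" });
  return parseMarkdownString(source);
}
```

Because the parser never touches the file system, it can be tested with plain strings and reused for content that does not come from a local file.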

olayway

@mohamedsalem401 Where are the tests? 😄

olayway · 2023-11-13T16:51:01Z

src/lib/markdownToObject.ts

+export function markdownToObject(
+  folderPath: string,
+  filePath: string,
+  filePathsToIndex: string[]
+): FileObject {


It's a bit weird that markdownToObject takes "folderPath" as a first argument 🤔
Also, thinking from the "perspective" of markdownToObject function 😅, why am I receiving filePathsToIndex? What "I" actually need are "permalinks" that can be used to correctly resolve wikilinks.

Suggested change

-export function markdownToObject(
-  folderPath: string,
-  filePath: string,
-  filePathsToIndex: string[]
-): FileObject {
+export function fileToObject({
+  filePath,
+  folderPath,
+  permalinks
+}: {
+  filePath: string,
+  folderPath: string,
+  permalinks: string[]
+}): FileObject {

Yes, so I renamed this function to better reflect what it does...
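
To illustrate why the function needs permalinks rather than the raw list of paths to index, here is a minimal sketch of resolving a wikilink target against a list of permalinks. `resolveWikiLink` and its matching rule (exact match first, then a match on the final path segment) are assumptions made for illustration, not the resolver MarkdownDB actually uses.

```typescript
// Hypothetical resolver: maps a wikilink target to one of the known permalinks,
// or returns null when nothing matches.
function resolveWikiLink(target: string, permalinks: string[]): string | null {
  // Tolerate targets written with a trailing ".md".
  const wanted = target.replace(/\.md$/, "");

  // 1. Exact match on the full permalink.
  const exact = permalinks.find((p) => p === wanted);
  if (exact !== undefined) return exact;

  // 2. Fall back to the last path segment, for short-form links.
  const bySegment = permalinks.find((p) => p.split("/").pop() === wanted);
  return bySegment ?? null;
}
```

Called with `"first-post"` and the permalinks `["blog/first-post", "about"]`, this sketch returns `"blog/first-post"`; an unknown target yields `null`.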

olayway · 2023-11-14T06:21:02Z

src/lib/markdownToObject.ts

+  let source: string, metadata, links;
+  try {
+    source = fs.readFileSync(filePath, {
+      encoding: "utf8",
+      flag: "r",
+    });
+
+    ({ metadata, links } = parseFile(source, {
+      permalinks: filePathsToIndex,
+    }));
+  } catch (e) {
+    console.error(
+      `Failed to parse file ${filePath}. Waiting for file changes.`
+    );
+
+    return defaultFileJson;
+  }
+
+  // get tags in the file
+  const tags = (metadata?.tags || []).map((tag: string) => {
+    return { name: tag };
+  });
+
+  let fileObject: FileObject = {
+    file: {
+      _id: id,
+      extension: extension,
+      url_path: urlPath,
+      file_path: filePath,
+      filetype: metadata?.type || null,
+      metadata: metadata,
+    },
+    tags: [...tags],
+    links: links,
+  };
+
+  return fileObject;


olayway · 2023-11-15T09:05:25Z

src/utils/resolveLinkToUrlPath.ts

For the future: this should probably be renamed to sth like resolveLinkToFilePath.

olayway · 2023-11-15T09:37:58Z

src/lib/readLocalMarkdownFileToObject.ts

+  const urlPath = pathToUrlResolver(pathRelativeToFolder);
+


Suggested change

-  const urlPath = pathToUrlResolver(pathRelativeToFolder);
+  fileObject.url_path = pathToUrlResolver(pathRelativeToFolder);

olayway · 2023-11-15T09:40:27Z

src/lib/readLocalMarkdownFileToObject.ts

+  let source: string, metadata, links;
+  try {
+    source = fs.readFileSync(filePath, {
+      encoding: "utf8",
+      flag: "r",
+    });
+
+    ({ metadata, links } = parseMarkdownContent(source, {
+      permalinks: filePathsToIndex,
+    }));
+
+    fileObject.url_path = urlPath;
+    fileObject.metadata = metadata;
+    fileObject.filetype = metadata?.type || null;
+    fileObject.links = links;
+    if (metadata.tags) {
+      fileObject.tags = metadata.tags;
+    }
+
+    return fileObject;
+  } catch (e) {
+    console.error(`Error processing file ${filePath}: ${e}`);
+    return fileObject;
+  }


Add this function at the top:

function readFileContent(filePath: string): string {
  return fs.readFileSync(filePath, {
    encoding: "utf8",
    flag: "r",
  });
}

and then:

Suggested change

-  let source: string, metadata, links;
-  try {
-    source = fs.readFileSync(filePath, {
-      encoding: "utf8",
-      flag: "r",
-    });
-
-    ({ metadata, links } = parseMarkdownContent(source, {
-      permalinks: filePathsToIndex,
-    }));
-
-    fileObject.url_path = urlPath;
-    fileObject.metadata = metadata;
-    fileObject.filetype = metadata?.type || null;
-    fileObject.links = links;
-    if (metadata.tags) {
-      fileObject.tags = metadata.tags;
-    }
-
-    return fileObject;
-  } catch (e) {
-    console.error(`Error processing file ${filePath}: ${e}`);
-    return fileObject;
-  }
+  try {
+    const source = readFileContent(filePath);
+    const { metadata, links } = parseMarkdownContent(source, {
+      permalinks: filePathsToIndex,
+    });
+    fileObject.metadata = metadata;
+    fileObject.filetype = metadata?.type || null;
+    fileObject.links = links;
+    fileObject.tags = metadata?.tags || null;
+  } catch (e) {
+    console.error(`Error processing file ${filePath}: ${e}`);
+  }
+
+  return fileObject;

olayway · 2023-11-15T09:56:28Z

src/lib/indexFolderToObjects.ts

+      pathToUrlResolver
+    );
+
+    const file = extractFileSchemeFromObject(fileObject);


I'd suggest "marshalling" (or molding the object into the ready-to-insert-to-db structure) is done by relevant Mddb* classes (in this case MddbFile). So I'd just add the fileObject to the list and let MddbFile do the rest. And if you'd really want to do this here, creating a function for it is not necessary, as you could just destructure what's needed.

I'm ready to proceed, but I have a question about the structure of the data in the JSON files. Do you think it should mirror the format of the database? If they should align, we might want to keep things as they are. However, if variations are acceptable, I am more than willing to implement the modifications based on your recommended alternative structure.

olayway · 2023-11-15T11:08:06Z

src/utils/extractFileSchemeFromObject.ts

+import { FileObject } from "../lib/types/FileObject.js";
+
+export function extractFileSchemeFromObject(fileObject: FileObject) {
+  return {
+    _id: fileObject._id,
+    file_path: fileObject.file_path,
+    extension: fileObject.extension,
+    url_path: fileObject.url_path,
+    filetype: fileObject.filetype,
+    metadata: fileObject.metadata,
+  };
+}


As I mentioned before, I'd move "marshalling" responsibility to relevant Mddb* classes and do it only right before inserting into db.
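
The destructuring alternative mentioned in the review could look roughly like this. The `FileObject` shape is assumed from the fields used in this PR, and `insertFile` with its in-memory `table` is a hypothetical stand-in for whatever the relevant Mddb* class does right before inserting.

```typescript
// Assumed flat shape of the parsed file, based on the fields used in this PR.
interface FileObject {
  _id: string;
  file_path: string;
  extension: string;
  url_path: string | null;
  filetype: string | null;
  metadata: { [key: string]: unknown } | null;
  tags: string[];
  links: { linkSrc: string; linkType: "normal" | "embed" }[];
}

// Everything except tags and links is what the files table needs.
type FileRow = Omit<FileObject, "tags" | "links">;

// Hypothetical insert step: marshalling happens here, right before the write,
// so no separate extractFileSchemeFromObject helper is required.
function insertFile(fileObject: FileObject, table: FileRow[]): void {
  // Rest destructuring drops tags and links and keeps the file columns.
  const { tags, links, ...fileRow } = fileObject;
  table.push(fileRow);
}
```

The rest pattern keeps the column list in one place (the type), so adding a field to `FileObject` does not require updating a hand-written copy function.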

mohamedsalem401 · 2023-11-15T11:41:59Z

@mohamedsalem401 Where are the tests? 😄

I believe this pull request (PR) includes numerous changes at the moment. Therefore, I plan to open a new PR specifically for the latest changes, including tests.

Co-authored-by: Ola Rubaj <[email protected]>

olayway · 2023-11-15T11:59:24Z

src/lib/indexFolderToObjects.ts

+    const fileObject = readLocalMarkdownFileToObject(
+      folderPath,
+      filePath,
+      filePathsToIndex,
+      pathToUrlResolver
+    );


This looks a bit awkward imo... Both the naming and the arguments it takes, and it unnecessarily adds a layer of abstraction. I wonder if we could get rid of readLocalMarkdownFileToObject altogether. This could just be:

Suggested change

-    const fileObject = readLocalMarkdownFileToObject(
-      folderPath,
-      filePath,
-      filePathsToIndex,
-      pathToUrlResolver
-    );
+    const id = generateFileIdFromPath(filePath);
+    const extension = getFileExtensionFromPath(filePath);
+    // Unsupported file types are returned as-is, without parsing.
+    if (!MddbFile.supportedExtensions.includes(extension)) {
+      return { id, extension, filePath };
+    }
+    const pathRelativeToFolder = path.relative(folderPath, filePath);
+    const urlPath = pathToUrlResolver(pathRelativeToFolder);
+    const data = parseMarkdownFile(filePath)
+    ...

Yes, I agree.
This could benefit from a slight refactor...

olayway · 2023-11-15T13:39:36Z

@mohamedsalem401 Where are the tests? 😄

I believe this pull request (PR) includes numerous changes at the moment. Therefore, I plan to open a new PR specifically for the latest changes, including tests.

I think it would be better if we add tests to the same PR...

Also, please don't merge it to main. We don't want to publish a new version of the package until the whole refactoring is ready. (Note we have an auto-publish workflow in place.) Let's create another branch, e.g. v2, off of main and reopen this PR against that branch.

mohamedsalem401 · 2023-11-15T17:12:43Z

I think it would be better if we add tests to the same PR...

Also, please don't merge it to main. We don't want to publish a new version of the package until the whole refactoring is ready. (Note we have an auto-publish workflow in place.) Let's create another branch, e.g. v2, off of main and reopen this PR against that branch.

OK, I will add them in this pull request and will switch to the new branch v2

rufuspollock · 2023-11-15T19:04:12Z

This was for #47 and we did this (for now) in a simpler way where we don't refactor existing code - see resolution details in #47. We reused some of this and will probably reuse more in future.

mohamedsalem401 added 7 commits November 13, 2023 07:11

add dev script

c878a75

add node_modules and "dist/**/*" for exclude in tsconfig

2dc1cc8

Change index to typescript

89aac13

Add some utils functions

2d55645

Extract schema types from schema.ts

15e4f2e

Move function to utils

a2d6109

Refactor/Simplifiy file proccessing

532b933

olayway reviewed Nov 13, 2023

Remove comment src/bin/index.ts

8348585

Co-authored-by: Ola Rubaj <[email protected]>

olayway requested changes Nov 13, 2023

rufuspollock requested changes Nov 13, 2023

mohamedsalem401 added 5 commits November 14, 2023 00:09

Fix top-level await

3a9602d

Rename parseFile function

a5c2f42

Rename

f7ef9d5

pathToUrlResolver / utils functions /

6712e30

Add type to import

41e1714

olayway self-requested a review November 14, 2023 21:56

Make tags unique at parsing

e2b9d19

olayway requested changes Nov 15, 2023

olayway reviewed Nov 15, 2023

mohamedsalem401 and others added 3 commits November 15, 2023 13:43

Remove unused import at src/lib/indexFolderToObjects.ts

adbe8d9

Co-authored-by: Ola Rubaj <[email protected]>

Update src/lib/readLocalMarkdownFileToObject.ts imports

1f33626

Co-authored-by: Ola Rubaj <[email protected]>

Update src/lib/readLocalMarkdownFileToObject.ts

1655f74

Co-authored-by: Ola Rubaj <[email protected]>

olayway reviewed Nov 15, 2023

olayway requested a review from rufuspollock November 15, 2023 11:59

mohamedsalem401 added 3 commits November 15, 2023 18:53

Add tests for processFile function

e56be54

add test for the file process function

b285d2d

REMOVE CONSOLE.LOG

7dc9400

rufuspollock closed this Nov 15, 2023

mohamedsalem401 deleted the improving-md-processing branch February 8, 2024 02:07
