Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor code and simplify file processing #48

Closed
wants to merge 20 commits into from

Conversation

mohamedsalem401
Copy link
Contributor

Pull Request: Main Refactoring and Conversion

Summary of Changes

This pull request introduces significant improvements and refactoring to enhance the MarkdownDB functionality. The key modifications include:

  1. Class Breakdown:

    • The MarkdownDB class has been refactored into two distinct classes, separating concerns for indexing and querying.
  2. Conversion to TypeScript Objects and SQL:

    • The Markdown file processing has been optimized by converting MD files into TypeScript objects before transforming them into SQL data.
  3. Function Refactoring:

    • Few smaller functions have undergone refactoring.

Copy link

changeset-bot bot commented Nov 13, 2023

⚠️ No Changeset found

Latest commit: 7dc9400

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

src/bin/index.ts Outdated Show resolved Hide resolved
Copy link
Member

@olayway olayway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job 👏. I think some parts of the code (sometimes leftovers from the original code and sometimes yours) can be improved.

In general, apart from all the detailed comments, I'd rethink the structure of the FileObject that gets returned after parsing MD in markdownToObject. Atm it looks like this:

export interface FileObject {
  file: File;
  tags: Tag[];
  links: WikiLink[];
}

// OTHER TYPES
export interface File {
  _id: string;
  file_path: string;
  extension: string;
  url_path: string | null;
  filetype: string | null;
  metadata: MetaData | null;
}

export type MetaData = {
  [key: string]: any;
};

export interface Tag {
  name: string;
}

export interface WikiLink {
  linkSrc: string;
  linkType: "normal" | "embed";
}

As I mentioned in one of the comments related to tags somewhere in your PR, I think we shouldn't try to mold this object already at the parsing (md->obj) stage into something resembling our schema (Mddb* classes). This should be a more natural description of file metadata, and only when starting the next stage, i.e. "writing" to the database, specific data from it should be extracted and passed to relevant insert functions. I imagine this could look like this:

export interface FileObject {
  _id: string; // btw, shouldn't _id be auto generated by the DB? In this case it shouldn't even be here, only on the file fetched from the DB
  file_path: string;
  extension: string;
  url_path: string | null;
  filetype: string | null;
  metadata: MetaData | null;
  tags: string[]; // <--- instead of {name:string}[]
  wikilinks: WikiLink[]
}

And so, what I miss in this PR is a bit of planning ;) I.e. How do we want the object resulting from the parsing stage be structured, and why? All the refactoring (apart from the massive tidying up you did, which is definitely an improvement) should be predicated on this new API you envisioned.

Also, not sure if parse.ts and markdownToObject need to be separate. They seem to have overlapping responsibilities. I'd merge these two.

src/bin/index.ts Outdated
Comment on lines 22 to 23
// Ignore top-level await errors
//@ts-ignore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead, let's maybe change "module": "es2020" to "module": "esnext" in tsconfig.lib.json? What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I used that to ignore this error while I was working on the issue but used this approach for compatibility concerns...

src/bin/index.ts Outdated
await client.init();

//@ts-ignore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

src/lib/markdowndb.ts Outdated Show resolved Hide resolved
src/lib/markdownToObject.ts Outdated Show resolved Hide resolved
src/lib/markdowndb.ts Outdated Show resolved Hide resolved
src/lib/markdownToObject.ts Outdated Show resolved Hide resolved
src/lib/indexFolderToObjects.ts Outdated Show resolved Hide resolved
src/lib/indexFolderToObjects.ts Outdated Show resolved Hide resolved
src/lib/indexFolderToObjects.ts Outdated Show resolved Hide resolved
src/lib/markdownToObject.ts Outdated Show resolved Hide resolved
Copy link
Member

@rufuspollock rufuspollock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one comment so far but making the point that we don't want to refactor existing code really at all here ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend avoiding refactoring the DB here - let's just focus on process.ts as discussed in the issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd leave the refactoring done so far in this PR. It makes the code much easier to reason about.

@olayway
Copy link
Member

olayway commented Nov 13, 2023

Also, not sure if parse.ts and markdownToObject need to be separate. They seem to have overlapping responsibilities. I'd merge these two.

@mohamedsalem401 What do you think? I'm not sure about it

@mohamedsalem401
Copy link
Contributor Author

The reason why I think they should be independent is that parseFile.ts parses the links and tags from a string source, irrespective of whether it's in a local file or not, while markdownToObject.ts is responsible for loading files from the local file system.

@olayway
Copy link
Member

olayway commented Nov 14, 2023

The reason why I think they should be independent is that parseFile.ts parses the links and tags from a string source, irrespective of whether it's in a local file or not, while markdownToObject.ts is responsible for loading files from the local file system.

Yes, I agree, let's leave them separate

@olayway olayway self-requested a review November 14, 2023 21:56
Copy link
Member

@olayway olayway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mohamedsalem401 Where are the tests? 😄

Comment on lines 12 to 16
export function markdownToObject(
folderPath: string,
filePath: string,
filePathsToIndex: string[]
): FileObject {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit weird that markdownToObject takes "folderPath" as a first argument 🤔
Also, thinking from the "perspective" of markdownToObject function 😅, why am I receiving filePathsToIndex? What "I" actually need are "permalinks" that can be used to correctly resolve wikilinks.

Suggested change
export function markdownToObject(
folderPath: string,
filePath: string,
filePathsToIndex: string[]
): FileObject {
export function fileToObject({
filePath,
folderPath,
permalinks
}:
{
filePath: string,
folderPath: string,
permalinks: string[]
}): FileObject {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, so I renamed this function to better reflect what id does...

Comment on lines 45 to 81
let source: string, metadata, links;
try {
source = fs.readFileSync(filePath, {
encoding: "utf8",
flag: "r",
});

({ metadata, links } = parseFile(source, {
permalinks: filePathsToIndex,
}));
} catch (e) {
console.error(
`Failed to parse file ${filePath}. Waiting for file changes.`
);

return defaultFileJson;
}

// get tags in the file
const tags = (metadata?.tags || []).map((tag: string) => {
return { name: tag };
});

let fileObject: FileObject = {
file: {
_id: id,
extension: extension,
url_path: urlPath,
file_path: filePath,
filetype: metadata?.type || null,
metadata: metadata,
},
tags: [...tags],
links: links,
};

return fileObject;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree

src/lib/indexFolderToObjects.ts Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the future: this should probably be renamed to sth like resolveLinkToFilePath.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

src/lib/readLocalMarkdownFileToObject.ts Outdated Show resolved Hide resolved
src/lib/readLocalMarkdownFileToObject.ts Outdated Show resolved Hide resolved
Comment on lines 37 to 38
const urlPath = pathToUrlResolver(pathRelativeToFolder);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const urlPath = pathToUrlResolver(pathRelativeToFolder);
fileObject.url_path = pathToUrlResolver(pathRelativeToFolder);

Comment on lines 40 to 63
let source: string, metadata, links;
try {
source = fs.readFileSync(filePath, {
encoding: "utf8",
flag: "r",
});

({ metadata, links } = parseMarkdownContent(source, {
permalinks: filePathsToIndex,
}));

fileObject.url_path = urlPath;
fileObject.metadata = metadata;
fileObject.filetype = metadata?.type || null;
fileObject.links = links;
if (metadata.tags) {
fileObject.tags = metadata.tags;
}

return fileObject;
} catch (e) {
console.error(`Error processing file ${filePath}: ${e}`);
return fileObject;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add this function at the top:

function readFileContent(filePath: string): string {
  return fs.readFileSync(filePath, {
    encoding: "utf8",
    flag: "r",
  });
}

and then:

Suggested change
let source: string, metadata, links;
try {
source = fs.readFileSync(filePath, {
encoding: "utf8",
flag: "r",
});
({ metadata, links } = parseMarkdownContent(source, {
permalinks: filePathsToIndex,
}));
fileObject.url_path = urlPath;
fileObject.metadata = metadata;
fileObject.filetype = metadata?.type || null;
fileObject.links = links;
if (metadata.tags) {
fileObject.tags = metadata.tags;
}
return fileObject;
} catch (e) {
console.error(`Error processing file ${filePath}: ${e}`);
return fileObject;
}
try {
const source = readFileContent(filePath);
const ({ metadata, links } = parseMarkdownContent(source, {
permalinks: filePathsToIndex,
}));
fileObject.metadata = metadata;
fileObject.filetype = metadata?.type || null;
fileObject.links = links;
fileObject.tags = metadata?.tags || null;
} catch (e) {
console.error(`Error processing file ${filePath}: ${e}`);
}
return fileObject;

pathToUrlResolver
);

const file = extractFileSchemeFromObject(fileObject);
Copy link
Member

@olayway olayway Nov 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest "marshalling" (or molding the object into the ready-to-insert-to-db structure) is done by relevant Mddb* classes (in this case MddbFile). So I'd just add the fileObject to the list and let MddbFile do the rest. And if you'd really want to do this here, creating a function for it is not necessary, as you could just destructure what's needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ready to proceed, but I have a question about the structure of the data in the JSON files. Do you think it should mirror the format of the database? If they should align, we might want to keep things as they are. However, if variations are acceptable, I am more than willing to implement the modifications based on your recommended alternative structure.

Comment on lines +1 to +12
import { FileObject } from "../lib/types/FileObject.js";

export function extractFileSchemeFromObject(fileObject: FileObject) {
return {
_id: fileObject._id,
file_path: fileObject.file_path,
extension: fileObject.extension,
url_path: fileObject.url_path,
filetype: fileObject.filetype,
metadata: fileObject.metadata,
};
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned before, I'd move "marshalling" responsibility to relevant Mddb* classes and do it only right before inserting into db.

@mohamedsalem401
Copy link
Contributor Author

@mohamedsalem401 Where are the tests? 😄

I believe this pull request (PR) includes numerous changes at the moment. Therefore, I plan to open a new PR specifically for the latest changes, including tests.

Comment on lines 22 to 27
const fileObject = readLocalMarkdownFileToObject(
folderPath,
filePath,
filePathsToIndex,
pathToUrlResolver
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a bit awkward imo... Both the naming, the arguments it takes and unnecesarily adds a layer of abstraction. I wonder if we could get rid of the readLocalMarkdownFileToObject whatsoever. This could just be:

Suggested change
const fileObject = readLocalMarkdownFileToObject(
folderPath,
filePath,
filePathsToIndex,
pathToUrlResolver
);
const id = generateFileIdFromPath(filePath);
const extension = getFileExtensionFromPath(filePath);
if (MddbFile.supportedExtensions.includes(extension)) {
return { id, extension, filePath };
}
const pathRelativeToFolder = path.relative(folderPath, filePath);
const urlPath = pathToUrlResolver(pathRelativeToFolder);
const data = parseMarkdownFile(filePath)
...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree.
This could benefit from a slight refactor...

@olayway olayway requested a review from rufuspollock November 15, 2023 11:59
@olayway
Copy link
Member

olayway commented Nov 15, 2023

@mohamedsalem401 Where are the tests? 😄

I believe this pull request (PR) includes numerous changes at the moment. Therefore, I plan to open a new PR specifically for the latest changes, including tests.

I think it would be better if we add tests to the same PR...

Also, please don't merge it to main. We don't want to publish a new version of the package until the whole refactoring is ready. (Note we have an auto-publish workflow in place.) Let's create another branch, .e.g v2, off of main and reopen this PR against that branch.

@mohamedsalem401
Copy link
Contributor Author

I think it would be better if we add tests to the same PR...

Also, please don't merge it to main. We don't want to publish a new version of the package until the whole refactoring is ready. (Note we have an auto-publish workflow in place.) Let's create another branch, .e.g v2, off of main and reopen this PR against that branch.

OK, I will add them in this pull request and will switch to the new branch v2

@rufuspollock
Copy link
Member

This was for #47 and we did this (for now) in a simpler way where we don't refactor existing code - see resolution details in #47. We reused some of this and will probably reuse more in future.

@mohamedsalem401 mohamedsalem401 deleted the improving-md-processing branch February 8, 2024 02:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants