Instructions for creating our own dataset to test with #18
Also, to keep some of the work off you, assume I can dump frames from each synced video I'd have and also create the seg masks folder similar to the datasets you provide. I'm mainly curious what the steps would be for creating the rest of the input data. Thanks!
I asked this previously here. The data preparation is very specific to the CMU Panoptic dataset.
Thanks for the reply. I see you provide a rough outline, but I was hoping for more specific directions. As in, if I were to let COLMAP calculate the camera positions and so on, is there a step-by-step guide or script that would translate that into the needed files? Is there a link to the CMU Panoptic dataset preparation that covers this? And how about creating the .npz file? I appreciate all the feedback, thanks!
I am working on this currently, but I'm unaffiliated with this project, so I'm reverse engineering. Here is the CMU download script: I'm working on this C# script to prepare the data. You can see from the JSON and the npz array what data is required:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.CommandLine;
using System.CommandLine.NamingConventionBinder;
using System.IO.Compression;
using Newtonsoft.Json;
using NumSharp;
static class Program
{
class Args
{
public string InputPath { get; set; }
public string CameraPositions { get; set; }
}
static void Main(string[] args)
{
RootCommand rootCommand = new()
{
new Argument<string>(
"InputPath",
"This is the path to the folder containing the images, and where train_meta.json and init_pt_cld.npz will be written. In the ims folder, each subfolder is a camera"),
new Argument<string>(
"CameraPositions",
"These camera positions are generated in the Colmap")
};
rootCommand.Description = "Initialize the training data for the dynamic gaussian splatting";
// Note that the parameters of the handler method are matched according to the names of the options
rootCommand.Handler = CommandHandler.Create<Args>(Parse);
rootCommand.Invoke(args);
Environment.Exit(0);
}
[Serializable]
public class CameraTransform
{
public int aabb_scale;
public List<Frame> frames;
}
[Serializable]
public class Frame
{
public string file_path;
public float sharpness;
public float[][] transform_matrix;
public float camera_angle_x;
public float camera_angle_y;
public float fl_x;
public float fl_y;
public float k1;
public float k2;
public float k3;
public float k4;
public float p1;
public float p2;
public bool is_fisheye;
public float cx;
public float cy;
public float w;
public float h;
}
[Serializable]
public class train_meta
{
public float w;
public float h;
public List<List<List<float[]>>> k;
public List<List<float[][]>> w2c;
public List<List<string>> fn;
public List<List<int>> cam_id;
}
static void Parse(Args args)
{
CameraTransform cameraTransforms = JsonConvert
.DeserializeObject<CameraTransform>(File.ReadAllText(args.CameraPositions))!;
string imsPath = Path.Combine(args.InputPath, "ims");
int camCount = Directory.EnumerateDirectories(imsPath).Count();
int fileCount = Directory.EnumerateFiles(Directory.EnumerateDirectories(imsPath).ToList()[0]).Count();
train_meta trainMeta = new()
{
w = 640,
h = 360,
fn = new(),
cam_id = new(),
k = new(),
w2c = new()
};
for (int i = 0; i < fileCount; i++)
{
List<string> toInsert = new();
List<int> camToInsert = new();
List<List<float[]>> kToInsert = new();
List<float[][]> wToInsert = new();
for(int j= 0; j < camCount; j++)
{
toInsert.Add($"{j}/{i:D3}.jpg");
camToInsert.Add(j);
Frame cameraFrame = cameraTransforms.frames[j];
List<float[]> kToInsertInner = new()
{
new[]{cameraFrame.fl_x, 0f, cameraFrame.cx},
new[]{0f, cameraFrame.fl_y, cameraFrame.cy},
new[]{0f, 0f, 1f}
};
kToInsert.Add(kToInsertInner);
float[][] w = cameraFrame.transform_matrix;
wToInsert.Add(w);
}
trainMeta.fn.Add(toInsert);
trainMeta.cam_id.Add(camToInsert);
trainMeta.k.Add(kToInsert);
trainMeta.w2c.Add(wToInsert);
}
File.WriteAllText(Path.Combine(args.InputPath, "train_meta.json"), JsonConvert.SerializeObject(trainMeta, Formatting.Indented));
// TODO create point cloud
Dictionary<string, Array> npz = new();
int pointCount = 0; // TODO number of points from Colmap
double[,] data = new double[pointCount, 7];
for (int i = 0; i < pointCount; i++)
{
// point position
data[i, 0] = 0;
data[i, 1] = 0;
data[i, 2] = 0;
// color
data[i, 3] = 0;
data[i, 4] = 0;
data[i, 5] = 0;
//seg
data[i, 6] = 1;
}
npz.Add("data", data); // key must be "data" so it matches how init_pt_cld.npz is read back
np.Save_Npz(npz, Path.Combine(args.InputPath, "init_pt_cld.npz"), CompressionLevel.NoCompression);
}
}
Thanks for the speedy reply. It looks like to fill in the rest of your TODOs you could just use the sparse reconstruction from COLMAP. Or does it need the dense reconstruction point cloud? I'm not sure what seg would be, though. If I had to guess, just an incrementing number, possibly for which point number it is? (This is why I am always in favor of variables being named very explicitly :) ) Also, do you know if the calibration needs to be performed on every set of frames, or does COLMAP just need to be run once on an initial set of frames? I feel like that is important to state for those who may be wondering, like myself. I'll try to include a Python version of your script that the author of the project can use once I have everything clarified and working.
My guess is that the initial point cloud is needed to seed the training. It would defeat the purpose otherwise to do it for every frame. But input from @JonathonLuiten would be helpful here.
Hey everyone. Stoked to see all your interest and excited to help you all figure out how to set this up on your own datasets. However, from now until around Nov 17 I'm going to be super swamped and busy and won't have much time to really dedicate to this. I think the thing that would be the most helpful for you all is if I wrote a script to convert classic static NeRF/Gaussian datasets to my format. This could be used to train Gaussian Splatting with my code and would show how to set this up on your own data. Feel free to keep annoying me every couple of days until I do this, but realistically it won't be this week.
Hey @atonalfreerider, I noticed you aren't passing in any of the COLMAP outputs directly. Instead you seem to have a middle step that builds some sort of JSON file that is then read in. Can you provide whatever you are using to get COLMAP's output into that JSON format? I've attempted to write a parser myself to take in images.txt/cameras.txt directly, but it doesn't quite account for all the variables in your Frame object. Thanks
Looks like you may just be using the resulting transforms.json from instant-ngp. Maybe I'll give that a go =) https://github.com/NVlabs/instant-ngp/blob/master/scripts/colmap2nerf.py
Yes, you will notice that only the camera transform is being used, and the focal length x,y and camera digital center x,y.
OK, using colmap2nerf.py seems to have done the trick. My script expects to be run while your CWD is the root of your dataset. Also (for now) it expects your extracted images to be in the ims/<camera_index>/ folders under that root. I'm not done testing the results, but they seem to be parsed fine by this project's train.py. The one thing I will note, and would love a comment on if anyone has more information: what should the "seg" var in the point cloud be? I have it hardcoded to 1 for now. Here is the Python script to run after colmap2nerf.py:
import argparse
import json
import os
import sys
import numpy as np
from typing import List, Dict, Any
class CameraTransform:
def __init__(self) -> None:
self.aabb_scale: int = 0
self.frames: List[Frame] = []
class Frame:
def __init__(self) -> None:
self.file_path: str = ""
self.sharpness: float = 0.0
self.transform_matrix: List[List[float]] = []
self.camera_angle_x: float = 0.0
self.camera_angle_y: float = 0.0
self.fl_x: float = 0.0
self.fl_y: float = 0.0
self.k1: float = 0.0
self.k2: float = 0.0
self.k3: float = 0.0
self.k4: float = 0.0
self.p1: float = 0.0
self.p2: float = 0.0
self.is_fisheye: bool = False
self.cx: float = 0.0
self.cy: float = 0.0
self.w: float = 0.0
self.h: float = 0.0
class TrainMeta:
def __init__(self) -> None:
self.w: float = 0.0
self.h: float = 0.0
self.k: List[List[List[List[float]]]] = []
self.w2c: List[List[List[float]]] = []
self.fn: List[List[str]] = []
self.cam_id: List[List[int]] = []
def count_files_in_first_directory(path):
# List all files and directories in the given path
items = os.listdir(path)
# Iterate over the items to find the first directory
for item in items:
item_path = os.path.join(path, item)
if os.path.isdir(item_path):
# If a directory is found, list its contents and count the files
return len([f for f in os.listdir(item_path) if os.path.isfile(os.path.join(item_path, f))])
return 0 # Return 0 if no directory is found
def parse(input_path: str, camera_positions: str) -> None:
transforms_directory = camera_positions
if str(camera_positions).endswith("transforms.json"):
transforms_directory = camera_positions[:-len("transforms.json")]
else:
camera_positions = os.path.join(camera_positions, "transforms.json")
with open(camera_positions, 'r') as file:
camera_transforms = json.load(file)
ims_path = os.path.join(input_path, "ims")
cam_count = len([name for name in os.listdir(ims_path) if os.path.isdir(os.path.join(ims_path, name))])
file_count = count_files_in_first_directory(ims_path)
train_meta = TrainMeta()
train_meta.w = 640
train_meta.h = 360
# ... initialization of other fields ...
for i in range(file_count):
to_insert = []
cam_to_insert = []
k_to_insert = []
w_to_insert = []
for j in range(cam_count):
to_insert.append(f"{j}/{str(i).zfill(3)}.png")
cam_to_insert.append(j)
camera_frame = camera_transforms["frames"][j]
k_to_insert_inner = [
[camera_transforms["fl_x"], 0.0, camera_transforms["cx"]],
[0.0, camera_transforms["fl_y"], camera_transforms["cy"]],
[0.0, 0.0, 1.0]
]
k_to_insert.append(k_to_insert_inner)
w = camera_frame["transform_matrix"]
w_to_insert.append(w)
train_meta.fn.append(to_insert)
train_meta.cam_id.append(cam_to_insert)
train_meta.k.append(k_to_insert)
train_meta.w2c.append(w_to_insert)
with open(os.path.join(transforms_directory, "train_meta.json"), 'w') as file:
json.dump(train_meta.__dict__, file, indent=4)
file_path = os.path.join(transforms_directory, "colmap_text", "points3D.txt")
npz: Dict[str, Any] = {}
data = parse_colmap_points3D(file_path)
npz["data"] = data
np.savez_compressed(os.path.join(input_path, "init_pt_cld.npz"), **npz)
def parse_colmap_points3D(file_path: str) -> np.ndarray:
with open(file_path, 'r') as f:
lines = f.readlines()
# Filter out the lines containing 3D point data
points_lines = [line.strip() for line in lines if not line.startswith("#")]
data = np.zeros((len(points_lines), 7))
for i, line in enumerate(points_lines):
parts = line.split()
# point position
data[i, 0] = float(parts[1])
data[i, 1] = float(parts[2])
data[i, 2] = float(parts[3])
# color
data[i, 3] = int(parts[4])
data[i, 4] = int(parts[5])
data[i, 5] = int(parts[6])
# seg - I have no idea what the value should be here! Leaving it as '1' for now
data[i, 6] = 1
return data
def main():
parser = argparse.ArgumentParser(description="Initialize the training data for the dynamic gaussian splatting")
parser.add_argument("InputPath", help="This is the path to the folder containing the images, and where train_meta.json and init_pt_cld.npz will be written. In the ims folder, each subfolder is a camera")
parser.add_argument("CameraPositions", help="These camera positions are generated in the Colmap")
args = parser.parse_args()
parse(args.InputPath, args.CameraPositions)
sys.exit(0)
if __name__ == "__main__":
    main()
I imagine some changes will need to be made to this script, but it's a starting point for now. I'll try to update this thread with how it goes, but my dataset is still missing the "seg" files that I need to generate for my dataset. Right now
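One thing that may be worth double-checking in the parsing above: COLMAP's points3D.txt stores RGB as integers in 0-255, while the init_pt_cld.npz files shipped with the provided datasets appear to store colors as floats in [0, 1] (they seem to be used directly to initialize the Gaussians' colors). If that is the case, a one-line normalization of the array returned by parse_colmap_points3D would help; a minimal, hedged sketch:
# Hedged tweak: normalize COLMAP's 0-255 colors to [0, 1] before saving.
# Assumes `data` is the (N, 7) array [x, y, z, r, g, b, seg] built above.
data[:, 3:6] = data[:, 3:6] / 255.0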
So is this solved? Hard-coding it at 1 is fine! That is what I would have done for a demo :) It only needs to be not 1 for points you are 100% certain are static. If you don't know, then all 1s should be good :)
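To make that concrete, the seventh column can encode whatever prior knowledge you have about static geometry. A small sketch, where the choice of up-axis and the floor threshold are assumptions for illustration only:
# Hedged sketch: mark points we are certain are static with seg = 0 and
# leave everything else at 1 (possibly dynamic). Assumes `data` is the
# (N, 7) array [x, y, z, r, g, b, seg] and that y is up with the floor near y = 0.
import numpy as np

floor_mask = data[:, 1] < 0.05
data[:, 6] = np.where(floor_mask, 0.0, 1.0)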
I am training a 150-frame scene as we speak. I'll close it tomorrow if my results come out well.
Training completed. The resulting output was a 4 KB file which, when visualized, was blank. So there is still something wrong. I'll try investigating further.
Here is what my captureFolder/ims/0 starts with. The only real difference I can notice between my dataset and the sample is that I am using .png for my RGB images and .jpg for my background-removed bitmasks. But I accounted for this by changing the load_dataset function to swap .png for .jpg. My dataset uses 5 cameras and does have a convergent result from COLMAP. I've tried running my process using the original provided test datasets, but COLMAP does NOT converge when I try using the first frame from each camera on the "basketball" dataset. I'm assuming this is probably consistent across all the provided datasets, as they are all shot in the same capture setup. This is what my params.npz result looked like when viewed from a text editor. Debugging this is quite challenging for pretty obvious reasons, so any insight would be appreciated @JonathonLuiten
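For anyone else mixing extensions like that, the loader change is small; a hedged sketch of the idea (the folder layout mirrors the provided datasets, but the exact paths and variable names inside the repo's load_dataset are assumptions):
# Hedged sketch: load RGB frames as .png and segmentation masks as .jpg.
from PIL import Image
import numpy as np

def load_frame(seq, fn):
    # fn looks like "0/000.png" (camera folder / frame number)
    im = np.array(Image.open(f"./data/{seq}/ims/{fn}"))
    seg = np.array(Image.open(f"./data/{seq}/seg/{fn.replace('.png', '.jpg')}"))
    return im, seg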
The params is a dict of numpy arrays. Open it like it is done in the visualizer to see the values inside. If the COLMAP didn't work and the camera poses are wrong, it DEFINITELY will not work... E.g. if you can't get original Gaussian Splatting (static scene) working on the first frame, then this dynamic stuff won't work. I would start with getting static working on the first frame first.
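A quick way to sanity-check that output without the visualizer is to print the keys and shapes and look for empty or NaN arrays. A minimal sketch (the path is only an example; point it at your own params.npz):
# Hedged sketch: inspect a saved params.npz and report key names, shapes,
# and whether any array ended up empty or full of NaNs.
import numpy as np

params = dict(np.load("output/exp1/params.npz"))  # adjust path to your run
for name, arr in params.items():
    print(f"{name:20s} shape={arr.shape} nan={np.isnan(arr).any()} empty={arr.size == 0}")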
No idea, but try setting the number of time steps to 1, and thus fit the static scene in the first timestep with my code. Debug by looking at the "params" dict and see when it becomes empty?
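If it helps, limiting the run to the first timestep is a one-line change. A hedged sketch, assuming (as the scripts above suggest) that the training loop derives its timestep count from the metadata's fn list; check your local train.py for the exact names:
# Hedged sketch: clamp training to the first timestep only (static scene).
# `md` is the loaded train_meta.json dict; the loop structure is an assumption.
num_timesteps = 1  # instead of len(md['fn'])
for t in range(num_timesteps):
    ...  # per-timestep optimization as in train.py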
colmap2nerf doesn't keep the images in order, so the transform matrices weren't mapping to the correct images in the dataset. It also uses a single set of camera intrinsics instead of a set for every frame like @atonalfreerider's script expected. I've modified my Python code to sort the frames by the number at the end of their file_path. Here is the updated Python script:
import argparse
import json
import os
import sys
import numpy as np
import re
from typing import List, Dict, Any
class CameraTransform:
def __init__(self) -> None:
self.aabb_scale: int = 0
self.frames: List[Frame] = []
class Frame:
def __init__(self) -> None:
self.file_path: str = ""
self.sharpness: float = 0.0
self.transform_matrix: List[List[float]] = []
self.camera_angle_x: float = 0.0
self.camera_angle_y: float = 0.0
self.fl_x: float = 0.0
self.fl_y: float = 0.0
self.k1: float = 0.0
self.k2: float = 0.0
self.k3: float = 0.0
self.k4: float = 0.0
self.p1: float = 0.0
self.p2: float = 0.0
self.is_fisheye: bool = False
self.cx: float = 0.0
self.cy: float = 0.0
self.w: float = 0.0
self.h: float = 0.0
class TrainMeta:
def __init__(self) -> None:
self.w: float = 0.0
self.h: float = 0.0
self.k: List[List[List[List[float]]]] = []
self.w2c: List[List[List[float]]] = []
self.fn: List[List[str]] = []
self.cam_id: List[List[int]] = []
def get_number(frame):
    return int(re.search(r'(\d+)\.png$', frame["file_path"]).group(1))
def count_files_in_first_directory(path):
# List all files and directories in the given path
items = os.listdir(path)
# Iterate over the items to find the first directory
for item in items:
item_path = os.path.join(path, item)
if os.path.isdir(item_path):
# If a directory is found, list its contents and count the files
return len([f for f in os.listdir(item_path) if os.path.isfile(os.path.join(item_path, f))])
return 0 # Return 0 if no directory is found
def parse(input_path: str, camera_positions: str) -> None:
transforms_directory = camera_positions
if str(camera_positions).endswith("transforms.json"):
transforms_directory = camera_positions[:-len("transforms.json")]
else:
camera_positions = os.path.join(camera_positions, "transforms.json")
with open(camera_positions, 'r') as file:
camera_transforms = json.load(file)
ims_path = os.path.join(input_path, "ims")
cam_count = len([name for name in os.listdir(ims_path) if os.path.isdir(os.path.join(ims_path, name))])
file_count = count_files_in_first_directory(ims_path)
train_meta = TrainMeta()
train_meta.w = int(camera_transforms['w'])
train_meta.h = int(camera_transforms['h'])
# ... initialization of other fields ...
#Need to sort the frames by file_path ending # in numerical order
sorted_frames = sorted(camera_transforms["frames"], key=get_number)
for i in range(file_count):
to_insert = []
cam_to_insert = []
k_to_insert = []
w_to_insert = []
for j in range(cam_count):
to_insert.append(f"{j}/{str(i).zfill(3)}.png")
cam_to_insert.append(j)
camera_frame = sorted_frames[j]
k_to_insert_inner = [
[camera_transforms["fl_x"], 0.0, camera_transforms["cx"]],
[0.0, camera_transforms["fl_y"], camera_transforms["cy"]],
[0.0, 0.0, 1.0]
]
k_to_insert.append(k_to_insert_inner)
w = camera_frame["transform_matrix"]
w_to_insert.append(w)
train_meta.fn.append(to_insert)
train_meta.cam_id.append(cam_to_insert)
train_meta.k.append(k_to_insert)
train_meta.w2c.append(w_to_insert)
with open(os.path.join(transforms_directory, "train_meta.json"), 'w') as file:
json.dump(train_meta.__dict__, file, indent=4)
file_path = os.path.join(transforms_directory, "colmap_text", "points3D.txt")
npz: Dict[str, Any] = {}
data = parse_colmap_points3D(file_path)
npz["data"] = data
np.savez_compressed(os.path.join(input_path, "init_pt_cld.npz"), **npz)
def parse_colmap_points3D(file_path: str) -> np.ndarray:
with open(file_path, 'r') as f:
lines = f.readlines()
# Filter out the lines containing 3D point data
points_lines = [line.strip() for line in lines if not line.startswith("#")]
data = np.zeros((len(points_lines), 7))
for i, line in enumerate(points_lines):
parts = line.split()
# point position
data[i, 0] = float(parts[1])
data[i, 1] = float(parts[2])
data[i, 2] = float(parts[3])
# color
data[i, 3] = int(parts[4])
data[i, 4] = int(parts[5])
data[i, 5] = int(parts[6])
# seg - I have no idea what the value should be here! Leaving it as '1' for now
data[i, 6] = 1
return data
def main():
parser = argparse.ArgumentParser(description="Initialize the training data for the dynamic gaussian splatting")
parser.add_argument("InputPath", help="This is the path to the folder containing the images, and where train_meta.json and init_pt_cld.npz will be written. In the ims folder, each subfolder is a camera")
parser.add_argument("CameraPositions", help="These camera positions are generated in the Colmap")
args = parser.parse_args()
parse(args.InputPath, args.CameraPositions)
sys.exit(0)
if __name__ == "__main__":
    main()
Additionally, I've created a second dataset that has 32 cameras and a very clean COLMAP sparse reconstruction. So I still believe there is something off in the math here that I'm hoping someone can figure out. @JonathonLuiten, is there any good way to verify my training metadata is valid, or to visualize the camera positions or something of that nature that would let me see if something is off with the calculations? Or is there something specific to the CMU dataset you are using that you had to account for when creating your Camera objects? The PSNR starts negative and works its way to around 11 by the time it gets to 10k steps on my new dataset, which is still quite far behind the 20s the provided datasets get.
Just to be clear, it is still not working for me. I believe there is something small off with the metadata/poses, because when I train it ends up removing all the points after enough iterations, to the point that the output data is blank. So my hunch is that what it is seeing in the images and the positional information it's getting fed don't agree. I imagine it's something small, which is why I'm asking if there is a good way to visualize anything here, from the camera positions onward, to confirm whether it's right or wrong =). Looking at the raw numbers doesn't tell me much. There may also be a difference in how... Any help is welcome!
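One cheap sanity check is to plot the camera centers recovered from train_meta.json and eyeball whether the rig layout matches the real capture. A minimal sketch; it assumes the w2c entries really are world-to-camera matrices. Note that instant-ngp's transforms.json stores camera-to-world matrices, so if those were copied over unmodified they would need to be inverted first:
# Hedged sketch: plot camera centers from train_meta.json at timestep 0.
# For a world-to-camera matrix [R | t], the camera center is C = -R^T t.
import json
import numpy as np
import matplotlib.pyplot as plt

meta = json.load(open("train_meta.json"))
centers = []
for w2c in meta["w2c"][0]:  # all cameras at the first timestep
    m = np.array(w2c)
    # If these came straight from transforms.json they are camera-to-world;
    # in that case invert first: m = np.linalg.inv(m)
    R, t = m[:3, :3], m[:3, 3]
    centers.append(-R.T @ t)
centers = np.array(centers)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(centers[:, 0], centers[:, 1], centers[:, 2])
plt.show()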
Maybe you should adjust the parameters of COLMAP in colmap2nerf.py? I tried to use colmap2nerf.py to generate train_meta.json, and all the K matrices are the same, which I think is wrong. I am using the data from juggle in the project.
I'm aware they all come out the same, but isn't K the camera intrinsics? As long as the cameras are identical hardware/settings, shouldn't they be the same? The colmap2nerf script uses the single-camera setting, hence why there is only one set of camera intrinsics in its result. Edit: I took a closer look at colmap2nerf.py and it looks like the values...
You are right. If the same camera is used, the K matrix is indeed the same. I modified colmap2nerf.py so that COLMAP re-estimates camera intrinsics for each camera, but Dynamic3DGaussians is still not compatible.
Hi everyone, I'll join the conversation as I'm having very similar issues to the ones you mentioned. I reverse-engineered the input data and was able to extract and format data from a) COLMAP and b) ground-truth camera poses and points (Blender). I am only working with frame 0, as I am currently interested in setting up a pipeline for the static version. Regarding the data extracted from COLMAP, I got a poor reconstruction, but this was expected as the object is quite challenging to reconstruct. However, I got white images for data where I know the ground truth, which I did not expect. I investigated a bit and noticed an issue in one of the lines. Thanks
Follow up on my previous comment - I have noticed a few things that might be helpful:
Hope this helps!
@maurock very cool to hear you are getting somewhere with your own datasets. Could you go into a little detail about what you did to reverse-engineer the input data into the correct format?
Hi, are the camera parameters of your own datasets obtained through COLMAP? What does b) ground-truth camera poses and points (Blender) include? The extrinsics matrix and the initial point cloud? So did you only use the intrinsics part of the data obtained by COLMAP?
The code I am using to go from Blender to 3DGS is very much integrated with my pipeline and scattered around currently, but I will clean it and share a few scripts to extract data from Blender whenever I can. In the meantime, what I did was:
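For anyone wanting to try the Blender route in the meantime, here is a minimal sketch of exporting one camera's pose and intrinsics in the world-to-camera convention discussed above. This is not maurock's script; the object name, the axis flip, and the centered principal point are assumptions based on Blender's standard camera conventions:
# Hedged sketch: export one Blender camera as a 4x4 world-to-camera matrix
# (OpenCV/COLMAP-style axes) plus a 3x3 intrinsics matrix K.
# Run inside Blender's Python console; "Camera" is a placeholder object name.
import bpy
import numpy as np

cam = bpy.data.objects["Camera"]
scene = bpy.context.scene

# Blender's matrix_world is camera-to-world with the camera looking down -Z
# and +Y up; flip the Y and Z camera axes to reach the OpenCV convention,
# then invert to get world-to-camera.
c2w = np.array(cam.matrix_world)
w2c = np.linalg.inv(c2w @ np.diag([1.0, -1.0, -1.0, 1.0]))

# Intrinsics from focal length (mm), sensor width (mm) and render resolution,
# assuming a horizontal sensor fit and a centered principal point.
w_px, h_px = scene.render.resolution_x, scene.render.resolution_y
f_px = cam.data.lens / cam.data.sensor_width * w_px
K = np.array([[f_px, 0.0, w_px / 2.0],
              [0.0, f_px, h_px / 2.0],
              [0.0, 0.0, 1.0]])
print(w2c, K)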
@ch1998 Great to hear you are seeing positive results! Did you follow my suggestions here? In that case, it looks like the segmentation loss shouldn't be set to 0 for dynamic reconstruction. You could try setting the segmentation loss to 0 for the static reconstruction (the first 10000 iterations in your case, which work well) and reverting it to the value the authors chose for the following iterations. I think proper segmentation images are needed for good dynamic results. If you don't currently have those images, a workaround could be to set the...
@maurock
@henrypearce4D Sure, I am working on it; I'll share it later today!
I am using the code shared by @timatchley, using COLMAP for registration. You need to use COLMAP's dense reconstruction to get the init point cloud. Then it is adjusted according to maurock's parameters. In fact, currently on my data only the first few frames have good quality. I'm still trying to figure it out. I think the problem is not with the camera parameters obtained by COLMAP; those are accurate. Maybe the hyperparameters need to be adjusted.
@henrypearce4D I have added the scripts and instructions to the
I hope this helps!
I'm trying it out with the full dense reconstruction and the 'seg' loss set to 0.0. The first frame is off to a good start, it appears: PSNR 32.
Similar results to @ch1998: it started off great but went downhill quickly. This was with the dense COLMAP reconstruction as the init point cloud and 0 for the seg loss. Any tips on how to tune the loss parameters, or whatever else needs tuning? We're very close; hope to hear what the solution is soon :)
@maurock Wow, thank you! Hopefully I can try this ASAP!
@JonathonLuiten, any progress here? I still haven't been able to get my own data to train more than a frame before it progressively falls apart more and more on each subsequent frame in time. I would love to be able to make use of this project still.
Is this working for anyone? What challenges did you face? If possible, please give the whole dataset preparation process in detail.
@Tejasdavande07 Hi, as far as I'm aware the script that was released was only for synthetically rendered scenes and not real footage, so I didn't experiment further.
OK, thanks for your reply.
Try disabling all losses except image, rigid, rot, and iso. In my case it works nicely if I only enable these losses.
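In code this amounts to zeroing the other weights wherever the training script combines its loss terms. A hedged sketch; the dictionary name and the exact set of remaining keys may differ in your copy of train.py, so treat them as assumptions:
# Hedged sketch: keep only the image, rigid, rot and iso terms active and
# zero everything else. Key names beyond these four are assumptions.
loss_weights = {
    'im': 1.0,
    'rigid': 4.0,
    'rot': 4.0,
    'iso': 2.0,
    'seg': 0.0,
    'floor': 0.0,
    'bg': 0.0,
    'soft_col_cons': 0.0,
}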
I only used the im, rigid, rot and iso losses. Although I could achieve a PSNR of 30+, the tracking was not very good, as shown in the figure. Do you have a good solution?
Hello, may I ask how many cameras you used to capture your data?
Hi there, I am very impressed with your results and I will say you've done quite a good job keeping your code clean and compact.
I noticed you said you were working on creating some code to aid the creation of custom datasets, but would it be possible in the meantime to provide us with instructions to manually create one using whatever method the test dataset was created with?
I'd love to test this using some of my own test data and see how the results come out.
I'm looking forward to hearing back from you, and again, thanks for all the work!