1 win_size = args.win_size
2 step = args.step
3 start_index = 0
4 end_index = win_size
5 data = token_ids[start_index:end_index]
6 train_list.append(data)
7 start_index += step
8 end_index += step
9 while end_index+50 < len(token_ids):  # add the remaining data to the training set only if its length is at least 50
10     data = token_ids[start_index:end_index]
11     train_list.append(data)
12     start_index += step
13     end_index += step
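To make the behaviour easier to check, here is a minimal sketch that wraps the same loop in a function and reports how many trailing tokens are never saved. The names `original_windows`, `dropped_tail` and `min_len` are hypothetical and not part of the project; `win_size = 200` and `step = 200` are assumed values.

```python
def original_windows(token_ids, win_size, step, min_len=50):
    """Replicate the loop from the snippet and return the saved windows."""
    train_list = []
    start_index = 0
    end_index = win_size
    # The first window is appended unconditionally.
    train_list.append(token_ids[start_index:end_index])
    start_index += step
    end_index += step
    # Same condition as line 9 of the snippet.
    while end_index + min_len < len(token_ids):
        train_list.append(token_ids[start_index:end_index])
        start_index += step
        end_index += step
    return train_list


def dropped_tail(token_ids, win_size, step):
    """Number of tokens after the end of the last saved window."""
    windows = original_windows(token_ids, win_size, step)
    last_end = (len(windows) - 1) * step + win_size
    return max(len(token_ids) - last_end, 0)


# Assumed settings: win_size = 200, step = 200.
for n in (621, 650, 651):
    print(n, dropped_tail(list(range(n)), 200, 200))
```

With these assumed settings, the sketch drops 221, 250 and 51 trailing tokens for lengths 621, 650 and 651 respectively.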
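Suppose tokens has length 621 (the numbers below correspond to win_size = 200 and step = 200):
After line 8, start_index = 200, end_index = 400, and train_list covers tokens up to index 200.
Entering the loop, after the first pass reaches line 13, start_index = 400, end_index = 600, and train_list covers tokens up to index 400.
The check 600+50 < 621 fails, so the loop exits. train_list covers tokens up to index 400, and tokens 400-621 are discarded.
Suppose tokens has length 651:
After line 8, start_index = 200, end_index = 400, and train_list covers tokens up to index 200.
Entering the loop, after the first pass reaches line 13, start_index = 400, end_index = 600, and train_list covers tokens up to index 400.
After the second pass reaches line 13, start_index = 600, end_index = 800, and train_list covers tokens up to index 600.
The check 800+50 < 651 fails, so the loop exits. train_list covers tokens up to index 600, and tokens 600-651 are discarded.
So your code discards somewhere between roughly 50 and step+50 tokens at the end of tokens. That doesn't look like what your comment says, namely that the remaining data is added to the training set only if its length is at least 50.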
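For comparison, here is one possible reading of what the comment intends, as a rough sketch only and not the maintainer's fix: keep sliding the window to the end of the data, and keep a shorter final window only when it still holds at least 50 tokens. The function name `sliding_windows` and the `min_len` parameter are hypothetical.

```python
def sliding_windows(token_ids, win_size, step, min_len=50):
    """Split token_ids into windows, keeping a short tail only if it has
    at least min_len tokens (assumed interpretation of the comment)."""
    train_list = []
    start_index = 0
    while True:
        data = token_ids[start_index:start_index + win_size]
        # The first window is always kept, as in the original snippet;
        # later windows are kept only if they are long enough.
        if start_index == 0 or len(data) >= min_len:
            train_list.append(data)
        # Stop once the current window has reached the end of the data.
        if start_index + win_size >= len(token_ids):
            break
        start_index += step
    return train_list


# Assumed settings: win_size = 200, step = 200.
for n in (621, 650, 651):
    print(n, [len(w) for w in sliding_windows(list(range(n)), 200, 200)])
```

Under the same assumed settings, length 621 gives three full windows and drops only the last 21 tokens, while lengths 650 and 651 keep a fourth window of 50 and 51 tokens.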