Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow changing built in number formats #1199

Closed
jje42 opened this issue Apr 7, 2022 · 16 comments
Closed

Allow changing built in number formats #1199

jje42 opened this issue Apr 7, 2022 · 16 comments

Comments

@jje42
Copy link

jje42 commented Apr 7, 2022

Using excelize v2.5.0

excelize is a fantastic library, thank for putting the effort in and making it available.

For all of the excel workbooks I work with, it seems that excel uses the in built number format 14 for dates by default. Excel displays these in localised short date format. In excelize, this has been hard coded as mm-dd-yy. While this is correct for en_US, it is not for other locales.

I know that I can extract the raw value from cells and convert them myself, but defaulting to mm-dd-yy, which is never correct for me, makes functions like GetRows() practically useless for me. (Incidentally, GetCellType() always returns CellTypeUnset for date formatted cells, which I know is technically correct as excel doesn't seem to set the t attribute for these cells, but is not helpful in these cases).

Would you consider adding a function such as:

func (f *File) SetInBuiltNumFmt(num int, value string) {
        builtInNumFmt[num] = value
}

so that, I and others could simple add f.SetInBuiltNumFmt(14, "dd/mm/yyyy") or whatever is needed to get correct localisation? It would obviously be nice is this was automated some how, but this is the simplest solution I could come up with to resolve my issue.

Thanks for considering.

@xuri
Copy link
Member

xuri commented Apr 15, 2022

Thanks for your advice. Since current, the library does not support parsing and formatting with all user-defined custom number format expressions, so I think this feature should be added after the number format parser implementation is completed.

@fudali113
Copy link
Contributor

Thanks for your advice. Since current, the library does not support parsing and formatting with all user-defined custom number format expressions, so I think this feature should be added after the number format parser implementation is completed.

Is it time to add this feature?
I can impl this feature

@xuri
Copy link
Member

xuri commented Apr 19, 2023

Contributions are welcome! Because the number format evaluator still working in progress, which doesn't support converting all number format expressions, you can't get formatted results even changing the build-in number format table. I don't suggest adding a new function to change the built-in number format table. I have made a design draft for this.

  • Currently, I have created a number format parser named NFP for parsing any number format expression, it can be used for parsing build-in number format and custom number format, and convert it to the abstract syntax token.
  • Next, we need to read these tokens to implement each part of the number format expression: positive part, negative part, zero part, text part, color part, etc, this evaluator converts it to the final formatted text. This has been working in progress in the source code numfmt.go,
  • After this evaluator has been implemented, add localization and time/date format (long time, sort time, etc,.) options in the Options data type, user can specify these settings when opening or reading the workbook
  • Since the build-in number format was related to the operating system localization, and time format settings in the spreadsheet applications, we need to map the build-in number format index and number format expression with options in the library. After that, the user can read the formatted cell value by specified localization and date time formats.

@fudali113
Copy link
Contributor

i think the NLP lib and map the build-in number format index and number format is two things;

  1. the NLP lib support more format features;
  2. numFmtId map support user can customize mapping relationships based on requirements (Because different software customization may be different )

So I was wondering if we could support mapping first?

such as:

add a customize function to openFIle Options;

// NumFmtCodeGetter user customize numFmtId To numFmtCode
// @param fileInfo file info to this middleware
// @param numFmtId current cell numFmtId
// @return numFmtCode map res
// @return match Match or not,if not match,Use the original library logic for processing
type NumFmtCodeGetter func(fileInfo *File, numFmtId int) (numFmtCode string, match bool)

type Options struct {
	MaxCalcIterations uint
	Password          string
	RawCellValue      bool
	UnzipSizeLimit    int64
	UnzipXMLSizeLimit int64
        // user customize numFmtId To numFmtCode function
        NumFmtCodeGetter NumFmtCodeGetter
}

Is it feasible? @xuri

@fudali113
Copy link
Contributor

and in

excelize/cell.go

Line 1327 in fb6ce60

func (f *File) formattedValue(c *xlsxC, raw bool, cellType CellType) (string, error) {

The changes are roughly as follows:

func (f *File) formattedValue(s int, v string, raw bool) (string, error) {
	if raw {
		return v, nil
	}
	if s == 0 {
		return v, nil
	}
	styleSheet, err := f.stylesReader()
	if err != nil {
		return v, err
	}
	if styleSheet.CellXfs == nil {
		return v, err
	}
	if s >= len(styleSheet.CellXfs.Xf) || s < 0 {
		return v, err
	}
	var numFmtID int
	if styleSheet.CellXfs.Xf[s].NumFmtID != nil {
		numFmtID = *styleSheet.CellXfs.Xf[s].NumFmtID
	}
	date1904 := false
	wb, err := f.workbookReader()
	if err != nil {
		return v, err
	}
	if wb != nil && wb.WorkbookPr != nil {
		date1904 = wb.WorkbookPr.Date1904
	}

        // ===================    change start      =========================

        if file.Options.NumFmtCodeGetter != nil {
              numFmtCode, ok := file.Options.NumFmtCodeGetter(file, numFmtID)
              if ok {
                     return format(v, numFmtCode, date1904), err
              }
        }

        // =====================       change end        =============================

	if ok := builtInNumFmtFunc[numFmtID]; ok != nil {
		return ok(v, builtInNumFmt[numFmtID], date1904), err
	}
	if styleSheet.NumFmts == nil {
		return v, err
	}
	for _, xlsxFmt := range styleSheet.NumFmts.NumFmt {
		if xlsxFmt.NumFmtID == numFmtID {
			return format(v, xlsxFmt.FormatCode, date1904), err
		}
	}
	return v, err
}

@xuri
Copy link
Member

xuri commented Apr 19, 2023

Because the number format evaluator was required for both build-in number format and custom number format expressions, which read the tokens returns from NFP (Number Format Parser) and apply the formats for the cell value. The work of implements for it was the premise for applying formats for any number format expression. If we allow the user to change the built-in number format table, but there is no evaluator for the user-defined built-in number format, the user still can't get the correct formatted cell value. In addition, I think letting's users know the built-in number format was not a good idea, mapping that by given localization and date time options, just like a spreadsheet application will be a better design.

@fudali113
Copy link
Contributor

So how do you make that distinction? Is there any documentation I can refer to?

i don‘t find system info or lang in excel xml files;and i find https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.spreadsheet.numberingformat?view=openxml-2.8.1 document description diff area but not find diff system ;

For example, numFmtId = 14 value has diff

@xuri
Copy link
Member

xuri commented Apr 20, 2023

The localization code used in the numfmt.go source code could be found in the LCID reference manual, and we need to create each local and date time's number format mapping manually.

@fudali113
Copy link
Contributor

like styles.go langNumFmt ?

image

So is it now possible to override the default builtInNumFmt using langNumFmt in your formattedValue?

In that case? How to deal with numFmtId = 14?

@fudali113
Copy link
Contributor

I've implemented another locale-based design; #1525

@xuri PTAL

@fudali113
Copy link
Contributor

fudali113 commented Apr 21, 2023

I used the following code to generate constants and all the LCID information

package main

import (
    "encoding/xml"
    "fmt"
    "os"
    "strings"
)

type Table struct {
    Trs []Tr `xml:"tr"`
}

type Tr struct {
    Tds []Td `xml:"td"`
}

type Td struct {
    V string `xml:"p"`
}

func main() {
    file, err := os.Open("all_lcid.html")
    if err != nil {
        panic(err)
    }
    table := Table{}
    decoder := xml.NewDecoder(file)
    err = decoder.Decode(&table)
    if err != nil {
        panic(err)
    }
    fmt.Println(table)
    newFile, err := os.Create("locales.go")
    if err != nil {
        panic(err)
    }
    newFile.Write([]byte(`
package main

type LCID struct {
    Language         string
    Location         string
    LanguageID       int32
    LanguageTag      Locale
    SupportedVersion string
}


`))

    localeConstMap := map[string]string{}
    newFile.Write([]byte(`type Locale string

`))
    newFile.Write([]byte(`const (
`))
    for _, tr := range table.Trs {
        locale := tr.Tds[3].V
        localeConst := toLocalConstName(locale)
        existLocale, ok := localeConstMap[locale]
        if ok {
            panic(fmt.Sprintf("%s == %s", existLocale, locale))
        }
        localeConstMap[locale] = localeConst
        newFile.Write([]byte(fmt.Sprintf("%s Locale = `%s` \n", localeConst, locale)))
    }
    newFile.Write([]byte(`)

`))

    newFile.Write([]byte("var AllLCIDList = []LCID{"))
    for _, tr := range table.Trs {
        s := fmt.Sprintf(
            `
{
    %s: %s,
    %s: %s,
    %s: %s,
    %s: %s,
    %s: %s,
},`,
            "Language", "`"+tr.Tds[0].V+"`",
            "Location", "`"+tr.Tds[1].V+"`",
            "LanguageID", tr.Tds[2].V,
            "LanguageTag", localeConstMap[tr.Tds[3].V],
            "SupportedVersion", "`"+tr.Tds[4].V+"`",
        )
        newFile.Write([]byte(s))
    }
    newFile.Write([]byte(`
}`))
    newFile.Close()
}

func toLocalConstName(s string) string {
    s = strings.ToUpper(s)
    s = strings.ReplaceAll(s, "-", "_")
    return s
}

image
image

Is that OK?

@fudali113
Copy link
Contributor

@fudali113
Copy link
Contributor

And I still think the name for the language tag is locale is good

Reference to go-playground/locales is also named this way
https://github.com/go-playground/locales/blob/ce315c8672599942003599943a1e64288f55b03f/af/af.go#L13

@xuri xuri closed this as completed in dfdd97c May 6, 2023
xuri added a commit that referenced this issue May 11, 2023
…at code

- Remove the `Lang` field in the `Style` data type
- Rename field name `ShortDateFmtCode` to `ShortDatePattern` in the `Options` data type
- Rename field name `LongDateFmtCode` to `LongDatePattern` in the `Options` data type
- Rename field name `LongTimeFmtCode` to `LongTimePattern` in the `Options` data type
- Apply built-in language number format code number when creating a new style
- Checking and returning error if the date and time pattern was invalid
- Add new `Options` field `CultureInfo` and new exported data type `CultureName`
- Add new culture name types enumeration for country code
- Update unit tests
- Move built-in number format code and currency number format code definition source code
- Remove the built-in language number format code mapping with Unicode values
- Fix incorrect number formatted result for date and time with 12 hours at AM
@xuri
Copy link
Member

xuri commented May 11, 2023

Since the commit 49234fb, the library support to specified system date and time format code in the options when creating or opening the workbook. The built-in number format 14 and some date formats that begin with an asterisk (*) in the spreadsheet application's number format setting window were effect by system short date, long date, and long time settings. Now we can set these settings by ShortDatePattern, LongDatePattern, and LongTimePattern fields in the Options. For example, now you can open a workbook with custom date and time options like this:

f, err := excelize.OpenFile("Book1.xlsx", excelize.Options{
    ShortDatePattern: "dd/mm/yyyy",
})

After that, if you get a cell value with built-in number format ID 14, the number will be formatted with dd/mm/yyyy. This feature will be released in the next version.

@jje42
Copy link
Author

jje42 commented May 11, 2023

thank you! 👏

xuri pushed a commit to JDavidVR/excelize that referenced this issue Jul 11, 2023
…nd time options

- Add new options `ShortDateFmtCode`, `LongDateFmtCode` and `LongTimeFmtCode`
- Update unit tests
xuri added a commit to JDavidVR/excelize that referenced this issue Jul 11, 2023
…r format code

- Remove the `Lang` field in the `Style` data type
- Rename field name `ShortDateFmtCode` to `ShortDatePattern` in the `Options` data type
- Rename field name `LongDateFmtCode` to `LongDatePattern` in the `Options` data type
- Rename field name `LongTimeFmtCode` to `LongTimePattern` in the `Options` data type
- Apply built-in language number format code number when creating a new style
- Checking and returning error if the date and time pattern was invalid
- Add new `Options` field `CultureInfo` and new exported data type `CultureName`
- Add new culture name types enumeration for country code
- Update unit tests
- Move built-in number format code and currency number format code definition source code
- Remove the built-in language number format code mapping with Unicode values
- Fix incorrect number formatted result for date and time with 12 hours at AM
jenbonzhang pushed a commit to jenbonzhang/excelize that referenced this issue Oct 22, 2023
…nd time options

- Add new options `ShortDateFmtCode`, `LongDateFmtCode` and `LongTimeFmtCode`
- Update unit tests
jenbonzhang pushed a commit to jenbonzhang/excelize that referenced this issue Oct 22, 2023
…r format code

- Remove the `Lang` field in the `Style` data type
- Rename field name `ShortDateFmtCode` to `ShortDatePattern` in the `Options` data type
- Rename field name `LongDateFmtCode` to `LongDatePattern` in the `Options` data type
- Rename field name `LongTimeFmtCode` to `LongTimePattern` in the `Options` data type
- Apply built-in language number format code number when creating a new style
- Checking and returning error if the date and time pattern was invalid
- Add new `Options` field `CultureInfo` and new exported data type `CultureName`
- Add new culture name types enumeration for country code
- Update unit tests
- Move built-in number format code and currency number format code definition source code
- Remove the built-in language number format code mapping with Unicode values
- Fix incorrect number formatted result for date and time with 12 hours at AM
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants